dcreager.net

Background processes in redo

A recent conversation with a coworker reminded me of a make replacement named redo. On a lark I decided to update a couple of my projects to use redo — in particular, the repo that builds my personal website from a collection of gemtext files.

redo

While working on new posts, I often like to view them locally before syncing them up to the public-facing web server. Because I use vanity paths and index.html files liberally, I can't just load the HTML files directly via file: URLs. Instead I spin up a lightweight HTTP server (lighttpd) to serve files from the local output directory containing the built site content. And crucially, I have an additional make serve target that spins up the server in the background for me, so that I don't have to remember the magic lighttpd incantation to use. This takes advantage of how lighttpd will daemonize itself as a background process by default: the make target's rule invokes lighttpd, which spins up a background process, but make does not wait for that background process to exit before make itself exits.

lighttpd

When porting that make target over to redo, I discovered that redo handles background jobs differently, in a way that required some additional tweaking. Make is presumably waiting for the shell process that it invokes to exit, but does not wait for any background child processes to exit. (I haven't verified this. My assumption is that it actually waits for the shell process's entire process group to exit, and that lighttpd's daemonization logic removes the HTTP server process from its parent's process group.)

Redo wasn't doing this—invoking redo serve would fire off the background HTTP server as expected, but then redo would block waiting for the server to complete. I was very confused how this could be happening!

Redo seems to do something more complicated than make. It does call waitpid for each target's subprocess (but not its process group), but only uses that to clean up the child process entry in its process table. That's not the mechanism that it uses to wait for the process to finish. Instead, redo creates a pipe for each job subprocess that it creates, and uses a select call to wait for the read end of any subprocess pipe to become readable.

redo job control file descriptors

redo waiting for a target process

Those pipes are never written into, and so they only become “readable” when the write file descriptor is closed. This is similar to a common shutdown notification pattern in Go using channels.

Starting and stopping things with a signal channel
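
The close-as-notification behavior is easy to try out in isolation. Here is a minimal Python sketch using only the standard library. It is an illustration of the pattern, not redo's actual code, and the function name close_notification_demo is made up for this example.

```python
import os
import select


def close_notification_demo():
    """Return (readable_before, readable_after, data) for a never-written pipe."""
    read_fd, write_fd = os.pipe()
    try:
        # The write end is still open and nothing has been written, so
        # select times out without reporting the read end as readable.
        before, _, _ = select.select([read_fd], [], [], 0.1)

        # Close the only copy of the write end.
        os.close(write_fd)

        # Now select reports the read end as readable, and reading from it
        # returns an empty bytes object, which signals end of file.
        after, _, _ = select.select([read_fd], [], [], 0.1)
        data = os.read(read_fd, 1)
        return bool(before), bool(after), data
    finally:
        os.close(read_fd)
```

Nothing is ever sent through the pipe. The "message" is the end-of-file condition that appears once no write end remains open.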

However, Unix pipes add a wrinkle, because file descriptors are inherited by child processes when you fork. Every process that inherited a copy of the pipe's write file descriptor has to close it before the select call will report the read file descriptor as readable.

And that file descriptor inheritance even carries over to the HTTP server's background process! The server inherits a copy of the write file descriptor, and since lighttpd itself doesn't know anything about it, it never closes it explicitly—it gets closed by default when the process exits, just like all file descriptors do. And since redo's select call won't unblock until every copy of the write fd is closed, redo ends up waiting for the server process to exit.
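
To see the inheritance problem on its own, here is a small Python sketch in which a forked child stands in for the background server. It assumes a Unix system where os.fork is available; the function name seconds_blocked_by_child and the half-second lifetime are made up for this example, and none of this is redo's actual code.

```python
import os
import select
import time


def seconds_blocked_by_child(child_lifetime=0.5):
    """Time how long select blocks while a forked child holds the write end."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: a stand-in for the background server. It never uses
        # write_fd, but its inherited copy stays open until it exits.
        try:
            time.sleep(child_lifetime)
        finally:
            os._exit(0)

    # Parent: close our own copy of the write end, then wait for the read
    # end to become readable, the way the post describes redo waiting.
    os.close(write_fd)
    start = time.monotonic()
    select.select([read_fd], [], [])
    elapsed = time.monotonic() - start

    os.waitpid(pid, 0)  # reap the child so the demo leaves no zombie
    os.close(read_fd)
    return elapsed
```

Even though the parent closes its write end immediately, the select call should stay blocked for roughly the child's whole lifetime, because the child's copy is only closed when the child exits.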

To get around this, I had to update the serve.do job file to create a subshell where I close all file descriptors other than stdin, stdout, and stderr; and then invoke lighttpd from that subshell. (I can't close the file descriptors in the parent shell, since I do still want redo to wait for the parent shell to finish!) That ensures that the background process does not inherit a copy of the write file descriptor, and therefore that redo will not block waiting for it to exit.

Closing file descriptors in bash

Invoke lighttpd from a subshell
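
The same idea can be sketched in Python, with a forked child standing in for the subshell that launches lighttpd. This is not the serve.do change itself, just an illustration of why closing the extra descriptors helps. The names seconds_blocked_after_closing_fds and MAX_FD are made up for this example, and the bound of 256 is an assumption about how high the inherited descriptor numbers can go.

```python
import os
import select
import time

MAX_FD = 256  # hypothetical upper bound on descriptor numbers to close


def seconds_blocked_after_closing_fds(child_lifetime=1.0):
    """Time how long select blocks when the child closes its inherited fds."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            # Keep stdin, stdout, and stderr (0 through 2) and close
            # everything else, including the inherited copy of write_fd.
            os.closerange(3, MAX_FD)
            # Stand-in for the long-running server.
            time.sleep(child_lifetime)
        finally:
            os._exit(0)

    # Parent: close our own copy of the write end, then wait for the read
    # end to become readable.
    os.close(write_fd)
    start = time.monotonic()
    select.select([read_fd], [], [])
    elapsed = time.monotonic() - start

    # Reap the child so the demo leaves no zombie. This wait happens
    # outside the timed region.
    os.waitpid(pid, 0)
    os.close(read_fd)
    return elapsed
```

Here the select call should return almost immediately, well before the child finishes sleeping, because no copy of the write end is left open once the child has closed its descriptors.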