When you are building a configuration management system, one of the things you learn quickly is that you are going to hit a ton of edge cases. Many of these edge cases are the result of applications that were not written with automation in mind. As but one example, many applications do not change their exit code when they have an error – making automation very challenging.
We hit one particularly pernicious edge case recently while developing the Chef service provider for Red Hat. Whenever we would call out to /sbin/service to start, stop, or restart a service, Chef would block forever waiting to read the output of the command. The bug is not unique to Red Hat: indeed, we saw a similar bug, which we fixed in a much less elegant way, in Ubuntu’s CouchDB package. This is that bug’s story.
The setup
We were using a CentOS 5.2 system to develop the provider. For this particular issue, we used /etc/init.d/gpm restart as our test case – but it appeared to happen with every init script we tried.
Here is a simple test case in Ruby:
#!/usr/bin/ruby
output = IO.popen("/sbin/service gpm restart")
puts "Reading output..."
puts output.read
puts "Would be nice to get here, but never going to happen."
Run that code snippet on any CentOS 5.2 desktop (which will have gpm installed) and you’ll notice that it blocks forever.
When you examine the process table, you’ll notice that the process we spawned (/sbin/service) has gone Zombie:
root 12586 Z+ 02:29 0:00 [service] <defunct>
For those of you unfamiliar with Zombies: when a process is spawned, its parent is expected to clean up after it. Typically, this is done by issuing one of the variations on the wait system call, or by catching SIGCHLD. If the child has exited, but the parent has not yet done its duty in cleaning up after the child, the process is marked as a Zombie. If the parent exits, the child will be inherited by init and reaped immediately.
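You can watch this happen with a small sketch. It is Linux-specific (it peeks at /proc to read the process state); the child exits immediately, but because the parent delays its wait, the child sits in state Z until it is reaped:

```ruby
#!/usr/bin/ruby
# The child exits right away, but nobody has wait()ed on it yet.
cid = fork { exit 0 }
sleep 1  # give the child ample time to exit

# Field 3 of /proc/<pid>/stat is the process state: "Z" means zombie.
state = File.read("/proc/#{cid}/stat").split[2]
puts "child state before reaping: #{state}"

Process.waitpid(cid)  # reap the zombie; it now disappears from the table
```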
What’s happening?
When a child is forked, it inherits the file handles of its parent – in particular STDIN, STDOUT, and STDERR. You can see this behavior with this test script:
#!/usr/bin/ruby
STDOUT.puts "I am the parent @ #{Process.pid}"
STDERR.puts "I am the parent's stderr"

cid = fork do
  STDOUT.puts "I am the child @ #{Process.pid}"
  STDERR.puts "I am the child's stderr"
  exit 42
end

cid, status = Process.waitpid2(cid)
puts "Child #{cid} had exit status #{status.exitstatus}"
All we are doing here is the simplest fork – we spawn a new process, and print to STDOUT and STDERR. The output gets to the terminal because the child is using the same file descriptors as the parent. Because we’re good unix citizens, we run waitpid2 in the parent – we care about our children!
When dealing with pipes (or file descriptors), it’s important to remember one thing:
The read end of a pipe will never issue an end of file if a write end is still open.
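Here is a quick sketch of that rule. We duplicate the write end of a pipe (w2 stands in for the copy a child would inherit), and the reader only sees EOF once every copy is closed – IO.select is used with a timeout so the demonstration itself does not block:

```ruby
#!/usr/bin/ruby
r, w = IO.pipe
w2 = w.dup  # a second copy of the write end, as an inheriting child would have

w.close  # close the original write end...
# ...but there is still no EOF: select times out, because w2 remains open
eof_pending = !IO.select([r], nil, nil, 0.2).nil?
puts "readable before last close? #{eof_pending}"

w2.close       # close the *last* write end
data = r.read  # now read returns immediately, at EOF
puts "data after last close: #{data.inspect}"
```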
In the case of our errant init script, the issue comes down to this: one of our children has spawned a grandchild, and that grandchild has inherited one or both of our file descriptors. Neither cleans up properly – they fail to close their ends of the pipe. This leaves the child zombied, because the grandchild has neither closed its end of the pipe nor exited – and the parent blocks forever on read.
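The failure mode can be reproduced in miniature. In this sketch, a short sleep stands in for the daemon an init script leaves behind, and we reap the child explicitly to isolate the pipe behavior; the point is that the grandchild still holds the inherited write end, so the parent never sees EOF (IO.select with a timeout takes the place of the read that would block forever):

```ruby
#!/usr/bin/ruby
r, w = IO.pipe

cid = fork do
  r.close
  fork { sleep 3 }  # grandchild inherits w and never closes it
  exit 0            # the child itself is done
end

w.close
Process.waitpid(cid)  # the child is reaped; the grandchild lives on

# The read end never reports EOF while the grandchild holds w open:
eof_pending = !IO.select([r], nil, nil, 0.5).nil?
puts "would read return? #{eof_pending}"  # this is where we would block forever
```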
What do we do about it?
Ideally, everyone would be a good unix citizen and clean up after themselves when they spawn children – in practice, they often aren’t. One way around this issue is to send our output to a temporary file, rather than read directly from a pipe. The child won’t know the difference, and since holding a file open is not the same as holding the read end of a pipe open, we can read the temp file at our leisure, knowing that if our child is dead, its output is in the file.
#!/usr/bin/ruby
require 'tempfile'

pin = IO.pipe
outfile = Tempfile.new("chef-exec")
errfile = Tempfile.new("chef-exec")

cid = fork
if cid
  pin.last.close
  outfile.close
  errfile.close
  cid, status = Process.waitpid2(cid)
  puts "Child #{cid} has exit status #{status.exitstatus}"
  puts IO.read(outfile.path)
  puts IO.read(errfile.path)
  # Tempfile cleans up automatically when the objects
  # go out of scope
else
  pin.last.close
  STDIN.reopen pin.first
  pin.first.close
  STDOUT.reopen outfile
  outfile.close
  STDERR.reopen errfile
  errfile.close
  exec("/sbin/service gpm restart")
end
While this is straightforward, it is a hack. For every process we spawn, we create a pair of temp files (one for STDOUT and one for STDERR)… and, if we need the output, we have to open, read, and close those temp files, then clean up after ourselves. That’s a lot of work for such simple functionality.
A better answer is to flip the order in which we do things just a little bit, and sprinkle a bit of O_NONBLOCK on our pipes. Avoiding the use of IO.popen this time (because we need a bit more control than it provides), here is an example that will not block, and still returns all the output:
#!/usr/bin/ruby
require 'fcntl'
require 'io/wait'

pin, pout, perr = IO.pipe, IO.pipe, IO.pipe

cid = fork
if cid
  [pin.first, pout.last, perr.last].each { |fd| fd.close }
  pin.last.close
  pout.first.fcntl(Fcntl::F_SETFL, pout.first.fcntl(Fcntl::F_GETFL) | Fcntl::O_NONBLOCK)
  perr.first.fcntl(Fcntl::F_SETFL, perr.first.fcntl(Fcntl::F_GETFL) | Fcntl::O_NONBLOCK)
  cid, status = Process.waitpid2(cid)
  puts "Child #{cid} has exit status #{status.exitstatus}"
  puts pout.first.read if pout.first.ready?
  puts perr.first.read if perr.first.ready?
else
  pin.last.close
  STDIN.reopen pin.first
  pin.first.close
  pout.first.close
  STDOUT.reopen pout.last
  pout.last.close
  perr.first.close
  STDERR.reopen perr.last
  perr.last.close
  exec("/sbin/service gpm restart")
end
That code can seem a little magical, so here it is step by step:

1. Create new pipes for our child’s STDIN, STDOUT, and STDERR.
2. Fork our child, and reopen those file descriptors with the child’s ends of the pipes.
3. Exec our program.
4. In the parent, close our ends of the pipes that duplicate the ones we gave our child.
5. Set the STDOUT and STDERR pipes to be non-blocking.
6. Wait for our child to exit, implying (through its death) that it has nothing more to say to us on either pipe.
7. Read from the two pipes, assuming they have data for us.
That’s quite a bit more code than just calling IO.popen! But it never causes your application to block just to read the STDOUT and STDERR of another program, regardless of how badly behaved its children might be. It also avoids the significant overhead of streaming the output to temporary files just to open, read, and close them again.
I want to give a special thanks to the many kind souls who helped us debug this problem (you know who you are – thank you.) In particular, Benjamin Black endured no end of my griping about “how you maybe couldn’t fix it, and the temp file trick wasn’t that bad, right?” and Artur Bergman pointed us in the direction of changing the order you wait for the child in, which was crucial in making things work.