obstrepero.us: mistakes you can learn from

fork and exec ftw

filed under: /hacks

One of the reasons I put off upgrading my work computer was that it meant facing the fact that some library named "pyflac," which I used to write flacenstein, had disappeared. Nobody wanted to maintain it, because maintenance is boring, so instead at least two other people have written new and completely different libraries from scratch. This is so typical that it has a name (CADT, jwz's "Cascade of Attention-Deficit Teenagers" model), and it drives me up the wall.

So I downloaded some other guy's new Python tag library, which is not really any less hacky or better documented (writing documentation is boring). After a few minutes of guessing at how to use it, it occurred to me that, you know what, flacenstein also depends on cdparanoia, yet that part has always worked just fine and has never required me to touch it at all.

How is that possible? Because I heeded the advice in the cdparanoia code and never screwed around with libparanoia or any library bindings at all. I just fork and run the cdparanoia process and read its output.
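In Python, that pattern takes only a few lines. Here is a minimal sketch using os.popen; the helper name stream_lines is made up for illustration. It assumes a POSIX shell, because os.popen hands a single command string to the shell and shlex.quote produces POSIX-style quoting. The actual cdparanoia arguments are left out on purpose; check its man page for the flags you need.

```python
import os
import shlex


def stream_lines(argv):
    """Run argv via os.popen and yield its stdout one line at a time.

    Hypothetical helper. os.popen takes a shell command string, so each
    argument is quoted with shlex.quote to survive spaces and metacharacters
    (POSIX shells only).
    """
    cmd = " ".join(shlex.quote(arg) for arg in argv)
    pipe = os.popen(cmd)  # default mode "r": we read the child's stdout
    try:
        for line in pipe:
            yield line.rstrip("\n")
    finally:
        # close() waits for the child. It returns None on success; otherwise,
        # on POSIX, it returns the child's exit code shifted left by 8 bits.
        status = pipe.close()
    if status is not None:
        raise RuntimeError(f"{cmd} failed with status {status}")
```

Iterating over the pipe hands you each line as it arrives, instead of making you wait for the child to finish before you see anything.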

I see this pattern every day in another place. You may have heard of GFS, the "Google File System," which is not unlike Amazon S3. It doesn't actually work like a file system; you can't mount it. Instead you have a choice: link against an enormous and complicated file-like API that changes all the time, or use a command line tool that provides most of the same operations. Binding to the API requires a recompile every couple of weeks, occasional busywork to keep up with API changes, and many MB of bloat dragged around with your binaries. Even when you "only" have to recompile, it really sucks to ever-so-carefully distribute the new binary to many thousands of machines. The tool's flags, on the other hand, change much less often, and the tool stays up to date for free, because it's Somebody Else's Problem.

So I now advocate os.popen for the win. It screamed "hack" at me back when I was more freshly educated in Professional Computer Science. But you know what, it works just fine and has a number of advantages:

  • The parent-child process interface is old, well understood, and well defined. I tell the child what I want it to do by giving it arguments. It tells me how things turned out with its exit code. If it wants to tell me anything else, it writes to stdout or stderr. All of this is trivially easy to handle in any programming language ever written.
  • It gives me fault isolation in a simple and well defined way. No matter how badly the child craps itself, the kernel cleans it up, changes the sheets, and politely informs me that my child died on signal 11. Ever tried to use a swigged-up library that has a habit of calling abort() on errors, taking your whole process down with it?
  • The command line interface is stable, far more stable than most people's library interfaces. Guess why: because people use the command line interface. Changing flags and such is seen as expensive, because it inflicts usually-pointless pain on the poor ignorant users. I get the same benefit if my automation pretends to be a poor ignorant user.
  • It is dead simple to debug. If any of my programs has trouble, it prints the command it tried to run and dies (or does whatever is appropriate). Copy and paste the command into a shell, find out why it doesn't work, done.
  • It is similarly easy to explain and test dependencies. Try running this, and this, and this, and if it all works, my script will work for you. Trying to solve this problem for linked code leads to apt-get or /usr/ports. These things are fine when they are set up just right, but getting your code set up just right is a ton of work that I never bother to do. (Packaging is boring.)
  • Fork and exec just isn't very expensive. Look at the hash Apache made of it when they tried to eliminate fork in apache2, the better to appeal to the average Windows idiot. It has been 8 years; does anybody use apache2 yet?
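Several of the points above (arguments in, exit code and stdout/stderr out, crashes contained in the child, the exact command printed on failure, dependencies you can check by hand) fit in a few lines of Python. This is a sketch, not a library: the names run, missing_tools, and ToolError are hypothetical. It uses subprocess.run rather than os.popen so that arguments never pass through a shell, shlex.join needs Python 3.8 or later, and the negative-return-code convention for signal deaths is POSIX-specific.

```python
import shlex
import shutil
import signal
import subprocess


class ToolError(RuntimeError):
    """Hypothetical error for a failed child; the message starts with a pasteable command."""

    def __init__(self, argv, returncode, stderr):
        self.returncode = returncode
        cmd = shlex.join(argv)  # shell-quoted, so it can be copied into a terminal
        if returncode < 0:
            # POSIX: a negative return code means the child was killed by that signal.
            try:
                how = f"died on {signal.Signals(-returncode).name}"
            except ValueError:
                how = f"died on signal {-returncode}"
        else:
            how = f"exited with status {returncode}"
        super().__init__(f"{cmd}: {how}\n{stderr}")


def missing_tools(names):
    """Return the tools in `names` that cannot be found on $PATH."""
    return [name for name in names if shutil.which(name) is None]


def run(argv):
    """Run argv as a child process and return its stdout as text.

    Arguments go in as a list (no shell involved), success comes back as
    exit status 0, and stdout/stderr are captured as strings.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        # Whatever went wrong happened in the child; this process just reports it.
        raise ToolError(argv, proc.returncode, proc.stderr)
    return proc.stdout
```

A script might call missing_tools(["cdparanoia", "flac"]) at startup and refuse to go on if the list is non-empty. If a child calls abort(), the parent simply gets a ToolError saying the command died on SIGABRT, and keeps running.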

Linking is overrated. Fork and exec ftw.
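For the curious, the literal version: a minimal POSIX-only sketch of fork, exec, and wait using the raw os module calls, with the child's stdout read through a pipe. The function name fork_exec_read is hypothetical, and os.waitstatus_to_exitcode requires Python 3.9 or later. In practice subprocess does all of this, plus the corner cases, for you.

```python
import os


def fork_exec_read(argv):
    """Fork, exec argv in the child, and return (exit_code, stdout_bytes).

    POSIX only. exit_code follows the subprocess convention: the child's
    exit status, or -N if it was killed by signal N.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: point stdout at the pipe, then replace this process image.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)  # dup2 makes fd 1 inheritable, so it survives exec
            os.close(write_fd)
            os.execvp(argv[0], argv)  # searches $PATH; only returns by raising
        finally:
            # Only reached if something above failed (e.g. command not found).
            # os._exit skips cleanup so the child never runs the parent's code.
            os._exit(127)
    # Parent: close our copy of the write end, or read() would never see EOF.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        output = pipe.read()  # returns once every write end is closed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status), output
```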

13 Jun 2008 01:59 PT

Copyright © 2005-06 Michael A. Dickerson