<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>dcreager.net</title>
  <link href="http://dcreager.net/atom.xml" rel="self"/>
  <link href="http://dcreager.net/"/>
  <updated>2010-05-13T14:36:03-04:00</updated>
  <id>http://dcreager.net/</id>
  <author>
    <name>Douglas Creager</name>
    <email>dcreager@dcreager.net</email>
  </author>

  
  <entry>
    <title>Installing Ubuntu Lucid on a PowerPC QEMU virtual machine</title>
    <link href="http://dcreager.net/2010/05/13/powerpc-qemu-lucid/"/>
    <updated>2010-05-13T00:00:00-04:00</updated>
    <id>http://dcreager.net/2010/05/13/powerpc-qemu-lucid</id>
    <content type="html">&lt;p&gt;Part of the software I help develop at &lt;a href='http://www.redjack.com/'&gt;RedJack&lt;/a&gt; needs to be tested on both little-endian and big-endian machines. Little-endian machines are easy, since everyone and their mother is running on a little-endian Intel or AMD x86 chip. It used to be that big-endian was pretty easy to test, too — just break out your trusty Apple Powerbook G4 and you&amp;#8217;re good to go. Since Apple has shifted over to Intel chips, though, the situation has changed.&lt;/p&gt;

&lt;p&gt;Luckily, &lt;a href='http://wiki.qemu.org/'&gt;QEMU&lt;/a&gt; has PowerPC as one of the targets that it can emulate, so in theory, I can still easily test my code on a big-endian machine by creating a QEMU PowerPC virtual machine. There&amp;#8217;s already a writeup about trying to install Debian onto a QEMU VM &lt;a href='http://machine-cycle.blogspot.com/2009/05/running-debian-on-qemu-powerpc.html'&gt;here&lt;/a&gt;. &lt;a href='http://www.aurel32.net/'&gt;Aurélien Jarno&lt;/a&gt; has graciously put together downloadable disk images with Debian preinstalled. If that&amp;#8217;s good enough for your purposes, just go download those! You won&amp;#8217;t need any of the rest of the information on this page.&lt;/p&gt;

&lt;p&gt;Unfortunately, I didn&amp;#8217;t want to run stock Debian; my little-endian build machine is running Ubuntu Lucid, and for consistency, I wanted my big-endian VM to be running the same. As it turns out, this also required a fair dose of masochism on my part. There are several issues that you&amp;#8217;ll encounter if you try to do this by hand. Here is my cheat sheet for getting around these issues.&lt;/p&gt;

&lt;p&gt;Note that this isn&amp;#8217;t a full step-by-step account of how to install Lucid onto a QEMU VM. For now, I&amp;#8217;m just trying to get my notes down into a more permanent form.&lt;/p&gt;

&lt;h2 id='getting_qemu'&gt;Getting QEMU&lt;/h2&gt;

&lt;p&gt;Note that I&amp;#8217;m using Ubuntu Lucid as both the host and the guest OS for this virtual machine; if you&amp;#8217;re running QEMU on a non-Ubuntu host, then you can skip this section.&lt;/p&gt;

&lt;p&gt;It seems that there&amp;#8217;s a bug with the current QEMU packages in Lucid. If you try to run &lt;code&gt;qemu-system-ppc&lt;/code&gt;, you&amp;#8217;ll get an error message about missing the PowerPC BIOS image. Joy.&lt;/p&gt;

&lt;p&gt;The easiest way to get around this is to install QEMU from source. Download the latest version from &lt;a href='http://download.savannah.gnu.org/releases/qemu/'&gt;here&lt;/a&gt;. Once you&amp;#8217;ve unpacked it, use the following to build:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo apt-get build-dep qemu
$ ./configure --prefix=/usr/local \
    --enable-sdl --enable-curses --enable-curl \
    --enable-kvm --enable-nptl --enable-uuid \
    --enable-linux-aio --enable-io-thread \
    --audio-drv-list=alsa
$ make
$ sudo make install&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first command is just an easy way to ensure that all of the prerequisite libraries are installed.&lt;/p&gt;

&lt;h2 id='booting_the_installation_cd'&gt;Booting the installation CD&lt;/h2&gt;

&lt;p&gt;Once you&amp;#8217;ve got a working QEMU installed, you can find the PowerPC Lucid installation CD &lt;a href='http://cdimage.ubuntu.com/ports/releases/10.04/release/'&gt;here&lt;/a&gt;. I&amp;#8217;ve decided to use the server installation CD; I don&amp;#8217;t really need (or want) X windows running in the VM.&lt;/p&gt;

&lt;p&gt;To install this onto a new VM, it should be as simple as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ qemu-img create -f qcow2 ubuntu-ppc.qcow2 10G
$ qemu-system-ppc -m 1024 -hda ubuntu-ppc.qcow2 \
    -cdrom ubuntu-10.04-server-powerpc.iso -boot d&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This links up the Ubuntu installation CD on the VM&amp;#8217;s CD-ROM drive, and uses a new disk image for the primary hard disk. Oh, and we make sure to give the VM enough RAM to do its business — the default is a paltry 128MB.&lt;/p&gt;

&lt;p&gt;Of course, this doesn&amp;#8217;t work — the Lucid installer suffers from the same problem described &lt;a href='http://mac.linux.be/content/ubuntu-810-installer-fails-detect-cd-rom'&gt;here&lt;/a&gt; for the Intrepid installer. Once you get into the installer, the installation program can&amp;#8217;t find the CD-ROM device, and so it can&amp;#8217;t read the installation packages. Unfortunately, the workaround doesn&amp;#8217;t work for Lucid, since it uses a newer Linux kernel that has &lt;a href='http://www.linux.com/archive/feed/33164'&gt;eliminated the &lt;code&gt;ide-scsi&lt;/code&gt; module&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So, what do we do? Well, QEMU also allows us to mount a disk image as a USB removable disk, but it won&amp;#8217;t let us boot from USB. We end up having to mount the disk image twice: Once as a virtual CD, so that we can boot into the installer, and once as a virtual USB disk, so that the installer can find the installation packages. The QEMU command line becomes:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ qemu-system-ppc -m 1024 -hda ubuntu-ppc.qcow2 \
    -cdrom ubuntu-10.04-server-powerpc.iso -boot d \
    -usb -usbdevice disk:ubuntu-10.04-server-powerpc.iso&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You won&amp;#8217;t have to manually load the &lt;code&gt;usb-storage&lt;/code&gt; module; it gets loaded automatically, and places the USB disk at &lt;code&gt;/dev/sda&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You&amp;#8217;ll still get the error message about not finding the CD; when this happens say &amp;#8220;no&amp;#8221; when it asks whether you need to load a module from a removable disk. Say &amp;#8220;yes&amp;#8221; when it asks if you want to choose a module and device manually; choose &amp;#8220;none&amp;#8221; for the module; then type in &lt;code&gt;/dev/sda&lt;/code&gt; as the device location.&lt;/p&gt;

&lt;h2 id='corrupt_package_files_on_cd'&gt;Corrupt package files on CD&lt;/h2&gt;

&lt;p&gt;Right, so now we have to be good, right? We can start QEMU, we can boot into the installer, and the installer can find all of the packages? Nope! There were several corrupted package files on the CD image I downloaded. If this happens to you, you should certainly try re-downloading the image, to take care of any spurious transmission errors. But if you still end up with some corrupted package files, there are ways around it.&lt;/p&gt;

&lt;p&gt;The installer will try to install its initial set of packages using &lt;code&gt;apt-get&lt;/code&gt;. If you encounter problems with these stages, you&amp;#8217;ll see some informative error messages on console 4, which is where the installer&amp;#8217;s log output is sent. You can get there by pressing &lt;em&gt;Alt-F4&lt;/em&gt; in the VM. (As a warning, don&amp;#8217;t try to shift to console 4 without ensuring that QEMU is grabbing the input. In most window managers, &lt;em&gt;Alt-F4&lt;/em&gt; will close the current window, which will just abruptly stop the VM!)&lt;/p&gt;

&lt;p&gt;By the time the installer tries to install packages, the VM&amp;#8217;s hard disk will be partitioned and formatted, and so we can drop into a shell as necessary. To do so, shift over to console 2 using &lt;em&gt;Alt-F2&lt;/em&gt; — again, make sure that QEMU is grabbing all keyboard and mouse input before switching consoles.&lt;/p&gt;

&lt;p&gt;Once you&amp;#8217;re on console 2, you can &lt;code&gt;chroot&lt;/code&gt; into the new system as follows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;~ $ mount -o bind /proc /target/proc
~ $ mount -o bind /sys /target/sys
~ $ mount -o bind /dev /target/dev
~ $ chroot /target&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;At this point, you&amp;#8217;ll be &amp;#8220;inside&amp;#8221; the new installation system, and can run whatever &lt;code&gt;apt-get&lt;/code&gt; and &lt;code&gt;dpkg&lt;/code&gt; commands are necessary to fix things up.&lt;/p&gt;

&lt;p&gt;Most likely, you&amp;#8217;ll see &amp;#8220;hash sum mismatch&amp;#8221; errors, indicating that a package file is corrupt. You need to download the correct version from the archive at &lt;em&gt;ports.ubuntu.com&lt;/em&gt;. To do this, you&amp;#8217;ll need a copy of &lt;em&gt;wget&lt;/em&gt; installed.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ apt-get install wget
$ wget -nv http://ports.ubuntu.com/pool/main/PATH_TO_DEB&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You&amp;#8217;ll see what to use for the &lt;code&gt;PATH_TO_DEB&lt;/code&gt; part in the error message. Once you&amp;#8217;ve downloaded all of the troublesome package files, install them using:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ dpkg -i *.deb
$ apt-get -f install&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then you can go back into the installer (on console 1) and try to repeat the current step.&lt;/p&gt;

&lt;p&gt;Note that things might be broken early enough that you can&amp;#8217;t install &lt;em&gt;wget&lt;/em&gt;. If this is the case, how do you download the non-corrupt package file? Luckily, Python was already installed at that point, so you can use the Python standard library to &lt;a href='http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python'&gt;emulate &lt;em&gt;wget&lt;/em&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ python
&amp;gt;&amp;gt;&amp;gt; import urllib2
&amp;gt;&amp;gt;&amp;gt; pkg = urllib2.urlopen(&amp;quot;http://ports.ubuntu.com/BLAH_BLAH&amp;quot;)
&amp;gt;&amp;gt;&amp;gt; output = open(&amp;quot;BLAH_BLAH.deb&amp;quot;, &amp;quot;wb&amp;quot;)
&amp;gt;&amp;gt;&amp;gt; output.write(pkg.read())
&amp;gt;&amp;gt;&amp;gt; output.close()
&amp;gt;&amp;gt;&amp;gt; ^D&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can then install the package as above.&lt;/p&gt;
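
&lt;p&gt;Before handing a freshly downloaded file to &lt;code&gt;dpkg&lt;/code&gt;, it may be worth checking that this copy is intact. The following is only a sketch: it hashes the file with the standard &lt;code&gt;hashlib&lt;/code&gt; module, &lt;code&gt;deb_checksum&lt;/code&gt; and &lt;code&gt;matches_index&lt;/code&gt; are hypothetical helper names, and it assumes that you copy the expected value by hand from the package entry in the archive &lt;em&gt;Packages&lt;/em&gt; index.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import hashlib


def deb_checksum(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks, so a large package never has to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def matches_index(path, expected_hex, algorithm="sha256"):
    """Compare against a checksum copied by hand (an assumption of this sketch)."""
    return deb_checksum(path, algorithm) == expected_hex.strip().lower()&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If &lt;code&gt;matches_index("BLAH_BLAH.deb", expected)&lt;/code&gt; comes back false, the download is probably damaged too, and fetching it again is a better bet than installing it.&lt;/p&gt;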

&lt;h2 id='installing_a_bootloader'&gt;Installing a bootloader&lt;/h2&gt;

&lt;p&gt;The installer claims that this architecture doesn&amp;#8217;t support a bootloader, so we have to install one by hand. The usual bootloader for PowerPC machines is &lt;code&gt;yaboot&lt;/code&gt;; it&amp;#8217;s fair&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Parser callbacks in libpush, Part 1 — Streams</title>
    <link href="http://dcreager.net/2010/02/25/libpush-callbacks-part-1/"/>
    <updated>2010-02-25T00:00:00-05:00</updated>
    <id>http://dcreager.net/2010/02/25/libpush-callbacks-part-1</id>
    <content type="html">&lt;p&gt;This post is the first in a series that describes the &lt;code&gt;push_callback_t&lt;/code&gt; type in the &lt;a href='http://github.com/dcreager/libpush/'&gt;libpush&lt;/a&gt; library. In these posts, we&amp;#8217;ll walk through a couple of possible ways to implement callbacks under the covers. At each stage, we&amp;#8217;ll encounter problems with the current design. Fixing these problems should lead closer us to the actual implementation in libpush, and along the way, we&amp;#8217;ll gain a good understanding of how our design decisions affect the performance and usability of the library.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;push_callback_t&lt;/code&gt; type is used to define &lt;em&gt;parser callbacks&lt;/em&gt;, which are the basic unit of parsing in libpush. Callbacks are pretty simple: they take in an &lt;em&gt;input value&lt;/em&gt;, read some data from the &lt;em&gt;input stream&lt;/em&gt;, and produce an &lt;em&gt;output value&lt;/em&gt;. (The fact that callbacks take in an input value, in addition to reading from the input stream, is what makes them &lt;a href='http://www.haskell.org/arrows/'&gt;&lt;em&gt;arrows&lt;/em&gt;&lt;/a&gt; instead of &lt;a href='http://en.wikipedia.org/wiki/Monad_%28functional_programming%29'&gt;&lt;em&gt;monads&lt;/em&gt;&lt;/a&gt; — but that&amp;#8217;s a story for a later post).&lt;/p&gt;

&lt;h2 id='first_attempt_callbacks_as_functions'&gt;First attempt: Callbacks as functions&lt;/h2&gt;

&lt;p&gt;Now, with this simple structure, we might try to implement callbacks as regular C functions. For instance, we could use something like the following to read in a single 32-bit integer:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='cp'&gt;#include &amp;lt;stdbool.h&amp;gt;&lt;/span&gt;
&lt;span class='cp'&gt;#include &amp;lt;stdint.h&amp;gt;&lt;/span&gt;
&lt;span class='cp'&gt;#include &amp;lt;stdio.h&amp;gt;&lt;/span&gt;

&lt;span class='n'&gt;bool&lt;/span&gt;
&lt;span class='nf'&gt;parse_uint32&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;void&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;input&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;uint32_t&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;output&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;FILE&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;stream&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='p'&gt;{&lt;/span&gt;
    &lt;span class='kt'&gt;size_t&lt;/span&gt;  &lt;span class='n'&gt;num_read&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;

    &lt;span class='n'&gt;num_read&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;fread&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;output&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='k'&gt;sizeof&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;stream&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
    &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;num_read&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This callback ignores its input value, reads in four bytes from the input stream, and uses that to output a &lt;code&gt;uint32_t&lt;/code&gt; value. The return value of the function is a boolean, indicating whether the parse was successful or not. This lets us handle &lt;em&gt;parse errors&lt;/em&gt; — for instance, if there are only three bytes left in the stream, we can&amp;#8217;t read in a full integer. We return &lt;code&gt;false&lt;/code&gt; to indicate this error condition.&lt;/p&gt;

&lt;p&gt;We&amp;#8217;ve ignored some details here that aren&amp;#8217;t important for this example — for instance, we don&amp;#8217;t worry about the endianness of the integer, nor do we worry about how the space for the output result is allocated. We just assume that someone will pass in a pointer to a &lt;code&gt;uint32_t&lt;/code&gt; variable, and our callback function will store its output value there.&lt;/p&gt;
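
&lt;p&gt;To see why the byte order would matter in a real parser, here is a small sketch (in Python rather than C, purely for brevity) that decodes the same four bytes under each convention. The &lt;code&gt;decode_uint32&lt;/code&gt; helper is hypothetical and is not part of libpush.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def decode_uint32(raw, byte_order):
    """Interpret exactly four bytes as an unsigned integer in the given order."""
    if len(raw) != 4:
        raise ValueError("need exactly four bytes")
    octets = list(bytearray(raw))
    if byte_order == "little":
        octets.reverse()          # least significant byte was stored first
    value = 0
    for octet in octets:          # now most significant byte first
        value = value * 256 + octet
    return value


raw = b"\x00\x00\x01\x02"
print(decode_uint32(raw, "big"))     # 258
print(decode_uint32(raw, "little"))  # 33619968&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The C callback above simply copies the bytes into memory, so which of these two answers it produces depends on the machine it runs on.&lt;/p&gt;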

&lt;h2 id='drawbacks'&gt;Drawbacks&lt;/h2&gt;

&lt;p&gt;This approach works fine for simple cases, but unfortunately has two drawbacks. First, we&amp;#8217;re limited to parsing from &lt;code&gt;FILE&lt;/code&gt; streams. Any real input source will probably be available as a stream, so this might not seem like a huge problem — though it does rule out parsing from a memory buffer, unless you use a non-portable function like &lt;code&gt;fmemopen&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The second, more important, problem is that the parser callback has full control over when and how much to read from the stream. In this example, we try to read in the full four bytes for the &lt;code&gt;uint32_t&lt;/code&gt; output value. However, there might not be four bytes available in the stream. If this is because we&amp;#8217;re at the end of a file, then we should treat this as a parse error. If we&amp;#8217;re reading from a network socket, though, another chunk of data might arrive if we wait for a bit.&lt;/p&gt;

&lt;p&gt;We could add logic to the callback to read from the stream repeatedly until we get enough data, but then we&amp;#8217;ll start &lt;em&gt;blocking&lt;/em&gt;, and the callback would also need some way to distinguish &amp;#8220;there&amp;#8217;s no more data here &lt;em&gt;yet&lt;/em&gt;&amp;#8221; from &amp;#8220;there&amp;#8217;s no more data coming &lt;em&gt;at all&lt;/em&gt;&amp;#8221;.&lt;/p&gt;

&lt;p&gt;All of this is bad news. First of all, this extra I/O logic is starting to get rather big, and we don&amp;#8217;t want each and every callback to have to include it. And second, we don&amp;#8217;t want the rest of our program to be held hostage by the callback — it should be up to our I/O code to decide whether it&amp;#8217;s okay to block waiting for more input, or whether to whip up a nice &lt;code&gt;select&lt;/code&gt; loop of some kind to read things more efficiently.&lt;/p&gt;
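
&lt;p&gt;As a rough illustration of that division of labor (a Python sketch for brevity, and not libpush&amp;#8217;s design or API), the parser below never reads anything itself. The I/O code hands it whatever chunks happen to arrive, and separately tells it when the stream has ended. The class name, the method names, and the choice of big-endian decoding are all assumptions made for this example; bytes beyond the first four are simply dropped.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class Uint32Parser(object):
    """Accumulates four bytes across any number of chunks, then decodes them."""

    def __init__(self):
        self.pending = b""
        self.value = None

    def feed(self, chunk):
        """Take one chunk from the I/O code; return True once the value is ready."""
        if self.value is None:
            needed = 4 - len(self.pending)
            self.pending += chunk[:needed]
            if len(self.pending) == 4:
                # Decode as big-endian, an arbitrary choice for this sketch.
                self.value = 0
                for octet in bytearray(self.pending):
                    self.value = self.value * 256 + octet
        return self.value is not None

    def finish(self):
        """Called at end of stream: running out of data early is a parse error."""
        return self.value is not None


parser = Uint32Parser()
for chunk in (b"\x00\x00", b"\x01", b"\x02"):
    ready = parser.feed(chunk)   # never blocks; the caller decides when to read
print(parser.value)  # 258&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A false result from &lt;code&gt;feed&lt;/code&gt; means &amp;#8220;not yet&amp;#8221;, while a false result from &lt;code&gt;finish&lt;/code&gt; means &amp;#8220;never&amp;#8221;, and only the code that owns the stream knows which question to ask.&lt;/p&gt;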

&lt;p&gt;In the next post, we&amp;#8217;ll describe &lt;em&gt;iteratees&lt;/em&gt;, which give us this capability.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Using LLVM's link-time optimization on Ubuntu Karmic</title>
    <link href="http://dcreager.net/2010/02/17/llvm-lto-karmic/"/>
    <updated>2010-02-17T00:00:00-05:00</updated>
    <id>http://dcreager.net/2010/02/17/llvm-lto-karmic</id>
    <content type="html">&lt;p&gt;While playing around with &lt;a href='http://github.com/dcreager/libpush'&gt;libpush&lt;/a&gt; on my MacBook, I was pleasantly surprised to see a huge performance increase when I used the link-time optimization (LTO) feature of the LLVM GCC front end. (It&amp;#8217;s really quite nifty; the new &lt;a href='http://github.com/mxcl/homebrew'&gt;Homebrew package manager&lt;/a&gt; uses it by default when compiling packages.) On MacOS, using LTO is as simple as using &lt;code&gt;llvm-gcc&lt;/code&gt; as your C compiler (or &lt;code&gt;llvm-g++&lt;/code&gt; if you&amp;#8217;re compiling C++), and passing in &lt;code&gt;-O4&lt;/code&gt; as your optimization flag. I use SCons as my builder, so this turns into:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ scons CC=llvm-gcc CCFLAGS=-O4&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This will cause GCC to output LLVM bitcode into the &lt;em&gt;.o&lt;/em&gt; output files, and to perform whole-program optimizations during each linking phase. I was able to see a big performance win simply from the linker being able to inline copies of small functions that live in “other” compilation units.&lt;/p&gt;

&lt;h2 id='good_news_and_bad_news'&gt;Good news and bad news&lt;/h2&gt;

&lt;p&gt;Intrigued by the results, I wanted to try the same thing on my Linux boxes, which are running Ubuntu Karmic. On the Mac, Apple has made sure to include support for LLVM in all of the standard Xcode build tools. On Linux, you don&amp;#8217;t get this by default right now — though GCC is implementing their own LTO project, which is starting to bear fruit. Part of this is a new “&lt;code&gt;gold&lt;/code&gt;” linker, which supports a plugin architecture. How is this useful to us? Well, LLVM already has a &lt;a href='http://llvm.org/docs/GoldPlugin.html'&gt;plugin&lt;/a&gt; for the new linker, so with everything installed correctly, getting LTO through LLVM on Linux can be just as simple as it was on the Mac.&lt;/p&gt;

&lt;p&gt;Unfortunately, these new tools have only partially made it into the Ubuntu package tree. You can get the new &lt;code&gt;gold&lt;/code&gt; linker by installing the &lt;code&gt;binutils-gold&lt;/code&gt; package, and you can get most of the LLVM pieces by installing the &lt;code&gt;llvm&lt;/code&gt; and &lt;code&gt;llvm-gcc-4.2&lt;/code&gt; packages. However, these packages don&amp;#8217;t include the LLVM &lt;code&gt;gold&lt;/code&gt; plugin or the new &lt;code&gt;clang&lt;/code&gt; C/C++ compiler front-end. Things look promising for these features being in the new Lucid packages (which could even lead to a Karmic backport), but for now, if we want the &lt;code&gt;gold&lt;/code&gt; plugin, we have to compile it ourselves.&lt;/p&gt;

&lt;h2 id='getting_the_prerequisites'&gt;Getting the prerequisites&lt;/h2&gt;

&lt;p&gt;As mentioned on the LLVM &lt;a href='http://llvm.org/docs/GoldPlugin.html'&gt;linker plugin page&lt;/a&gt;, you need to have the &lt;code&gt;binutils&lt;/code&gt; source lying around somewhere if you want to compile the plugin, since the LLVM source needs to read in &lt;code&gt;binutils&lt;/code&gt;&amp;#8217;s &lt;em&gt;plugin-api.h&lt;/em&gt; file. The easiest way for us to get the &lt;code&gt;binutils&lt;/code&gt; source is using APT:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ mkdir -p $HOME/deb
$ cd $HOME/deb
$ apt-get source binutils&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This will place an unpacked copy of the &lt;code&gt;binutils&lt;/code&gt; source into &lt;em&gt;$HOME/deb/binutils-2.20&lt;/em&gt; for you.&lt;/p&gt;

&lt;p&gt;We can also go ahead and install the &lt;code&gt;gold&lt;/code&gt; linker:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo apt-get install binutils-gold&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You&amp;#8217;ll also need to make sure you&amp;#8217;ve got the basic compilation tools installed (though if you&amp;#8217;re at the point where you&amp;#8217;re trying to play around with LTO, I&amp;#8217;ve got to assume you&amp;#8217;ve already taken care of this&amp;#8230;):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo apt-get install build-essential&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Finally, my main Linux box is 64-bit, so I need to install multilib support before we can compile the LLVM GCC front end:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo apt-get install gcc-multilib&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id='compiling_llvm'&gt;Compiling LLVM&lt;/h2&gt;

&lt;p&gt;With all of the prerequisites installed, we can download and unpack LLVM:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ mkdir -p $HOME/tmp
$ cd $HOME/tmp
$ wget http://llvm.org/releases/2.6/llvm-2.6.tar.gz
$ wget http://llvm.org/releases/2.6/clang-2.6.tar.gz

$ tar xzvf llvm-2.6.tar.gz
$ tar xzvf clang-2.6.tar.gz&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;clang&lt;/code&gt; is distributed as a separate download, but we actually want to place it into the main LLVM directory; the LLVM build scripts will find it and build it automatically:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ mv clang-2.6 llvm-2.6/tools/clang&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;At this point we can do the usual compilation steps:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cd llvm-2.6
$ ./configure \
    --with-binutils-include=$HOME/deb/binutils-2.20/include \
    --enable-optimized \
    --prefix=/usr/local
$ make
$ sudo make install
$ sudo ldconfig&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice how we&amp;#8217;re going to install everything into &lt;em&gt;/usr/local&lt;/em&gt;, so as not to step on the toes of the package manager. This means we have to run &lt;code&gt;ldconfig&lt;/code&gt; so that the system linker knows about the new libraries we just put in &lt;em&gt;/usr/local/lib&lt;/em&gt;.&lt;/p&gt;

&lt;h2 id='compiling_llvmgcc'&gt;Compiling LLVM-GCC&lt;/h2&gt;

&lt;p&gt;At this point, we have the &lt;code&gt;gold&lt;/code&gt; linker installed, and have a copy of LLVM that includes its &lt;code&gt;gold&lt;/code&gt; plugin. Ideally, we could start compiling with &lt;code&gt;clang&lt;/code&gt; and get LTO, but it doesn&amp;#8217;t seem like there&amp;#8217;s currently a way to have &lt;code&gt;clang&lt;/code&gt; pass in the necessary &lt;code&gt;--plugin&lt;/code&gt; option to the linker. So, all we need now is the GCC front end.&lt;/p&gt;

&lt;p&gt;As before, we start by downloading and unpacking:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cd $HOME/tmp
$ wget http://llvm.org/releases/2.6/llvm-gcc-4.2-2.6.source.tar.gz
$ tar xzvf llvm-gcc-4.2-2.6.source.tar.gz&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;em&gt;README.LLVM&lt;/em&gt; file in the source tree gives more detail on the options you have available; for me, the following worked:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ mkdir -p $HOME/tmp/obj
$ cd $HOME/tmp/obj
$ ../llvm-gcc-4.2-2.6.source/configure \
    --prefix=/usr/local \
    --program-prefix=llvm- \
    --enable-llvm=$HOME/tmp/llvm-2.6 \
    --enable-languages=c,c++
$ make
$ sudo make install
$ sudo ldconfig&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The only interesting wrinkle is that we have to do an out-of-source build — the object files will end up in the &lt;em&gt;$HOME/tmp/obj&lt;/em&gt; directory, rather than being created directly in the unpacked source directory.&lt;/p&gt;

&lt;p&gt;At this point we&amp;#8217;re nearly there; we have &lt;code&gt;llvm-gcc&lt;/code&gt; installed, but its &lt;code&gt;-use-gold-plugin&lt;/code&gt; option won&amp;#8217;t work just yet. If you look closely at one sentence on the &lt;a href='http://llvm.org/docs/GoldPlugin.html'&gt;LLVM plugin page&lt;/a&gt;, you&amp;#8217;ll see that the option “looks for the &lt;code&gt;gold&lt;/code&gt; plugin in the same directories as it looks for &lt;code&gt;cc1&lt;/code&gt;”. The LLVM GCC package installed the &lt;code&gt;cc1&lt;/code&gt; program into the &lt;em&gt;/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/4.2.1&lt;/em&gt; directory. (The &lt;em&gt;x86_64&lt;/em&gt; will be different if you&amp;#8217;re on a different architecture.) However, the LLVM plugin is in &lt;em&gt;/usr/local/lib&lt;/em&gt;. If you try to use the &lt;code&gt;-use-gold-plugin&lt;/code&gt; parameter, you&amp;#8217;ll get the following error message:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ llvm-gcc -use-gold-plugin \
    -o foo.o -c -O4 -g -Wall -Werror foo.c
llvm-gcc: -use-gold-plugin, but libLLVMgold.so not found.&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Not good. The solution (which is admittedly a bit of a hack) is to copy the plugin into the directory that &lt;code&gt;llvm-gcc&lt;/code&gt; expects to find it in:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo cp /usr/local/lib/libLLVMgold.so \
    /usr/local/libexec/gcc/x86_64-unknown-linux-gnu/4.2.1&lt;/code&gt;&lt;/pre&gt;
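
&lt;p&gt;The architecture-specific part of that path can be looked up rather than typed by hand, since GCC drivers print the location of &lt;code&gt;cc1&lt;/code&gt; when given &lt;code&gt;-print-prog-name=cc1&lt;/code&gt;. Here is a sketch in Python, assuming the &lt;code&gt;llvm-gcc&lt;/code&gt; built above honors that option the way stock GCC does; &lt;code&gt;plugin_destination&lt;/code&gt; and &lt;code&gt;find_cc1&lt;/code&gt; are hypothetical names.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import os.path
import subprocess


def plugin_destination(cc1_path):
    """Directory that holds cc1, which is where the plugin copy should go."""
    directory = os.path.dirname(cc1_path.strip())
    if not directory:
        # GCC prints the bare program name when it cannot locate it.
        raise ValueError("compiler did not report a full path for cc1")
    return directory


def find_cc1(compiler="llvm-gcc"):
    """Ask the compiler driver where its cc1 lives (needs the compiler installed)."""
    process = subprocess.Popen([compiler, "-print-prog-name=cc1"],
                               stdout=subprocess.PIPE)
    return process.communicate()[0].decode("ascii")


example = "/usr/local/libexec/gcc/x86_64-unknown-linux-gnu/4.2.1/cc1"
print(plugin_destination(example))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;On a machine with the compiler installed, &lt;code&gt;plugin_destination(find_cc1())&lt;/code&gt; should name the directory to use as the target of the &lt;code&gt;cp&lt;/code&gt; command.&lt;/p&gt;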

&lt;h2 id='using_your_new_toy'&gt;Using your new toy&lt;/h2&gt;

&lt;p&gt;Now that we&amp;#8217;ve got all of the pieces installed, you can create libraries and executables that are optimized at link time. The “Quickstart” section at the end of the &lt;a href='http://llvm.org/docs/GoldPlugin.html'&gt;LLVM plugin page&lt;/a&gt; gives you the outline. I use SCons as my build tool, so I have to run the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ scons \
    CC=&amp;quot;llvm-gcc -use-gold-plugin&amp;quot; \
    AR=&amp;quot;ar --plugin libLLVMgold.so&amp;quot; \
    RANLIB=/bin/true \
    CCFLAGS=-O4&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is slightly more than what&amp;#8217;s needed on the Mac, but all in all, not bad. Enjoy!&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Extracting setuptools version numbers from your git repository</title>
    <link href="http://dcreager.net/2010/02/10/setuptools-git-version-numbers/"/>
    <updated>2010-02-10T00:00:00-05:00</updated>
    <id>http://dcreager.net/2010/02/10/setuptools-git-version-numbers</id>
    <content type="html">&lt;p&gt;Just like everyone else, we&amp;#8217;re using &lt;a href='http://pypi.python.org/pypi/setuptools'&gt;setuptools&lt;/a&gt; as the core of the build system for our Python-based projects. For the most part, this has been a painless, straightforward process. However, one lingering annoyance is that we&amp;#8217;ve been specifying the version number directly in our &lt;em&gt;setup.py&lt;/em&gt; files:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='kn'&gt;from&lt;/span&gt; &lt;span class='nn'&gt;setuptools&lt;/span&gt; &lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='n'&gt;setup&lt;/span&gt;

&lt;span class='n'&gt;setup&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;
    &lt;span class='n'&gt;name&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;awesomelib&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
    &lt;span class='n'&gt;version&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;1.2&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
    &lt;span class='c'&gt;# ...etc&lt;/span&gt;
&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;On our maintenance branches, we get a nice &lt;em&gt;awesomelib-1.2.tar.gz&lt;/em&gt; file when we run &lt;code&gt;python setup.py sdist&lt;/code&gt;. On our development branch, we&amp;#8217;ve also got the following &lt;em&gt;setup.cfg&lt;/em&gt; file:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='ini'&gt;&lt;span class='k'&gt;[egg_info]&lt;/span&gt;
&lt;span class='na'&gt;tag_build&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s'&gt;dev&lt;/span&gt;
&lt;span class='na'&gt;tag_date&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s'&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;That gives us tarballs like &lt;em&gt;awesomelib-1.2dev-20100210.tar.gz&lt;/em&gt; on our development branch. Because we&amp;#8217;re using the &lt;code&gt;dev&lt;/code&gt; suffix, which setuptools considers to be a &amp;#8220;prerelease&amp;#8221;, we have to remember to increment the version number in development whenever we cut a new release. The end result is that we have a longish process for creating releases. If we want to create a new 1.3 release, we have to do the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Create a new maintenance branch for 1.3:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ git checkout -b maint-1.3 master&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Update the &lt;em&gt;setup.cfg&lt;/em&gt; file to remove the &lt;code&gt;tag_build&lt;/code&gt; and &lt;code&gt;tag_date&lt;/code&gt; entries. Commit this with a &amp;#8220;Tagging version 1.3&amp;#8221; commit message.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Back on the development branch, update &lt;em&gt;setup.py&lt;/em&gt; to increment the &amp;#8220;development version&amp;#8221; to 1.4.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Granted, this isn&amp;#8217;t horribly difficult, but we can do better.&lt;/p&gt;

&lt;h2 id='calculating_the_version_automatically'&gt;Calculating the version automatically&lt;/h2&gt;

&lt;p&gt;Taking a page from the &lt;a href='http://git.kernel.org/?p=git/git.git;a=blob;f=GIT-VERSION-GEN'&gt;&lt;em&gt;GIT-VERSION-GEN&lt;/em&gt;&lt;/a&gt; script in git&amp;#8217;s source code, we&amp;#8217;re going to use the &lt;code&gt;git describe&lt;/code&gt; command to automatically generate the version number.&lt;/p&gt;

&lt;p&gt;Our logic is implemented in a new &lt;code&gt;get_git_version()&lt;/code&gt; Python function, which you can call directly from your &lt;em&gt;setup.py&lt;/em&gt; scripts. You can find the source code in a &lt;a href='http://gist.github.com/300803'&gt;GitHub gist&lt;/a&gt;. Our basic strategy is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;First, try to use &lt;code&gt;git describe&lt;/code&gt; to create a version number.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;If this fails, then we&amp;#8217;re most likely not in a git working copy. Probably, someone downloaded a release tarball and unpacked it, and we&amp;#8217;re running inside of there. In this case, &lt;code&gt;git describe&lt;/code&gt; can&amp;#8217;t give us a version number. Instead, we&amp;#8217;re going to make sure we include a &lt;em&gt;RELEASE-VERSION&lt;/em&gt; file in every tarball that we create. So, if &lt;code&gt;git describe&lt;/code&gt; fails, we fall back on the contents of this file as our version number.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id='tag_names_as_version_numbers'&gt;Tag names as version numbers&lt;/h3&gt;

&lt;p&gt;One thing to notice about this strategy is that we use the output of &lt;code&gt;git describe&lt;/code&gt; directly as our version number. This means that our tag names should be simple version numbers, without decoration. To create the awesomelib 1.3 release from our example, we&amp;#8217;d just do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ git tag -s 1.3&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(Note that the tag needs to be an annotated or signed tag in order to be picked up by &lt;code&gt;git describe&lt;/code&gt;.)&lt;/p&gt;

&lt;p&gt;On our development branch, once we&amp;#8217;ve created new commits on top of the release point, we&amp;#8217;ll start getting output like this from &lt;code&gt;git describe&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;1.3-4-g6f32&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is a valid setuptools &amp;#8220;postrelease&amp;#8221; — setuptools will consider this to be a more recent version than &lt;code&gt;1.3&lt;/code&gt;, which is exactly what we want. This eliminates the need to maintain different &lt;em&gt;setup.cfg&lt;/em&gt; files in our development and maintenance branches.&lt;/p&gt;

&lt;h3 id='getting_the_version_number_of_a_distribution_tarball'&gt;Getting the version number of a distribution tarball&lt;/h3&gt;

&lt;p&gt;Another thing to notice is that we need to maintain a &lt;em&gt;RELEASE-VERSION&lt;/em&gt; file, ensuring that it always contains the current version, and always including it when we create any source packages. That way, we can still get the current version number, even if we can&amp;#8217;t get it from &lt;code&gt;git describe&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To keep the &lt;em&gt;RELEASE-VERSION&lt;/em&gt; file up to date, the &lt;code&gt;get_git_version()&lt;/code&gt; function always reads in the current contents of the file as its first step. If the output of &lt;code&gt;git describe&lt;/code&gt; differs from what&amp;#8217;s in the file, we update the file with the new output before returning the version.&lt;/p&gt;

&lt;p&gt;This ensures that the file has the right contents, but we also have to make sure we include it in our source packages. To do this, we simply add the following line to our &lt;em&gt;MANIFEST.in&lt;/em&gt; file (creating it if necessary):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;include RELEASE-VERSION&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(Note that we don&amp;#8217;t want the &lt;em&gt;RELEASE-VERSION&lt;/em&gt; file to be checked into the git repository, so we also add it to the top-level &lt;em&gt;.gitignore&lt;/em&gt; file.)&lt;/p&gt;
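
&lt;p&gt;The fallback logic described in this section can be sketched roughly as follows. This is a minimal illustration, not the code from the gist: the helper names &lt;code&gt;call_git_describe&lt;/code&gt;, &lt;code&gt;read_release_version&lt;/code&gt; and &lt;code&gt;write_release_version&lt;/code&gt;, the &lt;code&gt;path&lt;/code&gt; and &lt;code&gt;describe&lt;/code&gt; parameters, and the choice of &lt;code&gt;--abbrev=4&lt;/code&gt; are all assumptions made for the example.&lt;/p&gt;

```python
import subprocess


def call_git_describe(abbrev=4):
    """Return the output of `git describe`, or None if it cannot be obtained."""
    try:
        out = subprocess.check_output(
            ["git", "describe", "--abbrev=%d" % abbrev],
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or we are not inside a tagged working copy.
        return None
    return out.decode("utf-8").strip() or None


def read_release_version(path):
    """Return the first line of the RELEASE-VERSION file, or None if missing."""
    try:
        with open(path) as f:
            return f.readline().strip() or None
    except OSError:
        return None


def write_release_version(path, version):
    """Overwrite the RELEASE-VERSION file with the given version string."""
    with open(path, "w") as f:
        f.write("%s\n" % version)


def get_git_version(path="RELEASE-VERSION", describe=call_git_describe):
    """Prefer `git describe`; fall back on the RELEASE-VERSION file."""
    # Always read the file first, so we can tell whether it is stale.
    release_version = read_release_version(path)

    version = describe()
    if version is None:
        # Probably running from an unpacked tarball.
        version = release_version
    if version is None:
        raise ValueError("Cannot find the version number!")

    # Keep the file in sync with what git reports.
    if version != release_version:
        write_release_version(path, version)
    return version
```

&lt;p&gt;In a &lt;em&gt;setup.py&lt;/em&gt; script, the result would then be passed along as &lt;code&gt;version=get_git_version()&lt;/code&gt;. The &lt;code&gt;describe&lt;/code&gt; parameter exists only so that the sketch can be exercised without a real repository.&lt;/p&gt;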

&lt;h2 id='the_simpler_release_process'&gt;The simpler release process&lt;/h2&gt;

&lt;p&gt;With this script, our release process is now much simpler:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Create a maintenance branch if you want to.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Create a signed or annotated tag, whose name is the new version number.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most importantly, no extra commits are needed, since we don&amp;#8217;t have to edit any version numbers or maintain different &lt;em&gt;setup.cfg&lt;/em&gt; files.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>A combinator-based parsing library for C</title>
    <link href="http://dcreager.net/2010/02/06/libpush/"/>
    <updated>2010-02-06T00:00:00-05:00</updated>
    <id>http://dcreager.net/2010/02/06/libpush</id>
    <content type="html">&lt;p&gt;Recently I&amp;#8217;ve been working on &lt;a href='http://github.com/dcreager/libpush/'&gt;libpush&lt;/a&gt;, which is a new parsing library for C. It has two main features that I think will be valuable. The first is that it&amp;#8217;s a &lt;em&gt;push parser&lt;/em&gt;, which means that instead of parsing a file, stream, or single memory buffer, you supply the data (or &amp;#8220;push&amp;#8221; it) to the parser in chunks, as it becomes available. I plan to discuss this aspect of the parser in more detail in a later post.&lt;/p&gt;

&lt;p&gt;The other main feature is that you design your parsers using &lt;em&gt;combinators&lt;/em&gt;. Parser combinators are widely used in Haskell, with &lt;a href='http://legacy.cs.uu.nl/daan/parsec.html'&gt;Parsec&lt;/a&gt; being the most common example. Combinator-based parsing libraries are especially nice in Haskell, because Haskell&amp;#8217;s syntax makes them look very simple. For instance, a parser that parses matching nested parentheses is:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='haskell'&gt;&lt;span class='nf'&gt;parens&lt;/span&gt; &lt;span class='ow'&gt;::&lt;/span&gt; &lt;span class='kt'&gt;Parser&lt;/span&gt; &lt;span class='nb'&gt;()&lt;/span&gt;
&lt;span class='nf'&gt;parens&lt;/span&gt; &lt;span class='ow'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;char&lt;/span&gt; &lt;span class='sc'&gt;&amp;#39;(&amp;#39;&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class='n'&gt;parens&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class='n'&gt;char&lt;/span&gt; &lt;span class='sc'&gt;&amp;#39;)&amp;#39;&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class='n'&gt;parens&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;|&amp;gt;&lt;/span&gt; &lt;span class='n'&gt;return&lt;/span&gt; &lt;span class='nb'&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Here, the &lt;code&gt;&amp;lt;|&amp;gt;&lt;/code&gt; operator represents &lt;em&gt;choice&lt;/em&gt;: we try parsing the left operand, and if it fails, then we try the right operand. In our example, the right operand is the base case, which matches the empty string. The left operand parses an opening parenthesis; then recursively calls itself to match any parentheses that might be nested in the current set; then parses the closing parenthesis; and then finally tries to match a nested set that occurs after the current set.&lt;/p&gt;

&lt;p&gt;When we say that this is a combinator-based parser, we mean that it&amp;#8217;s implemented by taking &lt;em&gt;primitive parsers&lt;/em&gt; — in this case &lt;code&gt;char &amp;#39;(&amp;#39;&lt;/code&gt; and &lt;code&gt;return ()&lt;/code&gt; — and combining them into more complex parsers using generic operators like &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;|&amp;gt;&lt;/code&gt;.&lt;/p&gt;
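
&lt;p&gt;To make the idea concrete outside of Haskell, here is a rough Python sketch of the same nested-parentheses parser. It is neither libpush&amp;#8217;s API nor the way Parsec is implemented. In this sketch a parser is assumed to be a plain function from a string and a position to a new position (or &lt;code&gt;None&lt;/code&gt; on failure), and the names &lt;code&gt;char&lt;/code&gt;, &lt;code&gt;unit&lt;/code&gt;, &lt;code&gt;then&lt;/code&gt;, &lt;code&gt;choice&lt;/code&gt;, &lt;code&gt;lazy&lt;/code&gt; and &lt;code&gt;matches&lt;/code&gt; are invented for the example. Unlike Parsec, this &lt;code&gt;choice&lt;/code&gt; always backtracks to the starting position before trying its right operand.&lt;/p&gt;

```python
def char(expected):
    """Primitive parser: consume exactly one expected character."""
    def parse(text, pos):
        if text[pos:pos + 1] == expected:
            return pos + 1
        return None
    return parse


def unit():
    """Primitive parser: always succeed without consuming any input."""
    return lambda text, pos: pos


def then(*parsers):
    """Combinator: run each parser in sequence, failing if any one fails."""
    def parse(text, pos):
        for parser in parsers:
            pos = parser(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(left, right):
    """Combinator: try the left parser; if it fails, try the right one."""
    def parse(text, pos):
        result = left(text, pos)
        if result is None:
            return right(text, pos)
        return result
    return parse


def lazy(thunk):
    """Defer looking up a parser until parse time, which allows recursion."""
    return lambda text, pos: thunk()(text, pos)


# Either "(" parens ")" parens, or the empty string.
parens = choice(
    then(char("("), lazy(lambda: parens), char(")"), lazy(lambda: parens)),
    unit(),
)


def matches(text):
    """True if the whole string consists of balanced, nested parentheses."""
    return parens(text, 0) == len(text)
```

&lt;p&gt;Here &lt;code&gt;char&lt;/code&gt; and &lt;code&gt;unit&lt;/code&gt; play the role of the primitive parsers, while &lt;code&gt;then&lt;/code&gt; and &lt;code&gt;choice&lt;/code&gt; stand in for &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;|&amp;gt;&lt;/code&gt;.&lt;/p&gt;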

&lt;p&gt;Now, in order to be able to use combinators like this, parsers have to be first-class objects in your language. In the Haskell code, the parsers are represented by the &lt;code&gt;Parser ()&lt;/code&gt; type. In most Haskell parsing libraries (including Parsec), the parser type is implemented as a &lt;a href='http://en.wikipedia.org/wiki/Monad_%28functional_programming%29'&gt;&lt;em&gt;monad&lt;/em&gt;&lt;/a&gt;. Monads have a reputation for being a horribly complex topic, but in this case, we don&amp;#8217;t really need to learn about the underlying math. Instead, we can just view the monad as letting us do two things concisely:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Parsers can return a value, which could (for instance) be the abstract syntax tree that you&amp;#8217;re building up while parsing your language. The monadic bind operator (&lt;code&gt;&amp;gt;&amp;gt;=&lt;/code&gt;) gives you a way to &amp;#8220;pass&amp;#8221; these values between parsers, if needed.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Simultaneously, the parser monad maintains the state of the stream you&amp;#8217;re parsing from, keeping track of how many bytes remain, whether there&amp;#8217;s an error condition, and possibly a nice human-readable description (line and column) of the current location.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is admittedly a lot of setup; we&amp;#8217;ve been talking a lot about Haskell in a post that&amp;#8217;s ostensibly describing a C library. But hopefully, this gives you a taste for the kinds of features we want to support in libpush:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Parsers will be represented by a C type. In libpush, this is the &lt;code&gt;push_callback_t&lt;/code&gt; type.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;There will be several primitive parsers; these will be functions that return a &lt;code&gt;push_callback_t&lt;/code&gt;. The functions can take in parameters, but none of the parameters will be a &lt;code&gt;push_callback_t&lt;/code&gt;. (See the &lt;code&gt;char&lt;/code&gt; primitive from above; it needed to take in the particular character that is expected.)&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;There will be several combinators; these will be functions that return a &lt;code&gt;push_callback_t&lt;/code&gt;, and take in other &lt;code&gt;push_callback_t&lt;/code&gt;s as parameters.&lt;/p&gt;

&lt;p&gt;You can see several of these primitives and combinators in action in the &lt;a href='http://github.com/dcreager/libpush/'&gt;libpush Github repository&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;We will use something like a monad to take care of passing values between our parsers, and for keeping track of the state of the underlying stream. I say &amp;#8220;something like a monad&amp;#8221;, because, unlike the Parsec library, the libpush parser type will &lt;em&gt;not&lt;/em&gt; be implemented as a monad; it turns out that C is more amenable to implementing them as &lt;a href='http://www.haskell.org/arrows/'&gt;&lt;em&gt;arrows&lt;/em&gt;&lt;/a&gt;. In a later post, I&amp;#8217;ll explain what this means in terms of writing your own parsers, or for building them up from combinators.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</content>
  </entry>
  
  <entry>
    <title>Updating graffle-export to work with OmniGraffle 5</title>
    <link href="http://dcreager.net/2010/02/05/omnigraffle-5-export/"/>
    <updated>2010-02-05T00:00:00-05:00</updated>
    <id>http://dcreager.net/2010/02/05/omnigraffle-5-export</id>
    <content type="html">&lt;p&gt;I recently upgraded to OmniGraffle 5, which caused my &lt;a href='http://github.com/dcreager/graffle-export/'&gt;graffle-export&lt;/a&gt; script to break:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ graffle.sh ~/git/cwa/figures/analyst.graffle foo.pdf 
OmniGraffle Professional 5
/Users/dcreager/git/cwa/figures/analyst.graffle
./graffle.scpt: execution error: OmniGraffle Professional 5 got an error: The document cannot be exported to the &amp;quot;pdf&amp;quot; format. (-50)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(This was first reported to me by Nima Talebi as &lt;a href='http://github.com/dcreager/graffle-export/issues/issue/1'&gt;a ticket&lt;/a&gt; on graffle-export&amp;#8217;s Github page.)&lt;/p&gt;

&lt;p&gt;Before we can understand what error we&amp;#8217;re seeing, a little explanation is in order. The core logic of the OmniGraffle exporter is an AppleScript. Unfortunately, AppleScripts are stored in a binary format, so if you go to the Github page, you can&amp;#8217;t easily view the contents of the file. The important line of the script is:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='applescript'&gt;&lt;span class='nv'&gt;save&lt;/span&gt; &lt;span class='nv'&gt;doc&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;format&lt;/span&gt; &lt;span class='k'&gt;in&lt;/span&gt; &lt;span class='nv'&gt;file&lt;/span&gt; &lt;span class='nv'&gt;output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This saves an OmniGraffle document (which an earlier part of the script makes sure is open) into a new output file. The &lt;code&gt;output&lt;/code&gt; variable is the name of the desired output file, and is taken directly from the &lt;em&gt;graffle.sh&lt;/em&gt; command line.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;format&lt;/code&gt; variable is what&amp;#8217;s causing us problems. This parameter to the &lt;code&gt;save&lt;/code&gt; command tells OmniGraffle what format to use for the file it&amp;#8217;s about to save. This is how we get our export functionality; we just give it the name of one of the export formats that it supports. The value of our &lt;code&gt;format&lt;/code&gt; variable comes from the optional first parameter to the &lt;em&gt;graffle.sh&lt;/em&gt; script. Previously, if no value was specified, I used &amp;#8220;&lt;code&gt;pdf&lt;/code&gt;&amp;#8221; as a default.&lt;/p&gt;

&lt;p&gt;Now, I haven&amp;#8217;t been able to find any real documentation of what values are allowed for this parameter. I came across &lt;code&gt;pdf&lt;/code&gt; simply by trial and error. &amp;#8220;&lt;code&gt;PDF&lt;/code&gt;&amp;#8221; also seems to work, as does &amp;#8220;&lt;code&gt;PDF vector image&lt;/code&gt;&amp;#8221;, which is the text that appears in the Format entry of OmniGraffle&amp;#8217;s Export dialog box.&lt;/p&gt;

&lt;p&gt;Or, to be more accurate, I should say that these values all work &lt;strong&gt;in OmniGraffle 4&lt;/strong&gt;. Once you upgrade to version 5, these values no longer seem to be valid choices for that parameter of the &lt;code&gt;save&lt;/code&gt; command — hence the error message. A quick, non-exhaustive test shows that none of these variations work for EPS or SVG, either. The only one that seems to still work is PNG.&lt;/p&gt;

&lt;p&gt;So, what are we to do? After looking at several other related AppleScripts on the web, it seems that the &lt;code&gt;as&lt;/code&gt; parameter of the &lt;code&gt;save&lt;/code&gt; command is optional. After some experimentation, it turns out that if you leave out this parameter, OmniGraffle tries to deduce the correct output format based on the extension of your output filename. So, we change our &lt;code&gt;save&lt;/code&gt; command to the following:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='applescript'&gt;&lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='nv'&gt;format&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s2'&gt;&amp;quot;&amp;quot;&lt;/span&gt; &lt;span class='k'&gt;then&lt;/span&gt;
  &lt;span class='nv'&gt;save&lt;/span&gt; &lt;span class='nv'&gt;doc&lt;/span&gt; &lt;span class='k'&gt;in&lt;/span&gt; &lt;span class='nv'&gt;file&lt;/span&gt; &lt;span class='nv'&gt;output&lt;/span&gt;
&lt;span class='k'&gt;else&lt;/span&gt;
  &lt;span class='nv'&gt;save&lt;/span&gt; &lt;span class='nv'&gt;doc&lt;/span&gt; &lt;span class='k'&gt;as&lt;/span&gt; &lt;span class='nv'&gt;format&lt;/span&gt; &lt;span class='k'&gt;in&lt;/span&gt; &lt;span class='nv'&gt;file&lt;/span&gt; &lt;span class='nv'&gt;output&lt;/span&gt;
&lt;span class='k'&gt;end&lt;/span&gt; &lt;span class='k'&gt;if&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;(We also have to modify the &lt;em&gt;graffle.sh&lt;/em&gt; wrapper script to not use &lt;code&gt;pdf&lt;/code&gt; as a default, but you can see that change &lt;a href='http://github.com/dcreager/graffle-export/commit/b605b461a29b73ab4c21bd42b48549bd8bad8fcc'&gt;on Github&lt;/a&gt;.) This lets us export a PDF version of a &lt;em&gt;.graffle&lt;/em&gt; file by giving an output filename ending in &lt;em&gt;.pdf&lt;/em&gt;, and leaving out the format parameter.&lt;/p&gt;

&lt;p&gt;I still have my old copy of OmniGraffle 4, and it looks like this trick works with that version as well. So, this is now the default behavior, regardless of which version you have installed.&lt;/p&gt;

&lt;p&gt;It would be nice if there was an accurate list of what values were allowed for the &lt;code&gt;as&lt;/code&gt; parameter, but we do have a working solution, at least. The only problem is if you want to export a PDF with a different extension; with this solution, you&amp;#8217;d have to export to a &lt;em&gt;.pdf&lt;/em&gt; file and then rename it to your new extension. But then again, why would you want to do that?&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Default “scons -c” targets</title>
    <link href="http://dcreager.net/2010/01/08/default-scons-clean-targets/"/>
    <updated>2010-01-08T00:00:00-05:00</updated>
    <id>http://dcreager.net/2010/01/08/default-scons-clean-targets</id>
    <content type="html">&lt;p&gt;As I mentioned in a &lt;a href='/2009/12/18/make-distclean-in-scons/'&gt;previous post&lt;/a&gt;, the automatic “clean” target provided by SCons (&lt;code&gt;scons -c&lt;/code&gt;) is very useful for cleaning out build files, without requiring much in the way of configuration. Anything that SCons generates when you run &lt;code&gt;scons&lt;/code&gt; will be automatically cleaned when you run &lt;code&gt;scons -c&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;While this is useful, I&amp;#8217;d like more control over the behavior of &lt;code&gt;scons -c&lt;/code&gt;. Specifically, being a good TDD junkie, I have several test cases that I can run using &lt;code&gt;scons test&lt;/code&gt;:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='lineno'&gt;1&lt;/span&gt; &lt;span class='n'&gt;build_test&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;env&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Program&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt; &lt;span class='o'&gt;...&lt;/span&gt; &lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;2&lt;/span&gt; &lt;span class='n'&gt;env&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Alias&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;build-tests&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;build_test&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;3&lt;/span&gt; 
&lt;span class='lineno'&gt;4&lt;/span&gt; &lt;span class='n'&gt;run_test&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;env&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Alias&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;test&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='n'&gt;build_test&lt;/span&gt;&lt;span class='p'&gt;],&lt;/span&gt;
&lt;span class='lineno'&gt;5&lt;/span&gt;                      &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;@&lt;/span&gt;&lt;span class='si'&gt;%s&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;&lt;/span&gt; &lt;span class='o'&gt;%&lt;/span&gt; &lt;span class='n'&gt;build_test&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;abspath&lt;/span&gt;&lt;span class='p'&gt;])&lt;/span&gt;
&lt;span class='lineno'&gt;6&lt;/span&gt; &lt;span class='n'&gt;env&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;AlwaysBuild&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;run_test&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;By setting it up this way, the test programs aren&amp;#8217;t built by default: you have to explicitly run &lt;code&gt;scons build-tests&lt;/code&gt; (if you want to build the tests but not run them) or &lt;code&gt;scons test&lt;/code&gt; (if you want to build and run them). Moreover, because of SCons&amp;#8217;s dependency tracking, I can just use &lt;code&gt;scons test&lt;/code&gt; as my usual build command during my Edit-Test-Debug loop. SCons will automatically rebuild any changed source files before running the tests.&lt;/p&gt;

&lt;p&gt;All of this is great. So what&amp;#8217;s the problem? As I mentioned above, &lt;code&gt;scons -c&lt;/code&gt; only cleans the build files that are created by &lt;code&gt;scons&lt;/code&gt; — and since I&amp;#8217;ve explicitly set things up so that tests aren&amp;#8217;t &lt;em&gt;built&lt;/em&gt; by default, they&amp;#8217;ll also not be &lt;em&gt;cleaned&lt;/em&gt; by default. This means that to fully clean out my build targets, I have to run two commands:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ scons -c
$ scons -c build-tests&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Not ideal! I&amp;#8217;d prefer it if &lt;code&gt;scons -c&lt;/code&gt; cleaned everything, just like &lt;code&gt;make clean&lt;/code&gt; would in the Automake world.&lt;/p&gt;

&lt;h2 id='the_solution'&gt;The solution&lt;/h2&gt;

&lt;p&gt;So how to fix this? First we need to understand how SCons decides what to clean when you run &lt;code&gt;scons -c&lt;/code&gt;. The answer is “exactly what&amp;#8217;s built by &lt;code&gt;scons&lt;/code&gt;”. And how does SCons decide what to build when you run &lt;code&gt;scons&lt;/code&gt;? That&amp;#8217;s what the &lt;code&gt;Default&lt;/code&gt; command is for.&lt;/p&gt;

&lt;p&gt;For instance, I could add&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='lineno'&gt;1&lt;/span&gt; &lt;span class='n'&gt;env&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Default&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;build-tests&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;to my &lt;em&gt;SConstruct&lt;/em&gt; file. This would cause all of my tests to be built by default and, by extension, cleaned by default as well.&lt;/p&gt;

&lt;p&gt;This is close, since &lt;code&gt;scons -c&lt;/code&gt; now does what we want, but this means that &lt;code&gt;scons&lt;/code&gt; is now building more than we would like. What we need is a way to have a different list of default targets depending on whether we&amp;#8217;re building or cleaning. Luckily, the &lt;code&gt;GetOption&lt;/code&gt; function gives us exactly that:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='lineno'&gt;1&lt;/span&gt; &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='n'&gt;GetOption&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;clean&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
&lt;span class='lineno'&gt;2&lt;/span&gt;     &lt;span class='n'&gt;env&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Default&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;build-tests&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;With this in our &lt;em&gt;SConstruct&lt;/em&gt; file, the tests will be considered a default target when we&amp;#8217;re cleaning, but not when we&amp;#8217;re building. So now we have what we want: &lt;code&gt;scons&lt;/code&gt; builds just the code, &lt;code&gt;scons test&lt;/code&gt; builds and runs the tests, and &lt;code&gt;scons -c&lt;/code&gt; cleans it all.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Exporting OmniGraffle documents from the command line</title>
    <link href="http://dcreager.net/2010/01/05/omnigraffle-export/"/>
    <updated>2010-01-05T00:00:00-05:00</updated>
    <id>http://dcreager.net/2010/01/05/omnigraffle-export</id>
    <content type="html">&lt;p&gt;&lt;a href='http://www.omnigroup.com/applications/OmniGraffle/'&gt;OmniGraffle&lt;/a&gt; is my tool of choice for creating figures for my papers. Its biggest drawback is that it&amp;#8217;s only available for Mac OS, which can make it cumbersome if I&amp;#8217;m working on one of my Linux machines and need to create or modify a figure. But its ease of use and the quality of the figures it creates are hard to beat.&lt;/p&gt;

&lt;p&gt;Of course, creating the figure isn&amp;#8217;t enough — since I write my papers in LaTeX, I have to export my figures into EPS or PDF (depending on whether I&amp;#8217;m creating a PostScript or PDF version of the paper) before I can use them in my documents. It&amp;#8217;s easy enough to use the Export dialog to do this (keyboard shortcut: ⌥⌘E), but ideally I&amp;#8217;d like the ability to export figures from the command line. Coupled with a good Makefile, this would let me run a simple &lt;code&gt;make paper&lt;/code&gt; command, and automatically re-export any necessary figures before rebuilding the paper itself.&lt;/p&gt;

&lt;p&gt;Luckily, OmniGraffle has always had rather good support for being controlled via AppleScript. The commands can be somewhat undocumented, requiring a bit of trial and error, but while entrenched in our PhD studies at Oxford, my colleague David Faitelson and I were able to whip together a script that suited our needs. I&amp;#8217;ve recently extracted the code from our Oxford SVN repository and uploaded it to &lt;a href='http://github.com/dcreager/graffle-export'&gt;Github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To install the script, just place the &lt;em&gt;graffle.sh&lt;/em&gt; and &lt;em&gt;graffle.scpt&lt;/em&gt; files into some directory that&amp;#8217;s on your &lt;code&gt;$PATH&lt;/code&gt;, such as &lt;em&gt;/usr/local/bin&lt;/em&gt; or &lt;em&gt;$HOME/bin&lt;/em&gt;. Then just run&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ graffle.sh «format» «graffle file» «output file»&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This will open the figure in OmniGraffle, and export it into the format you specify on the command line, saving the result into &lt;code&gt;«output file»&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m still using version 4 of OmniGraffle, so I haven&amp;#8217;t had a chance to verify that the script still works with version 5. If you try it, and it doesn&amp;#8217;t, feel free to open up a ticket on the &lt;a href='http://github.com/dcreager/graffle-export/issues'&gt;Github site&lt;/a&gt;.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>“High-water mark” buffers</title>
    <link href="http://dcreager.net/2009/12/23/high-water-mark-buffers/"/>
    <updated>2009-12-23T00:00:00-05:00</updated>
    <id>http://dcreager.net/2009/12/23/high-water-mark-buffers</id>
    <content type="html">&lt;p&gt;My coding project for today was to extract out some code for dealing with “high-water mark buffers”, putting it in a separate library called &lt;code&gt;libhwm&lt;/code&gt;. In this post, I&amp;#8217;m going to describe the rationale for using them, and give a brief overview of how to use the library. (The library is hosted on &lt;a href='http://github.com/dcreager/libhwm/'&gt;Github&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;By the way, this post (and the library) is all in C.&lt;/p&gt;

&lt;h2 id='whats_all_this_then'&gt;What&amp;#8217;s all this then?&lt;/h2&gt;

&lt;p&gt;A common idiom I&amp;#8217;m having to deal with these days is reading a really large number of records from a data file. We&amp;#8217;re talking well into the millions of records, but we want the code to scale well past that.&lt;/p&gt;

&lt;h3 id='step_1_fixedlength_records'&gt;Step 1: Fixed-length records&lt;/h3&gt;

&lt;p&gt;Let&amp;#8217;s say that we need to read each record into a simple &lt;code&gt;struct&lt;/code&gt;. For now, we&amp;#8217;re going to use nice, fixed-length fields:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='lineno'&gt;1&lt;/span&gt; &lt;span class='k'&gt;typedef&lt;/span&gt; &lt;span class='k'&gt;struct&lt;/span&gt;
&lt;span class='lineno'&gt;2&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='lineno'&gt;3&lt;/span&gt;     &lt;span class='kt'&gt;uint32_t&lt;/span&gt;  &lt;span class='n'&gt;id&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;4&lt;/span&gt;     &lt;span class='kt'&gt;uint32_t&lt;/span&gt;  &lt;span class='n'&gt;num_bananas&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;5&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='n'&gt;rec1_t&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;With this datatype, we can actually read data from a file very quickly; we&amp;#8217;ll just store each record directly in the file, in binary, using 8 bytes. (To simplify things, I&amp;#8217;m not worrying about the endianness of the integers, or whether the &lt;code&gt;struct&lt;/code&gt; is packed; both are easily handled with some pretty simple macro-fu.)&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='lineno'&gt;1&lt;/span&gt; &lt;span class='kt'&gt;FILE&lt;/span&gt;  &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;file&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='cm'&gt;/* whatever */&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;2&lt;/span&gt; &lt;span class='n'&gt;rec1_t&lt;/span&gt;  &lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;3&lt;/span&gt; 
&lt;span class='lineno'&gt;4&lt;/span&gt; &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;fread&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='k'&gt;sizeof&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;rec1_t&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;file&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;5&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='lineno'&gt;6&lt;/span&gt;     &lt;span class='cm'&gt;/* process the record */&lt;/span&gt;
&lt;span class='lineno'&gt;7&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The C library&amp;#8217;s stream API (&lt;code&gt;fread&lt;/code&gt; and friends) will buffer the data from the actual file, so this gives us pretty good performance.&lt;/p&gt;

&lt;h3 id='step_2_variablelength_records'&gt;Step 2: Variable-length records&lt;/h3&gt;

&lt;p&gt;What if we have a variable-length field in our &lt;code&gt;struct&lt;/code&gt;, though?&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='lineno'&gt;1&lt;/span&gt; &lt;span class='k'&gt;typedef&lt;/span&gt; &lt;span class='k'&gt;struct&lt;/span&gt;
&lt;span class='lineno'&gt;2&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='lineno'&gt;3&lt;/span&gt;     &lt;span class='kt'&gt;uint32_t&lt;/span&gt;  &lt;span class='n'&gt;id&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;4&lt;/span&gt;     &lt;span class='kt'&gt;uint32_t&lt;/span&gt;  &lt;span class='n'&gt;num_bananas&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;5&lt;/span&gt;     &lt;span class='kt'&gt;char&lt;/span&gt;  &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;6&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='n'&gt;rec2_t&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Often in these cases, you can simplify the problem by deciding not to let &lt;code&gt;name&lt;/code&gt; be a variable-length field. Instead, you decide that you&amp;#8217;ll use (say) exactly 20 bytes for the name, padding out short names and truncating long names as necessary. We don&amp;#8217;t want to do that, however — we want to have a truly variable-length field.&lt;/p&gt;

&lt;p&gt;To store this variable-length field in the file, we need some way of encoding the length of a particular record&amp;#8217;s &lt;code&gt;name&lt;/code&gt; field. If we can assume that none of the records has a name that&amp;#8217;s longer than 4 billion characters, we can use a 32-bit length prefix:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='lineno'&gt; 1&lt;/span&gt; &lt;span class='kt'&gt;FILE&lt;/span&gt;  &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;file&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='cm'&gt;/* whatever */&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt; 2&lt;/span&gt; &lt;span class='n'&gt;rec2_t&lt;/span&gt;  &lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt; 3&lt;/span&gt; 
&lt;span class='lineno'&gt; 4&lt;/span&gt; &lt;span class='k'&gt;do&lt;/span&gt;
&lt;span class='lineno'&gt; 5&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='lineno'&gt; 6&lt;/span&gt;     &lt;span class='kt'&gt;uint32_t&lt;/span&gt;  &lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt; 7&lt;/span&gt; 
&lt;span class='lineno'&gt; 8&lt;/span&gt;     &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;fread&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;id&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
&lt;span class='lineno'&gt; 9&lt;/span&gt;               &lt;span class='k'&gt;sizeof&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;file&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;10&lt;/span&gt;         &lt;span class='k'&gt;break&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;11&lt;/span&gt;     &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;fread&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;num_bananas&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
&lt;span class='lineno'&gt;12&lt;/span&gt;               &lt;span class='k'&gt;sizeof&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;file&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;13&lt;/span&gt;         &lt;span class='k'&gt;break&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;14&lt;/span&gt;     &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;fread&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
&lt;span class='lineno'&gt;15&lt;/span&gt;               &lt;span class='k'&gt;sizeof&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;file&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;16&lt;/span&gt;         &lt;span class='k'&gt;break&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;17&lt;/span&gt;     &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;fread&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
&lt;span class='lineno'&gt;18&lt;/span&gt;               &lt;span class='k'&gt;sizeof&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;char&lt;/span&gt;&lt;span class='p'&gt;),&lt;/span&gt; &lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;file&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;lt;&lt;/span&gt; &lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;19&lt;/span&gt;         &lt;span class='k'&gt;break&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;20&lt;/span&gt;     &lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='sc'&gt;&amp;#39;\0&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;21&lt;/span&gt; 
&lt;span class='lineno'&gt;22&lt;/span&gt;     &lt;span class='cm'&gt;/* process the record */&lt;/span&gt;
&lt;span class='lineno'&gt;23&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;That&amp;#8217;s pretty ugly and repetitive, so let&amp;#8217;s play some macro games:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='lineno'&gt; 1&lt;/span&gt; &lt;span class='kt'&gt;FILE&lt;/span&gt;  &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;file&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='cm'&gt;/* whatever */&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt; 2&lt;/span&gt; &lt;span class='n'&gt;rec2_t&lt;/span&gt;  &lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt; 3&lt;/span&gt; 
&lt;span class='lineno'&gt; 4&lt;/span&gt; &lt;span class='cp'&gt;#define READ_FIELD(dest, type, count) \&lt;/span&gt;
&lt;span class='lineno'&gt; 5&lt;/span&gt; &lt;span class='cp'&gt;    if (fread(dest, sizeof(type), count, file) &amp;lt; count) \&lt;/span&gt;
&lt;span class='lineno'&gt; 6&lt;/span&gt; &lt;span class='cp'&gt;        break;&lt;/span&gt;
&lt;span class='lineno'&gt; 7&lt;/span&gt; 
&lt;span class='lineno'&gt; 8&lt;/span&gt; &lt;span class='k'&gt;do&lt;/span&gt;
&lt;span class='lineno'&gt; 9&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='lineno'&gt;10&lt;/span&gt;     &lt;span class='kt'&gt;uint32_t&lt;/span&gt;  &lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;11&lt;/span&gt; 
&lt;span class='lineno'&gt;12&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;id&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;13&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;num_bananas&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;14&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;15&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;char&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;16&lt;/span&gt;     &lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='sc'&gt;&amp;#39;\0&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;17&lt;/span&gt; 
&lt;span class='lineno'&gt;18&lt;/span&gt;     &lt;span class='cm'&gt;/* process the record */&lt;/span&gt;
&lt;span class='lineno'&gt;19&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;So the basic idea here is pretty sound: we can store a name of any length without wasted space. And the code is still rather fast. We pay some extra function-call overhead for calling &lt;code&gt;fread&lt;/code&gt; several times per record, but since stdio buffers the file for us, the number of low-level I/O reads will still be roughly the same.&lt;/p&gt;

&lt;p&gt;But unfortunately, there&amp;#8217;s a glaring error here. &lt;code&gt;record.name&lt;/code&gt; is an uninitialized pointer, since we haven&amp;#8217;t actually allocated any memory for it. Reading into it is undefined behavior, and in practice the code will most likely segfault.&lt;/p&gt;
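&lt;p&gt;Setting the allocation problem aside for a moment, the on-disk layout that these loops assume can be sketched from the writer&amp;#8217;s side. The &lt;code&gt;write_record&lt;/code&gt; helper below is hypothetical and not part of the original code. Like the reading code, it writes the integers in the host&amp;#8217;s byte order, so a file produced this way is only readable on a machine with the same endianness.&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical writer: id, num_bananas, a 32-bit length prefix, then the
 * name bytes without a NUL terminator.  Returns 0 on success, -1 on error. */
static int
write_record(FILE *file, uint32_t id, uint32_t num_bananas, const char *name)
{
    size_t  len;
    uint32_t  name_length;

    assert(file != NULL && name != NULL);

    len = strlen(name);
    if (len > UINT32_MAX)
        return -1;  /* too long for a 32-bit length prefix */
    name_length = (uint32_t) len;

    if (fwrite(&id, sizeof(uint32_t), 1, file) < 1) return -1;
    if (fwrite(&num_bananas, sizeof(uint32_t), 1, file) < 1) return -1;
    if (fwrite(&name_length, sizeof(uint32_t), 1, file) < 1) return -1;
    if (fwrite(name, sizeof(char), name_length, file) < name_length) return -1;
    return 0;
}
```

&lt;p&gt;A record for the name &lt;code&gt;bob&lt;/code&gt; takes 15 bytes in this layout: three 4-byte integers followed by the 3 bytes of the name.&lt;/p&gt;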

&lt;h2 id='step_3_allocate_some_memory'&gt;Step 3: Allocate some memory&lt;/h2&gt;

&lt;p&gt;So what&amp;#8217;s the simplest way we can allocate memory for the &lt;code&gt;record.name&lt;/code&gt; field? The naïve approach would be to &lt;code&gt;malloc&lt;/code&gt; a new string for every record:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='lineno'&gt; 1&lt;/span&gt; &lt;span class='kt'&gt;FILE&lt;/span&gt;  &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;file&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='cm'&gt;/* whatever */&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt; 2&lt;/span&gt; &lt;span class='n'&gt;rec2_t&lt;/span&gt;  &lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt; 3&lt;/span&gt; 
&lt;span class='lineno'&gt; 4&lt;/span&gt; &lt;span class='cp'&gt;#define READ_FIELD(dest, type, count) \&lt;/span&gt;
&lt;span class='lineno'&gt; 5&lt;/span&gt; &lt;span class='cp'&gt;    if (fread(dest, sizeof(type), count, file) &amp;lt; count) \&lt;/span&gt;
&lt;span class='lineno'&gt; 6&lt;/span&gt; &lt;span class='cp'&gt;        break;&lt;/span&gt;
&lt;span class='lineno'&gt; 7&lt;/span&gt; 
&lt;span class='lineno'&gt; 8&lt;/span&gt; &lt;span class='k'&gt;do&lt;/span&gt;
&lt;span class='lineno'&gt; 9&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='lineno'&gt;10&lt;/span&gt;     &lt;span class='kt'&gt;uint32_t&lt;/span&gt;  &lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;11&lt;/span&gt; 
&lt;span class='lineno'&gt;12&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;id&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;13&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;num_bananas&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;14&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;15&lt;/span&gt; 
&lt;span class='lineno'&gt;16&lt;/span&gt;     &lt;span class='cm'&gt;/* Remember to include an extra byte for the NUL terminator! */&lt;/span&gt;
&lt;span class='lineno'&gt;17&lt;/span&gt; 
&lt;span class='lineno'&gt;18&lt;/span&gt;     &lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;char&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='n'&gt;malloc&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='k'&gt;sizeof&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;char&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='o'&gt;+&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;));&lt;/span&gt;
&lt;span class='lineno'&gt;19&lt;/span&gt;     &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='nb'&gt;NULL&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;break&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;20&lt;/span&gt; 
&lt;span class='lineno'&gt;21&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;char&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;22&lt;/span&gt;     &lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='sc'&gt;&amp;#39;\0&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;23&lt;/span&gt; 
&lt;span class='lineno'&gt;24&lt;/span&gt;     &lt;span class='cm'&gt;/* process the record */&lt;/span&gt;
&lt;span class='lineno'&gt;25&lt;/span&gt; 
&lt;span class='lineno'&gt;26&lt;/span&gt;     &lt;span class='n'&gt;free&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;27&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;This will avoid the segfault, and let you process your data, but it will perform &lt;em&gt;horribly&lt;/em&gt;, since we&amp;#8217;re calling down into the heap management code for &lt;strong&gt;every single record&lt;/strong&gt;! And remember, we&amp;#8217;re talking about millions of records here.&lt;/p&gt;

&lt;h2 id='step_4_highwater_mark_buffers'&gt;Step 4: High-water mark buffers&lt;/h2&gt;

&lt;p&gt;So what&amp;#8217;s the solution? A high-water mark buffer. The idea is that instead of allocating a new string each time through the loop, you remember how large your current string is. As long as the next record&amp;#8217;s &lt;code&gt;name&lt;/code&gt; isn&amp;#8217;t longer than your buffer, you can reuse it, saving you a call to &lt;code&gt;malloc&lt;/code&gt;. If it is longer, you &lt;code&gt;realloc&lt;/code&gt; it to be large enough for the new string. If you think of the lengths of the &lt;code&gt;name&lt;/code&gt; strings as a rising tide of water, you see where the name of the buffer comes from.&lt;/p&gt;

&lt;p&gt;We can do a high-water mark buffer by hand:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='lineno'&gt; 1&lt;/span&gt; &lt;span class='kt'&gt;FILE&lt;/span&gt;  &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;file&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='cm'&gt;/* whatever */&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt; 2&lt;/span&gt; &lt;span class='n'&gt;rec2_t&lt;/span&gt;  &lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt; 3&lt;/span&gt; &lt;span class='kt'&gt;size_t&lt;/span&gt;  &lt;span class='n'&gt;allocated_name_size&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt; 4&lt;/span&gt; 
&lt;span class='lineno'&gt; 5&lt;/span&gt; &lt;span class='cp'&gt;#define READ_FIELD(dest, type, count) \&lt;/span&gt;
&lt;span class='lineno'&gt; 6&lt;/span&gt; &lt;span class='cp'&gt;    if (fread(dest, sizeof(type), count, file) &amp;lt; count) \&lt;/span&gt;
&lt;span class='lineno'&gt; 7&lt;/span&gt; &lt;span class='cp'&gt;        break;&lt;/span&gt;
&lt;span class='lineno'&gt; 8&lt;/span&gt; 
&lt;span class='lineno'&gt; 9&lt;/span&gt; &lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;NULL&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;10&lt;/span&gt; 
&lt;span class='lineno'&gt;11&lt;/span&gt; &lt;span class='k'&gt;do&lt;/span&gt;
&lt;span class='lineno'&gt;12&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='lineno'&gt;13&lt;/span&gt;     &lt;span class='kt'&gt;uint32_t&lt;/span&gt;  &lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;14&lt;/span&gt;     &lt;span class='kt'&gt;size_t&lt;/span&gt;  &lt;span class='n'&gt;name_size&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;15&lt;/span&gt; 
&lt;span class='lineno'&gt;16&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;id&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;17&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;num_bananas&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;18&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;19&lt;/span&gt; 
&lt;span class='lineno'&gt;20&lt;/span&gt;     &lt;span class='cm'&gt;/* Remember to include an extra byte for the NUL terminator! */&lt;/span&gt;
&lt;span class='lineno'&gt;21&lt;/span&gt; 
&lt;span class='lineno'&gt;22&lt;/span&gt;     &lt;span class='n'&gt;name_size&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;sizeof&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;char&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='o'&gt;+&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;23&lt;/span&gt; 
&lt;span class='lineno'&gt;24&lt;/span&gt;     &lt;span class='cm'&gt;/* Reallocate the buffer if it&amp;#39;s not big enough */&lt;/span&gt;
&lt;span class='lineno'&gt;25&lt;/span&gt;     &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;name_size&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&lt;/span&gt; &lt;span class='n'&gt;allocated_name_size&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;26&lt;/span&gt;     &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='lineno'&gt;27&lt;/span&gt;         &lt;span class='cm'&gt;/* Use a temporary so the old buffer isn&amp;#39;t leaked if realloc fails */&lt;/span&gt;
&lt;span class='lineno'&gt;28&lt;/span&gt;         &lt;span class='kt'&gt;char&lt;/span&gt;  &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;new_name&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;char&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='n'&gt;realloc&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;name_size&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;29&lt;/span&gt;         &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;new_name&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='nb'&gt;NULL&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='k'&gt;break&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;30&lt;/span&gt;         &lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;new_name&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;31&lt;/span&gt;         &lt;span class='n'&gt;allocated_name_size&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;name_size&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;32&lt;/span&gt;     &lt;span class='p'&gt;}&lt;/span&gt;
&lt;span class='lineno'&gt;33&lt;/span&gt; 
&lt;span class='lineno'&gt;34&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;char&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;35&lt;/span&gt;     &lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='sc'&gt;&amp;#39;\0&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;36&lt;/span&gt; 
&lt;span class='lineno'&gt;37&lt;/span&gt;     &lt;span class='cm'&gt;/* process the record */&lt;/span&gt;
&lt;span class='lineno'&gt;38&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;39&lt;/span&gt; 
&lt;span class='lineno'&gt;40&lt;/span&gt; &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt; &lt;span class='o'&gt;!=&lt;/span&gt; &lt;span class='nb'&gt;NULL&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;41&lt;/span&gt;     &lt;span class='n'&gt;free&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Note that &lt;code&gt;realloc&lt;/code&gt; does the “right thing” if &lt;code&gt;record.name&lt;/code&gt; is &lt;code&gt;NULL&lt;/code&gt;; this indicates that we haven&amp;#8217;t allocated a buffer yet, and so &lt;code&gt;realloc&lt;/code&gt; acts like &lt;code&gt;malloc&lt;/code&gt; in this case.&lt;/p&gt;
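&lt;p&gt;To get a rough feel for the savings, the sketch below replays a made-up sequence of name lengths through the same grow-only logic and counts how often &lt;code&gt;realloc&lt;/code&gt; is actually called. The &lt;code&gt;count_reallocs&lt;/code&gt; function is a hypothetical helper written for this illustration. It keeps the result of &lt;code&gt;realloc&lt;/code&gt; in a temporary so that the old buffer can still be freed if the allocation fails.&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Replays a sequence of name lengths through a high-water mark buffer
 * and returns how many times realloc was called.  Returns -1 if an
 * allocation fails. */
static int
count_reallocs(const uint32_t *name_lengths, size_t num_records)
{
    char  *name = NULL;
    size_t  allocated_name_size = 0;
    int  reallocs = 0;
    size_t  i;

    assert(name_lengths != NULL || num_records == 0);

    for (i = 0; i < num_records; i++)
    {
        /* One extra byte for the NUL terminator */
        size_t  name_size = sizeof(char) * ((size_t) name_lengths[i] + 1);

        /* Only grow the buffer when this name exceeds the high-water mark */
        if (name_size > allocated_name_size)
        {
            char  *new_name = (char *) realloc(name, name_size);
            if (new_name == NULL)
            {
                free(name);
                return -1;
            }
            name = new_name;
            allocated_name_size = name_size;
            reallocs++;
        }
    }

    free(name);
    return reallocs;
}
```

&lt;p&gt;For the lengths 5, 3, 8, 8, 2, 12, 7, this calls &lt;code&gt;realloc&lt;/code&gt; three times (at 5, 8, and 12), where the per-record approach would have called &lt;code&gt;malloc&lt;/code&gt; seven times.&lt;/p&gt;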

&lt;h2 id='highwater_mark_library'&gt;High-water mark library&lt;/h2&gt;

&lt;p&gt;So, we&amp;#8217;ve described why you&amp;#8217;d want a high-water mark buffer, and how to implement one. But once you write that same code three or four times, you decide to factor it out into a library. Hence &lt;a href='http://github.com/dcreager/libhwm/'&gt;libhwm&lt;/a&gt;. Here&amp;#8217;s the same file reading code using the library:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='lineno'&gt;1&lt;/span&gt; &lt;span class='k'&gt;typedef&lt;/span&gt; &lt;span class='k'&gt;struct&lt;/span&gt;
&lt;span class='lineno'&gt;2&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='lineno'&gt;3&lt;/span&gt;     &lt;span class='kt'&gt;uint32_t&lt;/span&gt;  &lt;span class='n'&gt;id&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;4&lt;/span&gt;     &lt;span class='kt'&gt;uint32_t&lt;/span&gt;  &lt;span class='n'&gt;num_bananas&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;5&lt;/span&gt;     &lt;span class='n'&gt;hwm_buffer_t&lt;/span&gt;  &lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;6&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='n'&gt;rec3_t&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='c'&gt;&lt;span class='lineno'&gt; 1&lt;/span&gt; &lt;span class='kt'&gt;FILE&lt;/span&gt;  &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;file&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='cm'&gt;/* whatever */&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt; 2&lt;/span&gt; &lt;span class='n'&gt;rec3_t&lt;/span&gt;  &lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt; 3&lt;/span&gt; 
&lt;span class='lineno'&gt; 4&lt;/span&gt; &lt;span class='cp'&gt;#define READ_FIELD(dest, type, count) \&lt;/span&gt;
&lt;span class='lineno'&gt; 5&lt;/span&gt; &lt;span class='cp'&gt;    if (fread(dest, sizeof(type), count, file) &amp;lt; count) \&lt;/span&gt;
&lt;span class='lineno'&gt; 6&lt;/span&gt; &lt;span class='cp'&gt;        break;&lt;/span&gt;
&lt;span class='lineno'&gt; 7&lt;/span&gt; 
&lt;span class='lineno'&gt; 8&lt;/span&gt; &lt;span class='n'&gt;hwm_buffer_init&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt; 9&lt;/span&gt; 
&lt;span class='lineno'&gt;10&lt;/span&gt; &lt;span class='k'&gt;do&lt;/span&gt;
&lt;span class='lineno'&gt;11&lt;/span&gt; &lt;span class='p'&gt;{&lt;/span&gt;
&lt;span class='lineno'&gt;12&lt;/span&gt;     &lt;span class='kt'&gt;uint32_t&lt;/span&gt;  &lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;13&lt;/span&gt;     &lt;span class='kt'&gt;char&lt;/span&gt;  &lt;span class='o'&gt;*&lt;/span&gt;&lt;span class='n'&gt;name_ptr&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;14&lt;/span&gt; 
&lt;span class='lineno'&gt;15&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;id&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;16&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;num_bananas&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;17&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;uint32_t&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;18&lt;/span&gt; 
&lt;span class='lineno'&gt;19&lt;/span&gt;     &lt;span class='cm'&gt;/* Remember to include an extra byte for the NUL terminator! */&lt;/span&gt;
&lt;span class='lineno'&gt;20&lt;/span&gt; 
&lt;span class='lineno'&gt;21&lt;/span&gt;     &lt;span class='kt'&gt;size_t&lt;/span&gt;  &lt;span class='n'&gt;name_size&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='k'&gt;sizeof&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='kt'&gt;char&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;*&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='o'&gt;+&lt;/span&gt;&lt;span class='mi'&gt;1&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;22&lt;/span&gt; 
&lt;span class='lineno'&gt;23&lt;/span&gt;     &lt;span class='cm'&gt;/* Read into the HWM buffer */&lt;/span&gt;
&lt;span class='lineno'&gt;24&lt;/span&gt;     &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;!&lt;/span&gt;&lt;span class='n'&gt;hwm_buffer_ensure_size&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;name_size&lt;/span&gt;&lt;span class='p'&gt;))&lt;/span&gt;
&lt;span class='lineno'&gt;25&lt;/span&gt;         &lt;span class='k'&gt;break&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;26&lt;/span&gt; 
&lt;span class='lineno'&gt;27&lt;/span&gt;     &lt;span class='n'&gt;name_ptr&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;hwm_buffer_writable_mem&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;char&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;28&lt;/span&gt;     &lt;span class='n'&gt;READ_FIELD&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;name_ptr&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='kt'&gt;char&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;29&lt;/span&gt;     &lt;span class='n'&gt;name_ptr&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='n'&gt;name_length&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='sc'&gt;&amp;#39;\0&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;;&lt;/span&gt;
&lt;span class='lineno'&gt;30&lt;/span&gt; 
&lt;span class='lineno'&gt;31&lt;/span&gt;     &lt;span class='cm'&gt;/* process the record */&lt;/span&gt;
&lt;span class='lineno'&gt;32&lt;/span&gt; &lt;span class='p'&gt;}&lt;/span&gt; &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;true&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;span class='lineno'&gt;33&lt;/span&gt; 
&lt;span class='lineno'&gt;34&lt;/span&gt; &lt;span class='n'&gt;hwm_buffer_done&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='o'&gt;&amp;amp;&lt;/span&gt;&lt;span class='n'&gt;record&lt;/span&gt;&lt;span class='p'&gt;.&lt;/span&gt;&lt;span class='n'&gt;name&lt;/span&gt;&lt;span class='p'&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Et voilà. Of course, this last code snippet makes me realize that we could make things even simpler with an &lt;code&gt;hwm_buffer_fread&lt;/code&gt; function! The story never ends&amp;#8230;&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Decentralized datatypes</title>
    <link href="http://dcreager.net/2009/12/21/decentralized-datatypes/"/>
    <updated>2009-12-21T00:00:00-05:00</updated>
    <id>http://dcreager.net/2009/12/21/decentralized-datatypes</id>
    <content type="html">&lt;p&gt;Over the past year or so there have been quite a few blog postings in the REST world about MIME types, and their role in the REST architecture. A lot of the discussion seems to be prompted by WADL, which is an attempt to define a WSDL-style interface description language for REST services. &lt;a href='http://bitworking.org/news/193/Do-we-need-WADL'&gt;Joe Gregorio&lt;/a&gt; argues that MIME types are more useful for describing the semantics of a service than a WADL document, since there are parts of the service&amp;#8217;s semantics that just can&amp;#8217;t be encoded in a machine-readable format. MIME types acknowledge this, providing a standard way of identifying a data format and pointing to the human- and machine-readable documents (such as RFCs and XSDs) that define the syntax and accompanying semantics.&lt;/p&gt;

&lt;p&gt;Following this idea, several people have begun debating whether or not the centralized assignment of MIME types is the right way to handle the variety of data formats that REST-based systems produce and consume. &lt;a href='http://www.markbaker.ca/blog/2008/02/media-type-centralization-is-a-feature-not-a-bug/'&gt;Mark Baker&lt;/a&gt; comes in on the side of centralized assignment, whereas &lt;a href='http://www.innoq.com/blog/st/2008/02/decentralizing_media_types.html'&gt;Stefan Tilkov&lt;/a&gt;, &lt;a href='http://netzooid.com/blog/2008/02/07/why-a-restful-idl-is-an-oxymoron-and-what-we-really-need-instead/'&gt;Dan Diephouse&lt;/a&gt;, and &lt;a href='http://macstrac.blogspot.com/2007/11/atompub-services-and-auto-detecting.html'&gt;James Strachan&lt;/a&gt; argue in favor of decentralized types. &lt;a href='http://bill.burkecentral.com/2008/03/05/restful-xml-content-negotitation/'&gt;Bill Burke&lt;/a&gt; and &lt;a href='http://soundadvice.id.au/blog/2009/08/16/#mimeLimitation'&gt;Benjamin Carlyle&lt;/a&gt; have good summaries of the different proposed technical solutions that would enable decentralized types.&lt;/p&gt;

&lt;h2 id='extended_types'&gt;“Extended” types&lt;/h2&gt;

&lt;p&gt;One of the arguments in favor of centralized assignment is that allowing everyone to invent their own MIME types would ruin interoperability. And for certain cases, this seems pretty obviously true. It&amp;#8217;s a good thing that we have a standardized &lt;code&gt;image/png&lt;/code&gt; MIME type; this allows your browser to correctly display the website logo you see up in the upper left corner. If I were daft, I could decide to serve that logo using a MIME type of &lt;code&gt;image/x-dcreager-png&lt;/code&gt; (or similar) to indicate that I&amp;#8217;ve included some particular set of metadata in an ancillary chunk of the PNG.&lt;/p&gt;

&lt;p&gt;Why would I want to do this? Maybe I&amp;#8217;m writing an application that knows how to process this metadata, and I&amp;#8217;d like to easily determine whether a particular resource I&amp;#8217;m accessing has this metadata or not. The &lt;a href='http://www.openmicroscopy.org'&gt;Open Microscopy Environment&lt;/a&gt; does exactly this; they&amp;#8217;ve defined an XML schema that allows biology researchers to provide additional scientific metadata about an image or movie captured from a high-end microscope. One way to encode an image and its metadata is as an “OME-TIFF”, a data format that includes the metadata in an optional TIFF section. OME-TIFF files are also perfectly valid as regular TIFF files. This has the benefit that an OME-aware application can access the scientific metadata, whereas a “regular” image processing application can read the image using its normal TIFF decoder.&lt;/p&gt;

&lt;p&gt;Of course, now we have competing goals that we have to reconcile. On the one hand, we need to ensure that OME-aware applications can see that a particular image is an OME-TIFF. On the other, we need non-OME-aware applications to see the image as a regular TIFF. One of the decentralized proposals — MIME type parameters — tries to address this. For instance, a MIME type for an OME-TIFF might be &lt;code&gt;image/tiff;
ome=xml&lt;/code&gt;. By using the standard &lt;code&gt;image/tiff&lt;/code&gt; as the base MIME type, non-OME-aware applications correctly treat it as a simple TIFF. OME-aware applications would know that the &lt;code&gt;ome=xml&lt;/code&gt; parameter indicates that the OME-specific metadata is present.&lt;/p&gt;
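
&lt;p&gt;As a rough illustration of how a consumer might act on such a parameter, here is a minimal Python sketch that splits a MIME type string into its base type and its parameters. The &lt;code&gt;parse_media_type&lt;/code&gt; and &lt;code&gt;is_ome_tiff&lt;/code&gt; helpers are hypothetical names invented for this example, the &lt;code&gt;ome=xml&lt;/code&gt; parameter is only the proposal discussed above, and the parsing is deliberately simplified (it does not cope with quoted values that contain semicolons).&lt;/p&gt;
&lt;pre&gt;&lt;code class='python'&gt;def parse_media_type(value):
    """Split a MIME type string into (base type, parameter dict)."""
    parts = [part.strip() for part in value.split(';')]
    base = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing semicolon
        name, _, val = part.partition('=')
        params[name.strip().lower()] = val.strip().strip('"')
    return base, params


def is_ome_tiff(value):
    """Hypothetical check: a TIFF that advertises OME metadata."""
    base, params = parse_media_type(value)
    return base == 'image/tiff' and params.get('ome') == 'xml'


base, params = parse_media_type('image/tiff; ome=xml')
print(base, params)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A non-OME-aware consumer would look only at the base type and ignore the parameters, while an OME-aware consumer could call something like &lt;code&gt;is_ome_tiff&lt;/code&gt; before deciding whether to look for the metadata.&lt;/p&gt;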

&lt;h2 id='the_multitude_of_xml_types'&gt;The multitude of XML types&lt;/h2&gt;

&lt;p&gt;Another example given is that of an XML document. Most applications will generate XML documents that conform to a particular schema (for instance, a company-specific purchase order), which they might encode as an XSD. Now, the XSD on its own doesn&amp;#8217;t give you the full story on how to process that data, but it does provide some detail on how the data is structured. If you&amp;#8217;re writing an application that consumes this data, having the XSD available would be helpful. More interesting is an application that can consume &lt;em&gt;any&lt;/em&gt; XML document — and which might use an XSD or RelaxNG schema to customize the UI used to display the document.&lt;/p&gt;

&lt;p&gt;In both cases, the schema is necessary to process the document, but for different reasons. In the first case, the consuming application was built with advance knowledge of how the data should be handled, and the schema is used to direct a particular document to the code that implements this knowledge. In the second case, the particular datatype is unimportant, and the application-specific semantics aren&amp;#8217;t used; the data is only consumed as a “generic XML document”, and the schema is used to describe the specific structure of the elements.&lt;/p&gt;

&lt;h2 id='data_doesnt_have_a_single_type'&gt;Data doesn&amp;#8217;t have a single type&lt;/h2&gt;

&lt;p&gt;The common theme in both of these examples is that a single datatype isn&amp;#8217;t enough to describe the data we&amp;#8217;re dealing with. As Roy Fielding &lt;a href='http://roy.gbiv.com/untangled/2009/wrangling-mimetypes'&gt;points out&lt;/a&gt;, “all data formats correspond to multiple media types”. It&amp;#8217;s tempting to think of a datatype as just “the syntax and structure of the data”. But it must also include some intuition about how the data will be used.&lt;/p&gt;

&lt;p&gt;From this point of view, the generic XML processing application does &lt;em&gt;not&lt;/em&gt; handle a multitude of datatypes. Instead, it handles exactly one: “generic XML document with associated schema”. The application that knows how to process this particular schema will handle a different, completely distinct, datatype: “company-specific purchase order XML document”. And the particular XML document in question — a single sequence of bytes that is a single representation of a single resource — is an instance of both types.&lt;/p&gt;

&lt;p&gt;Why shift things around like this? Doesn&amp;#8217;t it just move the complexity from the consumer (who used to consume multiple types) to the producer (who must now publish the XML document under different types)? Not necessarily. The key idea is that we can use &lt;em&gt;transformation graphs&lt;/em&gt; to encode the relationships between the datatypes:&lt;/p&gt;
&lt;div class='figure'&gt;
  &lt;img src='/images/2009/12/21/decentralized-datatypes/xform-graph.png' alt='transformation graph' /&gt;
&lt;/div&gt;
&lt;p&gt;In this specific example, the transformation is simple: since the same sequence of bytes is a valid instance of both types, we don&amp;#8217;t have to modify the data itself. The decentralized MIME type proposals (especially the MIME parameter proposal) already support these kinds of “no-op” transformations: the more generic type is the “base” MIME type, and the more specific extensions are encoded as MIME parameters. However, by modeling the type relationships as an arbitrary graph, we open up the possibility of more complex sets of types, which might require actual code to transform between them, but which can be defined in a decentralized manner.&lt;/p&gt;

&lt;p&gt;Even though the model is more complex, we haven&amp;#8217;t required the producer or consumer to be more complex. A transformation graph is necessary to translate between the two different (but compatible) types, but the graph doesn&amp;#8217;t have to specifically live at the producer or the consumer. The producer can publish the data using the only type it knows about (the “company-specific” type), and a consumer can request the data using the only type it knows about (such as the “generic XML” type). Anywhere along the path from the producer to the consumer, we can use the transformation graph to automatically transform the data from one type to the other.&lt;/p&gt;
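
&lt;p&gt;To make the idea a little more concrete, here is a small Python sketch of a transformation graph. It is only an illustration written for this post, not the formulation from the thesis: the &lt;code&gt;TransformationGraph&lt;/code&gt; class, its method names, and the type names are all made up, and it assumes that each edge is a plain function from one representation to another. A breadth-first search finds the shortest chain of transformations between two types.&lt;/p&gt;
&lt;pre&gt;&lt;code class='python'&gt;from collections import deque


class TransformationGraph:
    """Datatypes are nodes; each edge carries a transformation function."""

    def __init__(self):
        self.edges = {}  # source type -> list of (target type, function)

    def add(self, source, target, func):
        self.edges.setdefault(source, []).append((target, func))

    def find_path(self, source, target):
        """Breadth-first search for the shortest list of functions."""
        queue = deque([(source, [])])
        seen = {source}
        while queue:
            current, path = queue.popleft()
            if current == target:
                return path
            for nxt, func in self.edges.get(current, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [func]))
        return None

    def transform(self, data, source, target):
        path = self.find_path(source, target)
        if path is None:
            raise ValueError('no transformation from %s to %s'
                             % (source, target))
        for func in path:
            data = func(data)
        return data


graph = TransformationGraph()
# The no-op edge: the same bytes are valid under both types.
graph.add('purchase-order-xml', 'generic-xml', lambda data: data)
document = 'purchase order document'
result = graph.transform(document, 'purchase-order-xml', 'generic-xml')
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Nothing in this sketch cares where the graph lives: the producer, the consumer, or an intermediary could each hold it and apply &lt;code&gt;transform&lt;/code&gt; on the way through.&lt;/p&gt;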

&lt;p&gt;More details on transformation graphs, including more complex examples, can be found in my &lt;a href='/publications/012-dphil-thesis'&gt;DPhil thesis&lt;/a&gt;.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Simulating “make distclean” in SCons</title>
    <link href="http://dcreager.net/2009/12/18/make-distclean-in-scons/"/>
    <updated>2009-12-18T00:00:00-05:00</updated>
    <id>http://dcreager.net/2009/12/18/make-distclean-in-scons</id>
    <content type="html">&lt;p&gt;SCons provides an automatic “clean” target out of the box — just run &lt;code&gt;scons -c&lt;/code&gt;, and SCons will delete all of the objects that it knows how to build. This is a very useful feature; however, there are two main missing features that I want to add to my build scripts. First, I want to be able to delete all of the temporary files SCons uses, such as its configuration cache and any files I use to store variable values. These aren&amp;#8217;t included in the default list of the files to clean up. Second, I want more control over which items are deleted by default, when you specify &lt;code&gt;scons -c&lt;/code&gt; without any targets. I&amp;#8217;ll describe my solution to the first problem in this post. I&amp;#8217;ll write up the second problem in another post.&lt;/p&gt;

&lt;h2 id='deleting_sconss_temporary_files'&gt;Deleting SCons&amp;#8217;s temporary files&lt;/h2&gt;

&lt;p&gt;This feature is akin to the &lt;code&gt;make distclean&lt;/code&gt; target that Automake puts into the Makefiles that it generates. This differs from &lt;code&gt;make clean&lt;/code&gt;: &lt;code&gt;make clean&lt;/code&gt; is intended to delete all of the build products, but leave behind the results of the &lt;code&gt;configure&lt;/code&gt; step, whereas &lt;code&gt;make distclean&lt;/code&gt; is supposed to delete &lt;em&gt;everything&lt;/em&gt;, returning the source tree to the same state as when you&amp;#8217;d just unpacked the tarball.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;scons -c&lt;/code&gt; command is analogous to &lt;code&gt;make clean&lt;/code&gt;, and requires no setup. It will automatically delete any of the build products that are created by running &lt;code&gt;scons&lt;/code&gt;. There are several cache files that SCons creates, however, and it would be nice to have an equivalent to &lt;code&gt;make distclean&lt;/code&gt;. This is especially useful when developing a new &lt;code&gt;Configure&lt;/code&gt; check, for instance: if you make a change to the test, you want to be able to (easily) force SCons to ignore any cached results, and try all of the tests again.&lt;/p&gt;

&lt;p&gt;This is actually fairly easy to set up, assuming you know the list of temporary files that SCons will create. You can add the following rule to your top-level &lt;em&gt;SConstruct&lt;/em&gt; file:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='lineno'&gt;1&lt;/span&gt; &lt;span class='n'&gt;env&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Clean&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;distclean&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
&lt;span class='lineno'&gt;2&lt;/span&gt;           &lt;span class='p'&gt;[&lt;/span&gt;
&lt;span class='lineno'&gt;3&lt;/span&gt;            &lt;span class='s'&gt;&amp;quot;.sconsign.dblite&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
&lt;span class='lineno'&gt;4&lt;/span&gt;            &lt;span class='s'&gt;&amp;quot;.sconf_temp&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
&lt;span class='lineno'&gt;5&lt;/span&gt;            &lt;span class='s'&gt;&amp;quot;config.log&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
&lt;span class='lineno'&gt;6&lt;/span&gt;           &lt;span class='p'&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;As far as I can tell, these three files are always created by SCons. To delete these files, you simply run &lt;code&gt;scons -c distclean&lt;/code&gt;. Because we&amp;#8217;ve defined the target using &lt;code&gt;Clean&lt;/code&gt;, it will only be run when you pass in the &lt;code&gt;-c&lt;/code&gt; option to &lt;code&gt;scons&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Since we&amp;#8217;re putting together the file list manually, you should make sure to add any additional cache files that your SCons scripts use. For instance, I&amp;#8217;m using some &lt;code&gt;Variables&lt;/code&gt; options, which I store in a file called &lt;em&gt;.scons.vars&lt;/em&gt;. (This means that the user doesn&amp;#8217;t have to type them in with every invocation of &lt;code&gt;scons&lt;/code&gt;.) Because I save these variables to disk, I have to add another entry to the &lt;code&gt;distclean&lt;/code&gt; target:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='lineno'&gt;1&lt;/span&gt; &lt;span class='nb'&gt;vars&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;Variables&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;#39;.scons.vars&amp;#39;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;ARGUMENTS&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;2&lt;/span&gt; &lt;span class='c'&gt;# ...define a bunch of variables&lt;/span&gt;
&lt;span class='lineno'&gt;3&lt;/span&gt; &lt;span class='nb'&gt;vars&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Update&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;env&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;4&lt;/span&gt; &lt;span class='nb'&gt;vars&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Save&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;.scons.vars&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;env&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;5&lt;/span&gt; 
&lt;span class='lineno'&gt;6&lt;/span&gt; &lt;span class='n'&gt;env&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Clean&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;distclean&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;.scons.vars&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;Note how, just like with any SCons target, I can define the &lt;code&gt;distclean&lt;/code&gt; target multiple times. SCons will take care of merging them into a single action, deleting all of the specified files when you run &lt;code&gt;scons -c distclean&lt;/code&gt;.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Downgrading packages in Ubuntu</title>
    <link href="http://dcreager.net/2009/09/08/ubuntu-downgrading/"/>
    <updated>2009-09-08T00:00:00-04:00</updated>
    <id>http://dcreager.net/2009/09/08/ubuntu-downgrading</id>
    <content type="html">&lt;h2 id='what_kind_of_trouble_did_you_get_yourself_into_this_time'&gt;What kind of trouble did you get yourself into this time?&lt;/h2&gt;

&lt;p&gt;I have recently been setting up a new machine that I&amp;#8217;ll be using jointly as a work machine and a MythTV frontend/backend. As part of setting this up, I&amp;#8217;ve had several issues getting the integrated ATI Radeon HD 3200 display board to work correctly. I&amp;#8217;ll save the details for another post, but the short version is that none of the three available X drivers (ATI&amp;#8217;s &lt;code&gt;fglrx&lt;/code&gt; and the open-source &lt;code&gt;radeon&lt;/code&gt; and &lt;code&gt;radeonhd&lt;/code&gt;) seem to drive the HDMI output connector correctly.&lt;/p&gt;

&lt;p&gt;As part of testing this, I used &lt;a href='https://launchpad.net/~tormodvolden/+archive/ppa'&gt;Tormod Volden&amp;#8217;s packages&lt;/a&gt; for the &lt;code&gt;radeon&lt;/code&gt; and &lt;code&gt;radeonhd&lt;/code&gt; drivers, which are based on newer releases than are available in the mainline Jaunty package trees. For some packages, they are even based on bleeding-edge git checkouts, rather than released tarballs. (More notes on using &lt;code&gt;radeonhd&lt;/code&gt; can be found &lt;a href='https://help.ubuntu.com/community/RadeonHD'&gt;here&lt;/a&gt;.) While neither package was able to use the HDMI connector properly, the &lt;code&gt;radeon&lt;/code&gt; driver was able to give me output on the VGA connector, with full 2D and video acceleration (which I needed for the MythTV front-end). My monitor (a Westinghouse L2410NM) was auto-detected through the connection, so my &lt;em&gt;xorg.conf&lt;/em&gt; is trivial.&lt;/p&gt;

&lt;p&gt;However, neither open-source driver has 3D acceleration support yet. To get this, I&amp;#8217;ll need to use ATI&amp;#8217;s &lt;code&gt;fglrx&lt;/code&gt; driver. Not a problem, right? Just install the packages, then change the &lt;code&gt;radeon&lt;/code&gt; entry in your &lt;em&gt;xorg.conf&lt;/em&gt; over to &lt;code&gt;fglrx&lt;/code&gt;, and you&amp;#8217;re good to go! Except that by using Tormod&amp;#8217;s package tree to pick up the latest &lt;code&gt;radeon&lt;/code&gt; and &lt;code&gt;radeonhd&lt;/code&gt; drivers, I &lt;em&gt;also&lt;/em&gt; pick up more recent versions of &lt;code&gt;xserver&lt;/code&gt; and friends, and it would seem that the &lt;code&gt;fglrx&lt;/code&gt; drivers don&amp;#8217;t play well with the new version — I get segfaults when the X server tries to start using &lt;code&gt;fglrx&lt;/code&gt;, which didn&amp;#8217;t happen before installing Tormod&amp;#8217;s drivers.&lt;/p&gt;

&lt;p&gt;So, I need to back out all of the packages installed from Tormod&amp;#8217;s tree, and revert back to the versions of these packages that I had previously installed from the mainline Jaunty trees.&lt;/p&gt;

&lt;h2 id='how_you_might_think_it_should_work'&gt;How you might think it should work&lt;/h2&gt;

&lt;p&gt;Anticipating that I might want to remove Tormod&amp;#8217;s tree from my &lt;em&gt;sources.list&lt;/em&gt;, I used the nice &lt;em&gt;sources.list.d&lt;/em&gt; facility. Instead of putting all of your package sources into a single file, you put them in separate files in the &lt;em&gt;/etc/apt/sources.list.d&lt;/em&gt; directory. That way, activating and deactivating a particular package tree is as simple as moving a file and re-running &lt;code&gt;apt-get update&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo mkdir /etc/apt/sources.list.d.unused
$ sudo mv /etc/apt/sources.list.d/tormod.list /etc/apt/sources.list.d.unused
$ sudo apt-get update&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now we&amp;#8217;ve removed Tormod&amp;#8217;s packages from our list of available packages. Ideally, we could run an &lt;code&gt;apt-get&lt;/code&gt; command to “downgrade” our packages to those that are mentioned in our package lists. However, I wasn&amp;#8217;t able to find such a command. If you try to run &lt;code&gt;apt-get upgrade&lt;/code&gt; or &lt;code&gt;apt-get dist-upgrade&lt;/code&gt;, APT will see that you have more recent versions of the X packages installed (the ones from Tormod&amp;#8217;s tree), and won&amp;#8217;t overwrite those with the older packages mentioned in the sources that we&amp;#8217;ve activated. Normally, this is the behavior that we want; but in this case, we&amp;#8217;re boned.&lt;/p&gt;

&lt;h2 id='downgrading_explicitly'&gt;Downgrading explicitly&lt;/h2&gt;

&lt;p&gt;Instead, we&amp;#8217;ll need to remove the newer packages using &lt;code&gt;apt-get
remove&lt;/code&gt;, and then reinstall them using &lt;code&gt;apt-get install&lt;/code&gt;. Unfortunately, you have to specify which packages you want to remove and reinstall on the command line. So, we need a command pipeline that will tell us “all packages installed from Tormod&amp;#8217;s tree”, so that we can call &lt;code&gt;apt-get remove&lt;/code&gt; and &lt;code&gt;apt-get install&lt;/code&gt; on them.&lt;/p&gt;

&lt;p&gt;First, we can get a list of the packages defined by a particular source by reading the files in &lt;em&gt;/var/lib/apt/lists&lt;/em&gt;. This directory contains the local copies of the package list files that are downloaded from each source. Each source has its own files, which makes it easy to distinguish the packages that came from Tormod&amp;#8217;s tree from those that came from mainline Jaunty. However, there will only be files for the activated sources, so if you&amp;#8217;ve moved the &lt;em&gt;tormod.list&lt;/em&gt; file like I did above, you won&amp;#8217;t find a file for Tormod&amp;#8217;s package tree. In that case, we&amp;#8217;ll have to reactivate his package tree before going any further:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo mv /etc/apt/sources.list.d.unused/tormod.list /etc/apt/sources.list.d
$ sudo apt-get update&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now we can be sure that Tormod&amp;#8217;s tree has a file in &lt;em&gt;/var/lib/apt/lists&lt;/em&gt;. The filename is based on the URL of the &lt;em&gt;sources.list&lt;/em&gt; entry that you want to deal with. So if you&amp;#8217;re following these instructions for a different package tree than Tormod&amp;#8217;s X packages, you&amp;#8217;ll need to find the correct file and &lt;code&gt;grep&lt;/code&gt; it instead. You&amp;#8217;ll see several files for each source; you want the file that ends in &lt;code&gt;Packages&lt;/code&gt; and contains &lt;code&gt;binary&lt;/code&gt; and your architecture in the name.&lt;/p&gt;

&lt;p&gt;Once we find the file, we can extract out the &lt;code&gt;Package&lt;/code&gt; lines from it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ cd /var/lib/apt/lists
$ grep Package ppa.launchpad.net_tormodvolden_ppa_ubuntu_dists_jaunty_main_binary-amd64_Packages
Package: xserver-xorg-video-ati
Package: xserver-xorg-video-ati-dbg
Package: xserver-xorg-video-radeon
Package: xserver-xorg-video-radeon-dbg
[...]&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We only want the package names, though, so we need to use &lt;code&gt;sed&lt;/code&gt; to strip out the &lt;code&gt;Package:&lt;/code&gt; prefix:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ grep Package ppa.launchpad.net_tormodvolden_ppa_ubuntu_dists_jaunty_main_binary-amd64_Packages \
  | sed -e &amp;#39;s/^Package: //&amp;#39;
xserver-xorg-video-ati
xserver-xorg-video-ati-dbg
xserver-xorg-video-radeon
xserver-xorg-video-radeon-dbg
[...]&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This tells us which packages are defined in Tormod&amp;#8217;s package tree. However, we can&amp;#8217;t pass all of them into &lt;code&gt;apt-get install&lt;/code&gt;, because we probably haven&amp;#8217;t installed all of the packages that Tormod made available. We can use &lt;code&gt;dpkg-query&lt;/code&gt; to see which of these packages are actually installed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ grep Package ppa.launchpad.net_tormodvolden_ppa_ubuntu_dists_jaunty_main_binary-amd64_Packages \
  | sed -e &amp;#39;s/^Package: //&amp;#39; \
  | xargs dpkg-query -W 2&amp;gt;/dev/null
gnome-screensaver	2.24.0-0ubuntu6
libgl1-mesa-dev	7.4-0ubuntu3.2
libgl1-mesa-swx11-dev	
mesa-common-dev	7.4-0ubuntu3.2
xdmx	
xnest	
[...]&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first four lines of this output describe packages that are installed, while the last two describe packages that are not installed. We can distinguish between the two cases by the presence or absence of the version number. Looking closely, we can see that the package name and version number are separated by a tab character. Hopefully, that tab isn&amp;#8217;t printed for uninstalled packages, which would let us just look for a tab character to filter out the uninstalled packages. Let&amp;#8217;s try it and see:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ grep Package ppa.launchpad.net_tormodvolden_ppa_ubuntu_dists_jaunty_main_binary-amd64_Packages \
  | sed -e &amp;#39;s/^Package: //&amp;#39; \
  | xargs dpkg-query -W 2&amp;gt;/dev/null \
  | less -U
gnome-screensaver^I2.24.0-0ubuntu6
libgl1-mesa-dev^I7.4-0ubuntu3.2
libgl1-mesa-swx11-dev^I
mesa-common-dev^I7.4-0ubuntu3.2
xdmx^I
xnest^I
[...]&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Ah well, it was worth a try. But as a consolation, we can check for a tab followed by any other character, and we&amp;#8217;ll get the installed packages:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ grep Package ppa.launchpad.net_tormodvolden_ppa_ubuntu_dists_jaunty_main_binary-amd64_Packages \
  | sed -e &amp;#39;s/^Package: //&amp;#39; \
  | xargs dpkg-query -W 2&amp;gt;/dev/null \
  | grep &amp;#39;\t.&amp;#39;
gnome-screensaver^I2.24.0-0ubuntu6
libgl1-mesa-dev^I7.4-0ubuntu3.2
mesa-common-dev^I7.4-0ubuntu3.2
xscreensaver-data^I5.08-1~rc2
[...]&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that the &lt;code&gt;\t&lt;/code&gt; in the above is an actual tab character, and not a backslash followed by a T. In most shells, you type in the tab character by pressing Control-V and then the Tab key.&lt;/p&gt;

&lt;p&gt;Now we have a list of installed packages that &lt;em&gt;might&lt;/em&gt; have come from Tormod&amp;#8217;s package tree. I say might have, because it&amp;#8217;s possible that the mainline Jaunty has a newer version of a package than Tormod&amp;#8217;s tree does. Earlier, we said we were looking for the packages that &lt;em&gt;definitely&lt;/em&gt; came from Tormod&amp;#8217;s tree, so that we can reinstall them. If we reinstall all of the packages in this list we&amp;#8217;ve just created, then we might end up reinstalling a package that we don&amp;#8217;t need to. But that&amp;#8217;s not the worst thing in the world; we&amp;#8217;re only reinstalling a dozen or so packages in total, so the extra work of reinstalling a couple of packages that we don&amp;#8217;t need to won&amp;#8217;t really kill us.&lt;/p&gt;

&lt;p&gt;So now we need to pass this list into &lt;code&gt;apt-get&lt;/code&gt; to reinstall the packages. &lt;code&gt;apt-get&lt;/code&gt; only wants the package names, not the versions, so we&amp;#8217;ll need to use &lt;code&gt;sed&lt;/code&gt; again to strip these out. (As before, the &lt;code&gt;\t&lt;/code&gt; entries in the &lt;code&gt;grep&lt;/code&gt; and &lt;code&gt;sed&lt;/code&gt; patterns are actual tab characters.)&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ grep Package ppa.launchpad.net_tormodvolden_ppa_ubuntu_dists_jaunty_main_binary-amd64_Packages \
  | sed -e &amp;#39;s/^Package: //&amp;#39; \
  | xargs dpkg-query -W 2&amp;gt;/dev/null \
  | grep &amp;#39;\t.&amp;#39; \
  | sed -e &amp;#39;s/\t.*$//&amp;#39;
gnome-screensaver
libgl1-mesa-dev
mesa-common-dev
xscreensaver-data
[...]&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Perfect! Let&amp;#8217;s save this package list into a file.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ grep Package ppa.launchpad.net_tormodvolden_ppa_ubuntu_dists_jaunty_main_binary-amd64_Packages \
  | sed -e &amp;#39;s/^Package: //&amp;#39; \
  | xargs dpkg-query -W 2&amp;gt;/dev/null \
  | grep &amp;#39;\t.&amp;#39; \
  | sed -e &amp;#39;s/\t.*$//&amp;#39; \
  &amp;gt; /tmp/tormod.packages&lt;/code&gt;&lt;/pre&gt;
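
&lt;p&gt;If you would rather script the filtering than retype the pipeline, the same two steps can be sketched in Python. This is only an illustration: the function names are invented for this example, and it assumes that &lt;code&gt;dpkg-query -W&lt;/code&gt; prints each package name, a tab, and then the version (empty for packages that are not installed), as in the output shown above.&lt;/p&gt;
&lt;pre&gt;&lt;code class='python'&gt;import subprocess


def package_names(packages_text):
    """Return the names found on the 'Package:' lines of a Packages file."""
    prefix = 'Package: '
    return [line[len(prefix):].strip()
            for line in packages_text.splitlines()
            if line.startswith(prefix)]


def installed_names(dpkg_query_output):
    """Keep names that are followed by a tab and a non-empty version."""
    names = []
    for line in dpkg_query_output.splitlines():
        name, tab, version = line.partition('\t')
        if tab and version.strip():
            names.append(name)
    return names


def query_installed(names):
    """Run dpkg-query -W on the given names, discarding its error output."""
    if not names:
        return []  # with no arguments dpkg-query would list every package
    result = subprocess.run(['dpkg-query', '-W'] + list(names),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL,
                            universal_newlines=True)
    return installed_names(result.stdout)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unlike the shell version, the &lt;code&gt;\t&lt;/code&gt; here really is typed as a backslash followed by a T, since Python turns that escape sequence into a tab for us. Reading the &lt;em&gt;Packages&lt;/em&gt; file, passing its text through &lt;code&gt;package_names&lt;/code&gt; and then &lt;code&gt;query_installed&lt;/code&gt;, and writing the result out one name per line should produce the same kind of list as the pipeline above.&lt;/p&gt;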

&lt;p&gt;Then we can deactivate Tormod&amp;#8217;s tree and forcibly reinstall any of the packages that we might&amp;#8217;ve gotten from it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sudo mv /etc/apt/sources.list.d/tormod.list /etc/apt/sources.list.d.unused
$ sudo apt-get update
$ sudo apt-get remove `cat /tmp/tormod.packages`
$ sudo apt-get install `cat /tmp/tormod.packages`&lt;/code&gt;&lt;/pre&gt;</content>
  </entry>
  
  <entry>
    <title>Using callbacks with the subprocess module</title>
    <link href="http://dcreager.net/2009/08/13/subprocess-callbacks/"/>
    <updated>2009-08-13T00:00:00-04:00</updated>
    <id>http://dcreager.net/2009/08/13/subprocess-callbacks</id>
    <content type="html">&lt;p&gt;In a &lt;a href='/2009/08/06/subprocess-communicate-drawbacks/'&gt;previous post&lt;/a&gt;, we pointed out two drawbacks of Python&amp;#8217;s &lt;code&gt;subprocess.communicate&lt;/code&gt; method. In this post, we look at the first problem in more detail. To reiterate, the problem is that we collect the subprocess&amp;#8217;s output streams into strings. If the subprocess is going to generate a huge amount of output, it can be better to process the output data in a stream-oriented manner — that way we use a constant amount of memory regardless of how much output is produced.&lt;/p&gt;

&lt;p&gt;If we look at the &lt;a href='http://svn.python.org/view/python/trunk/Lib/subprocess.py?revision=74029&amp;amp;view=markup'&gt;implementation&lt;/a&gt; of the &lt;code&gt;communicate&lt;/code&gt; method, we see this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='lineno'&gt; 1&lt;/span&gt; &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;_communicate_with_select&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='nb'&gt;input&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
&lt;span class='lineno'&gt; 2&lt;/span&gt;     &lt;span class='n'&gt;read_set&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[]&lt;/span&gt;
&lt;span class='lineno'&gt; 3&lt;/span&gt;     &lt;span class='n'&gt;write_set&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[]&lt;/span&gt;
&lt;span class='lineno'&gt; 4&lt;/span&gt;     &lt;span class='n'&gt;stdout&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='bp'&gt;None&lt;/span&gt; &lt;span class='c'&gt;# Return&lt;/span&gt;
&lt;span class='lineno'&gt; 5&lt;/span&gt;     &lt;span class='n'&gt;stderr&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='bp'&gt;None&lt;/span&gt; &lt;span class='c'&gt;# Return&lt;/span&gt;
&lt;span class='lineno'&gt; 6&lt;/span&gt; 
&lt;span class='lineno'&gt; 7&lt;/span&gt;     &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stdin&lt;/span&gt; &lt;span class='ow'&gt;and&lt;/span&gt; &lt;span class='nb'&gt;input&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt; 8&lt;/span&gt;         &lt;span class='n'&gt;write_set&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;append&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stdin&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt; 9&lt;/span&gt;     &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stdout&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;10&lt;/span&gt;         &lt;span class='n'&gt;read_set&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;append&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stdout&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;11&lt;/span&gt;         &lt;span class='n'&gt;stdout&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[]&lt;/span&gt;
&lt;span class='lineno'&gt;12&lt;/span&gt;     &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stderr&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;13&lt;/span&gt;         &lt;span class='n'&gt;read_set&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;append&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stderr&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;14&lt;/span&gt;         &lt;span class='n'&gt;stderr&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[]&lt;/span&gt;
&lt;span class='lineno'&gt;15&lt;/span&gt; 
&lt;span class='lineno'&gt;16&lt;/span&gt;     &lt;span class='n'&gt;input_offset&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;
&lt;span class='lineno'&gt;17&lt;/span&gt;     &lt;span class='k'&gt;while&lt;/span&gt; &lt;span class='n'&gt;read_set&lt;/span&gt; &lt;span class='ow'&gt;or&lt;/span&gt; &lt;span class='n'&gt;write_set&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;18&lt;/span&gt;         &lt;span class='k'&gt;try&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;19&lt;/span&gt;             &lt;span class='n'&gt;rlist&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;wlist&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;xlist&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;select&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;select&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;read_set&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;write_set&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='p'&gt;[])&lt;/span&gt;
&lt;span class='lineno'&gt;20&lt;/span&gt;         &lt;span class='k'&gt;except&lt;/span&gt; &lt;span class='n'&gt;select&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;error&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;e&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;21&lt;/span&gt;             &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='n'&gt;e&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;args&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='n'&gt;errno&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;EINTR&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;22&lt;/span&gt;                 &lt;span class='k'&gt;continue&lt;/span&gt;
&lt;span class='lineno'&gt;23&lt;/span&gt;             &lt;span class='k'&gt;raise&lt;/span&gt;
&lt;span class='lineno'&gt;24&lt;/span&gt; 
&lt;span class='lineno'&gt;25&lt;/span&gt;         &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stdin&lt;/span&gt; &lt;span class='ow'&gt;in&lt;/span&gt; &lt;span class='n'&gt;wlist&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;26&lt;/span&gt;             &lt;span class='n'&gt;chunk&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='nb'&gt;input&lt;/span&gt;&lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='n'&gt;input_offset&lt;/span&gt; &lt;span class='p'&gt;:&lt;/span&gt; &lt;span class='n'&gt;input_offset&lt;/span&gt; &lt;span class='o'&gt;+&lt;/span&gt; &lt;span class='n'&gt;_PIPE_BUF&lt;/span&gt;&lt;span class='p'&gt;]&lt;/span&gt;
&lt;span class='lineno'&gt;27&lt;/span&gt;             &lt;span class='n'&gt;bytes_written&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;os&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;write&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stdin&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;fileno&lt;/span&gt;&lt;span class='p'&gt;(),&lt;/span&gt; &lt;span class='n'&gt;chunk&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;28&lt;/span&gt;             &lt;span class='n'&gt;input_offset&lt;/span&gt; &lt;span class='o'&gt;+=&lt;/span&gt; &lt;span class='n'&gt;bytes_written&lt;/span&gt;
&lt;span class='lineno'&gt;29&lt;/span&gt;             &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='n'&gt;input_offset&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;=&lt;/span&gt; &lt;span class='nb'&gt;len&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;input&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
&lt;span class='lineno'&gt;30&lt;/span&gt;                 &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stdin&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;close&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;span class='lineno'&gt;31&lt;/span&gt;                 &lt;span class='n'&gt;write_set&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;remove&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stdin&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;32&lt;/span&gt; 
&lt;span class='lineno'&gt;33&lt;/span&gt;         &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stdout&lt;/span&gt; &lt;span class='ow'&gt;in&lt;/span&gt; &lt;span class='n'&gt;rlist&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;34&lt;/span&gt;             &lt;span class='n'&gt;data&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;os&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;read&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stdout&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;fileno&lt;/span&gt;&lt;span class='p'&gt;(),&lt;/span&gt; &lt;span class='mi'&gt;1024&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;35&lt;/span&gt;             &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='n'&gt;data&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;36&lt;/span&gt;                 &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stdout&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;close&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;span class='lineno'&gt;37&lt;/span&gt;                 &lt;span class='n'&gt;read_set&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;remove&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stdout&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;38&lt;/span&gt;             &lt;span class='n'&gt;stdout&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;append&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;data&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;39&lt;/span&gt; 
&lt;span class='lineno'&gt;40&lt;/span&gt;         &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stderr&lt;/span&gt; &lt;span class='ow'&gt;in&lt;/span&gt; &lt;span class='n'&gt;rlist&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;41&lt;/span&gt;             &lt;span class='n'&gt;data&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;os&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;read&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stderr&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;fileno&lt;/span&gt;&lt;span class='p'&gt;(),&lt;/span&gt; &lt;span class='mi'&gt;1024&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;42&lt;/span&gt;             &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='n'&gt;data&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;43&lt;/span&gt;                 &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stderr&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;close&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;span class='lineno'&gt;44&lt;/span&gt;                 &lt;span class='n'&gt;read_set&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;remove&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;stderr&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;45&lt;/span&gt;             &lt;span class='n'&gt;stderr&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;append&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;data&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;46&lt;/span&gt; 
&lt;span class='lineno'&gt;47&lt;/span&gt;     &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;stdout&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;stderr&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;(There are actually several different &lt;code&gt;communicate&lt;/code&gt; implementations in the module: a Windows-specific implementation, an implementation using the POSIX &lt;code&gt;poll&lt;/code&gt; function, and one using POSIX &lt;code&gt;select&lt;/code&gt;. We&amp;#8217;re going to look at the &lt;code&gt;select&lt;/code&gt; implementation; the modifications we make can be rolled into the other methods, too.)&lt;/p&gt;

&lt;h2 id='output_callbacks'&gt;Output callbacks&lt;/h2&gt;

&lt;p&gt;For collecting stdout, the important part is lines 33-38. If the &lt;code&gt;select&lt;/code&gt; call tells us that the stdout stream is ready for reading, we try to read up to 1024 bytes from it. If we get the empty string, this means we&amp;#8217;ve reached EOF, and can close down the stream. (We also no longer have to keep passing it in to further &lt;code&gt;select&lt;/code&gt; calls, since we know we&amp;#8217;re done with this stream.) If we get a non-empty string, then we append it into a list. The function that calls &lt;code&gt;_communicate_with_select&lt;/code&gt; will eventually &lt;code&gt;join&lt;/code&gt; this list of strings together, yielding a single string.&lt;/p&gt;

&lt;p&gt;It&amp;#8217;s actually a very simple change to make this process the output using a stream-based callback. For now, we assume that we&amp;#8217;ve been given the callback, in a &lt;code&gt;stdout_callback&lt;/code&gt; variable. Then, we can change line 38 to&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;            &lt;span class='n'&gt;stdout_callback&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;data&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The callback can be any Python callable object; it should accept a single argument, which is the next chunk of data from stdout. We can make a similar change to line 45 to send the stderr data to its own &lt;code&gt;stderr_callback&lt;/code&gt;.&lt;/p&gt;
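
&lt;p&gt;Putting the two changes together, here is a minimal sketch of the modified loop as a standalone function, written for Python 3 rather than as a patch to the module. The name &lt;code&gt;communicate_with_callbacks&lt;/code&gt; is an assumption made for illustration and is not part of &lt;code&gt;subprocess&lt;/code&gt;. It handles only the two output streams, and it leans on Python 3.5 and later retrying &lt;code&gt;select&lt;/code&gt; automatically after &lt;code&gt;EINTR&lt;/code&gt;, so the explicit retry from the original is omitted.&lt;/p&gt;

```python
import os
import select
import subprocess
import sys


def communicate_with_callbacks(proc, stdout_callback, stderr_callback):
    """Stream a child's output to callbacks instead of collecting it.

    Hypothetical helper: `proc` is a subprocess.Popen created with
    stdout=PIPE and stderr=PIPE.  Each callback receives every chunk
    read from its stream, including the final empty chunk at EOF.
    """
    callbacks = {proc.stdout: stdout_callback, proc.stderr: stderr_callback}
    read_set = list(callbacks)
    while read_set:
        rlist, _, _ = select.select(read_set, [], [])
        for stream in rlist:
            data = os.read(stream.fileno(), 1024)
            if data == b"":
                # EOF: stop selecting on this stream.
                stream.close()
                read_set.remove(stream)
            # Hand the chunk to the callback rather than appending it.
            callbacks[stream](data)
    return proc.wait()


if __name__ == "__main__":
    child = subprocess.Popen(
        [sys.executable, "-c", "print('hello')"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    chunks = []
    status = communicate_with_callbacks(child, chunks.append, chunks.append)
    print(status, chunks)
```

&lt;p&gt;As in the listing above, the empty chunk that marks EOF is still passed to the callback, so the callback can tell when its stream has finished. Like the &lt;code&gt;select&lt;/code&gt; implementation it is modeled on, this sketch assumes a POSIX system.&lt;/p&gt;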

&lt;h2 id='lineoriented_callbacks'&gt;Line-oriented callbacks&lt;/h2&gt;

&lt;p&gt;One possible issue with the output callbacks in the previous section is that the data is sent into the callback in arbitrary chunks. We might prefer to guarantee that the callback will be called exactly once for each &lt;em&gt;line&lt;/em&gt; of output. This would allow us, for instance, to easily interleave the output lines of a bunch of subprocesses into the output of the parent process, without having to worry about locking.&lt;/p&gt;

&lt;p&gt;To use a line-based callback, we have to wrap it, creating an arbitrary-chunk callback that buffers data as necessary. Once it receives a full line of data, it sends it to the wrapped line callback.&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='lineno'&gt; 1&lt;/span&gt; &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;wrap_line_callback&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;line_callback&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
&lt;span class='lineno'&gt; 2&lt;/span&gt;     &lt;span class='k'&gt;class&lt;/span&gt; &lt;span class='nc'&gt;Callback&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='nb'&gt;object&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
&lt;span class='lineno'&gt; 3&lt;/span&gt;         &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;__init__&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;line_callback&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
&lt;span class='lineno'&gt; 4&lt;/span&gt;             &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;line_buffer&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[]&lt;/span&gt;
&lt;span class='lineno'&gt; 5&lt;/span&gt;             &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;line_callback&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;line_callback&lt;/span&gt;
&lt;span class='lineno'&gt; 6&lt;/span&gt; 
&lt;span class='lineno'&gt; 7&lt;/span&gt;         &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;output_buffer&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
&lt;span class='lineno'&gt; 8&lt;/span&gt;             &lt;span class='n'&gt;line&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;join&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;line_buffer&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt; 9&lt;/span&gt;             &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;line_callback&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;line&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;10&lt;/span&gt;             &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;line_buffer&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='p'&gt;[]&lt;/span&gt;
&lt;span class='lineno'&gt;11&lt;/span&gt; 
&lt;span class='lineno'&gt;12&lt;/span&gt;         &lt;span class='k'&gt;def&lt;/span&gt; &lt;span class='nf'&gt;data_callback&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;data&lt;/span&gt;&lt;span class='p'&gt;):&lt;/span&gt;
&lt;span class='lineno'&gt;13&lt;/span&gt;             &lt;span class='c'&gt;# If we get an empty string, that represents the end of&lt;/span&gt;
&lt;span class='lineno'&gt;14&lt;/span&gt;             &lt;span class='c'&gt;# the input.  If there is anything in the buffer, send&lt;/span&gt;
&lt;span class='lineno'&gt;15&lt;/span&gt;             &lt;span class='c'&gt;# it out first, then send an empty string on to the line&lt;/span&gt;
&lt;span class='lineno'&gt;16&lt;/span&gt;             &lt;span class='c'&gt;# callback.&lt;/span&gt;
&lt;span class='lineno'&gt;17&lt;/span&gt; 
&lt;span class='lineno'&gt;18&lt;/span&gt;             &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='n'&gt;data&lt;/span&gt; &lt;span class='o'&gt;==&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;19&lt;/span&gt;                 &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='nb'&gt;len&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;line_buffer&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;&amp;gt;&lt;/span&gt; &lt;span class='mi'&gt;0&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;20&lt;/span&gt;                     &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;output_buffer&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;span class='lineno'&gt;21&lt;/span&gt; 
&lt;span class='lineno'&gt;22&lt;/span&gt;                 &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;line_callback&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;23&lt;/span&gt;                 &lt;span class='k'&gt;return&lt;/span&gt;
&lt;span class='lineno'&gt;24&lt;/span&gt; 
&lt;span class='lineno'&gt;25&lt;/span&gt;             &lt;span class='c'&gt;# Otherwise, we split the new data into separate lines,&lt;/span&gt;
&lt;span class='lineno'&gt;26&lt;/span&gt;             &lt;span class='c'&gt;# each of which we call an “entry”.  We add each entry to&lt;/span&gt;
&lt;span class='lineno'&gt;27&lt;/span&gt;             &lt;span class='c'&gt;# the buffer.  If the entry ends with a newline, we output&lt;/span&gt;
&lt;span class='lineno'&gt;28&lt;/span&gt;             &lt;span class='c'&gt;# the buffer and then clear it.&lt;/span&gt;
&lt;span class='lineno'&gt;29&lt;/span&gt; 
&lt;span class='lineno'&gt;30&lt;/span&gt;             &lt;span class='n'&gt;lines&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;data&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;splitlines&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='bp'&gt;True&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;31&lt;/span&gt;             &lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='n'&gt;line&lt;/span&gt; &lt;span class='ow'&gt;in&lt;/span&gt; &lt;span class='n'&gt;lines&lt;/span&gt;&lt;span class='p'&gt;:&lt;/span&gt;
&lt;span class='lineno'&gt;32&lt;/span&gt;                 &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;line_buffer&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;append&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;line&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;
&lt;span class='lineno'&gt;33&lt;/span&gt;                 &lt;span class='k'&gt;if&lt;/span&gt; &lt;span class='n'&gt;line&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;endswith&lt;/span&gt;&lt;span class='p'&gt;((&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;&lt;/span&gt;&lt;span class='se'&gt;\r\n&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;&lt;/span&gt;&lt;span class='se'&gt;\n&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;&lt;/span&gt;&lt;span class='se'&gt;\r&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;)):&lt;/span&gt;
&lt;span class='lineno'&gt;34&lt;/span&gt;                     &lt;span class='bp'&gt;self&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;output_buffer&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
&lt;span class='lineno'&gt;35&lt;/span&gt; 
&lt;span class='lineno'&gt;36&lt;/span&gt;             &lt;span class='k'&gt;return&lt;/span&gt;
&lt;span class='lineno'&gt;37&lt;/span&gt; 
&lt;span class='lineno'&gt;38&lt;/span&gt;     &lt;span class='k'&gt;return&lt;/span&gt; &lt;span class='n'&gt;Callback&lt;/span&gt;&lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;line_callback&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;data_callback&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;line 1&lt;/strong&gt; — We start by declaring the wrapping function. It will take in a line-based callback, and return an arbitrary-chunk callback.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;strong&gt;lines 2-5&lt;/strong&gt; — We&amp;#8217;ll need to maintain some state in between invocations of the arbitrary-chunk callback — specifically, if a line of output data falls across a chunk boundary, we&amp;#8217;ll need to hold onto the part in the first chunk until we receive the part in the second chunk. One relatively easy way to do this is to create a new class, and store the state in &lt;code&gt;self&lt;/code&gt; properties. Note that the &lt;code&gt;Callback&lt;/code&gt; class is declared inside the &lt;code&gt;wrap_line_callback&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;The buffer is a list of strings. This needs to be a list to support arbitrarily long lines, since we don&amp;#8217;t know how many chunks we&amp;#8217;ll need to store before outputting the line.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;strong&gt;lines 7-10&lt;/strong&gt; — We also declare a method in the class that can output the current contents of the saved buffer (if any). Like the original &lt;code&gt;communicate&lt;/code&gt; method, we join the buffer list together into a single string, then pass it onto the wrapped line callback.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;strong&gt;lines 12-23&lt;/strong&gt; — Next we can define the arbitrary-chunk callback. First, it checks to see if we&amp;#8217;ve received an EOF indicator (the empty string). If so, we first output whatever is currently in the buffer; then we pass on the EOF to the line callback.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;strong&gt;lines 25-34&lt;/strong&gt; — If we get some actual data, we first split the current chunk into separate lines. We pass in a &lt;code&gt;True&lt;/code&gt; parameter to the &lt;code&gt;splitlines&lt;/code&gt; method so that we get the newlines in the split strings. This will let us tell if the final string in the list represents a complete line, or if it&amp;#8217;s the first part of a line that falls across a chunk boundary.&lt;/p&gt;

&lt;p&gt;We then loop through the split strings, adding them to the buffer. If any of them end with one of the newline endings (Unix, Windows, or otherwise), we output the buffer. Note that only the last string in the &lt;code&gt;lines&lt;/code&gt; list can end with something other than a newline; anything at the beginning of the list is only a separate element because &lt;code&gt;splitlines&lt;/code&gt; found a newline to split on!&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;strong&gt;line 38&lt;/strong&gt; — Finally, we instantiate the new class and return its arbitrary-chunk callback.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
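
&lt;p&gt;To see the buffering in action, here is a compact sketch of the same technique, written as a closure instead of a class and using Python 3 byte strings. The name &lt;code&gt;make_line_callback&lt;/code&gt; and the sample chunks are assumptions made for illustration. The second line of input is deliberately split across two chunks.&lt;/p&gt;

```python
def make_line_callback(line_callback):
    """Closure-based variant of the line-buffering wrapper (a sketch).

    Returns a chunk callback that forwards complete lines, newline
    included, to `line_callback`.  An empty chunk means EOF: any partial
    line is flushed first, then the empty chunk is passed on.
    """
    buffer = []

    def flush():
        line_callback(b"".join(buffer))
        del buffer[:]

    def data_callback(data):
        if data == b"":
            if buffer:
                flush()
            line_callback(b"")
            return
        # splitlines(True) keeps the line endings on each piece.
        for piece in data.splitlines(True):
            buffer.append(piece)
            if piece.endswith((b"\r\n", b"\n", b"\r")):
                flush()

    return data_callback


received = []
callback = make_line_callback(received.append)
for chunk in [b"foo\nba", b"r\nbaz", b""]:
    callback(chunk)
print(received)
```

&lt;p&gt;The line callback is invoked once with each of the two complete lines, once with the unterminated &lt;code&gt;baz&lt;/code&gt; that is flushed at EOF, and once with the EOF marker itself. One caveat applies to this sketch and to the class above: if a Windows line ending is split across two chunks, the carriage return ends one line and the following newline is reported as a separate line.&lt;/p&gt;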

&lt;h2 id='more_to_come'&gt;More to come&lt;/h2&gt;

&lt;p&gt;This gives us a nice output callback mechanism, to prevent us from buffering the entire contents of the stdout and stderr streams in memory. In later posts, we&amp;#8217;ll look at doing the same with the &lt;em&gt;input&lt;/em&gt; data, and then we&amp;#8217;ll address the second problem, of dealing with more than one subprocess simultaneously.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>iPhone tethering</title>
    <link href="http://dcreager.net/2009/08/13/iphone-tethering/"/>
    <updated>2009-08-13T00:00:00-04:00</updated>
    <id>http://dcreager.net/2009/08/13/iphone-tethering</id>
    <content type="html">&lt;p&gt;While attending the &lt;a href='http://mil-oss.org/'&gt;Mil-OSS conference&lt;/a&gt; this week, I had the opportunity to use one of the coolest features of my new iPhone 3GS — Bluetooth Internet tethering. Assuming that your mobile carrier allows the feature on their network, it provides a very easy way to have a persistent Internet connection, for those situations where a free Ethernet drop or WiFi access point aren&amp;#8217;t readily available.&lt;/p&gt;

&lt;p&gt;I happened to have my Dell Mini 9 (running Ubuntu Jaunty) with me for the conference, rather than my MacBook, so I thought that it might be difficult to get the Bluetooth connection working between the phone and laptop. This &lt;a href='http://xn--9bi.net/2009/06/17/tethering-iphone-3-0-to-ubuntu-9-04/'&gt;blog posting&lt;/a&gt;, however, provided exactly what I needed.&lt;/p&gt;

&lt;p&gt;Following this process, the connection worked without a hitch. One caveat is that I didn&amp;#8217;t need to explicitly pair my phone with my laptop; the first time I ran the &lt;code&gt;pand --connect&lt;/code&gt; command, my phone prompted me with the pairing confirmation dialog. Later connections didn&amp;#8217;t require re-pairing.&lt;/p&gt;

&lt;p&gt;As for bandwidth, the connection was perfectly reasonable for email and basic Web surfing as long as I had decent 3G coverage — 3 of 5 bars or higher and I was good to go. I also did an &lt;code&gt;apt-get&lt;/code&gt; package update as a “beefier” test; I was usually seeing around 20-30 Kb/sec of download speed, which would be fine for small daily updates, but would probably be unworkable for something large like a GNOME, texlive, or GHC update. All in all, not bad for some surreptitious email checking during the talks.&lt;/p&gt;

&lt;h2 id='commandline_scripts'&gt;Command-line scripts&lt;/h2&gt;

&lt;p&gt;One thing that can be cumbersome about the instructions on the blog post is that you have to run the &lt;code&gt;pand&lt;/code&gt; and &lt;code&gt;ifup&lt;/code&gt;/&lt;code&gt;ifdown&lt;/code&gt; commands separately each time you want to start or stop the Bluetooth connection. Not a huge waste of effort, to be sure, but we can do better. So I wrote a quick little Bash script that will start and stop the connection with a single command. You can find the script in &lt;a href='http://github.com/dcreager/home/commit/f5c0db363049f7433494924c63d4a2a19e325b6c'&gt;this commit&lt;/a&gt; on my GitHub page.&lt;/p&gt;

&lt;p&gt;The script is fairly straightforward; you start your Bluetooth tether by running&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;tether up&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and you stop it by running&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;tether down&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Instead of hard-coding your phone&amp;#8217;s Bluetooth MAC address into the script itself, you place it into the &lt;em&gt;$HOME/etc/bluetooth.conf&lt;/em&gt; file. This file isn&amp;#8217;t checked into Git, so that I&amp;#8217;m not putting my personal MAC addresses into the public repository. Instead, the commit contains a &lt;em&gt;$HOME/etc/bluetooth.conf.sample&lt;/em&gt; file, which you copy over to &lt;em&gt;bluetooth.conf&lt;/em&gt;, and then edit appropriately.&lt;/p&gt;

&lt;h2 id='issues'&gt;Issues&lt;/h2&gt;

&lt;p&gt;While this script worked great during the conference, there are two main issues with it as it stands.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;dbus integration&lt;/strong&gt; — Many applications now listen to the system&amp;#8217;s dbus message bus to determine different facts about the current state of the system, including whether there&amp;#8217;s an active Internet connection. The NetworkManager application knows to publish the correct dbus messages when it starts and stops a network connection. The &lt;code&gt;tether&lt;/code&gt; script does not. This means that each time I open up Firefox after turning on the tether, I have to manually uncheck the &lt;em&gt;File › Work Offline&lt;/em&gt; menu option to be able to access any web pages.&lt;/p&gt;

&lt;p&gt;Fixing this issue should only require finding the appropriate dbus messages to send, and adding them to the &lt;code&gt;up&lt;/code&gt; and &lt;code&gt;down&lt;/code&gt; cases in the script.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GUI access&lt;/strong&gt; — While I don&amp;#8217;t mind running a simple command to activate and deactivate the network connection, I realize that a GUI control within the NetworkManager applet would be more ideal. (This would also eliminate the first issue, since NetworkManager would then send the appropriate dbus messages when the connection is started or stopped.) Luckily, Dan Williams has recently &lt;a href='http://blogs.gnome.org/dcbw/2009/07/10/unwire-with-networkmanager/'&gt;added Bluetooth PAN support&lt;/a&gt; to &lt;code&gt;nm-applet&lt;/code&gt;. The new code is in the bleeding edge tree, so hopefully it will make it into a release in time to be picked up for October&amp;#8217;s Karmic release.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</content>
  </entry>
  
  <entry>
    <title>Adding Disqus comments</title>
    <link href="http://dcreager.net/2009/08/07/disqus-comments/"/>
    <updated>2009-08-07T00:00:00-04:00</updated>
    <id>http://dcreager.net/2009/08/07/disqus-comments</id>
    <content type="html">&lt;p&gt;I&amp;#8217;ve just enabled comments on the posts on my website. On its own, that&amp;#8217;s not a particularly unique or exciting feature. However, I&amp;#8217;m using &lt;a href='http://disqus.com'&gt;Disqus&lt;/a&gt; as the comment engine, and the way in which I&amp;#8217;ve integrated Disqus into my Jekyll-powered website might be of interest to others. (Thanks to &lt;a href='http://metajack.im/'&gt;Jack Moffitt&lt;/a&gt; for the idea!)&lt;/p&gt;

&lt;h2 id='the_generic_code_installation_target'&gt;The “generic code” installation target&lt;/h2&gt;

&lt;p&gt;At the core of Disqus is a snippet of HTML and JavaScript that&amp;#8217;s embedded into each of the webpages that you want to contain a comments section. If you&amp;#8217;re using one of the standard blog engines for your website, Disqus can automatically “install” itself, adding the Disqus snippet to your pages for you. If you&amp;#8217;re not using one of these blog engines, however, you have to install the Disqus snippet yourself.&lt;/p&gt;

&lt;p&gt;Luckily, this is very easy. If you choose the “generic code” installation target when setting up the Disqus account for your website, you see the snippet of code to include. For &lt;a href='http://dcreager.net/'&gt;dcreager.net&lt;/a&gt;, it looks like this:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='html'&gt;&lt;span class='nt'&gt;&amp;lt;div&lt;/span&gt; &lt;span class='na'&gt;id=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;disqus_thread&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&amp;lt;/div&amp;gt;&lt;/span&gt;

&lt;span class='nt'&gt;&amp;lt;script&lt;/span&gt;
&lt;span class='nt'&gt;   &lt;/span&gt;&lt;span class='na'&gt;type=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;text/javascript&amp;quot;&lt;/span&gt;
   &lt;span class='na'&gt;src=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;http://disqus.com/forums/«account»/embed.js&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&lt;/span&gt;
&lt;span class='nt'&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class='nt'&gt;&amp;lt;noscript&amp;gt;&lt;/span&gt;
  &lt;span class='nt'&gt;&amp;lt;a&lt;/span&gt; &lt;span class='na'&gt;href=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;http://«account».disqus.com/?url=ref&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&lt;/span&gt;View the discussion thread.&lt;span class='nt'&gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
&lt;span class='nt'&gt;&amp;lt;/noscript&amp;gt;&lt;/span&gt;

&lt;span class='nt'&gt;&amp;lt;a&lt;/span&gt; &lt;span class='na'&gt;href=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;http://disqus.com&amp;quot;&lt;/span&gt; &lt;span class='na'&gt;class=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;dsq-brlink&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&lt;/span&gt;
  blog comments powered by &lt;span class='nt'&gt;&amp;lt;span&lt;/span&gt; &lt;span class='na'&gt;class=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;logo-disqus&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&lt;/span&gt;Disqus&lt;span class='nt'&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
&lt;span class='nt'&gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;&lt;div class='warning'&gt;
  &lt;p&gt;
    &lt;strong&gt;UPDATE&lt;/strong&gt;: Please do not copy-paste this JavaScript
    snippet directly into your own website!  There are several
    occurrences of the string “&lt;code&gt;«account»&lt;/code&gt;” which should be
    replaced with the name of your own Disqus account.  If you don't
    do this, your comment threads won't be associated with your
    account, and you therefore won't be able to moderate or export
    those comments.
  &lt;/p&gt;

  &lt;p&gt;
    To see the specific JavaScript snippet for your own site, please
    go to &lt;a href='http://www.disqus.com/comments/install/'&gt;http://www.disqus.com/comments/install/&lt;/a&gt;.
  &lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I have to wrap this in another &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; element, to fit into the CSS layout that I&amp;#8217;m using, but otherwise it&amp;#8217;s a straightforward copy/paste.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;disqus_thread&lt;/code&gt; element is a placeholder. The &lt;em&gt;embed.js&lt;/em&gt; JavaScript retrieves the comments that are linked to the current page, formats them according to the style that you&amp;#8217;ve chosen, and injects the resulting HTML into the &lt;code&gt;disqus_thread&lt;/code&gt; element. The end result is something like the comment section that you see at the end of this page.&lt;/p&gt;

&lt;h2 id='jekyll_layouts'&gt;Jekyll layouts&lt;/h2&gt;

&lt;p&gt;At this point, we have the Disqus snippet that we need to include in each comment-enabled page on the site. The naïve solution would be to manually add this snippet to each of the pages that we want to contain comments. But of course, as good software engineers, we want to avoid that kind of repetition, and Jekyll layouts give us a good way to do that.&lt;/p&gt;

&lt;p&gt;For my site, I&amp;#8217;ve decided that I want to include a comment section on each dated “post” — the articles that you see under the “Recent Posts” on the front page. I don&amp;#8217;t want to include comments, for instance, on my &lt;a href='/publications/'&gt;publication list&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Luckily, I already have a system of Jekyll layouts that implements this distinction. For instance, dated posts have a “Last updated” entry at the bottom, which the front page and publications list don&amp;#8217;t contain. The site currently has two layouts defined:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href='http://github.com/dcreager/dcreager.net/blob/master/_layouts/default.html'&gt;&lt;em&gt;default.html&lt;/em&gt;&lt;/a&gt; — defines the overall layout of the site: the background logo, the navigation bar, the box containing the text of each post, etc. All of the pages on the site use this layout, even if they don&amp;#8217;t reference it directly in their YAML front-matter.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&lt;a href='http://github.com/dcreager/dcreager.net/blob/master/_layouts/post.html'&gt;&lt;em&gt;post.html&lt;/em&gt;&lt;/a&gt; — adds the “last updated” entry at the bottom of a dated post.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, if I want to include Disqus comments on all of my dated posts, I can just add the HTML/JavaScript snippet to the &lt;em&gt;post.html&lt;/em&gt; file.&lt;/p&gt;

&lt;h2 id='using_template_variables'&gt;Using template variables&lt;/h2&gt;

&lt;p&gt;This solution is nice, in that we don&amp;#8217;t have to duplicate the Disqus snippet on each of the post pages, but we can take this one step further. What if I decide that I want to include comments on one of the pages that isn&amp;#8217;t a dated blog post? Now I have to duplicate that snippet again — maybe not in the page&amp;#8217;s source, but at least in the layout that that page uses.&lt;/p&gt;

&lt;p&gt;Instead, I&amp;#8217;m going to use a new variable in the YAML front-matter to give me fine-grained control over which pages have comments. If I want a page to have comments, I just make sure to include&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;comments: true&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;in that page&amp;#8217;s YAML header.&lt;/p&gt;

&lt;p&gt;Then, instead of including the Disqus snippet directly in the &lt;em&gt;post.html&lt;/em&gt; layout, I put it into the &lt;em&gt;default.html&lt;/em&gt; layout that every page eventually uses. I wrap the snippet in a Liquid &lt;code&gt;if&lt;/code&gt; statement to only include the Disqus comment section if the &lt;code&gt;comments&lt;/code&gt; YAML variable is &lt;code&gt;true&lt;/code&gt;:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='html'&gt;{% if page.comments %}

&lt;span class='nt'&gt;&amp;lt;div&lt;/span&gt; &lt;span class='na'&gt;id=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;disqus_thread&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&amp;lt;/div&amp;gt;&lt;/span&gt;

&lt;span class='nt'&gt;&amp;lt;script&lt;/span&gt;
&lt;span class='nt'&gt;   &lt;/span&gt;&lt;span class='na'&gt;type=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;text/javascript&amp;quot;&lt;/span&gt;
   &lt;span class='na'&gt;src=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;http://disqus.com/forums/«account»/embed.js&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&lt;/span&gt;
&lt;span class='nt'&gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;span class='nt'&gt;&amp;lt;noscript&amp;gt;&lt;/span&gt;
  &lt;span class='nt'&gt;&amp;lt;a&lt;/span&gt; &lt;span class='na'&gt;href=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;http://«account».disqus.com/?url=ref&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&lt;/span&gt;View the discussion thread.&lt;span class='nt'&gt;&amp;lt;/a&amp;gt;&lt;/span&gt;
&lt;span class='nt'&gt;&amp;lt;/noscript&amp;gt;&lt;/span&gt;

&lt;span class='nt'&gt;&amp;lt;a&lt;/span&gt; &lt;span class='na'&gt;href=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;http://disqus.com&amp;quot;&lt;/span&gt; &lt;span class='na'&gt;class=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;dsq-brlink&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&lt;/span&gt;
  blog comments powered by &lt;span class='nt'&gt;&amp;lt;span&lt;/span&gt; &lt;span class='na'&gt;class=&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;logo-disqus&amp;quot;&lt;/span&gt;&lt;span class='nt'&gt;&amp;gt;&lt;/span&gt;Disqus&lt;span class='nt'&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
&lt;span class='nt'&gt;&amp;lt;/a&amp;gt;&lt;/span&gt;

{% endif %}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;If the &lt;code&gt;comments&lt;/code&gt; YAML variable isn&amp;#8217;t defined for the current page, the &lt;code&gt;if&lt;/code&gt; statement treats it as false, giving us the behavior that we want — no comments section unless we ask for it.&lt;/p&gt;

&lt;p&gt;Finally, to ensure that all dated posts get a comments section, without me having to explicitly ask for it, I add&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;comments: true&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;to the YAML front-matter of the &lt;em&gt;post.html&lt;/em&gt; layout.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Problems with Python's subprocess.communicate method</title>
    <link href="http://dcreager.net/2009/08/06/subprocess-communicate-drawbacks/"/>
    <updated>2009-08-06T00:00:00-04:00</updated>
    <id>http://dcreager.net/2009/08/06/subprocess-communicate-drawbacks</id>
    <content type="html">&lt;p&gt;The &lt;a href='http://docs.python.org/library/subprocess.html'&gt;&lt;code&gt;subprocess&lt;/code&gt;&lt;/a&gt; module, which was introduced in Python 2.4, provides you with a convenient interface for spawning &lt;em&gt;subprocesses&lt;/em&gt;, and for interacting with these subprocesses in your parent process. The module was introduced in &lt;a href='http://www.python.org/dev/peps/pep-0324/'&gt;PEP 324&lt;/a&gt;, and is a replacement for the proliferation of other functions and modules that were used previously for spawning and interacting with processes. The &lt;code&gt;subprocess&lt;/code&gt; module aims to provide a more consistent interface, regardless of the particulars of how you need to interact with the subprocesses.&lt;/p&gt;

&lt;h2 id='overview_of_the__module'&gt;Overview of the &lt;code&gt;subprocess&lt;/code&gt; module&lt;/h2&gt;

&lt;p&gt;Subprocesses are encapsulated in a &lt;code&gt;Popen&lt;/code&gt; object. You interact with a subprocess via its stdin, stdout, and stderr streams. When you create a new &lt;code&gt;Popen&lt;/code&gt; object, you can give a value of &lt;code&gt;PIPE&lt;/code&gt; for the &lt;code&gt;stdin&lt;/code&gt;, &lt;code&gt;stdout&lt;/code&gt;, and &lt;code&gt;stderr&lt;/code&gt; keyword parameters. If you do, then the &lt;code&gt;Popen&lt;/code&gt; object that you get back will have &lt;code&gt;stdin&lt;/code&gt;, &lt;code&gt;stdout&lt;/code&gt;, and/or &lt;code&gt;stderr&lt;/code&gt; attributes. Each of these is a file-like object, giving you access to the corresponding stream of the subprocess.&lt;/p&gt;
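
&lt;p&gt;As a minimal sketch of that interface (written for Python 3, and using a throwaway child command that is purely an assumption for illustration), the following asks for a pipe on stdout only and then reads from the resulting file-like object:&lt;/p&gt;

```python
import subprocess
import sys

# Hypothetical child for illustration: a tiny Python program that prints one line.
child = [sys.executable, "-c", "print('hello from the child')"]

# Only stdout is requested as a pipe, so only sp.stdout is a file-like object.
# sp.stdin and sp.stderr are None because they were not set to PIPE.
sp = subprocess.Popen(child, stdout=subprocess.PIPE)

output = sp.stdout.read()   # bytes in Python 3; reads until the child closes stdout
sp.stdout.close()
returncode = sp.wait()      # reap the child and fetch its exit status

print(output.decode().strip())
print(sp.stdin is None, sp.stderr is None)
```

&lt;p&gt;Reading a single pipe to EOF like this is safe because the parent never writes to the child; the care described next is needed once more than one pipe is in play.&lt;/p&gt;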

&lt;p&gt;Now, you have to be careful how you use these pipe objects, since it&amp;#8217;s easy to fall into a situation where you have deadlock. For instance, your parent process might be trying to write some data into the &lt;code&gt;stdin&lt;/code&gt; pipe, to send some information into the subprocess. The subprocess, on the other hand, is trying to write some data into the &lt;code&gt;stdout&lt;/code&gt; pipe, to send some information back out to the parent process. If the &lt;code&gt;stdout&lt;/code&gt; pipe&amp;#8217;s buffer is full, then the subprocess will block trying to write into the pipe; it won&amp;#8217;t be able to proceed until the parent process has read some data from the &lt;code&gt;stdout&lt;/code&gt; pipe, clearing room in the buffer for the new data. However, the parent process is currently trying to write into the &lt;code&gt;stdin&lt;/code&gt; pipe. If this write is also blocked, because the subprocess is no longer reading and the &lt;code&gt;stdin&lt;/code&gt; pipe&amp;#8217;s buffer has filled up too, then we have deadlock: neither process can proceed.&lt;/p&gt;

&lt;h2 id='the__method'&gt;The &lt;code&gt;communicate&lt;/code&gt; method&lt;/h2&gt;

&lt;p&gt;The usual solution in these cases is to use the &lt;code&gt;Popen&lt;/code&gt; object&amp;#8217;s &lt;code&gt;communicate&lt;/code&gt; method. This method takes in an optional string to send to the subprocess on stdin. It then collects all of the stdout and stderr output from the subprocess, and returns these. The &lt;code&gt;communicate&lt;/code&gt; method takes responsibility for avoiding deadlock; it only sends the next chunk of the stdin string when the subprocess is ready to read it, and it only tries to read the next chunk of stdout or stderr when the subprocess is ready to provide it.&lt;/p&gt;
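
&lt;p&gt;A short sketch of that usage, written for Python 3 (where the data passed in and returned is &lt;code&gt;bytes&lt;/code&gt; unless text mode is requested). The child command here is a hypothetical stand-in that upper-cases whatever arrives on its stdin:&lt;/p&gt;

```python
import subprocess
import sys

# Hypothetical child for illustration: upper-cases everything it reads from stdin.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]

sp = subprocess.Popen(child,
                      stdin=subprocess.PIPE,
                      stdout=subprocess.PIPE,
                      stderr=subprocess.PIPE)

# communicate() sends the input, closes the child's stdin, reads both output
# streams until EOF, and waits for the child to exit.
out, err = sp.communicate(b"some input\n")

print(out.decode(), end="")
print("exit status:", sp.returncode)
```

&lt;p&gt;Note that both return values are complete in-memory objects; nothing is handed back until the child has closed its output streams.&lt;/p&gt;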

&lt;p&gt;Under the covers, the &lt;code&gt;communicate&lt;/code&gt; method uses a &lt;code&gt;select&lt;/code&gt; loop to perform this choreography with the subprocess. (At least for the Unix implementation of the &lt;code&gt;subprocess&lt;/code&gt; module, that is.) This solution is nice because it doesn&amp;#8217;t require introducing threading into the parent process. During each iteration of the loop, it calls the OS&amp;#8217;s &lt;code&gt;select&lt;/code&gt; system call, giving it the file descriptors of the stdin, stdout, and stderr pipes. The &lt;code&gt;select&lt;/code&gt; call tells us which of these file descriptors can perform an I/O operation without blocking. If none of them can immediately, it will block until one of them can. Once the &lt;code&gt;select&lt;/code&gt; call returns, we read from or write to the pipes that are ready. We repeat this process until we see EOF on both stdout and stderr; this indicates that the subprocess has finished — or at least, that it&amp;#8217;s through communicating with us.&lt;/p&gt;
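
&lt;p&gt;The loop described above can be sketched roughly as follows. This is an illustration written for Python 3 on Unix, not the standard library&amp;#8217;s actual code: the helper name &lt;code&gt;select_communicate&lt;/code&gt; and the line-echoing child are assumptions, stderr is left out to keep it short, and it does not handle a child that exits before consuming all of its input.&lt;/p&gt;

```python
import os
import select
import subprocess
import sys

# Once select() reports a pipe as writable, a write of up to PIPE_BUF bytes
# will not block. 512 is the POSIX minimum, used as a fallback.
PIPE_BUF = getattr(select, "PIPE_BUF", 512)

def select_communicate(sp, data):
    """Hypothetical helper: send data to sp.stdin and return all of sp.stdout."""
    stdin_fd = sp.stdin.fileno()
    stdout_fd = sp.stdout.fileno()
    read_set = [stdout_fd]
    write_set = [stdin_fd]
    chunks = []
    while read_set or write_set:
        # Blocks until at least one of the pipes can make progress.
        readable, writable, _ = select.select(read_set, write_set, [])
        if stdin_fd in writable:
            written = os.write(stdin_fd, data[:PIPE_BUF])
            data = data[written:]
            if not data:
                # Nothing left to send: closing stdin gives the child EOF.
                sp.stdin.close()
                write_set = []
        if stdout_fd in readable:
            chunk = os.read(stdout_fd, 4096)
            if chunk:
                chunks.append(chunk)
            else:
                # An empty read means the child closed its end of the pipe.
                sp.stdout.close()
                read_set = []
    sp.wait()
    return b"".join(chunks)

# Hypothetical child that echoes each input line in upper case as it arrives,
# so it reads and writes at the same time.
child_source = (
    "import sys\n"
    "for line in sys.stdin.buffer:\n"
    "    sys.stdout.buffer.write(line.upper())\n"
)
sp = subprocess.Popen([sys.executable, "-c", child_source],
                      stdin=subprocess.PIPE, stdout=subprocess.PIPE)

payload = b"line\n" * 40000   # large enough to fill a typical pipe buffer
result = select_communicate(sp, payload)
print(len(result))
```

&lt;p&gt;Because the child interleaves its reads and writes, a parent that wrote the whole payload before reading anything would risk exactly the deadlock described earlier; the loop avoids it by only touching whichever pipe is ready.&lt;/p&gt;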

&lt;h2 id='drawbacks'&gt;Drawbacks&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;communicate&lt;/code&gt; method provides a nice, simple interface for interacting with a subprocess, without having to worry about deadlock situations. Unfortunately, it has two main drawbacks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The subprocess&amp;#8217;s stdout and stderr are collected into strings.&lt;/li&gt;

&lt;li&gt;You can only interact with one subprocess at a time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(If neither of these is an issue for you, then the rest of this post is less interesting to you — &lt;code&gt;communicate&lt;/code&gt; does exactly what you want!)&lt;/p&gt;

&lt;p&gt;The first item is a problem if your subprocess creates a lot of output — the worry is the output will be too large to fit into a Python string. If it is, then the parent process will (at best) start to thrash as it eats into virtual memory.&lt;/p&gt;
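
&lt;p&gt;For the simplest case, where the parent only reads from a single pipe and never writes to the child, one way to sidestep the memory problem is to consume the output in fixed-size chunks instead of calling &lt;code&gt;communicate&lt;/code&gt;. The following Python 3 sketch assumes a hypothetical child that emits a megabyte of output, and simply counts the bytes where real code would process each chunk; it is not necessarily the approach that the later posts take.&lt;/p&gt;

```python
import subprocess
import sys

# Hypothetical child for illustration: writes one million characters to stdout.
child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]

sp = subprocess.Popen(child, stdout=subprocess.PIPE)

total = 0
largest = 0
while True:
    # Never holds more than 64 KiB of the child's output at once.
    chunk = sp.stdout.read(65536)
    if not chunk:
        break               # EOF: the child closed its stdout
    total += len(chunk)     # stand-in for real per-chunk processing
    largest = max(largest, len(chunk))

sp.stdout.close()
sp.wait()
print(total, largest)
```

&lt;p&gt;This only works because there is a single pipe to service. As soon as stdin or stderr is also a pipe, the parent is back to needing something like the &lt;code&gt;select&lt;/code&gt; loop to avoid deadlock.&lt;/p&gt;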

&lt;p&gt;The second item is a problem if you have to spawn multiple subprocesses, and interact with them simultaneously. You could argue that there&amp;#8217;s no need to fix this problem if you haven&amp;#8217;t fixed the first: since the &lt;code&gt;communicate&lt;/code&gt; method is just going to collect the stdout and stderr into strings, then you could just loop through each of your subprocesses, calling &lt;code&gt;communicate&lt;/code&gt; on each in turn:&lt;/p&gt;
&lt;div class='highlight'&gt;&lt;pre&gt;&lt;code class='python'&gt;&lt;span class='kn'&gt;import&lt;/span&gt; &lt;span class='nn'&gt;subprocess&lt;/span&gt;

&lt;span class='n'&gt;sp1&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;subprocess&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Popen&lt;/span&gt;&lt;span class='p'&gt;([&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;ls&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;-l&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;],&lt;/span&gt;
                       &lt;span class='n'&gt;stdin&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='n'&gt;subprocess&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;PIPE&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
                       &lt;span class='n'&gt;stdout&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='n'&gt;subprocess&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;PIPE&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;

&lt;span class='n'&gt;sp2&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;subprocess&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;Popen&lt;/span&gt;&lt;span class='p'&gt;([&lt;/span&gt;&lt;span class='s'&gt;&amp;quot;ls&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='s'&gt;&amp;quot;-al&amp;quot;&lt;/span&gt;&lt;span class='p'&gt;],&lt;/span&gt;
                       &lt;span class='n'&gt;stdin&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='n'&gt;subprocess&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;PIPE&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt;
                       &lt;span class='n'&gt;stdout&lt;/span&gt;&lt;span class='o'&gt;=&lt;/span&gt;&lt;span class='n'&gt;subprocess&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;PIPE&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt;

&lt;span class='k'&gt;for&lt;/span&gt; &lt;span class='n'&gt;sp&lt;/span&gt; &lt;span class='ow'&gt;in&lt;/span&gt; &lt;span class='p'&gt;[&lt;/span&gt;&lt;span class='n'&gt;sp1&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;sp2&lt;/span&gt;&lt;span class='p'&gt;]:&lt;/span&gt;
    &lt;span class='p'&gt;(&lt;/span&gt;&lt;span class='n'&gt;stdout&lt;/span&gt;&lt;span class='p'&gt;,&lt;/span&gt; &lt;span class='n'&gt;stderr&lt;/span&gt;&lt;span class='p'&gt;)&lt;/span&gt; &lt;span class='o'&gt;=&lt;/span&gt; &lt;span class='n'&gt;sp&lt;/span&gt;&lt;span class='o'&gt;.&lt;/span&gt;&lt;span class='n'&gt;communicate&lt;/span&gt;&lt;span class='p'&gt;()&lt;/span&gt;
    &lt;span class='k'&gt;print&lt;/span&gt; &lt;span class='n'&gt;stdout&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;The end result would be what you want — all of the stdout and stderr strings for all of your subprocesses.&lt;/p&gt;

&lt;p&gt;However, doing so can make your subprocesses take longer to run, since you won&amp;#8217;t be able to exploit parallelism as much. Since you&amp;#8217;re firing off these subprocesses at the same time, you presumably want them to execute simultaneously, allowing the OS to schedule them appropriately so that they finish as quickly as possible. However, you&amp;#8217;ve introduced a serialization into this logic, since your parent process is only able to interact with one subprocess at a time. For instance, subprocess #2 might be waiting for some input, while the parent process is still snarfing up the output from subprocess #1. In this case, subprocess #2 is &lt;strong&gt;&lt;em&gt;not going to be able to make any progress&lt;/em&gt;&lt;/strong&gt; until subprocess #1 has &lt;strong&gt;&lt;em&gt;completely finished&lt;/em&gt;&lt;/strong&gt;. So your &lt;code&gt;communicate&lt;/code&gt; loop has eliminated much of the benefit of starting the subprocesses simultaneously.&lt;/p&gt;

&lt;p&gt;In later posts, I will outline how to solve these two problems.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>SBMF07 paper chosen for extended proceedings</title>
    <link href="http://dcreager.net/2009/08/06/sbmf-paper/"/>
    <updated>2009-08-06T00:00:00-04:00</updated>
    <id>http://dcreager.net/2009/08/06/sbmf-paper</id>
    <content type="html">&lt;p&gt;My SBMF07 &lt;a href='/publications/014-csp-algorithm-study/'&gt;paper&lt;/a&gt;, “Empirical analysis and optimization of an NP-hard algorithm using CSP and FDR,” was chosen to appear in the conference&amp;#8217;s extended proceedings. The extended proceedings will be published in a forthcoming issue of ENTCS. Huzzah!&lt;/p&gt;

&lt;p&gt;The paper originally was extracted from Chapter 8 of my &lt;a href='/publications/012-dphil-thesis/'&gt;D.Phil thesis&lt;/a&gt;, and I had to cut out a bit of detail in order to make the space requirements for the conference paper. Luckily, the extended proceedings allow for a longer paper, so I was able to add most of the cut parts back in. So the version that will appear in ENTCS is, for the most part, identical to the corresponding chapter from my thesis.&lt;/p&gt;</content>
  </entry>
  
  <entry>
    <title>Site layout</title>
    <link href="http://dcreager.net/2009/08/05/site-setup/"/>
    <updated>2009-08-05T00:00:00-04:00</updated>
    <id>http://dcreager.net/2009/08/05/site-setup</id>
    <content type="html">&lt;p&gt;This post will probably end up being more useful to me than to anyone else who might stumble across the page. Here I&amp;#8217;m going to document how I&amp;#8217;ve set up my homepage, from a technical standpoint.&lt;/p&gt;

&lt;h2 id='directory_layout'&gt;Directory layout&lt;/h2&gt;

&lt;p&gt;The content of the website is stored in a Git repository (found &lt;a href='http://github.com/dcreager/dcreager.net/'&gt;here&lt;/a&gt;). Most of the pages are originally written in Markdown. I use &lt;a href='http://github.com/mojombo/jekyll/'&gt;Jekyll&lt;/a&gt; to process the Markdown pages into a static website.&lt;/p&gt;

&lt;p&gt;The Git repository contains a standard Jekyll layout:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Dated “posts” (such as blog entries) are placed in the &lt;em&gt;_posts&lt;/em&gt; directory.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;HTML layouts are placed in the &lt;em&gt;_layouts&lt;/em&gt; directory.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;All other content (CSS, images, other pages) lives wherever I please; that directory structure is reproduced on the live site.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One difference is that I include the &lt;em&gt;_site&lt;/em&gt; directory in the Git repository; most people seem to list this directory in their &lt;em&gt;.gitignore&lt;/em&gt; file so that it&amp;#8217;s not tracked by Git. Tracking it allows me to check out the repository and have a working copy of the site, without having to have Jekyll and its dependencies installed on that machine.&lt;/p&gt;

&lt;h2 id='editing_and_deploying_changes'&gt;Editing and deploying changes&lt;/h2&gt;

&lt;p&gt;While I edit my pages, I keep a&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;jekyll --server --auto&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;instance running in the background, which allows me to view a local copy of the new website as I save changes.&lt;/p&gt;

&lt;p&gt;For deployment, I have a (non-bare) clone of the Git repository on the Dreamhost machine that hosts my website. Once I have a change that I&amp;#8217;m ready to deploy, I make a new Git commit and push it to the Dreamhost clone. Since I include the &lt;em&gt;_site&lt;/em&gt; directory in my commits, this places the latest copy of the website onto the Dreamhost filesystem, ready to go.&lt;/p&gt;

&lt;p&gt;Pushing doesn&amp;#8217;t automatically update the checked-out HEAD on the remote system, however, so there&amp;#8217;s an additional step needed. Once I&amp;#8217;ve pushed the changes to Dreamhost, I run the following from the Dreamhost clone:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;git reset --hard master&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;which updates the working copy on disk to be the same as the latest commit that I just pushed. At this point, the Dreamhost clone contains the latest copy of the site in its &lt;em&gt;_site&lt;/em&gt; directory.&lt;/p&gt;

&lt;p&gt;Dreamhost is expecting to serve my website out of a particular directory within my home directory; the final step is having this served directory be a symlink to the &lt;em&gt;_site&lt;/em&gt; directory of the Dreamhost clone. Et voila!&lt;/p&gt;</content>
  </entry>
  

</feed>

