• 2014-02-28

    CSP: The basics

    tl;dr CSP is a formal method that lets you describe and reason about the behavior of concurrent systems. CSP is composable; you write simple processes, and then use special operators to combine them into larger, more complex processes. A process is a summary of some system; it uses events to describe how that system works, and to synchronously communicate with other processes. You can compare two processes using a refinement check; this lets us verify, for instance, whether a real-world system satisfies some important safety or liveness property. CSP has good tool support, which lets us perform these refinement checks quickly and automatically.

    Well that was easy, wasn’t it? You can boil just about anything down to a single paragraph. Let’s look at each of those key points in more detail.

  • 2014-01-07

    CSP: An introduction

    Communicating Sequential Processes (CSP) has been around for almost four decades at this point, but for much of its life, it was only well-known among theoretical computer scientists and formal methods advocates. More recently, many more people have at least heard of CSP, largely because it inspired the concurrency support in Go, a popular mainstream programming language. However, if you ask most people what it means to be inspired by CSP, the most common response would probably be “erm, something about message passing?”

    That said, CSP isn’t just some dusty theory that inspired part of Go; it can also help us understand the distributed systems that we create. We’ve developed a plethora of tools that help us build distributed systems. But unfortunately, we don’t always understand how those tools work, how they fail, and how they interact when we piece them together into a larger system. We can all name-drop the CAP theorem, but do you really know what your system is going to do when the network partitions, or when a host dies? How do you convince someone that you’re right?

    We can’t just rely on intuition and hand-wavy arguments; our systems are too large, and too important, for that. So how do you address these concerns with rigor? There are two main approaches: you can either test your assumptions empirically on a running system, or you can describe your system in detail and prove that your assumptions are correct. Kyle Kingsbury has great examples of both: Jepsen on the testing side, Knossos on the proof side. Both approaches are important; if you want to be convincing, you have to choose at least one of them. If you prefer the proof-based approach, CSP is another option. If you only think of CSP in terms of Go’s concurrency primitives, or if you haven’t thought of it at all, then you overlook the fact that CSP was specifically designed to help answer these kinds of questions.

    In this series of articles, I want to describe how CSP fits into this landscape, for developers with a range of expertise. For the everyday programmer, I want to give a basic, high-level introduction to CSP, and to explain what it means for Go to be inspired by CSP. For the distributed systems engineer, I want to add weight to the argument that formal methods are a useful tool for studying and designing the systems that we create and use. And for the formal methodist, I want to show how to use CSP in particular to specify and reason about those systems.

  • 2010-05-13

    Installing Ubuntu Lucid on a PowerPC QEMU virtual machine

    Part of the software I help develop at RedJack needs to be tested on both little-endian and big-endian machines. Little-endian machines are easy, since everyone and their mother is running on a little-endian Intel or AMD x86 chip. It used to be that big-endian was pretty easy to test, too — just break out your trusty Apple PowerBook G4 and you’re good to go. Since Apple has shifted over to Intel chips, though, the situation has changed.

    Luckily, QEMU has PowerPC as one of the targets that it can emulate, so in theory, I can still easily test my code on a big-endian machine by creating a QEMU PowerPC virtual machine. There’s already a writeup about trying to install Debian onto a QEMU VM here. Aurélien Jarno has graciously put together downloadable disk images with Debian preinstalled. If that’s good enough for your purposes, just go download those! You won’t need any of the rest of the information on this page.

    Unfortunately, I didn’t want to run stock Debian; my little-endian build machine is running Ubuntu Lucid, and for consistency, I wanted my big-endian VM to be running the same. As it turns out, this also required a fair dose of masochism on my part. There are several issues that you’ll encounter if you try to do this by hand. Here is my cheat sheet for getting around these issues.

    Note that this isn’t a full step-by-step account of how to install Lucid onto a QEMU VM. For now, I’m just trying to get my notes down into a more permanent form.

  • 2010-02-25

    Parser callbacks in libpush, Part 1 — Streams

    This post is the first in a series that describes the push_callback_t type in the libpush library. In these posts, we’ll walk through a couple of possible ways to implement callbacks under the covers. At each stage, we’ll encounter problems with the current design. Fixing these problems should lead us closer to the actual implementation in libpush, and along the way, we’ll gain a good understanding of how our design decisions affect the performance and usability of the library.

    The push_callback_t type is used to define parser callbacks, which are the basic unit of parsing in libpush. Callbacks are pretty simple: they take in an input value, read some data from the input stream, and produce an output value. (The fact that callbacks take in an input value, in addition to reading from the input stream, is what makes them arrows instead of monads — but that’s a story for a later post).
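
    To make that shape concrete, here is a rough sketch of the idea in Python. This is purely illustrative; libpush callbacks are C values, and none of these names come from the actual API.

    def parse_uint32(input_value, stream_bytes):
        # A callback in miniature: take an input value, consume some of
        # the stream, and produce an output value. This toy ignores
        # input_value; threading it through is what makes callbacks
        # compose as arrows.
        if len(stream_bytes) < 4:
            return None  # incomplete: the caller must supply more data
        output = int.from_bytes(stream_bytes[:4], "little")
        return output, stream_bytes[4:]  # output plus unconsumed input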

  • 2010-02-17

    Using LLVM's link-time optimization on Ubuntu Karmic

    While playing around with libpush on my MacBook, I was pleasantly surprised to see a huge performance increase when I used the link-time optimization (LTO) feature of the LLVM GCC front end. (It’s really quite nifty; the new Homebrew package manager uses it by default when compiling packages.) On MacOS, using LTO is as simple as using llvm-gcc as your C compiler (or llvm-g++ if you’re compiling C++), and passing in -O4 as your optimization flag. I use SCons as my builder, so this turns into:

    $ scons CC=llvm-gcc CCFLAGS=-O4
    

    This will cause GCC to output LLVM bitcode into the .o output files, and to perform whole-program optimizations during each linking phase. I was able to see a big performance win simply from the linker being able to inline copies of small functions that live in “other” compilation units.
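
    For reference, the same settings can live in the SConstruct itself instead of on the command line. This is a minimal sketch, with placeholder file names:

    # SConstruct: bake the LTO flags into the build.
    # (Use llvm-g++ and CXX/CXXFLAGS instead if you're compiling C++.)
    env = Environment(CC='llvm-gcc', CCFLAGS='-O4')
    env.Program('example', ['example.c', 'helpers.c'])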

  • 2010-02-10

    Extracting setuptools version numbers from your git repository

    Just like everyone else, we’re using setuptools as the core of the build system for our Python-based projects. For the most part, this has been a painless, straightforward process. However, one lingering annoyance is that we’ve been specifying the version number directly in our setup.py files:

    from setuptools import setup
    
    setup(
        name = 'awesomelib',
        version = '1.2',
        # …etc
    )
    

    On our maintenance branches, we get a nice awesomelib-1.2.tar.gz file when we run python setup.py sdist. On our development branch, we’ve also got the following setup.cfg file:

    [egg_info]
    tag_build = dev
    tag_date = true
    

    That gives us tarballs like awesomelib-1.2dev-20100210.tar.gz on our development branch. Because we’re using the dev suffix, which setuptools considers to be a “prerelease”, we have to remember to increment the version number in development whenever we cut a new release. The end result is that we have a longish process for creating releases. If we want to create a new 1.3 release, we have to do the following:

    1. Create a new maintenance branch for 1.3:

      $ git checkout -b maint-1.3 master
      
    2. Update the setup.cfg file to remove the tag_build and tag_date entries. Commit this with a “Tagging version 1.3” commit message.

    3. Back on the development branch, update setup.py to increment the “development version” to 1.4.

    Granted, this isn’t horribly difficult, but we can do better.
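
    One possible shape for a fix is sketched below: derive the version from the repository itself at build time, so there is nothing to remember to increment. The git_version helper is illustrative (not necessarily what we ended up with), and it assumes that release commits are tagged:

    import subprocess

    from setuptools import setup

    def git_version():
        # Ask git to describe HEAD relative to the most recent tag,
        # e.g. "1.2" on a tagged commit, "1.2-5-gdeadbee" afterwards.
        try:
            described = subprocess.check_output(['git', 'describe'])
            return described.decode('ascii').strip()
        except (OSError, subprocess.CalledProcessError):
            return '0.0.unknown'  # not building from a git checkout

    setup(
        name = 'awesomelib',
        version = git_version(),
        # …etc
    )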

  • 2010-02-06

    A combinator-based parsing library for C

    Recently I’ve been working on libpush, which is a new parsing library for C. It has two main features that I think will be valuable. The first is that it’s a push parser, which means that instead of parsing a file, stream, or single memory buffer, you supply the data (or “push” it) to the parser in chunks, as it becomes available. I plan to discuss this aspect of the parser in more detail in a later post.
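
    To illustrate the push model, here is a toy example in Python. (It has nothing to do with the real libpush API; it just shows the calling convention.) The caller feeds data to the “parser” as it arrives, and the parser keeps its own state between chunks:

    class LineCounter:
        # Counts newlines across arbitrarily split chunks.
        def __init__(self):
            self.count = 0

        def push(self, chunk):
            self.count += chunk.count(b"\n")

    counter = LineCounter()
    for chunk in (b"one\ntw", b"o\nthree\n"):  # data arrives in pieces
        counter.push(chunk)
    print(counter.count)  # prints 3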

    The other main feature is that you design your parsers using combinators. Parser combinators are widely used in Haskell, with Parsec being the most common example. Combinator-based parsing libraries are especially nice in Haskell, because Haskell’s syntax makes them look very simple. For instance, a parser that parses matching nested parentheses is:

    import Text.Parsec
    import Text.Parsec.String (Parser)
    parens :: Parser ()
    parens = (char '(' >> parens >> char ')' >> parens) <|> return ()
    

    Here, the <|> operator represents choice: we try parsing the left operand, and if it fails, then we try the right operand. In our example, the right operand is the base case, which matches the empty string. The left operand parses an opening parenthesis; then recursively calls itself to match any parentheses that might be nested in the current set; then parses the closing parenthesis; and then finally tries to match any further balanced sets that occur after the current one.

  • 2010-02-05

    Updating graffle-export to work with OmniGraffle 5

    I recently upgraded to OmniGraffle 5, which caused my graffle-export script to break:

    $ graffle.sh ~/git/cwa/figures/analyst.graffle foo.pdf 
    OmniGraffle Professional 5
    /Users/dcreager/git/cwa/figures/analyst.graffle
    ./graffle.scpt: execution error: OmniGraffle Professional 5 got an error: The document cannot be exported to the “pdf” format. (-50)
    

    (This was first reported to me by Nima Talebi as a ticket on graffle-export’s Github page.)

    Before we can understand what error we’re seeing, a little explanation is in order. The core logic of the OmniGraffle exporter is an AppleScript. Unfortunately, AppleScripts are stored in a binary format, so if you go to the Github page, you can’t easily view the contents of the file. The important line of the script is:

  • 2010-01-08

    Default “scons -c” targets

    As I mentioned in a previous post, the automatic “clean” target provided by SCons (scons -c) is very useful for cleaning out build files, without requiring much in the way of configuration. Anything that SCons generates when you run scons will be automatically cleaned when you run scons -c.

    Useful as it is, I’d like more control over the behavior of scons -c. Specifically, being a good TDD junkie, I have several test cases that I can run using scons test:

    build_test = env.Program( … )
    env.Alias('build-tests', build_test)
    
    run_test = env.Alias('test', [build_test],
                         ['@%s' % build_test[0].abspath])
    env.AlwaysBuild(run_test)
    

    By setting it up this way, the test programs aren’t built by default: you have to explicitly run scons build-tests (if you want to build the tests but not run them) or scons test (if you want to build and run them). Moreover, because of SCons’s dependency tracking, I can just use scons test as my usual build command during my Edit-Test-Debug loop. SCons will automatically rebuild any changed source files before running the tests.

    All of this is great. So what’s the problem? As I mentioned above, scons -c only cleans the build files that are created by scons — and since I’ve explicitly set things up so that tests aren’t built by default, they’ll also not be cleaned by default. This means that to fully clean out my build targets, I have to run two commands:

    $ scons -c
    $ scons -c build-tests
    

    Not ideal! I’d prefer if scons -c cleaned everything, just like make clean would in the Automake world.
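
    One way to get that behavior is sketched below. It's a guess at the general shape rather than a worked-out answer: SCons reports whether it was invoked with -c via GetOption('clean'), so we can promote the test alias to a default target only during a clean run.

    # During "scons -c", make the test alias a default target so that
    # its build products get cleaned along with everything else.
    if env.GetOption('clean'):
        env.Default('build-tests')

    With that in place, a bare scons still skips building the tests, but scons -c sweeps them up too.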

  • 2010-01-05

    Exporting OmniGraffle documents from the command line

    OmniGraffle is my tool of choice for creating figures for my papers. Its biggest drawback is that it’s only available for Mac OS, which can make it cumbersome if I’m working on one of my Linux machines and need to create or modify a figure. But its ease of use and the quality of the figures it creates are hard to beat.

    Of course, creating the figure isn’t enough — since I write my papers in LaTeX, I have to export my figures into EPS or PDF (depending on whether I’m creating a PostScript or PDF version of the paper) before I can use them in my documents. It’s easy enough to use the Export dialog to do this (keyboard shortcut: ⌥⌘E), but ideally I’d like the ability to export figures from the command line. Coupled with a good Makefile, this would let me run a simple make paper command, and automatically re-export any necessary figures before rebuilding the paper itself.

    Luckily, OmniGraffle has always had rather good support for being controlled via AppleScript. The commands are somewhat underdocumented, requiring a bit of trial and error, but while entrenched in our PhD studies at Oxford, my colleague David Faitelson and I were able to whip together a script that suited our needs. I’ve recently extracted the code from our Oxford SVN repository and uploaded it to Github.

    To install the script, place the graffle.sh and graffle.scpt files into some directory that’s on your $PATH, such as /usr/local/bin or $HOME/bin. Then just run