• 2016-09-07

    Concurrency models in CSP

    Read Atomic: Internal consistency

    We’ll start by looking at the weakest concurrency model covered in the paper, Read Atomic. In fact, to keep things really simple to start with, we’re only going to look at one of Read Atomic’s two axioms: internal consistency. (We’ll look at external consistency in the next post.)

    A transaction is internally consistent if it “reads its own writes”. This is the simplest axiom covered in the paper, since it expresses a property that’s strictly local to each transaction; we don’t have to consider the behavior of any other transaction when defining what it means for a transaction to be internally consistent.

  • 2016-08-03

    Concurrency models in CSP


    Before diving into the details of our first concurrency model, it will be helpful to give an overview of how we’re going to use CSP in these posts.

    At the highest level, a CSP specification consists of a set of processes, which use events to describe their behavior. We use a process to represent any vantage point that lets us talk about which events occur and in which order. Each entity in our system is a process (which might be defined using subprocesses to describe each of its components or parts). We also use processes to describe how two or more entities interact with each other, and to describe any global properties of the entire system. And lastly, because CSP verification is based on “refinement”, we also use processes to define the properties that we hope our system exhibits.

  • 2016-07-28

    Concurrency models in CSP


    Two years ago, I started writing a series of blog posts about CSP — Communicating Sequential Processes. That series has…let’s say “stalled”. Part of the reason is that I didn’t have a good non-trivial example to work with. The stereotypical running example is the vending machine: start with a simple one, which accepts a single coin and spits out a tea; add more detail as you introduce more of the language. I always had a hunch that I needed an example with more meat on it, but could never find one.

    Fast-forward to today. I was reading through some of Christopher Meiklejohn’s work on Lasp, I came across a citation to a really nice paper by Cerone, Bernardi, and Gotsman, which adds some formal rigor to the consistency models that we use to describe modern distributed systems. Their formalism is a great combination of simple and expressive. The core of the paper is about processes accessing a transactional data store; the authors provide formal definitions of several concurrency models, and of some reference implementations that supposedly provide those concurrency models. They then use a technique called “observational refinement” to show that the reference implementations really do provide the concurrency guarantees in question.

    This approach lines up very well with how you perform refinement checks in CSP to show that systems satisfy some specification. And so I finally found my meaty running example! I’m resurrecting this blog series, and plan to work through each of the consistency models and proofs described in the paper, translating them into CSP processes and refinement checks. This isn’t an attempt to replace or outdo anything in the paper! Far from it — it’s my attempt to use something more familiar to work through the details of something less familiar.

    I’m not going to assume a working knowledge of CSP — or of the consistency models described in the paper! If you’re familiar with one, my hope is that you’ll be able to follow along and pick up the other. And if you’re not familiar with either…well, I guess we’ll see how good I am at writing an intro to a difficult topic!

  • 2014-11-21

    Dependency management in C

    At RedJack, all of our core products depend on a network sensor that collects various bits of information about the raw traffic that we see on the network. We’re doing some non-trivial analysis on fairly large network links using commodity hardware, so we’ve implemented this sensor in C. At its core is an extremely fast custom flow-based programming framework. It’s a damn cool piece of code, but this post isn’t about the code itself; it’s about how we deliver that code to our customers.

    Just because we’ve written this component in C, that doesn’t mean we want to turn our back on the kinds of tooling you get to use when working in other, more modern languages. In particular, once you’ve gotten used to modern package managers like npm, leiningen, go, and Cargo, it’s hard to go back to things like CMake and [shudder] the autotools.

  • 2014-05-14

    Tagged releases using git flow

    I’ve used git flow for most of my software projects — specifically for those that have versioned releases. The original post describing git flow is still the best overview of how it works. In short, you have a master branch where every commit is a merge commit. Each of these merge commits represents a new release, and is tagged with the version number of that release. The merge brings in all of the subsidiary commits and feature branches that make up that release. Ongoing work happens on a separate develop branch. This is where you merge in completed new features and bug fixes on a day-to-day basis. develop should always be a stable version of the software — you don’t merge a feature branch into develop until it passes all of your tests and is “complete” with regards to the feature you’re trying to implement.

    My favorite part of this model is how each release is just some tagged commit on the master branch. You want to see the code for the latest released version? That’s easy — git checkout master. You want version 1.2.5 specifically? Use git checkout 1.2.5 instead.

    Unfortunately, the git flow tool has implemented a slightly different behavior for awhile now. That patch makes git flow tag the last commit on the release branch, instead of the merge commit on the master branch. The reasons for this might be perfectly valid, but it’s not what I want, and it’s not what the original git flow post described. That means that I can’t use git flow release finish as-is.

  • 2014-02-28

    CSP: The basics

    tl;dr CSP is a formal method that lets you describe and reason about the behavior of concurrent systems. CSP is composable; you write simple processes, and then use special operators to combine them together into larger, more complex processes. A process is a summary of some system; it uses events to describe how that system works, and to synchronously communicate with other processes. You can compare two processes using a refinement check; this lets us check, for instance, whether a real-world system satisfies some important safety or liveness property. CSP has good tool support, which lets us perform these refinement checks quickly and automatically.

    Well that was easy, wasn’t it? You can boil just about anything down to a single paragraph. Let’s look at each of those key points in more detail.

  • 2014-01-07

    CSP: An introduction

    Communicating Sequential Processes (CSP) has been around for almost four decades at this point, but for much of its life, it was only well-known among theoretical computer scientists and formal methods advocates. More recently, many more people have at least heard of CSP, largely because it inspired the concurrency support in Go, a popular mainstream programming language. However, if you ask most people what it means to be inspired by CSP, the most common response would probably be “erm, something about message passing”?

    That said, CSP isn’t just some dusty theory that inspired part of Go; it can also help us understand the distributed systems that we create. We’ve developed a plethora of tools that help us build distributed systems. But unfortunately, we don’t always understand of how those tools work, how they fail, and how they interact when we piece them together into a larger system. We can all name-drop the CAP theorem, but do you really know what your system is going to do when the network partitions, or when a host dies? How do you convince someone that you’re right?

    We can’t just rely on intuition and hand-wavy arguments; our systems are too large, and too important, for that. So how do you address these concerns with rigor? There are two main approaches: you can either test your assumptions empirically on a running system, or you can describe your system in detail and prove that your assumptions are correct. Kyle Kingsbury has great examples of both: Jepsen on the testing side, Knossos on the proof side. Both approaches are important; if you want to be convincing, you have to choose at least one of them. If you prefer the proof-based approach, CSP is another option. If you only think of CSP in terms of Go’s concurrency primitives, or if you haven’t thought of it at all, then you overlook the fact that CSP was specifically designed to help answer these kinds of questions.

    In this series of articles, I want to describe how CSP fits into this landscape, for developers with a range of expertise. For the every-day programmer, I want to give a basic, high-level introduction to CSP, and to explain what it means for Go to be inspired by CSP. For the distributed systems engineer, I want to add weight to the argument that formal methods are a useful tool for studying and designing the systems that we create and use. And for the formal methodist, I want to show how to use CSP in particular to specify and reason about those systems.

  • 2010-05-13

    Installing Ubuntu Lucid on a PowerPC QEMU virtual machine

    Part of the software I help develop at RedJack needs to be tested on both little-endian and big-endian machines. Little-endian machines are easy, since everyone and their mother is running on a little-endian Intel or AMD x86 chip. It used to be that big-endian was pretty easy to test, too — just break out your trusty Apple Powerbook G4 and you’re good to go. Since Apple has shifted over to Intel chips, though, the situation has changed.

    Luckily, QEMU has PowerPC as one of the targets that it can emulate, so in theory, I can still easily test my code on a big-endian machine by creating a QEMU PowerPC virtual machine. There’s already a writeup about trying to install Debian onto a QEMU VM here. Aurélien Jarno has graciously put together downloadable disk images with Debian preinstalled. If that’s good enough for your purposes, just go download those! You won’t need any of the rest of the information on this page.

    Unfortunately, I didn’t want to run stock Debian; my little-endian build machine is running Ubuntu Lucid, and for consistency, I wanted my big-endian VM to be running the same. As it turns out, this also required a fair dose of masochism on my part. There are several issues that you’ll encounter if you try to do this by hand. Here is my cheat sheet for getting around these issues.

    Note that this isn’t a full step-by-step account of how to install Lucid onto a QEMU VM. For now, I’m just trying to get my notes down into a more permanent form.

  • 2010-02-25

    Parser callbacks in libpush, Part 1 — Streams

    This post is the first in a series that describes the push_callback_t type in the libpush library. In these posts, we’ll walk through a couple of possible ways to implement callbacks under the covers. At each stage, we’ll encounter problems with the current design. Fixing these problems should lead closer us to the actual implementation in libpush, and along the way, we’ll gain a good understanding of how our design decisions affect the performance and usability of the library.

    The push_callback_t type is used to define parser callbacks, which are the basic unit of parsing in libpush. Callbacks are pretty simple: they take in an input value, read some data from the input stream, and produce an output value. (The fact that callbacks take in an input value, in addition to reading from the input stream, is what makes them arrows instead of monads — but that’s a story for a later post).

  • 2010-02-17

    Using LLVM's link-time optimization on Ubuntu Karmic

    While playing around with libpush on my MacBook, I was pleasantly surprised to see a huge performance increase when I used the link-time optimization (LTO) feature of the LLVM GCC front end. (It’s really quite nifty; the new Homebrew package manager uses it by default when compiling packages.) On MacOS, using LTO is as simple as using llvm-gcc as your C compiler (or llvm-g++ if you’re compiling C++), and passing in -O4 as your optimization flag. I use SCons as my builder, so this turns into:

    $ scons CC=llvm-gcc CCFLAGS=-O4

    This will cause GCC to output LLVM bytecode into the .o output files, and to perform whole-program optimizations during each linking phase. I was able to see a big performance win simply from the linker being able to inline in copies of small functions that live in “other” compilation units.