• 2016-11-17

    HST

    Semantic methods

    Since CSP is a formal method, it’s not surprising that Roscoe spends a large part of his textbook showing how to use rigorous mathematics to describe processes. He actually goes one step (er, two?) further and defines three different ways to do so:

    • The denotational semantics defines (mathematically, using sets and sequences) what the behavior of a process is. Each CSP operator comes with a rule for how to calculate a process’s behavior recursively — that is, in terms of the behavior of its operands. (So for example, the “external choice” rule tells you how to define the behavior of \(P \mathrel{\Box} Q\) in terms of the behavior of \(P\) and \(Q\).)

    • The algebraic semantics tells you not to worry about what a process “means” or what it “does”. Instead, it provides a list of rewrite rules that let you change what the definition of a process looks like without changing its behavior.

    • The operational semantics says that a process is nothing more than a state machine, with nodes representing processes (and subprocesses) and edges representing the events that allow you to transition between them. We can learn anything important about a process just by interpreting or analyzing this state machine.
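
    The operational view is the easiest to make concrete in code. Here’s a minimal sketch — the classic vending-machine process as a transition table, with state and event names of my own invention, not taken from any real CSP tool:

    ```python
    # A toy labelled transition system (LTS): the operational view of a process.
    # The machine accepts a coin, dispenses a tea, and starts over.
    LTS = {
        "VM":  {"coin": "VM2"},  # accept a single coin...
        "VM2": {"tea": "VM"},    # ...then spit out a tea, and repeat
    }

    def step(state, event):
        """Follow one edge of the state machine; None means the event is refused."""
        return LTS.get(state, {}).get(event)

    def run(state, events):
        """Interpret the state machine over a whole sequence of events."""
        for event in events:
            state = step(state, event)
            if state is None:
                return None  # the process cannot perform this event here
        return state
    ```

    Here `run("VM", ["coin", "tea"])` walks the cycle back to `"VM"`, while `run("VM", ["tea"])` returns `None`: the machine refuses a tea before it has seen a coin.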

  • 2016-11-17

    HST

    Refinement overview

    Our goal is to learn about CSP refinement by implementing a refinement checker. So a good first step is to make sure we’re all on the same page about what refinement is, and then to step through the refinement algorithm that we mean to implement. (If nothing else, that will help make sure I don’t go off on too many tangents while implementing it!)

    I’ve mentioned refinement elsewhere on this blog a few times (for instance, here). The basic idea is that in CSP, you use the same process language to describe the system you’re designing or investigating, as well as the properties that you would like that system to have. (This is unlike most other formal methods, where you have separate languages for the system and the properties.) In CSP, the system’s process is typically called \(Impl\) (for implementation), and the property description process is typically called \(Spec\) (for specification).

    CSP then defines several semantic models that provide rigorous mathematical definitions of what a process’s behavior is. You perform a refinement check within the context of a particular semantic model. A successful refinement check tells you that the property defined by \(Spec\) “holds” — specifically, that all of the behaviors of \(Impl\) are also allowed behaviors of \(Spec\). A failed refinement check gives you a counterexample — that is, a specific behavior of \(Impl\) that was disallowed by \(Spec\).
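
    To make “all of the behaviors of \(Impl\) are also allowed behaviors of \(Spec\)” concrete, here’s a rough sketch in Python of a traces-refinement check over toy transition tables. The encoding and the bounded search are mine — a real checker like FDR is far cleverer — but the subset test at the heart of it is the same:

    ```python
    def traces(process, start, depth):
        """All event sequences of length <= depth the process can perform,
        reading it as a transition table: state -> {event: next_state}."""
        result, frontier = {()}, {((), start)}
        for _ in range(depth):
            frontier = {
                (trace + (event,), succ)
                for trace, state in frontier
                for event, succ in process.get(state, {}).items()
            }
            result |= {trace for trace, _ in frontier}
        return result

    def check_refinement(spec, spec0, impl, impl0, depth=6):
        """Bounded traces refinement: does every trace of Impl (up to the
        given depth) also belong to Spec?  Returns (True, None) on success,
        or (False, counterexample) with a shortest disallowed trace."""
        bad = traces(impl, impl0, depth) - traces(spec, spec0, depth)
        return (True, None) if not bad else (False, min(bad, key=len))

    # Spec allows coins and teas in any order; Impl strictly alternates.
    SPEC = {"S": {"coin": "S", "tea": "S"}}
    IMPL = {"I0": {"coin": "I1"}, "I1": {"tea": "I0"}}
    ```

    With these definitions, `check_refinement(SPEC, "S", IMPL, "I0")` succeeds, while swapping the two processes fails with the counterexample `("tea",)` — the alternating process never performs a tea before a coin.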

    The three most common semantic models are traces, failures, and failures-divergences. We’ll go into more detail about the mathematics behind these semantic models in later posts; for now, the 10,000-foot overview is that:

  • 2016-11-16

    HST

    Introduction

    It seems like this blog is basically turning into “all things CSP”! As part of that trend, I’ve started implementing a new CSP refinement checker called HST. Why do this when there’s a perfectly good refinement checker in FDR? Well, I want to learn more about how FDR’s refinement algorithm works. The algorithm is documented in Bill Roscoe’s textbook (and a series of follow-on papers), and working through those descriptions gives you a good bit of insight into how refinement really works. But I often find it easier to learn a complex topic by implementing it (or at least, by looking at the code of an implementation). Hence HST! In this new series of blog posts, I’m going to walk through the CSP refinement algorithm in more detail than is presented in the academic literature, by implementing it (and describing that implementation) along the way.

    I should emphasize that this is not meant to be a replacement for FDR! FDR is a very good piece of software, and if you’re writing any CSP specs in anger, you probably want FDR at your disposal. HST is meant to be more of an educational exercise. If people find it more generally useful than that, that’s great! But it’s not what I’m aiming for.

    (And as for the name, “HST” does not stand for “Harry S. Truman”, just like “FDR” does not stand for “Franklin Delano Roosevelt”.)

  • 2016-09-07

    Concurrency models in CSP

    Read Atomic: Internal consistency

    We’ll start by looking at the weakest concurrency model covered in the paper, Read Atomic. In fact, to keep things really simple to start with, we’re only going to look at one of Read Atomic’s two axioms: internal consistency. (We’ll look at external consistency in the next post.)

    A transaction is internally consistent if it “reads its own writes”. This is the simplest axiom covered in the paper, since it expresses a property that’s strictly local to each transaction; we don’t have to consider the behavior of any other transaction when defining what it means for a transaction to be internally consistent.
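
    As a first, simplified reading of the axiom, here’s a sketch in Python: encode a transaction as a list of operations, and check that every read of a key the transaction itself has written returns the latest such write. (The encoding is mine, not the paper’s.)

    ```python
    def internally_consistent(txn):
        """Check that one transaction 'reads its own writes'.  A transaction
        is a list of ("read" | "write", key, value) tuples, in program order.
        A read of a key this transaction has already written must return the
        most recent value it wrote; reads of keys it never wrote are
        unconstrained by this axiom (their values come from the store)."""
        written = {}
        for kind, key, value in txn:
            if kind == "write":
                written[key] = value
            elif kind == "read" and key in written and written[key] != value:
                return False  # the read ignored an earlier write in this txn
        return True
    ```

    For example, `[("write", "x", 1), ("read", "x", 1)]` passes, while `[("write", "x", 1), ("read", "x", 0)]` fails. Note how local the check is: we never look outside the one transaction, which matches the point above that internal consistency is strictly local.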

  • 2016-08-03

    Concurrency models in CSP

    Preliminaries

    Before diving into the details of our first concurrency model, it will be helpful to give an overview of how we’re going to use CSP in these posts.

    At the highest level, a CSP specification consists of a set of processes, which use events to describe their behavior. We use a process to represent any vantage point that lets us talk about which events occur and in which order. Each entity in our system is a process (which might be defined using subprocesses to describe each of its components or parts). We also use processes to describe how two or more entities interact with each other, and to describe any global properties of the entire system. And lastly, because CSP verification is based on “refinement”, we also use processes to define the properties that we hope our system exhibits.

  • 2016-07-28

    Concurrency models in CSP

    Introduction

    Two years ago, I started writing a series of blog posts about CSP — Communicating Sequential Processes. That series has…let’s say “stalled”. Part of the reason is that I didn’t have a good non-trivial example to work with. The stereotypical running example is the vending machine: start with a simple one, which accepts a single coin and spits out a tea; add more detail as you introduce more of the language. I always had a hunch that I needed an example with more meat on it, but could never find one.

    Fast-forward to today. While reading through some of Christopher Meiklejohn’s work on Lasp, I came across a citation to a really nice paper by Cerone, Bernardi, and Gotsman, which adds some formal rigor to the consistency models that we use to describe modern distributed systems. Their formalism is a great combination of simple and expressive. The core of the paper is about processes accessing a transactional data store; the authors provide formal definitions of several concurrency models, and of some reference implementations that supposedly provide those concurrency models. They then use a technique called “observational refinement” to show that the reference implementations really do provide the concurrency guarantees in question.

    This approach lines up very well with how you perform refinement checks in CSP to show that systems satisfy some specification. And so I finally found my meaty running example! I’m resurrecting this blog series, and plan to work through each of the consistency models and proofs described in the paper, translating them into CSP processes and refinement checks. This isn’t an attempt to replace or outdo anything in the paper! Far from it — it’s my attempt to use something more familiar to work through the details of something less familiar.

    I’m not going to assume a working knowledge of CSP — or of the consistency models described in the paper! If you’re familiar with one, my hope is that you’ll be able to follow along and pick up the other. And if you’re not familiar with either…well, I guess we’ll see how good I am at writing an intro to a difficult topic!

  • 2014-11-21

    Dependency management in C

    At RedJack, all of our core products depend on a network sensor that collects various bits of information about the raw traffic that we see on the network. We’re doing some non-trivial analysis on fairly large network links using commodity hardware, so we’ve implemented this sensor in C. At its core is an extremely fast custom flow-based programming framework. It’s a damn cool piece of code, but this post isn’t about the code itself; it’s about how we deliver that code to our customers.

    Just because we’ve written this component in C, that doesn’t mean we want to turn our back on the kinds of tooling you get to use when working in other, more modern languages. In particular, once you’ve gotten used to modern package managers like npm, leiningen, go, and Cargo, it’s hard to go back to things like CMake and [shudder] the autotools.

  • 2014-05-14

    Tagged releases using git flow

    I’ve used git flow for most of my software projects — specifically for those that have versioned releases. The original post describing git flow is still the best overview of how it works. In short, you have a master branch where every commit is a merge commit. Each of these merge commits represents a new release, and is tagged with the version number of that release. The merge brings in all of the subsidiary commits and feature branches that make up that release. Ongoing work happens on a separate develop branch. This is where you merge in completed new features and bug fixes on a day-to-day basis. develop should always be a stable version of the software — you don’t merge a feature branch into develop until it passes all of your tests and is “complete” with regard to the feature you’re trying to implement.

    My favorite part of this model is how each release is just some tagged commit on the master branch. You want to see the code for the latest released version? That’s easy — git checkout master. You want version 1.2.5 specifically? Use git checkout 1.2.5 instead.

    Unfortunately, the git flow tool has implemented a slightly different behavior for a while now: a patch changed git flow to tag the last commit on the release branch, instead of the merge commit on the master branch. The reasons for this might be perfectly valid, but it’s not what I want, and it’s not what the original git flow post described. That means that I can’t use git flow release finish as-is.
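
    Instead, the release can be finished by hand, so that the annotated tag lands on the merge commit on master. Roughly (branch and version names here are just examples):

    ```shell
    # Finish a release manually so the tag points at the merge commit on master.
    git checkout master
    git merge --no-ff release/1.2.5 -m "Release 1.2.5"   # merge commit on master
    git tag -a 1.2.5 -m "Version 1.2.5" master           # tag that merge commit
    git checkout develop
    git merge --no-ff master -m "Merge release 1.2.5 back into develop"
    git branch -d release/1.2.5                          # release branch is done
    ```

    After this, `git checkout 1.2.5` and `git checkout master` point at the same commit, which is exactly the property described above.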

  • 2014-02-28

    CSP: The basics

    tl;dr CSP is a formal method that lets you describe and reason about the behavior of concurrent systems. CSP is composable; you write simple processes, and then use special operators to combine them together into larger, more complex processes. A process is a summary of some system; it uses events to describe how that system works, and to synchronously communicate with other processes. You can compare two processes using a refinement check; this lets us check, for instance, whether a real-world system satisfies some important safety or liveness property. CSP has good tool support, which lets us perform these refinement checks quickly and automatically.

    Well that was easy, wasn’t it? You can boil just about anything down to a single paragraph. Let’s look at each of those key points in more detail.

  • 2014-01-07

    CSP: An introduction

    Communicating Sequential Processes (CSP) has been around for almost four decades at this point, but for much of its life, it was only well-known among theoretical computer scientists and formal methods advocates. More recently, many more people have at least heard of CSP, largely because it inspired the concurrency support in Go, a popular mainstream programming language. However, if you ask most people what it means to be inspired by CSP, the most common response would probably be “erm, something about message passing?”

    That said, CSP isn’t just some dusty theory that inspired part of Go; it can also help us understand the distributed systems that we create. We’ve developed a plethora of tools that help us build distributed systems. But unfortunately, we don’t always understand how those tools work, how they fail, and how they interact when we piece them together into a larger system. We can all name-drop the CAP theorem, but do you really know what your system is going to do when the network partitions, or when a host dies? How do you convince someone that you’re right?

    We can’t just rely on intuition and hand-wavy arguments; our systems are too large, and too important, for that. So how do you address these concerns with rigor? There are two main approaches: you can either test your assumptions empirically on a running system, or you can describe your system in detail and prove that your assumptions are correct. Kyle Kingsbury has great examples of both: Jepsen on the testing side, Knossos on the proof side. Both approaches are important; if you want to be convincing, you have to choose at least one of them. If you prefer the proof-based approach, CSP is another option. If you only think of CSP in terms of Go’s concurrency primitives, or if you haven’t thought of it at all, then you overlook the fact that CSP was specifically designed to help answer these kinds of questions.

    In this series of articles, I want to describe how CSP fits into this landscape, for developers with a range of expertise. For the every-day programmer, I want to give a basic, high-level introduction to CSP, and to explain what it means for Go to be inspired by CSP. For the distributed systems engineer, I want to add weight to the argument that formal methods are a useful tool for studying and designing the systems that we create and use. And for the formal methodist, I want to show how to use CSP in particular to specify and reason about those systems.