• 2017-08-10

# Introduction

Programming languages have a lot of interesting ways to think about what you’re allowed to do and what you’re not allowed to do. In a statically typed language, your compiler will yell at you if you try to do something that’s not allowed, and a lot of interesting PL research involves teaching your language and compiler to disallow things in increasingly sophisticated ways.

This is not limited to statically typed languages. In a dynamically typed language, your interpreter is just as invested in disallowing certain behaviors. It’s just that the enforcement happens while your program is running!

Someone with a theoretical background might quibble with my choice of the word “disallow”. They might prefer that I say that certain operations in your programming language are “invalid”. In the world of mathematics, it’s usually not the case that some little gremlin is actively preventing these operations from occurring; instead, it’s that the invalid operations don’t even exist as possibilities!

But regardless of what language you’re programming in, and regardless of whether you think in terms of permissions or possibilities, the act of programming is to consider all of these possible operations, and choose which ones you want to use to accomplish your goal. At any given time, there are many possible operations, and you have the control to decide which ones to use!

In this series of posts I want to look at the flip side of this, and talk about requirements or obligations. What mechanisms do programming languages have to force you to perform some operation, or to ensure that some operation is performed even if you don’t actively choose to do it yourself?

At the end of this series, I hope to convince you that obligations are just as important as possibilities, and that there are interesting complex behaviors that are easier to think about and work with if we use programming languages that can talk about obligations directly.

• 2017-01-17

# Lazy processes

1. Load in a description of the $$Spec$$ and $$Impl$$ processes, transforming them each into a labeled transition system (LTS).

As mentioned in the previous post, we’re going to rely on the labeled transition system defined by CSP’s operational semantics to represent processes in our refinement checker. An LTS is just a directed graph, with nodes representing processes and subprocesses, and edges representing events. In this post, we’ll look at LTSes in a bit more detail, and at how best to represent them in code.
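Since the whole refinement algorithm will operate on these LTSes, it’s worth fixing a concrete representation up front. Here’s a minimal Python sketch (the names and layout are my own invention for this post, not HST’s actual data structures): states are opaque identifiers, and each state maps the events it can perform to the set of states those events lead to.

```python
from collections import defaultdict

class LTS:
    """A labeled transition system: a directed graph whose nodes are
    processes/subprocesses and whose edges are labeled with events."""

    def __init__(self):
        # state -> event -> set of successor states
        self.transitions = defaultdict(lambda: defaultdict(set))

    def add_edge(self, source, event, target):
        self.transitions[source][event].add(target)

    def afters(self, state, event):
        """States reachable from `state` by performing `event`."""
        return self.transitions[state][event]

    def initials(self, state):
        """The set of events `state` can perform immediately."""
        return {e for e, targets in self.transitions[state].items() if targets}

# Example: the process a -> b -> STOP
lts = LTS()
lts.add_edge("P", "a", "P1")
lts.add_edge("P1", "b", "STOP")
```

Note that `afters` returns a *set* of successors: once we introduce internal choice and τ transitions, a single event can lead to more than one state, so the representation is nondeterministic from the start.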

• 2016-11-17

# Semantic methods

Since CSP is a formal method, it’s not surprising that Roscoe spends a large part of his textbook talking about how to use rigorous mathematics to talk about processes. He actually goes one step (er, two?) further and defines three different ways to do so:

• The denotational semantics defines (mathematically, using sets and sequences) what the behavior of a process is. Each CSP operator comes with a rule for how to calculate a process’s behavior recursively — that is, in terms of the behavior of its operands. (So for example, the “external choice” rule tells you how to define the behavior of $$P \mathrel{\Box} Q$$ in terms of the behavior of $$P$$ and $$Q$$.)

• The algebraic semantics tells you not to worry about what a process “means” or what it “does”. Instead, it provides a list of rewrite rules that let you change what the definition of a process looks like without changing its behavior.

• The operational semantics says that a process is nothing more than a state machine, with nodes representing processes (and subprocesses) and edges representing the events that allow you to transition between them. We can learn anything important about a process just by interpreting or analyzing this state machine.
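To make the denotational flavor concrete, here’s the traces-model rule for external choice sketched in Python. The representation is a toy of my own (not anything from Roscoe’s textbook): a process is modeled by its set of finite traces, each trace a tuple of events, and the rule for $$P \mathrel{\Box} Q$$ turns out to be simple set union.

```python
# Toy denotational sketch: a process is its (finite) set of traces.
# In the traces model, the rule for external choice is just union:
#
#     traces(P [] Q) = traces(P) ∪ traces(Q)

def external_choice(traces_p, traces_q):
    """Denotation of P [] Q in the traces model."""
    return traces_p | traces_q

# a -> STOP has traces {<>, <a>}; b -> STOP has traces {<>, <b>}
P = {(), ("a",)}
Q = {(), ("b",)}
```

Every operator comes with a rule of this shape: compute the operands’ behaviors, combine them. That compositionality is what makes the denotational approach pleasant to reason about.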

• 2016-11-17

# Refinement overview

Our goal is to learn about CSP refinement by implementing a refinement checker. So a good first step is to make sure we’re all on the same page about what refinement is, and then to step through the refinement algorithm that we mean to implement. (If nothing else, that will help make sure I don’t go off on too many tangents while implementing it!)

I’ve mentioned refinement elsewhere on this blog a few times. The basic idea is that in CSP, you use the same process language to describe the system you’re designing or investigating, as well as the properties that you would like that system to have. (This is unlike most other formal methods, where you have separate languages for the system and the properties.) In CSP, the system’s process is typically called $$Impl$$ (for implementation), and the property description process is typically called $$Spec$$ (for specification).

CSP then defines several semantic models that provide rigorous mathematical definitions of what a process’s behavior is. You perform a refinement check within the context of a particular semantic model. A successful refinement check tells you that the property defined by $$Spec$$ “holds” — specifically, that all of the behaviors of $$Impl$$ are also allowed behaviors of $$Spec$$. A failed refinement check gives you a counterexample — that is, a specific behavior of $$Impl$$ that was disallowed by $$Spec$$.
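As a preview of where we’re headed, the core of a traces-model refinement check can be sketched in a few lines of Python. This is a simplified sketch of the standard approach, not FDR’s actual implementation: it assumes both processes are given as LTSes (nested dicts mapping state to event to successor states) with no internal τ events, and it pairs each reachable $$Impl$$ state with the set of $$Spec$$ states reachable by the same trace.

```python
from collections import deque

def traces_refines(spec, impl, spec_start, impl_start):
    """Check Spec ⊑T Impl: every trace of Impl is also a trace of Spec.

    `spec` and `impl` map state -> {event: {successor states}}.
    Returns None on success, or a counterexample trace of Impl
    that Spec disallows.  (Simplified: no τ events, no normalization.)
    """
    start = (frozenset([spec_start]), impl_start)
    seen = {start}
    queue = deque([(start, ())])
    while queue:
        (spec_states, impl_state), trace = queue.popleft()
        for event, impl_succs in impl.get(impl_state, {}).items():
            # All Spec states reachable by performing this event
            spec_succs = frozenset(
                t for s in spec_states
                  for t in spec.get(s, {}).get(event, ()))
            if not spec_succs:
                return trace + (event,)  # Impl behavior Spec disallows
            for impl_succ in impl_succs:
                pair = (spec_succs, impl_succ)
                if pair not in seen:
                    seen.add(pair)
                    queue.append((pair, trace + (event,)))
    return None  # refinement holds
```

The breadth-first search means the counterexample we find is a shortest one, which is exactly what you want when debugging a failed check.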

The three most common semantic models are traces, failures, and failures-divergences. We’ll go into more detail about the mathematics behind these semantic models in later posts; for now, the 10,000-foot overview is that:

• The traces model records the sequences of events that a process can perform. It lets you verify safety properties: that nothing bad ever happens.

• The failures model adds information about which sets of events a process can refuse after performing a trace. That lets you detect deadlock, and distinguish processes that differ only in their nondeterminism.

• The failures-divergences model additionally records when a process can diverge (perform an unbounded sequence of internal events), which lets you detect livelock.

• 2016-11-16

# Introduction

It seems like this blog is basically turning into “all things CSP”! As part of that trend, I’ve started implementing a new CSP refinement checker called HST. Why do this when there’s a perfectly good refinement checker in FDR? Well, I want to learn more about how FDR’s refinement algorithm works. The algorithm is documented in Bill Roscoe’s textbook (and a series of follow-on papers), and working through those descriptions gives you a good bit of insight into how refinement really works. But I often find it easier to learn a complex topic by implementing it (or at least, by looking at the code of an implementation). Hence HST! In this new series of blog posts, I’m going to walk through the CSP refinement algorithm in more detail than is presented in the academic literature, by implementing it (and describing that implementation) along the way.

I should emphasize that this is not meant to be a replacement for FDR! FDR is a very good piece of software, and if you’re writing any CSP specs in anger, you probably want FDR at your disposal. HST is meant to be more of an educational exercise. If people find it more generally useful than that, that’s great! But it’s not what I’m aiming for.

(And as for the name, “HST” does not stand for “Harry S. Truman”, just like “FDR” does not stand for “Franklin Delano Roosevelt”.)

• 2016-09-07

# Read Atomic: Internal consistency

We’ll start by looking at the weakest concurrency model covered in the paper, Read Atomic. In fact, to keep things really simple to start with, we’re only going to look at one of Read Atomic’s two axioms: internal consistency. (We’ll look at external consistency in the next post.)

A transaction is internally consistent if it “reads its own writes”. This is the simplest axiom covered in the paper, since it expresses a property that’s strictly local to each transaction; we don’t have to consider the behavior of any other transaction when defining what it means for a transaction to be internally consistent.
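Because the axiom is strictly local, it’s easy to state directly on a transaction’s operation log, even before we translate anything into CSP. Here’s a hypothetical Python sketch (the operation encoding is mine, not the paper’s): a transaction is a sequence of `("write", obj, value)` and `("read", obj, value)` operations, and every read of an object the transaction has already written must return the most recent value it wrote.

```python
def internally_consistent(transaction):
    """Check that a transaction "reads its own writes".

    `transaction` is a sequence of ("write", obj, value) and
    ("read", obj, value) tuples.  Reads of objects the transaction
    hasn't written are unconstrained by this axiom.
    """
    written = {}  # obj -> last value written by this transaction
    for op, obj, value in transaction:
        if op == "write":
            written[obj] = value
        elif op == "read" and obj in written and written[obj] != value:
            return False  # failed to read its own write
    return True
```

Note how nothing here mentions any other transaction; that locality is what makes internal consistency the simplest axiom in the paper.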

• 2016-08-03

# Preliminaries

Before diving into the details of our first concurrency model, it will be helpful to give an overview of how we’re going to use CSP in these posts.

At the highest level, a CSP specification consists of a set of processes, which use events to describe their behavior. We use a process to represent any vantage point that lets us talk about which events occur and in which order. Each entity in our system is a process (which might be defined using subprocesses to describe each of its components or parts). We also use processes to describe how two or more entities interact with each other, and to describe any global properties of the entire system. And lastly, because CSP verification is based on “refinement”, we also use processes to define the properties that we hope our system exhibits.

• 2016-07-28

# Introduction

Two years ago, I started writing a series of blog posts about CSP — Communicating Sequential Processes. That series has…let’s say “stalled”. Part of the reason is that I didn’t have a good non-trivial example to work with. The stereotypical running example is the vending machine: start with a simple one, which accepts a single coin and spits out a tea; add more detail as you introduce more of the language. I always had a hunch that I needed an example with more meat on it, but could never find one.

Fast-forward to today. While reading through some of Christopher Meiklejohn’s work on Lasp, I came across a citation to a really nice paper by Cerone, Bernardi, and Gotsman, which adds some formal rigor to the consistency models that we use to describe modern distributed systems. Their formalism is a great combination of simple and expressive. The core of the paper is about processes accessing a transactional data store; the authors provide formal definitions of several concurrency models, and of some reference implementations that supposedly provide those concurrency models. They then use a technique called “observational refinement” to show that the reference implementations really do provide the concurrency guarantees in question.

This approach lines up very well with how you perform refinement checks in CSP to show that systems satisfy some specification. And so I finally found my meaty running example! I’m resurrecting this blog series, and plan to work through each of the consistency models and proofs described in the paper, translating them into CSP processes and refinement checks. This isn’t an attempt to replace or outdo anything in the paper! Far from it — it’s my attempt to use something more familiar to work through the details of something less familiar.

I’m not going to assume a working knowledge of CSP — or of the consistency models described in the paper! If you’re familiar with one, my hope is that you’ll be able to follow along and pick up the other. And if you’re not familiar with either…well, I guess we’ll see how good I am at writing an intro to a difficult topic!

• 2014-11-21

# Dependency management in C

At RedJack, all of our core products depend on a network sensor that collects various bits of information about the raw traffic that we see on the network. We’re doing some non-trivial analysis on fairly large network links using commodity hardware, so we’ve implemented this sensor in C. At its core is an extremely fast custom flow-based programming framework. It’s a damn cool piece of code, but this post isn’t about the code itself; it’s about how we deliver that code to our customers.

Just because we’ve written this component in C, that doesn’t mean we want to turn our back on the kinds of tooling you get to use when working in other, more modern languages. In particular, once you’ve gotten used to modern package managers like npm, leiningen, go, and Cargo, it’s hard to go back to things like CMake and [shudder] the autotools.

• 2014-05-14

# Tagged releases using git flow

I’ve used git flow for most of my software projects — specifically for those that have versioned releases. The original post describing git flow is still the best overview of how it works. In short, you have a master branch where every commit is a merge commit. Each of these merge commits represents a new release, and is tagged with the version number of that release. The merge brings in all of the subsidiary commits and feature branches that make up that release. Ongoing work happens on a separate develop branch. This is where you merge in completed new features and bug fixes on a day-to-day basis. develop should always be a stable version of the software — you don’t merge a feature branch into develop until it passes all of your tests and is “complete” with regard to the feature you’re trying to implement.

My favorite part of this model is how each release is just some tagged commit on the master branch. You want to see the code for the latest released version? That’s easy — git checkout master. You want version 1.2.5 specifically? Use git checkout 1.2.5 instead.

Unfortunately, the git flow tool has implemented a slightly different behavior for a while now: it tags the last commit on the release branch, instead of the merge commit on the master branch. The reasons for this change might be perfectly valid, but it’s not what I want, and it’s not what the original git flow post described. That means that I can’t use git flow release finish as-is.
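Until that changes, one workaround is to perform the release-finishing steps by hand, making sure the tag lands on the merge commit. A sketch, assuming the standard git flow branch layout and a finished release/1.2.5 branch (adjust names to taste):

```shell
# Finish release 1.2.5 by hand: merge into master, then tag the
# resulting merge commit (not the tip of the release branch).
git checkout master
git merge --no-ff -m "Merge release/1.2.5" release/1.2.5
git tag -a 1.2.5 -m "Release 1.2.5"   # tag points at the merge commit

# Bring the release work back into develop, then clean up.
git checkout develop
git merge --no-ff -m "Merge release/1.2.5" release/1.2.5
git branch -d release/1.2.5
```

With this, `git checkout 1.2.5` and `git checkout master` behave the way the original model promises.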