dcreager.net

Swanson: S₀

2020-12-10

Note: This post is out of date! Swanson still has an S₀ language, but it looks completely different now.

Previous: Execution model

In the previous post, we talked about Swanson’s execution model, but didn’t really describe what Swanson code looks like. In this post, we’ll look at S₀ (pronounced “ess naught”), which is Swanson’s “assembly language”.

As we’ll see, S₀ hews pretty closely to the Swanson execution model, and isn’t really a language that you’ll want to program in directly. Typically, you’ll actually _write_ in some other higher-level language, which will be translated into S₀. We’ll see in later posts how this process works. For now, don’t be put off by the amount of boilerplate that you see here — it’s not something that you’ll have to author directly!

Preliminaries

We’ve carefully designed the concrete syntax of S₀ so that it is as simple to parse as possible — for instance, without requiring a particular parser generator. (The reference implementation is a simple recursive descent parser that only requires a single character of lookahead.) In fact, it’s more important for S₀ to be easy to parse, than for it to be easy for humans to write. After all, you will very rarely have to write S₀ directly!

This simplicity is important since the first part of writing a new Swanson host is being able to load in “bootstrap” code, which is written in S₀. This part of the host will need to be written directly in the host’s language, and so we want to make that host-specific custom code as simple to write as possible.

Names

As mentioned in the previous post, Swanson names are binary. There are three ways to encode a Swanson name in S₀.

The first allows us to encode arbitrary binary content, written in hexadecimal within square brackets. Each octet in the name must be written with two hexadecimal characters, and there must be one or more whitespace characters in between each octet. For instance, the following is one way to encode the name name:

[6e 61 6d 65]

The second notation is a shortcut syntax for names that are ASCII-encoded strings, since there are enough higher-level languages that use ASCII to encode their identifiers. A bare name is an ASCII-encoded name that only consists of common identifier characters: in particular, alphanumerics, underscore, period, at-sign (`@`), and dollar sign (`$`). (Bare names cannot contain whitespace, any other printable characters, or any non-printable characters.) A bare name can be written in S₀ directly:

$_    entry      atom       i        ds.hashmap
$0    entry@1    literal    while    primitives.bool
You might wonder why `@` and `$` are considered bare name characters, since most programming languages limit identifiers to alphanumberic characters. (Lisps that allow `kebab-case` names are an exception!)
And that’s actually the reason why! Since S₀ is a translation target for other languages, having a couple of atypical characters available lets us more easily construct “derived” Swanson names that cannot conflict with names that appear in the source languages.

The final notation is a syntax for names that are ASCII-encoded strings, but don’t consist of only bare name characters. If a name contains only ASCII printable characters, you can enclose them in double-quotes:

"entry"               "kebab-case-with->arrows"
"name with spaces"    "name:with:colons"

Note that there are no escape sequences for this notation — which means in particular that you can’t use this notation if the name contains a double-quote character. If it does, you have to use the hex notation described above.

Modules and blocks

S₀ code is organized into modules. Each module consists of a number of blocks, each of which has a distinct name. Blocks are used in closures, which are Swanson invokables whose branches are implemented in S₀.

module test {
  $load: containing () receiving ($loaded) {
    $module = atom;
    -> $loaded;
  }
}

Each block starts with containing and receiving clauses, which define which names are available in the environment at the start of the block. The names in the containing clause are part of the block’s “closure set” — they represent values that are moved “into” the closure when it’s created. The names in the receiving clause are the “inputs” of the block — the caller must ensure that the environment contains exactly these names when invoking the block’s closure. When a block is invoked, its closure and input environments are merged together before execution proceeds — which means that the block’s containing and receiving clauses can’t have any names in common.

Each block also contains a body, enclosed in curly braces. The body consists of zero or more statements followed by exactly one invocation.

Statements

There are four kinds of statement in S₀:

Create atom

Creates a new Swanson atom distinct from all others, and adds it to the environment:

dest = atom;

Create literal

Creates a new Swanson literal, and adds it to the environment:

dest = literal [6e 61 6d 65];

Create closure

Creates a new S₀ closure, and adds it to the environment:

dest = closure containing (value1, value2)
  branch true = true_branch,
  branch false = false_branch;

The statement has a containing clause which removes the specified values from the environment, moving them into the new closure. Each branch of the closure has a name (true, false), and refers to one of the blocks in the enclosing module (true_branch, false_branch). The containing clause of each of those blocks must match the containing clause of the create closure statement.

There is a shortcut syntax for when the new closure has a single branch, with an empty name:

dest = closure containing (value1, value2) -> block;

is exactly equivalent to:

dest = closure containing (value1, value2)
  branch [] = block;

Rename

Changes the name of a value in the environment:

dest = rename source;

For all of these statements, it’s an error if there’s already a value in the environment with the desired “destination” name. For create closure and rename statements, it’s an error if there _isn’t_ already a value in the environment with the desired “source” name.

And that’s it! You’ll notice that you can’t really do anything interesting with S₀ statements. They’re just used to set up the environment as needed for the block’s invocation, which is where real computation happens.

Invocations

Each ends with exactly one invocation:

-> value branch;

This removes the value named value from the environment, and passes control to its branch named branch. (It’s an error if there’s no value in the environment named value, or if that value isn’t an invokable, or if that invokable doesn’t have a branch named branch.)

There’s a shortcut syntax for invoking a branch with an empty name:

-> value;

Whatever values are still in the environment (after removing the invokable that we’re about to pass control to) are provided to the invokable as its inputs. If the invokable is an S₀ closure, then the receiving clause of the selected branch’s block must match the set of names that are in the environment that are about to be provided as input.

S₀ modules as Swanson units

S₀ modules can be used as Swanson units. The module has a name, which is also used as the name of the unit. The first block in the module is its loader block, which is invoked to load the unit. The loader block’s containing set must be empty.

The loader block’s receiving set defines the dependencies of the unit. Each name is treated as the name of some other Swanson unit. The host will load those dependencies, and put them into the environment as inputs before invoking the loader block (just like for the receiving set of any other block).

The input named `$loaded is handled specially. Instead of loading a unit named $loaded` as a dependency, the host provides a special invokable for this input, which the loader block will invoke with the “value” of the module (as an output named `$module`). Our example module from up above is a Swanson unit that produces a new atom when loaded:

module test {
  $load: containing () receiving ($loaded) {
    $module = atom;
    -> $loaded;
  }
}

A larger example

Putting it all together, this is an example module that:

  • is an entry point (that is, the main module of a program)
  • depends on another (primitive) unit named primitive.bool, which contains some routines for dealing with booleans,
  • creates a true constant,
  • evaluates the constant,
  • and then quits.
module bool.can_evaluate_true {
  $load: containing () receiving ($loaded, primitive.bool) {
    $module = closure containing (primitive.bool) -> main;
    -> $loaded;
  }

  main: containing (primitive.bool) receiving ($finish) {
    $return = closure containing ($finish) -> main@1;
    -> primitive.bool true;
  }

  main@1: containing ($finish) receiving ($_, $0) {
    primitive.bool = rename $_;
    value = rename $0;
    $return = closure containing ($finish, value) -> main@2;
    -> primitive.bool drop;
  }

  main@2: containing ($finish, value) receiving () {
    $evaluate = closure containing ($finish)
      branch true = main@2$$evaluate$true,
      branch false = main@2$$evaluate$false;
    -> value evaluate;
  }

  main@2$$evaluate$true: containing ($finish) receiving () {
    -> $finish succeed;
  }

  main@2$$evaluate$false: containing ($finish) receiving () {
    -> $finish fail;
  }
}

Can you see why this isn’t a language that you’d want to program in directly? In the next post, we’ll learn about S₁, which provides some helpful syntactic sugar, while still being a very low-level language.

Next: S₁