In the previous post, we talked about Swanson’s execution model, but didn’t really describe what Swanson code looks like. In this post, we’ll look at S₀ (pronounced “ess naught”), which is Swanson’s “assembly language”.
As we’ll see, S₀ hews pretty closely to the Swanson execution model, and isn’t really a language that you’ll want to program in directly. Typically, you’ll actually write in some other higher-level language, which will be translated into S₀. We’ll see in later posts how this process works. For now, don’t be put off by the amount of boilerplate that you see here — it’s not something that you’ll have to author directly!
We’ve carefully designed the concrete syntax of S₀ so that it is as simple to parse as possible — for instance, without requiring a particular parser generator. (The reference implementation is a simple recursive descent parser that only requires a single character of lookahead.) In fact, it’s more important for S₀ to be easy to parse, than for it to be easy for humans to write. After all, you will very rarely have to write S₀ directly!
This simplicity is important since the first part of writing a new Swanson host is being able to load in “bootstrap” code, which is written in S₀. This part of the host will need to be written directly in the host’s language, and so we want to make that host-specific custom code as simple to write as possible.
As mentioned in the previous post, Swanson names are binary. There are three ways to encode a Swanson name in S₀.
The first allows us to encode arbitrary binary content, written in hexadecimal
within square brackets. Each octet in the name must be written with two
hexadecimal characters, and there must be one or more whitespace characters in
between each octet. For instance, the following is one way to encode the name
name
:
[6e 61 6d 65]
The second notation is a shortcut syntax for names that are ASCII-encoded
strings, since there are enough higher-level languages that use ASCII to encode
their identifiers. A bare name is an ASCII-encoded name that only
consists of common identifier characters: in particular, alphanumerics,
underscore, period, at-sign (@
), and dollar sign ($
). (Bare names cannot
contain whitespace, any other printable characters, or any non-printable
characters.) A bare name can be written in S₀ directly:
$_ entry atom i ds.hashmap $0 entry@1 literal while primitives.bool
You might wonder why @
and $
are considered bare name characters, since
most programming languages limit identifiers to alphanumberic characters.
(Lisps that allow kebab-case
names are an exception!)
And that’s actually the reason why! Since S₀ is a translation target for other languages, having a couple of atypical characters available lets us more easily construct “derived” Swanson names that cannot conflict with names that appear in the source languages.
The final notation is a syntax for names that are ASCII-encoded strings, but don’t consist of only bare name characters. If a name contains only ASCII printable characters, you can enclose them in double-quotes:
"entry" "kebab-case-with->arrows" "name with spaces" "name:with:colons"
Note that there are no escape sequences for this notation — which means in particular that you can’t use this notation if the name contains a double-quote character. If it does, you have to use the hex notation described above.
S₀ code is organized into modules. Each module consists of a number of blocks, each of which has a distinct name. Blocks are used in closures, which are Swanson invokables whose branches are implemented in S₀.
module test { $load: containing () receiving ($loaded) { $module = atom; -> $loaded; } }
Each block starts with containing
and receiving
clauses, which define which
names are available in the environment at the start of the block. The names in
the containing
clause are part of the block’s “closure set” — they represent
values that are moved “into” the closure when it’s created. The names in the
receiving
clause are the “inputs” of the block — the caller must ensure that
the environment contains exactly these names when invoking the block’s closure.
When a block is invoked, its closure and input environments are merged together
before execution proceeds — which means that the block’s containing
and
receiving
clauses can’t have any names in common.
Each block also contains a body, enclosed in curly braces. The body consists of zero or more statements followed by exactly one invocation.
There are four kinds of statement in S₀:
Create atom: Creates a new Swanson atom distinct from all others, and adds it to the environment:
dest = atom;
Create literal: Creates a new Swanson literal, and adds it to the environment:
dest = literal [6e 61 6d 65];
Create closure: Creates a new S₀ closure, and adds it to the environment:
dest = closure containing (value1, value2) branch true = true_branch, branch false = false_branch;
The statement has a containing
clause which removes the specified
values from the environment, moving them into the new closure. Each branch of
the closure has a name (true
, false
), and refers to one of the blocks in
the enclosing module (true_branch
, false_branch
). The containing
clause
of each of those blocks must match the containing
clause of the create
closure statement.
There is a shortcut syntax for when the new closure has a single branch, with an empty name:
dest = closure containing (value1, value2) -> block;
is exactly equivalent to:
dest = closure containing (value1, value2) branch [] = block;
Rename: Changes the name of a value in the environment:
dest = rename source;
For all of these statements, it’s an error if there’s already a value in the environment with the desired “destination” name. For create closure and rename statements, it’s an error if there isn’t already a value in the environment with the desired “source” name.
And that’s it! You’ll notice that you can’t really do anything interesting with S₀ statements. They’re just used to set up the environment as needed for the block’s invocation, which is where real computation happens.
Each ends with exactly one invocation:
-> value branch;
This removes the value named value
from the environment, and passes control
to its branch named branch
. (It’s an error if there’s no value in the
environment named value
, or if that value isn’t an invokable, or if that
invokable doesn’t have a branch named branch
.)
There’s a shortcut syntax for invoking a branch with an empty name:
-> value;
Whatever values are still in the environment (after removing the invokable that
we’re about to pass control to) are provided to the invokable as its inputs. If
the invokable is an S₀ closure, then the receiving
clause of the selected
branch’s block must match the set of names that are in the environment that are
about to be provided as input.
S₀ modules can be used as Swanson units. The module has a name, which is also
used as the name of the unit. The first block in the module is its loader
block, which is invoked to load the unit. The loader block’s containing
set
must be empty.
The loader block’s receiving
set defines the dependencies of the unit. Each
name is treated as the name of some other Swanson unit. The host will load
those dependencies, and put them into the environment as inputs before invoking
the loader block (just like for the receiving
set of any other block).
The input named $loaded
is handled specially. Instead of loading a unit named
$loaded
as a dependency, the host provides a special invokable for this input,
which the loader block will invoke with the “value” of the module (as an output
named $module
). Our example module from up above is a Swanson unit that
produces a new atom when loaded:
module test { $load: containing () receiving ($loaded) { $module = atom; -> $loaded; } }
Putting it all together, this is an example module that:
is an entry point (that is, the main
module of a program)
depends on another (primitive) unit named primitive.bool
, which contains
some routines for dealing with booleans,
creates a true
constant,
evaluates the constant,
and then quits.
module bool.can_evaluate_true { $load: containing () receiving ($loaded, primitive.bool) { $module = closure containing (primitive.bool) -> main; -> $loaded; } main: containing (primitive.bool) receiving ($finish) { $return = closure containing ($finish) -> main@1; -> primitive.bool true; } main@1: containing ($finish) receiving ($_, $0) { primitive.bool = rename $_; value = rename $0; $return = closure containing ($finish, value) -> main@2; -> primitive.bool drop; } main@2: containing ($finish, value) receiving () { $evaluate = closure containing ($finish) branch true = main@2$$evaluate$true, branch false = main@2$$evaluate$false; -> value evaluate; } main@2$$evaluate$true: containing ($finish) receiving () { -> $finish succeed; } main@2$$evaluate$false: containing ($finish) receiving () { -> $finish fail; } }
Can you see why this isn’t a language that you’d want to program in directly? In the next post, we’ll learn about S₁, which provides some helpful syntactic sugar, while still being a very low-level language.