• 2021-06-14

A map of the tree-sitter ecosystem

The tree-sitter ecosystem is spread across a large number of components, each in a different repository, which can be quite overwhelming at first. This post tries to provide a map of sorts.

• 2021-06-07

Getting started with tree-sitter

This is the first in a series of posts on using the tree-sitter parsing framework. The primary audience, at least for these first posts, is people who want to use existing grammars to write program analysis tools that work across a variety of programming languages. I’m not (yet) writing about how to create a new grammar for a new programming language.

We’re going to start by keeping it very simple. In this post, we’re going to install the tree-sitter command-line program and the Python grammar, and then use those to parse and syntax highlight some Python code.

• 2021-03-08

A better varint

Many binary file formats need to store integer values, and often those values are “small”. That is, while the field might technically be a 32-bit integer, it will rarely hold values from that entire 32-bit range. Instead, values will usually be small in magnitude, and a large number of their higher-order bits will be 0. In those cases, it can be useful to save space by encoding the integer value in a smaller number of bytes. Doing so can substantially reduce the size of your file, especially when there are lots of these small-magnitude integers to store.

(Note that even though the goal, and end result, is that the file is smaller, this is not a compression scheme, since it only works when your values are distributed in one particular way. Proper compression schemes are more sophisticated, so that they can find redundancies in many other patterns of data.)

With this goal, our job is to figure out a coding scheme that lets us:

• serialize an integer into a variable-length sequence of bytes
• deserialize a sequence of bytes back into an integer value
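
As a point of reference, the two operations above can be sketched with the conventional base-128 varint used by formats such as Protocol Buffers (also known as LEB128). This is only a baseline illustration, not necessarily the scheme this post goes on to propose. The function names `encode_varint` and `decode_varint` are made up for this sketch, and it assumes non-negative integers only.

```python
def encode_varint(value: int) -> bytes:
    """Serialize a non-negative integer, 7 bits per byte, low-order group first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        group = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)         # high bit clear: this is the last byte
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Deserialize one integer starting at offset; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:           # high bit clear marks the final byte
            return value, offset
        shift += 7
```

With this baseline, values below 128 take a single byte, and each additional byte carries 7 more bits of the value, so a small number stored in a nominally 32-bit field usually occupies one or two bytes instead of four.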

• 2021-01-28

Using a Sony RX100 Mk IV as a webcam on Linux

Earlier this month I decided to use some of my company’s generous home office reimbursement to test out the “use a real digital camera as a webcam” craze that’s been going around. I spend most of my day on Zoom meetings with colleagues and other people around the industry, so I figured it wouldn’t be the worst use of money — especially if it noticeably increased the quality of the video feed that I’m sending out into the world.

To raise the level of difficulty, my primary machine is a beefy Linux workstation. And as we all know, Linux is well-known for being straightforward and easy to configure… 🙄

In the end, I decided to return the camera and stick with my battle-tested, traditional Logitech C920 webcam. But in case others are interested in getting this setup working under Linux, here are my notes!

• 2021-01-03

Git commits and code review (revisited)

Drew DeVault had an interesting post yesterday (gemini, web) about how large a git commit should be. It echoes several points that I tried to make a couple of years back in a post of my own.

I agree with all of the points Drew makes, and in my own projects I try to follow the best practices that he outlines. There’s an alternative mindset, though, which is very popular these days, and I’ve argued at length with my colleagues at GitHub about the merits and drawbacks of each approach. The different mindsets grow out of a point that Drew mentioned:

and that I called out as its own top-level rule for what makes a “clean” git history:

Code review does not appear in the final history of your project.

I’ve seen this alternative mindset most often with projects that have fully bought in to using GitHub as their project hub, but it is not limited to GitHub! I’ve also seen it with projects hosted on GitLab, Gitea, Phabricator, and Gerrit. All of those code forges are centered around the Pull Request (or Merge Request or whatever else you might call it) as the mechanism for code review. There are differences in workflow between these hosts — Phabricator and Gerrit, for instance, have much better support for “stacked” changes, like you’d see with a [1/X] thread on the Linux kernel mailing list. But they all focus on having code review itself take place on the forge’s web site, as opposed to (for instance) on a mailing list.

That, in turn, makes the project maintainers and contributors internalize the idea that the forge’s unit of review should be the same as the project’s unit of change and history. And that is the difference in mindset. Zach Holman has the best summary I’ve found so far.

If you view the PR as your unit of change, then you apply all of the same best practices, but in a different place:

• The PR should introduce exactly one change, of the correct size.
• The PR title and body should contain a description of that change, structured and formatted in a very similar way to Tim Pope’s git commit message recommendations.
• The test suite must pass for the final state of the PR before it can be merged into the main branch.

But as Zach calls out, you end up “not giving two shits” about the individual git commits that make up your PRs.

(To be clear, I am not advocating this mindset, but I was happy to find a good post explaining it. It’s still like fingernails on a chalkboard to come across a project that’s organized this way. But the “PR as unit of change” explanation at least means that I can now see that it is a consistent organizing principle, even if it’s not one I agree with.)

In my original post I suggested that if you use GitHub PRs as your unit of review, and want to use git commits as your unit of history, then you must make sure to use the “squash merge” feature when merging your PRs. This reduces each PR-as-unit-of-review into a single commit-as-unit-of-history at merge time. (If you’re happy to mindfully decide to use PRs as both your unit of review and your unit of history, then you can use whatever merge strategy you want.) It’s still an open question how best to use PRs to review a “stack” of changes that should end up as separate commits in the history but be reviewed as a unit (or as a closely reviewed sequence).

• 2020-12-30

S₁ for bootstrapping

In the previous post, we described S₀, and showed how it would be absolutely disgusting to have to program in it directly. Which is why I described it as Swanson’s “assembly language”. In this post, we’ll look into exactly how the language is complicated, and use that to describe a slightly better language named S₁.

• 2020-12-29

Staff engineering manager

I’ve been promoted! As of January 1st, I will now be a Staff Engineering Manager at GitHub.

This is a new title on our career ladder, which was only added in the last couple of months. Prior to this, the next promotion on the manager track would have been to Director. The “Director” title, however, usually implies that you are no longer a line manager, and are instead a manager of managers. You’re in charge of a larger group, consisting of a number of distinct teams. As the Director of the group, you achieve impact by how your teams can work on a number of things in parallel.

For my own career progression, that left me with a difficult decision. I joined GitHub specifically to work with the Semantic Code team, applying interesting program analysis ideas from research to create useful developer productivity features. I’m still very invested in the people on this team, and the problems we’re trying to solve. But working with a single team doesn’t line up well with the “Director” title, at least as it’s defined on our career ladder. So going for a promotion would have realistically required moving to another part of the company, working on technology and product features that I’m less interested in.

We added the Staff Engineering Manager title to the career ladder specifically to address this discrepancy. It allows me to progress in my career while still being the line manager of a single team. I still have to show higher impact now with this new title. And like on the IC ladder, “staff” implies that your impact needs to be cross-team in some way.

But so far — having not yet spent even a single day with my new title — it seems like a great fit!

• 2020-12-16

Rust error handling patterns

This post summarizes how best to produce and consume errors in Rust code. It’s short and to the point! If you want more detail, check out this great article from Nick Groenen:

• 2020-11-15

S₀, the “assembly language”

In the previous post, we talked about Swanson’s execution model, but didn’t really describe what Swanson code looks like. In this post, we’ll look at S₀ (pronounced “ess naught”), which is Swanson’s “assembly language”.

As we’ll see, S₀ hews pretty closely to the Swanson execution model, and isn’t really a language that you’ll want to program in directly. Typically, you’ll actually write in some other higher-level language, which will be translated into S₀. We’ll see in later posts how this process works. For now, don’t be put off by the amount of boilerplate that you see here — it’s not something that you’ll have to author directly!

• 2020-10-29

Execution model

This post summarizes the computation model that the Swanson framework builds on.