  • 2021-03-08

    A better varint

    Many binary file formats need to store integer values, and those values are often “small”. That is, while the field might technically be a 32-bit integer, it will rarely hold values from that entire 32-bit range. Instead, values will usually be small in magnitude, and a large number of their higher-order bits will be 0. In those cases, it can be useful to save space by encoding the integer value in a smaller number of bytes. Doing so can substantially reduce the size of your file, especially when there are lots of these small-magnitude integers to store.

    (Note that even though the goal, and end result, is that the file is smaller, this is not a compression scheme, since it only works when your values are distributed in one particular way. Proper compression schemes are more sophisticated, so that they can find redundancies in many other patterns of data.)

    With this goal, our job is to figure out a coding scheme that lets us:

    • serialize an integer into a variable-length sequence of bytes
    • deserialize a sequence of bytes back into an integer value
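
    As a point of reference, the two operations above can be sketched with the conventional LEB128-style varint: 7 payload bits per byte, with the high bit marking "more bytes follow". This is the familiar baseline scheme, not necessarily the improved encoding this post goes on to describe, and the function names `encode_varint` and `decode_varint` are hypothetical.

```rust
/// Encode `value` as an LEB128-style varint: 7 payload bits per byte,
/// least-significant group first, high bit set on every byte but the last.
/// (Illustrative baseline only; names are hypothetical.)
fn encode_varint(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Decode one varint from the front of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input is truncated or does not fit in a u32.
fn decode_varint(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    // A u32 needs at most five 7-bit groups.
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        let payload = (byte & 0x7f) as u32;
        // The fifth byte may only carry the top 4 bits of a u32.
        if i == 4 && payload > 0x0f {
            return None;
        }
        value |= payload << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

fn main() {
    let mut buf = Vec::new();
    encode_varint(300, &mut buf);
    // 300 fits in two bytes instead of the four a fixed-width u32 would use.
    assert_eq!(buf, [0xac, 0x02]);
    assert_eq!(decode_varint(&buf), Some((300, 2)));
}
```

    With this baseline, values below 128 take a single byte, while the largest 32-bit values take five, one more than the fixed-width encoding.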
  • 2021-01-28

    Using a Sony RX100 Mk IV as a webcam on Linux

    Earlier this month I decided to use some of my company’s generous home office reimbursement to test out the “use a real digital camera as a webcam” craze that’s been going around. I spend most of my day on Zoom meetings with colleagues and other people around the industry, so I figured it wouldn’t be the worst use of money — especially if it noticeably increased the quality of the video feed that I’m sending out into the world.

    To raise the level of difficulty, my primary machine is a beefy Linux workstation. And as we all know, Linux is well-known for being straightforward and easy to configure… 🙄

    In the end, I decided to return the camera and stick with my battle-tested, traditional Logitech C920 webcam. But in case others are interested in getting this setup working under Linux, here are my notes!

  • 2021-01-03

    Git commits and code review (revisited)

    Drew DeVault had an interesting post yesterday (gemini, web) about how large a git commit should be. It echoes several points that I tried to make a couple of years back in a post of my own.

    I agree with all of the points Drew makes, and try to follow the best practices that he outlines in my own projects. There’s an alternative mindset, though, which is very popular these days, and I’ve argued at length with my colleagues at GitHub about the merits and drawbacks of each approach. The different mindsets grow out of a point that Drew mentioned:

    As you receive feedback on your patch and make updates, continue to rebase and improve your original commit, or commits.

    and that I called out as its own top-level rule for what makes a “clean” git history:

    Code review does not appear in the final history of your project.

    I’ve seen this alternative mindset most often with projects that have fully bought in to using GitHub as their project hub, but it is not limited to GitHub! I’ve also seen it with projects hosted on GitLab, Gitea, Phabricator, and Gerrit. All of those code forges are centered around the Pull Request (or Merge Request or whatever else you might call it) as the mechanism for code review. There are differences in workflow between these hosts — Phabricator and Gerrit, for instance, have much better support for “stacked” changes, like you’d see with a [1/X] thread on the Linux kernel mailing list. But they all focus on having code review itself take place on the forge’s web site, as opposed to (for instance) on a mailing list.

    That, in turn, makes the project maintainers and contributors internalize the idea that the forge’s unit of review should be the same as the project’s unit of change and history. And that is the difference in mindset. Zach Holman has the best summary I’ve found so far.

    If you view the PR as your unit of change, then you apply all of the same best practices, but in a different place:

    • The PR should introduce exactly one change, of the correct size.
    • The PR title and body should contain a description of that change, structured and formatted in a very similar way to Tim Pope’s git commit message recommendations.
    • The test suite must pass for the final state of the PR before it can be merged into the main branch.

    But as Zach calls out, you end up “not giving two shits” about the individual git commits that make up your PRs.

    (To be clear, I am not advocating this mindset, but I was happy to find a good post explaining it. It’s still like fingernails on a chalkboard to come across a project that’s organized this way. But the “PR as unit of change” explanation at least means that I can now see that it is a consistent organizing principle, even if it’s not one I agree with.)

    In my original post I suggest that if you use GitHub PRs as your unit of review, and want to use git commits as your unit of history, then you must make sure to use the “squash merge” feature when merging your PRs. This reduces each PR-as-unit-of-review into a single commit-as-unit-of-history at merge time. (If you’re happy to mindfully decide to use PRs as both your unit of review and your unit of history, then you can use whatever merge strategy you want.) It’s still an unsolved problem how best to use PRs to review a “stack” of changes that should end up as separate commits in the history, but which should be reviewed as a unit (or as a closely reviewed sequence).

  • 2020-12-30

    Swanson

    S₁ for bootstrapping

    In the previous post, we described S₀, and showed how it would be absolutely disgusting to have to program in it directly. Which is why I described it as Swanson’s “assembly language”. In this post, we’ll look into exactly how the language is complicated, and use that to describe a slightly better language named S₁.

  • 2020-12-29

    Staff engineering manager

    I’ve been promoted! As of January 1st, I will now be a Staff Engineering Manager at GitHub.

    This is a new title on our career ladder, which was only added in the last couple of months. Prior to this, the next promotion on the manager track would have been to Director. The “Director” title, however, usually implies that you are no longer a line manager, and are instead a manager of managers. You’re in charge of a larger group, consisting of a number of distinct teams. As the Director of the group, you achieve impact by how your teams can work on a number of things in parallel.

    For my own career progression, that left me with a difficult decision. I joined GitHub specifically to work with the Semantic Code team, applying interesting program analysis ideas from research to create useful developer productivity features. I’m still very invested in the people on this team, and the problems we’re trying to solve. But working with a single team doesn’t line up well with the “Director” title, at least as it’s defined on our career ladder. So going for a promotion would have realistically required moving to another part of the company, working on technology and product features that I’m less interested in.

    We added the Staff Engineering Manager title to the career ladder specifically to address this discrepancy. It allows me to progress in my career while still being the line manager of a single team. I still have to show higher impact now with this new title. And like on the IC ladder, “staff” implies that your impact needs to be cross-team in some way.

    But so far — having not yet spent even a single day with my new title — it seems like a great fit!

  • 2020-12-16

    Rust error handling patterns

    This post summarizes how best to produce and consume errors in Rust code. It’s short and to the point! If you want more detail, check out this great article from Nick Groenen:

  • 2020-11-15

    Swanson

    S₀, the “assembly language”

    In the previous post, we talked about Swanson’s execution model, but didn’t really describe what Swanson code looks like. In this post, we’ll look at S₀ (pronounced “ess naught”), which is Swanson’s “assembly language”.

    As we’ll see, S₀ hews pretty closely to the Swanson execution model, and isn’t really a language that you’ll want to program in directly. Typically, you’ll actually write in some other higher-level language, which will be translated into S₀. We’ll see in later posts how this process works. For now, don’t be put off by the amount of boilerplate that you see here — it’s not something that you’ll have to author directly!

  • 2020-10-29

    Swanson

    Execution model

    This post summarizes the computation model that the Swanson framework builds on.

  • 2020-10-28

    Swanson

    Introduction

    My Twitter bio currently lists me as (among other things) a “PL dilettante”. Which of course means that I’ve been hacking around on my own programming language for a number of years! (Seriously, this has been a thing for a long time. The earliest Swanson-related commit I can find is from 2012.) It’s gone through a number of iterations over that time, but I’m pretty happy with where it’s at right now. Not complete by any stretch of the imagination. But I’ve chatted about it in passing with several people at this point, and I figured I need to write down the details somewhere. My friend and colleague Rob can hold forth on these kinds of topics in epic Twitter threads, but I need something more longform. So here we are!

    In this series of posts I’m going to describe Swanson, the programming language framework that I’ve been noodling on. It has two main components, one of which is much more fleshed out than the other.

    The first (not as fleshed out) part is an actual programming language, which doesn’t even really have a name yet! It brings in some interesting notions about programmable syntax and parsing in a way that (I think) makes it easy to construct things like DSLs.

    The second (more important) part is an execution model that can be used as an IR of sorts for all languages. That makes it similar in spirit to WebAssembly, in that the goal is to be something that you would compile (or translate or transpile or whatever) many other languages into, and not something to be written directly. The Great Unnamed Language, like any other language, would be compiled into Swanson the execution framework.

  • 2018-05-31

    NP-hard isn't the end of the world

    tl;dr We’ve typically considered it a deal-breaker to discover that an algorithm we care about is NP-hard. We’ll go to great lengths to simplify the problem domain so that we can switch to a polynomial-time algorithm. But if we simplify too much, then we run the risk that our solution is no longer useful. Luckily, we might not have to! NP-hard is a worst-case bound. If we can convince ourselves that we won’t encounter pathological inputs, the NP-hard algorithm might be just fine in practice.