¶Shared library versions

2017-10-29

Congratulations, you've written a software library! You hope that lots of people will find it useful, and will take it as a dependency when writing their own software. You know that at some point you'll have to make changes to your library, either to add features or to fix bugs. Being a good maintainer, you want to be as diligent as possible in telling your users what to expect as you publish these changes. Will they need to change their code in response to the changes that you've made? Have you retired features that they depend on? Or are the changes "safe", presumably requiring no updates on their part?

The traditional approach is to encode all of this information into an easy-to-digest **version number**. Of course, nothing in this world is simple, so there are a number of different systems for encoding compatibility information into a version number. And surprisingly, if you're writing a shared library for a compiled language like C or C++, there are (at least!) two different versioning systems that you'll need to learn. In this post, we're going to look at these different systems, how they relate to each other, and how to actually apply these version numbers to your library using a couple of common build tools.

¶Shared libraries: Why?

Before we jump into version numbers, let's talk about shared libraries and why you might need them. Shared libraries aren't really a thing in many modern programming languages. You don't need them, for instance, in programming languages which aren't compiled (Python, Ruby), or where the convention is to always compile static binaries (Go, Rust). And so knowing how to manage them has become a bit of a lost art.

How to write shared libraries [Ulrich Drepper]

But in the glorious old world of C and C++, shared libraries are still very much a thing. Shared libraries provide two main benefits:

¶They reduce the amount of time you have to spend compiling your code.

A popular Rust crate, for instance, will be compiled every time that you build one of the projects that depends on it. That's not necessarily a bad thing — it means you can optimize the library differently for each project that uses it. But it does come at a cost in developer time and in CPU cycles.

Requisite xkcd reference

With a shared library, you compile the library once, and install it into a shared location in the filesystem (typically `/usr/lib` on Linux systems). Any project that depends on that shared library can use that shared, already compiled representation as-is.

Most Linux distributions further reduce compile times by distributing **binary packages** of popular libraries, where the distribution's packaging system has compiled the code for you. By installing the package, you download a (hopefully signed) copy of the compiled library, and place it into the shared location, all without ever having to invoke the compiler (or any other part of the build chain that produced the library).

¶They reduce memory usage at runtime.

This part is less well known, I think. At runtime, the dynamic linker and the operating system conspire to make sure that the code in any individual shared library file is only loaded into physical memory once, no matter how many running processes are using it! The OS will expose that single copy of the library code to each process that uses it, using the magic of virtual memory to ensure that all of those virtual "copies" are actually backed by the same chunk of physical memory. The library's code is mapped read-only, which is what makes this sharing safe. (No process can overwrite any of the library code, invalidating it for the other processes sharing that copy.)

Both of these benefits are proportional to the number of projects that depend on the shared library, so they're especially useful for "core" libraries that are used by a lot of other software packages — e.g., the low-level GUI toolkit provided by GTK+, or the standard C library.

GTK+

glibc

Note that using a compiled language like C doesn't **force** you to use shared libraries; you're free to compile your dependencies along with your own project, and link everything together **statically**. (If you're going to distribute your project via a container, you might as well link statically, since we typically don't share code across containers at runtime!)

¶Version numbers

Right, so you're writing some code, which will be compiled into a shared library, and you want to attach a version number to play nice with your users. What do you do?

Your job is to think about the **public API** ("application programming interface") of your library — that is, the promises you make _at the source code level_ about how other programmers can use your library. Which functions and classes are available, what their signatures are, that kind of thing.

All of your changes will fall into one of three categories:

¶Backwards incompatible changes

These occur when you *change* part of your public API, or *remove* something from it. That means that some user of your library is going to have to **change their code** as a result of this release. You should avoid this if you can, since it places more of a burden on your users; when it's unavoidable, you need to communicate this clearly so that your users know what to expect.

¶Backwards compatible changes

These occur when you *add* something to your public API. All of your existing users will be able to use the old release or the new release, without having to change any of their code. These kinds of releases are great, since they introduce new functionality without adding an extra burden to your users.

¶Bug-fix releases

These occur when there's no change to your public API at all. All of your existing users should be able to use the old release or the new release, without having to change any of their code. They will presumably want to upgrade to this new release at their earliest convenience, what with all of those bugs that you just fixed! (Unless they're depending on the buggy behavior? Never...)

Version numbers can be consumed either by humans ("is this upgrade going to be annoying and dangerous?") or by computers ("my package manager will upgrade this for me automatically if it can"), but either way, their goal is to succinctly describe what kinds of changes you've made to your code from release to release.

Historically, there have been many competing "patterns" for constructing a version number from your list of changes. These days, however, most people have settled on Semantic Versioning (semver) as the best set of rules, with some languages going so far as to mandate it.

Semantic Versioning

Rust mandates semver

Under semver, a version number consists of three numbers: a **major version**, a **minor version**, and a **patch level**. Each of these corresponds to one kind of change, and each time you cut a new release, you "bump" the portion that lines up with the "strongest" change that you've made to the public API. Any backwards incompatible changes? Bump the major version, set minor and patch to 0. Only backwards compatible changes? Bump the minor version, set patch to 0. No API changes at all, just bug fixes? Bump the patch level. This intuitively lines up with what many project maintainers were doing anyway; semver just codifies that behavior as an explicit set of rules.
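To make those three rules concrete, here's a minimal sketch in C. The `struct semver`, `enum change_kind`, and `semver_bump` names are made up for this post; they aren't part of any real library.

```c
/* Kinds of change, from weakest to strongest. (Hypothetical names.) */
enum change_kind {
    CHANGE_BUGFIX,        /* no change to the public API */
    CHANGE_COMPATIBLE,    /* something was added to the public API */
    CHANGE_INCOMPATIBLE   /* something was changed or removed */
};

struct semver {
    unsigned major;
    unsigned minor;
    unsigned patch;
};

/* Return the version of the next release, given the strongest kind of
 * change that the release contains. */
struct semver semver_bump(struct semver v, enum change_kind strongest)
{
    switch (strongest) {
    case CHANGE_INCOMPATIBLE:
        v.major += 1;
        v.minor = 0;
        v.patch = 0;
        break;
    case CHANGE_COMPATIBLE:
        v.minor += 1;
        v.patch = 0;
        break;
    case CHANGE_BUGFIX:
        v.patch += 1;
        break;
    }
    return v;
}
```

Starting from 1.3.0, a bug-fix release comes out as 1.3.1, a backwards compatible release as 1.4.0, and a backwards incompatible release as 2.0.0.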

¶APIs and ABIs

Congratulations! We've solved library versioning once and for all!

Well, no, not really. Compiled shared libraries complicate this situation, because you also have to consider the library's **ABI** (application *binary* interface). In a compiled language, you can make a change to your library, which does *not* require your users to change their code at all, but which still means they can't use an existing compiled version of your library as-is!

What would this look like? Let's say you've written a C library for tracking Aussie football games, and have the following struct in version 1.3.0:

struct score {
  int goals;
  int behinds;
};

int total_score(const struct score score) {
  return score.goals * 6 + score.behinds;
}

And some other programmer has written some code that uses this struct:

struct score adelaide;
adelaide.goals = 11;
adelaide.behinds = 14;
printf("Adelaide %d.%d (%d)\n",
       adelaide.goals, adelaide.behinds,
       total_score(adelaide));

Running this code, your user would get:

Adelaide 11.14 (80)

Now for some reason, you decide to reorder the fields in your struct:

struct score {
  int behinds;  /* behinds come first now! */
  int goals;
};

Your user's code is still perfectly valid! If you only consider the source API, since your user doesn't have to make any changes to their code, semver calls this a bug-fix change. When it's time to release the new version of the library including this change, you would bump the version number from **1.3.0** to **1.3.1**.

But if your user doesn't recompile their code, they'll get the wrong answer when they run their program using the new version of your library:

Adelaide 11.14 (95)

Because the two pieces of *compiled* code had different assumptions about how the fields in `struct score` were laid out in memory, they were incompatible, even though the original source code was fine!

All of this means that if you're working with a compiled language and shared libraries, you should consider the **compiled ABI** as well as the **source API** when deciding what kinds of changes are included in a release. You should consider this field-reordering example a backwards-incompatible change, and bump your version from **1.3.0** to **2.0.0**.
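If you'd like to see the mismatch without building two versions of a library, here's a rough simulation in a single C file. It is only a sketch: `score_old`, `score_new`, and `stale_caller_total` are names I've invented for illustration, and the `memcpy` stands in for "the caller wrote the bytes using one layout, the library read them using another".

```c
#include <string.h>

/* Layout that the caller was compiled against (version 1.3.0). */
struct score_old {
    int goals;
    int behinds;
};

/* Layout that the new library was compiled against (fields reordered). */
struct score_new {
    int behinds;
    int goals;
};

/* What the new library's compiled total_score does with the bytes it gets. */
static int total_score_new(struct score_new s)
{
    return s.goals * 6 + s.behinds;
}

/* Simulate a caller that was not recompiled: fill in the old layout, then
 * hand the very same bytes to code that expects the new layout. */
int stale_caller_total(int goals, int behinds)
{
    struct score_old as_written = { goals, behinds };
    struct score_new as_read;

    /* Both structs are two ints, so the sizes match. */
    memcpy(&as_read, &as_written, sizeof as_read);
    return total_score_new(as_read);
}
```

Calling `stale_caller_total(11, 14)` gives 95 rather than 80, because the library ends up treating the 14 behinds as goals and the 11 goals as behinds.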

¶Shared library filenames

Now that we've talked about version numbers in the abstract, what do we see in practice?

On Linux and Mac machines, you will also encode version numbers into the filenames of your shared libraries. Each library will end up with **two or three** different filenames under `/usr/lib`. For instance, if you have version 1.2.3 of a shared library called _libfoo_ installed, you'll find:

$ ls /usr/lib/libfoo*
/usr/lib/libfoo.so
/usr/lib/libfoo.so.1
/usr/lib/libfoo.so.1.2.3

Note that there aren't three *copies* of the library; the first two filenames will be symlinks to the last one.

The `libfoo.so` file is only used at build-time. (In fact, Debian-based systems will only include this file in the library's `-dev` package; if you don't have that package installed, you'll only see the last two versioned filenames.) You compile some code that uses _libfoo_ by passing in `-lfoo` to your build tools. But when you do this, the build tools don't know in advance which version of the library is installed. Instead of doing some kind of wildcard match, looking for all filenames that match a pattern, the build tools assume that they can find the library with a simple `libfoo.so` filename. It's up to you (or more realistically, your package manager) to make sure that this points at the currently installed version.

The `libfoo.so.1` file is used at runtime. By convention, this base filename, which only includes the *major version* of the library, is called the library's **SONAME**. On a Linux system, you can see the SONAME of a library using the `objdump` command:

$ objdump -x /usr/lib/libfoo.so | grep SONAME
  SONAME               libfoo.so.1

When you compile some other code that depends on this library, the build chain will find the shared library file using the non-versioned `libfoo.so` filename, extract the library's SONAME from that file, and record that as the dependency. You can see these dependencies using `objdump`, too:

$ objdump -x /usr/bin/foo | grep NEEDED
  NEEDED               libfoo.so.1
  NEEDED               libc.so.6

So, at runtime, when `/usr/bin/foo` is loaded, the dynamic linker will see these `NEEDED` entries, and look for a library file called `libfoo.so.1`.

The last file, with the full version number included in the filename, isn't technically needed these days. I guess you could have multiple copies of the same major version installed, but the symlinks will only point at one of them, so it's not clear to me how that would be useful beyond satisfying any hoarder tendencies you might have!
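As a quick recap of the naming convention, here's a small C sketch that spells out the three filenames for a given library name and version. The `shared_library_names` function is hypothetical; it just formats strings and doesn't touch the filesystem or create any symlinks.

```c
#include <stdio.h>

/* True if snprintf wrote the whole string into a buffer of `size` bytes. */
static int fits(int written, size_t size)
{
    return written >= 0 && (size_t) written < size;
}

/* Fill in the three conventional Linux filenames for a shared library.
 * Each output buffer must hold at least `size` bytes.
 * Returns 0 on success, -1 if any name did not fit. */
int shared_library_names(const char *name,
                         unsigned major, unsigned minor, unsigned patch,
                         char *link_name, char *soname, char *real_name,
                         size_t size)
{
    /* Build-time name, found via -lfoo. */
    int a = snprintf(link_name, size, "lib%s.so", name);
    /* Runtime name (the SONAME): major version only. */
    int b = snprintf(soname, size, "lib%s.so.%u", name, major);
    /* The actual file that the two symlinks point at. */
    int c = snprintf(real_name, size, "lib%s.so.%u.%u.%u",
                     name, major, minor, patch);

    return (fits(a, size) && fits(b, size) && fits(c, size)) ? 0 : -1;
}
```

For `name` set to `"foo"` and version 1.2.3, this produces `libfoo.so`, `libfoo.so.1`, and `libfoo.so.1.2.3`, matching the directory listing above.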

¶libtool versions

Of course, I've glossed over an important detail. You might assume from the previous section that the Semantic Version that you've chosen for your project (Foo Library 2.1.0) will line up with the version number encoded in your shared library file (`libfoo.so.2.1.0`). Au contraire, my friend!

While there are many build systems out there these days for the C and C++ ecosystem, the dreaded autotools are still where we get our most cherished conventions. The autotool responsible for shared libraries is called libtool, and it has its own versioning scheme, with exactly the same goals as semver — but its rules are **just** different enough to give you different version numbers for the same sequence of API changes. And it's the libtool version that determines what ends up in your shared library's filename.

autotools

libtool

libtool's versioning scheme

Note that I *didn't* say that the libtool version **is** what shows up in the filename — it **determines** it. That's right! Not only is the libtool versioning scheme *different*, it's also the *input* to a process that is what *actually* determines what shows up in your filenames and in the linker commands embedded in your shared libraries. Fun stuff.

So what are these rules? The libtool documentation goes into detail, but here's a summary:

the gory details

¶1. Version info

First, you assign your shared library a **version info**. Like a traditional version number, this consists of three numbers, but they're separated by colons instead of periods, and they're out of order!

The three numbers are **current**, **revision**, and **age**. Current sorta lines up with semver's major version; age sorta lines up with the minor version; and revision sorta lines up with the patch level.

¶2. Bump version info when releasing

When you release a new version of your library, you bump the version info according to libtool's rules:

Backwards-incompatible change: bump current, set revision and age to 0.
Backwards-compatible change: bump current *and* age, set revision to 0.
No API change: bump revision.
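Written out as code, the three rules above look something like this. It's a sketch of the summary in this post, not anything that libtool itself provides; `struct version_info`, `enum lt_change`, and `version_info_bump` are names I've made up.

```c
/* Kinds of change, from weakest to strongest. (Hypothetical names.) */
enum lt_change {
    LT_NO_API_CHANGE,   /* bug fixes only */
    LT_COMPATIBLE,      /* something was added to the public API */
    LT_INCOMPATIBLE     /* something was changed or removed */
};

/* libtool's version info, in the order it is written: current:revision:age */
struct version_info {
    unsigned current;
    unsigned revision;
    unsigned age;
};

/* Return the version info for the next release, given the strongest kind
 * of change that the release contains. */
struct version_info version_info_bump(struct version_info v,
                                      enum lt_change strongest)
{
    switch (strongest) {
    case LT_INCOMPATIBLE:
        v.current += 1;
        v.revision = 0;
        v.age = 0;
        break;
    case LT_COMPATIBLE:
        v.current += 1;
        v.age += 1;
        v.revision = 0;
        break;
    case LT_NO_API_CHANGE:
        v.revision += 1;
        break;
    }
    return v;
}
```

Starting from `3:2:1`, a release with no API change gives `3:3:1`, a backwards-compatible release gives `4:0:2`, and a backwards-incompatible release gives `4:0:0`.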

libtool's rules

¶3. Use the ‘-version-info’ flag

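You pass this version info to `libtool` using the `-version-info` flag. If you're using the autotools, you'll do this by adding the flag to your library's `LDFLAGS` line in `Makefile.am` (for a library target named `libfoo.la`, that's the `libfoo_la_LDFLAGS` variable).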

An example that sets -version-info via LDFLAGS

¶4. libtool does its thing

libtool will perform its magic to transform this version info into a shared library filename, which basically involves undoing some of the math you performed in step 2. The result will still not be the same as what you'd get by following the semver rules instead.
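Here's a sketch of that transformation as I understand it for Linux: the filename ends in `(current - age).age.revision`, and the SONAME keeps only the first of those numbers. Other platforms map the version info differently, so treat this as an illustration rather than a reimplementation of libtool. The `libtool_linux_names` function is a made-up name.

```c
#include <stdio.h>

/* Fill in the SONAME and full filename that a current:revision:age
 * version info maps to on Linux. Each output buffer must hold at least
 * `size` bytes. Returns 0 on success, -1 on bad input or truncation. */
int libtool_linux_names(const char *name,
                        unsigned current, unsigned revision, unsigned age,
                        char *soname, char *real_name, size_t size)
{
    /* age counts how many older interfaces are still supported, so it
     * cannot be larger than current. */
    if (age > current) {
        return -1;
    }

    /* Undo the "bump current and age together" step from the rules above. */
    unsigned major = current - age;

    int a = snprintf(soname, size, "lib%s.so.%u", name, major);
    int b = snprintf(real_name, size, "lib%s.so.%u.%u.%u",
                     name, major, age, revision);

    if (a < 0 || (size_t) a >= size || b < 0 || (size_t) b >= size) {
        return -1;
    }
    return 0;
}
```

With a version info of `3:0:1`, this gives a SONAME of `libfoo.so.2` and a filename of `libfoo.so.2.1.0`. With `1:0:1`, which is where a library that started at `0:0:0` lands after its first backwards-compatible release, you get `libfoo.so.0.1.0`.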

libtool's munging

¶CMake

What if you're not using the autotools? I'm not going to go into every build system that's out there, but CMake is pretty common, and if you're using something else, I trust you to wing it.

CMake

CMake doesn't implement the libtool magic that transforms a version info into the shared library filename. Instead, it gives you nice precise control over exactly what goes into your shared library's filename and linker commands, using the `set_target_properties` command to set the library's `VERSION` and `SOVERSION` properties.

set_target_properties

If you want to follow libtool's scheme to the letter, you have two options:

Maintain an actual libtool-style version info for your library, like I do for my libcork library, and reimplement libtool's magic using some equivalent CMake trickery.

Cry, and do all of the bookkeeping yourself, by hand.

libcork

how libcork sets version-info

the necessary CMake trickery

All of this is confusing and overly complicated, yes, but that's the world we live in.