2021-06-14

# A map of the tree-sitter ecosystem

The tree-sitter ecosystem is split across a large number of components, each in its own repository, which can be quite overwhelming at first. This post tries to provide a map of sorts.

## Overview

Say you’re interested in the tree-sitter project, so you decide to check out the tree-sitter organization on GitHub, browsing through its repositories to determine how the ecosystem is structured. The list of repositories spills over onto a second page, and you see entries that seem redundant. Why is there both tree-sitter-python and py-tree-sitter? Are they competing with each other? Is one deprecated?

You might instead decide to check out the project homepage. The landing page lists (as of June 2021) over 40 different programming language parsers that various folks have implemented, as well as a handful of language bindings.

This, at least, points to an answer. The tree-sitter ecosystem is complicated because when we write a code analysis tool, we want to support different programming languages in two separate, orthogonal ways:

• First, we want to be able to parse source code written in different programming languages.

• Second, and possibly less obviously, we want to use tree-sitter in several different programming languages. You specifically are going to write your analysis tool in one language, but we (the tree-sitter developers) don’t know which one that is! We’ve tried to implement tree-sitter so that we don’t place any restrictions on which language you use.

That at least explains why “Python support” in tree-sitter might mean two different things. But why have we separated everything out into distinct repositories? The main reason is to make it as clear as possible that all of these pieces are truly independent of each other. There shouldn’t be any way for the Python language bindings to influence the design or release process of the Haskell bindings, for instance, nor of any of the language grammars.

True, this adds complexity to the ecosystem, but we’ve tried to offset it with careful naming conventions and with tree-sitter-specific tooling that makes it easy to find and work with whatever pieces you need.

So, given the above, you will encounter all of the following on your journey:

### Language parser bindings

Complicating things even more, you need both the runtime library and the generated parser for each language that you want to parse — and in particular, you need bindings for both! The language bindings described above only include the runtime library, since they can’t know in advance which languages you will want to parse. The bindings should include instructions for how to build and include your desired parsers.

For some language bindings, we can lean on the language’s package manager for this. For instance, for the Rust bindings, we publish packages to crates.io both for the language binding itself (the tree-sitter crate) and for most of the supported grammars (e.g. the tree-sitter-python crate). So if you are writing a tool that is implemented in Rust and that analyzes Python code, you would add both tree-sitter and tree-sitter-python to your Cargo.toml file. Wherever possible, we follow this approach for other language bindings, too.
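As a rough sketch of the Rust case, the dependencies section of such a Cargo.toml might look something like the fragment below. The `"*"` version requirements are placeholders chosen here to avoid guessing at release numbers, not a recommendation; in practice you would pin versions of the two crates that are known to work together.

```toml
[dependencies]
# The language binding: the tree-sitter runtime library, wrapped for Rust.
tree-sitter = "*"         # placeholder version requirement (assumption)
# The generated parser for the language being analyzed, here Python.
tree-sitter-python = "*"  # placeholder version requirement (assumption)
```

The two entries correspond to the two axes described earlier: the first follows from the language your tool is written in, the second from the language it parses. Supporting another source language would mean adding another grammar crate, while the binding crate stays the same.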
