How We Built Devbox's Package-Version Search

Back when we first released Devbox, one of the most requested features was the ability to install specific versions of packages. This isn't something that Nix easily supports out of the box, so initially Devbox could only install the most recent version of a package. That changed in Devbox v0.5, which added a devbox search command that allows users to find and install old versions of any package. More recently, we also launched Nixhub.io, which makes the same search available to Nix users.

This post is a technical deep dive into how we implemented package versioning in Devbox, along with the search engine behind Nixhub. There's a lot to cover, so we're splitting it into two parts. This first post explains the basics of how the first iteration of devbox search and Nixhub worked. The second part will go into more detail on how we improved search to index hundreds of thousands of packages while still keeping it fast.

Primer on nixpkgs

In case you're unfamiliar with Nix, most Nix packages live in a "nixpkgs" git repository. The inner workings of nixpkgs are a whole separate topic, but fundamentally it's a giant mapping of package names to build instructions. In Nix terms, the package names are attribute paths and the build instructions are derivations. When you install a package, you're telling Nix to find the package name in the map and then run its build steps.

# This is a simplification to show how nixpkgs conceptually works as a
# large key-value map.
{
	go = derivation {
		name = "go";

		# Inline shell script that builds the package.
		builder = ''
			curl https://go.dev/dl/go1.20.src.tar.gz | tar -xz
			cd go/src
			./all.bash
		'';

		meta = {
			version = "1.20";
			description = "The Go Programming language";
		};
		# ...
	};
}

Fortunately, because Nix is smart, it can usually download prebuilt packages from a cache and skip building them from scratch.

Package Versioning

You may have already noticed why defining packages in a map can make versioning difficult. Each package attribute path is a key, and therefore can only have a single value. To update a package's version, someone modifies the existing package definition and commits the change. Afterwards, the previous version no longer exists in the latest nixpkgs commit.

 {
        go = derivation {
                name = "go";

                # Inline shell script that builds the package.
                builder = ''
-                       curl https://go.dev/dl/go1.20.src.tar.gz | tar -xz
+                       curl https://go.dev/dl/go1.20.7.src.tar.gz | tar -xz
                        cd go/src
                        ./all.bash
                '';

                meta = {
-                       version = "1.20";
+                       version = "1.20.7";
                        description = "The Go Programming language";
                };
                # ...
        };
 }

One way around this is to put the version in the attribute path so that multiple versions can live side-by-side:

{
	go_1_18 = ...;
	go_1_19 = ...;
	go_1_20 = ...;
	go = go_1_20; # some default Go version
}

This is what a lot of packages do for major releases, but minor or very old versions still get lost.

You might be thinking, "Wait, this is a git repository. Why not just check out a previous commit to get the old version back?" If you are, then you're right! If you can get the build instructions (derivation) for an older version of a package, Nix will be able to redownload or rebuild it.

This is one of Nix's strengths. It builds packages in a very isolated and reproducible way, allowing it to easily compile years-old versions of packages and guarantee that the output will be the same.

Finding Old Versions

We now know that we can use old git commits in nixpkgs to install old versions of packages. The next question is, "How do we find the commit with the version we want?"

This is the trickier part. The Nix language used to define packages is just that: a full, Turing-complete programming language. You can't naively look at the raw source code to accurately determine the version of a package. You might be able to make some guesses if the package is simple enough, but to be certain you need to actually evaluate the package's Nix expression.

However, evaluating the metadata of every package across thousands of commits isn't exactly fast. Doing this on the fly would make searching unbearably slow. What we need is a precomputed index of package versions.

Building a Search Index

To make searching for versions fast, we drew inspiration from @lazamar who wrote up a great blog post on how to make nixpkgs searchable.

The core idea is simple:

  1. Check out a previous nixpkgs commit.
  2. Walk the tree of packages.
  3. For each package you find, evaluate its metadata.
  4. Output the result as JSON.
  5. You now have a mapping of commit hash -> package attribute path -> version!

As it turns out, the Nix community maintains a CI system called Hydra, which does exactly this. Hydra attempts to build every package in a nixpkgs commit and then caches the output. Part of Hydra's build results is a giant JSON file containing the metadata (including versions) for every package it found.

We can load those JSON files into memory, build a prefix tree of package metadata to make it searchable, and then expose that data through an API.
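To make the loading step concrete, here is a minimal Go sketch that parses one commit's package metadata and merges it into a commit -> attribute path -> version map. The JSON shape, the `PackageMeta` and `Index` types, the `AddCommit` method, and the commit hash are all assumptions made for illustration; Hydra's real output has more fields and may be laid out differently, and this is not Devbox's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PackageMeta holds the only field this sketch cares about.
// The field name is an assumption, not Hydra's documented schema.
type PackageMeta struct {
	Version string `json:"version"`
}

// Index maps commit hash -> attribute path -> version.
type Index map[string]map[string]string

// AddCommit parses a JSON object of attribute path -> metadata for one
// nixpkgs commit and records each package's version in the index.
func (idx Index) AddCommit(commit string, data []byte) error {
	var pkgs map[string]PackageMeta
	if err := json.Unmarshal(data, &pkgs); err != nil {
		return fmt.Errorf("parse metadata for %s: %w", commit, err)
	}
	versions := make(map[string]string, len(pkgs))
	for attrPath, meta := range pkgs {
		if meta.Version == "" {
			continue // skip packages that don't declare a version
		}
		versions[attrPath] = meta.Version
	}
	idx[commit] = versions
	return nil
}

func main() {
	idx := Index{}
	// Hypothetical commit hash and metadata.
	data := []byte(`{"go": {"version": "1.20.7"}, "go_1_19": {"version": "1.19.11"}}`)
	if err := idx.AddCommit("abc123", data); err != nil {
		panic(err)
	}
	fmt.Println(idx["abc123"]["go_1_19"]) // prints 1.19.11
}
```

Calling `AddCommit` once per indexed commit produces the nested map described in the steps above. Holding all of it in memory is simple, but it grows with every commit added.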

Below is an example of a search for "pyt". We start by walking a prefix tree of attribute paths by following each character of the search query. When we reach the end, we return the subtree beneath the last node sorted by length.

Figure: The path through the prefix tree when searching for "pyt".
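The walk described above can be sketched in a few lines of Go. This is an illustrative prefix tree, not the one Devbox uses: the `node` type, the `Insert` and `Search` methods, and the sample attribute paths are invented for the example. We also assume that "sorted by length" means the length of the attribute path, with an alphabetical tie-break.

```go
package main

import (
	"fmt"
	"sort"
)

// node is one character position in the prefix tree.
type node struct {
	children map[rune]*node
	terminal bool // true if an attribute path ends at this node
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// Insert adds an attribute path to the tree one character at a time.
func (n *node) Insert(attrPath string) {
	cur := n
	for _, r := range attrPath {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.terminal = true
}

// Search walks the tree along the query, then collects every attribute
// path in the subtree below the last node, shortest first.
func (n *node) Search(query string) []string {
	cur := n
	for _, r := range query {
		next, ok := cur.children[r]
		if !ok {
			return nil // no attribute path starts with the query
		}
		cur = next
	}
	var results []string
	collect(cur, query, &results)
	sort.Slice(results, func(i, j int) bool {
		if len(results[i]) != len(results[j]) {
			return len(results[i]) < len(results[j])
		}
		return results[i] < results[j] // alphabetical tie-break (assumption)
	})
	return results
}

// collect gathers all complete attribute paths at or below n.
func collect(n *node, prefix string, out *[]string) {
	if n.terminal {
		*out = append(*out, prefix)
	}
	for r, child := range n.children {
		collect(child, prefix+string(r), out)
	}
}

func main() {
	root := newNode()
	// Sample names chosen for the example only.
	for _, p := range []string{"python3", "python311", "pytest", "php", "go"} {
		root.Insert(p)
	}
	fmt.Println(root.Search("pyt")) // prints [pytest python3 python311]
}
```

Sorting shortest-first tends to put the closest match to the query at the top, since a shorter name has fewer characters beyond the prefix the user typed.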

It's All in the Name

This was a good start, but we weren't yet happy with the user experience. Earlier we mentioned that some package attribute paths include a version. This made specifying granular package versions confusing. For example, in nixpkgs the package for Go 1.19.11 has the attribute path go_1_19 and the version 1.19.11. To install this package, the user would have to run devbox add go_1_19@1.19.11.

Another issue was that we wanted to have a devbox update command to make upgrading packages easier. If the package is named go_1_19 then it could only ever upgrade to the latest patch version of Go 1.19.

What we needed was a way to group packages under a new name that was distinct from any one version. Internally we refer to these names as canonical names. They allow packages to be specified in a more intuitive name@version format.

To give an example of why canonical names are so useful, we'll look at the Go v1.19.11 package. In nixpkgs, the go attribute path has never pointed to this version of Go. It has only ever been available as go_1_19.

Figure: The "go" canonical name groups multiple attribute paths, making it easier to find specific versions.

When a user runs devbox add go@1.19.11, Devbox first checks to see if go matches a canonical name. If it does, Devbox looks for v1.19.11 under all attribute paths grouped by that name (go and go_1_19). If go weren't a canonical name, Devbox would only be able to look under the go attribute path and wouldn't find the desired version.

It's important to note that all Devbox packages can still be specified by their attribute path. We also took care to make sure that identical canonical names and attribute paths point to the same thing.
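As a rough illustration of that lookup, the Go sketch below groups attribute paths under a canonical name and falls back to treating the input as a plain attribute path when no canonical name matches. The `canonicalNames` and `versionsByAttrPath` tables and the `FindAttrPaths` function are hypothetical, and the version lists are made-up sample data rather than anything taken from Devbox's index.

```go
package main

import "fmt"

// canonicalNames groups attribute paths under a version-independent name.
// The contents are illustrative, not Devbox's real data.
var canonicalNames = map[string][]string{
	"go": {"go", "go_1_19", "go_1_20"},
}

// versionsByAttrPath lists the versions each attribute path has pointed to
// across indexed commits (made-up sample data).
var versionsByAttrPath = map[string][]string{
	"go":      {"1.20.7", "1.20"},
	"go_1_19": {"1.19.11", "1.19.10"},
	"go_1_20": {"1.20.7"},
}

// FindAttrPaths returns every attribute path that provides name@version.
// If name is not a canonical name, it is treated as a plain attribute path.
func FindAttrPaths(name, version string) []string {
	candidates, ok := canonicalNames[name]
	if !ok {
		candidates = []string{name}
	}
	var matches []string
	for _, attrPath := range candidates {
		for _, v := range versionsByAttrPath[attrPath] {
			if v == version {
				matches = append(matches, attrPath)
				break
			}
		}
	}
	return matches
}

func main() {
	fmt.Println(FindAttrPaths("go", "1.19.11"))      // prints [go_1_19]
	fmt.Println(FindAttrPaths("go_1_19", "1.19.11")) // prints [go_1_19]
}
```

Both calls land on the same attribute path, which mirrors the point above: the canonical name widens the search, while the raw attribute path keeps working as before.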

Resolving Packages

Using canonical names makes for a much better user experience, but it also introduces some ambiguity. When a user installs a package such as go@1.20, Devbox needs to somehow decide which nixpkgs commit and attribute path to use. In Nix terms, Devbox needs to resolve its input ({name}@{version}) to a flake reference (nixpkgs/{commit}#{attribute_path}).
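The two ends of that resolution are easy to sketch in Go: splitting the user's input and formatting the result. `ParsePackage` and `FlakeRef` are hypothetical helper names, and the commit hash in the example is a placeholder. Only the {name}@{version} input and the nixpkgs/{commit}#{attribute_path} output shapes come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// ParsePackage splits "name@version" into its parts. If the input has no
// "@", the whole input is the name and the version is empty.
func ParsePackage(input string) (string, string) {
	name, version, _ := strings.Cut(input, "@")
	return name, version
}

// FlakeRef formats a resolved package as nixpkgs/{commit}#{attribute_path}.
func FlakeRef(commit, attrPath string) string {
	return fmt.Sprintf("nixpkgs/%s#%s", commit, attrPath)
}

func main() {
	name, version := ParsePackage("go@1.19.11")
	fmt.Println(name, version) // prints go 1.19.11

	// "abc123" stands in for a real nixpkgs commit hash.
	fmt.Println(FlakeRef("abc123", "go_1_19")) // prints nixpkgs/abc123#go_1_19
}
```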

The exact details of how Devbox resolves package versions are a topic for another blog post, but the basic rules it follows are:

  1. If a package version exists in multiple commits, pick the latest one.
  2. If a package version has multiple attribute paths, pick the shortest one.
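Here is a minimal Go sketch of those two rules. The `Candidate` type, its fields, and the `Resolve` function are hypothetical. We also assume that "latest" is decided by a commit timestamp and that rule 1 is applied before rule 2; the rules above don't say how the two interact, so treat that ordering as a guess.

```go
package main

import "fmt"

// Candidate is one (commit, attribute path) pair that provides the
// requested version. The type and its fields are hypothetical.
type Candidate struct {
	Commit     string
	CommitUnix int64 // commit time in Unix seconds (assumed to be indexed)
	AttrPath   string
}

// better reports whether a should be preferred over b: the later commit
// wins, and the shorter attribute path breaks ties.
func better(a, b Candidate) bool {
	if a.CommitUnix != b.CommitUnix {
		return a.CommitUnix > b.CommitUnix
	}
	return len(a.AttrPath) < len(b.AttrPath)
}

// Resolve picks the preferred candidate, or returns false if there are none.
func Resolve(cands []Candidate) (Candidate, bool) {
	if len(cands) == 0 {
		return Candidate{}, false
	}
	best := cands[0]
	for _, c := range cands[1:] {
		if better(c, best) {
			best = c
		}
	}
	return best, true
}

func main() {
	// Placeholder commit hashes and timestamps.
	best, ok := Resolve([]Candidate{
		{Commit: "aaa111", CommitUnix: 1688169600, AttrPath: "go"},
		{Commit: "bbb222", CommitUnix: 1690848000, AttrPath: "go_1_20"},
		{Commit: "bbb222", CommitUnix: 1690848000, AttrPath: "go"},
	})
	fmt.Println(best.Commit, best.AttrPath, ok) // prints bbb222 go true
}
```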

Once Devbox picks a commit and attribute path, it locks it in a devbox.lock file as a flake ref. This does two important things. First, it ensures that package versions don't change unless the user runs devbox update. Second, it keeps the development environment compatible with Nix. If Devbox ever goes away, everything still exists as a plain flake.nix and flake.lock in your project's .devbox directory.

Conclusion

All of the above came together as the first iteration of devbox search. It ate a ton of memory, was missing a bunch of versions, and wasn't very efficient, but it proved a few things for us:

  1. Users really liked this feature. Many told us that granular package versioning was the killer feature that got them using Devbox.
  2. Doing a prefix search for packages worked really well. We wouldn't need anything fancier like full-text search.
  3. The whole thing actually worked.

In part 2 of this blog post, we'll go into more detail on how we improved the search API by automating the indexing pipeline and moving the index out of memory and into SQLite.

Stay up to Date with Jetify

If you're reading this, we'd love to hear from you about how you've been using Devbox for your projects. You can follow us on Twitter, or chat with our developers live on our Discord server. We also welcome issues and pull requests on our GitHub repo.