Tools like GitHub Copilot help developers write code faster, but what is faster than using code someone else has already written? Package managers are how software developers share and use others' code. But ask any developer: package managers are universally hated. They seem like a constant source of new bugs and frustrations, and usually, they are. Still, the benefit of sharing code is so great that it outweighs nearly any cost. Every developer relies on someone else's code.
What is a Package?
There are two types of package managers I'm talking about - one at the operating system level that distributes binaries (usually thought of as "apps"), and one at the programming language level that distributes source code.
I loosely define a package as an archive and its metadata. The archive can be a compiled executable or simply source code. The metadata includes the version, the dependencies the package needs to run, documentation, and a checksum - verification that the contents of what was downloaded precisely match the intended contents.
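To make that definition concrete, here is a minimal sketch of a package as an archive plus metadata, with a checksum check. The `Package` class, its field names, and the choice of SHA-256 are my own illustration, not the format of any real package manager.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical package record: an archive plus its metadata."""
    name: str
    version: str
    archive: bytes   # compiled executable or source code
    sha256: str      # checksum published alongside the archive
    dependencies: list[str] = field(default_factory=list)

    def verify(self) -> bool:
        """Return True if the downloaded bytes match the published checksum."""
        return hashlib.sha256(self.archive).hexdigest() == self.sha256


payload = b"print('hello')\n"
pkg = Package(
    name="hello",
    version="1.0.0",
    archive=payload,
    sha256=hashlib.sha256(payload).hexdigest(),
    dependencies=["greeter"],
)

# Same metadata, but the archive was altered in transit.
tampered = Package("hello", "1.0.0", payload + b"#", pkg.sha256)

print(pkg.verify())       # True
print(tampered.verify())  # False
```

The checksum only proves the bytes match what the metadata promised; it says nothing about whether the metadata itself can be trusted.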
A package manager has a few primary responsibilities: (1) Installing, maintaining, and removing packages. (2) Dependency resolution. The second deserves its own blog post, but in short, dependency resolution is working out which other packages a package depends on, and which versions of them to install. You can run into circular dependencies, long chains, conflicts, and more. It is a hard problem, and provably so: in its general form, with version constraints and conflicts, it is NP-complete. NP stands for nondeterministic polynomial time; in practice, it means no known algorithm solves every instance efficiently, and the worst-case running time can grow exponentially as the problem size increases.
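The easy half of dependency resolution can be sketched in a few lines: given a graph of what depends on what, produce an install order and detect circular dependencies. This is a depth-first topological sort, not the NP-complete version-selection problem, and the `install_order` function and the package names are hypothetical.

```python
def install_order(graph: dict[str, list[str]]) -> list[str]:
    """Return packages ordered so each one comes after its dependencies.

    Raises ValueError if the graph contains a circular dependency.
    """
    order: list[str] = []
    state: dict[str, str] = {}  # "visiting" while in progress, "done" after

    def visit(pkg: str, path: list[str]) -> None:
        if state.get(pkg) == "done":
            return
        if state.get(pkg) == "visiting":
            cycle = " -> ".join(path[path.index(pkg):] + [pkg])
            raise ValueError(f"circular dependency: {cycle}")
        state[pkg] = "visiting"
        for dep in graph.get(pkg, []):
            visit(dep, path + [pkg])
        state[pkg] = "done"
        order.append(pkg)

    for pkg in graph:
        visit(pkg, [])
    return order


deps = {
    "app": ["web", "db"],
    "web": ["http"],
    "db": ["http"],
    "http": [],
}
print(install_order(deps))  # ['http', 'web', 'db', 'app']
```

Calling `install_order({"a": ["b"], "b": ["a"]})` raises a `ValueError` naming the cycle. Real resolvers also have to pick versions that satisfy every constraint at once, which is where the hard part lives.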
How is code shared?
Code is usually statically or dynamically linked. Statically linked means that an application is shipped with all of its dependencies built into it. Dynamically linked means that the application loads a shared library at run time, and that library may be shared with other packages on the same system. Dynamic linking is usually more space-efficient but opens up more issues, such as what happens when two packages require two different versions of a common dependency.
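The version problem with shared dependencies is easy to show in miniature. This sketch assumes the simplest possible model, where each package pins one exact version of each dependency; the `find_conflicts` helper and the package names and versions are made up for illustration.

```python
from collections import defaultdict


def find_conflicts(requirements: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """Map each contested dependency to the versions requested and who requests them.

    requirements has the shape {package: {dependency: pinned_version}}.
    """
    wanted: dict[str, dict[str, list[str]]] = defaultdict(dict)
    for pkg, deps in requirements.items():
        for dep, version in deps.items():
            wanted[dep].setdefault(version, []).append(pkg)
    # A dependency is contested when more than one version is requested.
    return {dep: versions for dep, versions in wanted.items() if len(versions) > 1}


requirements = {
    "editor": {"libssl": "1.1"},
    "browser": {"libssl": "3.0"},
    "mailer": {"libssl": "3.0", "zlib": "1.2"},
}
print(find_conflicts(requirements))
# {'libssl': {'1.1': ['editor'], '3.0': ['browser', 'mailer']}}
```

With static linking, each application would carry its own copy of the library and the conflict would not arise, at the cost of disk space and of having to rebuild every application to pick up a fix.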
Three trends driving Package Management
- More code, more reuse. Back in the day, developers would commonly "roll their own," which meant writing their own implementation of a particular algorithm or function. Now, so much code is out there and discoverable (through GitHub) that we can find code that serves our purpose without having to write it ourselves.
- Containers. You can think of containers as a high-level package. Containers are a reasonably new abstraction - made popular in the last few years (I worked on containers at Google). Inevitably, packages have dependencies both at the code level and the operating system level. Containers allow developers to specify both in a package.
- More reuse, larger dependency graphs. The dependencies between software are getting so complex that it's difficult to reason out what depends on what. For example, in Google's large codebase, there were commonly very confusing circular dependencies.
Open problems
- Different programming languages have different package managers. Can we take common problems and generically solve them?
- There is no container package manager. If containers are meta-packages, then we need a package manager for the meta-packages too.
- Operating system package managers are antiquated and not fit for the future. They were developed for consumer use but aren't suitable for large-scale cloud deployments.
- No discovery tools for packages besides GitHub. How do developers find out what packages can satisfy their requirements? There are few ways for package authors to reach potential users directly. AdWords for developers? Better distribution?