Package Manager

Getting and installing packages

Terms defined: backward-compatible, combinatorial explosion, heuristic, manifest, patch, prune, SAT solver, semantic versioning

There is no point building software if you can't install it. Inspired by the Comprehensive TeX Archive Network CTAN, most languages now have an online archive from which developers can download packages. Each package typically has a name and one or more version(s), each of which may have a list of dependencies, and the package may specify a version or range of versions for each of those dependencies.

Downloading files requires some web programming that is out of scope for this book, while installing those files uses the systems programming skills of . The piece we are missing is a way to figure out exactly what versions of different packages to install in order to create a consistent setup. If packages A and B require different versions of C, it might not be possible to use A and B together. On the other hand, if each one requires a range of versions of C, and those ranges overlap, we might be able to find a combination that works.

This chapter explores how to find a workable installation or prove that there isn't one. It is based in part on this tutorial by Maël Nison.

Satisfiability

What we are trying to do is find a version for each package that makes the assertion "P is compatible with all its dependencies" true for every package P. The general-purpose tools for doing this are called SAT solvers because they determine whether there is some assignment of values that satisfies the claim (i.e., makes it true). Finding a solution can be extremely hard in the general case, so most SAT solvers use heuristics to try to reduce the work.

What is semantic versioning?

Most software projects use semantic versioning for software releases. Each version number consists of three integers X.Y.Z, where X is the major version, Y is the minor version, and Z is the patch number. The full specification allows for more fields, but we will ignore them.

A package's authors increment its major version number every time something changes in a way that makes the package incompatible with previous versions (e.g., add a required parameter to a function). They increment the minor version number when they new functionality in a backward-compatible manner (i.e., without breaking any existing code), and change the patch number for backwards-compatible bug fixes that don't add any new features. The notation for specifying a project's dependencies looks a lot like arithmetic: >= 1.2.3 means "any version after 1.2.3", < 4 means "any version before 4.anything", and 1.0 - 3.1 means "any version in the specified range (including patches)". Note that version 2.1 is greater than version 1.99.

The semver module provides useful functions for working with semantic version identifier. semver.valid('1.2.3') checks that 1.2.3 is a valid version identifier, while semver.satisfies('2.2', '1.0 - 3.1') checks that its first argument is compatible with the range specified in its second.

How can we find a consistent set of packages?

Imagine that each package we need is represented as an axis on a graph, with its versions as the tick marks (). Each point on the graph is then a possible combination of package versions. We can block out regions of this graph using the constraints on the package versions; whatever points are left when we're done are legal installations.

Allowable versions
Finding allowable combinations of package versions.

For example, suppose we have the set of requirements shown in . There are 18 possible configurations (2 for X × 3 for Y × 3 for Z) but 16 are excluded by various incompatibilities. Of the two remaining possibilities, X/2 + Y/3 + Z/3 is strictly greater than X/2 + Y/2 + Z/2, so we would probably choose the former (). if we wound up with A/1 + B/2 versus A/2 + B/1, we would have to add rules about how to resolve ties.

Example package dependencies.
Package Requires
X/1 Y/1-2
X/1 Z/1
X/2 Y/2-3
X/2 Z/1-2
Y/1 Z/2
Y/2 Z/2-3
Y/3 Z/3
Z/1
Z/2
Z/3
Result for example package dependencies.
X Y Z Excluded
1 1 1 Y/1 - Z/1
1 1 2 X/1 - Z/2
1 1 3 X/1 - Z/3
1 2 1 Y/2 - Z/1
1 2 2 X/1 - Z/2
1 2 3 X/1 - Z/3
1 3 1 X/1 - Y/3
1 3 2 X/1 - Y/3
1 3 3 X/1 - Y/3
2 1 1 X/2 - Y/1
2 1 2 X/2 - Y/1
2 1 3 X/2 - Y/1
2 2 1 Y/2 - Z/1
2 2 2
2 2 3 X/2 - Z/3
2 3 1 Y/3 - Z/1
2 3 2 Y/3 - Z/2
2 3 3 X/2 - Z/3

To construct this table we found the transitive closure of all packages plus all of their dependencies. We then picked two and created a list of valid pairs. Choosing a third, we cross off pairs that can't be satisfied to leave triples of legal combinations. We repeat this until all packages are included in our table.

In the worst case this will create a combinatorial explosion of possibilities. Smart algorithms will try to pick additions that minimize the number of new possibilities added, or create pairs and then combine them to create pairs of pairs and so on. Our algorithm will be simple (and therefore slow), but illustrates the key idea.

How can we implement constraint satisfaction?

To avoid messing around with parsers, our programs reads a JSON data structure describing the problem. A real package manager would read the manifests of the packages needed and construct this structure. We will stick to single-digit version numbers for readability, and will use this as our first test case:

{
  "X": {
    "1": {
      "Y": ["1"]
    },
    "2": {
      "Y": ["2"]
    }
  },
  "Y": {
    "1": {},
    "2": {}
  }
}

Comments

If you ever design a data format, please include a standard way for people to add comments, because they will always want to. YAML has this, but JSON CSV don't.

To check if a configuration of specific versions of all packages is compatible with a manifest, we add each package to our active list in turn and look for violations. If there aren't any more packages on the active list and we haven't found a violation, then what we have must be a legal configuration.

import configStr from './config-str.js'

const sweep = (manifest) => {
  const names = Object.keys(manifest)
  const result = []
  recurse(manifest, names, {}, result)
}

const recurse = (manifest, names, config, result) => {
  if (names.length === 0) {
    if (allows(manifest, config)) {
      result.push({ ...config })
    }
  } else {
    const next = names[0]
    const rest = names.slice(1)
    for (const version in manifest[next]) {
      config[next] = version
      recurse(manifest, rest, config, result)
    }
  }
}
...
export default sweep

The simplest way to find configurations is to sweep over all possibilities. For debugging purposes, our function prints possibilities as it goes:

const allows = (manifest, config) => {
  for (const [leftN, leftV] of Object.entries(config)) {
    const requirements = manifest[leftN][leftV]
    for (const [rightN, rightVAll] of Object.entries(requirements)) {
      if (!rightVAll.includes(config[rightN])) {
        const title = configStr(config)
        const missing = config[rightN]
        console.log(`${title} @ ${leftN}/${leftV} ${rightN}/${missing}`)
        return false
      }
    }
  }
  console.log(configStr(config))
  return true
}

If we run this program on the two-package example shown earlier we get this output:

node driver.js ./sweep.js double-chained.json
{X:1 Y:1}
{X:1 Y:2} @ X/1 Y/2
{X:2 Y:1} @ X/2 Y/1
{X:2 Y:2}

When we run it on our triple-package example we get this:

node driver.js ./sweep.js triple.json
{X:1 Y:1 Z:1} @ Y/1 Z/1
{X:1 Y:1 Z:2} @ X/1 Z/2
{X:1 Y:1 Z:3} @ X/1 Z/3
{X:1 Y:2 Z:1} @ Y/2 Z/1
{X:1 Y:2 Z:2} @ X/1 Z/2
{X:1 Y:2 Z:3} @ X/1 Z/3
{X:1 Y:3 Z:1} @ X/1 Y/3
{X:1 Y:3 Z:2} @ X/1 Y/3
{X:1 Y:3 Z:3} @ X/1 Y/3
{X:2 Y:1 Z:1} @ X/2 Y/1
{X:2 Y:1 Z:2} @ X/2 Y/1
{X:2 Y:1 Z:3} @ X/2 Y/1
{X:2 Y:2 Z:1} @ Y/2 Z/1
{X:2 Y:2 Z:2}
{X:2 Y:2 Z:3} @ X/2 Z/3
{X:2 Y:3 Z:1} @ Y/3 Z/1
{X:2 Y:3 Z:2} @ Y/3 Z/2
{X:2 Y:3 Z:3} @ X/2 Z/3

This works, but it is doing a lot of unnecessary work. If we sort the output by the case that caught the exclusion it turns out that 9 of the 17 exclusions are redundant rediscovery of a previous-known problem:

Package exclusions.
Excluded X Y Z
X/1 - Y/3 1 3 1
1 3 2
1 3 3
X/1 - Z/2 1 1 2
1 2 2
X/1 - Z/3 1 1 3
1 2 3
X/2 - Y/1 2 1 1
2 1 2
2 1 3
X/2 - Z/3 2 2 3
2 3 3
Y/1 - Z/1 1 1 1
Y/2 - Z/1 1 2 1
2 2 1
Y/3 - Z/1 2 3 1
2 3 2
2 2 2

How can we do less work?

In order to make this more efficient we need to prune the search tree as we go along (). After all, if X and Y are incompatible, there is no need to check Z.

Pruning the search tree
Pruning options in the search tree to reduce work.

This version of the program collects possible solutions and displays them at the end. It only keeps checking a partial solution if what it has found so far looks good:

import configStr from './config-str.js'

const prune = (manifest) => {
  const names = Object.keys(manifest)
  const result = []
  recurse(manifest, names, {}, result)
  for (const config of result) {
    console.log(configStr(config))
  }
}

const recurse = (manifest, names, config, result) => {
  if (names.length === 0) {
    result.push({ ...config })
  } else {
    const next = names[0]
    const rest = names.slice(1)
    for (const version in manifest[next]) {
      config[next] = version
      if (compatible(manifest, config)) {
        recurse(manifest, rest, config, result)
      }
      delete config[next]
    }
  }
}
...
const report = (config, leftN, leftV, rightN, rightV) => {
  const title = configStr(config)
  console.log(`${title} @ ${leftN}/${leftV} ${rightN}/${rightV}`)
}

export default prune

The compatible function checks to see if adding something will leave us with a consistent configuration:

const compatible = (manifest, config) => {
  for (const [leftN, leftV] of Object.entries(config)) {
    const leftR = manifest[leftN][leftV]
    for (const [rightN, rightV] of Object.entries(config)) {
      if ((rightN in leftR) && (!leftR[rightN].includes(rightV))) {
        report(config, leftN, leftV, rightN, rightV)
        return false
      }
      const rightR = manifest[rightN][rightV]
      if ((leftN in rightR) && (!rightR[leftN].includes(leftV))) {
        report(config, leftN, leftV, rightN, rightV)
        return false
      }
    }
  }
  return true
}

Checking as we go gets us from 18 complete solutions to 11. One is workable and two are incomplete (representing 6 possible solutions that we didn't need to finish):

{X:1 Y:1 Z:1} @ Y/1 Z/1
{X:1 Y:1 Z:2} @ X/1 Z/2
{X:1 Y:1 Z:3} @ X/1 Z/3
{X:1 Y:2 Z:1} @ Y/2 Z/1
{X:1 Y:2 Z:2} @ X/1 Z/2
{X:1 Y:2 Z:3} @ X/1 Z/3
{X:1 Y:3} @ X/1 Y/3
{X:2 Y:1} @ X/2 Y/1
{X:2 Y:2 Z:1} @ Y/2 Z/1
{X:2 Y:2 Z:3} @ X/2 Z/3
{X:2 Y:3 Z:1} @ Y/3 Z/1
{X:2 Y:3 Z:2} @ Y/3 Z/2
{X:2 Y:3 Z:3} @ X/2 Z/3
{X:2 Y:2 Z:2}

Another way to look at the work is the number of steps in the search. The full search had 18×3 = 54 steps. Pruning leaves us with (12×3) + (2×2) = 40 steps so we have eliminated roughly 1/4 of the work.

What if we searched in the reverse order?

import configStr from './config-str.js'

// <reverse>
const reverse = (manifest) => {
  const names = Object.keys(manifest)
  names.reverse()
  const result = []
  recurse(manifest, names, {}, result)
  for (const config of result) {
    console.log(configStr(config))
  }
}
// </reverse>

const recurse = (manifest, names, config, result) => {
  if (names.length === 0) {
    result.push({ ...config })
  } else {
    const next = names[0]
    const rest = names.slice(1)
    for (const version in manifest[next]) {
      config[next] = version
      if (compatible(manifest, config)) {
        recurse(manifest, rest, config, result)
      }
      delete config[next]
    }
  }
}

const compatible = (manifest, config) => {
  for (const [leftN, leftV] of Object.entries(config)) {
    const leftR = manifest[leftN][leftV]
    for (const [rightN, rightV] of Object.entries(config)) {
      if ((rightN in leftR) && (!leftR[rightN].includes(rightV))) {
        report(config, leftN, leftV, rightN, rightV)
        return false
      }
      const rightR = manifest[rightN][rightV]
      if ((leftN in rightR) && (!rightR[leftN].includes(leftV))) {
        report(config, leftN, leftV, rightN, rightV)
        return false
      }
    }
  }
  return true
}

const report = (config, leftN, leftV, rightN, rightV) => {
  const title = configStr(config)
  console.log(`${title} @ ${leftN}/${leftV} ${rightN}/${rightV}`)
}

export default reverse
{Z:1 Y:1} @ Z/1 Y/1
{Z:1 Y:2} @ Z/1 Y/2
{Z:1 Y:3} @ Z/1 Y/3
{Z:2 Y:1 X:1} @ Z/2 X/1
{Z:2 Y:1 X:2} @ Y/1 X/2
{Z:2 Y:2 X:1} @ Z/2 X/1
{Z:2 Y:3} @ Z/2 Y/3
{Z:3 Y:1} @ Z/3 Y/1
{Z:3 Y:2 X:1} @ Z/3 X/1
{Z:3 Y:2 X:2} @ Z/3 X/2
{Z:3 Y:3 X:1} @ Z/3 X/1
{Z:3 Y:3 X:2} @ Z/3 X/2
{Z:2 Y:2 X:2}

Now we have (8×3) + (5×2) = 34 steps, i.e., we have eliminated roughly 1/3 of the work. That may not seem like a big difference, but if we go five levels deep at the same rate it cuts the work in half. There are lots of heuristics for searching trees. None are guaranteed to give better performance in every case, but most will give better performance in most cases.

Exercises

Comparing semantic versions

Write a function that takes an array of semantic version specifiers and sorts them in ascending order. Remember that 2.1 is greater than 1.99.

Parsing semantic versions

Using the techniques of , write a parser for a subset of the semantic versioning specification.

Using scoring functions

Many different combinations of package versions can be mutually compatible. One way to decide which actual combination to install is to create a scoring function that measures how good or bad a particular combination is. For example, a function could measure the "distance" between two versions as:

const score (X, Y) => {
  if (X.major !== Y.major) {
    return 100 * abs(X.major - Y.major)
  } else if (X.minor !== Y.minor) {
    return 10 * abs(X.minor - Y.minor)
  } else {
    return abs(X.patch - Y.patch)
  }
}
  1. Implement a working version of this function and use it to measure the total distance between the set of packages found by the solver and the set containing the most recent version of each package.

  2. Explain why this doesn't actually solve the original problem.

Using full semantic versions

Modify the constraint solver to use full semantic versions instead of single digits.

Regular releases

Some packages release new versions on a regular cycle, e.g., Version 2021.1 is released on March 1 of 2021, Version 2021.2 is released on September 1 of that year, version 2022.1 is released on March 1 of the following year, and so on.

  1. How does this make package management easier?

  2. How does it make it more difficult?

Writing unit tests

Write unit tests for the constraint solver using Mocha.

Generating test fixtures

Write a function that creates fixtures for testing the constraint solver:

  1. Its first argument is an object whose keys are (fake) package names and whose values are integers indicating the number of versions of that package to include in the test set, such as {'left': 3, 'middle': 2, 'right': 15}. Its second argument is a seed for random number generation.

  2. It generates one valid configuration, such as {'left': 2, 'middle': 2, 'right': 9}. (This is to ensure that there is at least one installable set of packages.)

  3. It then generates random constraints between the packages. (These may or may not result in other installable combinations.) When this is done, it adds constraints so that the valid configuration from the previous step is included.

Searching least first

Rewrite the constraint solver so that it searches packages by looking at those with the fewest available versions first. Does this reduce the amount of work done for the small examples in this chapter? Does it reduce the amount of work done for larger examples?

Using generators

Rewrite the constraint solver to use generators.

Using exclusions

  1. Modify the constraint solver so that it uses a list of package exclusions instead of a list of package requirements, i.e., its input tells it that version 1.2 of package Red can not work with versions 3.1 and 3.2 of package Green (which implies that Red 1.2 can work with any other versions of Green).

  2. Explain why package managers aren't built this way.