Build Manager

Updating files that depend on other files

Terms defined: automatic variable, build manager, build recipe, build rule, build target, compiled language, cycle (in a graph), dependency, directed acyclic graph (DAG), driver, interpreted language, link (a program), pattern rule, runnable documentation, stale (in build), Template Method pattern, topological order

Suppose we are using a page templating system to create a website and make a change to a single page. Our tool should translate that page, but shouldn't waste time translating others. If we change one of our template, on the other hand, the tool should realize that every page in the site is potentially affected and should automatically re-translate all of them.

Taking actions based on dependencies between files comes up in many other situations. For example, programs in compiled languages like C and Java have to be translated explicitly before they can be run. In fact, there are usually two stages to the translation: compiling each source file into some intermediate form, and then linking the compiled modules to each other and to libraries to create a runnable program (). If a source file hasn't changed, there's no need to recompile it before linking unless the interface of something it depends on has changed.

Compiling and linking
Compiling source files and linking the resulting modules.

A build manager is a tool that takes a description of what depends on what, figures out what is out of date, figures out an order in which to rebuild things, and then executes any necessary steps. They are useful even for programs written in interpreted languages like JavaScript if we want to bundle multiple files into a single loadable module () or re-create documentation from source code (). In this chapter we will create a simple build manager based on Make, Bajel, Jake, and Smith2011.

What's in a build manager?

The input to a build manager is a set of rules, each of which has:

The target of one rule can be a dependency of another rule, so the relationships between the files form a directed acyclic graph or DAG (). The graph is directed because "A depends on B" is a one-way relationship; it has to be acyclic because if something depends on itself we cannot ever finish updating it.

Respecting dependencies
How a build manager finds and respects dependencies.

We say that a target is stale if it is older than any of its dependencies. When this happens, we use the recipes to bring it up to date ().

Updating files
A build manager only updates files that are stale.

Our build manager will do four things:

  1. Read a file containing rules.

  2. Construct the dependency graph.

  3. Figure out which targets are stale.

  4. Build those targets in topological order, i.e., make sure things are built before things that depend on them are built.

Where should we start?

Our rules will be stored in YAML files like this:

- target: A
  depends:
  - B
  - C
  recipes:
  - "update A from B and C"
- target: B
  depends:
  - C
  recipes:
  - "update B from C"
- target: C
  depends: []
  recipes: []

We could equally well have used JSON, but it wouldn't have made sense to use CSV: rules have a nested structure, and CSV doesn't handle that particularly gracefully.

We are going to create our build manager in stages, so we start by writing a simple driver that loads a JavaScript source file, creates an instance of whatever class that file exports, and runs that instance with the rest of the command-line parameters:

const main = async () => {
  const BuilderClass = (await import(process.argv[2])).default
  const builder = new BuilderClass(...process.argv.slice(3))
  try {
    builder.build()
  } catch (err) {
    console.error('Build failed:', err)
  }
}

main()

This only saves us a few lines of code in this case, but we will use this idea for larger programs in future chapters.

Looking at the driver, each version of our build manager must be a class whose constructor takes a configuration file as an argument and that provides a build method. That method must create a graph from the configuration file, check that it does not contain any cycles, and then run whatever commands are needed to update stale targets. Just as we built a generic Visitor class in , we can build a generic base class for our build manager that does these steps in this order without actually implementing any of them:

import assert from 'assert'

class SkeletonBuilder {
  constructor (configFile) {
    this.configFile = configFile
  }

  build () {
    this.loadConfig()
    this.buildGraph()
    this.checkCycles()
    this.run()
  }

  loadConfig () {
    assert(false, 'not implemented')
  }

  buildGraph () {
    assert(false, 'not implemented')
  }

  checkCycles () {
    assert(false, 'not implemented')
  }

  run () {
    assert.fail('run method not implemented')
  }
}

export default SkeletonBuilder

This is a simple example of the Template Method pattern (): the parent class defines the order of the steps, while child classes fill them in. This pattern ensures that every child conceptually does the same things in the same order, even if the details vary.

Template Method pattern
The Template Method pattern in action.

We would normally implement all of the methods required by the parent class's template method at the same time. For tutorial purposes, we will write them them one by one to make the evolving code more readable. The first method we implement loads the configuration file during construction:

import assert from 'assert'
import fs from 'fs'
import yaml from 'js-yaml'

import SkeletonBuilder from './skeleton-builder.js'

class ConfigLoader extends SkeletonBuilder {
  loadConfig () {
    this.config = yaml.safeLoad(fs.readFileSync(this.configFile, 'utf-8'))

    assert(Array.isArray(this.config),
      'Configuration must be array')

    this.config.forEach(rule => {
      assert(('target' in rule) && (typeof rule.target === 'string'),
        `Rule ${JSON.stringify(rule)} does not string as 'target'`)

      assert(('depends' in rule) &&
        Array.isArray(rule.depends) &&
        rule.depends.every(dep => (typeof dep === 'string')),
        `Bad 'depends' for rule ${JSON.stringify(rule)}`)

      assert(('recipes' in rule) &&
        Array.isArray(rule.recipes) &&
        rule.recipes.every(recipe => (typeof recipe === 'string')),
        `Bad 'recipes' for rule ${JSON.stringify(rule)}`)
    })
  }
}

export default ConfigLoader

The first line does the loading; the rest of the method checks that the rules are at least superficially plausible. We need these checks because YAML is a generic file format that doesn't know anything about the extra requirements of our rules.

The next step is to turn the configuration into a graph in memory. We use graphlib to manage nodes and links rather than writing our own classes for graphs, and store the recipe to rebuild a node in that node. Two features of graphlib that took us a while to figure out are:

  1. links go from the dependency to the target, and

  2. setEdge automatically adds nodes if they aren't already present.

graphlib provides implementations of some common graph algorithms, including one to check for cycles, so we decided to write that method at this point as well:

import assert from 'assert'
import graphlib from '@dagrejs/graphlib'

import ConfigLoader from './config-loader.js'

class GraphCreator extends ConfigLoader {
  buildGraph () {
    this.graph = new graphlib.Graph()
    this.config.forEach(rule => {
      this.graph.setNode(rule.target, {
        recipes: rule.recipes
      })
      rule.depends.forEach(dep => this.graph.setEdge(dep, rule.target))
    })
  }

  checkCycles () {
    const cycles = graphlib.alg.findCycles(this.graph)
    assert.strictEqual(cycles.length, 0,
      `Dependency graph contains cycles ${cycles}`)
  }
}

export default GraphCreator

We can now create something that displays our configuration when it runs but does nothing else:

import graphlib from '@dagrejs/graphlib'

import GraphCreator from './graph-creator.js'

class DisplayOnly extends GraphCreator {
  run () {
    console.log('Graph')
    console.log(graphlib.json.write(this.graph))
    console.log('Sorted')
    console.log(graphlib.alg.topsort(this.graph))
  }
}

export default DisplayOnly

If we run this with our three simple rules as input, it shows the graph with v and w keys to represent the ends of the links ():

node driver.js ./display-only.js three-simple-rules.yml
Graph
{
  options: { directed: true, multigraph: false, compound: false },
  nodes: [
    { v: 'A', value: [Object] },
    { v: 'B', value: [Object] },
    { v: 'C', value: [Object] }
  ],
  edges: [ { v: 'B', w: 'A' }, { v: 'C', w: 'A' }, { v: 'C', w: 'B' } ]
}
Sorted
[ 'C', 'B', 'A' ]
Interpreting graph display
Interpreting the textual representation of the graph.

Let's write a quick test to make sure our cycle detector actually works:

- target: A
  depends:
  - B
  recipes:
  - "update A from B"
- target: B
  depends:
  - A
  recipes:
  - "update B from A"
node driver.js ./display-only.js circular-rules.yml
Build failed: AssertionError [ERR_ASSERTION]: Dependency graph contains \
 cycles B,A
    at DisplayOnly.checkCycles \
     (/u/stjs/build-manager/graph-creator.js:19:12)
    at DisplayOnly.build \
     (/u/stjs/build-manager/skeleton-builder.js:11:10)
    at main (/u/stjs/build-manager/driver.js:5:13) {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: 1,
  expected: 0,
  operator: 'strictEqual'
}

How can we specify that a file is out of date?

The next step is to figure out which files are out of date. Make did this by comparing the timestamps on the files in question, but this isn't always reliable across networks (because computers' clocks may be very slightly out of sync) and because the operating system may only report file update times to the nearest millisecond, which seemed like a very short time in 1970 but seems very long today. More modern systems store a hash of each file's contents and compare the current hash to the stored one to see if the file has changed.

Since we already looked at hashing in , we will use the timestamp approach in our design. For testing, we will use another configuration file to specify fake timestamps for files:

A: 2
B: 5
C: 8

Since we want to associate those timestamps with files, we add a step to buildGraph to read the timestamp file and add information to the graph's nodes:

import assert from 'assert'
import fs from 'fs'
import yaml from 'js-yaml'

import GraphCreator from './graph-creator.js'

class AddTimestamps extends GraphCreator {
  constructor (configFile, timesFile) {
    super(configFile)
    this.timesFile = timesFile
  }

  buildGraph () {
    super.buildGraph()
    this.addTimestamps()
  }

  addTimestamps () {
    const times = yaml.safeLoad(fs.readFileSync(this.timesFile, 'utf-8'))
    for (const node of Object.keys(times)) {
      assert(this.graph.hasNode(node),
             `Graph does not have node ${node}`)
      this.graph.node(node).timestamp = times[node]
    }
    const missing = this.graph.nodes().filter(
      n => !('timestamp' in this.graph.node(n))
    )
    assert.strictEqual(missing.length, 0,
      `Timestamp missing for node(s) ${missing}`)
  }

  run () {
    console.log(this.graph.nodes().map(
      n => `${n}: ${JSON.stringify(this.graph.node(n))}`
    ))
  }
}

export default AddTimestamps

The steps defined in SkeletonBuilder.build don't change when we do this, so people reading the code don't have to change their mental model of what it does overall. However, if we had realized in advance that we were going to want to add timestamps from a file, we would probably have added a step for that in the template method. And if someone ever wants to inject a new step between building the graph and adding timestamps, they will have to override addTimestamps and put their step at the top before calling super.addTimestamps, which will make the code a lot harder to understand. We will reflect on this in the last section of this chapter.

Before we move on, let's make sure that adding timestamps works as we want:

node driver.js ./add-timestamps.js three-simple-rules.yml add-timestamps.yml
[
  'A: {"recipes":["update A from B and C"],"timestamp":2}',
  'B: {"recipes":["update B from C"],"timestamp":5}',
  'C: {"recipes":[],"timestamp":8}'
]

How can we update out-of-date files?

To figure out which recipes to execute and in which order, we set the pretended "current time" to the latest time of any file, then look at each file from the "bottom" to the "top" in topological order. If a file is older than any of its dependencies, we update the file and its pretended timestamp to trigger an update of anything that depends on it.

We will pretend for now that updating a file takes one unit of time, so we advance our fictional clock once for each build. Using graphlib.alg.topsort to create the topological order, we get this:

import graphlib from '@dagrejs/graphlib'

import AddTimestamps from './add-timestamps.js'

class UpdateOnTimestamps extends AddTimestamps {
  run () {
    const sorted = graphlib.alg.topsort(this.graph)
    const startTime = 1 + Math.max(...sorted.map(
      n => this.graph.node(n).timestamp))
    console.log(`${startTime}: START`)
    const endTime = sorted.reduce((currTime, node) => {
      if (this.isStale(node)) {
        console.log(`${currTime}: ${node}`)
        this.graph.node(node).recipes.forEach(
          a => console.log(`    ${a}`))
        this.graph.node(node).timestamp = currTime
        currTime += 1
      }
      return currTime
    }, startTime)
    console.log(`${endTime}: END`)
  }

  isStale (node) {
    return this.graph.predecessors(node).some(
      other => this.graph.node(other).timestamp >=
        this.graph.node(node).timestamp
    )
  }
}

export default UpdateOnTimestamps

The run method:

  1. Gets a sorted list of nodes.

  2. Sets the starting time to be one unit past the largest file time.

  3. Uses Array.reduce to operate on each node (i.e., each file) in order. If that file is stale, we print the steps we would run and then update the file's timestamp. We only advance the notional current time when we do an update.

In order to check if a file is stale, we see if any of its dependencies currently have timestamps greater than or equal to its. When we run this, it seems to do the right thing:

node driver.js ./update-timestamps.js three-simple-rules.yml add-timestamps.yml
9: START
9: B
    update B from C
10: A
    update A from B and C
11: END

How can we add generic build rules?

If our website has a hundred blog posts or a hundred pages of documentation about particular JavaScript files, we don't want to have to write a hundred nearly-identical recipes. Instead, we want to be able to write generic build rules that say, "Build all things in this set the same way." To do this, we need:

We will achieve this by overriding buildGraph to replace variables in recipes with values. Once again, object-oriented programming helps us change only what we need to change, provided we divided our problem into sensible chunks in the first place.

Make provides automatic variables with names like $< and $@ to represent the parts of a rule. Our variables will be more readable: we will use @TARGET for the target, @DEPENDENCIES for the dependencies (in order), and @DEP[1], @DEP[2], and so on for specific dependencies ().

Pattern rules
Turning patterns rules into runnable commands.

Our variable expander looks like this:

import UpdateOnTimestamps from './update-timestamps.js'

class VariableExpander extends UpdateOnTimestamps {
  buildGraph () {
    super.buildGraph()
    this.expandVariables()
  }

  expandVariables () {
    this.graph.nodes().forEach(target => {
      try {
        const dependencies = this.graph.predecessors(target)
        const recipes = this.graph.node(target).recipes
        this.graph.node(target).recipes = recipes.map(act => {
          act = act
            .replace('@TARGET', target)
            .replace('@DEPENDENCIES', dependencies.join(' '))
          dependencies.forEach((dep, i) => {
            act = act.replace(`@DEP[${i}]`, dependencies[i])
          })
          return act
        })
      } catch (error) {
        console.error(`Cannot find ${target} in graph`)
        process.exit(1)
      }
    })
  }
}

export default VariableExpander

and the first thing we do is test that it works when there aren't any variables to expand by running it on the same example we used previously:

9: START
9: B
    update B from C
10: A
    update A from B C
11: END

Now we need to add pattern rules. Our first attempt at a rules file looks like this:

- target: left.out
  depends: []
  recipes: []
  timestamp: 1
- target: left.in
  depends: []
  recipes: []
  timestamp: 2
- target: right.out
  depends: []
  recipes: []
  timestamp: 1
- target: right.in
  depends: []
  recipes: []
  timestamp: 3
- target: "%.out"
  depends:
  - "%.in"
  recipes:
  - "update @TARGET from @DEPENDENCIES"

and our first attempt at reading it extracts rules before expanding variables:

import VariableExpander from './variable-expander.js'

class PatternUserAttempt extends VariableExpander {
  buildGraph () {
    super.buildGraph()
    this.extractRules()
    this.expandVariables()
  }

  extractRules () {
    this.rules = new Map()
    this.graph.nodes().forEach(target => {
      if (target.includes('%')) {
        const data = {
          recipes: this.graph.node(target).recipes
        }
        this.rules.set(target, data)
      }
    })
    this.rules.forEach((value, key) => {
      this.graph.removeNode(key)
    })
  }
}

export default PatternUserAttempt

However, that doesn't work:

Build failed: AssertionError [ERR_ASSERTION]: Graph does not have node A
    at PatternUserAttempt.addTimestamps \
     (/u/stjs/build-manager/add-timestamps.js:21:7)
    at PatternUserAttempt.buildGraph \
     (/u/stjs/build-manager/add-timestamps.js:15:10)
    at PatternUserAttempt.buildGraph \
     (/u/stjs/build-manager/variable-expander.js:5:11)
    at PatternUserAttempt.buildGraph \
     (/u/stjs/build-manager/pattern-user-attempt.js:5:11)
    at PatternUserAttempt.build \
     (/u/stjs/build-manager/skeleton-builder.js:10:10)
    at main (/u/stjs/build-manager/driver.js:5:13) {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

The problem is that our simple graph loader creates nodes for dependencies even if they aren't targets. As a result, we wind up tripping over the lack of a node for %.in before we get to extracting rules.

Errors become assertions

We didn't have the assertion in add-timestamps.js that printed the error message shown above when we first wrote it. Once we tracked down our bug, though, we added the assertion to ensure we didn't make the same mistake again, and as runnable documentation to tell the next programmer more about the code. Regular code tells the computer what to do; assertions tell the reader why it's doing it.

We can fix our problem by rewriting the rule loader to separate pattern rules from simple rules; we can tell the two apart by checking if the rule's dependencies include %. While we're here, we will timestamps as an optional field in the rules for testing purposes rather than having them in a separate file:

import assert from 'assert'
import graphlib from '@dagrejs/graphlib'

import VariableExpander from './variable-expander.js'

class PatternUserRead extends VariableExpander {
  buildGraph () {
    this.buildGraphAndRules()
    this.expandVariables()
  }

  buildGraphAndRules () {
    this.graph = new graphlib.Graph()
    this.rules = new Map()
    this.config.forEach(rule => {
      if (rule.target.includes('%')) {
        const data = {
          recipes: rule.recipes,
          depends: rule.depends
        }
        this.rules.set(rule.target, data)
      } else {
        const timestamp = ('timestamp' in rule)
          ? rule.timestamp
          : null
        this.graph.setNode(rule.target, {
          recipes: rule.recipes,
          timestamp: timestamp
        })
        rule.depends.forEach(dep => {
          assert(!dep.includes('%'),
            'Cannot have "%" in a non-pattern rule')
          this.graph.setEdge(dep, rule.target)
        })
      }
    })
  }
}

export default PatternUserRead

Before we try to run this, let's add methods to show the state of our two internal data structures:

import graphlib from '@dagrejs/graphlib'

import PatternUserRead from './pattern-user-read.js'

class PatternUserShow extends PatternUserRead {
  run () {
    console.log(JSON.stringify(this.toJSON(), null, 2))
  }

  toJSON () {
    return {
      graph: graphlib.json.write(this.graph),
      rules: Array.from(this.rules.keys()).map(key => {
        return { k: key, v: this.rules.get(key) }
      })
    }
  }
}

export default PatternUserShow
node driver.js ./pattern-user-show.js pattern-rules.yml
{
  "graph": {
    "options": {
      "directed": true,
      "multigraph": false,
      "compound": false
    },
    "nodes": [
      {
        "v": "left.out",
        "value": {
          "recipes": [],
          "timestamp": 1
        }
      },
      {
        "v": "left.in",
        "value": {
          "recipes": [],
          "timestamp": 2
        }
      },
      {
        "v": "right.out",
        "value": {
          "recipes": [],
          "timestamp": 1
        }
      },
      {
        "v": "right.in",
        "value": {
          "recipes": [],
          "timestamp": 3
        }
      }
    ],
    "edges": []
  },
  "rules": [
    {
      "k": "%.out",
      "v": {
        "recipes": [
          "update @TARGET from @DEPENDENCIES"
        ],
        "depends": [
          "%.in"
        ]
      }
    }
  ]
}

The output seems to be right, so let's try expanding rules after building the graph and rules but before expanding variables:

import PatternUserRead from './pattern-user-read.js'

class PatternUserRun extends PatternUserRead {
  buildGraph () {
    this.buildGraphAndRules()
    this.expandAllRules()
    this.expandVariables()
  }

  expandAllRules () {
    this.graph.nodes().forEach(target => {
      if (this.graph.predecessors(target).length > 0) {
        return
      }
      const data = this.graph.node(target)
      if (data.recipes.length > 0) {
        return
      }
      const rule = this.findRule(target)
      if (!rule) {
        return
      }
      this.expandRule(target, rule)
    })
  }

  findRule (target) {
    const pattern = `%.${target.split('.')[1]}`
    return this.rules.has(pattern)
      ? this.rules.get(pattern)
      : null
  }

  expandRule (target, rule) {
    const stem = target.split('.')[0]
    rule.depends
      .map(dep => dep.replace('%', stem))
      .forEach(dep => this.graph.setEdge(dep, target))
    const recipes = rule.recipes.map(act => act.replace('%', stem))
    const timestamp = this.graph.node(target).timestamp
    this.graph.setNode(target, {
      recipes: recipes,
      timestamp: timestamp
    })
  }
}

export default PatternUserRun
4: START
4: left.out
    update left.out from left.in
5: right.out
    update right.out from right.in
6: END

What should we do next?

We have added a lot of steps to our original template method, which makes it a bit of a stretch to claim that the overall operation hasn't changed. Knowing what we know now, we could go back and modify the original SkeletonBuilder.build method to include those extra steps and provide do-nothing implementations.

The root of the problem is that we didn't anticipate all the steps that would be involved when we wrote our template method. It typically takes a few child classes for this to settle down; if it never does, then Template Method is probably the wrong pattern for our situation. Realizing this isn't a failure in initial design: we always learn about our problem as we try to capture it in code, and if we can anticipate 100% of the issues that are going to come up, it's time to put what we've learned in a library for future use.

Exercises

Handle failure

  1. Modify the build manager to accommodate build steps that fail.

  2. Write Mocha tests to check that this change works correctly.

Dry run

Add an option to the build manager to show what commands would be executed and why if a build were actually run. For example, the output should display things like, "'update A' because A older than B".

Change directories

Modify the build manager so that:

node build.js -C some/sub/directory rules.yml timestamps.yml

runs the build in the specified directory rather than the current directory.

Merge files

Modify the build manager so that it can read multiple configuration files and execute their combines rules.

Show recipes

Add a method to build manager to display all unique recipes, i.e., all of the commands it might execute if asked to rebuild everything.

Conditional execution

Modify the build manager so that:

  1. The user can pass variable=true and variable=false arguments on the command-line to define variables.

  2. Rules can contain an if: variable field.

  3. Those rules are only executed if the variable is defined and true.

  4. Write Mocha tests to check that this works correctly.

Define filesets

Modify the build manager so that users can define sets of files:

fileset:
  name: everything
  contains:
    - X
    - Y
    - Z

and then refer to them later:

- target: P
  depends:
  - @everything

Globbing

Modify the build manager so that it can dynamically construct a set of files:

glob:
  name: allAvailableInputs
  pattern: "./*.in"

and then refer to them later:

- target: P
  depends:
  - @allAvailableInputs

Use hashes

  1. Write a program called build-init.js that calculates a hash for every file mentioned in the build configuration and stores the hash along with the file's name in build-hash.json.

  2. Modify the build manager to compare the current hashes of files with those stored in build-hash.json in order to determine what is out of date, and to update build-hash.json each time it runs.

Auxiliary functions

  1. Modify the builder manager so that it takes an extra argument auxiliaries containing zero or more named functions:

    const builder = new ExtensibleBuilder(configFile, timesFile, {
      slice: (node, graph) => simplify(node, graph, 1)
    })
    
  2. Modify the run method to call these functions before executing the rules for a node, and to only execute the rules if all of them return true.

  3. Write Mocha tests to check that this works correctly.