Build Manager

Updating files that depend on other files

Terms defined: Template Method pattern, automatic variable, build manager, build recipe, build rule, build target, compiled language, cycle (in a graph), dependency, directed acyclic graph, driver, interpreted language, link (a program), pattern rule, runnable documentation, stale (in build), topological order

Suppose we are using a page templating system to create a website (). If we a change a single page our tool should translate it, but shouldn't waste time translating others. If we change a template, on the other hand, the tool should realize that every page in the site is potentially affected and automatically re-translate all of them.

Choosing what actions to take based on how files depend on one another is a common pattern. For example, programs in compiled languages like C and Java have to be translated into lower-level forms before they can run. In fact, there are usually two stages to the translation: compiling each source file into some intermediate form, and then linking the compiled modules to each other and to libraries to create a runnable program (). If a source file hasn't changed, there's no need to recompile it before linking.

Compiling and linking
Compiling source files and linking the resulting modules.

A build manager takes a description of what depends on what, figures out which files are out of date, determines an order in which to rebuild things, and then executes any necessary steps. Originally created to manage compilation, they are also useful for programs written in interpreted languages like JavaScript when we want to bundle multiple modules into a single loadable file () or re-create documentation from source code (). In this chapter we will create a simple build manager based on Make, Bajel, Jake, and other systems discussed in Smith2011.

What's in a build manager?

The input to a build manager is a set of rules, each of which has:

The target of one rule can be a dependency of another rule, so the relationships between the files form a directed acyclic graph or DAG (). The graph is directed because "A depends on B" is a one-way relationship; it cannot contain cycles (or loops) because if something depends on itself we can never finish updating it. We say that a target is stale if it is older than any of its dependencies. When this happens, we use the recipes to bring it up to date.

Respecting dependencies
How a build manager finds and respects dependencies.

Our build manager must:

  1. Read a file containing rules.

  2. Construct the dependency graph.

  3. Figure out which targets are stale.

  4. Build those targets, making sure to build things before anything that depends on them is built.

Topological order

A topological ordering of a graph arranges the nodes so that every node comes after everything it depends on. For example, if A depends on both B and C, then (B, C, A) and (C, B, A) are both valid topological orders of the graph.

Where should we start?

We will store our rules in YAML files like this:

- target: A
  depends:
  - B
  - C
  recipes:
  - "update A from B and C"
- target: B
  depends:
  - C
  recipes:
  - "update B from C"
- target: C
  depends: []
  recipes: []

We could equally well have used JSON, but it wouldn't have made sense to use CSV: rules have a nested structure, and CSV doesn't represent nesting particularly gracefully.

We are going to create our build manager in stages, so we start by writing a simple driver that loads a JavaScript source file, creates an object of whatever class that file exports, and runs the .build method of that object with the rest of the command-line parameters:

const main = async () => {
  const BuilderClass = (await import(process.argv[2])).default
  const builder = new BuilderClass(...process.argv.slice(3))
  try {
    builder.build()
  } catch (err) {
    console.error('Build failed:', err)
  }
}

main()

We use the import function to dynamically load files containing in as well. It only saves us a few lines of code in this case, but we will use this idea of a general-purpose driver for larger programs in future chapters.

To work with our driver, each version of our build manager must be a class that satisfies two requirements:

  1. Its constructor must take a configuration file as an argument.

  2. It must provide a build method that needs no arguments.

The build method must create a graph from the configuration file, check that it does not contain any cycles, and then run whatever commands are needed to update stale targets. Just as we built a generic Visitor class in , we can build a generic base class for our build manager that does these steps in this order without actually implementing any of them:

import assert from 'assert'

class SkeletonBuilder {
  constructor (configFile) {
    this.configFile = configFile
  }

  build () {
    this.loadConfig()
    this.buildGraph()
    this.checkCycles()
    this.run()
  }

  loadConfig () {
    assert(false, 'not implemented')
  }

  buildGraph () {
    assert(false, 'not implemented')
  }

  checkCycles () {
    assert(false, 'not implemented')
  }

  run () {
    assert.fail('run method not implemented')
  }
}

export default SkeletonBuilder

This is an example of the Template Method design pattern: the parent class defines the order of the steps and child classes fill them in (). This design pattern ensures that every child does the same things in the same order, even if the details of how vary from case to case.

Template Method pattern
The Template Method pattern in action.

We would normally implement all of the methods required by the build method at the same time, but to make the evolving code easier to follow we will write them them one by one. The loadConfig method loads the configuration file as the builder object is being constructed:

import assert from 'assert'
import fs from 'fs'
import yaml from 'js-yaml'

import SkeletonBuilder from './skeleton-builder.js'

class ConfigLoader extends SkeletonBuilder {
  loadConfig () {
    this.config = yaml.safeLoad(fs.readFileSync(this.configFile, 'utf-8'))

    assert(Array.isArray(this.config),
      'Configuration must be array')

    this.config.forEach(rule => {
      assert(('target' in rule) && (typeof rule.target === 'string'),
        `Rule ${JSON.stringify(rule)} does not string as 'target'`)

      assert(('depends' in rule) &&
        Array.isArray(rule.depends) &&
        rule.depends.every(dep => (typeof dep === 'string')),
        `Bad 'depends' for rule ${JSON.stringify(rule)}`)

      assert(('recipes' in rule) &&
        Array.isArray(rule.recipes) &&
        rule.recipes.every(recipe => (typeof recipe === 'string')),
        `Bad 'recipes' for rule ${JSON.stringify(rule)}`)
    })
  }
}

export default ConfigLoader

The first line does the loading; the rest of the method checks that the rules are at least superficially plausible. We need these checks because YAML is a generic file format that doesn't know anything about the extra requirements of our rules. And as we first saw in , we have to specify that the character encoding of our file is UTF-8 so that JavaScript knows how to convert bytes into text.

The next step is to turn the configuration into a graph in memory. We use the graphlib module to manage nodes and links rather than writing our own classes for graphs, and store the recipe to rebuild a node in that node. Two features of graphlib that took us a while to figure out are that:

  1. links go from the dependency to the target, and

  2. setEdge automatically adds nodes if they aren't already present.

graphlib provides implementations of some common graph algorithms, including one to check for cycles, so we might as well write that method at this point as well:

import assert from 'assert'
import graphlib from '@dagrejs/graphlib'

import ConfigLoader from './config-loader.js'

class GraphCreator extends ConfigLoader {
  buildGraph () {
    this.graph = new graphlib.Graph()
    this.config.forEach(rule => {
      this.graph.setNode(rule.target, {
        recipes: rule.recipes
      })
      rule.depends.forEach(dep => this.graph.setEdge(dep, rule.target))
    })
  }

  checkCycles () {
    const cycles = graphlib.alg.findCycles(this.graph)
    assert.strictEqual(cycles.length, 0,
      `Dependency graph contains cycles ${cycles}`)
  }
}

export default GraphCreator

We can now create something that displays our configuration when it runs but does nothing else:

import graphlib from '@dagrejs/graphlib'

import GraphCreator from './graph-creator.js'

class DisplayOnly extends GraphCreator {
  run () {
    console.log('Graph')
    console.log(graphlib.json.write(this.graph))
    console.log('Sorted')
    console.log(graphlib.alg.topsort(this.graph))
  }
}

export default DisplayOnly

If we run this with our three simple rules as input, it shows the graph with v and w keys to represent the ends of the links:

node driver.js ./display-only.js three-simple-rules.yml

Graph
{
  options: { directed: true, multigraph: false, compound: false },
  nodes: [
    { v: 'A', value: [Object] },
    { v: 'B', value: [Object] },
    { v: 'C', value: [Object] }
  ],
  edges: [ { v: 'B', w: 'A' }, { v: 'C', w: 'A' }, { v: 'C', w: 'B' } ]
}
Sorted
[ 'C', 'B', 'A' ]

Let's write a quick test to make sure the cycle detector works as intended:

- target: A
  depends:
  - B
  recipes:
  - "update A from B"
- target: B
  depends:
  - A
  recipes:
  - "update B from A"

node driver.js ./display-only.js circular-rules.yml

Build failed: AssertionError [ERR_ASSERTION]: Dependency graph contains \
 cycles B,A
    at DisplayOnly.checkCycles \
     (/u/stjs/build-manager/graph-creator.js:19:12)
    at DisplayOnly.build \
     (/u/stjs/build-manager/skeleton-builder.js:11:10)
    at main (/u/stjs/build-manager/driver.js:5:13) {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: 1,
  expected: 0,
  operator: 'strictEqual'
}

How can we specify that a file is out of date?

The next step is to figure out which files are out of date. Make does this by comparing the timestamps of the files in question, but this isn't always reliable: computers' clocks may be slightly out of sync, which can produce a wrong answer on a networked filesystem, and the operating system may only report file update times to the nearest millisecond (which seemed very short in 1970 but seems very long today).

More modern build systems store a hash of each file's contents and compare the current hash to the stored one to see if the file has changed. Since we already looked at hashing in , we will use the timestamp approach here. And instead of using a mock filesystem as we did in , we will simply load another configuration file that specifies fake timestamps for files:

A: 2
B: 5
C: 8

Since we want to associate those timestamps with files, we add a step to buildGraph to read the timestamp file and add information to the graph's nodes:

import assert from 'assert'
import fs from 'fs'
import yaml from 'js-yaml'

import GraphCreator from './graph-creator.js'

class AddTimestamps extends GraphCreator {
  constructor (configFile, timesFile) {
    super(configFile)
    this.timesFile = timesFile
  }

  buildGraph () {
    super.buildGraph()
    this.addTimestamps()
  }

  addTimestamps () {
    const times = yaml.safeLoad(fs.readFileSync(this.timesFile, 'utf-8'))
    for (const node of Object.keys(times)) {
      assert(this.graph.hasNode(node),
             `Graph does not have node ${node}`)
      this.graph.node(node).timestamp = times[node]
    }
    const missing = this.graph.nodes().filter(
      n => !('timestamp' in this.graph.node(n))
    )
    assert.strictEqual(missing.length, 0,
      `Timestamp missing for node(s) ${missing}`)
  }

  run () {
    console.log(this.graph.nodes().map(
      n => `${n}: ${JSON.stringify(this.graph.node(n))}`
    ))
  }
}

export default AddTimestamps

Not quite what we were expecting

The steps defined in SkeletonBuilder.build don't change when we do this, so people reading the code don't have to change their mental model of what it does overall. However, if we had realized in advance that we were going to want to add timestamps from a file, we would probably have added a step for that in the template method. And if someone ever wants to inject a new step between building the graph and adding timestamps, they will have to override addTimestamps and put their step at the top before calling super.addTimestamps, which will make the code a lot harder to understand. We will reflect on this in the last section of this chapter.

Before we move on, let's make sure that adding timestamps works as we want:

node driver.js ./add-timestamps.js three-simple-rules.yml add-timestamps.yml

[
  'A: {"recipes":["update A from B and C"],"timestamp":2}',
  'B: {"recipes":["update B from C"],"timestamp":5}',
  'C: {"recipes":[],"timestamp":8}'
]

How can we update out-of-date files?

To figure out which recipes to execute and in which order, we set the pretended current time to the latest time of any file, then look at each file in topological order. If a file is older than any of its dependencies, we update the file and its pretended timestamp to trigger an update of anything that depends on it.

We can pretend that updating a file always takes one unit of time, so we advance our fictional clock by one for each build. Using graphlib.alg.topsort to create the topological order, we get this:

import graphlib from '@dagrejs/graphlib'

import AddTimestamps from './add-timestamps.js'

class UpdateOnTimestamps extends AddTimestamps {
  run () {
    const sorted = graphlib.alg.topsort(this.graph)
    const startTime = 1 + Math.max(...sorted.map(
      n => this.graph.node(n).timestamp))
    console.log(`${startTime}: START`)
    const endTime = sorted.reduce((currTime, node) => {
      if (this.isStale(node)) {
        console.log(`${currTime}: ${node}`)
        this.graph.node(node).recipes.forEach(
          a => console.log(`    ${a}`))
        this.graph.node(node).timestamp = currTime
        currTime += 1
      }
      return currTime
    }, startTime)
    console.log(`${endTime}: END`)
  }

  isStale (node) {
    return this.graph.predecessors(node).some(
      other => this.graph.node(other).timestamp >=
        this.graph.node(node).timestamp
    )
  }
}

export default UpdateOnTimestamps

The run method:

  1. Gets a sorted list of nodes.

  2. Sets the starting time to be one unit past the largest file time.

  3. Uses Array.reduce to operate on each node (i.e., each file) in order. If that file is stale, we print the steps we would run and then update the file's timestamp. We only advance the notional current time when we do an update.

In order to check if a file is stale, we see if any of its dependencies currently have timestamps greater than or equal to its. When we run this, it seems to do the right thing:

node driver.js ./update-timestamps.js three-simple-rules.yml add-timestamps.yml

9: START
9: B
    update B from C
10: A
    update A from B and C
11: END

How can we add generic build rules?

If our website has a hundred blog posts or a hundred pages of documentation about particular JavaScript files, we don't want to have to write a hundred nearly-identical recipes. Instead, we want to be able to write generic build rules that say, "Build all things of this kind the same way." These generic rules need:

We will achieve this by overriding buildGraph to replace variables in recipes with values. Once again, object-oriented programming helps us change only what we need to change, provided we divided our problem into sensible chunks in the first place.

Make provides automatic variables with names like $< and $@ to represent the parts of a rule. Our variables will be more readable: we will use @TARGET for the target, @DEPENDENCIES for the dependencies (in order), and @DEP[1], @DEP[2], and so on for specific dependencies ().

Pattern rules
Turning patterns rules into runnable commands.

Our variable expander looks like this:

import UpdateOnTimestamps from './update-timestamps.js'

class VariableExpander extends UpdateOnTimestamps {
  buildGraph () {
    super.buildGraph()
    this.expandVariables()
  }

  expandVariables () {
    this.graph.nodes().forEach(target => {
      try {
        const dependencies = this.graph.predecessors(target)
        const recipes = this.graph.node(target).recipes
        this.graph.node(target).recipes = recipes.map(act => {
          act = act
            .replace('@TARGET', target)
            .replace('@DEPENDENCIES', dependencies.join(' '))
          dependencies.forEach((dep, i) => {
            act = act.replace(`@DEP[${i}]`, dependencies[i])
          })
          return act
        })
      } catch (error) {
        console.error(`Cannot find ${target} in graph`)
        process.exit(1)
      }
    })
  }
}

export default VariableExpander

The first thing we do is test that it works when there aren't any variables to expand by running it on the same example we used previously:

9: START
9: B
    update B from C
10: A
    update A from B C
11: END

This is perhaps the most important reason to create tests: they tell us right away if something we have added or changed has broken something that used to work. That gives us a firm base to build on as we debug the new code.

Now we need to add pattern rules. Our first attempt at a rules file looks like this:

- target: left.out
  depends: []
  recipes: []
  timestamp: 1
- target: left.in
  depends: []
  recipes: []
  timestamp: 2
- target: right.out
  depends: []
  recipes: []
  timestamp: 1
- target: right.in
  depends: []
  recipes: []
  timestamp: 3
- target: "%.out"
  depends:
  - "%.in"
  recipes:
  - "update @TARGET from @DEPENDENCIES"

and our first attempt at reading it extracts rules before expanding variables:

import VariableExpander from './variable-expander.js'

class PatternUserAttempt extends VariableExpander {
  buildGraph () {
    super.buildGraph()
    this.extractRules()
    this.expandVariables()
  }

  extractRules () {
    this.rules = new Map()
    this.graph.nodes().forEach(target => {
      if (target.includes('%')) {
        const data = {
          recipes: this.graph.node(target).recipes
        }
        this.rules.set(target, data)
      }
    })
    this.rules.forEach((value, key) => {
      this.graph.removeNode(key)
    })
  }
}

export default PatternUserAttempt

However, that doesn't work:

Build failed: AssertionError [ERR_ASSERTION]: Graph does not have node A
    at PatternUserAttempt.addTimestamps \
    (/u/stjs/build-manager/add-timestamps.js:21:7)
    at PatternUserAttempt.buildGraph \
    (/u/stjs/build-manager/add-timestamps.js:15:10)
    at PatternUserAttempt.buildGraph \
    (/u/stjs/build-manager/variable-expander.js:5:11)
    at PatternUserAttempt.buildGraph \
    (/u/stjs/build-manager/pattern-user-attempt.js:5:11)
    at PatternUserAttempt.build \
    (/u/stjs/build-manager/skeleton-builder.js:10:10)
    at main (/u/stjs/build-manager/driver.js:5:13) {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}

The problem is that our simple graph loader creates nodes for dependencies even if they aren't targets. As a result, we wind up tripping over the lack of a node for %.in before we get to extracting rules.

Errors become assertions

When we first wrote add-timestamps.js, it didn't contain the assertion that printed the error message shown above. Once we tracked down our bug, though, we added the assertion to ensure we didn't make the same mistake again, and as runnable documentation to tell the next programmer more about the code. Regular code tells the computer what to do; assertions with meaningful error messages tell the reader why.

We can fix our problem by rewriting the rule loader to separate pattern rules from simple rules; we can tell the two apart by checking if the rule's dependencies include %. While we're here, we will enable timestamps as an optional field in the rules for testing purposes rather than having them in a separate file:

import assert from 'assert'
import graphlib from '@dagrejs/graphlib'

import VariableExpander from './variable-expander.js'

class PatternUserRead extends VariableExpander {
  buildGraph () {
    this.buildGraphAndRules()
    this.expandVariables()
  }

  buildGraphAndRules () {
    this.graph = new graphlib.Graph()
    this.rules = new Map()
    this.config.forEach(rule => {
      if (rule.target.includes('%')) {
        const data = {
          recipes: rule.recipes,
          depends: rule.depends
        }
        this.rules.set(rule.target, data)
      } else {
        const timestamp = ('timestamp' in rule)
          ? rule.timestamp
          : null
        this.graph.setNode(rule.target, {
          recipes: rule.recipes,
          timestamp: timestamp
        })
        rule.depends.forEach(dep => {
          assert(!dep.includes('%'),
            'Cannot have "%" in a non-pattern rule')
          this.graph.setEdge(dep, rule.target)
        })
      }
    })
  }
}

export default PatternUserRead

Before we try to run this, let's add methods to show the state of our two internal data structures:

import graphlib from '@dagrejs/graphlib'

import PatternUserRead from './pattern-user-read.js'

class PatternUserShow extends PatternUserRead {
  run () {
    console.log(JSON.stringify(this.toJSON(), null, 2))
  }

  toJSON () {
    return {
      graph: graphlib.json.write(this.graph),
      rules: Array.from(this.rules.keys()).map(key => {
        return { k: key, v: this.rules.get(key) }
      })
    }
  }
}

export default PatternUserShow

node driver.js ./pattern-user-show.js pattern-rules.yml

{
  "graph": {
    "options": {
      "directed": true,
      "multigraph": false,
      "compound": false
    },
    "nodes": [
      {
        "v": "left.out",
        "value": {
          "recipes": [],
          "timestamp": 1
        }
      },
      {
        "v": "left.in",
        "value": {
          "recipes": [],
          "timestamp": 2
        }
      },
      {
        "v": "right.out",
        "value": {
          "recipes": [],
          "timestamp": 1
        }
      },
      {
        "v": "right.in",
        "value": {
          "recipes": [],
          "timestamp": 3
        }
      }
    ],
    "edges": []
  },
  "rules": [
    {
      "k": "%.out",
      "v": {
        "recipes": [
          "update @TARGET from @DEPENDENCIES"
        ],
        "depends": [
          "%.in"
        ]
      }
    }
  ]
}

The output seems to be right, so let's try expanding rules after building the graph and rules but before expanding variables:

import PatternUserRead from './pattern-user-read.js'

class PatternUserRun extends PatternUserRead {
  buildGraph () {
    this.buildGraphAndRules()
    this.expandAllRules()
    this.expandVariables()
  }

  expandAllRules () {
    this.graph.nodes().forEach(target => {
      if (this.graph.predecessors(target).length > 0) {
        return
      }
      const data = this.graph.node(target)
      if (data.recipes.length > 0) {
        return
      }
      const rule = this.findRule(target)
      if (!rule) {
        return
      }
      this.expandRule(target, rule)
    })
  }

  findRule (target) {
    const pattern = `%.${target.split('.')[1]}`
    return this.rules.has(pattern)
      ? this.rules.get(pattern)
      : null
  }

  expandRule (target, rule) {
    const stem = target.split('.')[0]
    rule.depends
      .map(dep => dep.replace('%', stem))
      .forEach(dep => this.graph.setEdge(dep, target))
    const recipes = rule.recipes.map(act => act.replace('%', stem))
    const timestamp = this.graph.node(target).timestamp
    this.graph.setNode(target, {
      recipes: recipes,
      timestamp: timestamp
    })
  }
}

export default PatternUserRun

4: START
4: left.out
    update left.out from left.in
5: right.out
    update right.out from right.in
6: END

What should we do next?

We have added a lot of steps to our original template method, which makes it a bit of a stretch to claim that the overall operation hasn't changed. Knowing what we know now, we could go back and modify the original SkeletonBuilder.build method to include those extra steps and provide do-nothing implementations.

The root of the problem is that we didn't anticipate all the steps that would be involved when we wrote our template method. It typically takes a few child classes for this to settle down; if it never does, then Template Method is probably the wrong pattern for our situation. Realizing this isn't a failure in initial design: we always learn about our problem as we try to capture it in code, and if we know enough to anticipate 100% of the issues that are going to come up, it's time to put what we've learned in a library for future use.

Exercises

Handle failure

  1. Modify the build manager to accommodate build steps that fail.

  2. Write Mocha tests to check that this change works correctly.

Dry run

Add an option to the build manager to show what commands would be executed and why if a build were actually run. For example, the output should display things like, "'update A' because A older than B".

Change directories

Modify the build manager so that:

node build.js -C some/sub/directory rules.yml timestamps.yml

runs the build in the specified directory rather than the current directory.

Merge files

Modify the build manager so that it can read multiple configuration files and execute their combines rules.

Show recipes

Add a method to build manager to display all unique recipes, i.e., all of the commands it might execute if asked to rebuild everything.

Conditional execution

Modify the build manager so that:

  1. The user can pass variable=true and variable=false arguments on the command-line to define variables.

  2. Rules can contain an if: variable field.

  3. Those rules are only executed if the variable is defined and true.

  4. Write Mocha tests to check that this works correctly.

Define filesets

Modify the build manager so that users can define sets of files:

fileset:
  name: everything
  contains:
    - X
    - Y
    - Z

and then refer to them later:

- target: P
  depends:
  - @everything

Globbing

Modify the build manager so that it can dynamically construct a set of files:

glob:
  name: allAvailableInputs
  pattern: "./*.in"

and then refer to them later:

- target: P
  depends:
  - @allAvailableInputs

Use hashes

  1. Write a program called build-init.js that calculates a hash for every file mentioned in the build configuration and stores the hash along with the file's name in build-hash.json.

  2. Modify the build manager to compare the current hashes of files with those stored in build-hash.json in order to determine what is out of date, and to update build-hash.json each time it runs.

Auxiliary functions

  1. Modify the builder manager so that it takes an extra argument auxiliaries containing zero or more named functions:

    const builder = new ExtensibleBuilder(configFile, timesFile, {
      slice: (node, graph) => simplify(node, graph, 1)
    })
    
  2. Modify the run method to call these functions before executing the rules for a node, and to only execute the rules if all of them return true.

  3. Write Mocha tests to check that this works correctly.