Style Checker

Checking that code conforms to style guidelines

Terms defined: Adapter pattern, Iterator pattern, Markdown, abstract syntax tree, dynamic lookup, generator function, intrinsic complexity, linter, walk (a tree)

Programmers argue endlessly about the best way to format their programs, but everyone agrees that the most important thing is to be consistent Binkley2012,Johnson2019. Since checking rules by hand is tedious, most programmers use tools to compare code against various rules and report any violations. Programs that do this are often called linters in honor of an early one for C named lint (because it looked for fluff in source code).

In this chapter we will build a simple linter of our own inspired by ESLint, which we use to check the code in this book. Our tool will parse source code to create a data structure, then go through that data structure and apply rules for each part of the program. It will also introduce us to one of the key ideas of this book, which is that source code is just another kind of data.

Don't define your own style

Just as the world doesn't need more file format () it also doesn't need more programming styles, or more arguments among programmers about whether there should be spaces before curly braces or not. Standard JS may not do everything exactly the way you want, but adopting it increases the odds that other programmers will be able to read your code at first glance.

How can we parse JavaScript to create an AST?

A parser for a simple language like arithmetic or JSON is relatively easy to write. A parser for a language as complex as JavaScript is much more work, so we will use one called Acorn instead. Acorn takes a string containing source code as input and produces an abstract syntax tree (AST) whose nodes store information about what's in the program (). An AST is for a program what the DOM is for HTML: an in-memory representation that is easy for software to inspect and manipulate.

A small parse tree
The parse tree of a simple program.

ASTs can be quite complex—for example, the JSON representation of the AST for a single constant declaration is 84 lines long:

import acorn from 'acorn'

const ast = acorn.parse('const x = 0', { locations: true })
console.log(JSON.stringify(ast, null, 2))
{
  "type": "Program",
  "start": 0,
  "end": 11,
  "loc": {
    "start": {
      "line": 1,
      "column": 0
    },
    "end": {
...
            "value": 0,
            "raw": "0"
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "script"
}

Acorn's output is in Esprima format (so-called because it was originally defined by a tool with that name). The format's specification is very detailed, but we can usually figure out most of what we need by inspection. For example, here is the output for a 15-line program:

import acorn from 'acorn'

const program = `const value = 2

const double = (x) => {
  const y = 2 * x
  return y
}

const result = double(value)
console.log(result)
`

const ast = acorn.parse(program, { locations: true })
console.log(JSON.stringify(ast, null, 2))
{
  "type": "Program",
  "start": 0,
  "end": 122,
  "loc": {
    "start": {
      "line": 1,
      "column": 0
    },
    "end": {
...
          "line": 1,
          "column": 0
        },
        "end": {
          "line": 1,
          "column": 15
        }
      },
      "declarations": [
...480 more lines...

Yes, it really is almost 500 lines long…

How can we find things in an AST?

If we want to find functions, variables, or anything else in an AST we need to walk the tree, i.e., to visit each node in turn. The acorn-walk library will do this for us using the Visitor design pattern we first saw in If we provide a function to act on nodes of type Identifier, acorn-walk will call that function each time it finds an identifier. We can use other options to say that we want to record the locations of nodes (i.e., their line numbers) and to collect comments in an array called onComment. Our function can do whatever we want; for demonstration purposes we will add nodes to an array called state and report them all at the end ().

Walking a tree
Walking a tree to perform an operation at each node.
import acorn from 'acorn'
import walk from 'acorn-walk'

const program = `// Constant
const value = 2

// Function
const double = (x) => {
  const y = 2 * x
  return y
}

// Main body
const result = double(value)
console.log(result)
`

const options = {
  locations: true,
  onComment: []
}
const ast = acorn.parse(program, options)

const state = []
walk.simple(ast, {
  Identifier: (node, state) => {
    state.push(node)
  }
}, null, state)

state.forEach(node => console.log(
  `identifier ${node.name} on line ${node.loc.start.line}`
))
const comments = options.onComment.map(
  node => node.loc.start.line
).join(', ')
console.log(`comments on lines ${comments}`)
identifier x on line 6
identifier y on line 7
identifier double on line 11
identifier value on line 11
identifier console on line 12
identifier result on line 12
comments on lines 1, 4, 10

There's more than one way to do it

walk.simple takes four arguments:

  1. The root node of the AST, which is used as the starting point.

  2. An object containing callback functions for handling various kinds of nodes.

  3. Another object that specifies what algorithm to use—we have set this to null to use the default because we don't particularly care about the order in which the nodes are processed.

  4. Something we want passed in to each of the node handlers, which in our case is the state array. If our node handling functions don't require any extra data from one call to the next we can leave this out; if we want to accumulate information across calls, this argument acts as the Visitor's memory.

Any general-purpose implementation of the Visitor pattern is going to need these four things, but as we will see below, we can implement them in different ways.

How can we apply checks?

We don't just want to collect nodes: we want to check their properties against a set of rules. One way to do this would be to call walk.simple once for each rule, passing it a function that checks just that rule. Another way—the one we'll use—is to write a generic function that checks a rule and records any nodes that don't satisfy it, and then call that function once for each rule inside our Identifier handler. This may see like extra work, but it ensures that all of our rule-checkers store their results in the same way, which in turn means that we can write one reporting function and be sure it will handle everything.

The function applyCheck takes the current state (where we are accumulating rule violations), a label that identifies this rule (so that violations of it can be stored together), the node, and a logical value telling it whether the node passed the test or not. If the node failed the test we make sure that state contains a list with the appropriate label and then append this node to it:


const applyCheck = (state, label, node, passes) => {
  if (!passes) {
    if (!(label in state)) {
      state[label] = []
    }
    state[label].push(node)
  }
}

This "create storage space on demand" pattern is widely used but doesn't have a well-known name.

We can now put a call to applyCheck inside the handler for Identifier:


const ast = acorn.parse(program, { locations: true })

const state = {}
walk.simple(ast, {
  Identifier: (node, state) => {
    applyCheck(state, 'name_length', node, node.name.length >= 4)
  }
}, null, state)

state.name_length.forEach(
  node => console.log(`${node.name} at line ${node.loc.start.line}`))

We can't just use applyCheck as the handler for Identifier because walk.simple wouldn't know how to call it. This is a (very simple) example of the Adapter design pattern: we write a function or class to connect the code we want to call to the already-written code that is going to call it.

The output for the same sample program as before is:

x at line 6
y at line 7

The exercises will ask why the parameter x doesn't show up as a violation of our rule that variables' names must be at least four characters long.

How does the AST walker work?

The AST walker uses the Visitor pattern, but how does it actually work? We can build our own by defining a class with methods that walk the tree, take action depending on the kind of node, and then go through the children of that node (if any). The user can then derive a class of their own from this and override the set of action methods they're interested in.

One key difference between our implementation and acorn-walk's is that our methods don't need to take state as a parameter because it's contained in the object that they're part of. That simplifies the methods—one less parameter—but it does mean that anyone who wants to use our visitor has to derive a class, which is a bit more complicated than writing a function. This tradeoff is a sign that managing state is part of the problem's intrinsic complexity: we can move it around, but we can't get rid of it.

The other difference between our visitor and acorn-walk is that our class uses dynamic lookup (a form of introspection) to look up a method with the same name as the node type in the object. While we normally refer to a particular method of an object using object.method, we can also look them up by asking for object[name] in the same way that we would look up any other property of any other object. Our completed class looks like this:


class Walker {
  // Construct a new AST tree walker.
  constructor (ast) {
    this.ast = ast
  }

  // Walk the tree.
  walk (accumulator) {
    this.stack = []
    this._walk(this.ast, accumulator)
    return accumulator
  }

  // Act on node and then on children.
  _walk (node, accumulator) {
    if (node && (typeof node === 'object') && ('type' in node)) {
      this._doNode(node, accumulator)
      this._doChildren(node, accumulator)
    }
  }

  // Handle a single node by lookup.
  _doNode (node, accumulator) {
    if (node.type in this) {
      this[node.type](node, accumulator)
    }
  }

  // Recurse for anything interesting within the node.
  _doChildren (node, accumulator) {
    this.stack.push(node)
    for (const key in node) {
      if (Array.isArray(node[key])) {
        node[key].forEach(child => {
          this._walk(child, accumulator)
        })
      } else if (typeof node[key] === 'object') {
        this._walk(node[key], accumulator)
      }
    }
    this.stack.pop(node)
  }

  // Is the current node a child of some other type of node?
  _childOf (nodeTypes) {
    return this.stack &&
      nodeTypes.includes(this.stack.slice(-1)[0].type)
  }
}

The code we need to use it is:

import acorn from 'acorn'


// Walk to accumulate variable and parameter definitions.
class VariableWalker extends Walker {
  Identifier (node, accumulator) {
    if (this._childOf(['ArrowFunctionExpression',
      'VariableDeclarator'])) {
      accumulator.push(node.name)
    }
  }
}

// Test.
const program = `const value = 2

const double = (x) => {
  const y = 2 * x
  return y
}

const result = double(value)
console.log(result)
`

const ast = acorn.parse(program, { locations: true })
const walker = new VariableWalker(ast)
const accumulator = []
walker.walk(accumulator)
console.log('definitions are', accumulator)

and its output is:

definitions are [ 'value', 'double', 'x', 'y', 'result' ]

We think this approach to implementing the Visitor pattern is easier to understand and extend than one that relies on callbacks, but that could just be a reflection of our background and experience. As with code style, the most important thing is consistency: if we implement Visitor using classes in one place, we should implement it that way everywhere.

How else could the AST walker work?

A third approach to this problem uses the Iterator design pattern. Instead of taking the computation to the nodes as a visitor does, an iterator returns the elements of a complex structure one by one for processing (). One way to think about it is that the Visitor pattern encapsulates recursion, while the Iterator pattern turns everything into a for loop.

The Iterator pattern
Finding nodes in the tree using the Iterator pattern.

We can implement the Iterator pattern in JavaScript using generator functions. If we declare a function using function * (with an asterisk) instead of function then we can use the yield keyword to return a value and suspend processing to be resumed later. The result of yield is a two-part structure with a value and a flag showing whether or not processing is done:

function * threeWords () {
  yield 'first'
  yield 'second'
  yield 'third'
}

const gen = threeWords()

console.log(gen.next())
console.log(gen.next())
console.log(gen.next())
console.log(gen.next())
{ value: 'first', done: false }
{ value: 'second', done: false }
{ value: 'third', done: false }
{ value: undefined, done: true }

A generator function doesn't actually generate anything; instead, it creates an object that we can then ask for values repeatedly. This gives us a way to have several generators in play at the same time.

As another example, this generator takes a string and produces its vowels one by one:

function * getVowels (text) {
  for (const char of text) {
    if ('AEIOUaeiou'.includes(char)) {
      yield char
    }
  }
}

const test = 'this is a test'
const gen = getVowels(test)
let current = gen.next()
while (!current.done) {
  console.log(current.value)
  current = gen.next()
}
i
i
a
e

Instead of a while loop it is much more common to use for...of, which knows how to work with generators:


for (const vowel of getVowels(test)) {
  console.log(vowel)
}

Finally, just as function * says "this function is a generator", yield * says "yield the values from a nested generator one by one". We can use it to walk irregular structures like nested arrays:

function * getNodes (here) {
  if (typeof here === 'string') {
    yield here
  } else if (Array.isArray(here)) {
    for (const child of here) {
      yield * getNodes(child)
    }
  } else {
    throw new Error(`unknown type "${typeof here}"`)
  }
}

const nested = ['first', ['second', 'third']]
for (const value of getNodes(nested)) {
  console.log(value)
}

Let's use generators to count the number of expressions of various types in a program. The generator function that visits each node is:


function * getNodes (node) {
  if (node && (typeof node === 'object') && ('type' in node)) {
    yield node
    for (const key in node) {
      if (Array.isArray(node[key])) {
        for (const child of node[key]) {
          yield * getNodes(child)
        }
      } else if (typeof node[key] === 'object') {
        yield * getNodes(node[key])
      }
    }
  }
}

and the program that uses it is:


const ast = acorn.parse(program, { locations: true })
const result = {}
for (const node of getNodes(ast)) {
  if (node.type === 'BinaryExpression') {
    if (node.operator in result) {
      result[node.operator] += 1
    } else {
      result[node.operator] = 1
    }
  }
}
console.log('counts are', result)

When we run it with our usual test program as input, we get:

counts are { '*': 2, '+': 1 }

Generators are a clean solution to many hard problems, but we find it more difficult to check variable identifiers using generators than using the class-based Visitor approach because we want to accumulate violations to report later. Again, this could be a reflection of what we're used to rather than anything intrinsic; as with coding style, the most important thing is to be consistent.

What other kinds of analysis can we do?

As one final example, consider the problem of keeping track of which methods are defined where in a deeply-nested class hierarchy. (This problem comes up in some of the later chapters in this book: we wrote so many classes that incrementally extended their predecessors for pedagogical purposes that we lost track of what was defined where.) To create a table of method definitions, we first need to find the ancestors of the last class in the hierarchy:

import assert from 'assert'
import acorn from 'acorn'
import fs from 'fs'
import path from 'path'
import walk from 'acorn-walk'

class FindAncestors {
  find (dirname, filename, className) {
    return this.traceAncestry(dirname, filename, className, [])
  }

  traceAncestry (dirname, filename, className, accum) {
    const fullPath = path.join(dirname, filename)
    const program = fs.readFileSync(fullPath, 'utf-8')
    const options = { locations: true, sourceType: 'module' }
    const ast = acorn.parse(program, options)
    const classDef = this.findClassDef(filename, ast, className)
    accum.push({ filename, className, classDef })
    const ancestorName = this.getAncestor(classDef)
    if (ancestorName === null) {
      return accum
    }
    const ancestorFile = this.findImport(filename, ast, ancestorName)
    return this.traceAncestry(dirname, ancestorFile, ancestorName, accum)
  }

}

export default FindAncestors

Finding class definitions is a straightforward extension of what we have already done:


  findClassDef (filename, ast, className) {
    const state = []
    walk.simple(ast, {
      ClassDeclaration: (node, state) => {
        if ((node.id.type === 'Identifier') &&
            (node.id.name === className)) {
          state.push(node)
        }
      }
    }, null, state)
    assert(state.length === 1,
      `No definition for ${className} in ${filename}`)
    return state[0]
  }

To test this code, we start with the last of these three short files:

class Upper {
  constructor () {
    this.name = 'upper'
  }

  report () {
    console.log(this.modify(this.name))
  }

  modify (text) {
    return text.toUpperCase()
  }
}

module.exports = Upper
import Upper from './upper.js'

class Middle extends Upper {
  constructor () {
    super()
    this.range = 'middle'
  }

  modify (text) {
    return `** ${super.modify(text)} **`
  }
}

export default Middle
import Middle from './middle.js'

class Lower extends Middle {
  report () {
    console.log(this.additional())
  }

  additional () {
    return 'lower'
  }
}

export default Lower
Lower in lower.js
Middle in ./middle.js
Upper in ./upper.js

Good: we can recover the chain of inheritance. Finding method definitions is also straightforward:

import FindAncestors from './find-ancestors.js'

class FindMethods extends FindAncestors {
  find (dirname, filename, className) {
    const classes = super.find(dirname, filename, className)
    classes.forEach(record => {
      record.methods = this.findMethods(record.classDef)
    })
    return classes
  }

  findMethods (classDef) {
    return classDef.body.body
      .filter(item => item.type === 'MethodDefinition')
      .map(item => {
        if (item.kind === 'constructor') {
          return 'constructor'
        } else if (item.kind === 'method') {
          return item.key.name
        } else {
          return null
        }
      })
      .filter(item => item !== null)
  }
}

export default FindMethods

And finally, we can print a Markdown-formatted table showing which methods are defined in which class:

| method | Upper | Middle | Lower |
| ---- | ---- | ---- | ---- |
| additional | . | . | X |
| constructor | X | X | . |
| modify | X | X | . |
| report | X | . | X |

This may seem rather pointless for our toy example, but it proves its worth when we are looking at something like the virtual machine we will build in , which has a method definition table like this:

| method | DebuggerBase | DebuggerInteractive | DebuggerTest | DebuggerExit |
| ---- | ---- | ---- | ---- | ---- |
| clear | . | X | . | . |
| constructor | X | X | X | . |
| exit | . | X | . | X |
| getCommand | . | X | . | . |
| handle | . | X | . | . |
| help | . | X | . | . |
| input | . | X | X | . |
| interact | . | X | . | . |
| list | . | X | . | . |
| message | X | . | X | . |
| next | . | X | . | . |
| print | . | X | . | . |
| run | . | X | . | . |
| setTester | . | . | X | . |
| setVM | X | . | . | . |
| stop | . | X | . | . |
| variables | . | X | . | . |

Exercises

Function length

Derive a class from Walker that reports the length in lines of each function defined in the code being checked.

Expression depth

Derive a class from Walker that reports how deep each top-level expression in the source code is. For example, the depth of 1 + 2 * 3 is 2, while the depth of max(1 + 2 + 3) is 3 (one level for the function call, one for the first addition, and one for the nested addition).

Downward and upward

Modify Walker so that users can specify one action to take at a node on the way down the tree and a separate action to take on the way up. (Hint: require users to specify Nodename_downward and/or Nodename_upward methods in their class, then use string concatenation to construct method names while traversing the tree.)

Aggregating across files

Create a command-line program called sniff.js that checks for style violations in any number of source files. The first command-line argument to sniff.js must be a JavaScript source file that exports a class derived from Walker called Check that implements the checks the user wants. The other command-line arguments must be the names of JavaScript source files to be checked:

node sniff.js my-check.js source-1.js source-2.js

Finding assertions

Write a program find-assertions.js that finds all calls to assert or assert.something and prints the assertion message (if any).

Finding a missing parameter

  1. Why doesn't the parameter x show up as a rule violation in the example where we check name lengths?

  2. Modify the example so that it does.

Finding nested indexes

Write a tool that finds places where nested indexing is used, i.e., where the program contains expression like arr[table[i]].

Dynamic lookup

  1. Write a function dynamicExecution that takes an object, the name of a method, and zero or more parameters as arguments and calls that method on that object:

    dynamicExecution(obj, 'meth', 1, 'a')
    // same as obj.meth(1, 'a')
    
  2. What doesn't this work for?

Generators and arrays

  1. Write a generator that takes a two-dimensional table represented as an array of arrays and returns the values in column-major order.

  2. Write another generator that takes a similar table and returns the values in row-major order.

Generators and identifiers

Rewrite the tool to check identifier lengths using a generator.