Style Checker

Checking that code conforms to style guidelines

Terms defined: abstract syntax tree (AST), dynamic lookup, generator function, Iterator pattern, linter, Markdown, walk (a tree)

Programmers argue endlessly about the best way to format their programs, but (almost) everyone agrees that the most important thing is to be consistent. Since checking rules by hand is tedious, most programmers rely on tools that compare code against various rules and report any violations. Tools like this are called linters in honor of an early one for C named lint because it looked for fluff in source code.

In this chapter we will build a linter of our own inspired by ESLint (which we use to check the code in this book). Our tool will parse source code to create a data structure, then go through that data structure and apply rules for each part of the program. It will also introduce us to one of the key ideas of this book, which is that source code is just another kind of data.

How can we parse JavaScript to create an AST?

A parser for a simple language like arithmetic or JSON is relatively easy to write (). A parser for a language as complex as JavaScript is much more work, so we will use one called Acorn instead. Acorn takes source code as input and produces an abstract syntax tree (AST) whose nodes store information about what's in the program (). ASTs can be quite complex---for exmaple, the JSON representation of the AST for a single constant declaration is 85 lines long:

import acorn from 'acorn'

const ast = acorn.parse('const x = 0', { locations: true })
console.log(JSON.stringify(ast, null, 2))
{
  "type": "Program",
  "start": 0,
  "end": 11,
  "loc": {
    "start": {
      "line": 1,
      "column": 0
    },
    "end": {
...
            "value": 0,
            "raw": "0"
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "script"
}
A small parse tree
The parse tree of a simple program.

Acorn's output is in Esprima format. The specification is very detailed, but we can usually figure out most of what we need by inspection. For example, here's the output for a 16-line program:

import acorn from 'acorn'

const program = `const value = 2

const double = (x) => {
  const y = 2 * x
  return y
}

const result = double(value)
console.log(result)
`

const ast = acorn.parse(program, { locations: true })
console.log(JSON.stringify(ast, null, 2))
{
  "type": "Program",
  "start": 0,
  "end": 122,
  "loc": {
    "start": {
      "line": 1,
      "column": 0
    },
    "end": {
...
          "line": 1,
          "column": 0
        },
        "end": {
          "line": 1,
          "column": 15
        }
      },
      "declarations": [
...480 more lines...

How can we find things in an AST?

If we want to find functions, variables, or anything else in an AST we need to walk the tree, i.e., to visit each node in turn. The acorn-walk library will do this for us. We provide a function to act on nodes of type Identifier and use options to say that we want to record locations and to collect comments in the array onComment. Our function can do whatever we want, so for demonstration purposes we create an array called state to record declaration nodes as they're found and then report them all at the end ().

Walking a tree
Walking a tree to perform an operation at each node.
import acorn from 'acorn'
import walk from 'acorn-walk'

const program = `// Constant
const value = 2

// Function
const double = (x) => {
  const y = 2 * x
  return y
}

// Main body
const result = double(value)
console.log(result)
`

const options = {
  locations: true,
  onComment: []
}
const ast = acorn.parse(program, options)

const state = []
walk.simple(ast, {
  Identifier: (node, state) => {
    state.push(node)
  }
}, null, state)

state.forEach(node => console.log(
  `identifier ${node.name} on line ${node.loc.start.line}`
))
const comments = options.onComment.map(
  node => node.loc.start.line
).join(', ')
console.log(`comments on lines ${comments}`)
identifier x on line 6
identifier y on line 7
identifier double on line 11
identifier value on line 11
identifier console on line 12
identifier result on line 12
comments on lines 1, 4, 10

How can we apply checks?

We don't just want to collect nodes: we want to check their properties. A little function called applyCheck accumulates results on demand, building up a collection of lists:

const applyCheck = (state, label, node, passes) => {
  if (!passes) {
    if (!(label in state)) {
      state[label] = []
    }
    state[label].push(node)
  }
}

Using this, our main program is:

const ast = acorn.parse(program, { locations: true })

const state = {}
walk.simple(ast, {
  Identifier: (node, state) => {
    applyCheck(state, 'name_length', node, node.name.length >= 4)
  }
}, null, state)

state.name_length.forEach(
  node => console.log(`${node.name} at line ${node.loc.start.line}`))

and the output for the same sample program as before is:

x at line 6
y at line 7

The exercises will ask why the parameter x doesn't show up as a violation of our rule that variables' names must be at least four characters long.

How does the AST walker work?

The AST walker uses the Visitor pattern first seen in . We can build our own by defining a class with methods that walk the tree, take action depending on the kind of node, and then go through the children of that node (if any). The user can then derive a class of their own from this and override the set of action methods they're interested in.

The key difference between our visitor and acorn-walk is that our class uses dynamic lookup to look up a method with the same name as the node type in the object. While we normally refer to a particular method of an object using object.method, we can also look them up by asking for object[name] where name is a variable or expression that produces a string. We think this approach to implementing the Visitor pattern is easier to understand and extend than one that relies on callbacks, but that's a reflection of our background and experience rather than intrinsic to the code itself.

Our class looks like this:

class Walker {
  // Construct a new AST tree walker.
  constructor (ast) {
    this.ast = ast
  }

  // Walk the tree.
  walk (accumulator) {
    this.stack = []
    this._walk(this.ast, accumulator)
    return accumulator
  }

  // Act on node and then on children.
  _walk (node, accumulator) {
    if (node && (typeof node === 'object') && ('type' in node)) {
      this._doNode(node, accumulator)
      this._doChildren(node, accumulator)
    }
  }

  // Handle a single node by lookup.
  _doNode (node, accumulator) {
    if (node.type in this) {
      this[node.type](node, accumulator)
    }
  }

  // Recurse for anything interesting within the node.
  _doChildren (node, accumulator) {
    this.stack.push(node)
    for (const key in node) {
      if (Array.isArray(node[key])) {
        node[key].forEach(child => {
          this._walk(child, accumulator)
        })
      } else if (typeof node[key] === 'object') {
        this._walk(node[key], accumulator)
      }
    }
    this.stack.pop(node)
  }

  // Is the current node a child of some other type of node?
  _childOf (nodeTypes) {
    return this.stack &&
      nodeTypes.includes(this.stack.slice(-1)[0].type)
  }
}

The code we need to use it is:

import acorn from 'acorn'
...
// Walk to accumulate variable and parameter definitions.
class VariableWalker extends Walker {
  Identifier (node, accumulator) {
    if (this._childOf(['ArrowFunctionExpression',
      'VariableDeclarator'])) {
      accumulator.push(node.name)
    }
  }
}

// Test.
const program = `const value = 2

const double = (x) => {
  const y = 2 * x
  return y
}

const result = double(value)
console.log(result)
`

const ast = acorn.parse(program, { locations: true })
const walker = new VariableWalker(ast)
const accumulator = []
walker.walk(accumulator)
console.log('definitions are', accumulator)

and its output is:

definitions are [ 'value', 'double', 'x', 'y', 'result' ]

How else could the AST walker work?

Yet another approach to this problem uses the Iterator pattern. Instead of taking the computation to the nodes as a visitor does, an iterator returns the elements of a complex structure one by one for processing (). One way to think about it is that the Visitor pattern encapsulates recursion, while the Iterator pattern turns everything into a for loop.

The Iterator pattern
Finding nodes in the tree using the Iterator pattern.

We can implement the Iterator pattern in JavaScript using generator functions. If we declare a function using function * rather than just function then we can use the yield keyword to return a value and suspend processing to be resumed later. The result of yield is a two-part structure with a value and a flag showing whether or not processing is done:

function * threeWords () {
  yield 'first'
  yield 'second'
  yield 'third'
}

const gen = threeWords()

console.log(gen.next())
console.log(gen.next())
console.log(gen.next())
console.log(gen.next())
{ value: 'first', done: false }
{ value: 'second', done: false }
{ value: 'third', done: false }
{ value: undefined, done: true }

It's important to note that a generator function creates an object that we can then ask for values repeatedly. This gives us a way to have several generators in play at the same time.

This generator takes an irregular nested array of strings and yields a string and another generator (using yield* to mean "uses its values until they run out"):

function * getNodes (here) {
  if (typeof here === 'string') {
    yield here
  } else if (Array.isArray(here)) {
    for (const child of here) {
      yield * getNodes(child)
    }
  } else {
    throw new Error(`unknown type "${typeof here}"`)
  }
}

export default getNodes

We can manage iteration explicitly:

import getNodes from './generator-tree.js'

const nested = ['first', ['second', 'third']]
const gen = getNodes(nested)
let current = gen.next()
while (!current.done) {
  console.log(current.value)
  current = gen.next()
}
first
second
third

but for…of knows how to work with generators, and is the usual way to manage them:

import getNodes from './generator-tree.js'

const nested = ['first', ['second', 'third']]
for (const value of getNodes(nested)) {
  console.log(value)
}
first
second
third

Let's use this to count the number of expressions of various types in a program. The generator function that visits each node is:

function * getNodes (node) {
  if (node && (typeof node === 'object') && ('type' in node)) {
    yield node
    for (const key in node) {
      if (Array.isArray(node[key])) {
        for (const child of node[key]) {
          yield * getNodes(child)
        }
      } else if (typeof node[key] === 'object') {
        yield * getNodes(node[key])
      }
    }
  }
}

and the program that uses it is:

const ast = acorn.parse(program, { locations: true })
const result = {}
for (const node of getNodes(ast)) {
  if (node.type === 'BinaryExpression') {
    if (node.operator in result) {
      result[node.operator] += 1
    } else {
      result[node.operator] = 1
    }
  }
}
console.log('counts are', result)

When we run it with our usual test program as input, we get:

counts are { '*': 2, '+': 1 }

Generators are a clean solution to many hard problems, but since we have to maintain state ourselves, we find it more difficult to check variable identifiers using generators than using the class-based Visitor approach. Again, this could be a reflection of what we're used to rather than anything intrinsic; as with coding style, the most important thing is to be consistent.

What other kinds of analysis can we do?

As one final example, consider the problem of keeping trac of which methods are defined where in a deeply-nested class hierarchy. (This problem came up in some of the later chapters in this book: we wrote so many classes that incrementally extended their predecessors for pedagogical purposes that we lost track of what was defined where.) To create a table of method definitions, we first need to find the ancestors of the last class in the hierarchy:

import assert from 'assert'
import acorn from 'acorn'
import fs from 'fs'
import path from 'path'
import walk from 'acorn-walk'

class FindAncestors {
  find (dirname, filename, className) {
    return this.traceAncestry(dirname, filename, className, [])
  }

  traceAncestry (dirname, filename, className, accum) {
    const fullPath = path.join(dirname, filename)
    const program = fs.readFileSync(fullPath, 'utf-8')
    const options = { locations: true, sourceType: 'module' }
    const ast = acorn.parse(program, options)
    const classDef = this.findClassDef(filename, ast, className)
    accum.push({ filename, className, classDef })
    const ancestorName = this.getAncestor(classDef)
    if (ancestorName === null) {
      return accum
    }
    const ancestorFile = this.findImport(filename, ast, ancestorName)
    return this.traceAncestry(dirname, ancestorFile, ancestorName, accum)
  }
...
}

export default FindAncestors

Finding class definitions is a straightforward extension of what we have already done:

  findClassDef (filename, ast, className) {
    const state = []
    walk.simple(ast, {
      ClassDeclaration: (node, state) => {
        if ((node.id.type === 'Identifier') &&
            (node.id.name === className)) {
          state.push(node)
        }
      }
    }, null, state)
    assert(state.length === 1,
      `No definition for ${className} in ${filename}`)
    return state[0]
  }

To test this, we start with the last of these three short files:

class Upper {
  constructor () {
    this.name = 'upper'
  }

  report () {
    console.log(this.modify(this.name))
  }

  modify (text) {
    return text.toUpperCase()
  }
}

module.exports = Upper
import Upper from './upper.js'

class Middle extends Upper {
  constructor () {
    super()
    this.range = 'middle'
  }

  modify (text) {
    return `** ${super.modify(text)} **`
  }
}

export default Middle
import Middle from './middle.js'

class Lower extends Middle {
  report () {
    console.log(this.additional())
  }

  additional () {
    return 'lower'
  }
}

export default Lower
Lower in lower.js
Middle in ./middle.js
Upper in ./upper.js

Good: we can recover the chain of inheritance. Finding method definitions is also straightforward:

import FindAncestors from './find-ancestors.js'

class FindMethods extends FindAncestors {
  find (dirname, filename, className) {
    const classes = super.find(dirname, filename, className)
    classes.forEach(record => {
      record.methods = this.findMethods(record.classDef)
    })
    return classes
  }

  findMethods (classDef) {
    return classDef.body.body
      .filter(item => item.type === 'MethodDefinition')
      .map(item => {
        if (item.kind === 'constructor') {
          return 'constructor'
        } else if (item.kind === 'method') {
          return item.key.name
        } else {
          return null
        }
      })
      .filter(item => item !== null)
  }
}

export default FindMethods

And finally, we can print a Markdown-formatted table showing which methods are defined in which class:

method Upper Middle Lower
additional . . X
constructor X X .
modify X X .
report X . X

This may seem rather pointless for our toy example, but it proves its worth when we are looking at something like the virtual machine we will build in :

method DebuggerBase DebuggerInteractive DebuggerTest DebuggerExit
clear . X . .
constructor X X X .
exit . X . X
getCommand . X . .
handle . X . .
help . X . .
input . X X .
interact . X . .
list . X . .
message X . X .
next . X . .
print . X . .
run . X . .
setTester . . X .
setVM X . . .
stop . X . .
variables . X . .

Exercises

Function length

Derive a class from Walker that reports the length in lines of each function defined in the code being checked.

Expression depth

Derive a class from Walker that reports how deep each top-level expression in the source code is. For example, the depth of 1 + 2 * 3 is 2, while the depth of max(1 + 2 + 3) is 3 (one level for the function call, one for the first addition, and one for the nested addition).

Downward and upward

Modify Walker so that users can specify one action to take at a node on the way down the tree and a separate action to take on the way up. (Hint: require users to specify Nodename_downward and/or Nodename_upward methods in their class, then use string concatenation to construct method names while traversing the tree.)

Aggregating across files

Create a command-line program called sniff.js that checks for style violations in any number of source files. The first command-line argument to sniff.js must be a JavaScript source file that exports a class derived from Walker called Check that implements the checks the user wants. The other command-line arguments must be the names of JavaScript source files to be checked:

node sniff.js my-check.js source-1.js source-2.js

Finding assertions

Write a program find-assertions.js that finds all calls to assert or assert.something and prints the assertion message (if any).

Finding a missing parameter

  1. Why doesn't the parameter x show up as a rule violation in the example where we check name lengths?

  2. Modify the example so that it does.

Finding nested indexes

Write a tool that finds places where nested indexing is used, i.e., where the program contains expression like arr[table[i]].

Dynamic lookup

  1. Write a function dynamicExecution that takes an object, the name of a method, and zero or more parameters as arguments and calls that method on that object:

    dynamicExecution(obj, 'meth', 1, 'a')
    // same as obj.meth(1, 'a')
    
  2. What doesn't this work for?

Generators and arrays

  1. Write a generator that takes a two-dimensional table represented as an array of arrays and returns the values in column-major order.

  2. Write another generator that takes a similar table and returns the values in row-major order.

Generators and identifiers

Rewrite the tool to check identifier lengths using a generator.