Code Generator

Modifying code to track coverage and execution times

Terms defined: byte code, code coverage (in testing), Decorator pattern, macro, nested function

We've been writing tests since , but how much of our code do they actually exercise? One way to find out is to use a code coverage tool like Istanbul that watches a program while it executes and keeps track of which lines have run and which haven't. Running each line at least once doesn't guarantee that the code is bug-free, but any code that isn't run during tested shouldn't be trusted.

Our code coverage tool will keep track of which functions have and haven't been called. Since we don't want to rewrite Node to make a note each time every function is called, we will modify the functions themselves by parsing the code with Acorn, inserting the instructions we need into the AST, and then turning the AST back into code.

How can we replace a function with another function?

The first thing we need is a way to wrap up an arbitrary function call. If we declare a function in JavaScript with a parameter like ...args, all of the "extra" arguments in the call that don't line up with regular parameters are stuffed into args (). We can also call a function by putting values in a variable and using func(...var) to spread those values out.

Spreading parameters
Using ...args to capture and spread parameters.

We can use ...args to capture all of the arguments to a function call and forward them to another function. Let's start by creating functions with a varying number of parameters that run to completion or throw an exception, then run them to make sure they do what we want:

let funcZero = () => console.log('funcZero')

let funcOne = (first) => console.log(`funcOne(${first})`)

let funcTwo = (first, second) => console.log(`funcTwo(${first}, ${second})`)

let funcError = () => {
  console.log('funcError')
  throw new Error('from funcError')
  console.log('should not reach this')
}

const runAll = (title) => {
  console.log(title)
  funcZero()
  funcOne(1)
  funcTwo(1, 2)
  try {
    funcError()
  } catch (error) {
    console.log(`caught ${error} as expected`)
  }
  console.log()
}

runAll('first time')

We can now write a function that takes a function as an input and creates a new function that handles all of the errors in the original function:

const replace = (func) => {
  return (...args) => {
    console.log('before')
    try {
      const result = func(...args)
      console.log('after')
      return result
    } catch (error) {
      console.log('error')
      throw error
    }
  }
}

funcZero = replace(funcZero)
funcOne = replace(funcOne)
funcTwo = replace(funcTwo)
funcError = replace(funcError)

runAll('second time')

Let's try it out:

first time
funcZero
funcOne(1)
funcTwo(1, 2)
funcError
caught Error: from funcError as expected

second time
before
funcZero
after
before
funcOne(1)
after
before
funcTwo(1, 2)
after
before
funcError
error
caught Error: from funcError as expected

This is an example of the Decorator pattern. A decorator is a function whose job is to modify the behavior of other functions in some general ways. Decorators are built in to some languages (like Python), and we can add them to most others as we have done here.

How can we generate JavaScript?

We could use a decorator to replace every function in our program with one that kept track of whether or not it was called, but we would have to apply it to every one of our functions. What we really want is a way to do this automatically for everything; for that, we need to parse and generate code.

Other ways to do it

A third way to achieve what we want is to let the system turn code into runnable instructions and then modify those instructions. We can't do this because Node doesn't save the generated byte code for us to play with. In other languages, such as Java, this can be an attractive approach.

Our tool will parse the JavaScript with Acorn to create an AST, modify the AST, and then use a library called Escodegen to turn the AST back into JavaScript. To start, let's look at the Acorn parse tree for a simple function definition, which is 76 lines of pretty-printed JSON:

import acorn from 'acorn'

const text = `const func = (param) => {
  return param + 1
}`

const ast = acorn.parse(text, { sourceType: 'module' })
console.log(JSON.stringify(ast, null, 2))
{
  "type": "Program",
  "start": 0,
  "end": 46,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 46,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 6,
          "end": 46,
          "id": {
            "type": "Identifier",
            "start": 6,
            "end": 10,
            "name": "func"
          },
          "init": {
            "type": "ArrowFunctionExpression",
            "start": 13,
            "end": 46,
            "id": null,
            "expression": false,
            "generator": false,
            "async": false,
            "params": [
              {
                "type": "Identifier",
                "start": 14,
                "end": 19,
                "name": "param"
              }
            ],
            "body": {
              "type": "BlockStatement",
              "start": 24,
              "end": 46,
              "body": [
                {
                  "type": "ReturnStatement",
                  "start": 28,
                  "end": 44,
                  "argument": {
                    "type": "BinaryExpression",
                    "start": 35,
                    "end": 44,
                    "left": {
                      "type": "Identifier",
                      "start": 35,
                      "end": 40,
                      "name": "param"
                    },
                    "operator": "+",
                    "right": {
                      "type": "Literal",
                      "start": 43,
                      "end": 44,
                      "value": 1,
                      "raw": "1"
                    }
                  }
                }
              ]
            }
          }
        }
      ],
      "kind": "const"
    }
  ],
  "sourceType": "module"
}

After inspecting a few nodes, we can try to create a few of our own and turn them into code:

import escodegen from 'escodegen'

const result = escodegen.generate({
  type: 'BinaryExpression',
  operator: '+',
  left: { type: 'Literal', value: 40 },
  right: { type: 'Literal', value: 2 }
})
console.log(result)
40 + 2

How can we count how often functions are executed?

Our tool will find all the function declaration nodes in the program and insert a node to increment an entry in a global variable called __counters. Our test case is:

const TEXT = `
const funcOuter = (param) => {
  return param + 1
}
const funcInner = (param) => {
  return param + 1
}
for (const i of [1, 3, 5]) {
  funcOuter(funcInner(i) + funcInner(i))
}
`

and the main function of our program is:

const main = () => {
  const ast = acorn.parse(TEXT, { sourceType: 'module' })

  const allNodes = []
  walk.simple(ast, {
    VariableDeclarator: (node, state) => {
      if (node.init && (node.init.type === 'ArrowFunctionExpression')) {
        state.push(node)
      }
    }
  }, null, allNodes)

  const names = {}
  allNodes.forEach(node => insertCounter(names, node))
  console.log(initializeCounters(names))
  console.log(escodegen.generate(ast))
  console.log(reportCounters())
}

To insert a count we call insertCounter, which records the function's name and modifies the node:

const insertCounter = (names, node) => {
  const name = node.id.name
  names[name] = 0

  const body = node.init.body.body
  const increment =
    acorn.parse(`__counters['${name}'] += 1`, { sourceType: 'module' })
  body.unshift(increment)
}

Notice how we don't try to build the nodes by hand, but instead construct the string we need, use Acorn to parse that, and use the result. Doing this saves us from embedding multiple lines of JSON in our program and also ensures that if the AST for our code ever changes, the program will do the right thing automatically.

Finally, we need to add a couple of helper functions:

const initializeCounters = (names) => {
  const body = Object.keys(names).map(n => `'${n}': 0`).join(',\n')
  return 'const __counters = {\n' + body + '\n}'
}

const reportCounters = () => {
  return 'console.log(__counters)'
}

and to run it to make sure it all works:

const __counters = {
'funcOuter': 0,
'funcInner': 0
}
const funcOuter = param => {
        __counters['funcOuter'] += 1;
    return param + 1;
};
const funcInner = param => {
        __counters['funcInner'] += 1;
    return param + 1;
};
for (const i of [
        1,
        3,
        5
    ]) {
    funcOuter(funcInner(i) + funcInner(i));
}
console.log(__counters)

Too simple to be safe

Our simple approach doesn't work if functions can have the same names, which they can if we use modules or nested functions. One way to solve this would be to manufacture a label from the function's name and the line number in the source code.

How can we time function execution?

We can use this same strategy to do many other things. For example, we can find out how long it takes functions to run by wrapping them up in code that records the start and end time of each call. As before, we find the nodes of interest and decorate them, then stitch the result together with a bit of administrative code:

const timeFunc = (text) => {
  const ast = acorn.parse(text, { sourceType: 'module' })
  const allNodes = gatherNodes(ast)
  allNodes.forEach(node => wrapFuncDef(node))
  return [
    initializeCounters(allNodes),
    escodegen.generate(ast),
    reportCounters()
  ].join('\n')
}

Gathering nodes is straightforward:

const gatherNodes = (ast) => {
  const allNodes = []
  walk.simple(ast, {
    VariableDeclarator: (node, state) => {
      if (node.init && (node.init.type === 'ArrowFunctionExpression')) {
        state.push(node)
      }
    }
  }, null, allNodes)
  return allNodes
}

as is wrapping the function definition:

const wrapFuncDef = (originalAst) => {
  const name = originalAst.id.name
  const wrapperAst = makeWrapperAst(name)
  wrapperAst.init.body.body[0].declarations[0].init = originalAst.init
  originalAst.init = wrapperAst.init
}

The only big difference is how we make the wrapper function. We create it with a placeholder for the original function so that we have a spot in the AST to insert the actual code:

const timeFunc = (text) => {
  const ast = acorn.parse(text, { sourceType: 'module' })
  const allNodes = gatherNodes(ast)
  allNodes.forEach(node => wrapFuncDef(node))
  return [
    initializeCounters(allNodes),
    escodegen.generate(ast),
    reportCounters()
  ].join('\n')
}

One more test:

const __counters = {
'assignment': 0,
'readFile': 0
}
const assignment = (...originalArgs) => {
    const originalFunc = range => {
        let j = 0;
        for (let i = 0; i < range; i += 1) {
            j = i;
        }
    };
    const startTime = Date.now();
    try {
        const result = originalFunc(...originalArgs);
        const endTime = Date.now();
        __counters['assignment'] += endTime - startTime;
        return result;
    } catch (error) {
        const endTime = Date.now();
        __counters['assignment'] += endTime - startTime;
        throw error;
    }
};
const readFile = (...originalArgs) => {
    const originalFunc = (range, filename) => {
        for (let i = 0; i < range; i += 1) {
            fs.readFileSync(filename, 'utf-8');
        }
    };
    const startTime = Date.now();
    try {
        const result = originalFunc(...originalArgs);
        const endTime = Date.now();
        __counters['readFile'] += endTime - startTime;
        return result;
    } catch (error) {
        const endTime = Date.now();
        __counters['readFile'] += endTime - startTime;
        throw error;
    }
};
const numLoops = 100000;
assignment(numLoops);
readFile(numLoops, 'index.md');
console.log(__counters)
OUTPUT
{ assignment: 1, readFile: 2807 }

Source-to-source translation is widely used in JavaScript: tools like Babel use it to turn modern features like async and await () into code that older browsers can understand. The technique is so powerful that it is built into languages like Scheme, which allow programmers to add new syntax to the language by defining macros. Depending on how carefully they are used, macros can make programs extremely elegant, practically incomprehensible, or both.

Exercises

JSON to JavaScript

Write a tool that uses Escodegen to translate simple expressions written in JSON into runnable JavaScript. For example, the tool should translate:

['+', 3, ['*', 5, 'a']]

into:

3 + (5 * a)

JavaScript to HTML

Write a function that takes nested JavaScript function calls for generating HTML like this:

div(h1('title'), p('explanation'))

and turns them into HTML like this:

<div><h1>title</h1><p>explanation</p></div>

Handling modules

Modify the code that counts the number of times a function is called to handle functions with the same name from different modules.

Tracking calls

Write a decorator that takes a function as its argument and returns a new function that behaves exactly the same way except that it keeps track of who called it.

  1. The program contains a stack where decorated functions push and pop their names as they are called and as they exit.

  2. Each time a function is called it adds a record to an array to record its name and the name at the top of the stack (i.e., the most-recently-called decorated function).

Counting classical function definitions

Modify the code generator to handle functions declared with the function keyword as well as functions declared using =>.

Recording input file size

  1. Write a program that replaces all calls to fs.readFileSync with calls to readFileSyncCount.

  2. Write the function readFileSyncCount to read and return a file using fs.readFileSync but to also record the file's name and size in bytes.

  3. Write a third function reportInputFileSizes that reports what files were read and how large they were.

  4. Write tests for these functions using Mocha and mock-fs.

Checking argument types

Write a tool that modifies functions to check the types of their arguments at run-time.

  1. Each function is replaced by a function that passes all of its arguments to checkArgs along with the function's name, then continues with the function's original operation.

  2. The first time checkArgs is called for a particular function it records the actual types of the arguments.

  3. On subsequent calls, it checks that the argument types match those of the first call and throws an exception if they do not.

Two-dimensional arrays

The function make2D takes a row length and one or more values and creates a two-dimensional array from those values:

make2D(2, 'a', 'b', 'c', 'd')
// produces [['a', 'b'], ['c', 'd']]

Write a function that searches code to find calls to make2D and replaces them with inline arrays-of-arrays. This function only has to work for calls with a fixed row length, i.e., it does not have to handle make2D(N, 'a', 'b').

From require to import

Write a function that searches code for simple calls to require and replaces them with calls to import. This function only needs to work for the simplest case; for example, if the input is:

const name = require('module')

then the output is:

import name from 'module'

Removing empty constructors

Write a function that removes empty constructors from class definitions. For example, if the input is:

class Example {
  constructor () {
  }

  someMethod () {
    console.log('some method')
  }
}

then the output should be:

class Example {
  someMethod () {
    console.log('some method')
  }
}