File Interpolator

Managing source files that have been broken into pieces

Terms defined: header file, literate programming, loader, sandbox, search path, shell variable

Many of the examples in these lessons are too long to show comfortably in one block of code on a printed page, so we needed a way to break them up. As an experiment, we wrote a custom module loader that reads a source file containing specially-formatted comments and then reads and inserts the files specified in those comments before running the code. Modern programming languages don't work this way, but C and C++ do this with header files, and static site generators do this to share fragments of HTML.

Using file inclusions
Including fragments of code to create runnable programs.

The special comments in our source files contain two fields: the text to put in the displayed version and the file to include when loading:


class Something {
  /*+ constructor + constructor.js +*/

  /*+ a long method + long_method.js +*/

  /*+ another method + another_method.js +*/
}

We got this to work, but decided to use a different approach in this book. The stumbling block was that the style-checking tool ESLint didn't know what to make of our inclusions, so we would either have to modify it or build a style checker of our own. (We will actually do that in a later chapter, but we won't go nearly as far as ESLint.)

Despite being a dead end, the inclusion tool is a good way to show how JavaScript turns source code into something it can execute. We need to be able to do this in the next couple of chapters, so we might as well tackle it now.

How can we evaluate JavaScript dynamically?

We want to display files as they are on the web and in print, but interpolate the files referenced in special comments when we load things with import. To do this, we need to understand the lifecycle of a JavaScript program. When we ask for a file, Node reads the text, translates it into runnable instructions, and runs those instructions. We can do the second and third steps whenever we want using a function called eval, which takes a string as input and executes it as if it were part of the program.

How eval works
eval vs. normal translation and execution.

This is not a good idea

eval is a security risk: arbitrary code can do arbitrary things, so if we take a string typed in by a user and execute it without any checks it could email our bookmark list to villains all over the world, erase our hard drive, or do anything else that code can do (which is pretty much anything). Browsers do their best to run code in a sandbox for safety, but Node doesn't, so it's up to us to be (very) careful.

To see eval in action, let's evaluate an expression:


console.log(eval('2 + 2'))

4

Notice that the input to eval is not 2 + 2, but rather a string containing the digit 2, a space, a plus sign, another space, and another 2. When we call eval, it translates this string using exactly the same parser that Node uses for our program and immediately runs the result.
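A small check makes the distinction concrete. It is an illustration we have added rather than one of the chapter's example files: eval only parses its argument when that argument is a string, and returns any other value unchanged.

```javascript
// The argument is a string, so eval parses it and runs the result.
const fromString = eval('2 + 2')

// Here JavaScript does the addition first, so eval receives the number 4
// and hands it back without parsing anything.
const fromNumber = eval(2 + 2)

console.log(typeof '2 + 2', fromString) // string 4
console.log(typeof (2 + 2), fromNumber) // number 4
```

Both calls produce 4, but only the first one exercises the parser.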

We can make the example a little more interesting by constructing the string dynamically:


const x = 1 
const y = 3 
const z = 5 
for (const name of ['x', 'y', 'z', 'oops']) {
  const expr = `${name} + 1`
  console.log(name, '+ 1 =', eval(expr))
}

x + 1 = 2
y + 1 = 4
z + 1 = 6
undefined:1
oops + 1
^

ReferenceError: oops is not defined
    at eval (eval at <anonymous> \
    (/u/stjs/file-interpolator/eval-loop.js:7:30), <anonymous>:1:1)
    at /u/stjs/file-interpolator/eval-loop.js:7:30
    at ModuleJob.run (internal/modules/esm/module_job.js:152:23)
    at async Loader.import (internal/modules/esm/loader.js:166:24)
    at async Object.loadESM (internal/process/esm_loader.js:68:5)

The first time the loop runs the string is 'x + 1'; since there's a variable called x in scope, eval does the addition and we print the result. The same thing happens for the variables y and z, but we get an error when we try to evaluate the string 'oops + 1' because there is no variable in scope called oops.
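If we would rather report the failure than crash, we can wrap the call in try/catch. The sketch below is not part of the original example; tryEval is a hypothetical helper name, and it relies on the fact that a direct call to eval sees the variables of the enclosing scopes.

```javascript
const x = 1

// Hypothetical helper: evaluate an expression and describe what happened
// instead of letting an exception stop the program.
const tryEval = (expr) => {
  try {
    return { ok: true, value: eval(expr) }
  } catch (err) {
    return { ok: false, error: err.name }
  }
}

console.log(tryEval('x + 1')) // { ok: true, value: 2 }
console.log(tryEval('oops + 1')) // { ok: false, error: 'ReferenceError' }
```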

eval can use whatever variables are in scope when it's called, but what happens to any variables it defines? This example creates a variable called x and runs console.log to display it, but as the output shows, x is local to the eval call just as variables created inside a function only exist during a call to that function:


const code = `
  const x = 'hello'
  console.log('x in eval is', x)
`

eval(code)
console.log('typeof x after eval', typeof x)

x in eval is hello
typeof x after eval undefined

However, eval can modify variables defined outside the text being evaluated in the same way that a function can modify global variables:


let x = 'original' 
eval('x = "modified"')
console.log('x after eval is', x)

x after eval is modified

This means that if the text we give to eval modifies a structure that is defined outside the text, that change outlives the call to eval:


const seen = {}

for (const name of ['x', 'y', 'z']) {
  const expr = `seen["${name}"] = "${name.toUpperCase()}"`
  eval(expr)
}

console.log(seen)

{ x: 'X', y: 'Y', z: 'Z' }

The examples so far have all evaluated strings embedded in the program itself, but eval doesn't care where its input comes from. Let's move the code that does the modifying into to-be-loaded.js:


// Modify a global structure defined by whoever loads us.
Seen.from_loaded_file = 'from loaded file'

This doesn't work on its own because Seen isn't defined:

/u/stjs/file-interpolator/to-be-loaded.js:3
Seen.from_loaded_file = 'from loaded file'
^

ReferenceError: Seen is not defined
    at /u/stjs/file-interpolator/to-be-loaded.js:3:1
    at ModuleJob.run (internal/modules/esm/module_job.js:152:23)
    at async Loader.import (internal/modules/esm/loader.js:166:24)
    at async Object.loadESM (internal/process/esm_loader.js:68:5)

But if we read the file and eval the text after defining Seen, it does what we want:


import fs from 'fs'

const Seen = {}

const filename = process.argv[2]
const content = fs.readFileSync(filename, 'utf-8')
console.log('before eval, Seen is', Seen)
eval(content)
console.log('after eval, Seen is', Seen)

node does-the-loading.js to-be-loaded.js

before eval, Seen is {}
after eval, Seen is { from_loaded_file: 'from loaded file' }

How can we manage files?

The source files in this book are small enough that we don't have to worry about reading them repeatedly, but we would like to avoid re-reading things unnecessarily in large systems or when there might be network delays. The usual approach is to create a cache using the Singleton pattern that we met in an earlier chapter. Whenever we want to read a file, we check to see if it's already in the cache. If it is, we use that copy; if not, we read it and add it to the cache using the file path as a lookup key.

Implementing a cache as a singleton
Using the Singleton pattern to implement a cache of loaded files.

We can write a simple cache in just a few lines of code:


import fs from 'fs'

class Cache {
  constructor () {
    this.loaded = new Map()
  }

  need (name) {
    if (this.loaded.has(name)) {
      console.log(`returning cached value for ${name}`)
      return this.loaded.get(name)
    }
    console.log(`loading ${name}`)
    const content = fs.readFileSync(name, 'utf-8')
    const result = eval(content)
    this.loaded.set(name, result)
    return result
  }
}

const cache = new Cache()

export default (name) => {
  return cache.need(name)
}

Since we are using eval, though, we can't rely on export to make things available to the rest of the program. Instead, we rely on the fact that the result of an eval call is the value of the last expression evaluated. Since a variable name on its own evaluates to the variable's value, we can create a function and then use its name to "export" it from the evaluated file:


// Define.
const report = (message) => {
  console.log(`report in import-simple.js with message "${message}"`)
}

// Export.
report
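The same trick can be tried without any files at all. This is a minimal sketch with made-up names (double, triple): it shows that a trailing variable name becomes the value of the eval call, while a declaration on its own produces nothing.

```javascript
// The last thing evaluated is the bare name `double`, so that is what
// eval returns.
const source = `
  const double = (x) => 2 * x
  double
`
const loaded = eval(source)
console.log(typeof loaded, loaded(21)) // function 42

// A declaration by itself has no value, so there is nothing to "export".
const nothing = eval('const triple = (x) => 3 * x')
console.log(nothing) // undefined
```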

To test our program, we load the implementation of the cache using import, then use it to load and evaluate another file. This example expects that "other file" to define a function, which we call in order to show that everything is working:

import need from './need-simple.js'

const imported = need('./import-simple.js')
imported('called from test-simple.js')

node test-simple.js

How can we find files?

Each of the files included in our examples is in the same directory as the file including it, but in C/C++ or a page templating system we might include a particular file in several different places. We don't want to have to put all of our files in a single directory, so we need a way to specify where to look for files that are being included.

One option is to use relative paths, but another option is to give our program a list of directories to look in. This is called a search path, and many programs use them, including Node itself. By convention, a search path is written as a colon-separated list of directories on Unix or using semi-colons on Windows. If the path to an included file starts with ./, we look for it locally; if not, we go through the directories in the search path in order until we find a file with a matching name.

Implementing a search path
Using a colon-separated list of directories as a search path.

That's just how it is

The rules about search paths in the paragraph above are a convention: somebody did it this way years ago and (almost) everyone has imitated it since. We could implement search paths some other way, but as with configuration file formats, variable naming conventions, and many other things, the last thing the world needs is more innovation.

Since the cache is responsible for finding files, it should also handle the search path. The outline of the class stays the same:


import fs from 'fs'
import path from 'path'

class Cache {
  constructor () {
    this.loaded = new Map()
    this.constructSearchPath()
  }

  need (fileSpec) {
    if (this.loaded.has(fileSpec)) {
      console.log(`returning cached value for ${fileSpec}`)
      return this.loaded.get(fileSpec)
    }
    console.log(`loading value for ${fileSpec}`)
    const filePath = this.find(fileSpec)
    const content = fs.readFileSync(filePath, 'utf-8')
    const result = eval(content)
    this.loaded.set(fileSpec, result)
    return result
  }

}

const cache = new Cache()

export default (fileSpec) => {
  return cache.need(fileSpec)
}

To get the search path, we look for the shell variable NEED_PATH. (Writing shell variables' names in upper case is another convention.) If NEED_PATH exists, we split it on colons to create a list of directories:


  constructSearchPath () {
    this.searchPath = []
    if ('NEED_PATH' in process.env) {
      this.searchPath = process.env.NEED_PATH
        .split(':')
        .filter(x => x.length > 0)
    }
  }
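The method above always splits on colons. A variation that follows the Unix/Windows convention described earlier might look like the sketch below; splitSearchPath is a hypothetical standalone helper rather than part of the Cache class, and it picks the separator from process.platform. (Node's path module provides the same value as path.delimiter.)

```javascript
// Hypothetical helper: split a search path string into a list of directories.
// The separator defaults to ';' on Windows and ':' everywhere else.
const splitSearchPath = (
  raw,
  separator = process.platform === 'win32' ? ';' : ':'
) => {
  return (raw ?? '')
    .split(separator)
    .filter(dir => dir.length > 0) // drop empty entries such as 'a::b'
}

console.log(splitSearchPath('/u/stjs/modules::/usr/lib/need', ':'))
// [ '/u/stjs/modules', '/usr/lib/need' ]
console.log(splitSearchPath(undefined)) // []
```

Treating a missing variable as an empty string keeps the "no search path" case from needing a separate branch.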

When we need to find a file we first check to see if the path is local. If it's not, we try the directories in the search path in order:


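  // Sketch inferred from the description above and from the "trying ..."
  // lines in the test output below; details are assumptions.
  find (fileSpec) {
    if (fileSpec.startsWith('./')) {
      return fileSpec
    }
    for (const dir of this.searchPath) {
      const filePath = path.join(dir, fileSpec)
      console.log(`trying ${filePath} for ${fileSpec}`)
      if (fs.existsSync(filePath)) {
        return filePath
      }
    }
    throw new Error(`Cannot find ${fileSpec}`)
  }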

To test this, we put the file to import in a subdirectory called modules:


// Define.
const report = (message) => {
  console.log(`in LEFT with message "${message}"`)
}

// Export.
report

and then put the file doing the importing in the current directory:

import need from './need-path.js'
const imported = need('imported-left.js')
imported('called from test-import-left.js')

We now need to set the variable NEED_PATH. There are many ways to do this in the shell; if we only need the variable to exist for a single command, the simplest is to write it as:

NAME=value command

right before the command (on the same line). Here's the shell command that runs our test case using $PWD to get the current working directory:

NEED_PATH=$PWD/modules/ node test-import-left.js

loading value for imported-left.js
trying /u/stjs/file-interpolator/modules/imported-left.js for \
imported-left.js
in LEFT with message "called from test-import-left.js"

Now let's create a second importable file in the modules directory:


// Define.
const report = (message) => {
  console.log(`in RIGHT with message "${message}"`)
}

// Export.
report

and load that twice to check that caching works:

import need from './need-path.js'

const imported = need('imported-right.js')
imported('called from test-import-right.js')

const alsoImported = need('imported-right.js')
alsoImported('called from test-import-right.js')

loading value for imported-right.js
trying /u/stjs/file-interpolator/modules/imported-right.js for \
imported-right.js
in RIGHT with message "called from test-import-right.js"
returning cached value for imported-right.js
in RIGHT with message "called from test-import-right.js"

How can we interpolate pieces of code?

Interpolating files is straightforward once we have this machinery in place. We modify Cache.find to return a directory and a file path, then add an interpolate method to replace special comments:


class Cache {
  // ...
  interpolate (fileDir, outer) {
    return outer.replace(Cache.INTERPOLATE_PAT,
                         (match, comment, filename) => {
      filename = filename.trim()
      const filePath = path.join(fileDir, filename)
      if (!fs.existsSync(filePath)) {
        throw new Error(`Cannot find ${filePath}`)
      }
      const inner = fs.readFileSync(filePath, 'utf-8')
      return inner
    })
  }
  // ...
}
Cache.INTERPOLATE_PAT = /\/\*\+(.+?)\+(.+?)\+\*\//g
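To see the pattern at work without touching the disk, here is a small sketch that uses the same regular expression but looks files up in a Map. The Map, the file name greet.js, and the Greeter class are all made up for illustration.

```javascript
const INTERPOLATE_PAT = /\/\*\+(.+?)\+(.+?)\+\*\//g

// Stand-in for the file system: file name -> content (hypothetical data).
const files = new Map([
  ['greet.js', "greet () { return 'hello' }"]
])

// Replace each special comment with the content of the file it names.
const interpolate = (outer) => {
  return outer.replace(INTERPOLATE_PAT, (match, comment, filename) => {
    filename = filename.trim()
    if (!files.has(filename)) {
      throw new Error(`Cannot find ${filename}`)
    }
    return files.get(filename)
  })
}

const outer = `class Greeter {
  /*+ greeting method + greet.js +*/
}
Greeter`

const text = interpolate(outer)
console.log(text)

const GreeterClass = eval(text)
console.log(new GreeterClass().greet()) // hello
```

The first capture group (the display text) is ignored when loading; only the second one, the file name, matters here.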

We can now have a file like this:


class Example {
  constructor (msg) {
    this.constructorMessage = msg
  }
  /*+ top method + import-interpolate-topmethod.js +*/
  /*+ bottom method + import-interpolate-bottommethod.js +*/
}

Example

and subfiles like this:


topMethod (msg) {
  this.bottomMethod(`(topMethod ${msg})`)
}

and this:


bottomMethod (msg) {
  console.log(`(bottomMethod ${msg})`)
}

Let's test it:

node test-import-interpolate.js

(bottomMethod (topMethod called from test-import-interpolate.js))

When this program runs, its lifecycle is:

  1. Node starts to run test-import-interpolate.js.
  2. It sees the import of need-interpolate, so it reads and evaluates that code.
  3. Doing this creates a singleton cache object.
  4. The program then calls need('./import-interpolate.js').
  5. This checks the cache: nope, nothing there.
  6. So it loads import-interpolate.js.
  7. It finds two specially-formatted comments in the text…
  8. …so it loads the file described by each one and inserts the text in place of the comment.
  9. Now that it has the complete text, it calls eval…
  10. …and stores the result of eval (which is a class) in the cache.
  11. It also returns that class.
  12. We then create an instance of that class and call its method.
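The steps above can be compressed into one self-contained sketch. It is not the chapter's need-interpolate.js: the "files" live in a Map so that nothing is read from disk, the file names are invented, and the search path and missing-file checks are left out.

```javascript
// Stand-in for the file system (hypothetical file names and contents).
const files = new Map([
  ['example.js', `class Example {
  /*+ speak method + speak.js +*/
}
Example`],
  ['speak.js', 'speak (msg) { return `spoke ${msg}` }']
])

const INTERPOLATE_PAT = /\/\*\+(.+?)\+(.+?)\+\*\//g

class Cache {
  constructor () {
    this.loaded = new Map()
    this.log = [] // records what need() did, for inspection
  }

  need (name) {
    if (this.loaded.has(name)) { // step 5: check the cache
      this.log.push(`cached ${name}`)
      return this.loaded.get(name)
    }
    this.log.push(`loading ${name}`)
    const outer = files.get(name) // step 6: load the outer file
    const text = outer.replace( // steps 7-8: fill in the comments
      INTERPOLATE_PAT,
      (match, comment, filename) => files.get(filename.trim())
    )
    const result = eval(text) // step 9: evaluate the full text
    this.loaded.set(name, result) // step 10: remember the result
    return result // step 11: hand it back
  }
}

const cache = new Cache() // step 3: the singleton
const Loaded = cache.need('example.js') // step 4
const Again = cache.need('example.js') // served from the cache this time
console.log(new Loaded().speak('once')) // step 12: prints "spoke once"
console.log(Loaded === Again, cache.log)
```

The second call to need returns the very same class object, which is what makes the cache a saving rather than just a copy.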

This works, but as we said in the introduction we decided not to use it because it didn't play well with other tools. No piece of software exists in isolation; when we evaluate a design, we always have to ask how it fits into everything else we have.

What did we do instead?

Rather than interpolating file fragments, we extract or erase parts of regular JavaScript files based on specially-formatted comments like the <fragment>...</fragment> pair shown below.

class Example {
  constructor (name) {
    this.name = name
  }

  // <fragment>
  fragment (message) {
    console.log(`${this.name}: ${message}`)
  }
  // </fragment>
}

The code that selects the part of the file we want to display is part of our page templating system. It re-extracts code for display every time the web version of this site is built, which ensures that we always show what's in the current version of our examples. However, this system doesn't automatically update the description of the code: if we write, "It does X," then modify the code to do Y, our lesson can be inconsistent. Literate programming was invented to try to prevent this from happening, but it never really caught on. Unfortunately, most programming systems that describe themselves as "literate" these days only implement part of Donald Knuth's original vision.
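As a rough idea of what such extraction involves, here is a minimal sketch. It is not the templating system's actual code; extractFragment is a hypothetical helper that keeps the lines between a pair of marker comments and assumes each marker sits on a line of its own.

```javascript
// Hypothetical helper: return the lines between `// <tag>` and `// </tag>`.
const extractFragment = (source, tag) => {
  const open = `// <${tag}>`
  const close = `// </${tag}>`
  const kept = []
  let inside = false
  for (const line of source.split('\n')) {
    const trimmed = line.trim()
    if (trimmed === open) {
      inside = true
    } else if (trimmed === close) {
      inside = false
    } else if (inside) {
      kept.push(line) // keep original indentation
    }
  }
  return kept.join('\n')
}

const source = `class Example {
  // <fragment>
  fragment (message) {
    console.log(message)
  }
  // </fragment>
}`

console.log(extractFragment(source, 'fragment'))
```

Because the markers are ordinary comments, the source file stays valid JavaScript that ESLint and other tools can process unchanged.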

Exercises

Security concerns

  1. Write a function loadAndRun that reads a file, evaluates it, and returns the result.

  2. Create a file trust-me.js that prints "nothing happening here" when it is evaluated, but also deletes everything in the directory called target.

  3. Write tests for this using mock-fs.

Please be careful doing this exercise.

Loading functions

Write a function that reads a file containing single-argument functions like this:

addOne: (x) => x + 1
halve: (x) => x / 2
array: (x) => Array(x).fill(0)

and returns an object containing callable functions.

Registering functions

Write a function that loads one or more files containing function definitions like this:

const double = (x) => {
  return 2 * x
}

EXPORTS.push(double)

and returns a list containing all the loaded functions.

Indenting inclusions

Modify the file inclusion system so that inclusions are indented by the same amount as the including comment. For example, if the including file is:

const withLogging = (args) => {
  /*+ logging call + logging.js +*/
}

withLogging

and the included file is:

console.log('first message')
console.log('second message')

then the result will be:

const withLogging = (args) => {
  console.log('first message')
  console.log('second message')
}

withLogging

i.e., all lines of the inclusion will be indented to match the first.

Interpolating from subdirectories

Modify the file interpolator so that snippets can be included from sub-directories using relative paths.

  1. Modify the file interpolator so that it searches recursively through all subdirectories of the directories on the search path to find inclusions.

  2. Explain why this is a bad idea.

Defining variables

Modify the file inclusion system so that users can pass in a Map containing name-value pairs and have these interpolated into the text of the files being loaded. To interpolate a value, the included file must use @@name@@.

Specifying markers

Modify the file inclusion system so that the user can override the inclusion comment markers. For example, the user should be able to specify that /*! and !*/ be used to mark inclusions. (This is often used in tutorials that need to show the inclusion markers without them being interpreted.)

Recursive inclusions

Modify the file interpolator to support recursive includes, i.e., to handle inclusion markers in files that are being included. Be sure to check for the case of infinite includes.

Slicing files

Write a function that reads a JavaScript source file containing specially-formatted comments like the ones shown below and extracts the indicated section.

const toBeLeftOut = (args) => {
  console.log('this should not appear')
}

// <keepThis>
const toBeKept = (args) => {
  console.log('only this function should appear')
}
// </keepThis>

Users should be able to specify any tag they want, and if that tag occurs multiple times, all of the sections marked with that tag should be kept. (This is the approach we took for this book instead of file interpolation.)