File Interpolator

Managing source files that have been broken into pieces

Terms defined: header file, loader, sandbox, search path, shell variable

Many of our examples are too long to show comfortably in one block of code on a printed page. As an experiment, we wrote a custom loader that reads a source file with specially-formatted comments and then reads and inserts the files specified in those comments (). This isn't how modern programming languages manage multi-file programs, but C and C++ do this with header files, and page templating systems () do this to share fragments of HTML.

Using file inclusions
Including fragments of code to create runnable programs.

The special comments in our source files contain two fields: the text to put in the displayed version and the file to include when loading:


class Something {
  /*+ constructor + constructor.js +*/

  /*+ a long method + long_method.js +*/

  /*+ another method + another_method.js +*/
}

Spoiler alert: we got this to work, then decided to use a different approach for managing the source code fragments in this book. The stumbling block was that the style-checking tool ESLint didn't know how to handle our inclusions so we would either have to modify it or build a style checker of our own. (We will actually do that in , but we won't go nearly as far as ESLint.) There are some useful lessons in how we built the inclusion tool despite that, particularly related to turning source code into something that will run.

How can we evaluate JavaScript dynamically?

We want to load a file dynamically just like import does, but display the special comments in our web/print versions rather than the interpolated code. To do this, we need to understand the lifecycle of a JavaScript program. When we ask for a file, Node reads the text, translates it into runnable instructions, and runs those instructions. We can do the second and third steps whenever we want using a function called eval, which takes a string as input and executes it as if it were part of the program ().

How eval works
eval vs. normal translation and execution.

This is not a good idea

eval is a security risk: arbitrary code can do arbitrary things, so if we take a string typed in by a user and execute it without any checks it could email our bookmark list to villains all over the world, erase our hard drive, or do anything else that code can do (which is pretty much anything). Browsers do their best to run code in a sandbox for safety, but Node doesn't, so it's up to us to be very (very) careful.

To see eval in action, let's evaluate an expression:


console.log(eval('2 + 2'))
4

Notice that the input to eval is not 2 + 2, but rather a string containing the digit 2, a space, a plus sign, another space, and another 2. When we call eval, it translates this string using exactly the same parser that Node uses for our program and immediately runs the result.

We can make the example a little more interesting by constructing the string dynamically:


const x = 1
const y = 3
const z = 5
for (const name of ['x', 'y', 'z', 'oops']) {
  const expr = `${name} + 1`
  console.log(name, '+ 1 =', eval(expr))
}
x + 1 = 2
y + 1 = 4
z + 1 = 6
undefined:1
oops + 1
^

ReferenceError: oops is not defined
    at eval (eval at <anonymous> \
     (/u/stjs/file-interpolator/eval-loop.js:7:30), <anonymous>:1:1)
    at /u/stjs/file-interpolator/eval-loop.js:7:30
    at ModuleJob.run (internal/modules/esm/module_job.js:146:23)
    at async Loader.import (internal/modules/esm/loader.js:165:24)
    at async Object.loadESM (internal/process/esm_loader.js:68:5)

The first time the loop runs, the string is 'x + 1'; since there's a variable called x in scope, eval does the addition and we print the result. The same thing happens for the variables y and z, but we get an error when we try to evaluate the string 'oops + 1' because there is no variable in scope called oops.

eval can use whatever variables are in scope when it's called, but what happens to any variables it defines? This example creates a variable called x and runs console.log to display it, but as the output shows, x is local to the eval call just as variables created inside a function only exist during a call to that function:


const code = `
  const x = 'hello'
  console.log('x in eval is', x)
`

eval(code)
console.log('typeof x after eval', typeof x)
x in eval is hello
typeof x after eval undefined

However, eval can modify variables outside the text in the same way that a function can modify global variables:


let x = 'original'
eval('x = "modified"')
console.log('x after eval is', x)
x after eval is modified

So if the text we give to eval modifies a structure that is defined outside the text, that change outlives the call to eval:


const seen = {}

for (const name of ['x', 'y', 'z']) {
  const expr = `seen["${name}"] = "${name.toUpperCase()}"`
  eval(expr)
}

console.log(seen)
{ x: 'X', y: 'Y', z: 'Z' }

The examples so far have all evaluated strings embedded in the program itself, but eval doesn't care where its input comes from. Let's move the code that does the modifying into to-be-loaded.js:


// Modify a global structure defined by whoever loads us.
Seen.from_loaded_file = 'from loaded file'

This doesn't work on its own because Seen isn't defined:

/u/stjs/file-interpolator/to-be-loaded.js:3
Seen.from_loaded_file = 'from loaded file'
^

ReferenceError: Seen is not defined
    at /u/stjs/file-interpolator/to-be-loaded.js:3:1
    at ModuleJob.run (internal/modules/esm/module_job.js:146:23)
    at async Loader.import (internal/modules/esm/loader.js:165:24)
    at async Object.loadESM (internal/process/esm_loader.js:68:5)

But if we read the file and eval the text after defining Seen, it does what we want:


import fs from 'fs'

const Seen = {}

const filename = process.argv[2]
const content = fs.readFileSync(filename, 'utf-8')
console.log('before eval, Seen is', Seen)
eval(content)
console.log('after eval, Seen is', Seen)
node does-the-loading.js to-be-loaded.js
before eval, Seen is {}
after eval, Seen is { from_loaded_file: 'from loaded file' }

How can we manage files?

The source files in this book are small enough that we don't have to worry about reading them repeatedly, but in larger systems or when there are network delays we would like to avoid doing unnecessary work. The usual approach is to create a cache using the Singleton pattern that we first met in . Whenever we want to read a file, we check to see if it's already in the cache (). If so, we use that copy; if not, we read it and add it to the cache using the file path as a lookup key.

Implementing a cache as a singleton
Using the Singleton pattern to implement a cache of loaded files.

We can write a simple cache in just a few lines of code:


import fs from 'fs'

class Cache {
  constructor () {
    this.loaded = new Map()
  }

  need (name) {
    if (this.loaded.has(name)) {
      console.log(`returning cached value for ${name}`)
      return this.loaded.get(name)
    }
    console.log(`loading ${name}`)
    const content = fs.readFileSync(name, 'utf-8')
    const result = eval(content)
    this.loaded.set(name, result)
    return result
  }
}

const cache = new Cache()

export default (name) => {
  return cache.need(name)
}

Since we are using eval, though, we can't rely on export to make things available to the rest of the program. Instead, we rely on the fact that the result of an eval call is the value of the last expression evaluated. Since a variable name on its own evaluates to the variable's value, we can create a function and then use its name to "export" it from the evaluated file:


// Define.
const report = (message) => {
  console.log(`report in import-01.js with message "${message}"`)
}

// Export.
report

To test our program, we load the implementation of the cache using import, then use it to load and evaluate another file. This example expects that "other file" to define a function, which we call in order to show that everything is working:

import need from './need-simple.js'

const imported = need('./import-simple.js')
imported('called from test-simple.js')
node test-simple.js

While each of the files we read in our code interpolation example will probably be unique, page templating systems often want to read the same file from multiple places. We could specify everything using relative paths, but another option is to give our program a search path, i.e., a list of directories to look in for the things it needs. Many programs use search paths, including Node itself; by convention, they are written as a colon-separated list of directories on Unix or using semi-colons on Windows. If a path starts with ./, we look for it locally, and if not, we go through the directories in the search path in order until we find a file with a matching name ().

Implementing a search path
Using a colon-separated list of directories as a search path.

That's just how it is

The rules about search paths in the paragraph above are a convention: somebody did it this way years ago and (almost) everyone has imitated it since. It could have been done many other ways, and there's no guarantee that any particular application does it exactly this way; it's just how it is.

Since the cache is responsible for finding files, let's teach it about search paths. The outline of the class stays the same:


import fs from 'fs'
import path from 'path'

class Cache {
  constructor () {
    this.loaded = new Map()
    this.constructSearchPath()
  }

  need (fileSpec) {
    if (this.loaded.has(fileSpec)) {
      console.log(`returning cached value for ${fileSpec}`)
      return this.loaded.get(fileSpec)
    }
    console.log(`loading value for ${fileSpec}`)
    const filePath = this.find(fileSpec)
    const content = fs.readFileSync(filePath, 'utf-8')
    const result = eval(content)
    this.loaded.set(fileSpec, result)
    return result
  }
...
}

const cache = new Cache()

export default (fileSpec) => {
  return cache.need(fileSpec)
}

To get the search path, we look for the shell variable NEED_PATH. (Writing shell variables' names in upper case is another convention.) If NEED_PATH exists, we split it on colons to create a list of directories:

  constructSearchPath () {
    this.searchPath = []
    if ('NEED_PATH' in process.env) {
      this.searchPath = process.env.NEED_PATH
        .split(':')
        .filter(x => x.length > 0)
    }
  }

When the time comes to find a file we check to see if the path is local, and if it's not, we try the directories in order:

  find (fileSpec) {
    if (fileSpec.startsWith('./')) {
      return fileSpec
    }
    for (const dir of this.searchPath) {
      const filePath = path.join(dir, fileSpec)
      console.log(`trying ${filePath} for ${fileSpec}`)
      if (fs.existsSync(filePath)) {
        return filePath
      }
    }
    throw new Error(`cannot find ${fileSpec}`)
  }

To test this, we put the file to import in a subdirectory called modules:


// Define.
const report = (message) => {
  console.log(`in LEFT with message "${message}"`)
}

// Export.
report

and then put the file doing the importing in the current directory:

import need from './need-path.js'
const imported = need('imported-left.js')
imported('called from test-import-left.js')

We now need to set the variable NEED_PATH. There are many ways to do this in shell; if we only need the variable to exist for a single command, the simplest is to write it as:

NAME=value command

Here's the shell command that runs our test case, using $PWD to get the current working directory:

NEED_PATH=$PWD/modules/ node test-import-left.js
loading value for imported-left.js
trying /u/stjs/file-interpolator/modules/imported-left.js for \
 imported-left.js
in LEFT with message "called from test-import-left.js"

Now let's create a second importable file in the modules directory:


// Define.
const report = (message) => {
  console.log(`in RIGHT with message "${message}"`)
}

// Export.
report

and load that twice to check that caching works:

import need from './need-path.js'

const imported = need('imported-right.js')
imported('called from test-import-right.js')

const alsoImported = need('imported-right.js')
alsoImported('called from test-import-right.js')
loading value for imported-right.js
trying /u/stjs/file-interpolator/modules/imported-right.js for \
 imported-right.js
in RIGHT with message "called from test-import-right.js"
returning cached value for imported-right.js
in RIGHT with message "called from test-import-right.js"

How can we interpolate pieces of code?

Now that all this machinery is in place, interpolating files is straightforward. We modify Cache.find to return a directory and a file path, then add an interpolate method to replace special comments:


class Cache {
  // ...
  interpolate (fileDir, outer) {
    return outer.replace(Cache.INTERPOLATE_PAT,
                         (match, comment, filename) => {
      filename = filename.trim()
      const filePath = path.join(fileDir, filename)
      if (!fs.existsSync(filePath)) {
        throw new Error(`Cannot find ${filePath}`)
      }
      const inner = fs.readFileSync(filePath, 'utf-8')
      return inner
    })
  }
  // ...
}
Cache.INTERPOLATE_PAT = /\/\*\+(.+?)\+(.+?)\+\*\//g

We can now have a file like this:


class Example {
  constructor (msg) {
    this.constructorMessage = msg
  }
  /*+ top method + import-interpolate-topmethod.js +*/
  /*+ bottom method + import-interpolate-bottommethod.js +*/
}

Example

and subfiles like this:


topMethod (msg) {
  this.bottomMethod(`(topMethod ${msg})`)
}

and this:


bottomMethod (msg) {
  console.log(`(bottomMethod ${msg})`)
}

Let's test it:

node test-import-interpolate.js
(bottomMethod (topMethod called from test-import-interpolate.js))

When this program runs:

  1. Node starts to run test-import-interpolate.js.
  2. It sees the import of need-interpolate so it reads and evaluates that code.
  3. Doing this creates a singleton cache object.
  4. The program then calls need('./import-interpolate.js').
  5. This checks the cache: nope, nothing there.
  6. So it loads import-interpolate.js.
  7. It finds two specially-formatted comments in the text…
  8. …so it loads the file described by each one and inserts the text in place of the comment.
  9. Now that it has the complete text, it calls eval…
  10. …and stores the result of eval (which is a class) in the cache.
  11. It also returns that class.
  12. We then create an instance of that class and call its method.

This works, but as we said in the introduction we decided not to use it because it didn't work with other tools. No piece of software exists in isolation; when we are evaluating a design, we always have to ask how it fits into everything else we have.

What we did instead

Rather than interpolating file fragments, we extract or erase parts of complete (valid) JavaScript files based on specially-formatted comments, like the "fragment" comment pair shown below.

class Example {
  constructor (name) {
    this.name = name
  }

  // <fragment>
  fragment (message) {
    console.log(`${this.name}: ${message}`)
  }
  // </fragment>
}

Exercises

Security concerns

  1. Write a function loadAndRun that reads a file, evaluates it, and returns the result.

  2. Create a file trust-me.js that prints "nothing happening here" when it is evaluated, but also deletes everything in the directory called target.

  3. Write tests for this using mock-fs.

Please be careful doing this exercise.

Loading functions

Write a function that reads a file containing single-argument functions like this:

addOne: (x) => x + 1
halve: (x) => x / 2
array: (x) => Array(x).fill(0)

and returns an object containing callable functions.

Registering functions

Write a function that loads one or more files containing function definitions like this:

const double = (x) => {
  return 2 * x
}

EXPORTS.append(double)

and returns a list containing all the loaded functions.

Indenting inclusions

Modify the file inclusion system so that inclusions are indented by the same amount as the including comment. For example, if the including file is:

const withLogging = (args) => {
  /*+ logging call + logging.js +*/
}

withLogging

and the included file is:

console.log('first message')
console.log('second message')

then the result will be:

const withLogging = (args) => {
  console.log('first message')
  console.log('second message')
}

withLogging

i.e., all lines of the inclusion will be indented to match the first.

Interpolating from subdirectories

Modify the file interpolator so that snippets can be included from sub-directories using relative paths.

Recursive search for inclusions

  1. Modify the file interpolator so that it searches recursively through all subdirectories of the directories on the search path to find inclusions.

  2. Explain why this is a bad idea.

Defining variables

Modify the file inclusion system so that users can pass in a Map containing name-value pairs and have these interpolated into the text of the files being loaded. To interpolate a value, the included file must use $$name$$.

Specifying markers

Modify the file inclusion system so that the user can override the inclusion comment markers. For example, the user should be able to specify that /*! and !*/ be used to mark inclusions. (This is often used in tutorials that need to show the inclusion markers without them being interpreted.)

Recursive inclusions

Modify the file interpolator to support recursive includes, i.e., to handle inclusion markers in files that are being included. Be sure to check for the case of infinite includes.

Slicing files

Write a function that reads a JavaScript source file containing specially-formatted comments like the ones shown below and extracts the indicated section.

const toBeLeftOut = (args) => {
  console.log('this should not appear')
}

// <keepThis>
const toBeKept = (args) => {
  console.log('only this function should appear')
}
// </keepThis>

Users should be able to specify any tag they want, and if that tag occurs multiple times, all of the sections marked with that tag should be kept. (This is the approach we took for this book instead of file interpolation.)