Page Templates

Generating HTML pages from templates

Terms defined: Visitor pattern, bare object, dynamic scoping, environment, lexical scoping, stack frame, static site generator

Every program needs documentation in order to be usable, and the best place to put that documentation is on the web. Writing and updating pages by hand is time-consuming and error-prone, particularly when many parts are the same, so most documentation sites use some kind of static site generator to create web pages from templates.

At the heart of every static site generator is a page templating system. Thousands of these have been written in the last thirty years in every popular programming language (and one language, PHP, was created for this purpose). Most of these systems use one of three designs:

  1. Mix commands in a language such as JavaScript with the HTML or Markdown using some kind of marker to indicate which parts are commands and which parts are to be taken as-is. This approach is taken by EJS, which we used to write these lessons.

  2. Create a mini-language with its own commands, as Jekyll (which is used by GitHub Pages) does. Mini-languages are appealing because they are smaller and safer than general-purpose languages, but experience shows that they eventually grow most of the features of a general-purpose language. Again, some kind of marker must be used to show which parts of the page are code and which are ordinary text.

  3. Put directives in specially-named attributes in the HTML. This approach has been the least popular, but since pages are valid HTML, it eliminates the need for a special parser.

Three options for page templates
Three different ways to implement page templating.

In this chapter we will build a simple page templating system using the third strategy. We will process each page independently by parsing the HTML and walking the DOM to find nodes with special attributes. Our program will execute the instructions in those nodes to do the equivalent of loops and if/else statements; other nodes will be copied as-is to create text.

What will our system look like?

Let's start by deciding what "done" looks like. Suppose we want to turn an array of strings into an HTML list. Our page will look like this:

<html>
  <body>
    <p>Expect three items</p>
    <ul z-loop="item:names">
      <li><span z-var="item"/></li>
    </ul>
  </body>
</html>

The attribute z-loop tells the tool to repeat the contents of that node; the loop variable and the collection being looped over are separated by a colon. The attribute z-var tells the tool to fill in the node with the value of the variable.

When our tool processes this page, the output will be standard HTML without any traces of how it was created:

<html>
  <body style="font-size: 200%; margin-left: 0.5em">
    <p>Expect three items</p>
    <ul>
      <li><span>Johnson</span></li>
    
      <li><span>Vaughan</span></li>
    
      <li><span>Jackson</span></li>
    </ul>
  </body>
</html>

Human-readable vs. machine-readable

The introduction said that mini-languages for page templating quickly start to accumulate extra features. We have already started down that road by putting the loop variable and loop target in a single attribute and splitting that attribute to get them out. Doing this makes loops easy for people to type, but hides important information from standard HTML processing tools. They can't know that this particular attribute of these particular elements contains multiple values or that those values should be extracted by splitting a string on a colon. We could instead require people to use two attributes, as in:

<ul z-loop="names" z-loop-var="item">

but we have decided to err on the side of minimal typing. And note that strictly speaking, we should call our attributes data-something instead of z-something to conform with the HTML5 specification, but by the time we're finished processing our templates, there shouldn't be any z-* attributes left to confuse a browser.
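To make the trade-off concrete, here is a minimal sketch of how a combined attribute value might be split and checked. The helper name `parseLoopSpec` and its error message are hypothetical and are not part of the chapter's code:

```javascript
// Hypothetical helper: split a z-loop value such as "item:names"
// into the loop variable and the name of the collection.
const parseLoopSpec = (spec) => {
  const parts = spec.split(':').map(s => s.trim())
  if (parts.length !== 2 || parts.some(s => s.length === 0)) {
    throw new Error(`Badly-formed loop specification "${spec}"`)
  }
  const [indexName, targetName] = parts
  return { indexName, targetName }
}

console.log(parseLoopSpec('item:names'))
```

A generic HTML tool sees only the string "item:names"; the knowledge that it holds two values lives entirely in code like this.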

The next step is to define the API for filling in templates. Our tool needs the template itself, somewhere to write its output, and some variables to use in the expansion. These variables might come from a configuration file, from a YAML header in the file itself, or from some mix of the two; for the moment, we will just pass them into the expansion function as an object:

const variables = {
  names: ['Johnson', 'Vaughan', 'Jackson']
}
const dom = readHtml('template.html')
const expander = new Expander(dom, variables)
expander.walk()
console.log(expander.getResult())

How can we keep track of values?

Speaking of variables, we need a way to keep track of their current values; we say "current" because the value of a loop variable changes each time we go around the loop. We also need to maintain multiple sets of variables so that variables used inside a loop don't conflict with ones used outside it. (We don't actually "need" to do this—we could just have one global set of variables—but experience teaches us that if all our variables are global, all of our programs will be buggy.)

The standard way to manage variables is to create a stack of lookup tables. Each stack frame is an object with names and values; when we need to find a variable, we search the stack frames from the top down and use the first definition we find.

Scoping rules

Searching the stack frame by frame is called dynamic scoping, since we find variables while the program is running. In contrast, most programming languages use lexical scoping, which figures out what a variable name refers to based on the structure of the program text.
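The difference can be seen in a short sketch. JavaScript itself is lexically scoped, so the dynamic version below simulates a stack of frames by hand; the names `frames`, `lookup`, `showDynamic`, and `callDynamic` are illustrative only:

```javascript
// Lexical scoping: showLexical sees the x that was visible
// where the function was written, not the one in its caller.
const x = 'global'
const showLexical = () => x
const callLexical = () => {
  const x = 'local' // shadows the outer x, but only inside callLexical
  return showLexical()
}

// Dynamic scoping, simulated with an explicit stack of frames:
// lookup finds the most recently pushed definition at run time.
const frames = [{ x: 'global' }]
const lookup = (name) => {
  for (let i = frames.length - 1; i >= 0; i--) {
    if (name in frames[i]) return frames[i][name]
  }
  return undefined
}
const showDynamic = () => lookup('x')
const callDynamic = () => {
  frames.push({ x: 'local' })
  const result = showDynamic()
  frames.pop()
  return result
}

console.log(callLexical()) // global
console.log(callDynamic()) // local
```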

The values in a running program are sometimes called an environment, so we have named our stack-handling class Env. Its methods let us push and pop new stack frames and find a variable given its name; if the variable can't be found, Env.find returns undefined instead of throwing an exception.

class Env {
  constructor (initial) {
    this.stack = []
    this.push(Object.assign({}, initial))
  }

  push (frame) {
    this.stack.push(frame)
  }

  pop () {
    this.stack.pop()
  }

  find (name) {
    for (let i = this.stack.length - 1; i >= 0; i--) {
      if (name in this.stack[i]) {
        return this.stack[i][name]
      }
    }
    return undefined
  }

  toString () {
    return JSON.stringify(this.stack)
  }
}

export default Env

Variable stack
Using a stack to manage variables.
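As a quick check of how frames shadow one another, the sketch below repeats the class in condensed form (so that it runs on its own) and pushes a frame the way a loop body might. The variable names and values are made up for the example:

```javascript
// Condensed copy of the chapter's Env so the example is self-contained.
class Env {
  constructor (initial) {
    this.stack = [Object.assign({}, initial)]
  }

  push (frame) { this.stack.push(frame) }

  pop () { this.stack.pop() }

  find (name) {
    for (let i = this.stack.length - 1; i >= 0; i--) {
      if (name in this.stack[i]) return this.stack[i][name]
    }
    return undefined
  }
}

const env = new Env({ item: 'outer', title: 'Names' })
env.push({ item: 'Johnson' }) // e.g., entering a loop body
const inside = [env.find('item'), env.find('title'), env.find('missing')]
env.pop() // leaving the loop body
const outside = env.find('item')

console.log(inside) // [ 'Johnson', 'Names', undefined ]
console.log(outside) // outer
```

The inner definition of `item` hides the outer one only while its frame is on the stack, and names defined only in lower frames remain visible throughout.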

How do we handle nodes?

HTML pages have a nested structure, so we will process them using the Visitor design pattern. Visitor's constructor takes the root node of the DOM tree as an argument and saves it. When we call Visitor.walk without a value, it starts recursing from that saved root; if .walk is given a value (as it is during recursive calls), it uses that instead.

import assert from 'assert'

class Visitor {
  constructor (root) {
    this.root = root
  }

  walk (node = null) {
    if (node === null) {
      node = this.root
    }
    if (this.open(node)) {
      node.children.forEach(child => {
        this.walk(child)
      })
    }
    this.close(node)
  }

  open (node) {
    assert(false,
      'Must implement "open"')
  }

  close (node) {
    assert(false,
      'Must implement "close"')
  }
}

export default Visitor

Visitor defines two methods called open and close that are called when we first arrive at a node and when we are finished with it. The default implementations of these methods throw exceptions to remind the creators of derived classes to implement their own versions.

The Visitor pattern
Using the Visitor pattern to evaluate a page template.
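To see the pattern at work before adding any templating logic, here is a sketch of a derived class that records an indented outline of element names. The `Outline` class and the hand-built tree are hypothetical; the tree uses only the fields that the chapter's code relies on (`type`, `name`, `data`, and `children`), and the base class is repeated in condensed form so the example runs on its own:

```javascript
// Condensed Visitor, repeated here so the sketch is self-contained.
class Visitor {
  constructor (root) { this.root = root }

  walk (node = null) {
    if (node === null) node = this.root
    if (this.open(node)) {
      node.children.forEach(child => this.walk(child))
    }
    this.close(node)
  }

  open (node) { throw new Error('Must implement "open"') }
  close (node) { throw new Error('Must implement "close"') }
}

// Hypothetical derived class: records an indented outline of element names.
class Outline extends Visitor {
  constructor (root) {
    super(root)
    this.depth = 0
    this.lines = []
  }

  open (node) {
    if (node.type === 'text') return false // nothing below a text node
    this.lines.push('  '.repeat(this.depth) + node.name)
    this.depth += 1
    return true
  }

  close (node) {
    if (node.type !== 'text') this.depth -= 1
  }
}

// Hand-built tree standing in for a parsed page.
const tree = {
  type: 'tag',
  name: 'ul',
  children: [
    { type: 'tag', name: 'li', children: [{ type: 'text', data: 'one' }] },
    { type: 'tag', name: 'li', children: [] }
  ]
}
const outline = new Outline(tree)
outline.walk()
console.log(outline.lines.join('\n'))
```

Returning false from open tells walk not to descend, which is how a visitor can skip a subtree.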

The Expander class is a specialization of Visitor that uses an Env to keep track of variables. It imports a handler for each type of special node we support (we will write those in a moment) and uses them to process each type of node:

  1. If the node is plain text, copy it to the output.

  2. If there is a handler for the node, call the handler's open or close method.

  3. Otherwise, open or close a regular tag.

import assert from 'assert'

import Visitor from './visitor.js'
import Env from './env.js'

import z_if from './z-if.js'
import z_loop from './z-loop.js'
import z_num from './z-num.js'
import z_var from './z-var.js'

const HANDLERS = {
  'z-if': z_if,
  'z-loop': z_loop,
  'z-num': z_num,
  'z-var': z_var
}

class Expander extends Visitor {
  constructor (root, vars) {
    super(root)
    this.env = new Env(vars)
    this.handlers = HANDLERS
    this.result = []
  }

  open (node) {
    if (node.type === 'text') {
      this.output(node.data)
      return false
    } else if (this.hasHandler(node)) {
      return this.getHandler(node).open(this, node)
    } else {
      this.showTag(node, false)
      return true
    }
  }

  close (node) {
    if (node.type === 'text') {
      return
    }
    if (this.hasHandler(node)) {
      this.getHandler(node).close(this, node)
    } else {
      this.showTag(node, true)
    }
  }

}

export default Expander

Checking to see if there is a handler for a particular node and getting that handler are straightforward—we just look at the node's attributes:


  hasHandler (node) {
    for (const name in node.attribs) {
      if (name in this.handlers) {
        return true
      }
    }
    return false
  }

  getHandler (node) {
    const possible = Object.keys(node.attribs)
      .filter(name => name in this.handlers)
    assert(possible.length === 1,
      'Should be exactly one handler')
    return this.handlers[possible[0]]
  }

Finally, we need a few helper methods to show tags and generate output:


  showTag (node, closing) {
    if (closing) {
      this.output(`</${node.name}>`)
      return
    }

    this.output(`<${node.name}`)
    if (node.name === 'body') {
      this.output(' style="font-size: 200%; margin-left: 0.5em"')
    }
    for (const name in node.attribs) {
      if (!name.startsWith('z-')) {
        this.output(` ${name}="${node.attribs[name]}"`)
      }
    }
    this.output('>')
  }

  output (text) {
    this.result.push((text === undefined) ? 'UNDEF' : text)
  }

  getResult () {
    return this.result.join('')
  }

Notice that this class adds strings to an array and joins them all right at the end rather than concatenating strings repeatedly. Doing this is more efficient and also helps with debugging, since each string in the array corresponds to a single method call.
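As a small illustration of the debugging point, the sketch below mimics the output method with a standalone function (the free-standing `result` and `output` names are stand-ins for the class members):

```javascript
const result = []
const output = (text) => result.push(text === undefined ? 'UNDEF' : text)

output('<p>')
output(undefined) // e.g., a variable lookup that failed
output('</p>')

console.log(result) // each entry corresponds to one call
console.log(result.join('')) // <p>UNDEF</p>
```

Inspecting the array before joining shows exactly which call produced the UNDEF marker.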

How do we implement node handlers?

At this point we have built a lot of infrastructure but haven't actually processed any special nodes. To do that, let's write a handler that copies a constant number into the output:

export default {
  open: (expander, node) => {
    expander.showTag(node, false)
    expander.output(node.attribs['z-num'])
  },

  close: (expander, node) => {
    expander.showTag(node, true)
  }
}

When we enter a node like <span z-num="123"/> this handler asks the expander to show an opening tag followed by the value of the z-num attribute. When we exit the node, the handler asks the expander to close the tag. The handler doesn't know whether things are printed immediately, added to an output list, or something else; it just knows that whoever called it implements the low-level operations it needs.

Note that this handler is not a class, but instead an object with two functions stored under the keys open and close. We could use a class for each handler so that handlers can store any extra state they need, but bare objects are common and useful in JavaScript (though we will see below that we should have used classes).
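For comparison, a class-based handler might look something like the sketch below. The class name `NumHandler`, its `count` field, and the stand-in expander object are assumptions made for illustration; the chapter's own handlers are bare objects:

```javascript
// Hypothetical class-based version of the z-num handler that also
// keeps some state: how many constants it has emitted.
class NumHandler {
  constructor () {
    this.count = 0
  }

  open (expander, node) {
    this.count += 1
    expander.showTag(node, false)
    expander.output(node.attribs['z-num'])
    return false // no children to expand
  }

  close (expander, node) {
    expander.showTag(node, true)
  }
}

// Minimal stand-in for the expander, providing only what the handler calls.
const fakeExpander = {
  result: [],
  showTag (node, closing) {
    this.output(closing ? `</${node.name}>` : `<${node.name}>`)
  },
  output (text) {
    this.result.push(text)
  }
}

const handler = new NumHandler()
const node = { name: 'span', attribs: { 'z-num': '123' }, children: [] }
handler.open(fakeExpander, node)
handler.close(fakeExpander, node)
console.log(fakeExpander.result.join(''), handler.count)
```

Because the handler calls only showTag and output, it works with any object that provides those two methods, which is also what makes it easy to test in isolation.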

So much for constants; what about variables?

export default {
  open: (expander, node) => {
    expander.showTag(node, false)
    expander.output(expander.env.find(node.attribs['z-var']))
  },

  close: (expander, node) => {
    expander.showTag(node, true)
  }
}

This code is almost the same as the previous example. The only difference is that instead of copying the attribute's value directly to the output, we use it as a key to look up a value in the environment.

These two pairs of handlers look plausible, but do they work? To find out, we can build a program that loads variable definitions from a JSON file, reads an HTML template, and does the expansion:

import fs from 'fs'
import htmlparser2 from 'htmlparser2'

import Expander from './expander.js'

const main = () => {
  const vars = readJSON(process.argv[2])
  const doc = readHtml(process.argv[3])
  const expander = new Expander(doc, vars)
  expander.walk()
  console.log(expander.getResult())
}

const readJSON = (filename) => {
  const text = fs.readFileSync(filename, 'utf-8')
  return JSON.parse(text)
}

const readHtml = (filename) => {
  const text = fs.readFileSync(filename, 'utf-8')
  return htmlparser2.parseDOM(text)[0]
}

main()

We added new variables for our test cases one by one as we were writing this chapter. To avoid repetition, we show the entire set once:

{
  "firstVariable": "firstValue",
  "secondVariable": "secondValue",
  "variableName": "variableValue",
  "showThis": true,
  "doNotShowThis": false,
  "names": ["Johnson", "Vaughan", "Jackson"]
}

Our first test: is static text copied over as-is?

<html>
  <body>
    <h1>Static Text</h1>
    <p>This page has:</p>
    <ul>
      <li>static</li>
      <li>text</li>
    </ul>
  </body>
</html>

node template.js vars.json input-static-text.html

<html>
  <body style="font-size: 200%; margin-left: 0.5em">
    <h1>Static Text</h1>
    <p>This page has:</p>
    <ul>
      <li>static</li>
      <li>text</li>
    </ul>
  </body>
</html>

Generating static text
Static text generated by page templates.

Good. Now, does the expander handle constants?

<html>
  <body>
    <p><span z-num="123"/></p>
  </body>
</html>

<html>
  <body style="font-size: 200%; margin-left: 0.5em">
    <p><span>123</span></p>
  </body>
</html>

Generating a single constant
A single constant generated by page templates.

What about a single variable?

<html>
  <body>
    <p><span z-var="variableName"/></p>
  </body>
</html>

<html>
  <body style="font-size: 200%; margin-left: 0.5em">
    <p><span>variableValue</span></p>
  </body>
</html>

Generating a single variable
A single variable generated by page templates.

What about a page containing multiple variables? There's no reason it should fail if the single-variable case works, but we should still check: again, software isn't done until it has been tested.

<html>
  <body>
    <p><span z-var="firstVariable" /></p>
    <p><span z-var="secondVariable" /></p>
  </body>
</html>

<html>
  <body style="font-size: 200%; margin-left: 0.5em">
    <p><span>firstValue</span></p>
    <p><span>secondValue</span></p>
  </body>
</html>

Generating multiple variables
Multiple variables generated by page templates.

How can we implement control flow?

Our tool supports two types of control flow: conditional expressions and loops. Since we don't support Boolean expressions like and and or, implementing a conditional is as simple as looking up a variable (which we know how to do) and then expanding the node if the value is true:

export default {
  open: (expander, node) => {
    const doRest = expander.env.find(node.attribs['z-if'])
    if (doRest) {
      expander.showTag(node, false)
    }
    return doRest
  },

  close: (expander, node) => {
    if (expander.env.find(node.attribs['z-if'])) {
      expander.showTag(node, true)
    }
  }
}

Let's test it:

<html>
  <body>
    <p z-if="showThis">This should be shown.</p>
    <p z-if="doNotShowThis">This should <em>not</em> be shown.</p>
  </body>
</html>

<html>
  <body style="font-size: 200%; margin-left: 0.5em">
    <p>This should be shown.</p>
    
  </body>
</html>

Generating conditional text
Conditional text generated by page templates.

Spot the bug

This implementation of if contains a subtle bug. The open and close functions both check the value of the control variable. If something inside the body of the if changes that value, the result could be an opening tag without a matching closing tag or vice versa. We haven't implemented an assignment operator, so right now there's no way for that to happen, but it's a plausible thing for us to add later, and tracking down a bug in old code that is revealed by new code is always a headache.
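One possible way to avoid this, sketched below, is to record the decision made in open and reuse it in close rather than looking the variable up a second time. The WeakMap named `decisions`, the handler name `zIf`, and the stand-in expander are assumptions for illustration, not the chapter's code:

```javascript
// Remember, per node, whether open() decided to show it.
const decisions = new WeakMap()

const zIf = {
  open: (expander, node) => {
    const doRest = Boolean(expander.env.find(node.attribs['z-if']))
    decisions.set(node, doRest)
    if (doRest) {
      expander.showTag(node, false)
    }
    return doRest
  },

  close: (expander, node) => {
    // Reuse the earlier decision instead of re-reading the variable.
    if (decisions.get(node)) {
      expander.showTag(node, true)
    }
    decisions.delete(node)
  }
}

// Stand-in expander whose variable changes between open and close.
const vars = { showThis: true }
const fake = {
  result: [],
  env: { find: (name) => vars[name] },
  showTag (node, closing) {
    this.result.push(closing ? `</${node.name}>` : `<${node.name}>`)
  }
}

const node = { name: 'p', attribs: { 'z-if': 'showThis' }, children: [] }
zIf.open(fake, node)
vars.showThis = false // simulate an assignment inside the body
zIf.close(fake, node)
console.log(fake.result.join('')) // <p></p>
```

A class-based handler could keep the same record in an instance field instead of a module-level map.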

Finally we come to loops. For these, we need to get the array we're looping over from the environment and do something for each of its elements. That "something" is:

  1. Create a new stack frame holding the current value of the loop variable.

  2. Expand all of the node's children with that stack frame in place.

  3. Pop the stack frame to get rid of the temporary variable.

export default {
  open: (expander, node) => {
    const [indexName, targetName] = node.attribs['z-loop'].split(':')
    delete node.attribs['z-loop']
    expander.showTag(node, false)
    const target = expander.env.find(targetName)
    for (const index of target) {
      expander.env.push({ [indexName]: index })
      node.children.forEach(child => expander.walk(child))
      expander.env.pop()
    }
    return false
  },

  close: (expander, node) => {
    expander.showTag(node, true)
  }
}

Once again, it's not done until we test it:

<html>
  <body>
    <p>Expect three items</p>
    <ul z-loop="item:names">
      <li><span z-var="item"/></li>
    </ul>
  </body>
</html>

<html>
  <body style="font-size: 200%; margin-left: 0.5em">
    <p>Expect three items</p>
    <ul>
      <li><span>Johnson</span></li>
    
      <li><span>Vaughan</span></li>
    
      <li><span>Jackson</span></li>
    </ul>
  </body>
</html>

Generating text with a loop
Repeated text generated with a loop by page templates.

Notice how we create the new stack frame using:

{ [indexName]: index }

This is an ugly but useful trick. We can't write:

{ indexName: index }

because that would create an object with the string indexName as a key, rather than one with the value of the variable indexName as its key. We can't do this either:

{ `${indexName}`: index }

though it seems like we should be able to. Instead, we use a computed property name: when a key in an object literal is wrapped in square brackets, JavaScript evaluates the expression inside the brackets and uses the result as the key. Our expression is therefore a compact way to get the same effect as:

const temp = {}
temp[indexName] = index
expander.env.push(temp)

Those three lines are much easier to understand, though, so we should probably have been less clever.
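A quick check run in Node shows how the bracketed key behaves compared with the alternatives; the values used here are arbitrary examples:

```javascript
const indexName = 'item'
const index = 'Johnson'

const literalKey = { indexName: index } // key is the text "indexName"
const computedKey = { [indexName]: index } // key is the value of indexName

const temp = {}
temp[indexName] = index // the three-line equivalent

console.log(Object.keys(literalKey)) // [ 'indexName' ]
console.log(Object.keys(computedKey)) // [ 'item' ]
console.log(JSON.stringify(computedKey) === JSON.stringify(temp)) // true
```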

How did we know how to do all of this?

We have just implemented a simple programming language. It can't do arithmetic, but if we wanted to add tags like:

<span z-math="+"><span z-var="width"/><span z-num="1"/></span>

we could. It's unlikely anyone would use the result—typing all of that is so much clumsier than typing width+1 that people wouldn't use it unless they had no other choice—but the basic design is there.

We didn't invent any of this from scratch, any more than we invented the parsing algorithm presented elsewhere in these lessons. Instead, we did what you are doing now: we read what other programmers had written and tried to make sense of the key ideas.

The problem is that "making sense" depends on who we are. When we use a low-level language, we incur the cognitive load of assembling micro-steps into something more meaningful. When we use a high-level language, on the other hand, we incur a similar load translating functions of functions of functions (or meta-classes templated on object factories) into actual operations on actual data.

More experienced programmers are more capable at both ends of the curve, but that's not the only thing that changes. If a novice's comprehension curve looks like the one on the left of the figure below, then an expert's looks like the one on the right. Experts don't just understand more at all levels of abstraction; their preferred level has also shifted so that \(\sqrt{x^2 + y^2}\) is actually more readable than the medieval expression "the side of the square whose area is the sum of the areas of the two squares whose sides are given by the first part and the second part".

Comprehension curves
Novice and expert comprehension curves.

One implication of this is that for any given task, the software that is quickest for a novice to comprehend will almost certainly be different from the software that an expert can understand most quickly. In an ideal world our tools would automatically re-represent programs at different levels, so that with a click of a button we could view our code as either:

const hosts = links.map(a => a.href.split(':')[1].split('/')[0]).unique()

or:

hosts = []
for (each a in links) do
  temp <- attr(a, 'href').split(':')[1].split('/')[0]
  if (not (temp in hosts)) do
    hosts.append(temp)
  end
end

just as we could change the colors used for syntax highlighting or the depth to which loop bodies are indented. But today's tools don't do that, and I suspect that any IDE smart enough to translate between comprehension levels automatically would also be smart enough to write the code without our help.

Exercises

Tracing execution

Add a directive <span z-trace="variable"/> that prints the current value of a variable using console.error for debugging.

Unit tests

Write unit tests for template expansion using Mocha.

Trimming text

Modify all of the directives to take an extra optional attribute z-trim="true". If this attribute is set, leading and trailing whitespace is trimmed from the directive's expansion.

Literal text

Add a directive <div z-literal="true">…</div> that copies the enclosed text as-is without interpreting or expanding any contained directives. (A directive like this would be needed when writing documentation for the template expander.)

Including other files

  1. Add a directive <div z-include="filename.html"/> that includes another file in the file being processed.

  2. Should included files be processed and the result copied into the including file, or should the text be copied in and then processed? What difference does it make to the way variables are evaluated?

HTML snippets

Add a directive <div z-snippet="variable">…</div> that saves some text in a variable so that it can be displayed later. For example:

<html>
  <body>
    <div z-snippet="prefix"><strong>Important:</strong></div>
    <p>Expect three items</p>
    <ul>
      <li z-loop="item:names">
        <span z-var="prefix"/><span z-var="item"/>
      </li>
    </ul>
  </body>
</html>

would print the word "Important:" in bold before each item in the list.

YAML headers

Modify the template expander to handle variables defined in a YAML header in the page being processed. For example, if the page is:

---
name: "Dorothy Johnson Vaughan"
---
<html>
  <body>
    <p><span z-var="name"/></p>
  </body>
</html>

then expanding it will create a paragraph containing the given name.

Expanding all files

Write a program expand-all.js that takes two directory names as command-line arguments and builds a website in the second directory by expanding all of the HTML files found in the first or in sub-directories of the first.

Counting loops

Add a directive <div z-index="indexName" z-limit="limitName">…</div> that loops from zero to the value in the variable limitName, putting the current iteration index in indexName.

Auxiliary functions

  1. Modify Expander so that it takes an extra argument auxiliaries containing zero or more named functions:

    const expander = new Expander(root, vars, {
      max: Math.max,
      trim: (x) => x.trim()
    })
    
  2. Add a directive <span z-call="functionName" z-args="var,var"/> that looks up a function in auxiliaries and calls it with the given variables as arguments.