Page Templates

Generating HTML pages from templates

Terms defined: bare object, Document Object Model (DOM), dynamic scoping, environment, lexical scoping, stack frame, Visitor pattern

Every program needs documentation in order to be usable, and the best place to put that documentation is on the web. Writing and updating pages by hand is time-consuming and error-prone, particularly when so many of their parts are the same, so most websites use some kind of tool to create HTML from templates.

Thousands of page templating systems have been written in the last thirty years in every popular programming language (and in fact one language, PHP, was created for this purpose). Most of these systems use one of three designs:

  1. Mix commands in a language such as JavaScript with the HTML or Markdown using some kind of marker to indicate which parts are commands and which parts are to be taken as-is. This approach is taken by EJS, which we have used to write these lessons.

  2. Create a mini-language with its own commands, as Jekyll (the templating system used by GitHub Pages) does. Mini-languages are appealing because they are smaller and safer than general-purpose languages, but experience shows that they quickly grow many of the features of a general-purpose language. Again, some kind of marker must be used to show which parts of the page are code and which are ordinary text.

  3. Use specially-named attributes in the HTML. This approach has been the least popular, but eliminates the need for a special parser (since pages are valid HTML).

Figure: Three options for page templates. Three different ways to implement page templating.

In this chapter we will build a simple page templating system using the third option. We will process each page independently by parsing the HTML and walking the DOM to find nodes with special attributes. Our program will execute the instructions in those nodes to do the equivalent of loops and if/else statements; other nodes will be copied as-is to create text.

What will our system look like?

Let's start by deciding what "done" looks like. Suppose we want to turn an array of strings into an HTML list. Our page will look like this:

<html>
  <body>
    <p>Expect three items</p>
    <ul q-loop="item:names">
      <li><span q-var="item"/></li>
    </ul>
  </body>
</html>

The attribute q-loop tells the tool to repeat that node; the loop variable and the collection being looped over are the attribute's value, separated by a colon. The attribute q-var tells the tool to fill in the node with the value of the variable. The output will look like HTML without any traces of how it was created:

<html>
  <body>
    <p>Expect three items</p>
    <ul>
      <li><span>Johnson</span></li>
    
      <li><span>Vaughan</span></li>
    
      <li><span>Jackson</span></li>
    </ul>
  </body>
</html>

Human-readable vs. machine-readable

The introduction said that mini-languages for page templating quickly start to accumulate extra features. We have already started down that road by putting the loop variable and loop target in a single attribute and parsing that attribute to get them out. Doing that makes loop elements easier for people to type, but means that important information is hidden from standard HTML processing tools, which can't know that this particular attribute of these particular elements contains multiple values or that those values should be extracted by splitting a string on a colon. We could instead require people to use two attributes, as in:

    <ul q-loop="names" q-loop-var="item">
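
To make the trade-off concrete, here is a minimal sketch of a helper that accepts either style. The name `loopSpec` and the shape of its return value are our own inventions for illustration; the tool built in this chapter only supports the single-attribute form.

```javascript
// Hypothetical helper: returns { variable, collection } for either style.
const loopSpec = (attribs) => {
  if ('q-loop-var' in attribs) {
    // Two-attribute form: nothing needs to be parsed.
    return { variable: attribs['q-loop-var'], collection: attribs['q-loop'] }
  }
  // Single-attribute form: split "variable:collection" on the colon.
  const [variable, collection] = attribs['q-loop'].split(':')
  return { variable, collection }
}

console.log(loopSpec({ 'q-loop': 'item:names' }))
console.log(loopSpec({ 'q-loop': 'names', 'q-loop-var': 'item' }))
```

Both calls produce the same result, but only the second form can be understood by a tool that knows nothing about our colon convention.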

What about processing templates? Our tool needs the template itself, somewhere to write its output, and some variables to use in the expansion. These variables might come from a configuration file, from a YAML header in the file itself, or from some mix of the two; for the moment, all we need to know is that we will pass them into the expansion function as an object:

const variables = {
  names: ['Johnson', 'Vaughan', 'Jackson']
}
const dom = readHtml('template.html')
const expander = new Expander(dom, variables)
expander.walk()
console.log(expander.getResult())

How can we keep track of values?

Speaking of variables, we need a way to keep track of their current values: "current", because the value of a loop variable changes each time we go around the loop. We also need to maintain multiple sets of variables so that we can nest loops.

The standard solution is to create a stack of lookup tables. Each stack frame is an object with names and values; when we need to find a variable, we look through the stack frames in order to find the uppermost definition of that variable.

Scoping rules

Searching the stack frame by frame is called dynamic scoping, since we find variables while the program is running. In contrast, most programming languages use lexical scoping, which figures out what a variable name refers to based on the structure of the program text.
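
The difference is easier to see side by side. The sketch below is only an illustration: the first half relies on JavaScript's own lexical scoping, while the second half simulates dynamic scoping with an explicit array of frames. All of the names (`frames`, `lookup`, and so on) are made up for this example.

```javascript
// Lexical scoping: `show` sees the `greeting` visible where `show` was written.
const greeting = 'outer'
const show = () => greeting
const callWithLocal = () => {
  const greeting = 'inner' // a different variable that `show` cannot see
  return show()
}
console.log(callWithLocal()) // outer

// Dynamic scoping, simulated with a stack of frames (most recent last).
const frames = [{ greeting: 'outer' }]
const lookup = (name) => {
  for (let i = frames.length - 1; i >= 0; i--) {
    if (name in frames[i]) {
      return frames[i][name]
    }
  }
  return undefined
}
const showDynamic = () => lookup('greeting')
const callWithFrame = () => {
  frames.push({ greeting: 'inner' }) // visible to everything called from here
  const result = showDynamic()
  frames.pop()
  return result
}
console.log(callWithFrame()) // inner
```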

The values in a running program are sometimes called an environment, so we have named our stack-handling class Env. Its methods let us push and pop new stack frames and find a variable given its name; if the variable can't be found, Env.find returns undefined instead of throwing an exception.

class Env {
  constructor (initial) {
    this.stack = []
    this.push(Object.assign({}, initial))
  }

  push (frame) {
    this.stack.push(frame)
  }

  pop () {
    this.stack.pop()
  }

  find (name) {
    for (let i = this.stack.length - 1; i >= 0; i--) {
      if (name in this.stack[i]) {
        return this.stack[i][name]
      }
    }
    return undefined
  }

  toString () {
    return JSON.stringify(this.stack)
  }
}

export default Env

Figure: Variable stack. Using a stack to manage variables.
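
To see how frames shadow one another, here is a short usage sketch. It repeats a condensed copy of Env so that it runs on its own; the variable values are invented for the example.

```javascript
// Condensed copy of the Env class above so that this example is self-contained.
class Env {
  constructor (initial) { this.stack = [Object.assign({}, initial)] }
  push (frame) { this.stack.push(frame) }
  pop () { this.stack.pop() }
  find (name) {
    for (let i = this.stack.length - 1; i >= 0; i--) {
      if (name in this.stack[i]) return this.stack[i][name]
    }
    return undefined
  }
}

const env = new Env({ item: 'global', names: ['Johnson'] })
const seen = []
seen.push(env.find('item'))         // 'global': only the initial frame exists
env.push({ item: 'Johnson' })
seen.push(env.find('item'))         // 'Johnson': the new frame shadows the old one
seen.push(env.find('names').length) // 1: lookup falls through to the initial frame
env.pop()
seen.push(env.find('item'))         // 'global' again once the frame is gone
seen.push(env.find('missing'))      // undefined rather than an exception
console.log(seen)
```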

How do we handle nodes?

HTML pages have a nested structure, so we will process them using the Visitor pattern. Visitor's constructor takes the root node of the DOM tree as an argument and saves it. When we call Visitor.walk without a value, it starts recursing from that saved root; if .walk is given a value (as it is during recursive calls), it uses that instead.

import assert from 'assert'

class Visitor {
  constructor (root) {
    this.root = root
  }

  walk (node = null) {
    if (node === null) {
      node = this.root
    }
    if (this.open(node)) {
      node.children.forEach(child => {
        this.walk(child)
      })
    }
    this.close(node)
  }

  open (node) {
    assert(false,
      'Must implement "open"')
  }

  close (node) {
    assert(false,
      'Must implement "close"')
  }
}

export default Visitor

Visitor defines two methods called open and close that are called when we first arrive at a node and when we are finished with it. The default implementations throw exceptions so that the creators of derived classes must remember to implement their own versions.

Figure: The Visitor pattern. Using the Visitor pattern to evaluate a page template.
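
Before building the real expander, it may help to see a much smaller visitor. The sketch below uses a condensed copy of Visitor and a hypothetical subclass called Outline that records how tags are nested. The tree is built by hand and only mimics the fields this chapter's code relies on (type, name, attribs, children, and data); a real tree would come from the HTML parser.

```javascript
// Condensed copy of Visitor so that this example is self-contained.
class Visitor {
  constructor (root) { this.root = root }
  walk (node = null) {
    if (node === null) node = this.root
    if (this.open(node)) {
      node.children.forEach(child => this.walk(child))
    }
    this.close(node)
  }
  open (node) { throw new Error('Must implement "open"') }
  close (node) { throw new Error('Must implement "close"') }
}

// Hypothetical subclass: records tag names indented by their depth.
class Outline extends Visitor {
  constructor (root) {
    super(root)
    this.depth = 0
    this.lines = []
  }

  open (node) {
    if (node.type === 'text') return false // text has no children to visit
    this.lines.push('  '.repeat(this.depth) + node.name)
    this.depth += 1
    return true
  }

  close (node) {
    if (node.type !== 'text') this.depth -= 1
  }
}

const tree = {
  type: 'tag',
  name: 'ul',
  attribs: {},
  children: [
    { type: 'tag', name: 'li', attribs: {}, children: [{ type: 'text', data: 'static' }] },
    { type: 'tag', name: 'li', attribs: {}, children: [{ type: 'text', data: 'text' }] }
  ]
}
const outline = new Outline(tree)
outline.walk()
console.log(outline.lines.join('\n'))
```

Returning false from open is what stops the walk from descending, which is the same mechanism the expander uses to skip or take over a node's children.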

The Expander class is a Visitor and uses an Env. It loads a handler for each type of special node we support (we will write these in a moment) and uses them to process each type of node:

  1. If the node is plain text, copy it to the output.

  2. If there is a handler for the node, call the handler's open or close method.

  3. Otherwise, open or close a regular tag.

import assert from 'assert'

import Visitor from './visitor.js'
import Env from './env.js'

import q_if from './q-if.js'
import q_loop from './q-loop.js'
import q_num from './q-num.js'
import q_var from './q-var.js'

const HANDLERS = {
  'q-if': q_if,
  'q-loop': q_loop,
  'q-num': q_num,
  'q-var': q_var
}

class Expander extends Visitor {
  constructor (root, vars) {
    super(root)
    this.env = new Env(vars)
    this.handlers = HANDLERS
    this.result = []
  }

  open (node) {
    if (node.type === 'text') {
      this.output(node.data)
      return false
    } else if (this.hasHandler(node)) {
      return this.getHandler(node).open(this, node)
    } else {
      this.showTag(node, false)
      return true
    }
  }

  close (node) {
    if (node.type === 'text') {
      return
    }
    if (this.hasHandler(node)) {
      this.getHandler(node).close(this, node)
    } else {
      this.showTag(node, true)
    }
  }
...
}

export default Expander

Checking to see if there is a handler for a particular node and getting that handler are straightforward:

  hasHandler (node) {
    for (const name in node.attribs) {
      if (name in this.handlers) {
        return true
      }
    }
    return false
  }

  getHandler (node) {
    const possible = Object.keys(node.attribs)
      .filter(name => name in this.handlers)
    assert(possible.length === 1,
      'Should be exactly one handler')
    return this.handlers[possible[0]]
  }

Finally, we need a few helper methods to show tags and generate output:

  showTag (node, closing) {
    if (closing) {
      this.output(`</${node.name}>`)
      return
    }

    this.output(`<${node.name}`)
    for (const name in node.attribs) {
      if (!name.startsWith('q-')) {
        this.output(` ${name}="${node.attribs[name]}"`)
      }
    }
    this.output('>')
  }

  output (text) {
    this.result.push((text === undefined) ? 'UNDEF' : text)
  }

  getResult () {
    return this.result.join('')
  }

Notice that this class adds strings to an array and then joins them all right at the end rather than concatenating strings repeatedly. Doing this is more efficient and also helps with debugging, since each string in the array corresponds to a single method call.
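
As a small illustration of the debugging benefit, the sketch below mimics output and getResult with standalone code (it is not the Expander class itself) so that we can inspect the pieces before they are joined:

```javascript
const result = []
const output = (text) => {
  result.push((text === undefined) ? 'UNDEF' : text)
}

output('<p>')
output('hello')
output(undefined) // stands in for a failed variable lookup
output('</p>')

console.log(result)          // four entries, one per call to output
console.log(result.join('')) // <p>helloUNDEF</p>
```

Looking at the array shows exactly which call produced the UNDEF marker, which is harder to tell from the joined string.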

How do we implement node handlers?

So far we have built a lot of infrastructure but haven't actually processed a single special node. To do that, let's start with a handler that copies a constant number into the output:

export default {
  open: (expander, node) => {
    expander.showTag(node, false)
    expander.output(node.attribs['q-num'])
  },

  close: (expander, node) => {
    expander.showTag(node, true)
  }
}

When we enter a node like <span q-num="123"/>, this handler prints an opening tag and then copies the value of the q-num attribute to the output. When we are exiting the node, the handler closes the tag.

Note that this is not a class, but instead an object with two functions stored under the keys open and close. We could (and probably should) use a class for each handler so that handlers can store any extra state they need, but bare objects are still often used in JavaScript.
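
For comparison, here is a sketch of what a class-based handler might look like. The NumHandler class and its count field are hypothetical, as is the stand-in object that plays the role of the expander; we have not changed the chapter's actual code.

```javascript
// Hypothetical class-based q-num handler that also counts how often it is used.
class NumHandler {
  constructor () {
    this.count = 0
  }

  open (expander, node) {
    this.count += 1
    expander.showTag(node, false)
    expander.output(node.attribs['q-num'])
  }

  close (expander, node) {
    expander.showTag(node, true)
  }
}

// Stand-in for Expander with just the two methods the handler calls.
const fakeExpander = {
  result: [],
  showTag (node, closing) {
    this.result.push(closing ? `</${node.name}>` : `<${node.name}>`)
  },
  output (text) {
    this.result.push(text)
  }
}

const handler = new NumHandler()
const node = { name: 'span', attribs: { 'q-num': '123' } }
handler.open(fakeExpander, node)
handler.close(fakeExpander, node)
console.log(fakeExpander.result.join(''), handler.count) // <span>123</span> 1
```

With this design the table of handlers would hold instances such as new NumHandler() rather than bare objects, but the expander would call open and close in exactly the same way.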

So much for constants; what about variables?

export default {
  open: (expander, node) => {
    expander.showTag(node, false)
    expander.output(expander.env.find(node.attribs['q-var']))
  },

  close: (expander, node) => {
    expander.showTag(node, true)
  }
}

This code is almost the same as the previous example; the only difference is that instead of copying the attribute value directly to the output, we use the attribute value as a key to look up a value in the environment.

These two pairs of handlers look plausible, but do they work? To find out, we can build a program that loads variable definitions from a JSON file, reads an HTML template, and does the expansion:

import fs from 'fs'
import htmlparser2 from 'htmlparser2'

import Expander from './expander.js'

const main = () => {
  const vars = readJSON(process.argv[2])
  const doc = readHtml(process.argv[3])
  const expander = new Expander(doc, vars)
  expander.walk()
  console.log(expander.getResult())
}

const readJSON = (filename) => {
  const text = fs.readFileSync(filename, 'utf-8')
  return JSON.parse(text)
}

const readHtml = (filename) => {
  const text = fs.readFileSync(filename, 'utf-8')
  return htmlparser2.parseDOM(text)[0]
}

main()

As we were writing this chapter, we added new variables for our test cases one by one. To avoid repeating them with each test, we show the entire set once:

{
  "firstVariable": "firstValue",
  "secondVariable": "secondValue",
  "variableName": "variableValue",
  "showThis": true,
  "doNotShowThis": false,
  "names": ["Johnson", "Vaughan", "Jackson"]
}

Our first test: is static text copied over as-is?

<html>
  <body>
    <h1>Only Static Text</h1>
    <p>This document only contains:</p>
    <ul>
      <li>static</li>
      <li>text</li>
    </ul>
  </body>
</html>

node template.js vars.json input-static-text.html

<html>
  <body>
    <h1>Only Static Text</h1>
    <p>This document only contains:</p>
    <ul>
      <li>static</li>
      <li>text</li>
    </ul>
  </body>
</html>

Good. Now, does the expander handle constants?

<html>
  <body>
    <p><span q-num="123"/></p>
  </body>
</html>

<html>
  <body>
    <p><span>123</span></p>
  </body>
</html>

What about a single variable?

<html>
  <body>
    <p><span q-var="variableName"/></p>
  </body>
</html>

<html>
  <body>
    <p><span>variableValue</span></p>
  </body>
</html>

What about a page containing multiple variables? There's no reason it should fail if the single-variable case works, but variable lookup is one of the more complicated parts of our processing, so we should check:

<html>
  <body>
    <p><span q-var="firstVariable" /></p>
    <p><span q-var="secondVariable" /></p>
  </body>
</html>

<html>
  <body>
    <p><span>firstValue</span></p>
    <p><span>secondValue</span></p>
  </body>
</html>

How can we implement control flow?

Our tool supports two types of control flow: conditional expressions and loops. Since we don't support Boolean expressions like and and or, implementing a conditional is as simple as looking up a variable (which we know how to do) and then expanding the node if the value is true:

export default {
  open: (expander, node) => {
    const doRest = expander.env.find(node.attribs['q-if'])
    if (doRest) {
      expander.showTag(node, false)
    }
    return doRest
  },

  close: (expander, node) => {
    if (expander.env.find(node.attribs['q-if'])) {
      expander.showTag(node, true)
    }
  }
}

Let's test it:

<html>
  <body>
    <p q-if="showThis">This should be shown.</p>
    <p q-if="doNotShowThis">This should <em>not</em> be shown.</p>
  </body>
</html>

<html>
  <body>
    <p>This should be shown.</p>
    
  </body>
</html>

And finally we come to loops. For these, we need to get the array we're looping over from the environment and do something once for each of its elements. That "something" is:

  1. Create a new stack frame holding the current value of the loop variable.

  2. Expand all of the node's children with that stack frame in place.

  3. Pop the stack frame to get rid of the temporary variable.

export default {
  open: (expander, node) => {
    const [indexName, targetName] = node.attribs['q-loop'].split(':')
    delete node.attribs['q-loop']
    expander.showTag(node, false)
    const target = expander.env.find(targetName)
    for (const index of target) {
      expander.env.push({ [indexName]: index })
      node.children.forEach(child => expander.walk(child))
      expander.env.pop()
    }
    return false
  },

  close: (expander, node) => {
    expander.showTag(node, true)
  }
}
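
The push and pop steps are what make nested loops possible. The sketch below traces the stack during two nested loops using a plain array and a lookup function of our own rather than the Env and Expander classes, and the rows variable is invented for the example:

```javascript
const stack = [{ rows: [['a', 'b'], ['c']] }]
const find = (name) => {
  for (let i = stack.length - 1; i >= 0; i--) {
    if (name in stack[i]) return stack[i][name]
  }
  return undefined
}

const trace = []
for (const row of find('rows')) {
  stack.push({ row }) // frame for the outer loop variable
  for (const cell of find('row')) {
    stack.push({ cell }) // frame for the inner loop variable
    trace.push(`depth ${stack.length}: row has ${find('row').length} cell(s), cell=${find('cell')}`)
    stack.pop()
  }
  stack.pop()
}
console.log(trace.join('\n'))
console.log(stack.length) // 1: only the initial frame remains
```

Inside the inner loop, both loop variables are visible because the outer frame is still on the stack beneath the inner one.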

Once again, it's not done until we test it:

<html>
  <body>
    <p>Expect three items</p>
    <ul q-loop="item:names">
      <li><span q-var="item"/></li>
    </ul>
  </body>
</html>

<html>
  <body>
    <p>Expect three items</p>
    <ul>
      <li><span>Johnson</span></li>
    
      <li><span>Vaughan</span></li>
    
      <li><span>Jackson</span></li>
    </ul>
  </body>
</html>

Notice how we create the new stack frame using:

{ [indexName]: index }

This is an ugly but useful trick. We can't write:

{ indexName: index }

because that would create an object with the string indexName as a key, rather than one with the value of the variable indexName as its key. We can't do this either:

{ `${indexName}`: index }

though it seems like we should be able to. Instead, the square brackets in an object literal mark a computed property name: JavaScript evaluates the expression inside the brackets and uses its value as the key. No array is created. Our expression is therefore a quick way to get the same effect as:

const temp = {}
temp[indexName] = index
expander.env.push(temp)
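
The bracket form is JavaScript's computed property name syntax, and a quick experiment shows how it differs from an ordinary key. The variable names below are chosen only for this example:

```javascript
const indexName = 'item'
const index = 'Johnson'

const literalKey = { indexName: index }    // key is the text "indexName"
const computedKey = { [indexName]: index } // key is the value of indexName
const templateKey = { [`${indexName}-copy`]: index } // any expression can go inside the brackets

console.log(Object.keys(literalKey))  // [ 'indexName' ]
console.log(Object.keys(computedKey)) // [ 'item' ]
console.log(Object.keys(templateKey)) // [ 'item-copy' ]
```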

How did we know how to do all of this?

We have just implemented a simple programming language. It can't do arithmetic, but if we wanted to add tags like:

<span q-math="+"><span q-var="width"/><span q-num="1"/></span>

we could. It's unlikely anyone would use the result---typing all of that is so much clumsier than typing width+1 that people wouldn't use it unless they had no other choice---but the basic design is there.

We didn't invent any of this from scratch, any more than we invented the parsing algorithms used elsewhere in these lessons. Instead, we did what you are doing now: we read what other programmers had written and tried to make sense of the key ideas.

Exercises

Tracing execution

Add a directive <span q-trace="variable"/> that prints the current value of a variable using console.error for debugging.

Unit tests

Write unit tests for template expansion using Mocha.

Trimming text

Modify all of the directives to take an extra optional attribute q-trim="true". If this attribute is set, leading and trailing whitespace is trimmed from the directive's expansion.

Literal text

Add a directive <div q-literal="true">…</div> that copies the enclosed text as-is without interpreting or expanding any contained directives. (A directive like this would be needed when writing documentation for the template expander.)

Including other files

  1. Add a directive <div q-include="filename.html"/> that includes another file in the file being processed.

  2. Should included files be processed and the result copied into the including file, or should the text be copied in and then processed? What difference does it make to the way variables are evaluated?

HTML snippets

Add a directive <div q-snippet="variable">…</div> that saves some text in a variable so that it can be displayed later. For example:

<html>
  <body>
    <div q-snippet="prefix"><strong>Important:</strong></div>
    <p>Expect three items</p>
    <ul>
      <li q-loop="item:names">
        <span q-var="prefix"/><span q-var="item"/>
      </li>
    </ul>
  </body>
</html>

would print the word "Important:" in bold before each item in the list.

YAML headers

Modify the template expander to handle variables defined in a YAML header in the page being processed. For example, if the page is:

---
name: "Dorothy Johnson Vaughan"
---
<html>
  <body>
    <p><span q-var="name"/></p>
  </body>
</html>

then the expander will create a paragraph containing the given name.

Expanding all files

Write a program expand-all.js that takes two directory names as command-line arguments and builds a website in the second directory by expanding all of the HTML files found in the first or in sub-directories of the first.

Counting loops

Add a directive <div q-index="indexName" q-limit="limitName">…</div> that loops from zero to the value in the variable limitName, putting the current iteration index in indexName.

Auxiliary functions

  1. Modify Expander so that it takes an extra argument auxiliaries containing zero or more named functions:

    const expander = new Expander(root, vars, {
      max: Math.max,
      trim: (x) => x.trim()
    })
    
  2. Add a directive <span q-call="functionName" q-args="var,var"/> that looks up a function in auxiliaries and calls it with the given variables as arguments.