Module Bundler
Turning many files into one
JavaScript was designed in a hurry 25 years ago to make web pages interactive. Nobody realized it would become one of the most popular programming languages in the world, which means it didn't include support for things that large programs need. One of those was a way to turn many easy-to-edit source files into a single easy-to-load file so that browsers could get what they needed with a single request.
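A module bundler finds all the files that an application depends on and combines them into a single loadable file.

Bundling requires an entry point, i.e., a place to start searching for dependencies. Rather than handling import and export statements, we will use the older require and module.exports.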
Our test case and the expected output are:
const main = () => {
  console.log('in main')
}
module.exports = main
in main
In our second test case, main.js requires other.js, which doesn't require anything.
The main file is:
const other = require('./other')
const main = () => {
  console.log(other('main'))
}
module.exports = main
and the required file is:
const other = (caller) => {
  return `other called from ${caller}`
}
module.exports = other
The output we expect is:
other called from main
Our third test case has multiple inclusions in multiple directories:
- ./main requires all four of the files below.
- ./top-left doesn't require anything.
- ./top-right requires top-left and bottom-right.
- ./subdir/bottom-left also requires top-left and bottom-right.
- ./subdir/bottom-right doesn't require anything.

The main program is:
// main.js
const topLeft = require('./top-left') // none
const topRight = require('./top-right') // needs top-left + bottom-right
const bottomLeft = require('./subdir/bottom-left') // needs top-left + bottom-right
const bottomRight = require('./subdir/bottom-right') // none
const main = () => {
  const functions = [topLeft, topRight, bottomLeft, bottomRight]
  functions.forEach(func => {
    console.log(`${func('main')}`)
  })
}
module.exports = main
and the other four files use require and module.exports.
The output we expect is:
topLeft from main
topRight from main with topLeft from topRight and bottomRight from \
topRight
bottomLeft from main with topLeft from bottomLeft and bottomRight from \
bottomLeft
bottomRight from main
We do not handle circular dependencies because require itself doesn't.
How can we find dependencies?
To get all the dependencies for one source file, we parse it and extract all of the calls to require.
The code to do this is relatively straightforward given what we know about Acorn:
import acorn from 'acorn'
import fs from 'fs'
import walk from 'acorn-walk'
const getRequires = (filename) => {
  const entryPointFile = filename
  const text = fs.readFileSync(entryPointFile, 'utf-8')
  const ast = acorn.parse(text)
  const requires = []
  walk.simple(ast, {
    CallExpression: (node, state) => {
      if ((node.callee.type === 'Identifier') &&
          (node.callee.name === 'require')) {
        state.push(node.arguments[0].value)
      }
    }
  }, null, requires)
  return requires
}
export default getRequires
import getRequires from './get-requires.js'
const result = getRequires(process.argv[2])
console.log(result)
node test-get-requires.js simple/main.js
[ './other' ]
An unsolvable problem
The dependency finder shown above gives the right answer for any reasonable JavaScript program,
but not all JavaScript is reasonable.
Suppose someone creates an alias for require and uses that to load other files:
const req = require
const weWillMissThis = req('./other-file')
We could try to trace variable assignments to catch cases like these, but someone could still fool us by writing this:
const clever = eval(`require`)
const weWillMissThisToo = clever('./other-file')
There is no general solution to this problem other than actually running the code to see what it does. If you would like to understand why not, and learn about a pivotal moment in the history of computing, we highly recommend Petzold2008.
To get all of the dependencies our bundle needs we need to find the transitive closure of the entry point's dependencies, i.e., the requirements of the requirements and so on. Our algorithm uses two sets: pending, which contains the things we haven't looked at yet, and seen, which contains the things we have. pending initially contains the entry point file and seen is initially empty. We keep taking items from pending until it is empty. If the current thing is already in seen we do nothing; otherwise we add it to seen, get its dependencies, and add the ones we haven't seen yet to pending.
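The pending/seen loop can be tried out without touching the filesystem. The sketch below is only an illustration: the in-memory `graph` object and the `closure` function are hypothetical stand-ins for real files and for the code developed later in this chapter.

```javascript
// Hypothetical in-memory dependency graph standing in for files on disk.
const graph = {
  main: ['top-left', 'top-right', 'bottom-left', 'bottom-right'],
  'top-left': [],
  'top-right': ['top-left', 'bottom-right'],
  'bottom-left': ['top-left', 'bottom-right'],
  'bottom-right': []
}

// Return everything reachable from `start`, including `start` itself.
const closure = (graph, start) => {
  const pending = [start] // things we haven't looked at yet
  const seen = new Set() // things we have looked at
  while (pending.length > 0) {
    const current = pending.pop()
    if (seen.has(current)) {
      continue // already handled because of an earlier request
    }
    seen.add(current)
    for (const dep of graph[current]) {
      if (!seen.has(dep)) {
        pending.push(dep)
      }
    }
  }
  return [...seen]
}

const reachable = closure(graph, 'main')
console.log(reachable)
```

Each name appears once in the result even though top-left and bottom-right are requested by several files.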

Finding dependencies is complicated by the fact that we can load something under different names, such as ./subdir/bottom-left from main but ./bottom-left from ./subdir/bottom-right.
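One way to see that both names refer to the same file is to resolve each against the directory of the file that uses it. This is a small sketch rather than part of the bundler: the root directory `/project/full` is made up, the script uses require so that it can be run on its own, and path.posix is used so the result looks the same on every platform.

```javascript
const path = require('path')

// Hypothetical project root.
const root = '/project/full'

// The name as written in main.js, which lives in the root directory.
const fromMain = path.posix.resolve(root, './subdir/bottom-left.js')

// The name as written in subdir/bottom-right.js, which lives in subdir.
const subdir = path.posix.join(root, 'subdir')
const fromSibling = path.posix.resolve(subdir, './bottom-left.js')

console.log(fromMain)
console.log(fromMain === fromSibling)
```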
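As with the module loader, we use absolute paths as unique identifiers. Our code is also complicated by the fact that JavaScript's Set class doesn't have an equivalent of Array.pop, so we will maintain the set of pending items as a list.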
The resulting code is:
import path from 'path'
import getRequires from './get-requires.js'
const transitiveClosure = (entryPointPath) => {
  const pending = [path.resolve(entryPointPath)]
  const filenames = new Set()
  while (pending.length > 0) {
    const candidate = path.resolve(pending.pop())
    if (filenames.has(candidate)) {
      continue
    }
    filenames.add(candidate)
    const candidateDir = path.dirname(candidate)
    getRequires(candidate)
      .map(raw => path.resolve(path.join(candidateDir, `${raw}.js`)))
      .filter(cooked => !filenames.has(cooked))
      .forEach(cooked => pending.push(cooked))
  }
  return [...filenames]
}
export default transitiveClosure
import transitiveClosure from './transitive-closure-only.js'
const result = transitiveClosure(process.argv[2])
console.log(JSON.stringify(result, null, 2))
node test-transitive-closure-only.js full/main.js
[
"/u/stjs/module-bundler/full/main.js",
"/u/stjs/module-bundler/full/subdir/bottom-right.js",
"/u/stjs/module-bundler/full/subdir/bottom-left.js",
"/u/stjs/module-bundler/full/top-left.js",
"/u/stjs/module-bundler/full/top-right.js"
]
This works, but it isn't keeping track of the mapping from required names within files to absolute paths, so when one of the files in our bundle tries to access something, we might not know what it's after.
The fix is to modify transitive closure to construct and return a two-level structure. The primary keys are the absolute paths to the files being required, while sub-keys are the paths they refer to when loading things.

Adding this takes our transitive closure code from 24 lines to 30 lines:
import path from 'path'
import getRequires from './get-requires.js'
const transitiveClosure = (entryPointPath) => {
  const mapping = {}
  const pending = [path.resolve(entryPointPath)]
  const filenames = new Set()
  while (pending.length > 0) {
    const candidate = path.resolve(pending.pop())
    if (filenames.has(candidate)) {
      continue
    }
    filenames.add(candidate)
    mapping[candidate] = {}
    const candidateDir = path.dirname(candidate)
    getRequires(candidate)
      .map(raw => {
        mapping[candidate][raw] =
          path.resolve(path.join(candidateDir, `${raw}.js`))
        return mapping[candidate][raw]
      })
      .filter(cooked => cooked !== null)
      .forEach(cooked => pending.push(cooked))
  }
  return mapping
}
export default transitiveClosure
import transitiveClosure from './transitive-closure.js'
const result = transitiveClosure(process.argv[2])
console.log(JSON.stringify(result, null, 2))
node test-transitive-closure.js full/main.js
{
"/u/stjs/module-bundler/full/main.js": {
"./top-left": "/u/stjs/module-bundler/full/top-left.js",
"./top-right": "/u/stjs/module-bundler/full/top-right.js",
"./subdir/bottom-left": \
"/u/stjs/module-bundler/full/subdir/bottom-left.js",
"./subdir/bottom-right": \
"/u/stjs/module-bundler/full/subdir/bottom-right.js"
},
"/u/stjs/module-bundler/full/subdir/bottom-right.js": {},
"/u/stjs/module-bundler/full/subdir/bottom-left.js": {
"../top-left": "/u/stjs/module-bundler/full/top-left.js",
"./bottom-right": \
"/u/stjs/module-bundler/full/subdir/bottom-right.js"
},
"/u/stjs/module-bundler/full/top-left.js": {},
"/u/stjs/module-bundler/full/top-right.js": {
"./top-left": "/u/stjs/module-bundler/full/top-left.js",
"./subdir/bottom-right": \
"/u/stjs/module-bundler/full/subdir/bottom-right.js"
}
}
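Given a table of this shape, translating the name written in one file into an absolute path is a two-step lookup. The following is a minimal sketch: the `lookup` helper and the `/project` paths are hypothetical and are not part of the code above.

```javascript
// Hand-written table in the same two-level shape; the paths are made up.
const table = {
  '/project/main.js': {
    './top-left': '/project/top-left.js',
    './subdir/bottom-left': '/project/subdir/bottom-left.js'
  },
  '/project/subdir/bottom-left.js': {
    '../top-left': '/project/top-left.js'
  },
  '/project/top-left.js': {}
}

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key)

// Translate the name `written` inside the file `requester` into an absolute path.
const lookup = (table, requester, written) => {
  if (!has(table, requester) || !has(table[requester], written)) {
    throw new Error(`${requester} does not require ${written}`)
  }
  return table[requester][written]
}

console.log(lookup(table, '/project/main.js', './top-left'))
console.log(lookup(table, '/project/subdir/bottom-left.js', '../top-left'))
```

Both calls produce the same absolute path even though the two files write the name differently, which is the reason for the second level of keys.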
How can we safely combine several files into one?
We now need to combine all these files into one while keeping each in its own namespace.
We do this using the same method we used for the module loader: we wrap the source code in a function, giving that function a module object to fill in and an implementation of require to resolve dependencies within the bundle.
For example, suppose we have this file:
const main = () => {
  console.log('in main')
}
module.exports = main
The wrapped version will look like this:
const wrapper = (module, require) => {
  const main = () => {
    console.log('in main')
  }
  module.exports = main
}
And we can test it like this:
const wrapper = (module, require) => {
  const main = () => {
    console.log('in main')
  }
  module.exports = main
}
const _require = (name) => null
const temp = {}
wrapper(temp, _require)
temp.exports()
in main
We need to do this for multiple files, so we will put these functions in a lookup table with their files' absolute paths as its keys. We will also wrap the loading in a function so that we don't accidentally step on anyone else's toys:
import fs from 'fs'
import path from 'path'
const HEAD = `const initialize = (creators) => {
`
const TAIL = `
}
`
const combineFiles = (allFilenames) => {
  const body = allFilenames
    .map(filename => {
      const key = path.resolve(filename)
      const source = fs.readFileSync(filename, 'utf-8')
      const func = `(module, require) => {${source}}`
      const entry = `creators.set('${key}',\n${func})`
      return `// ${key}\n${entry}\n`
    })
    .join('\n')
  const func = `${HEAD}\n${body}\n${TAIL}`
  return func
}
export default combineFiles
Breaking this down, the code in HEAD opens a function called initialize that takes the lookup table creators as its argument, while the code in TAIL closes that function. In between, combineFiles adds an entry to the lookup table for each file.
We can test that this works in our two-file case:
import combineFiles from './combine-files.js'
console.log(combineFiles(process.argv.slice(2)))
const initialize = (creators) => {
// /u/stjs/module-bundler/simple/main.js
creators.set('/u/stjs/module-bundler/simple/main.js',
(module, require) => {const other = require('./other')
const main = () => {
console.log(other('main'))
}
module.exports = main
})
// /u/stjs/module-bundler/simple/other.js
creators.set('/u/stjs/module-bundler/simple/other.js',
(module, require) => {const other = (caller) => {
return `other called from ${caller}`
}
module.exports = other
})
}
and then load the result and call initialize:
Map(2) {
'/u/stjs/module-bundler/simple/main.js' => [Function (anonymous)],
'/u/stjs/module-bundler/simple/other.js' => [Function (anonymous)]
}
How can files access each other?
The code we have built so far has not yet created our exports; instead, it has created a lookup table of functions that can create what we asked for. More specifically, we have:
- a map from absolute filenames to functions that create the exports for those modules;
- a map from absolute filenames to pairs of (written import name, absolute filename); and
- an entry point.
To turn this into what we want, we look up the function associated with the entry point and run it, giving it an empty module object and a require function that we will describe below, then get the exports from the module object.

Our replacement for require is only allowed to take one argument because that's all that JavaScript's require takes. However, it actually needs four things: the argument to the user's require call, the absolute path of the file making the call, and the two lookup tables. Those two tables can't be global variables because of possible name collisions: no matter what we call them, the user might have given a variable the same name. We solve this problem using closures: makeRequire takes the absolute path of the file making the call and returns a require function that remembers it.

We also need a third structure: a cache for the modules we've already loaded. Putting it all together we have:
import fs from 'fs'
import path from 'path'
import transitiveClosure from './transitive-closure.js'
const HEAD = `const creators = new Map()
const cache = new Map()
const makeRequire = (absPath) => {
  return (localPath) => {
    const actualKey = translate[absPath][localPath]
    if (!cache.has(actualKey)) {
      const m = {}
      creators.get(actualKey)(m)
      cache.set(actualKey, m.exports)
    }
    return cache.get(actualKey)
  }
}
const initialize = (creators) => {
`
const TAIL = `
}
initialize(creators)
`
const makeProof = (entryPoint) => `
const start = creators.get('${entryPoint}')
const m = {}
start(m)
m.exports()
`
const createBundle = (entryPoint) => {
  entryPoint = path.resolve(entryPoint)
  const table = transitiveClosure(entryPoint)
  const translate = `const translate = ${JSON.stringify(table, null, 2)}`
  const creators = Object.keys(table).map(filename => makeCreator(filename))
  const proof = makeProof(entryPoint)
  return [
    translate,
    HEAD,
    ...creators,
    TAIL,
    proof
  ].join('\n')
}
const makeCreator = (filename) => {
  const key = path.resolve(filename)
  const source = fs.readFileSync(filename, 'utf-8')
  const func = `(module, require = makeRequire('${key}')) =>\n{${source}}`
  const entry = `creators.set('${key}',\n${func})`
  return `// ${key}\n${entry}\n`
}
export default createBundle
This code is really hard to read, both because we have to distinguish what is being printed in the output from what is being executed right now and because of the levels of nesting needed to capture variables safely. Getting this right took much more time per line of finished code than anything we have seen so far except promises.
To prove that this works we will look up the function main in the first file and call it; if we were loading in the browser, we'd capture the exports in a variable for later use.
First, we create the bundled file:
echo '/* eslint-disable */' > bundle-single.js
node test-create-bundle.js single/main.js >> bundle-single.js
const translate = {
"/u/stjs/module-bundler/single/main.js": {}
}
const creators = new Map()
const cache = new Map()
const makeRequire = (absPath) => {
return (localPath) => {
const actualKey = translate[absPath][localPath]
if (!cache.has(actualKey)) {
const m = {}
creators.get(actualKey)(m)
cache.set(actualKey, m.exports)
}
return cache.get(actualKey)
}
}
const initialize = (creators) => {
// /u/stjs/module-bundler/single/main.js
creators.set('/u/stjs/module-bundler/single/main.js',
(module, require = makeRequire('/u/stjs/module-bundler/single/main.js')) =>
{const main = () => {
console.log('in main')
}
module.exports = main
})
}
initialize(creators)
const start = creators.get('/u/stjs/module-bundler/single/main.js')
const m = {}
start(m)
m.exports()
and then we run it:
in main
That was a lot of work to print one line, but what we have should work for other files.
The two-file case with main and other works:
const translate = {
"/u/stjs/module-bundler/simple/main.js": {
"./other": "/u/stjs/module-bundler/simple/other.js"
},
"/u/stjs/module-bundler/simple/other.js": {}
}
const creators = new Map()
const cache = new Map()
const makeRequire = (absPath) => {
return (localPath) => {
const actualKey = translate[absPath][localPath]
if (!cache.has(actualKey)) {
const m = {}
creators.get(actualKey)(m)
cache.set(actualKey, m.exports)
}
return cache.get(actualKey)
}
}
const initialize = (creators) => {
// /u/stjs/module-bundler/simple/main.js
creators.set('/u/stjs/module-bundler/simple/main.js',
(module, require = makeRequire('/u/stjs/module-bundler/simple/main.js')) =>
{const other = require('./other')
const main = () => {
console.log(other('main'))
}
module.exports = main
})
// /u/stjs/module-bundler/simple/other.js
creators.set('/u/stjs/module-bundler/simple/other.js',
(module, require = makeRequire('/u/stjs/module-bundler/simple/other.js')) =>
{const other = (caller) => {
return `other called from ${caller}`
}
module.exports = other
})
}
initialize(creators)
const start = creators.get('/u/stjs/module-bundler/simple/main.js')
const m = {}
start(m)
m.exports()
other called from main
and so does our most complicated test with main and four other files:
topLeft from main
topRight from main with topLeft from topRight and bottomRight from \
topRight
bottomLeft from main with topLeft from bottomLeft and bottomRight from \
bottomLeft
bottomRight from main
Exercises
Using test-driven development
Suppose we wanted to compress the files being stored by the file backup system in
Finding import dependencies
Modify the dependency finder to work with import statements instead of require calls.
Track files using hashes
Modify the dependency finder to track files by hashing them instead of relying on paths, so that if exactly the same file is being required from two locations, only one copy is loaded.
Using asynchronous file operations
Modify the dependency finder to use async and await instead of synchronous file operations.
Unit testing transitive closure
Write unit tests for the tool that finds the transitive closure of files' requirements using Mocha and mock-fs. (Rather than parsing JavaScript files in the mock filesystem, have each file contain only a list of the names of the files it depends on.)
Exporting multiple functions
Create test cases for the module bundler in which files export more than one function and fix any bugs in the module bundler that they uncover.
Checking integrity
Write a function that checks the integrity of the data structure returned by the transitive closure routine, i.e., that makes sure every cross-reference resolves correctly.
Logging module loading
- Write a function called logLoad that takes a module name as an argument and prints a message using console.error saying that the module has been loaded.
- Modify the bundle generator to insert calls to this function to report when modules are actually loaded.
Tracing execution
Trace the execution of every function called when the main function in the full bundle is called.
Making bundles more readable
Modify the bundle creator to make its output more readable, e.g., by adding comments and indentation. (This does not matter to the computer, but can help debugging.)