File Backup
Archiving files with directory structure
Now that we can test software,
the next step is to be able to save it.
A version control system is, at its core, a way of archiving files that:
- records which versions of which files existed at the same time (so that we can go back to a consistent previous state), and
- stores any particular version of a file only once, so that we don't waste disk space.
In this chapter we will build a simple tool for doing both tasks.
How can we uniquely identify files?
To avoid storing redundant copies of files,
we need a way to tell when two files contain the same data.
We could compare the files byte by byte,
but a quicker way is to use a hash function
that turns arbitrary data into a fixed-length string of bits.
A hash function always produces the same hash code for a given input.
A cryptographic hash function has two extra properties:
- The hash codes it produces look like random numbers: they are evenly distributed (i.e., the odds of getting any specific output value are exactly the same).
- The hash code depends on the entire input: changing even a single byte results in a different hash code.
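We can see the last of these properties for ourselves by hashing two strings that differ by a single character. This little sketch (which uses Node's crypto module, introduced properly below) prints two hash codes with nothing visibly in common:

import crypto from 'crypto'

// 'data' and 'date' differ by one byte, but their hashes are unrelated.
for (const text of ['data', 'date']) {
  const hash = crypto.createHash('sha1').update(text).digest('hex')
  console.log(`${text}: ${hash}`)
}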
It's easy to write a bad hash function,
but very hard to write one that qualifies as cryptographic.
We will therefore use a library to calculate 160-bit [SHA-1][sha_1] hashes for our files.
These are not random enough to keep data secret from a patient, well-funded attacker,
but that's not what we're using them for:
we just want hashes that are random enough to make collisions between files extremely unlikely.
The Birthday Problem
The odds that two people share a birthday are 1/365 (ignoring February 29). The odds that they don't are therefore 364/365. When we add a third person, the odds that they don't share a birthday with either of the preceding two people are 363/365, so the overall odds that nobody shares a birthday are (365/365)×(364/365)×(363/365). If we keep calculating, there's a 50% chance of two people sharing a birthday in a group of just 23 people, and a 99.9% chance with 70 people.
We can use the same math to calculate how many files we need to hash before there's a 50% chance of a collision. Instead of 365, we use 2^160 (the number of values that are 160 bits long), and quickly get into "if every atom in the universe was a file there still wouldn't be collisions" territory.
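The birthday calculation is easy to reproduce; this sketch (not part of the backup tool) runs the same loop as the text:

// Multiply the odds of *no* shared birthday person by person
// until they drop below 50%.
let noShared = 1.0
let people = 1
while (noShared > 0.5) {
  people += 1
  noShared *= (365 - (people - 1)) / 365
}
console.log(`50% chance of a shared birthday with ${people} people`)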
Node's crypto module provides the functions we need to create a SHA-1 hash.
To do this,
we create an object that keeps track of the current state of the hashing calculations,
tell it how we want to encode (or represent) the hash value,
and then feed it some bytes.
When we are done,
we call its .end method
and then use its .read method to get the final result:
import crypto from 'crypto'
// create a SHA1 hasher
const hash = crypto.createHash('sha1')
// encode as hex (rather than binary)
hash.setEncoding('hex')
// send it some text
const text = process.argv[2]
hash.write(text)
// signal end of text
hash.end()
// display the result
const sha1sum = hash.read()
console.log(`SHA1 of "${text}" is ${sha1sum}`)
node hash-text.js something
SHA1 of "something" is 1af17e73721dbe0c40011b82ed4bb1a7dbe3ce29
Given this, hashing a file is straightforward: we just read the file and pass its contents to the hashing object:
import fs from 'fs'
import crypto from 'crypto'
const filename = process.argv[2]
const data = fs.readFileSync(filename, 'utf-8')
const hash = crypto.createHash('sha1').setEncoding('hex')
hash.write(data)
hash.end()
const sha1sum = hash.read()
console.log(`SHA1 of "${filename}" is ${sha1sum}`)
node hash-file.js hash-file.js
SHA1 of "hash-file.js" is c54c8ee3e576770d29ae2d0d73568e5a5c49eac0
However,
it is more efficient to process the file as a stream,
hashing its contents piece by piece as they are read:
import fs from 'fs'
import crypto from 'crypto'
const filename = process.argv[2]
const hash = crypto.createHash('sha1').setEncoding('hex')
fs.createReadStream(filename).pipe(hash)
hash.on('finish', () => {
const final = hash.read()
console.log('final', final)
})
console.log('program ends')
node hash-stream.js hash-stream.js
program ends
final dc9e6c231e243860dace2dbf52845b121062b60e
Many libraries rely on streams
so that programs don't have to read entire (possibly large) files into memory.
To start,
this program asks the fs library to create a reading stream for a file
and to pipe the data from that stream to the hashing object.
It then tells the hashing object to call our handler function when the finish event occurs,
i.e., when all of the data has been hashed.
Node runs that handler asynchronously,
which is why "program ends" appears in the output before the final hash value.
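We can watch a stream deliver data in pieces with a few lines of code; this sketch (not part of our tool) counts how many chunks a file arrives in:

import fs from 'fs'

// 'data' fires once per chunk; 'end' fires when the file is exhausted.
let chunks = 0
const stream = fs.createReadStream(process.argv[2])
stream.on('data', (data) => { chunks += 1 })
stream.on('end', () => console.log(`file arrived in ${chunks} chunk(s)`))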
How can we back up files?
Many files don't change after they're created, or only change very slowly.
It would be wasteful for a version control system to copy them all
each time the user wanted to save a snapshot of a project,
so instead our tool will copy each unique file to something like abcd1234.bck,
where abcd1234 is a hash of the file's contents.
It will then store a data structure that records the filenames and hash keys for each snapshot.
The hash keys tell it which unique files are part of the snapshot,
while the filenames tell us what each file's contents were called when the snapshot was made
(since files can be moved or renamed).
To restore a particular snapshot,
all we have to do is copy the saved .bck files
back to where they were when the snapshot was made.
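We won't build the restore tool in this chapter, but a minimal sketch shows how little is involved. It assumes the manifest has already been parsed into [path, hash] pairs, and that fs-extra-promise provides copyAsync:

import fs from 'fs-extra-promise'

// Copy each saved .bck file back to the path recorded in the manifest.
const restore = async (backupDir, manifest) => {
  await Promise.all(manifest.map(([path, hash]) =>
    fs.copyAsync(`${backupDir}/${hash}.bck`, path)))
}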
We can build the tools we need to do this using promises.
The main function creates a promise that uses the asynchronous version of glob to find files
and then:
- checks that entries in the list are actually files;
- reads each file into memory; and
- calculates hashes for those files.
import fs from 'fs-extra-promise'
import glob from 'glob-promise'
import crypto from 'crypto'
const hashExisting = (rootDir) => {
const pattern = `${rootDir}/**/*`
return new Promise((resolve, reject) => {
glob(pattern, {})
.then(matches => Promise.all(
matches.map(path => statPath(path))))
.then(pairs => pairs.filter(
([path, stat]) => stat.isFile()))
.then(pairs => Promise.all(
pairs.map(([path, stat]) => readPath(path))))
.then(pairs => Promise.all(
pairs.map(([path, content]) => hashPath(path, content))))
.then(pairs => resolve(pairs))
.catch(err => reject(err))
})
}
...
export default hashExisting
Notice that this function uses Promise.all
to wait for the operations on all of the files in the list to complete
before going on to the next step.
A different design would combine stat, read, and hash into a single step
so that each file would be handled independently
and there would be only one Promise.all
at the end to bring them all together.
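That alternative design might look like the sketch below. It assumes the same imports as above and reuses the hashPath helper defined shortly; the names hashOne and hashExistingAlt are made up for illustration:

// Handle each file independently: stat, read, and hash in one async step.
const hashOne = async (path) => {
  const stat = await fs.statAsync(path)
  if (!stat.isFile()) {
    return null
  }
  const content = await fs.readFileAsync(path, 'utf-8')
  return hashPath(path, content)
}

const hashExistingAlt = async (rootDir) => {
  const matches = await glob(`${rootDir}/**/*`, {})
  const results = await Promise.all(matches.map(path => hashOne(path)))
  return results.filter(result => result !== null)
}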
The first two helper functions that hashExisting
relies on
wrap asynchronouss operation in promises:
const statPath = (path) => {
  return new Promise((resolve, reject) => {
    fs.statAsync(path)
      .then(stat => resolve([path, stat]))
      .catch(err => reject(err))
  })
}

const readPath = (path) => {
  return new Promise((resolve, reject) => {
    fs.readFileAsync(path, 'utf-8')
      .then(content => resolve([path, content]))
      .catch(err => reject(err))
  })
}
The final helper function calculates the hash synchronously,
but we can use Promise.all
to wait on those operations finishing anyway:
const hashPath = (path, content) => {
  const hasher = crypto.createHash('sha1').setEncoding('hex')
  hasher.write(content)
  hasher.end()
  return [path, hasher.read()]
}
Let's try running it:
import hashExisting from './hash-existing-promise.js'
const root = process.argv[2]
hashExisting(root).then(pairs => pairs.forEach(
([path, hash]) => console.log(path, hash)
))
node run-hash-existing-promise.js . | fgrep -v test/ | fgrep -v '~'
./backup.js 11422489e11be3d8ff76278503457665f6152ebe
./check-existing-files.js 66b933cf9e792e9a9204171d04e0f8b530ec3f4f
./hash-existing-async.js 38518660463a80367f9719a870916eeab2933a8d
./hash-existing-promise.js 2b022a50294697eef84689b63ab46f0b899d14db
./hash-file.js c54c8ee3e576770d29ae2d0d73568e5a5c49eac0
./hash-file.out 7d5014143474e733daf263a94d4d43df30ba91cf
./hash-file.sh ff2f263c71ce98338adf4a8a65c297c9361e3e16
./hash-stream.js dc9e6c231e243860dace2dbf52845b121062b60e
./hash-stream.out af7195be4585e588f301cbfbddd5a47966cb704c
./hash-stream.sh 0dd6064f3005e832fef580f3de7fa0980fd81dee
...
./x-from-to/problem.md 47bb0e1ef733bc2b946f23bb03742272b34f7bf9
./x-from-to/solution.md 412b87759362bb14d417e02efe7c3607274e10ca
./x-io-streams/problem.md 298897364632cd64c8a50c0a08aa3e235335c0d4
./x-io-streams/solution.md 412b87759362bb14d417e02efe7c3607274e10ca
./x-json-manifests/problem.md 83ad4f0d55f420dbe694b72a26a89b9b4d92bea6
./x-json-manifests/solution.md 412b87759362bb14d417e02efe7c3607274e10ca
./x-mock-hashes/problem.md c1e815e888846fddd0e3d28640c0e20a8c665b25
./x-mock-hashes/solution.md 412b87759362bb14d417e02efe7c3607274e10ca
./x-pre-commit/problem.md b7d945af4554fc0f64b708fe735417bee8b33eef
./x-pre-commit/solution.md 412b87759362bb14d417e02efe7c3607274e10ca
The code we have written is clearer than it would be with callbacks
(if you don't believe this, try rewriting it),
but the layer of promises around everything still obscures its meaning.
Here are the same operations written using async and await:
import fs from 'fs-extra-promise'
import glob from 'glob-promise'
import crypto from 'crypto'
const statPath = async (path) => {
const stat = await fs.statAsync(path)
return [path, stat]
}
const readPath = async (path) => {
const content = await fs.readFileAsync(path, 'utf-8')
return [path, content]
}
const hashPath = (path, content) => {
const hasher = crypto.createHash('sha1').setEncoding('hex')
hasher.write(content)
hasher.end()
return [path, hasher.read()]
}
const hashExisting = async (rootDir) => {
const pattern = `${rootDir}/**/*`
const options = {}
const matches = await glob(pattern, options)
const stats = await Promise.all(matches.map(path => statPath(path)))
const files = stats.filter(([path, stat]) => stat.isFile())
const contents = await Promise.all(
files.map(([path, stat]) => readPath(path)))
const hashes = contents.map(
([path, content]) => hashPath(path, content))
return hashes
}
export default hashExisting
This version creates and resolves exactly the same promises as the previous one, but those promises are created for us automatically by Node. To check that it works, let's run it for the same input files:
import hashExisting from './hash-existing-async.js'
const root = process.argv[2]
hashExisting(root).then(
pairs => pairs.forEach(([path, hash]) => console.log(path, hash)))
node run-hash-existing-async.js . | fgrep -v test/ | fgrep -v '~'
./backup.js 11422489e11be3d8ff76278503457665f6152ebe
./check-existing-files.js 66b933cf9e792e9a9204171d04e0f8b530ec3f4f
./hash-existing-async.js 38518660463a80367f9719a870916eeab2933a8d
./hash-existing-promise.js 2b022a50294697eef84689b63ab46f0b899d14db
./hash-file.js c54c8ee3e576770d29ae2d0d73568e5a5c49eac0
./hash-file.out 7d5014143474e733daf263a94d4d43df30ba91cf
./hash-file.sh ff2f263c71ce98338adf4a8a65c297c9361e3e16
./hash-stream.js dc9e6c231e243860dace2dbf52845b121062b60e
./hash-stream.out af7195be4585e588f301cbfbddd5a47966cb704c
./hash-stream.sh 0dd6064f3005e832fef580f3de7fa0980fd81dee
...
./x-from-to/problem.md 47bb0e1ef733bc2b946f23bb03742272b34f7bf9
./x-from-to/solution.md 412b87759362bb14d417e02efe7c3607274e10ca
./x-io-streams/problem.md 298897364632cd64c8a50c0a08aa3e235335c0d4
./x-io-streams/solution.md 412b87759362bb14d417e02efe7c3607274e10ca
./x-json-manifests/problem.md 83ad4f0d55f420dbe694b72a26a89b9b4d92bea6
./x-json-manifests/solution.md 412b87759362bb14d417e02efe7c3607274e10ca
./x-mock-hashes/problem.md c1e815e888846fddd0e3d28640c0e20a8c665b25
./x-mock-hashes/solution.md 412b87759362bb14d417e02efe7c3607274e10ca
./x-pre-commit/problem.md b7d945af4554fc0f64b708fe735417bee8b33eef
./x-pre-commit/solution.md 412b87759362bb14d417e02efe7c3607274e10ca
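As a quick check on the claim that Node creates those promises for us, any async function's result can be used with .then:

// An async function's return value is automatically wrapped in a promise.
const double = async (x) => 2 * x
double(21).then(result => console.log(result)) // prints 42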
How can we track which files have already been backed up?
The second part of our backup tool keeps track of which files have and haven't been backed up already.
It stores backups in a directory that contains backup files like abcd1234.bck
and files describing the contents of particular snapshots
(ssssssssss.csv, where ssssssssss is the UTC timestamp of the backup's creation).
The .csv extension indicates that the file is formatted as comma-separated values:
each line holds the name of a file and the hash of its contents.
Time of check/time of use
Our naming convention for index files will fail if we try to create more than one backup per second. This might seem very unlikely, but many faults and security holes are the result of programmers assuming things weren't going to happen.
We could try to avoid this problem by using a two-part naming scheme
ssssssss-a.csv, ssssssss-b.csv, and so on,
but this leads to a race condition called time of check/time of use:
two processes could check for the most recent file at the same moment
and then both try to create the same new one.

The function below figures out which files need to be backed up:
it turns the list of (path, hash) pairs into a map from hash to path,
then removes every entry whose hash already has a .bck file in the backup directory.
Whatever is left needs to be copied:
import glob from 'glob-promise'
import path from 'path'
const findNewFiles = async (rootDir, pathHashPairs) => {
const hashToPath = pathHashPairs.reduce((obj, [path, hash]) => {
obj[hash] = path
return obj
}, {})
const pattern = `${rootDir}/*.bck`
const options = {}
const existingFiles = await glob(pattern, options)
existingFiles.forEach(filename => {
const stripped = path.basename(filename).replace(/\.bck$/, '')
delete hashToPath[stripped]
})
return hashToPath
}
export default findNewFiles
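A hypothetical driver (not one of this chapter's files) shows how findNewFiles combines with hashExisting:

import hashExisting from './hash-existing-async.js'
import findNewFiles from './check-existing-files.js'

// Usage: node driver.js <sourceDir> <backupDir>
const main = async () => {
  const existing = await hashExisting(process.argv[2])
  const needed = await findNewFiles(process.argv[3], existing)
  console.log('files needing backup:', needed)
}
main()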

To test our program, let's manually create testing directories with manufactured (shortened) hashes:
tree --charset unicode test
test
|-- bck-0-csv-0
|-- bck-1-csv-1
| |-- 0001.csv
| `-- abcd1234.bck
|-- bck-4-csv-2
| |-- 0001.csv
| |-- 3028.csv
| |-- 3456cdef.bck
| |-- abcd1234.bck
| `-- bcde2345.bck
|-- test-backup.js
|-- test-find-mock.js
`-- test-find.js
3 directories, 10 files
We use Mocha to manage our tests.
Every test is an async function;
Mocha automatically waits for them all to complete before reporting results.
To run them, we add the line:
"test": "mocha */test/test-*.js"
to the scripts section of our project's package.json file,
so that when we run npm run test,
Mocha looks for files in the test sub-directories of the directories holding our lessons.
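The relevant fragment of package.json is just this (all other fields omitted):

{
  "scripts": {
    "test": "mocha */test/test-*.js"
  }
}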
Here are our first few tests:
import assert from 'assert'
import findNewFiles from '../check-existing-files.js'
describe('pre-existing hashes and actual filesystem', () => {
it('finds no pre-existing files when none given or exist', async () => {
const expected = {}
const actual = await findNewFiles('file-backup/test/bck-0-csv-0', [])
assert.deepStrictEqual(expected, actual,
'Expected no files')
})
it('finds some files when one file is given and none exist', async () => {
const check = [['somefile.txt', '9876fedc']]
const expected = { '9876fedc': 'somefile.txt' }
const actual = await findNewFiles('file-backup/test/bck-0-csv-0', check)
assert.deepStrictEqual(expected, actual,
'Expected one file')
})
it('finds nothing needs backup when there is a match', async () => {
const check = [['alpha.js', 'abcd1234']]
const expected = {}
const actual = await findNewFiles('file-backup/test/bck-1-csv-1', check)
assert.deepStrictEqual(expected, actual,
'Expected no files')
})
it('finds something needs backup when there is a mismatch', async () => {
const check = [['alpha.js', 'a1b2c3d4']]
const expected = { a1b2c3d4: 'alpha.js' }
const actual = await findNewFiles('file-backup/test/bck-1-csv-1', check)
assert.deepStrictEqual(expected, actual,
'Expected one file')
})
it('finds mixed matches', async () => {
const check = [
['matches.js', '3456cdef'],
['matches.txt', 'abcd1234'],
['mismatch.txt', '12345678']
]
const expected = { 12345678: 'mismatch.txt' }
const actual = await findNewFiles('file-backup/test/bck-4-csv-2', check)
assert.deepStrictEqual(expected, actual,
'Expected one file')
})
})
and here is Mocha's report:
> stjs@1.0.0 test /u/stjs
> mocha */test/test-*.js "-g" "pre-existing hashes"
checks for pre-existing hashes using mock filesystem
✓ finds no pre-existing files when none given or exist
✓ finds some files when one file is given and none exist
✓ finds nothing needs backup when there is a match
✓ finds something needs backup when there is a mismatch
✓ finds mixed matches
pre-existing hashes and actual filesystem
✓ finds no pre-existing files when none given or exist
✓ finds some files when one file is given and none exist
✓ finds nothing needs backup when there is a match
✓ finds something needs backup when there is a mismatch
✓ finds mixed matches
10 passing (28ms)
How can we test code that modifies files?
The final thing our tool needs to do is copy the files that need copying and create a new index file. The code itself will be relatively simple, but testing will be complicated by the fact that our tests will need to create directories and files before they run and then delete them afterward (so that they don't contaminate subsequent tests).
A better approach is to use a mock filesystem in place of the real one.
Node's mock-fs
library provides the same functions as the fs
library,
but stores everything in memory.
This prevents our tests from accidentally disturbing the filesystem,
and also makes tests much faster
(since in-memory operations are thousands of times faster than operations that touch the actual filesystem).
We can create a mock filesystem by giving the library a JSON description of the files and what they should contain:
import assert from 'assert'
import mock from 'mock-fs'
import findNewFiles from '../check-existing-files.js'
describe('checks for pre-existing hashes using mock filesystem', () => {
beforeEach(() => {
mock({
'bck-0-csv-0': {},
'bck-1-csv-1': {
'0001.csv': 'alpha.js,abcd1234',
'abcd1234.bck': 'alpha.js content'
},
'bck-4-csv-2': {
'0001.csv': ['alpha.js,abcd1234',
'beta.txt,bcde2345'].join('\n'),
'3024.csv': ['alpha.js,abcd1234',
'gamma.png,3456cdef',
'subdir/renamed.txt,bcde2345'].join('\n'),
'3456cdef.bck': 'gamma.png content',
'abcd1234.bck': 'alpha content',
'bcde2345.bck': 'beta.txt became subdir/renamed.txt'
}
})
})
afterEach(() => {
mock.restore()
})
...
})
Mocha automatically calls beforeEach before running each test,
and afterEach after each test completes.
All of the tests stay exactly the same,
and since mock-fs
replaces the functions in the standard fs
library with its own,
nothing in our application needs to change either.
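A tiny standalone sketch shows how transparent the substitution is (the filename and contents here are made up for illustration):

import mock from 'mock-fs'
import fs from 'fs'

// After mock() is called, ordinary fs functions see the fake filesystem.
mock({ 'greeting.txt': 'hello' })
console.log(fs.readFileSync('greeting.txt', 'utf-8')) // prints 'hello'
mock.restore() // put the real filesystem back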
We are finally ready to write the program that actually backs up files:
import fs from 'fs-extra-promise'
import hashExisting from './hash-existing-async.js'
import findNewFiles from './check-existing-files.js'
const backup = async (src, dst, timestamp = null) => {
if (timestamp === null) {
timestamp = Math.round((new Date()).getTime() / 1000)
}
timestamp = String(timestamp).padStart(10, '0')
const existing = await hashExisting(src)
const needToCopy = await findNewFiles(dst, existing)
await copyFiles(dst, needToCopy)
await saveManifest(dst, timestamp, existing)
}
const copyFiles = async (dst, needToCopy) => {
const promises = Object.keys(needToCopy).map(hash => {
const srcPath = needToCopy[hash]
const dstPath = `${dst}/${hash}.bck`
fs.copyFileAsync(srcPath, dstPath)
})
return Promise.all(promises)
}
const saveManifest = async (dst, timestamp, pathHash) => {
pathHash = pathHash.sort()
const content = pathHash.map(
([path, hash]) => `${path},${hash}`).join('\n')
const manifest = `${dst}/${timestamp}.csv`
fs.writeFileAsync(manifest, content, 'utf-8')
}
export default backup
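backup.js doesn't run anything itself, so a hypothetical command-line driver (not shown in the original) would only need a few lines:

import backup from './backup.js'

// Usage: node run-backup.js <sourceDir> <backupDir>
const [src, dst] = process.argv.slice(2)
backup(src, dst)
  .then(() => console.log('backup complete'))
  .catch(err => console.error(err))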
The tests for this are more complicated than tests we have written previously
because we want to check results that involve actual file hashes.
Let's set up some fixtures to run tests on:
// Imports needed by the fixtures and the tests below.
import assert from 'assert'
import crypto from 'crypto'
import mock from 'mock-fs'
import glob from 'glob-promise'
import fs from 'fs-extra-promise'
import backup from '../backup.js'

const hashString = (data) => {
  const hasher = crypto.createHash('sha1').setEncoding('hex')
  hasher.write(data)
  hasher.end()
  return hasher.read()
}
const Contents = {
aaa: 'AAA',
bbb: 'BBB',
ccc: 'CCC'
}
const Hashes = Object.keys(Contents).reduce((obj, key) => {
obj[key] = hashString(Contents[key])
return obj
}, {})
const Fixture = {
source: {
'alpha.txt': Contents.aaa,
'beta.txt': Contents.bbb,
gamma: {
'delta.txt': Contents.ccc
}
},
backup: {}
}
const InitialBackups = Object.keys(Hashes).reduce((set, filename) => {
set.add(`backup/${Hashes[filename]}.bck`)
return set
}, new Set())
and then run some tests:
describe('check entire backup process', () => {
beforeEach(() => {
mock(Fixture)
})
afterEach(() => {
mock.restore()
})
it('creates an initial CSV manifest', async () => {
await backup('source', 'backup', 0)
assert.strictEqual((await glob('backup/*')).length, 4,
'Expected 4 files')
const actualBackups = new Set(await glob('backup/*.bck'))
assert.deepStrictEqual(actualBackups, InitialBackups,
'Expected 3 backup files')
const actualManifests = await glob('backup/*.csv')
assert.deepStrictEqual(actualManifests, ['backup/0000000000.csv'],
'Expected one manifest')
})
it('does not duplicate files unnecessarily', async () => {
await backup('source', 'backup', 0)
assert.strictEqual((await glob('backup/*')).length, 4,
'Expected 4 files after first backup')
await backup('source', 'backup', 1)
assert.strictEqual((await glob('backup/*')).length, 5,
'Expected 5 files after second backup')
const actualBackups = new Set(await glob('backup/*.bck'))
assert.deepStrictEqual(actualBackups, InitialBackups,
'Expected 3 backup files after second backup')
const actualManifests = (await glob('backup/*.csv')).sort()
assert.deepStrictEqual(actualManifests,
['backup/0000000000.csv', 'backup/0000000001.csv'],
'Expected two manifests')
})
it('adds a file as needed', async () => {
await backup('source', 'backup', 0)
assert.strictEqual((await glob('backup/*')).length, 4,
'Expected 4 files after first backup')
await fs.writeFileAsync('source/newfile.txt', 'NNN')
const hashOfNewFile = hashString('NNN')
await backup('source', 'backup', 1)
assert.strictEqual((await glob('backup/*')).length, 6,
'Expected 6 files after second backup')
const expected = new Set(InitialBackups)
.add(`backup/${hashOfNewFile}.bck`)
const actualBackups = new Set(await glob('backup/*.bck'))
assert.deepStrictEqual(actualBackups, expected,
'Expected 4 backup files after second backup')
const actualManifests = (await glob('backup/*.csv')).sort()
assert.deepStrictEqual(actualManifests,
['backup/0000000000.csv', 'backup/0000000001.csv'],
'Expected two manifests')
})
})
OK, what did I break?
> stjs@1.0.0 test /u/stjs
> mocha */test/test-*.js "-g" "check entire backup process"
check entire backup process
1) creates an initial CSV manifest
2) does not duplicate files unnecessarily
3) adds a file as needed
0 passing (4s)
3 failing
1) check entire backup process
creates an initial CSV manifest:
Error: ENXIO: no such device or address, read
2) check entire backup process
does not duplicate files unnecessarily:
Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/u/stjs/file-backup/test/test-backup.js)
at listOnTimeout (internal/timers.js:554:17)
at processTimers (internal/timers.js:497:7)
3) check entire backup process
adds a file as needed:
Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/u/stjs/file-backup/test/test-backup.js)
at listOnTimeout (internal/timers.js:554:17)
at processTimers (internal/timers.js:497:7)
Exercises
Odds of collision
If hashes were only 2 bits long, then the chances of collision with each successive file, assuming no previous collision, are:

| Number of Files | Odds of Collision |
| --------------- | ----------------- |
| 1               | 0%                |
| 2               | 25%               |
| 3               | 50%               |
| 4               | 75%               |
| 5               | 100%              |
A colleague of yours says this means that if we hash four files, there's only a 75% chance of any collision occurring. What are the actual odds?
Streaming I/O
Write a small program using fs.createReadStream
and fs.createWriteStream
that copies a file piece by piece
instead of reading it into memory and then writing it out again.
Sequencing backups
Modify the backup program so that manifests are numbered sequentially
as 00000001.csv, 00000002.csv, and so on,
rather than being timestamped.
Why doesn't this solve the time of check/time of use race condition mentioned earlier?
JSON manifests
- Modify backup.js so that it can save JSON manifests as well as CSV manifests based on a command-line flag.
- Write another program called migrate.js that converts a set of manifests from CSV to JSON. (The program's name comes from the term data migration.)
- Modify backup.js so that each manifest stores the user name of the person who created it along with the file hashes, and then modify migrate.js to transform old files into the new format.
Testing line counting
Write tests for the line-counting functions using mock-fs.
Did you find (at least) two bugs?
Mock hashes
- Modify the file backup program so that it uses a function called ourHash to hash files.
- Create a replacement that returns some predictable value, such as the first few characters of the data.
- Rewrite the tests to use this function.
How did you modify the main program so that the tests could control which hashing function is used?
Comparing manifests
Write a program compare-manifests.js
that reads two manifest files and reports:
- Which files have the same names but different hashes (i.e., their contents have changed).
- Which files have the same hashes but different names (i.e., they have been renamed).
- Which files are in the first manifest but neither their names nor their hashes are in the second (i.e., they have been deleted).
- Which files are in the second manifest but neither their names nor their hashes are in the first (i.e., they have been added).
From one state to another
- Write a program called from-to.js that takes the name of a directory and the name of a manifest file as its command-line arguments, then adds, removes, and/or renames files in the directory to restore the state described in the manifest. The program should only perform file operations when it needs to, e.g., it should not delete a file and re-add it if the contents have not changed.
- Write some tests for from-to.js using Mocha and mock-fs.
File history
- Write a program called file-history.js that takes the name of a file as a command-line argument and displays the history of that file by tracing it back in time through the available manifests.
- Write tests for your program using Mocha and mock-fs.
Pre-commit hooks
Modify backup.js to load and run a function called preCommit
from a file called pre-commit.js
stored in the root directory of the files being backed up.
If preCommit returns true, the backup proceeds;
if it returns false or throws an exception, no backup is created.