Glossary

absolute error
The absolute value of the difference between the observed and the correct value. Absolute error is usually less useful than relative error.
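As a small illustration in Python (the numbers are made up for this example, and the function names are hypothetical), the sketch below computes both kinds of error for the same one-unit discrepancy at two different scales:

```python
def absolute_error(observed, correct):
    """Return the absolute value of the difference between the two values."""
    return abs(observed - correct)


def relative_error(observed, correct):
    """Return the absolute error divided by the magnitude of the correct value."""
    return absolute_error(observed, correct) / abs(correct)


# Illustrative values only: the same absolute error at two different scales.
small = (11.0, 10.0)
large = (1001.0, 1000.0)

print(absolute_error(*small), relative_error(*small))
print(absolute_error(*large), relative_error(*large))
```

The absolute error is the same in both cases, while the relative error shows that the second measurement is proportionally much closer.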
absolute path
A path that points to the same location in the filesystem regardless of where it is evaluated. An absolute path is the equivalent of latitude and longitude in geography.
abstract method
In object-oriented programming, a method that is defined but not implemented. Programmers will define an abstract method in a parent class to specify operations that child classes must provide.
abstract syntax tree (AST)
A deeply nested data structure, or tree, that represents the structure of a program. For example, the AST might have a node representing a while loop with one child representing the loop condition and another representing the loop body.
accumulator
A variable that collects and/or combines many values. For example, if a program sums the values in an array by adding them all to a variable called result, then result is the accumulator.
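The summing example described above might look like this in Python; the list of values is an assumption made for illustration:

```python
values = [3, 1, 4, 1, 5]  # illustrative data

result = 0  # the accumulator starts at zero
for v in values:
    result += v  # each value is combined into the accumulator

print(result)
```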
actual result (of test)
The value generated by running code in a test. If this matches the expected result, the test passes; if the two are different, the test fails.
alias
A second or subsequent reference to the same object. Aliases are useful, but increase the cognitive load on readers who have to remember that all these names refer to the same thing.
anonymous function
A function that has not been assigned a name. Anonymous functions are usually quite short, and are usually defined where they are used, e.g., as callbacks. In Python, these are called lambda functions and are created through use of the lambda reserved word.
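A minimal Python sketch, assuming a made-up list of words, shows a lambda defined at the point of use as a sort key:

```python
words = ["pear", "fig", "banana"]  # illustrative data

# The lambda has no name and is defined where it is used, as the sort key.
by_length = sorted(words, key=lambda w: len(w))

print(by_length)
```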
Application Binary Interface (ABI)
The low-level layout that a piece of software must have to work on a particular kind of machine.
argument
One of possibly several expressions that are passed to a function; it is the actual value that is passed. The term should not be confused with, and is not a synonym for, parameter. Parameters and arguments are distinct but related concepts: parameters are variables, and arguments are the values assigned to those variables.
ASCII
A standard way to represent the characters commonly used in the Western European languages as 7- or 8-bit integers, now superseded by Unicode.
assembler
A compiler that translates software written in assembly code into machine instructions.
assembly code
A low-level programming language whose statements correspond closely to the actual instruction set of a particular kind of processor.
assertion
A Boolean expression that must be true at a certain point in a program. Assertions may be built into the language (e.g., Python's assert statement) or provided as functions (e.g., R's stopifnot). They are often used in testing, but are also put in production code to check that it is behaving correctly. In many languages, assertions should not be used to perform data validation, as they may be silently dropped by compilers and interpreters when optimization is enabled. Using assertions for data validation can therefore introduce security risks. Unlike many languages, R does not have an assert statement that can be disabled, so using a package such as assertr for data validation does not create security holes.
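The following Python sketch uses the built-in assert statement to check an internal assumption; the function mean is hypothetical. Python removes assert statements when run with the -O flag, which is one reason they are a poor fit for validating external data.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Documents and checks an assumption about how the function is called.
    assert len(values) > 0, "mean() requires at least one value"
    return sum(values) / len(values)


print(mean([1, 2, 3]))
```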
associative array
See dictionary.
asynchronous
Not happening at the same time. In programming, an asynchronous operation is one that runs independently of another, or that starts at one time and ends at another.
attribute
A name-value pair associated with an object, used to store metadata about the object such as an array's dimensions.
automatic variable
A variable that is automatically given a value in a build rule. For example, Make automatically assigns the name of a rule's target to the automatic variable $@. Automatic variables are frequently used when writing pattern rules.
backward-compatible
A property of a system, hardware or software, that allows for interoperability with an older legacy system, or with input designed for such a system. For example, a function written in Python 3 that can be run successfully with Python 2 is backward-compatible.
bare object
An object that isn't an instance of any particular class.
base class
In object-oriented programming, a class from which other classes are derived.
binary
A system that can be in one of two possible states. In computing, these states are often represented as 0 or 1; in Boolean logic, they are represented as false (0) or true (1). Computers are built upon systems that store 0s and 1s as bits.
bit
A unit of information representing alternatives such as yes/no or true/false. In computing, a state of either 0 or 1.
bitwise operation
An operation that manipulates individual bits in memory. Common bitwise operations include and, or, not, and xor.
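As a brief sketch, Python's integer operators can show the four operations named above; the two four-bit values are arbitrary, and the mask on the last line is an assumption used to keep the result of "not" within four bits:

```python
a = 0b1100
b = 0b1010

and_result = a & b        # bits set in both a and b
or_result = a | b         # bits set in either a or b
xor_result = a ^ b        # bits set in exactly one of a and b
not_result = ~a & 0b1111  # bits of a flipped, masked to four bits

print(bin(and_result), bin(or_result), bin(xor_result), bin(not_result))
```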
block comment
A comment that spans multiple lines. Block comments may be marked with special start and end symbols, like /* and */ in C and its descendants, or each line may be prefixed with a marker like #.
Boolean
Relating to a variable or data type that can have a logical value of either true or false. Named for George Boole, a 19th-century mathematician. Binary systems, like all computers, are built on this foundation of logical evaluations between states of true and false, 1 or 0.
breadth first
To go through a nested data structure such as a tree by exploring all of one level, then going on to the next level and so on, or to explore a problem by examining the first step of each possible solution, and then trying the next step for each.
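One possible sketch of a breadth-first traversal in Python uses a queue; the tree below, stored as a mapping from each node to its children, is a hypothetical example:

```python
from collections import deque

# A hypothetical tree: each node maps to a list of its children.
tree = {
    "a": ["b", "c"],
    "b": ["d", "e"],
    "c": ["f"],
    "d": [],
    "e": [],
    "f": [],
}


def breadth_first(tree, root):
    """Return nodes in the order visited, one level at a time."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()    # take the oldest unvisited node
        order.append(node)
        queue.extend(tree[node])  # its children wait behind the current level
    return order


print(breadth_first(tree, "a"))
```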
breakpoint
An instruction to a debugger telling it to suspend execution whenever a specific point in the program (such as a particular line) is reached.
bug
A missing or undesirable feature of a piece of software; the digital equivalent of a weed.
bug report
A collection of files, logs, or related information that describes either an unexpected output of some code or program, or an unexpected error or warning. This information is used to help find and fix a bug in the program or code.
build manager
A program that keeps track of how files depend on one another and runs commands to update any files that are out-of-date. Build managers were invented to compile only those parts of programs that had changed, but are now often used to implement workflows in which plots depend on results files, which in turn depend on raw data files or configuration files.
build recipe
The part of a build rule that describes how to update something that has fallen out-of-date.
build rule
A specification for a build manager that describes how some files depend on others and what to do if those files are out-of-date.
build target
The file(s) that a build rule will update if they are out-of-date compared to their dependencies.
byte code
A set of instructions designed to be executed efficiently by an interpreter.
cache
Something that stores copies of data so that future requests for it can be satisfied more quickly. The CPU in a computer uses a hardware cache to hold recently-accessed values; many programs rely on a software cache to reduce network traffic and latency. Figuring out when something in a cache is out-of-date and should be replaced is one of the two hard problems in computer science.
caching
To save a copy of some data in a local cache to make future access faster.
call stack
A data structure that stores information about the subroutines that are currently active in a running program.
callback function
A function A that is passed to another function B so that B can call it at some later point. Callbacks can be used synchronously, as in generic functions like map that invoke a callback function once for each element in a collection, or asynchronously, as in a client that runs a callback when a response is received in answer to a request.
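The synchronous case can be sketched in Python as follows; apply_to_each and double are hypothetical names playing the roles of B and A respectively:

```python
def apply_to_each(items, callback):
    """Call `callback` once for each element and collect the results."""
    return [callback(item) for item in items]


def double(x):
    return 2 * x


print(apply_to_each([1, 2, 3], double))
print(list(map(double, [1, 2, 3])))  # the built-in map takes a callback too
```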
Cascading Style Sheets (CSS)
A way to control the appearance of HTML. CSS is typically used to specify fonts, colors, and layout.
catch (an exception)
To accept responsibility for handling an error or other unexpected event. R prefers "handling a condition" to "catching an exception". Python, on the other hand, encourages raising and catching exceptions, and in some situations, requires it.
Chain of Responsibility pattern
A design pattern in which each object either handles a request or passes it on to another object.
character encoding
A specification of how characters are stored as bytes. The most commonly-used encoding today is UTF-8.
child (in a tree)
A node in a tree that is below another node (called the parent).
child class
In object-oriented programming, a class derived from another class (called the parent class).
circular dependency
A situation in which X depends on Y and Y depends on X, either directly or indirectly. If there is a circular dependency, then the dependency graph is not acyclic.
class
In object-oriented programming, a structure that combines data and operations (called methods). The program then uses a constructor to create an object with those properties and methods. Programmers generally put generic or reusable behavior in parent classes, and more detailed or specific behavior in child classes.
client
Typically, a program such as a web browser that gets data from a server and displays it to, or interacts with, users. The term is used more generally to refer to any program A that makes requests of another program B. A single program can be both a client and a server.
closure
A set of variables defined in the same scope whose existence has been preserved after that scope has ended.
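A minimal Python sketch, with hypothetical names, shows a variable that outlives the function call in which it was created:

```python
def make_counter():
    count = 0  # defined in make_counter's scope

    def increment():
        nonlocal count  # refers to the preserved variable, not a new one
        count += 1
        return count

    return increment


counter = make_counter()  # make_counter has returned, but count lives on
print(counter())
print(counter())
```

Each call to make_counter creates a separate closure, so two counters made this way do not share state.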
code coverage (in testing)
How much of a library or program is executed when tests run. This is normally reported as a percentage of lines of code: for example, if 40 out of 50 lines in a file are run during testing, those tests have 80% code coverage.
cognitive load
The amount of working memory needed to accomplish a set of simultaneous tasks.
collision
A situation in which a program tries to store two items in the same location in memory. For example, a collision occurs when a hash function generates the same hash code for two different items.
column-major storage
Storing each column of a two-dimensional array as one block of memory so that elements in the same row are far apart.
combinatorial explosion
The exponential growth in the size of a problem or the time required to solve it that arises when all possible combinations of a set of items must be searched.
comma-separated values (CSV)
A text format for tabular data in which each record is one row and fields are separated by commas. There are many minor variations, particularly around quoting of strings.
command-line argument
A filename or control flag given to a command-line program when it is run.
command-line interface (CLI)
A user interface that relies solely on text for commands and output, typically running in a shell.
comment
Text written in a script that is not treated as code to be run, but rather as text that describes what the code is doing. These are usually short notes, often beginning with a # (in many programming languages).
compile
To translate textual source into another form. Programs in compiled languages are translated into machine instructions for a computer to run, and Markdown is usually translated into HTML for display.
compiled language
Originally, a language such as C or Fortran that is translated into machine instructions for execution. Languages such as Java are also compiled before execution, but into byte code instead of machine instructions, while interpreted languages like Python are compiled to byte code on the fly.
compiler
An application that translates programs written in some languages into machine instructions or byte code.
confirmation bias
The tendency for someone to look for evidence that they are right rather than searching for reasons why they might be wrong.
console
A computer terminal where a user may enter commands, or a program, such as a shell that simulates such a device.
constructor
A function that creates an object of a particular class. In the S3 object system, constructors are a convention rather than a requirement.
Coordinated Universal Time (UTC)
The standard time against which all others are defined. UTC is the time at longitude 0°, and is not adjusted for daylight saving time. Timestamps are often reported in UTC so that they will be the same no matter what timezone the computer is in.
corner case
Another name for an edge case.
coupling
The degree of interaction between two classes, modules, or other software components. If a system's components are loosely coupled, changes to one are unlikely to affect others. If they are tightly coupled, then any change requires other changes elsewhere, which complicates maintenance and evolution.
cryptographic hash function
A hash function that produces an apparently-random value for any input.
current working directory
The folder or directory location in which the program operates. Any action taken by the program occurs relative to this directory.
cycle (in a graph)
A set of links in a graph that leads from a node back to itself.
data frame
A two-dimensional data structure for storing tabular data in memory. Rows represent records and columns represent variables.
data migration
Moving data from one location or format to another. The term often refers to translating data from an old format to a newer one.
Decorator pattern
A design pattern in which a function adds additional features to another function or a class after its initial definition. Decorators are a feature of Python and can be implemented in most other languages as well.
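As one possible sketch in Python, the hypothetical decorator logged below adds call recording to a function after it has been defined, without changing the function's own code:

```python
import functools


def logged(func):
    """Wrap `func` so that the arguments of each call are recorded."""

    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append(args)
        return func(*args, **kwargs)

    wrapper.calls = []
    return wrapper


@logged
def add(a, b):
    return a + b


print(add(1, 2))
print(add.calls)
```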
defensive programming
A set of programming practices that assumes mistakes will happen and either reports or corrects them, such as inserting assertions to report situations that are not ever supposed to occur.
dependency
See prerequisite.
dependency graph
A directed graph showing how things depend on one another, such as the files to be updated by a build manager. If the dependency graph is not acyclic, the dependencies cannot be resolved.
deprecation
To indicate that while a function, method, or class exists, its use is no longer recommended (for example, because it is going to be phased out in a future release).
depth first
To go through a nested data structure such as a tree by going as far as possible down one path, then as far as possible down the next and so on, or to explore a problem by following one solution to its conclusion and then trying the next. A depth-first search algorithm explores one possibility all the way to its conclusion before moving on to the next.
derived class
In object-oriented programming, a class that is a direct or indirect extension of a base class.
design pattern
A recurring pattern in software design that is specific enough to be worth naming, but not so specific that a single best implementation can be provided by a library. For example, data frames and database tables are instances of the same pattern.
destructuring assignment
Unpacking values from data structures and assigning them to multiple variables in a single statement.
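In Python this is usually called unpacking; a short sketch with made-up values:

```python
point = (3, 4)
x, y = point  # unpack a tuple into two variables in one statement

first, *rest = [10, 20, 30]  # a starred name collects the remaining items

print(x, y, first, rest)
```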
dictionary
A data structure that allows items to be looked up by key rather than by position, sometimes called an associative array. Dictionaries are often implemented using hash tables.
directed acyclic graph (DAG)
A directed graph which does not contain any loops (i.e., it is not possible to reach a node from itself by following edges).
directed graph
A graph whose edges have directions.
directory
A structure in a filesystem that contains references to other structures, such as files and other directories.
disassembler
A program that translates machine instructions into assembly code or some other higher-level language.
doc comment
A documentation comment ("doc comment" for short) is a specially-formatted comment containing documentation about a piece of code that is embedded in the code itself.
Document Object Model (DOM)
A standard, in-memory representation of HTML and XML. Each element is stored as a node in a tree with a set of named attributes; contained elements are child nodes. Modern programming languages provide many libraries for searching and modifying the DOM.
DOM selector
A pattern that identifies nodes in a DOM tree. For example, #alpha matches nodes whose id attribute is "alpha", while .beta matches nodes whose class attribute is "beta".
driver
A program that runs other programs, or a function that "drives" all of the other functions in a program.
dynamic loading
To import a module into the memory of a program while it is already running. Most interpreted languages use dynamic loading, and provide tools so that programs can find and load modules dynamically to configure themselves.
dynamic lookup
To find a function or a property of an object by name while a program is running. For example, instead of getting a specific property of an object using obj.name, a program might use obj[someVariable], where someVariable could hold "name" or some other property name.
dynamic scoping
To find the value of a variable by looking at what is on the call stack at the moment the lookup is done. Almost all programming languages use lexical scoping instead, since it is more predictable.
eager matching
Matching as much as possible, as early as possible.
easy mode
A term borrowed from gaming meaning to do something with obstacles or difficulties simplified or removed, often for practice purposes.
edge
A connection between two nodes in a graph. An edge may have data associated with it, such as a name or distance.
edge case
A problem that only comes up under unusual circumstances or when a system is pushed to its limits; also sometimes called a corner case. Programs intended for widespread use have to handle edge cases, but doing so can make them much more complicated.
element
A named component in an HTML or XML document. Elements are usually written <name>...</name>, where "..." represents the content of the element. Elements often have attributes.
encapsulate
To store data inside some kind of structure so that it is only accessible through that structure.
entry point
Where a program begins executing.
environment
A structure that stores a set of variable names and the values they refer to.
error (in a test)
Signalled when something goes wrong in a unit test itself rather than in the system being tested. In this case, we do not know anything about the correctness of the system.
error handling
What a program does to detect and correct for errors. Examples include printing a message and using a default configuration if the user-specified configuration cannot be found.
event loop
A mechanism for managing concurrent activities in a program. Tasks are represented as items in a queue; the event loop repeatedly takes an item from the front of the queue and runs it, adding any other tasks it generates to the back of the queue to run later.
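The queue-based mechanism described above can be sketched in a few lines of Python. This is a simplified illustration rather than how any particular runtime implements its loop; the convention that each task returns a list of follow-up tasks is an assumption made for this example.

```python
from collections import deque


def run_event_loop(initial_tasks):
    """Run tasks from the front of the queue until none remain."""
    queue = deque(initial_tasks)
    log = []
    while queue:
        task = queue.popleft()   # take the item at the front of the queue
        new_tasks = task(log)    # running a task may generate more tasks
        queue.extend(new_tasks)  # which are added to the back to run later
    return log


def first(log):
    log.append("first")
    return [follow_up]


def second(log):
    log.append("second")
    return []


def follow_up(log):
    log.append("follow-up")
    return []


print(run_event_loop([first, second]))
```

The follow-up task runs after second even though first created it, because new tasks join the back of the queue.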
exception
An object that stores information about an error or other unusual event in a program. One part of a program will create and raise an exception to signal that something unexpected has happened; another part will catch it.
exception handler
A piece of code that deals with an exception after it is caught, e.g., by writing a log message, retrying the operation that failed, or performing an alternate operation.
expected result (of test)
The value that a piece of software is supposed to produce when tested in a certain way, or the state in which it is supposed to leave the system.
exploratory programming
A software development methodology in which requirements emerge or change as the software is being written, often in response to results from early runs.
export
To make something visible outside a module so that other parts of a program can import it. In most languages a module must export things explicitly in an attempt to avoid name collision.
fail (a test)
A test fails if the actual result does not match the expected result.
false
The logical (Boolean) state opposite of "true". Used in logic and programming to represent a binary state of something.
feature (in software)
Some aspect of software that was deliberately designed or built. A bug is an undesired feature.
field
A component of a record containing a single value. Every record in a tibble or database table has the same fields.
filename extension
The last part of a filename, usually following the '.' symbol. Filename extensions are commonly used to indicate the type of content in the file, though there is no guarantee that this is correct.
filesystem
The part of the operating system that manages how files are stored and retrieved. Also used to refer to all of those files and directories or the specific way they are stored (as in "the Unix filesystem").
filter
As a verb, to choose a set of records (i.e., rows of a table) based on the values they contain. As a noun, a command-line program that reads lines of text from files or standard input, performs some operation on them (such as filtering), and writes to a file or stdout.
fixed-width (of strings)
A set of character strings that have the same length. Databases often use fixed-width strings to make storage and access more efficient; short strings are padded up to the required length and long strings are truncated.
fixture
The thing on which a test is run, such as the parameters to the function being tested or the file being processed.
folder
Another term for a directory.
for loop
A construct in a program that repeats one or more other statements (the loop body) once for each item in a sequence, such as each number in a range or each element of a list.
function
A code block which gathers a sequence of operations into a whole, preserving it for ongoing use by defining a set of tasks that takes zero or more required and optional arguments as inputs and returns expected outputs (return values), if any. Functions enable repeating these defined tasks with one command, known as a function call.
garbage collection
The process of identifying memory that has been allocated but is no longer in use and reclaiming it to be re-used.
generator function
A function whose state is automatically saved when it returns a value so that execution can be restarted from that point the next time it is called. One use of generator functions is to produce streams of values that can be processed by for loops.
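In Python, a generator function is written with yield; the countdown function below is a hypothetical example:

```python
def countdown(n):
    """Yield n, n-1, ..., 1, pausing after each value."""
    while n > 0:
        yield n  # execution pauses here and resumes on the next request
        n -= 1


for value in countdown(3):
    print(value)
```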
generic function
A collection of functions with similar purpose, each operating on a different class of data.
global variable
A variable defined outside any particular function or package namespace, which is therefore visible to all functions.
globbing
To specify a set of filenames using a simplified form of regular expressions, such as *.dat to mean "all files whose names end in .dat". The name is derived from "global".
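As a sketch, Python's standard fnmatch module applies this kind of pattern to a list of names; the filenames here are invented for the example:

```python
import fnmatch

names = ["a.dat", "notes.txt", "b.dat"]  # illustrative filenames

# Keep only the names that match the glob pattern.
matches = fnmatch.filter(names, "*.dat")

print(matches)
```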
graph
  1. A plot or a chart that displays data, or 2. a data structure in which nodes are connected to one another by edges.
greedy algorithm
An algorithm that consumes as much input as possible, as early as possible.
handler
A callback function responsible for handling some particular event, such as the user clicking on a button or new data being received from a file.
hash code
A value generated by a hash function. Good hash codes have the same properties as random numbers in order to reduce the frequency of collisions.
hash function
A function that turns arbitrary data into a bit array, or a key, of a fixed size. Hash functions are used to determine where data should be stored in a hash table.
hash table
A data structure that calculates a pseudo-random key (location) for each value passed to it and stores the value in that location. Hash tables enable fast lookup for arbitrary data. This comes at the cost of extra memory: hash tables are usually kept larger than the amount of information they need to store in order to reduce the likelihood of collisions, which occur when the hash function returns the same key for two different values.
header file
In C and C++, a file that defines constants and function signatures but does not contain runnable code. Header files tell the including file what is defined in other files so that the compiler can generate correct code.
heterogeneous
Containing mixed data types. For example, in Python and R, a list can contain a mix of numbers, character strings, and values of other types.
heuristic
A rule or guideline that isn't guaranteed to produce the desired result, but usually does.
homogeneous
Containing a single data type. For example, a vector must be homogeneous: its values must all be numeric, logical, etc.
HTTP request
A message sent from a client to a server using the HTTP protocol asking for data. A request usually asks for a web page, image, or other data.
HTTP response
A reply sent from a server to a client using the HTTP protocol in response to a request. The response usually contains a web page, image, or data.
HyperText Markup Language (HTML)
The standard markup language used for web pages. HTML is represented in memory using the DOM (Document Object Model).
HyperText Transfer Protocol (HTTP)
The standard protocol for data transfer on the World-Wide Web. HTTP defines the format of requests and responses, the meanings of standard error codes, and other features.
idiomatic
To use a language in the same way as a fluent or native speaker. Programs are called idiomatic if they use the language the way that proficient programmers use it.
immediately-invoked function expression (IIFE)
A function that is invoked once at the point where it is defined. IIFEs are typically used to create a scope to hide some function or variable definitions.
immutable
Data that cannot be changed after being created. Immutable data is easier to think about, particularly if data structures are shared between several tasks, but may result in higher memory requirements.
import
To bring things from a module into a program for use. In most languages, a program can only import things that the module explicitly exports.
index (in a database)
An auxiliary data structure in a database used to speed up search for some entries. An index increases memory and disk requirements but reduces search time.
instance
An object of a particular class.
instruction pointer
A special register in a processor that stores the address of the next instruction to execute.
instruction set
The basic operations that a particular processor can execute directly.
interpreted language
A high-level language that is not executed directly by the computer, but instead is run by an interpreter that translates program instructions into machine commands on the fly.
interpreter
A program whose job it is to run programs written in a high-level interpreted language. Interpreters can run interactively, but may also execute commands saved in a file.
introspection
Having a program examine itself as it is running; common examples are to determine the specific class of a generic object or to get the fields of an object when they are not known in advance.
ISO date format
An international standard for formatting dates. While the full standard is complex, the most common form is YYYY-MM-DD, i.e., a four-digit year, a two-digit month, and a two-digit day, separated by hyphens.
issue
A bug report, feature request, or other to-do item associated with a project. Also called a ticket.
Iterator pattern
A design pattern in which a temporary object or generator function produces each value from a collection in turn for processing. This pattern hides the differences between different kinds of data structures so that everything can be processed using loops.
JavaScript Object Notation (JSON)
A way to represent data by combining basic values like numbers and character strings in lists and key/value structures. The acronym stands for "JavaScript Object Notation"; unlike better-defined standards like XML, it is unencumbered by a syntax for comments or ways to define a schema.
join
One of several operations that combine values from two tables.
key
  1. A field or combination of fields whose value(s) uniquely identify a record within a table or dataset. Keys are often used to select specific records and in joins. 2. Part of a key/value pair, used as a unique identifier in a data structure such as a dictionary.
label (address in memory)
A human-readable name given to a particular location in memory when writing programs in assembly code.
layout engine
A piece of software that decides where to place text, images, and other elements on a page.
lazy matching
Matching as little as possible while still finding a valid match.
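In regular expressions, the difference from eager matching can be seen with Python's re module, where adding ? after a quantifier makes it lazy; the sample text is made up:

```python
import re

text = "<a><b>"

eager = re.match(r"<.+>", text).group()  # takes as much as possible
lazy = re.match(r"<.+?>", text).group()  # takes as little as possible

print(eager)
print(lazy)
```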
Least Recently Used cache (LRU cache)
A cache that discards items that have not been used recently in order to limit memory requirements.
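Python's standard library provides functools.lru_cache, which can be used to sketch the behavior. The function square and the list calls are hypothetical and exist only to show when the function body actually runs.

```python
from functools import lru_cache

calls = []  # records each time the function body actually runs


@lru_cache(maxsize=2)
def square(n):
    calls.append(n)
    return n * n


square(1)
square(2)
square(1)  # served from the cache, so the body does not run
square(3)  # the cache is full, so 2 (least recently used) is discarded
square(2)  # no longer cached, so the body runs again

print(calls)
```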
lexical scoping
To look up the value associated with a name according to the textual structure of a program. Most programming languages use lexical scoping instead of dynamic scoping because the latter is less predictable.
library
A reusable software package, also often called a module.
lifecycle
The steps that something is allowed or required to go through. The lifecycle of an object runs from its construction through the operations it can or must perform before it is destroyed; the lifecycle of an issue may be: "created", "assigned", "in progress", "ready for review", and "completed".
line comment
A comment in a program that spans part of a single line, as opposed to a block comment that may span multiple lines.
link
To combine separately compiled modules into a single runnable program.
linter
A program that checks for common problems in software, such as violations of indentation rules or variable naming conventions. The name comes from the first tool of its kind, called lint.
list
A vector that can contain values of many different (heterogeneous) types.
literal
A representation of a fixed value in a program, such as the digits 123 for the number 123 or the characters "abc" for the string containing those three letters.
loader
A function whose job is to read files containing runnable code into memory and make that code available to the calling program.
local variable
A variable defined inside a function which is only visible within that function.
log
A record of a program's execution containing messages written via a logging framework for later inspection.
log message
A single entry in a log of a program's execution. Log messages are usually highly structured so that data (such as the time or the severity) can be recovered later.
logging framework
A software library that manages internal reporting for programs.
loop body
The statement or statements executed by a loop.
loosely coupled
Components in a software system are said to be loosely coupled if they are relatively independent of one another, i.e., if any one of them can be changed or replaced without others having to be altered as well.
macro
Originally short for "macro-instruction", an instruction to translate some of the text in a program into other text before using it.
manifest
A list that specifies the precise versions of a complete set of libraries or other software components.
Markdown
A markup language with a simple syntax intended as a replacement for HTML. Markdown is often used for README files, and is the basis for R Markdown.
markup language
A set of rules for annotating text to define its meaning or how it should be displayed. The markup is usually not displayed, but instead controls how the underlying text is interpreted or shown. Markdown and HTML are widely-used markup languages for web pages.
method
An implementation of a generic function that handles objects of a specific class.
method chaining
A style of object-oriented programming in which an object's methods return that object as their result so that another method can immediately be called, as in obj.a().b().c().
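A minimal Python sketch of this style follows; the Query class and its methods are hypothetical and exist only to illustrate chaining:

```python
class Query:
    """A hypothetical query builder used only to illustrate chaining."""

    def __init__(self):
        self.steps = []

    def select(self, column):
        self.steps.append(f"select {column}")
        return self  # returning the object itself is what allows chaining

    def where(self, condition):
        self.steps.append(f"where {condition}")
        return self

    def limit(self, n):
        self.steps.append(f"limit {n}")
        return self


q = Query().select("name").where("age > 30").limit(5)
print(q.steps)
```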
mock object
A simplified replacement for part of a program whose behavior is easy to control and predict. Mock objects are used in unit tests to simulate databases, web services, and other complex systems.
module
A reusable software package, also often called a library.
module bundler
A program that finds all the dependencies of a set of source files and combines them into a single loadable file.
multi-threaded
Capable of performing several operations simultaneously. Multi-threaded programs are usually more efficient than single-threaded ones, but also harder to understand and debug.
name collision
The ambiguity that arises when two or more things in a program that have the same name are active at the same time. Most languages use namespaces to prevent such collisions.
namespace
A collection of names in a program that exists in isolation from other namespaces. Each function, object, class, or module in a program typically has its own namespace so that references to "X" in one part of a program do not accidentally refer to something called "X" in another part of the program. Scope is a distinct, but related, concept.
nested function
A function that is defined inside another function.
node
An element of a graph that is connected to other nodes by edges. Nodes typically have data associated with them, such as names or weights.
non-blocking execution
To allow a program to continue running while an operation is in progress. For example, many systems support non-blocking execution for file I/O so that the program can continue doing work while it waits for data to be read from or written to the filesystem (which is typically much slower than the CPU).
object
In object-oriented programming, a structure that contains the data for a specific instance of a class. The operations the object is capable of are defined by the class's methods.
object-oriented programming (OOP)
A style of programming in which functions and data are bound together in objects that only interact with each other through well-defined interfaces.
observation
A value or property of a specific member of a population.
off-by-one error
A common error in programming in which the program refers to element i of a structure when it should refer to element i-1 or i+1, or processes N elements when it should process N-1 or N+1.
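A small Python illustration of the second kind of off-by-one error, in which a loop processes N-1 elements instead of N. The list and variable names are made up for the example.

```python
values = [10, 20, 30, 40]

# Buggy: range(1, len(values)) starts at index 1, so element 0 is skipped
# and only N-1 of the N elements are processed.
buggy_total = sum(values[i] for i in range(1, len(values)))

# Correct: range(len(values)) covers indices 0 through N-1.
correct_total = sum(values[i] for i in range(len(values)))

print(buggy_total, correct_total)  # 90 100
```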
op code
The numerical code for a particular instruction that a processor can execute.
operating system
A program that provides a standard interface to whatever hardware it is running on. Theoretically, any program that only interacts with the operating system should run on any computer that operating system runs on.
package
A collection of code, data, and documentation that can be distributed and re-used. Also referred to in some languages as a library or module.
pad (a string)
To add extra characters to a string to make it a required length.
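In Python, for instance, the built-in string methods rjust and ljust pad on the left and right respectively. The values below are arbitrary.

```python
label = "42"

left_padded = label.rjust(6, "0")   # pad on the left with zeros
right_padded = label.ljust(6, ".")  # pad on the right with dots

print(left_padded)   # 000042
print(right_padded)  # 42....
```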
parameter
A variable specified in a function definition whose value is passed to the function when the function is called. Parameters and arguments are distinct, but related concepts. Parameters are variables and arguments are the values assigned to those variables.
parent (in a tree)
A node in a tree that is above another node (called a child). Every node in a tree except the root node has a single parent.
parent class
In object-oriented programming, the class from which a subclass (called the child class) is derived.
pass (a test)
A test passes if the actual result matches the expected result.
patch
A single file containing a set of changes to a set of files, separated by markers that indicate where each individual change should be applied.
path (in filesystem)
A string that specifies a location in a filesystem. In Unix, the directories in a path are joined using /.
pattern rule
A generic build rule that describes how to update any file whose name matches a pattern. Pattern rules often use automatic variables to represent the actual filenames.
pipe
To use the output of one computation as the input for the next, or the connection between the two computations responsible for the data transfer. Pipes were popularized by the Unix shell, and are now used in many different programming languages and systems.
pipe (in the Unix shell)
The | used to make the output of one command the input of the next.
plugin architecture
A style of application design in which the main program loads and runs small independent modules that do the bulk of the work.
precedence
The priority of an operation. For example, multiplication has a higher precedence than addition, so a+b*c is read as "the sum of a with the product of b and c".
prerequisite
Something that a build target depends on.
process
An operating system's representation of a running program. A process typically has some memory, the identity of the user who is running it, and a set of connections to open files.
production code
Software that is delivered to an end user. The term is used to distinguish such code from test code, deployment infrastructure, and everything else that programmers write along the way.
promise
A way to represent the result of a delayed or asynchronous computation. A promise is a placeholder for a value that will eventually be computed; any attempt to read the value before it is available blocks, while any such attempt after the computation finishes acts like a normal read.
promisification
In JavaScript, the act of wrapping a callback function in a promise for uniform asynchronous execution.
protocol
Any standard specifying how two pieces of software interact. A network protocol such as HTTP defines the messages that clients and servers exchange on the World-Wide Web; object-oriented programs often define protocols for interactions between objects of different classes.
prune
To remove branches and nodes from a tree, or to rule out partially-complete solutions when searching for an overall solution in order to reduce work.
pseudo-random number
A value generated in a repeatable way that resembles the true randomness of the universe well enough to fool observers.
pseudo-random number generator (PRNG)
A function that can generate pseudo-random numbers.
Python
A popular interpreted open-source programming language that relies on indentation to define control structure.
query selector
A pattern that specifies a set of DOM nodes. Query selectors are used in CSS to specify the elements that rules apply to, or by JavaScript programs to manipulate web pages.
R (programming language)
A popular open-source programming language used primarily for data science.
R Markdown
A dialect of Markdown that allows authors to mix prose and code (usually written in R) in a single document.
race condition
A situation in which a result depends on the order in which two or more concurrent operations are carried out.
raise (an exception)
To signal that something unexpected or unusual has happened in a program by creating an exception and handing it to the error-handling system, which then tries to find a point in the program that will catch it.
read-eval-print loop (REPL)
An interactive program that reads a command typed in by a user, executes it, prints the result, and then waits patiently for the next command. REPLs are often used to explore new ideas, or for debugging.
record
A group of related values that are stored together. A record may be represented as a tuple or as a row in a table; in the latter case, every record in the table has the same fields.
register
A small piece of memory (typically one word long) built into a processor that operations can refer to directly.
regular expression
A pattern for matching text, written as text itself. Regular expressions are sometimes called "regexp", "regex", or "RE", and are powerful tools for working with text.
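As a brief sketch using Python's standard re module, the pattern below matches a date written as four digits, two digits, and two digits separated by hyphens. The sample sentence is invented for the example.

```python
import re

# \d{4} means "exactly four digits"; parentheses capture each piece.
pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

match = pattern.search("released on 2023-04-17 at noon")
year, month, day = match.groups()
print(year, month, day)  # 2023 04 17
```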
relational database
A database that organizes information into tables, each of which has a fixed set of named fields (shown as columns) and a variable number of records (shown as rows).
relative error
The absolute value of the difference between the actual and correct value divided by the correct value. For example, if the actual value is 9 and the correct value is 10, the relative error is 0.1. Relative error is usually more useful than absolute error.
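The worked example above can be checked with a few lines of Python. The function name is hypothetical, and dividing by the magnitude of the correct value (so the result is never negative) is an assumption of this sketch.

```python
def relative_error(actual, correct):
    """Relative error of `actual` with respect to `correct`."""
    if correct == 0:
        raise ValueError("relative error is undefined when the correct value is zero")
    # Assumption: divide by the magnitude so the result is non-negative.
    return abs(actual - correct) / abs(correct)


error = relative_error(9, 10)
print(error)  # 0.1
```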
root (in a tree)
The node in a tree of which all other nodes are direct or indirect children, or equivalently the only node in the tree that has no parent.
row-major storage
Storing each row of a two-dimensional array as one block of memory so that elements in the same column are far apart.
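A rough Python sketch of the idea, flattening a small grid row by row. The sizes and the offset helper are illustrative; real array libraries do this at the level of memory addresses rather than list indices.

```python
ROWS, COLS = 3, 4
grid = [[r * 10 + c for c in range(COLS)] for r in range(ROWS)]

# Flatten row by row: each row becomes one contiguous run of values.
flat = [value for row in grid for value in row]


def offset(r, c):
    """Position of element (r, c) in the flattened, row-major layout."""
    return r * COLS + c


# Neighbors in a row are adjacent; neighbors in a column are COLS apart.
row_gap = offset(0, 1) - offset(0, 0)
column_gap = offset(1, 0) - offset(0, 0)
print(row_gap, column_gap)  # 1 4
```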
runnable documentation
Statements about code that can be executed to check their correctness, such as assertions or type declarations.
S3
A framework for object-oriented programming in R.
sandbox
A testing environment that is separate from the production system, or an environment that is only allowed to perform a restricted set of operations for security reasons.
SAT solver
A library or application that determines whether there is an assignment of true and false to a set of Boolean variables that makes an expression true (i.e., that satisfies the expression).
scalar
A single value of a particular type, such as 1 or "a". Scalars exist in most languages, but do not really exist in R; in R, values that appear to be scalars are actually vectors of unit length.
schema
A specification of the format of a dataset, including the name, format, and content of each table.
scope
The portion of a program within which a definition can be seen and used. See closure, global variable, and local variable.
scoring function
A function that measures or estimates how good a solution to a problem is.
search path
The list of directories that a program searches to find something. For example, the Unix shell uses the search path stored in the PATH variable when trying to find a program whose name it has been given.
seed
A value used to initialize a pseudo-random number generator.
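A short Python demonstration: two generators from the standard random module that are given the same seed produce the same sequence. The seed value 1234 is arbitrary.

```python
import random

first = random.Random(1234)   # 1234 is the seed
second = random.Random(1234)  # same seed, independent generator

first_run = [first.randint(1, 100) for _ in range(5)]
second_run = [second.randint(1, 100) for _ in range(5)]

same = first_run == second_run
print(same)  # True
```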
semantic versioning
A standard for identifying software releases. In the version identifier major.minor.patch, major changes when a new version of software is incompatible with old versions, minor changes when new features are added to an existing version, and patch changes when small bugs are fixed.
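A minimal Python sketch of working with major.minor.patch identifiers. Both function names are hypothetical, and the compatibility rule (same major number, and not older) is a simplifying assumption; pre-release and build suffixes are not handled.

```python
def parse_version(text):
    """Split 'major.minor.patch' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_compatible_upgrade(old, new):
    """Assumed rule: same major number, and `new` is not older than `old`."""
    old_v, new_v = parse_version(old), parse_version(new)
    return new_v[0] == old_v[0] and new_v >= old_v


print(is_compatible_upgrade("1.4.2", "1.10.0"))  # True
print(is_compatible_upgrade("1.4.2", "2.0.0"))   # False
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.10.0" sorts before "1.9.0", which is the wrong order for versions.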
server
Typically, a program such as a database manager or web server that provides data to a client upon request.
shell
A command-line interface that allows a user to interact with the operating system, such as Bash (for Unix and macOS) or PowerShell (for Windows).
shell variable
A variable set and used in the Unix shell. Commonly used shell variables include HOME (the user's home directory) and PATH (their search path).
side effect
A change made by a function while it runs that is visible after the function finishes, such as modifying a global variable or writing to a file. Side effects make programs harder for people to understand, since the effects are not necessarily clear at the point in the program where the function is called.
signature
The set of parameters (with types or meaning) that characterize the calling interface of a function or set of functions. Two functions with the same signature can be called interchangeably.
single-threaded
A model of program execution in which only one thing can happen at a time. Single-threaded execution is easier for people to understand, but less efficient than multi-threaded execution.
singleton
A set with only one element, or a class with only one instance.
Singleton pattern
A design pattern that creates a singleton object to manage some resource or service, such as a database or cache. In object-oriented programming, the pattern is usually implemented by hiding the constructor of the class in some way so that it can only be called once.
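One possible Python rendering of the pattern is sketched below. Python cannot truly hide a constructor, so this version intercepts object creation in __new__ and hands back the same instance every time. The Cache class is hypothetical, and this is only one of several common approaches.

```python
class Cache:
    """Hypothetical shared cache: every call to Cache() yields the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            # First call: create the one and only instance.
            cls._instance = super().__new__(cls)
            cls._instance.data = {}
        return cls._instance


first = Cache()
second = Cache()
first.data["key"] = "value"

print(first is second)     # True
print(second.data["key"])  # value
```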
slug
An abbreviated portion of a page's URL that uniquely identifies it. In the example https://www.mysite.com/category/post-name, the slug is post-name.
source map
A table used to translate a piece of code back to the lines in the original source.
sparse matrix
A matrix in which most of the values are zero (or some other value). Rather than storing many copies of the same values, programs will often use a special data structure that only stores the "interesting" values.
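A simple way to sketch such a data structure in Python is a dictionary keyed by (row, column) that holds only the non-default entries. The SparseMatrix class below is illustrative and is not how production libraries store sparse data.

```python
class SparseMatrix:
    """Illustrative sparse matrix that stores only non-default values."""

    def __init__(self, rows, cols, default=0):
        self.rows, self.cols, self.default = rows, cols, default
        self.cells = {}  # maps (row, col) to a value

    def set(self, r, c, value):
        if value == self.default:
            self.cells.pop((r, c), None)  # never store the default value
        else:
            self.cells[(r, c)] = value

    def get(self, r, c):
        return self.cells.get((r, c), self.default)


matrix = SparseMatrix(1000, 1000)
matrix.set(3, 7, 2.5)
matrix.set(999, 0, -1)
print(matrix.get(3, 7), matrix.get(5, 5), len(matrix.cells))  # 2.5 0 2
```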
SQL
The language used for writing queries for a relational database. The term is an acronym for Structured Query Language.
stack frame
A section of the call stack that records details of a single call to a specific function.
stale (in build)
To be out-of-date compared to a prerequisite. A build manager's job is to find and update things that are stale.
standard input
A predefined communication channel for a process, typically used to read input from the keyboard or from the previous process in a pipe.
standard output
A predefined communication channel for a process, typically used to send output to the screen or to the next process in a pipe.
stream
A sequential flow of data, such as the bits arriving across a network connection or the bytes read from a file.
string
A block of text in a program. The term is short for "character string".
string interpolation
The process of inserting text corresponding to specified values into a string, usually to make output human-readable.
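In Python, formatted string literals (f-strings) are one common way to do this. The variable names and values below are invented for the example.

```python
name = "test_add"
elapsed = 0.03217

# Expressions in braces are evaluated and inserted; :.3f keeps three decimals.
message = f"{name} passed in {elapsed:.3f} seconds"
print(message)  # test_add passed in 0.032 seconds
```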
synchronous
To happen at the same time. In programming, synchronous operations are ones that have to run simultaneously, or complete at the same time.
tab completion
A technique implemented by most REPLs, shells, and programming editors that completes a command, variable name, filename, or other text when the TAB key is pressed.
table
A set of records in a relational database or observations in a data frame. Tables are usually displayed as rows (each of which represents one record or observation) and columns (each of which represents a field or variable).
tagged data
A technique for storing data in a two-part structure, where one part identifies the type and the other part stores the bits making up the value.
Template Method pattern
A design pattern in which a parent class defines an overall sequence of operations by calling abstract methods that child classes must then implement. Each child class then behaves in the same general way, but implements the steps differently.
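The pattern might be sketched in Python as follows, using the standard abc module for abstract methods. The Report classes and their method names are hypothetical.

```python
from abc import ABC, abstractmethod


class Report(ABC):
    def render(self, items):
        # The template method: it fixes the order of the steps,
        # while the steps themselves are supplied by child classes.
        lines = [self.header()]
        lines.extend(self.format_item(item) for item in items)
        return "\n".join(lines)

    @abstractmethod
    def header(self):
        """Return the first line of the report."""

    @abstractmethod
    def format_item(self, item):
        """Return the line for one item."""


class PlainReport(Report):
    def header(self):
        return "Items"

    def format_item(self, item):
        return f"- {item}"


class CsvReport(Report):
    def header(self):
        return "item"

    def format_item(self, item):
        return str(item)


plain = PlainReport().render(["a", "b"])
csv_text = CsvReport().render(["a", "b"])
print(plain)
print(csv_text)
```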
test harness
A program written to test some other program or set of functions, typically to measure their performance.
test runner
A program that finds and runs software tests and reports their results.
test-driven development (TDD)
A programming practice in which tests are written before a new feature is added or a bug is fixed in order to clarify the goal.
throw (exception)
Another term for raising an exception.
tibble
A modern replacement for R's data frame, which stores tabular data in columns and rows, defined and used in the tidyverse.
ticket
See issue.
Tidyverse
A collection of R packages for operating on tabular data in consistent ways.
tightly coupled
Components in a software system are said to be tightly coupled if they depend on each other's internals, so that if one is altered then others have to be altered as well.
Time of check/time of use (ToCToU)
A race condition in which a process checks the state of something and then operates on it, but some other process might alter that state between the check and the operation.
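A hedged Python illustration using file access. The first function checks and then acts, leaving a window in which another process could delete the file; the second simply attempts the operation and handles failure, which is one common way to narrow that window. Function and file names are made up for the example.

```python
import os
import tempfile


def read_if_exists(path):
    # Time of check: the file exists right now...
    if os.path.exists(path):
        # ...time of use: but it may have been removed by the time we open it.
        with open(path) as reader:
            return reader.read()
    return None


def read_or_none(path):
    # No separate check: try the operation and handle the failure instead.
    try:
        with open(path) as reader:
            return reader.read()
    except FileNotFoundError:
        return None


with tempfile.TemporaryDirectory() as folder:
    missing = os.path.join(folder, "missing.txt")
    present = os.path.join(folder, "present.txt")
    with open(present, "w") as writer:
        writer.write("data")
    results = (read_or_none(missing), read_or_none(present))

print(results)  # (None, 'data')
```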
timestamp
A digital identifier showing the time at which something was created or accessed. Timestamps should use ISO date format for portability.
token
An indivisible unit of text for a parser, such as a variable name or a number. Exactly what constitutes a token depends on the language.
topological order
Any ordering of the nodes in a graph that respects the direction of its edges, i.e., if there is an edge from node A to node B, A comes before B in the ordering. There may be many topological orderings of a particular graph.
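Python 3.9 and later ship a graphlib module that can compute one such ordering. In the sketch below the four-node graph is invented; because several valid orderings exist, the code checks the defining property rather than relying on one particular result.

```python
from graphlib import TopologicalSorter

# Each pair (source, target) is an edge from source to target.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

sorter = TopologicalSorter()
for source, target in edges:
    sorter.add(target, source)  # add(node, *predecessors)

order = list(sorter.static_order())
position = {node: index for index, node in enumerate(order)}

# The defining property: every edge points forward in the ordering.
respects_edges = all(position[s] < position[t] for s, t in edges)
print(respects_edges)  # True
```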
transitive closure
The set of all nodes in a graph that are reachable from a starting node, either directly or indirectly.
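A minimal Python sketch that computes this set for a graph stored as a dictionary of adjacency lists. The graph and the function name are illustrative; in this version the starting node appears in the result only if it can be reached again through a cycle.

```python
def reachable(graph, start):
    """Return every node reachable from `start` by following one or more edges."""
    seen = set()
    pending = [start]
    while pending:
        node = pending.pop()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                pending.append(neighbor)
    return seen


graph = {"A": ["B"], "B": ["C"], "C": [], "D": ["A"]}
print(sorted(reachable(graph, "A")))  # ['B', 'C']
print(sorted(reachable(graph, "D")))  # ['A', 'B', 'C']
```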
tree
A graph in which every node except the root has exactly one parent.
true
The logical (Boolean) state opposite of "false". Used in logic and programming to represent a binary state of something.
tuple
A data type that has a fixed number of parts, such as the three color components of a red-green-blue color specification. Tuples are immutable (their values cannot be reset).
two hard problems in computer science
Refers to a quote by Phil Karlton—"There are only two hard problems in computer science—cache invalidation and naming things." Many variations add a third problem (most often "off-by-one errors").
type declaration
A statement in a program that a variable or value has a particular data type. Languages like Java require type declarations for all variables; they are optional in TypeScript and Python, and not allowed in pure JavaScript.
Unicode
A standard that defines numeric codes for many thousands of characters and symbols. Unicode does not define how those numbers are stored; that is done by standards like UTF-8.
unit test
A test that exercises one function or feature of a piece of software and produces pass, fail, or error.
UTF-8
A way to store the numeric codes representing Unicode characters in memory that is backward-compatible with the older ASCII standard.
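The split between Unicode's numeric codes and their UTF-8 storage can be seen in a few lines of Python. The two characters are arbitrary examples; the accented letter is written as an escape so that the source text stays unambiguous.

```python
ascii_char = "A"
accented_char = "\u00e9"  # Latin small letter e with acute accent

# ord() gives the Unicode code; encode() gives the bytes used to store it.
ascii_bytes = ascii_char.encode("utf-8")
accented_bytes = accented_char.encode("utf-8")

print(ord(ascii_char), ascii_bytes)        # 65 b'A'
print(ord(accented_char), accented_bytes)  # 233 b'\xc3\xa9'
```

The ASCII character is stored as the same single byte it has in ASCII, which is what backward compatibility means here, while the accented character needs two bytes.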
variable (data)
Some attribute of a population that can be measured or observed.
variable (program)
A name in a program that has some data associated with it. A variable's value can be changed after definition.
vector
A sequence of values, usually of homogeneous type. Vectors are the fundamental data structure in R; a scalar is just a vector with exactly one element.
version control system
A system for managing changes made to software during its development.
virtual machine
A program that pretends to be a computer. This may seem a bit redundant, but VMs are quick to create and start up, and changes made inside the virtual machine are contained within that VM so we can install new packages or run a completely different operating system without affecting the underlying computer.
Visitor pattern
A design pattern in which the operation to be done is taken to each element of a data structure in turn. It is usually implemented by having a generator "visitor" that knows how to reach the structure's elements; the visitor is given a function or method that carries out the specific operation and calls it for each element in turn.
walk (a tree)
To visit each node in a tree in some order, typically depth-first or breadth-first.
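Both orders can be sketched in Python for a tree stored as nested (name, children) tuples. That representation and the function names are assumptions made for the example.

```python
from collections import deque

# Each node is a (name, list_of_children) pair.
tree = ("root", [("left", [("leaf", [])]), ("right", [])])


def depth_first(node):
    """Visit a node, then fully explore each child before moving to the next."""
    name, children = node
    yield name
    for child in children:
        yield from depth_first(child)


def breadth_first(node):
    """Visit nodes level by level using a first-in, first-out queue."""
    queue = deque([node])
    while queue:
        name, children = queue.popleft()
        yield name
        queue.extend(children)


print(list(depth_first(tree)))    # ['root', 'left', 'leaf', 'right']
print(list(breadth_first(tree)))  # ['root', 'left', 'right', 'leaf']
```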
watchpoint
An instruction for a debugger telling it to suspend execution whenever the value of a variable (or more generally an expression) changes.
well formed
A piece of text that obeys the rules of a formal grammar is said to be well formed.
while loop
A statement in a program that repeats one or more other statements (the loop body) as long as a condition is true.
word (of memory)
The unit of memory that a particular processor most naturally works with. While a byte is a fixed size (8 bits), a word may be 16, 32, or 64 bits long depending on the processor.
XML
A set of rules for defining HTML-like tags and using them to format documents (typically data). XML was popular in the early 2000s, but its complexity led many programmers to adopt JSON instead.
YAML
Short for "YAML Ain't Markup Language", a way to represent nested data using indentation rather than the parentheses and commas of JSON. YAML is often used in configuration files and to define parameters for various flavors of Markdown documents.