The 13 Dwarfs of Computing
Dense linear algebra (e.g. BLAS, ML Principal Component Analysis)
Data is stored as dense matrices or vectors, and access is typically via unit strides. Typical algorithms are Cholesky decomposition for symmetric positive-definite systems and Gaussian elimination (LU) for non-symmetric systems. Mathematically this involves the classic vector and matrix operations, traditionally divided into Level 1 (vector/vector), Level 2 (matrix/vector), and Level 3 (matrix/matrix) operations.
Applications are various:
- Linear algebra: LAPACK, ATLAS
- Clustering algorithms / Data-mining: StreamCluster, K-means
These operations are normally implemented as nested loops and are in many cases easy to parallelize.
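As a minimal sketch, the access pattern of a Level 3 (matrix/matrix) operation looks like the naive triple loop below. A production code would instead call an optimized BLAS (which blocks for cache and vectorizes); this only illustrates the dense, strided accesses.

```python
def matmul(A, B):
    """Naive dense C = A * B for lists of lists (a Level 3 BLAS analogue)."""
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):          # rows of A
        for j in range(m):      # columns of B
            s = 0.0
            for p in range(k):  # unit-stride walk along a row of A
                s += A[i][p] * B[p][j]
            C[i][j] = s
    return C
```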
Sparse linear algebra (e.g. SpMV, ML Principal Component Analysis)
Data largely consists of zeros, so it is stored in a compressed format and accessed via index-based loads. Typical algorithms are Conjugate Gradient and the other Krylov-subspace methods, whose core operation is multiplication by a matrix composed primarily of zeros. Computations can often be made more efficient by reordering rows and columns so that the non-zero elements cluster around the diagonal (bandwidth reduction).
Applications:
- Finite element analysis
- Partial differential equations
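To illustrate the index-based loads, here is a sketch of sparse matrix-vector multiply (SpMV) in the common CSR (compressed sparse row) format; only the non-zeros are stored, and each is reached through an indirect load of its column index.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x where A is stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]   # indirect, index-based load
        y.append(s)
    return y

# The matrix [[4, 0, 1], [0, 3, 0]] in CSR form:
vals, cols, rows = [4.0, 1.0, 3.0], [0, 2, 1], [0, 2, 3]
```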
Spectral methods (e.g. FFT)
Data is transformed between the spatial/temporal domain and the frequency domain. These methods are typified by, but not restricted to, the Fast Fourier Transform (FFT). Spectral methods can be used to solve ordinary differential equations (ODEs), partial differential equations (PDEs), and eigenvalue problems involving differential equations.
Applications:
- Fluid Dynamics
- Quantum Mechanics
- Weather Prediction
There are various FFT implementations for each hardware architecture.
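For illustration, a minimal recursive radix-2 Cooley-Tukey FFT; real libraries (FFTW, cuFFT, etc.) are far more elaborate and tuned per architecture.

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])          # transform of even-indexed samples
    odd = fft(x[1::2])           # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```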
N-body methods (e.g. Barnes-Hut)
Data consists of discrete particle bodies that interact with each other and/or the environment. An N-body simulation is a simulation of a dynamical system of particles, usually under the influence of physical forces such as gravity. Interactions are symmetric (the force of A on B mirrors that of B on A), so each pair need only be computed once, and the state of the whole system is updated after each round. The basic algorithm is O(N^2); larger systems become tractable by maintaining neighbour lists and leaving far-away particles out of the computation (as in Barnes-Hut, which is O(N log N)). Selecting the approach at run-time, based on problem size, is desirable.
Applications:
- Astronomy: cosmology (e.g. formation of galaxies)
- Computational Chemistry: Molecular Dynamics (e.g. protein folding), Molecular modeling
- Physics: Fluid Dynamics, Plasma Physics
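A sketch of one naive O(N^2) gravity step in 2D. The pair loop exploits the symmetry noted above: each pair is visited once and the force is applied to both bodies. The constants `G` and the softening term `eps` are illustrative values, not from the text.

```python
def nbody_step(pos, vel, mass, dt, G=1.0, eps=1e-3):
    """Advance positions/velocities one step; pos and vel are [x, y] lists."""
    n = len(pos)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):            # each pair computed once
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r2 = dx * dx + dy * dy + eps     # softened squared distance
            f = G / (r2 ** 1.5)
            acc[i][0] += f * mass[j] * dx; acc[i][1] += f * mass[j] * dy
            acc[j][0] -= f * mass[i] * dx; acc[j][1] -= f * mass[i] * dy
    for i in range(n):                       # whole system updated together
        vel[i][0] += acc[i][0] * dt; vel[i][1] += acc[i][1] * dt
        pos[i][0] += vel[i][0] * dt; pos[i][1] += vel[i][1] * dt
    return pos, vel
```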
Structured grids (e.g. Cactus)
Data is represented by a regular grid in which all elements have the same dimensions; think squares and blocks. Points on the grid are conceptually updated together via equations linking them to neighbouring points, so there is high spatial locality. Updates may be in place or between two versions of the grid.
Applications:
- Image Processing: Gaussian image blurring
- Physics Simulations: transient thermal differential equation solver
- Finite Element Method
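A minimal structured-grid sketch: one Jacobi iteration of a 5-point stencil (the kind of update used in a transient thermal solver or Gaussian-style blurring). Note the two grid versions: `grid` is read, `new` is written, so all points are conceptually updated together.

```python
def jacobi_step(grid):
    """One 5-point-stencil sweep over the interior of a 2D grid."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]           # boundary values stay fixed
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                grid[i][j - 1] + grid[i][j + 1])
    return new
```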
Unstructured grids (e.g. ABAQUS, FLUENT)
These include all grids that are not regular: different elements have different numbers of neighbours, so data is stored in terms of its locality and connectivity to other data. Points on the grid are conceptually updated together, but each update requires multiple levels of indirection. Since the connectivity is essentially a graph, this class overlaps substantially with graph traversal.
Applications:
- Computational fluid dynamics
- Belief propagation
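A sketch of an unstructured-grid update, assuming connectivity is stored as an explicit adjacency list: every access goes through a level of indirection, and different points have different numbers of neighbours.

```python
def relax(values, neighbours):
    """One relaxation sweep: each node moves toward the mean of its patch."""
    new = []
    for i, nbrs in enumerate(neighbours):
        # average of a node and its (arbitrarily many) neighbours
        new.append((values[i] + sum(values[j] for j in nbrs)) / (1 + len(nbrs)))
    return new

# A tiny mesh: node 0 touches nodes 1 and 2; nodes 1 and 2 only touch node 0.
vals = relax([0.0, 3.0, 6.0], [[1, 2], [0], [0]])
```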
Map reduce methods (e.g. Monte Carlo)
These are embarrassingly parallel problems, such as Monte Carlo methods, where calculations are independent of each other, so nearly no communication is required between processes. In essence we are talking about a single function that executes in parallel on independent data sets, with outputs that are eventually combined to form a single or small number of results. In the case of huge data-sets and compute-intensive algorithms, GPUs can be used in combination with big data solutions like Hadoop to attack this set of problems.
Applications:
- Monte-Carlo: computation of pi, portfolio analysis, collision simulation, sequence alignment
- Distributed searching
As communication between the processes is minimal, some of these problems can see tremendous speedups when implemented on GPUs.
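The map-reduce pattern can be sketched with a Monte Carlo estimate of pi. Each "map" task counts hits in an independent batch of random points; the "reduce" step just sums the counts. Batch size and task count here are arbitrary, and in a real deployment each batch would run on a separate worker or GPU.

```python
import random
from functools import reduce

def count_hits(n, seed):
    """Map task: count random points in the unit square that fall in the quarter disc."""
    rng = random.Random(seed)            # independent stream per task
    return sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)

batches = [count_hits(100_000, seed) for seed in range(8)]   # "map" phase
total = reduce(lambda a, b: a + b, batches)                  # "reduce" phase
pi_estimate = 4.0 * total / (8 * 100_000)
```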
Combinational logic (e.g. encryption, ML hashing)
These algorithms generally perform simple operations on very large amounts of data, often exploiting bit-level parallelism. For example, Cyclic Redundancy Checks (CRCs) are computed to ensure data integrity, and RSA encryption to provide data security.
Applications:
- Computing checksums, CRCs
- Encryption & decryption
- Hashing
- Hamming weight
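Bit-level parallelism can be illustrated with a SWAR (SIMD-within-a-register) population count, i.e. the Hamming weight of a 32-bit word: each step adds neighbouring bit fields in parallel, so the whole word is counted in a handful of operations instead of 32.

```python
def popcount32(x):
    """Number of set bits in a 32-bit word, via parallel field additions."""
    x = x - ((x >> 1) & 0x55555555)                  # per-2-bit counts
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # per-4-bit counts
    x = (x + (x >> 4)) & 0x0F0F0F0F                  # per-byte counts
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24     # sum of the four bytes
```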
Graph traversal (e.g. Quicksort, ML decision trees)
Graph traversal is the problem of visiting all the nodes in a graph in a particular manner, updating and/or checking their values along the way. Tree traversal is a special case of graph traversal. It typically involves indirect table lookups and little computation.
Applications:
- Search: depth-first search, breadth-first search, finding all nodes within one connected component
- Sorting: quick-sort
- Serialization/deserialization
- Maze generation
- Collision detection
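A breadth-first traversal over an adjacency list shows the pattern: the work is almost entirely indirect lookups (following edges), with little arithmetic.

```python
from collections import deque

def bfs(graph, start):
    """Visit every node reachable from start, nearest first."""
    order, seen = [], {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:          # indirect lookup per edge
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```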
Dynamic programming
This is an algorithmic technique that computes solutions by solving simpler overlapping subproblems. Many dynamic programming problems operate by filling in a grid that represents the solution space, where one location in the grid holds the final answer.
Applications:
- Graph problems: Floyd's all-pairs shortest path, Bellman-Ford algorithm
- Sequence alignment: Needleman-Wunsch, Smith-Waterman
Despite the name, "dynamic" does not refer to run-time tuning: Bellman coined the term for multi-stage decision processes, and the performance gain comes from reusing the solutions of overlapping subproblems (memoization or tabulation).
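The grid-filling pattern can be sketched with Levenshtein edit distance, a close cousin of the Needleman-Wunsch alignment mentioned above: each cell depends only on three already-filled neighbours, and one corner of the grid holds the final answer.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions to turn a into b."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i                       # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j                       # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match / substitution
    return d[-1][-1]                      # bottom-right cell holds the answer
```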
Backtrack and branch+bound
These involve solving search and global optimization problems over intractably large spaces. Branch and bound algorithms work on the divide and conquer principle: the search space is subdivided into smaller subregions ("branching"), and bounds are computed on the solutions contained in each subregion under consideration, so that provably suboptimal subproblems can be pruned.
Put another way, possible solutions are built up step by step and invalid ones are eliminated (most often early), since there is no overview of all possible solutions at the outset. Backtracking is effectively a depth-first search of the problem space; unlike dynamic programming, the subproblems generally do not overlap, and the emphasis lies on the step-wise construction and pruning of the solution space.
Applications:
- Puzzles: N-queens, crosswords, verbal arithmetic, Sudoku, Peg Solitaire.
- Traveling salesman
- Knapsack, subset sum, and partition problems
- Integer Linear Programming
- Boolean Satisfiability
- Combinatorial Optimisation
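A compact backtracking sketch using the N-queens puzzle listed above: partial solutions are built column by column and abandoned ("pruned") as soon as a conflict appears.

```python
def n_queens(n, cols=(), diag1=(), diag2=()):
    """Count placements of n non-attacking queens on an n x n board."""
    row = len(cols)
    if row == n:
        return 1                                  # a complete valid placement
    count = 0
    for c in range(n):
        if c in cols or row + c in diag1 or row - c in diag2:
            continue                              # conflict: prune this branch
        count += n_queens(n, cols + (c,),
                          diag1 + (row + c,), diag2 + (row - c,))
    return count
```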
Probabilistic graphical models (e.g. Neural networks, Bayesian networks)
A graph that combines uncertainty (probabilities) and logical structure (independence constraints) to compactly represent complex, real-world phenomena. Applications involve graphs that represent random variables as nodes and conditional dependencies as edges.
Applications:
- Bayesian networks: belief networks, probabilistic networks, causal network, knowledge maps
- Hidden Markov models
- Neural networks
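As a sketch of exact inference in a small graphical model, here is the forward algorithm for a hidden Markov model; all states, observations, and probabilities below are illustrative values, not from the text.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(observation sequence) by summing over all hidden state paths."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * trans_p[p][s] for p in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())

# Hypothetical weather model: hidden state (rain/sun), observed umbrella use.
states = ("rain", "sun")
start = {"rain": 0.5, "sun": 0.5}
trans = {"rain": {"rain": 0.7, "sun": 0.3}, "sun": {"rain": 0.3, "sun": 0.7}}
emit = {"rain": {"umbrella": 0.9, "none": 0.1},
        "sun": {"umbrella": 0.2, "none": 0.8}}
p = forward(("umbrella", "umbrella"), states, start, trans, emit)
```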
Finite state machine
A mathematical model of computation used to design both computer programs and sequential logic circuits. It is conceived as an abstract machine that is in exactly one of a finite number of states at any time; inputs trigger transitions between states.
Applications:
- Video Decoding, Parsing, Compression
- Data Mining
- Find Repeating Patterns
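A finite state machine can be written directly as a transition table. The toy acceptor below (an illustrative example, not from the text) checks whether a binary string contains the substring "11" — a miniature case of finding a repeating pattern.

```python
# State transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    ("start", "0"): "start",
    ("start", "1"): "one",      # saw one '1'
    ("one",   "0"): "start",
    ("one",   "1"): "found",    # saw "11"
    ("found", "0"): "found",    # absorbing state
    ("found", "1"): "found",
}

def contains_11(bits):
    """Run the FSM over the input; accept iff it ends in the 'found' state."""
    state = "start"             # exactly one state at a time
    for b in bits:
        state = TRANSITIONS[(state, b)]
    return state == "found"
```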