## Introduction

We consider implementations of multivariate vector functions $$f : D^{n} \times D^{n'} \rightarrow D^m \times D^{m'} : (\mathbf{y}, \mathbf{y}') = f(\mathbf{x},\mathbf{x}')$$ as computer programs over some base data type $D$ (for example, single or higher precision floating-point data, intervals, convex/concave relaxations, vectors/ensembles). (We assume that the arithmetic inside $f$ is completely defined (through overloading of the elemental functions; see below.) The $n$ active (also: independent) and $n'$ passive inputs are mapped onto $m$ active (also: dependent) and $m'$ passive outputs. The given implementation is assumed to be $k$ times continuously differentiable at all points of interest implying the existence and finiteness of the Jacobian $$\nabla f = \nabla_\mathbf{x} f(\mathbf{x},\mathbf{x}') \equiv \frac{\partial \mathbf{y}}{\partial \mathbf{x}}(\mathbf{x},\mathbf{x}') \in D^{m \times n},$$ the Hessian $$\nabla^2 f = \nabla^2 f(\mathbf{x},\mathbf{x}') \equiv \frac{\partial^2 \mathbf{y}}{\partial \mathbf{x}^2}(\mathbf{x},\mathbf{x}') \in D^{m \times n \times n},$$ if $k\geq2,$ and of potentially higher derivative tensors $$\nabla^k f = \nabla^k f(\mathbf{x},\mathbf{x}') \equiv \frac{\partial^k \mathbf{y}}{\partial \mathbf{x}^k}(\mathbf{x},\mathbf{x}') \in D^{m \times \overset{k}{\underset{l=1}{\times}} n}.$$ We denote $$\nabla^k f=\left(\left [\nabla^k f \right ]_i^{j_1,\ldots,j_k} \right)_{i=0,\ldots,m-1}^{j_1,\ldots,j_k=0,\ldots,n-1},$$ and we use $*$ to denote the entire range of an index. For example, $\left [\nabla f \right ]_i^*$ denotes the $i$th row and $\left [\nabla f \right ]_*^j$ the $j$th column of the Jacobian, respectively. Algorithmic differentiation is implemented by overloading of elemental functions including the built-in functions and operators of C++ as well as user-defined higher-level elemental functions.

## 1 First-order Tangent Mode

### 1.1 Generic first-order scalar tangents

Generic first-order scalar tangent mode enables the computation of products of the Jacobian with vectors $\mathbf{x}^{(1)} \in D^n$ $$\mathbf{y}^{(1)}=\nabla f \cdot \mathbf{x}^{(1)} \in D^m$$ through provision of a generic first-order scalar tangent data type over arbitrary base data types $D$.

### 1.2 Generic first-order vector tangents

Generic first-order vector tangent mode enables the computation of products of the Jacobian with matrices $X^{(1)} \in D^{n \times l}$ $$Y^{(1)}=\nabla f \cdot X^{(1)} \in D^{m \times l}$$ through provision of a generic first-order vector tangent data type over arbitrary base data types $D.$

### 1.3 Preaccumulation through use of expression templates

Expression templates enable the generation of statically optimized gradient code at the level of individual assignments which can be beneficial for vector tangent mode.

### 2.1 Generic first-order scalar adjoints

Generic first-order scalar adjoint mode enables the computation of products of the transposed Jacobian with vectors $\mathbf{y}_{(1)} \in D^m$ $$\mathbf{x}_{(1)}=\nabla f^T \cdot \mathbf{y}_{(1)} \in D^n$$ through provision of a generic first-order scalar adjoint data type over arbitrary base data types $D.$

### 2.2 Generic first-order vector adjoints

Generic first-order vector adjoint mode enables the computation of products of the transposed Jacobian with matrices $Y_{(1)} \in D^{m \times l}$ $$X_{(1)}=\nabla f^T \cdot Y_{(1)} \in D^{n \times l}$$ through provision of a generic first-order scalar adjoint data type over arbitrary base data types $D.$

### 2.3 Global “blob” tape

Memory of the specified size is allocated and used for storing the tape without bound checks.

### 2.4 Global “chunk” tape

The tape grows in chunks up to the physical memory bound.

Multiple “blob” or “chunk” tapes can be allocated, for example, to implement thread-safe adjoints.

### 2.6 Preaccumulation through use of expression templates

Expression templates enable the generation of statically optimized gradient code at the level of individual assignments. This preaccumulation can result in a decrease in tape memory requirement.

### 2.7 Repeated evaluation of tape

Tapes can be recorded at a given point and interpreted repeatedly.

• checkpointing
• symbolic adjoints of numerical methods
• combinations of tangent and adjoint modes
• preaccumulation of local derivative tensors
• creation of collections of domain-specific higher-level elemental functions

### 2.9 User-defined local Jacobians interface

Externally preaccumulated partial derivatives can be inserted directly into the tape for use within subsequent interpretations.

## 3 Second- and Higher-order Tangent Mode

### 3.1 Generic second-order scalar tangents

Instantiation of the generic first-order scalar tangent data type with the generic first-order scalar tangent data type over a non-derivative base data type yields the second-order scalar tangent data type. It enables the computation of scalar projections of the Hessian $\nabla^2 f \in D^{m \times n \times n}$ in its two domain dimensions of length $n$ as $$\mathbf{y}^{(1,2)}=\langle \nabla^2 f,\mathbf{x}^{(1)},\mathbf{x}^{(2)}\rangle \equiv \left(\mathbf{x}^{{(1)}^T} \cdot \left [\nabla^2 f \right ]_i^{*,*} \cdot \mathbf{x}^{(2)} \right)_{i=0,\ldots,m-1} \in D^m$$ for $\mathbf{x}^{(1)}, \mathbf{x}^{(2)} \in D^n$ over arbitrary base data types $D$. The computational cost of accumulating the whole Hessian over $D$ becomes $O(n^2) \cdot \text{Cost}(f)$ with both $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ ranging independently over the Cartesian basis vectors in $D^n.$

### 3.2 Generic second-order vector tangents

Instantiation of the generic first-order vector tangent data type with the generic first-order vector tangent data type over a non-derivative base data type yields the second-order scalar tangent data type. It enables the computation of vector projections of the Hessian $\nabla^2 f \in D^{m \times n \times n}$ in its two domain dimensions of length $n$ as $$Y^{(1,2)}=\langle \nabla^2 f,X^{(1)},X^{(2)}\rangle \equiv \left(X^{{(1)}^T} \cdot \left [\nabla^2 f \right ]_i^{*,*} \cdot X^{(2)} \right)_{i=0,\ldots,m-1} \in D^{m \times l_1 \times l_2}$$ for $X^{(1)} \in D^{n \times l_1},$ $X^{(2)} \in D^{n \times l_2}$ over arbitrary base data types $D$. The computational cost of accumulating the whole Hessian over $D$ remains equal to $O(n^2) \cdot \text{Cost}(f)$ with both $X^{(1)}$ and $X^{(2)}$ set equal to the identities in $D^n.$

### 3.3 Other generic second-order tangents

Instantiation of the generic first-order vector tangent data type with the generic first-order scalar tangent data type over a non-derivative base data type yields a second-order tangent data type. Similarly, instantiation of the generic first-order scalar tangent data type with the generic first-order vector tangent data type over a non-derivative base data type yields a second-order tangent data type.

### 3.4 Generic third- and higher-order tangents

Instantiation of tangent types with $k$th-order tangent types yields $(k+1)$th-order tangent types.

## 4 Second- and Higher-order Adjoint Mode

### 4.1 Generic second-order scalar adjoints

Instantiation of the generic first-order scalar adjoint data type with the generic first-order scalar tangent data type over a non-derivative base data type yields a second-order scalar adjoint data type. It enables the computation of scalar projections of the Hessian $\nabla^2 f \in D^{m \times n \times n}$ in its image and one of its domain dimensions as $$\mathbf{x}_{(1)}^{(2)}=\langle \mathbf{y}_{(1)},\nabla^2 f,\mathbf{x}^{(2)}\rangle \equiv \left(\mathbf{y}_{(1)}^T \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot \mathbf{x}^{(2)} \right)_{j=0,\ldots,n-1} \in D^n$$ for $\mathbf{y}_{(1)} \in D^m$ and $\mathbf{x}^{(2)} \in D^n$ over arbitrary base data types $D$. The computational cost of accumulating the whole Hessian over $D$ becomes $O(m\cdot n) \cdot \text{Cost}(f)$ with $\mathbf{y}_{(1)}$ and $\mathbf{x}^{(2)}$ ranging over the Cartesian basis vectors in $D^m$ and $D^n$, respectively.

### 4.2 Generic second-order vector adjoints

Instantiation of the generic first-order vector adjoint data type with the generic first-order vector tangent data type over a non-derivative base data type yields a second-order vector adjoint data type. It enables the computation of vector projections of the Hessian $\nabla^2 f \in D^{m \times n \times n}$ in its image and one of its domain dimensions as $$X_{(1)}^{(2)}=\langle Y_{(1)},\nabla^2 f,X^{(2)}\rangle \equiv \left(Y_{(1)}^T \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot X^{(2)} \right)_{j=0,\ldots,n-1} \in D^{l_1 \times n \times l_2}$$ for $Y_{(1)} \in D^{m \times l_1}$ and $X^{(2)} \in D^{n \times l_2}$ over arbitrary base data types $D$. The computational cost of accumulating the whole Hessian over $D$ remains equal to $O(m\cdot n) \cdot \text{Cost}(f)$ with $Y_{(1)}$ and $X^{(2)}$ set equal to the identities in $D^m$ and $D^n$, respectively.

### 4.3 Other generic second-order adjoints

Instantiation of the generic first-order vector adjoint data type with the generic first-order scalar tangent data type over a non-derivative base data type yields a second-order adjoint data type. Similarly, instantiation of the generic first-order scalar adjoint data type with the generic first-order vector tangent data type over a non-derivative base data type yields a second-order adjoint data type.

Symmetry of the Hessian in its two domain dimensions yields the following additional second-order adjoint data types: instantiation of a generic first-order tangent data type with a generic first-order adjoint data type over a non-derivative base data type yields a second-order adjoint data type for computing $$\langle \mathbf{y}_{(2)}^{(1)},\nabla^2 f,\mathbf{x}^{(1)}\rangle \equiv \left(\mathbf{y}_{(2)}^{(1)^T} \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot \mathbf{x}^{(1)} \right)_{j=0,\ldots,n-1} \in D^n.$$ Similarly, instantiation of a generic first-order adjoint data type with a generic first-order adjoint data type over a non-derivative base data type yields a second-order adjoint data type for computing $$\langle \mathbf{x}_{(1,2)},\mathbf{y}_{(1)},\nabla^2 f\rangle \equiv \left(\mathbf{y}_{(1)}^T \cdot \left [\nabla^2 f \right ]_*^{j,*} \cdot \mathbf{x}_{(1,2)} \right)_{j=0,\ldots,n-1} \in D^n.$$

### 4.4 Generic third- and higher-order adjoints

Instantiation of tangent types with $k$th-order adjoint types yields $(k+1)$th-order adjoint types. Similarly, instantiation of adjoint types with $k$th-order tangent or adjoint types yields $(k+1)$th-order adjoint types.