In machine learning, it is common to manipulate vectors instead of scalars. This post lists a few identities that are helpful for quickly computing gradients over computational graphs. If in doubt, don't hesitate to derive the scalar identities first and then generalize them to vectors.

Let’s define the following functions:

\begin{align*} h&\colon \mathbb{R} \rightarrow \mathbb{R} \\ f&\colon \mathbb{R}^n \rightarrow \mathbb{R} \\ g&\colon \mathbb{R}^n \rightarrow \mathbb{R} \\ \mathbf{F}&\colon \mathbb{R}^n \rightarrow \mathbb{R}^m \\ \mathbf{G}&\colon \mathbb{R}^n \rightarrow \mathbb{R}^m \end{align*}

We use the following conventions for gradients and Jacobian matrices:

$\mathbf{F}(\vec{x})=\left[\begin{array}{c}F_1(\vec{x})\\ \vdots\\ F_m(\vec{x})\end{array}\right]$ $\nabla f(\vec{x})=\left[\begin{array}{c}\pder{f}{x_1}(\vec{x})\\ \vdots\\ \pder{f}{x_n}(\vec{x})\end{array}\right]$ $\mathbf{J}_\mathbf{F}(\vec{x})=\left[\begin{array}{ccc} \pder{F_1}{x_1}(\vec{x}) & \dots & \pder{F_1}{x_n}(\vec{x})\\ \vdots & \ddots & \vdots\\ \pder{F_m}{x_1}(\vec{x}) & \dots & \pder{F_m}{x_n}(\vec{x})\\ \end{array}\right]$

### Addition:

$\nabla ( f + g ) = \nabla f + \nabla g$

### Multiplication:

$\nabla (f \, g) = g \,\nabla f + f \,\nabla g$
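As a sanity check, the product rule can be verified numerically with finite differences. The functions `f` and `g` below are arbitrary smooth choices for illustration, and `num_grad` is a small helper not from the post:

```python
import numpy as np

def num_grad(fun, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (fun(x + e) - fun(x - e)) / (2 * eps)
    return g

# Two arbitrary smooth scalar functions (illustrative choices)
f = lambda x: np.sin(x).sum()
g = lambda x: (x ** 2).sum()
grad_f = lambda x: np.cos(x)   # analytic gradient of f
grad_g = lambda x: 2 * x       # analytic gradient of g

x = np.array([0.3, -1.2, 0.7])
lhs = num_grad(lambda v: f(v) * g(v), x)   # grad of (f g), numerically
rhs = g(x) * grad_f(x) + f(x) * grad_g(x)  # g grad(f) + f grad(g)
assert np.allclose(lhs, rhs, atol=1e-5)
```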

### Division:

$\nabla\left(\frac{f}{g}\right) = \frac{g\nabla f - f\nabla g}{g^2}$

### Composition:

$\nabla(h \circ f) = (h' \circ f) \nabla f$ $\nabla(f \circ \mathbf{F}) = \mathbf{J}_\mathbf{F}^\mathrm{T} \, (\nabla f \circ \mathbf{F})$

(In the second identity, $f$ is taken as a function $\mathbb{R}^m \rightarrow \mathbb{R}$, so that the composition with $\mathbf{F}$ is defined.)

Proof:

Let’s substitute $$\mathbf{F}(\vec{x})$$ with a new variable $$\vec{y} = \mathbf{F}(\vec{x})$$. Using the multivariate chain rule, we get:

$\pder{f \circ \mathbf{F}}{x_k}(\vec{x}) = \sum_i \pder{f}{y_i}(\vec{y}) \pder{y_i}{x_k}(\vec{x}) = [\mathbf{J}_\mathbf{F}(\vec{x})_{:,k}]^\mathrm{T} \nabla f (\vec{y}) = [\mathbf{J}_\mathbf{F}(\vec{x})_{:,k}]^\mathrm{T} \nabla f (\mathbf{F}(\vec{x})),$

where the last product is a typical matrix multiplication. Therefore, we have:

$\nabla(f \circ \mathbf{F}) = [\mathbf{J}_\mathbf{F}(\vec{x})]^\mathrm{T} \nabla f (\mathbf{F}(\vec{x})).$
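This identity can also be checked numerically. Here $\mathbf{F}\colon \mathbb{R}^3 \rightarrow \mathbb{R}^2$ and $f\colon \mathbb{R}^2 \rightarrow \mathbb{R}$ are illustrative choices, and `num_grad` is a small finite-difference helper not from the post:

```python
import numpy as np

# Illustrative choices: F maps R^3 -> R^2, f maps R^2 -> R
F = lambda x: np.array([x[0] * x[1], np.sin(x[2])])
f = lambda y: y[0] ** 2 + 3 * y[1]

def jacobian_F(x):
    """Analytic Jacobian of F (2 x 3 matrix, row i = dF_i/dx)."""
    return np.array([[x[1], x[0], 0.0],
                     [0.0, 0.0, np.cos(x[2])]])

grad_f = lambda y: np.array([2 * y[0], 3.0])  # analytic gradient of f

def num_grad(fun, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (fun(x + e) - fun(x - e)) / (2 * eps)
    return g

x = np.array([0.5, -0.4, 1.1])
lhs = num_grad(lambda v: f(F(v)), x)  # grad of (f o F), numerically
rhs = jacobian_F(x).T @ grad_f(F(x))  # J_F^T (grad(f) o F)
assert np.allclose(lhs, rhs, atol=1e-5)
```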

### Dot product:

Note that $$\mathbf{F} \cdot \mathbf{G} = \mathbf{F}^\mathrm{T} \mathbf{G}$$, where the second product is the matrix multiplication.

$\nabla(\mathbf{F} \cdot \mathbf{G}) = \mathbf{J}^\mathrm{T}_\mathbf{F} \, \mathbf{G} + \mathbf{J}^\mathrm{T}_\mathbf{G} \, \mathbf{F}$

Proof:

$\pder{(\mathbf{F} \cdot \mathbf{G})}{x_k} = \sum_i \left[ \pder{F_i}{x_k} G_i + F_i \pder{G_i}{x_k} \right] = [\mathbf{J}_\mathbf{F}(\vec{x})_{:,k}]^\mathrm{T} \, \mathbf{G}(\vec{x}) + [\mathbf{J}_\mathbf{G}(\vec{x})_{:,k}]^\mathrm{T} \, \mathbf{F}(\vec{x})$

Stacking these components over $k$ yields $\nabla(\mathbf{F} \cdot \mathbf{G}) = \mathbf{J}^\mathrm{T}_\mathbf{F} \, \mathbf{G} + \mathbf{J}^\mathrm{T}_\mathbf{G} \, \mathbf{F}$.
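A quick numerical check of the dot-product rule, with two illustrative vector fields $\mathbb{R}^2 \rightarrow \mathbb{R}^2$ and their hand-computed Jacobians (`num_grad` is a finite-difference helper, not from the post):

```python
import numpy as np

# Illustrative vector fields R^2 -> R^2
F = lambda x: np.array([x[0] ** 2, x[0] * x[1]])
G = lambda x: np.array([np.sin(x[1]), x[0] + x[1]])

# Analytic Jacobians (row i = dF_i/dx)
J_F = lambda x: np.array([[2 * x[0], 0.0],
                          [x[1], x[0]]])
J_G = lambda x: np.array([[0.0, np.cos(x[1])],
                          [1.0, 1.0]])

def num_grad(fun, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (fun(x + e) - fun(x - e)) / (2 * eps)
    return g

x = np.array([0.8, -0.3])
lhs = num_grad(lambda v: F(v) @ G(v), x)   # grad of (F . G), numerically
rhs = J_F(x).T @ G(x) + J_G(x).T @ F(x)    # J_F^T G + J_G^T F
assert np.allclose(lhs, rhs, atol=1e-5)
```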
Written on April 11, 2015