In machine learning, it is common to manipulate vectors rather than scalars. This post lists a few identities that are helpful for quickly computing gradients over computational graphs. If in doubt, do not hesitate to derive the scalar identities first and then generalize them to vectors.

Let’s define the following functions:
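One convenient setup, with $n$ and $m$ as placeholder dimensions, is two vector-valued functions and one scalar-valued function:

$$
\mathbf{F} : \mathbb{R}^n \to \mathbb{R}^m ,
\qquad
\mathbf{G} : \mathbb{R}^n \to \mathbb{R}^m ,
\qquad
f : \mathbb{R}^m \to \mathbb{R} .
$$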

With the following conventions for the gradients and Jacobian matrices:
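A common choice, and the one assumed in the sketches below, is to treat the gradient of a scalar function as a column vector and to index the Jacobian of a vector-valued function by output row and input column:

$$
\left( \nabla_{\vec{x}} f \right)_i = \frac{\partial f}{\partial x_i} ,
\qquad
\left( \frac{\partial \mathbf{F}}{\partial \vec{x}} \right)_{ij} = \frac{\partial F_i}{\partial x_j} .
$$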

Composition:
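Stated here for a scalar-valued $f$ composed with the vector-valued $\mathbf{F}$ (the case assumed in this sketch), the identity can be written as:

$$
\nabla_{\vec{x}} \, f(\mathbf{F}(\vec{x}))
= \left( \frac{\partial \mathbf{F}}{\partial \vec{x}} \right)^{\!\mathrm{T}}
  \left. \nabla_{\vec{y}} f \right|_{\vec{y} = \mathbf{F}(\vec{x})} .
$$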

Proof:

Let’s introduce a new variable $\vec{y} = \mathbf{F}(\vec{x})$ in place of $\mathbf{F}(\vec{x})$. Using the multivariate chain rule, we get:
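In components, for the scalar-valued composition $f(\mathbf{F}(\vec{x}))$ assumed above:

$$
\frac{\partial}{\partial x_j} f(\mathbf{F}(\vec{x}))
= \sum_{k=1}^{m} \frac{\partial f}{\partial y_k} \, \frac{\partial F_k}{\partial x_j} ,
\qquad \text{i.e.,} \qquad
\left( \nabla_{\vec{x}} \, f(\mathbf{F}(\vec{x})) \right)^{\mathrm{T}}
= \left( \nabla_{\vec{y}} f \right)^{\mathrm{T}} \frac{\partial \mathbf{F}}{\partial \vec{x}} ,
$$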

where the last product is an ordinary matrix multiplication. Therefore, we have:
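(transposing both sides, under the column-gradient convention assumed above)

$$
\nabla_{\vec{x}} \, f(\mathbf{F}(\vec{x}))
= \left( \frac{\partial \mathbf{F}}{\partial \vec{x}} \right)^{\!\mathrm{T}}
  \left. \nabla_{\vec{y}} f \right|_{\vec{y} = \mathbf{F}(\vec{x})} ,
$$

which is the composition identity stated above.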

Dot product:
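Under the conventions assumed above, the identity can be stated as:

$$
\nabla_{\vec{x}} \left( \mathbf{F}(\vec{x}) \cdot \mathbf{G}(\vec{x}) \right)
= \left( \frac{\partial \mathbf{F}}{\partial \vec{x}} \right)^{\!\mathrm{T}} \mathbf{G}(\vec{x})
+ \left( \frac{\partial \mathbf{G}}{\partial \vec{x}} \right)^{\!\mathrm{T}} \mathbf{F}(\vec{x}) .
$$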

Note that $\mathbf{F} \cdot \mathbf{G} = \mathbf{F}^\mathrm{T} \mathbf{G}$, where the product on the right-hand side is an ordinary matrix multiplication.

Proof:
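A component-by-component sketch, using the product rule and the Jacobian convention assumed above:

$$
\frac{\partial}{\partial x_j} \sum_{k=1}^{m} F_k(\vec{x}) \, G_k(\vec{x})
= \sum_{k=1}^{m} \frac{\partial F_k}{\partial x_j} \, G_k(\vec{x})
+ \sum_{k=1}^{m} F_k(\vec{x}) \, \frac{\partial G_k}{\partial x_j}
= \left[ \left( \frac{\partial \mathbf{F}}{\partial \vec{x}} \right)^{\!\mathrm{T}} \mathbf{G} \right]_j
+ \left[ \left( \frac{\partial \mathbf{G}}{\partial \vec{x}} \right)^{\!\mathrm{T}} \mathbf{F} \right]_j ,
$$

which is the $j$-th component of the dot-product identity above.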

Written on April 11, 2015