Derivative of a vector with respect to a matrix


Let $W$ be an $n\times m$ matrix and $\textbf{x}$ an $m\times1$ vector. How do we calculate the following?

$$\frac{dW\textbf{x}}{dW}$$

Thanks in advance.


3 Answers


The quantity in question is a third-order tensor.

One approach is to use index notation:

$$\eqalign{ f_i &= W_{ij} x_j \cr\cr \frac{\partial f_i}{\partial W_{mn}} &= \frac{\partial W_{ij}}{\partial W_{mn}} \,x_j \cr &= \delta_{im}\delta_{jn} \,x_j \cr &= \delta_{im}\,x_n \cr }$$

Another approach is vectorization:

$$\eqalign{ f &= W\,x \cr &= I\,W\,x \cr &= (x^T\otimes I)\,{\rm vec}(W) \cr &= (x^T\otimes I)\,w \cr\cr \frac{\partial f}{\partial w} &= (x^T\otimes I) \cr }$$
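The vectorization identity can be checked numerically. A minimal NumPy sketch, assuming illustrative sizes $n=3$, $m=4$ and column-major vectorization (both assumptions, not from the answer):

```python
import numpy as np

# Minimal sketch; n, m and the random seed are illustrative assumptions.
n, m = 3, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((n, m))
x = rng.standard_normal(m)

# Analytic Jacobian of f = W x with respect to w = vec(W),
# using column-major vectorization: df/dw = x^T (Kronecker) I.
J_analytic = np.kron(x[None, :], np.eye(n))   # shape (n, n*m)

# Finite-difference check: perturb each entry of vec(W) in turn.
eps = 1e-6
J_fd = np.zeros((n, n * m))
for k in range(n * m):
    dw = np.zeros(n * m)
    dw[k] = eps
    Wp = W + dw.reshape((n, m), order="F")    # undo column-major vec
    J_fd[:, k] = (Wp @ x - W @ x) / eps

assert np.allclose(J_analytic, J_fd, atol=1e-4)
```

Since $Wx$ is linear in $W$, the finite difference agrees with the Kronecker formula up to floating-point error.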


While the full derivative is indeed a third-order tensor, in the context of a feed-forward NN, where this gradient appears in a chain rule that ends in a scalar loss, the calculation simplifies enormously and can be represented as an outer product, where $\frac{d\,\textbf{x}W}{dW} = \textbf{x}^T \cdot \_\_$

Specifically, if $L$ is the loss and $z=\textbf{x}W+b$ (or $z=\textbf{a}W+b$ for any downstream "inputs"/activations), where $\textbf{x}$ (or $\textbf{a}$) is a row vector, then:

$$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial z} \frac{\partial z}{\partial W} =\frac{\partial L}{\partial z} \frac{\partial (\textbf{x}W+b)}{\partial W} = \textbf{x}^T\cdot\frac{\partial L}{\partial z} $$
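This outer-product formula can be sanity-checked with finite differences. A minimal sketch, assuming an illustrative toy loss $L = \tfrac12\|z\|^2$ (so $\partial L/\partial z = z$) and made-up sizes; neither is from the answer:

```python
import numpy as np

# Minimal sketch; the quadratic toy loss and the sizes are
# illustrative assumptions.
rng = np.random.default_rng(1)
m, n = 4, 3
x = rng.standard_normal((1, m))   # row vector, as in the answer
W = rng.standard_normal((m, n))
b = rng.standard_normal((1, n))

def loss(W):
    z = x @ W + b
    return 0.5 * np.sum(z ** 2)   # toy scalar loss with dL/dz = z

z = x @ W + b
dL_dz = z
grad_analytic = x.T @ dL_dz       # the outer-product formula x^T . dL/dz

# Entrywise finite-difference check.
eps = 1e-6
grad_fd = np.zeros_like(W)
for i in range(m):
    for j in range(n):
        Wp = W.copy()
        Wp[i, j] += eps
        grad_fd[i, j] = (loss(Wp) - loss(W)) / eps

assert np.allclose(grad_analytic, grad_fd, atol=1e-4)
```

Note that `grad_analytic` has the same shape as $W$, which is exactly why the third-order tensor never needs to be formed in backpropagation.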

I'm in the process of making a YouTube video with more explanation. Will update this as soon as it gets published.

EDIT: here is the video; the relevant part starts at 09:00.


For the independent case:

If $\mathbf{x}$ is independent of $W$, the problem can be calculated as follows.

$$\cfrac{\partial W\mathbf{x}}{\partial W}= \cfrac{\partial}{\partial W} \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix} \mathbf{x} $$

$$ = \begin{bmatrix} \cfrac{\partial w_{11}}{\partial w_{11}} & \cfrac{\partial w_{12}}{\partial w_{12}} & \cdots & \cfrac{\partial w_{1m}}{\partial w_{1m}} \\ \cfrac{\partial w_{21}}{\partial w_{21}} & \cfrac{\partial w_{22}}{\partial w_{22}} & \cdots & \cfrac{\partial w_{2m}}{\partial w_{2m}} \\ \vdots & \vdots & \ddots & \vdots \\ \cfrac{\partial w_{n1}}{\partial w_{n1}} & \cfrac{\partial w_{n2}}{\partial w_{n2}} & \cdots & \cfrac{\partial w_{nm}}{\partial w_{nm}} \end{bmatrix} \mathbf{x} $$

Every element of this matrix is $1$, so the result is:

$$ \cfrac{\partial W\mathbf{x}}{\partial W}= (\mathbf{x}^{\text{T}}\mathbf{1_{m}}) \mathbf{1_{n}} $$

where $\mathbf{1_{k}} \in \mathbf{R}^{k}$ is the all-ones vector

$$\mathbf{1_{k}}=[1 \ 1 \ \cdots 1]^{\text{T}}$$
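A short NumPy sketch of this answer's summing convention, with illustrative sizes and values for $\mathbf{x}$ (assumptions, not from the answer), showing that $(\mathbf{x}^{\text{T}}\mathbf{1_{m}})\mathbf{1_{n}}$ equals the all-ones matrix applied to $\mathbf{x}$:

```python
import numpy as np

# Minimal sketch of the summing convention above; sizes and x are
# illustrative assumptions.
n, m = 3, 4
x = np.arange(1.0, m + 1)         # x = [1, 2, 3, 4]
ones_m, ones_n = np.ones(m), np.ones(n)

# (x^T 1_m) 1_n: the scalar sum of x broadcast to an n-vector.
result = (x @ ones_m) * ones_n

# The same vector obtained directly: every dW_ij/dW_ij is 1, so the
# derivative matrix is all ones, and we multiply it by x.
result_direct = np.ones((n, m)) @ x

assert np.allclose(result, result_direct)   # both equal [10, 10, 10]
```

Note this convention sums over the entries of $W$ rather than keeping the full third-order tensor of the accepted answer.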

For the dependent case:

If $\mathbf{x}$ depends on $W$, the problem is harder than the independent case. Suppose $\mathbf{x}$ can be written as

$$\mathbf{x}=F(W)\mathbf{x}_{0}$$

where $F(W) \in \mathbf{R}^{m \times m}$ is a matrix function whose entries depend on $W$, and $\mathbf{x}_{0} \in \mathbf{R}^{m}$ is independent of $W$. Then, likewise, by the product rule,

$$ \cfrac{\partial W\mathbf{x}}{\partial W}= (\mathbf{x}^{\text{T}}\mathbf{1_{m}}) \mathbf{1_{n}} + W \cfrac{\partial F(W) }{ \partial W}\mathbf{x}_{0} $$

