I see two different formulae.
On page 9 of these slides, the derivative of $Ax$ is $A$.
But in some other documents the derivative is $A^T$.
How do I properly evaluate the derivative of $Ax$?
$\endgroup$ 13 Answers
$\begingroup$Let us use the definition of the derivative via the differential,
$$f(x_0+\Delta x)=f(x_0)+f'(x_0)\Delta x+o(\Delta x)$$
and, for $f(x)=Ax$, linearity gives
$$A(x_0+\Delta x)=Ax_0+A\Delta x$$
Comparing the two expansions, the remainder term is identically zero, therefore $f'(x_0)=A$ for every $x_0$.
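
The argument above can be checked numerically. The following is a minimal sketch in plain Python, assuming an arbitrary $2\times 3$ matrix `A`, a point `x0`, and a step size `eps` chosen only for illustration (none of them come from the question). The helper names `matvec` and `numerical_jacobian` are hypothetical. It approximates the Jacobian of $f(x)=Ax$ by forward differences and compares it entry by entry with $A$.

```python
# Finite-difference check that the Jacobian of f(x) = A x is A itself.
# A, x0 and eps are arbitrary illustrative values, not taken from the question.

def matvec(A, x):
    """Multiply an m-by-n matrix (list of rows) by a length-n vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def numerical_jacobian(f, x0, eps=1e-6):
    """Approximate the m-by-n Jacobian of f at x0 by forward differences."""
    f0 = f(x0)
    m, n = len(f0), len(x0)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_step = list(x0)
        x_step[j] += eps              # perturb only the j-th coordinate
        f_step = f(x_step)
        for i in range(m):
            J[i][j] = (f_step[i] - f0[i]) / eps
    return J

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x0 = [0.5, -1.0, 2.0]

J = numerical_jacobian(lambda x: matvec(A, x), x0)

# Largest entrywise gap between the numerical Jacobian and A.
max_error = max(abs(J[i][j] - A[i][j]) for i in range(2) for j in range(3))
print(max_error)
```

With rows indexed by outputs and columns by inputs, the numerical Jacobian has the same $m\times n$ shape as $A$, and the gap should be at the level of floating-point rounding because $f$ is linear.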
$\endgroup$ $\begingroup$The differential of $f(x)=Ax$ at a point $x_0$ is $Df(x_0)=A$.
This is because, by linearity of $A$, the numerator below is identically zero:
$$\lim_{h\to 0} \frac{\|f(x_0+h) - f(x_0) - Df(x_0)h\|}{\|h\|} = \lim_{h\to 0} \frac{\|A(x_0+h) - Ax_0 - Ah\|}{\|h\|} = 0$$
$\endgroup$ $\begingroup$It's either, depending on your definition of a derivative. For concreteness, suppose $x$ is $n\times 1$ and $A$ is $1\times n$. Then $f(x)=Ax$ is scalar-valued, so the derivative $\frac{df}{dx}$ is a gradient vector, but what shape should a gradient vector have? This is answered by choosing how you prefer to state the first-order Taylor approximation (the defining property of a derivative). The first of two choices is
$$ f(x+h) \approx f(x) + \frac{df}{dx}(x)\cdot h$$ i.e. $f(x+h) \approx f(x) + \left(\frac{df}{dx}(x)\right)^T h$, which corresponds to $\frac{df}{dx}(x)=A^T$ having the same shape as $h$, $n\times 1$. The alternative is to enforce
$$ f(x+h) \approx f(x) + \frac{df}{dx}(x)\, h $$ which corresponds to $\frac{df}{dx}(x)=A$ having the shape of the transpose of $h$, $1\times n$.
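
To make the two conventions concrete, here is a small sketch in plain Python, assuming $n=3$ and made-up values for `A`, `x`, and `h` (none of them come from the question). The helpers `transpose`, `matmul`, and `shape` are hypothetical names written for this illustration. Matrices are stored as nested lists so that the $n\times 1$ and $1\times n$ shapes stay explicit.

```python
# Scalar-valued case: A is 1-by-n, x and h are n-by-1 (nested lists keep shapes explicit).
# All numbers are illustrative assumptions.

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def shape(M):
    return (len(M), len(M[0]))

A = [[2.0, -1.0, 3.0]]           # 1 x 3
x = [[1.0], [0.0], [2.0]]        # 3 x 1
h = [[0.1], [-0.2], [0.3]]       # 3 x 1

def f(v):
    """f(v) = A v, returned as a plain scalar."""
    return matmul(A, v)[0][0]

x_plus_h = [[xi[0] + hi[0]] for xi, hi in zip(x, h)]
exact_change = f(x_plus_h) - f(x)

# Convention 1: the gradient is the column vector A^T, paired with h via grad^T h.
grad_col = transpose(A)
change_1 = matmul(transpose(grad_col), h)[0][0]

# Convention 2: the derivative is the row vector A, applied to h directly.
deriv_row = A
change_2 = matmul(deriv_row, h)[0][0]

print(shape(grad_col), shape(deriv_row))
print(exact_change, change_1, change_2)
```

Both conventions predict the same change in $f$. They differ only in whether the derivative is stored as a column ($A^T$) or as a row ($A$).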
$\endgroup$