I want to solve the following equation for ${\bf \beta}$: $$\frac{\partial}{\partial {\bf \beta}} \left[\|{\bf y}-{\bf X}{\bf \beta}\|^2 + \|{\bf \beta}\|^2\right] = 0$$ Here ${\bf y}$ and ${\bf \beta}$ are vectors and ${\bf X}$ is a matrix. I am having trouble with the differentiation. I can split the derivative into $$\frac{\partial}{\partial {\bf \beta}} \|{\bf y}-{\bf X}{\bf \beta}\|^2 + \frac{\partial}{\partial {\bf \beta}}\|{\bf \beta}\|^2$$ and then use the rule $$\frac{\partial}{\partial a}\|a\|^2 = 2a$$ for the second term.
The problem is the first term. I can try the product rule, but I am still left with $\frac{\partial}{\partial {\bf \beta}}\|{\bf y} - {\bf X}{\bf \beta}\|^2$, which I do not know how to evaluate.
$\endgroup$ 42 Answers
$\begingroup$$$ \frac{\partial}{\partial \beta} \left(\|F(\beta)\|^2\right) = \frac{\partial}{\partial \beta} \left(F(\beta) \cdot F(\beta)\right) = 2 \left( \frac{\partial}{\partial \beta} F(\beta) \right) \cdot F(\beta) $$ Here $F(\beta) \in \mathbb{R}^D$, where $D$ is the dimension of $F(\beta)$.
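Applied to the question, this can be sketched as follows, assuming $F(\beta) = y - X\beta$ and reading $\left(\frac{\partial}{\partial \beta} F\right) \cdot F$ as the transposed Jacobian of $F$ acting on $F$ (that reading is an assumption about the notation above). The Jacobian of $y - X\beta$ with respect to $\beta$ is $-X$, so
$$ \frac{\partial}{\partial \beta} \|y - X\beta\|^2 = 2\,(-X)^T (y - X\beta) = 2\,X^T (X\beta - y) $$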
$\endgroup$ $\begingroup$Let's do a directional derivative instead, eventually building up to some voodoo magic.
$$a \cdot \nabla_\beta [(y - \underline X(\beta))^2 + \beta^2] = \underline X(a) \cdot [-2(y - \underline X(\beta))] + 2 \beta \cdot a$$
But $\underline X(a) \cdot b = \overline X(b) \cdot a$. This exchanges a linear operator with its adjoint.
We can then use this to write the result as
$$2a \cdot [\overline X(\underline X(\beta)-y) + \beta]$$
Since this holds for every direction $a$, we can take out the $a$ and read off the gradient:
$$2[\overline X(\underline X(\beta)-y) + \beta]$$
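Setting this gradient to zero gives the linear system $(\overline X \underline X + I)\beta = \overline X(y)$. A minimal numerical sanity check is sketched below, assuming real-valued data so that the adjoint $\overline X$ is just the matrix transpose. The function names and the random test data are hypothetical, chosen only for illustration.

```python
import numpy as np

def objective(beta, X, y):
    """Objective ||y - X beta||^2 + ||beta||^2 from the question."""
    r = y - X @ beta
    return r @ r + beta @ beta

def gradient(beta, X, y):
    """Closed-form gradient 2 [X^T (X beta - y) + beta] (adjoint = transpose)."""
    return 2.0 * (X.T @ (X @ beta - y) + beta)

def solve_stationary(X, y):
    """Solve (X^T X + I) beta = X^T y, i.e. the gradient set to zero."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + np.eye(p), X.T @ y)

def numerical_gradient(beta, X, y, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(beta)
    for i in range(beta.size):
        e = np.zeros_like(beta)
        e[i] = h
        g[i] = (objective(beta + e, X, y) - objective(beta - e, X, y)) / (2 * h)
    return g

# Hypothetical random test problem: 6 observations, 3 coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = rng.normal(size=6)
beta = rng.normal(size=3)

# The closed-form gradient should agree with finite differences.
grad_matches = np.allclose(gradient(beta, X, y),
                           numerical_gradient(beta, X, y), atol=1e-5)

# The gradient should vanish at the solution of the linear system.
beta_star = solve_stationary(X, y)
grad_at_solution = gradient(beta_star, X, y)
print(grad_matches, np.allclose(grad_at_solution, 0.0))
```

Solving the system with `np.linalg.solve` rather than forming an explicit inverse is a design choice made here for numerical stability, not something the derivation requires.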
$\endgroup$