Theorem: Let $Y=X\beta+\varepsilon$ where $$Y\in\mathcal M_{n\times 1}(\mathbb R),\quad X\in \mathcal M_{n\times p}(\mathbb R),\quad \beta\in\mathcal M_{p\times 1}(\mathbb R),\quad \varepsilon\in\mathcal M_{n\times 1}(\mathbb R).$$
We suppose that $X$ has full rank $p$ and that $$\mathbb E[\varepsilon]=0\quad\text{and}\quad \text{Var}(\varepsilon)=\sigma ^2I.$$ Then the least squares estimator $\hat\beta=(X^TX)^{-1}X^TY$ is the best linear unbiased estimator of $\beta$, that is, for any linear unbiased estimator $\tilde\beta$ of $\beta$ it holds that $$\text{Var}(\tilde\beta)-\text{Var}(\hat\beta)\geq 0$$ in the sense that the difference is positive semidefinite.
Proof
Let $\tilde\beta$ be a linear unbiased estimator, i.e. $$\tilde\beta=AY\ \ \text{for some }A\in\mathcal M_{p\times n}(\mathbb R)\quad\text{and}\quad\mathbb E[\tilde\beta]=\beta\text{ for all }\beta\in\mathbb R ^p.$$
Questions:
1) Why does $\mathbb E[\tilde\beta]=\beta$ hold for all $\beta$? I don't really understand this point. To me $\beta$ is fixed, so requiring $\mathbb E[\tilde\beta]=\beta$ for all $\beta$ doesn't really make sense.
2) What is the difference between the least squares estimator and the maximum likelihood estimator? They are both $\hat\beta=(X^TX)^{-1}X^TY$, so if they are the same, I don't really see why we give them two different names.
Answers
The Gauss-Markov theorem tells us that in a regression model where the expected value of the error terms is zero, $E(\epsilon_{i}) = 0$, the variance of the error terms is constant and finite, $\sigma^{2}(\epsilon_{i}) = \sigma^{2} < \infty$, and $\epsilon_{i}$ and $\epsilon_{j}$ are uncorrelated for all $i \neq j$, the least squares estimators $b_{0}$ and $b_{1}$ are unbiased and have minimum variance among all linear unbiased estimators. Note that there might be biased estimators which have an even lower variance.
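To make this concrete, here is a small simulation sketch (not part of the original answer; the design matrix, coefficients, and noise level are made up for illustration) that checks the unbiasedness claim and compares the empirical variance of the OLS slope with the theoretical $\sigma^2(X'X)^{-1}$ entry:

```python
# Illustrative simulation: repeatedly draw zero-mean errors, refit OLS,
# and check that the estimates average out to the true coefficients.
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 50, 2.0
beta = np.array([1.0, 3.0])                    # true (intercept, slope), chosen arbitrarily
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])

draws = []
for _ in range(5000):
    eps = rng.normal(0.0, sigma, n)            # zero-mean, constant-variance errors
    y = X @ beta + eps
    draws.append(np.linalg.solve(X.T @ X, X.T @ y))   # OLS estimate (X'X)^{-1} X'y

draws = np.array(draws)
print("mean of OLS estimates:", draws.mean(axis=0))          # close to (1, 3)
print("empirical variance of slope:", draws[:, 1].var())
print("theoretical sigma^2 (X'X)^{-1}:", sigma**2 * np.linalg.inv(X.T @ X)[1, 1])
```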
Extensive information about the Gauss-Markov theorem, including its mathematical proof, can be found here.
However, if you want to know which assumptions are necessary for $b_1$ to be an unbiased estimator of $\beta_1$, I guess that assumptions 1 to 4 of the following post () must be fulfilled.
Furthermore, it is true that the maximum likelihood estimator and the least squares estimator are equivalent under certain conditions, i.e. if the noise $\epsilon$ is Gaussian distributed.
Hope this helps.
The Gauss-Markov theorem states that, under the usual assumptions, the OLS estimator $\beta_{OLS}$ is BLUE (Best Linear Unbiased Estimator). To prove this, take an arbitrary linear, unbiased estimator $\bar{\beta}$ of $\beta$. Since it is linear, we can write $\bar{\beta} = Cy$ in the model $y = X\beta + \varepsilon$. Since it is also unbiased, $\mathbb{E} [ \bar{\beta} ] =C\mathbb{E}[y] = CX\beta= \beta$ must hold for every $\beta$, which forces $CX=I$, with $I$ the identity matrix.
Then: \begin{align*} \operatorname{Var}[\bar{\beta}] &= \operatorname{Var}[Cy] \\ &= C \operatorname{Var}[y]C'\\ &= \sigma^2 CC' \\ &\geq \sigma^2 CP_XC' \\ &= \sigma^2 CX(X'X)^{-1}X'C' \\ &= \sigma^2 (X'X)^{-1} \\ &= \operatorname{Var}[\beta_{OLS}] \end{align*} where $P_X = X(X'X)^{-1}X'$ is the projection matrix onto the column space of $X$. The inequality holds because $I - P_X$ is itself a projection matrix, hence positive semidefinite, so $C(I-P_X)C' \geq 0$; the last two equalities use $CX = I$ and $X'C' = (CX)' = I$.
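As a quick numerical sanity check (an added sketch, not part of the original answer, using a randomly generated design matrix), one can build an arbitrary $C$ satisfying $CX=I$ and verify that $CC' - (X'X)^{-1}$ is positive semidefinite:

```python
# Check the key inequality: for any C with CX = I, CC' - (X'X)^{-1} is PSD.
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3
X = rng.normal(size=(n, p))

XtX_inv = np.linalg.inv(X.T @ X)
P_X = X @ XtX_inv @ X.T                        # projection onto the column space of X

# Any C with CX = I can be written as (X'X)^{-1}X' + D with DX = 0;
# here D is a random matrix projected onto the orthogonal complement of col(X).
D = rng.normal(size=(p, n)) @ (np.eye(n) - P_X)
C = XtX_inv @ X.T + D
print("CX = I holds:", np.allclose(C @ X, np.eye(p)))

diff = C @ C.T - XtX_inv                       # variance difference, up to the factor sigma^2
print("smallest eigenvalue (>= 0 up to rounding):", np.linalg.eigvalsh(diff).min())
```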
1) The condition $\mathbb{E}[\tilde{\beta}]=\beta$ is just the condition "the estimator is unbiased" in mathematical form; "for all $\beta$" means the estimator must be unbiased whatever the true (unknown) value of $\beta$ happens to be. Let's say you are considering the least squares estimator; then $$ \begin{align} \mathbb{E}[\hat{\beta}] &= \mathbb{E}[(X^{\rm T}X)^{-1}X^{\rm T}Y]\\ &= \mathbb{E}[(X^{\rm T}X)^{-1}X^{\rm T}(X\beta+\epsilon)]\\ &= \beta + (X^{\rm T}X)^{-1}X^{\rm T}\mathbb{E}[\epsilon]\\ &= \beta, \end{align} $$ and thus the least squares estimator is unbiased. Note that you do have to assume the noise has zero mean for the last step. So not every estimator of the form $\tilde{\beta}=AY+D$ is unbiased.
2) Maximum likelihood and least squares are equivalent under certain conditions, namely if you assume the noise $\epsilon$ is Gaussian. Change that assumption and they won't be the same.
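To spell out the equivalence (a standard derivation, added here for completeness): if $\epsilon\sim\mathcal N(0,\sigma^2 I)$, the log-likelihood of $\beta$ is $$\ell(\beta)=-\frac n2\log(2\pi\sigma^2)-\frac1{2\sigma^2}\|Y-X\beta\|^2,$$ so maximizing $\ell$ over $\beta$ is the same as minimizing $\|Y-X\beta\|^2$, which is exactly the least squares criterion; both therefore give $\hat\beta=(X^TX)^{-1}X^TY$. Under a different error distribution (e.g. Laplace errors, whose maximum likelihood estimator minimizes absolute deviations instead) the two estimators no longer coincide.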