
I am reading a statistics textbook. On page 271, in the section 'a brief derivation', the author proves a theorem about the ordinary least squares method: the size of the error is minimized when the error vector $\vec{\epsilon}$ is orthogonal to $\bf{X}$. However, I don't understand the following step of his derivation:

$$ \vec{\epsilon_{n}}^{T}\vec{\epsilon_{n}} = (\vec{\epsilon} - \bf{X}\vec{\iota})^{T}(\vec{\epsilon} - \bf{X}\vec{\iota}) = \vec{\epsilon}^{T}\vec{\epsilon} - 2\vec{\iota}^{T}\bf{X}^{T}\vec{\epsilon} + \vec{\iota}^{T}\bf{X}^{T}\bf{X}\vec{\iota} $$
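If I expand the product term by term, the intermediate step appears to be

$$ (\vec{\epsilon} - \bf{X}\vec{\iota})^{T}(\vec{\epsilon} - \bf{X}\vec{\iota}) = \vec{\epsilon}^{T}\vec{\epsilon} - \vec{\epsilon}^{T}\bf{X}\vec{\iota} - \vec{\iota}^{T}\bf{X}^{T}\vec{\epsilon} + \vec{\iota}^{T}\bf{X}^{T}\bf{X}\vec{\iota}, $$

so the two cross terms would have to be equal for them to combine into $-2\vec{\iota}^{T}\bf{X}^{T}\vec{\epsilon}$.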

Is it trivial that $\vec{\iota}^{T}\bf{X}^{T}\vec{\epsilon} = \vec{\epsilon}^{T}\bf{X}\vec{\iota}$?

Jean Marie
Nownuri

1 Answer


Keep in mind that the term $\epsilon^t\epsilon$ is a scalar (a $1\times 1$ matrix, here a real number), and every term in the expansion must have the same dimensions, so both $\iota^tX^t\epsilon$ and $\epsilon^t X\iota$ are also scalars. Furthermore

$$ (\iota^tX^t\epsilon)^t = \epsilon^t X\iota $$

To summarize: one of the terms is the transpose of the other, both are scalars, and the transpose of a scalar is the scalar itself; therefore the two terms are equal, which is why they combine into the single term $-2\iota^tX^t\epsilon$ in your expansion.
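Written out explicitly, the argument is just the following chain of equalities (same notation as above):

$$ \iota^t X^t \epsilon \;=\; \left(\iota^t X^t \epsilon\right)^t \;=\; \epsilon^t \left(X^t\right)^t \left(\iota^t\right)^t \;=\; \epsilon^t X \iota, $$

where the first equality uses the fact that a $1\times 1$ matrix equals its own transpose, and the second uses the rule $(ABC)^t = C^t B^t A^t$.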

caverac