I am reading a statistics textbook. On page 271, in the section 'a brief derivation', the author proves a theorem about the ordinary least squares method. He shows that the size of the error is minimized when the error vector $\vec{\epsilon}$ is orthogonal to $\mathbf{X}$. However, I don't understand the following step in his derivation:
$$ \vec{\epsilon_{n}}^{T}\vec{\epsilon_{n}} = (\vec{\epsilon} - \mathbf{X}\vec{\iota})^{T}(\vec{\epsilon} - \mathbf{X}\vec{\iota}) = \vec{\epsilon}^{T}\vec{\epsilon} - 2\vec{\iota}^{T}\mathbf{X}^{T}\vec{\epsilon} + \vec{\iota}^{T}\mathbf{X}^{T}\mathbf{X}\vec{\iota} $$
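When I expand the product myself (this intermediate step is my own, not from the book), I get the two cross terms separately:

$$ (\vec{\epsilon} - \mathbf{X}\vec{\iota})^{T}(\vec{\epsilon} - \mathbf{X}\vec{\iota}) = \vec{\epsilon}^{T}\vec{\epsilon} - \vec{\epsilon}^{T}\mathbf{X}\vec{\iota} - \vec{\iota}^{T}\mathbf{X}^{T}\vec{\epsilon} + \vec{\iota}^{T}\mathbf{X}^{T}\mathbf{X}\vec{\iota} $$

so it seems the author combines $-\vec{\epsilon}^{T}\mathbf{X}\vec{\iota} - \vec{\iota}^{T}\mathbf{X}^{T}\vec{\epsilon}$ into the single term $-2\vec{\iota}^{T}\mathbf{X}^{T}\vec{\epsilon}$.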
Is it trivial that $\vec{\iota}^{T}\mathbf{X}^{T}\vec{\epsilon} = \vec{\epsilon}^{T}\mathbf{X}\vec{\iota}$, i.e. that the two cross terms are equal and can be combined into one?
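To convince myself numerically, I tried this quick check with numpy (the dimensions and random values are my own arbitrary choices, not from the textbook):

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 5, 2                  # arbitrary sample size and number of regressors
X = rng.normal(size=(n, k))  # design matrix X
eps = rng.normal(size=n)     # error vector epsilon
iota = rng.normal(size=k)    # coefficient-like vector iota

# Both products are scalars, so transposing one should give the other.
lhs = iota.T @ X.T @ eps     # iota^T X^T eps
rhs = eps.T @ X @ iota       # eps^T X iota

print(np.isclose(lhs, rhs))  # prints True
```

It does seem to hold, but I'd like to understand why it is true in general, not just for random examples.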