Another way to solve this is to use Markov chains. Let state 0 be the start state. Then with probability 0.5 you will get heads and move to state 1; with probability 0.5 you will get tails and stay in state 0. Once in state 1, you will either get heads (probability 0.5) and remain in state 1 or you will get tails (probability 0.5) and move to state 2 which is the accepting state.
As a transition matrix then, we have:
$$ \begin{array}{c|ccc}
& 0 & 1 & 2 \\ \hline
0 & 0.5 & 0.5 & 0 \\
1 & 0 & 0.5 & 0.5 \\
2 & 0 & 0 & 1 \\
\end{array} $$
If we let $\phi(i)$ be the expected number of flips to go from state $i$ to state 2, then from this transition matrix we must have:
$\phi(0) = 0.5*(1 + \phi(0)) + 0.5*(1+\phi(1)) + 0*(1+\phi(2))$
$\phi(1) = 0*(1 + \phi(0)) + 0.5*(1+\phi(1)) + 0.5*(1+\phi(2))$
$\phi(2) = 0*(1 + \phi(0)) + 0*(1+\phi(1)) + 1*\phi(2)$
So, obviously, $\phi(2)=0$ and plugging this into the first two equations gives the same system to solve as in the other answers.
Solving for $\phi(0)$ gives $\phi(0) = 4$.
"Why" does this only require 4 flips in expectation making it different than the 6 flips needed in expectation to see HH or TT? If the coin is fair, then out of a two flip sequence, HH, HT, TH, and TT are all equally likely, so shouldn't it take the same amount of "time" before you expect to see any given one of those two-flip patterns?
No. Firstly because we just calculated it and the expectations are different. Unless our model of the situation is wrong, then 4 is simply the answer, even if it doesn't "feel" right. Secondly, though, there is an important asymmetry in the two problems.
When looking for HH, we would have a transition matrix like this:
$$ \begin{array}{c|ccc}
& 0 & 1 & 2 \\ \hline
0 & 0.5 & 0.5 & 0 \\
1 & 0.5 & 0 & 0.5 \\
2 & 0 & 0 & 1 \\
\end{array} $$
The difference is the second row. Instead of being able to stick around indefinitely in state 1, we are immediately faced with either succeeding (getting that second H) or failing and being forced to start over.
This means, on average, it should take longer in the HH case to reach the success state 2 from the middle state 1 than in our original problem. But the probability of going from the start state 0 to the middle state 1 remains identical between the two problems.
So we should expect that it takes longer on average to see the first HH than to see the first HT.
An interesting exercise would be to ask: is there a biased coin of some kind for which the expected number of flips to see HH is the same as the expected number of flips to see HT?
I decided to go ahead and work this one out assuming that $p$ represents the probability of the coin landing on Heads on a certain flip. You can modify the above two transition matrices to use $p$ and $1-p$ in the appropriate places, get the system of equations in each case (the HH case or the HT case) and then solve.
When I solve this, I get that in general, for a coin with bias $p$ for Heads, it will take on average $1/p + 1/(1-p)$ flips to see HT (this agrees with the answer in the 0.5 case) and it will take $(1+p)/p^2$ flips in the HH case, again agreeing with previous calculations for the 0.5 case.
This leads to an equation for $p$:
$\frac{1}{p} + \frac{1}{1-p} = \frac{1+p}{p^2}$
and after rearranging and simplifying, it is a polynomial equation:
$0 = 1 - p - p^2$
which has roots: 0.61803399, -1.618003399, the first of which can serve as a valid bias parameter for a coin.
Thus, if you ever encounter a coin with a probability of Heads of 0.61803399, then in that case it will be true that, on average, it takes as many flips to see the first occurrence of HH as it does to the first occurrence of HT.