If we have a nice function $f$, then we can write its Taylor series,
$$
f(x+h) = f(x) + f'(x)h + \frac12f''(x)h^2 + \frac1{3!}f'''(x)h^3 + \cdots.
$$
Let $\Delta W_t=W_{t+\Delta t}-W_t$. Then, taking $x=W_t$ and $h=\Delta W_t$, we get
$$
f(W_{t+\Delta t}) = f(W_t) + f'(W_t)\Delta W_t + \frac12 f''(W_t)(\Delta W_t)^2
+ \frac1{3!}f'''(W_t)(\Delta W_t)^3 + \cdots.
$$
Now fix $T>0$. Let $\Delta t=T/n$ and let $t_j=j\Delta t$. Then
\begin{align}
f(W_T) - f(W_0) &= \sum_{j=0}^{n-1} (f(W_{t_{j+1}}) - f(W_{t_j}))\\
&= \sum_{j=0}^{n-1} (f(W_{t_j + \Delta t}) - f(W_{t_j}))\\
&= \sum_{j=0}^{n-1} f'(W_{t_j})\Delta W_{t_j}
+ \frac12\sum_{j=0}^{n-1} f''(W_{t_j})(\Delta W_{t_j})^2
+ \frac1{3!}\sum_{j=0}^{n-1} f'''(W_{t_j})(\Delta W_{t_j})^3
+ \cdots.
\end{align}
The first sum converges in an appropriate sense to an Ito integral:
$$
\sum_{j=0}^{n-1} f'(W_{t_j})\Delta W_{t_j}
\to \int_0^T f'(W_t)\,dW_t.
$$
The second sum converges in an appropriate sense to an ordinary integral:
$$
\frac12\sum_{j=0}^{n-1} f''(W_{t_j})(\Delta W_{t_j})^2
\to \frac12\int_0^T f''(W_t)\,dt.
$$
This convergence is fundamentally connected to the fact that the quadratic variation of Brownian motion on the interval $[0,t]$ is $t$. The notation $(dW_t)^2 \sim dt$, or $dW_t \sim (dt)^{1/2}$, is a heuristic shorthand for this convergence and for the rules and results that follow from it.
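As a rough numerical illustration of the quadratic variation statement, here is a minimal sketch that simulates Brownian increments on a uniform grid and sums their squares. The function name `quadratic_variation_sum`, the seed, and the choice $T=2$ are assumptions made for the illustration only. As $n$ grows, the sum should settle near $T$.

```python
import math
import random

def quadratic_variation_sum(T, n, seed=0):
    """Sum of squared Brownian increments over n uniform steps on [0, T]."""
    rng = random.Random(seed)
    dt = T / n
    sigma = math.sqrt(dt)  # each increment is distributed as N(0, dt)
    return sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n))

if __name__ == "__main__":
    T = 2.0
    for n in (10, 1_000, 100_000):
        qv = quadratic_variation_sum(T, n)
        print(f"n={n:>7}: sum (dW)^2 = {qv:.4f}   (T = {T})")
```

The sum has mean $T$ and variance $2T^2/n$, so its fluctuations around $T$ shrink like $n^{-1/2}$.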
Because Brownian motion has finite quadratic variation, its $p$-variation for $p>2$ (in the same limiting sense) is zero. Consequently, all of the sums involving $(\Delta W_{t_j})^k$ with $k>2$ tend to zero. This is how we derive Ito's rule:
$$
f(W_T) - f(W_0) = \int_0^T f'(W_t)\,dW_t + \frac12\int_0^T f''(W_t)\,dt.
$$
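To see the three sums side by side, here is a small sketch for the particular choice $f(x)=x^3$, which is an assumption made purely for illustration. For this $f$ the Taylor expansion stops at the cubic term, so the three sums add up to $f(W_T)-f(W_0)$ exactly, up to floating-point error. The function name `ito_check_cubic` and the seed are likewise hypothetical.

```python
import math
import random

def ito_check_cubic(T=1.0, n=100_000, seed=1):
    """Accumulate the Taylor sums for f(x) = x**3 along one simulated path."""
    rng = random.Random(seed)
    dt = T / n
    sigma = math.sqrt(dt)
    w = 0.0  # W_0 = 0
    s1 = s2 = s3 = riemann = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, sigma)
        s1 += 3.0 * w**2 * dw           # f'(W) dW, with f'(x) = 3x^2
        s2 += 0.5 * 6.0 * w * dw**2     # (1/2) f''(W) (dW)^2, with f''(x) = 6x
        s3 += dw**3                     # (1/3!) f'''(W) (dW)^3, with f''' = 6
        riemann += 0.5 * 6.0 * w * dt   # (1/2) f''(W) dt, the ordinary integral
        w += dw
    return {"lhs": w**3, "s1": s1, "s2": s2, "s3": s3, "riemann": riemann}

if __name__ == "__main__":
    r = ito_check_cubic()
    print("f(W_T) - f(W_0)        :", r["lhs"])
    print("s1 + s2 + s3           :", r["s1"] + r["s2"] + r["s3"])
    print("cubic sum s3           :", r["s3"])
    print("s2 vs (1/2) int f'' dt :", r["s2"], r["riemann"])
```

On a fine grid the cubic sum should be close to zero, and the second sum should be close to the Riemann sum for $\frac12\int_0^T f''(W_t)\,dt$, which is what the heuristic $(dW_t)^2 \sim dt$ expresses.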
Regarding Hölder continuity, there is a fundamental connection between it and $p$-variation. (See https://en.wikipedia.org/wiki/P-variation#Link_with_Hölder_norm.) However, one must be careful when using this. The $p$-variation referenced there is the one used in analysis for deterministic functions. We do not define the quadratic variation of a stochastic process by simply applying the deterministic definition of the $2$-variation to each sample path. (See https://en.wikipedia.org/wiki/Quadratic_variation.) Rather, there is a subtle difference involving the choice of partitions and the type of convergence. Because of this difference, you can get counterintuitive results. For example, Brownian motion has finite quadratic variation, but its sample paths almost surely have infinite $2$-variation.