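The supremum in the definition is necessary because any single partition can miss variation that occurs between its points; taking the supremum over all partitions captures the maximum variation of the function.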
Here's an example to illustrate what happens if we ignore the supremum and take any fixed partition of the real line:
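Consider the function $\sin(x)$, and let the subdivision of the real line be the zeros of $\sin(x)$, that is, the points $x = k\pi$ for integer $k$. The function vanishes at every point of the subdivision, so its variation over that subdivision is zero, even though its total variation over the real line is infinite: each period contributes a variation of 4, and there are infinitely many periods. We could choose a similar example with compact support by replacing $\sin(x)$ with the zero function sufficiently far away from the origin, for example outside $[-2n\pi, 2n\pi]$. The total variation is then finite but nonzero (it equals $8n$), the variation over the zeros is still zero, and the point is the same.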
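To make this concrete, here is a small numerical sketch. It is only an illustration: the helper name `variation`, the choice `n = 3`, and the restriction to the interval $[-2n\pi, 2n\pi]$ are assumptions made for the example. It compares the sum of jumps over the partition at the zeros with the sum over a partition that also contains the extrema.

```python
import math

def variation(f, points):
    """Sum of |f(x_{i+1}) - f(x_i)| over one fixed, sorted partition."""
    return sum(abs(f(b) - f(a)) for a, b in zip(points, points[1:]))

n = 3  # illustrative choice: work on [-2*n*pi, 2*n*pi], i.e. 2*n full periods

# Partition consisting only of the zeros of sin: x = k*pi
zeros = [k * math.pi for k in range(-2 * n, 2 * n + 1)]

# Refined partition that also includes the extrema: x = k*pi/2
refined = [k * math.pi / 2 for k in range(-4 * n, 4 * n + 1)]

v_zeros = variation(math.sin, zeros)      # zero up to floating-point rounding
v_refined = variation(math.sin, refined)  # 4 per period, so 8*n up to rounding

print(v_zeros, v_refined)
```

The partition at the zeros sees nothing, while the refined partition recovers the full variation of $8n$ on this interval. No further refinement can increase the sum, because $\sin$ is monotone between consecutive extrema.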
The basic idea of total variation is to capture the unsigned, total vertical distance traveled by the function. An oscillating function such as the sine travels a nonzero vertical distance, and it travels a greater vertical distance over a fixed interval as its frequency increases. To capture that vertical distance accurately for all functions, you need the supremum, which lets the subdivision adapt to the function. Then you can detect the degree to which your solution "oscillates", which makes total variation a useful quantity to track in numerical methods for solving hyperbolic partial differential equations.
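The dependence on frequency can be checked numerically. The sketch below is an approximation under stated assumptions: `approx_total_variation` is a hypothetical helper that uses one fine uniform partition in place of the supremum, so it gives a lower bound that is close to the true value for smooth functions. For $\sin(kx)$ on $[0, 2\pi]$ the exact total variation is $4k$.

```python
import math

def approx_total_variation(f, a, b, num_points=20_001):
    """Lower bound on the total variation of f on [a, b] from a uniform partition.

    A single fine partition stands in for the supremum here; this is an
    approximation, not the definition itself.
    """
    xs = [a + (b - a) * i / (num_points - 1) for i in range(num_points)]
    return sum(abs(f(x1) - f(x0)) for x0, x1 in zip(xs, xs[1:]))

# Total variation of sin(k*x) on [0, 2*pi] for a few frequencies k
tv_by_frequency = {
    k: approx_total_variation(lambda x, k=k: math.sin(k * x), 0.0, 2 * math.pi)
    for k in (1, 2, 4, 8)
}

for k, tv in tv_by_frequency.items():
    print(f"k = {k}: approximate total variation = {tv:.4f}")
```

Doubling the frequency doubles the vertical distance traveled over the same interval, which is the sense in which total variation measures how much a function oscillates.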