How should you explain parallel transport to undergraduates?

Question

The title is a bit deceiving, because what I really mean is the parallel transport that corresponds to the Levi–Civita connection.

This is in the vein of many other questions on mathoverflow:

But the focus is different.

Let me summarize my understanding given answers in the questions above:

(Very Rough) Summary of Answers in Previous Questions

There exists an interpretation in terms of $G$-structures, as in the chosen answer (by Chris Schommer-Pries) to What is torsion in differential geometry intuitively?.
There exists an interpretation as having some universal property, as described in the top answer (by Robert Bryant) to What is the Levi-Civita connection trying to describe?.
There exists a deceiving but appealing interpretation by parallelograms whose sides don't really lie in the same space, as in the answer by Gabe K to What is the Levi-Civita connection trying to describe?. (Gabe K made a heroic effort to make sense of the nonsensical diagram, and I thank him dearly.)
There exists an interpretation regarding rolling the shape on a surface (Rolling without slipping interpretation of torsion).

But ultimately, none of that is something that I can intuitively sell to an undergraduate, and by undergraduate I really mean my heart. In the bottom of my heart, I need a better explanation, one that starts with desirable properties, and then proceeds through existence and uniqueness.

Outline of the Type of Intuition I Desire

I want to start with some desirable behaviors, which I allow to be external (i.e., to reference a given embedding of the Riemannian manifold into $\mathbb{R}^n$), and then say that the only notion of parallel transport that satisfies these conditions must be the Levi–Civita connection. (Any reasonable notion of parallel transport will respect the metric, so I'm really thinking of the torsion-free condition.)

A base case of a desirable condition is that for the Riemannian manifold $\mathbb{R}^n$, parallel transport is the trivial thing. (If one identifies the tangent bundle with $\mathbb{R}^n\times\mathbb{R}^n$ then for any path $\gamma$ the parallel transport of the tangent vector $(\gamma(0),v)$ at $\gamma(0)$ to $\gamma(1)$ via $\gamma$ is the tangent vector $(\gamma(1),v)$ at $\gamma(1)$.)

Next, we would like some way to generalize to a general Riemannian manifold. Let $(M,g)$ be a Riemannian manifold, and let $p\in M$ be a point. Then by the implicit function theorem we can have a chart $f:V\rightarrow U\subset \mathbb{R}^d$ where $0\in V\subset \mathbb{R}^n$, and $p\in U\subset M$, such that $f(0)=p$ and such that $f$ is the identity on the first $n$ coordinates.

My next thought is to look at the most intuitive case of torsion-freeness, which is the case of commuting fields $X$ and $Y$. By change of coordinates, we can assume WLOG that on $V$ the vector fields $X$ and $Y$ are defined via the constant functions $X(v)=e_1$ and $Y(v)=e_2$. One can then express $X$ and $Y$ on $M$ via the derivative of $f$.

But I'm missing multiple components to proceed.

So let me ask this in terms of several more explicit questions.

Questions

If a connection satisfies that $\nabla_XY=\nabla_YX$ for any commuting set of vector fields $X$ and $Y$, then is it torsion-free? (In other words, if you're torsion-free on commuting vector fields, are you torsion-free for all vector fields?)
What intuitive desirable condition (that is allowed to use a given embedding of $M$ into some $\mathbb{R}^d$), combined with, or perhaps generalizing the desired behavior of parallel transport on Euclidean space, would uniquely determine it as satisfying $\nabla_XY=\nabla_YX$ for commuting vector fields? (Perhaps something about geodesics? Or volumes? I don't really know what's the missing component here.)
I feel like Ben McKay's answer to What is the Levi-Civita connection trying to describe? is coming close to what I want, but I did not get to the bottom of it. It appeared at first that he was saying that the Levi–Civita parallel transport is simply parallel transporting in the ambient space, and then projecting to the tangent plane. But in retrospect, my interpretation is clearly wrong. (Imagine for example an upward pointing vector on the equator of a sphere, being parallel transported to the top. If you parallel transport in $\mathbb{R}^3$ you'll get a vector pointing up, which projected to the tangent space will be the $0$ vector.)
A little more vaguely, in case you have an entirely different notion in mind, how would you explain parallel transport to the undergraduate in your heart?

Q1: do you only want $X, Y$ to commute on a local domain, or do you require that $X,Y$ be global? (The local answer is yes, and you can see this by using coordinate vector fields.) // Q2: Hessian of any function being symmetric + local coordinate description would work. // Q3: you have to do the projection the entire time as you move around. — Willie Wong, Mar 26 '21 at 19:23
For Q1, I meant locally commuting. For Q2, I'm not sure what you mean. Can you expand that to an answer? For Q3: I presume you mean some sort of integration operation, but I can't quite see how that would work -- wouldn't the vectors keep on losing magnitude, or would you project and then enforce the same magnitude as part of the operation. Can you expand on that as well? — Andrew NC, Mar 26 '21 at 19:26
As you carry a vector $v(t)$ with you along a path $x(t)$, as you travel along a surface, make sure that its derivative $\dot{v}(t)$ is always normal. From the perspective of someone living on the surface, it looks constant, because that person can't measure normal vectors. — Ben McKay, Mar 26 '21 at 19:27
Vladimir Arnold's book "Mathematical Methods of Classical Mechanics" describes Levi-Civita parallel transport first along geodesics on a surface, then along curves on a surface, and then along curves in a higher-dimensional space. It may be of interest. — Quarto Bendir, Mar 26 '21 at 19:40
Several things that looked like they were supposed to be links to answers were actually links to the questions. I edited accordingly; I hope that is all right. — LSpice, Mar 26 '21 at 20:37
For any connection, you can define $\nabla^2 f(X,Y) = \nabla_X \nabla_Y f - \nabla_{\nabla_X Y} f$. You can check that this definition ensures $\nabla^2 f$ is tensorial: given any smooth function $\phi$, you have $\nabla^2 f(X,\phi Y) = \phi \nabla^2 f(X,Y)$ etc. This is the Hessian tensor corresponding to a function $f$. In general there is no reason that $\nabla^2 f(X,Y) = \nabla^2 f(Y,X)$. But if you subtract, you find the difference $$\nabla^2 f(X,Y) - \nabla^2 f(Y,X) = [X,Y]f - \nabla_{\nabla_X Y - \nabla_Y X} f$$ which vanishes for all $f$ and all $X,Y$ iff torsion vanishes. — Willie Wong, Mar 27 '21 at 00:30
I would say parallel transport is equivalent to the notion of a connection, i.e. any good definition of a connection should implicitly answer your question. My preference is the Ehresmann formalism, i.e. a projection operator on the double tangent bundle. I find it's natural to extrapolate from that. That said, this is for relatively advanced undergraduates. — Ryan Budney, Mar 27 '21 at 03:00
@RyanBudney, the question was specifically about motivating torsion free-ness. — Andrew NC, Mar 27 '21 at 05:37
@AndrewNC: I suppose I view having torsion as essentially just a different "presentation" of the same connection as the equivalent one that is torsion free. I tend to present torsion with more of a physical perspective. Torsion just adds infinitesimal rotation to the parallel transport. — Ryan Budney, Mar 27 '21 at 05:55
You: "Imagine for example an upward pointing vector on the equator of a sphere, being parallel transported to the top. If you parallel transport in $\mathbb{R}^3$ you'll get[...]be the $0$ vector." Then try transporting that vector only a few degrees of latitude to the north. It will now have a component parallel to the sphere, and a component normal to the sphere. Throw away the normal part, and scale the tangential part to have the full length. Move like that in small steps to the north pole. In the limit where the steps become infinitesimal, you should have the correct parallel transport? — Jeppe Stig Nielsen, Mar 27 '21 at 19:33
To the undergraduate in my heart: Walking around while looking downward at a map held horizontally in your palm is NOT parallel transport. Walking around while looking downward at a GPS device held horizontally in your palm IS parallel transport. Either way, though, you're going to bump into something if you don't look up. — Lee Mosher, Mar 29 '21 at 16:15
"I admire the elegance of your method of computation; it must be nice to ride through these fields upon the horse of true mathematics while the like of us have to make our way laboriously on foot." — Albert Einstein. "Spaghetti and Levi–Civita." [asked what he liked most about Italy] — Albert Einstein — Ben McKay, Mar 30 '21 at 11:53
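
The coordinate argument behind the replies to Q1 and Q2 in the comments above can be spelled out as follows. This is only a sketch; the symbols $\Gamma^k_{ij}$ and $T$ are standard notation introduced here for illustration and do not appear in the comments. In local coordinates $x^1,\dots,x^n$ write $\nabla_{\partial_i}\partial_j=\Gamma^k_{ij}\partial_k$. The coordinate fields commute, so the hypothesis of Q1 gives $\Gamma^k_{ij}=\Gamma^k_{ji}$. The torsion $T(X,Y)=\nabla_XY-\nabla_YX-[X,Y]$ is tensorial, so for $X=X^i\partial_i$ and $Y=Y^j\partial_j$ one gets $$T(X,Y)=X^iY^j\,T(\partial_i,\partial_j)=X^iY^j\big(\Gamma^k_{ij}-\Gamma^k_{ji}\big)\partial_k=0.$$ Likewise $\nabla^2 f(\partial_i,\partial_j)=\partial_i\partial_j f-\Gamma^k_{ij}\partial_k f$, so $$\nabla^2 f(\partial_i,\partial_j)-\nabla^2 f(\partial_j,\partial_i)=-\big(\Gamma^k_{ij}-\Gamma^k_{ji}\big)\partial_k f,$$ which vanishes for every $f$ exactly when $\Gamma^k_{ij}$ is symmetric in $i,j$, i.e. when the torsion vanishes.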

score 27 · Accepted Answer · edited Mar 27 '21 at 10:25

This may not really be an answer that you like, but I think that maybe you misunderstood what Ben McKay was trying to describe. Here is a more explicit, extrinsic description that may help:

Suppose that $M^m\subset\mathbb{E}^n$ is an isometrically embedded submanifold of Euclidean $n$-space. Let $\gamma:(a,b)\to M^m$ be a smooth curve in $M$ and let $v:(a,b)\to\mathbb{E}^n$ be a curve of vectors along $\gamma$, i.e., $v(t)$ lies in the tangent space $T_{\gamma(t)}M$ for all $t\in (a,b)$. Say that $v$ is parallel (along $\gamma$) if $v':(a,b)\to\mathbb{E}^n$ is normal to $TM$ along $\gamma$, i.e., $v'(t)\perp T_{\gamma(t)}M$ for all $t\in(a,b)$. In other words, the velocity of $v$ is always perpendicular to the tangent vectors to $M$ at the point of tangency.
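
A minimal numerical sketch of this definition, in the spirit of the "project in small steps" suggestion from the comments, may help. It assumes the unit sphere in $\mathbb{E}^3$ (where the unit normal at $p$ is $p$ itself) and a meridian running from the equator to the north pole; the function names are hypothetical and chosen only for illustration. Each small step changes $v$ only by a vector normal to the sphere at the new point, which is a discrete version of "$v'$ is normal".

```python
import math

def project_to_tangent(v, p):
    """Orthogonal projection of v onto the tangent plane of the unit sphere at p."""
    c = sum(vi * pi for vi, pi in zip(v, p))  # normal component; the unit normal at p is p
    return tuple(vi - c * pi for vi, pi in zip(v, p))

def meridian(t):
    """Great circle from the equator (t = 0) to the north pole (t = pi/2)."""
    return (math.cos(t), 0.0, math.sin(t))

def transport_by_small_projections(v0, t_end, steps):
    """Carry v0 along the meridian, discarding the normal part after every small step."""
    v = tuple(v0)
    for k in range(1, steps + 1):
        v = project_to_tangent(v, meridian(t_end * k / steps))
    return v

north_pointing = (0.0, 0.0, 1.0)  # tangent to the sphere at meridian(0) = (1, 0, 0)

# One big step: translate in R^3, then project at the pole. The vector collapses.
one_shot = project_to_tangent(north_pointing, meridian(math.pi / 2))

# Many small steps: the vector keeps almost all of its length and ends up
# close to (-1, 0, 0), the tangent to the meridian at the pole.
stepwise = transport_by_small_projections(north_pointing, math.pi / 2, 10_000)
```

The length lost in each step is of second order in the step size, so the total loss tends to zero as the number of steps grows and no rescaling is needed in the limit.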

Then the (easily proved) proposition is that this notion of a tangent vector field along a curve being parallel along $\gamma$ does not depend on the choice of the isometric embedding, i.e., it is intrinsic to the metric induced on $M$ by its embedding. More generally, if $v:(a,b)\to\mathbb{E}^n$ is tangent along $\gamma$, then letting $D_\gamma v(t)$ be the orthogonal projection of $v'(t)$ onto $T_{\gamma(t)}M$ yields another curve $D_\gamma v:(a,b)\to\mathbb{E}^n$ that is tangent along $\gamma$, and this operation (actually a derivation) on tangent fields along $\gamma$ depends only on the induced metric on $M$. Since it is independent of the choice of isometric embedding, it is the 'covariant part' of the ambient derivative, i.e., the 'covariant derivative'.

For example, it follows from the definition that if $v$ is a parallel tangent vector field along $\gamma$, then the length of $v$ is constant. Then the existence and uniqueness of 'parallel transport' follow by elementary ODE arguments. The Leibniz rule for the 'covariant derivative' and other properties are easily derived from the definition as well.
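
To make the ODE argument concrete, here is a sketch in the special case of a hypersurface with a unit normal field $N$ (an assumption made for simplicity; the symbols $N$ and $\lambda$ are introduced here for illustration). Saying that $v'$ is normal means $v'(t)=\lambda(t)\,N(\gamma(t))$ for some function $\lambda$. Differentiating the tangency condition $\langle v,N\circ\gamma\rangle=0$ gives $\lambda=-\langle v,(N\circ\gamma)'\rangle$, so $$v'=-\big\langle v,(N\circ\gamma)'\big\rangle\,N\circ\gamma,$$ a linear first-order ODE for $v$, which has a unique solution for each initial value $v(t_0)$. A solution that starts tangent stays tangent, since $\frac{d}{dt}\langle v,N\circ\gamma\rangle=\langle v',N\circ\gamma\rangle+\langle v,(N\circ\gamma)'\rangle=0$, and it has constant length, since $$\frac{d}{dt}|v|^2=2\langle v,v'\rangle=0.$$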

Once you know that $\nabla_{\gamma'}v$ for a curve of tangent vectors depends only on the metric, it's natural to want to find a formula for it that uses only the metric and not the (superfluous) isometric embedding. That is what leads to the usual characterizations.

edited Mar 27 '21 at 10:25 by Ben McKay · answered Mar 26 '21 at 19:43 by Robert Bryant

I'm having some difficulty understanding. Let's take a trivial example: $M=\mathbb{R}^2$ and the embedding is the identity. Let $\gamma(t)=(t,0)$, and let $v$ be given by the (Levi-Civita) parallel transport of the vector $(1,0)\in T_0\mathbb{R}^2$ along $\gamma$. Then $v(t)=(t+1,0)$? But I'm confused: $v'(t)=(1,0)$ which is clearly not normal to any tangent space at any point of $M$. Where am I going wrong here? – Andrew NC Mar 26 '21 at 22:35
Oh, I think I understand: you don't mean $v(t)=(t+1,0)$. You're actually defining $v$ as take the vector at the tangent space, and translate it so that it starts at the origin. So in my case $v(t)=(1,0)$, and $v'(t)=(0,0)$ which is of course normal to the tangent space. Got it! – Andrew NC Mar 26 '21 at 23:12
@AndrewNC: Yes, that's correct. In this picture, a tangent vector field along a curve $\gamma:(a,b)\to M\subset\mathbb{E}^n$ is just a curve $v:(a,b)\to\mathbb{E}^n$ such that $v(t)$ lies in the subspace $T_{\gamma(t)}M\subset \mathbb{E}^n$ for all $t\in(a,b)$. Note that, in this picture, the tangent bundle of $M$ is just a subbundle of the trivial vector bundle $M\times\mathbb{E}^n$. – Robert Bryant Mar 27 '21 at 00:29
Look at Dirac, General Theory of Relativity, pp. 10-12. Dirac is less rigorous than Robert, but tells the same story. Dirac immediately derives the Christoffel symbols from this description in a few lines. – Ben McKay Mar 27 '21 at 11:46
At some point, I convinced myself that there was a variational way to state this description, which I found intuitive. (I hope it is correct!) Let $\gamma(t)$ be a curve on $M$ and consider the curves $x(t) = \gamma(t) + v(t)$ in the ambient Euclidean space, subject to the conditions that $v(t) \in T_{\gamma(t)}M$ for all $t$ and $|v(t)|$ is constant. Then I convinced myself that the minimal length such curves are those where $\gamma(t)$ obeys parallel transport. – David E Speyer Mar 27 '21 at 19:31
In terms of a story: I am walking along $M$. I am carrying a spear, which I always hold parallel to my local ground, and has a rigid length. I want the tip of the spear to move as little as possible. – David E Speyer Mar 27 '21 at 19:32
@DavidESpeyer: I don't really understand your construction unless you meant "where $v(t)$ obeys parallel transport" instead of "where $\gamma(t)$ obeys parallel transport". Parallel transport is supposed to be defined for vectors along a base curve $\gamma$. Given that, I don't see how your description could be right. Suppose that $\gamma(t)$ is a circle of radius $1$ centered at the origin in the plane $z=0$. Then setting $v(t)=-\gamma(t)$ makes $\gamma(t)+v(t)$ have minimal length, but $v(t)$, not being constant, is not the parallel transport of any vector based at any point on the circle. – Robert Bryant Mar 28 '21 at 00:48
Okay, I will think about this more. – David E Speyer Mar 28 '21 at 01:33
I think what I wrote is unfixably wrong. Thanks for the correction! – David E Speyer Mar 28 '21 at 02:17
Do you know if this account roughly corresponds to the historical way the Levi-Civita connection was discovered? – Michael Bächtold Mar 29 '21 at 07:04
@MichaelBächtold: I have never read the historical documents. My understanding is that in 1917 Levi-Civita did something along the lines I have described above for hypersurfaces in Euclidean space, which was then generalized by Weyl. In 1918, Jan Schouten, working independently, found the fully intrinsic description of the parallelism associated to a metric. Possibly because of the war, Schouten was unaware of Levi-Civita's work, and there followed a priority dispute, which Levi-Civita won. You can see this account in the Wikipedia article on Jan Arnoldus Schouten. – Robert Bryant Mar 29 '21 at 11:08

Mohammad Ghomi · Answer 2 · 2021-04-02T14:04:25.650

I agree with Ben McKay and Robert Bryant that the best way to introduce parallel transport to students, or to provide some motivation and intuition for it, is via an extrinsic approach, i.e., by the example of a tangent vector field to a surface whose derivative along a curve is orthogonal to the surface. Then I explicitly compute for them the parallel transport along a meridian of a sphere in an elementary way which I attach below.

After I do this computation I tell the students that this explains the precession of the swing plane of a pendulum, as observed by the French physicist Léon Foucault in 1851 (which predates Levi-Civita; see the historical note below), and how they can use the answer to figure out the latitude of their location on Earth. There is a nice Wikipedia page on Foucault's Pendulum which I also tell the students to check out.

Finally, I will tell them that the computation can be carried out much more quickly via the well-known trick of constructing a cone which is tangent to the sphere along the meridian and unrolling the cone into the plane. Since the cone and the sphere are tangent along the curve, a tangent vector field along the curve is parallel with respect to one surface if it is parallel with respect to the other one. Furthermore since, as Robert mentioned, parallel transport is intrinsic, it is not affected by unrolling (or isometric immersion) of the cone into the plane, where parallel transport is trivial, and the total rotation of the vector field with respect to the meridian can be computed immediately (it will be equal to the total angle of the cone at its apex).
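
The cone trick can be sketched as follows, assuming the curve is the circle at polar angle $\phi_0\in(0,\pi/2)$ on the unit sphere, in the notation of the explicit computation in this answer. That circle has radius $\sin\phi_0$, hence length $2\pi\sin\phi_0$. The cone tangent to the sphere along it has its apex on the axis at distance $1/\cos\phi_0$ from the center, so the slant distance from the apex to the circle is $\sqrt{\sec^2\phi_0-1}=\tan\phi_0$. Unrolled into the plane, the cone becomes a circular sector of radius $\tan\phi_0$ and arc length $2\pi\sin\phi_0$, whose angle is $$\frac{2\pi\sin\phi_0}{\tan\phi_0}=2\pi\cos\phi_0,$$ in agreement with the total rotation found by the explicit computation.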

Here is the explicit computation for the parallel transport of a vector along a meridian of the unit sphere $\mathbf{S}^2$. This is an excerpt from my lecture notes. Let $$X(\theta,\phi):=\big(\cos(\theta)\sin(\phi), \sin(\theta)\sin(\phi), \cos(\phi)\big)$$ be the standard parametrization for $\mathbf{S}^2-\{(0,0,\pm 1)\}$. Suppose that we want to parallel transport a given unit vector $V_0\in T_{X(0,\phi_0)}\mathbf{S}^2$ along the meridian $X(\theta,\phi_0)$. So we need to find a mapping $V\colon[0,2\pi]\to\mathbf{R}^3$ such that $V(0)=V_0$, $V(\theta)\in T_{X(\theta,\phi_0)}\mathbf{S}^2$, and $V'(\theta)\perp T_{X(\theta,\phi_0)}\mathbf{S}^2$. The latter condition is equivalent to the requirement that $$ V'(\theta)=\lambda(\theta) X(\theta,\phi_0), \quad(\ast) $$ for some scalar function $\lambda$, since the normal to $\mathbf{S}^2$ at the point $X(\theta,\phi)$ is just $X(\theta,\phi)$ itself.

To solve this equation let $$ E_1(\theta):=\frac{\partial X/\partial\theta(\theta,\phi_0)}{\|\partial X/\partial\theta(\theta,\phi_0)\|}=\big(-\sin(\theta),\cos(\theta),0\big), $$ and $$ E_2(\theta):=\frac{\partial X/\partial\phi(\theta,\phi_0)}{\|\partial X/\partial\phi(\theta,\phi_0)\|}=\big(\cos(\theta)\cos(\phi_0),\sin(\theta)\cos(\phi_0),-\sin(\phi_0)\big). $$ Note that $\{E_1(\theta),E_2(\theta)\}$ forms a basis for $T_{X(\theta,\phi_0)}\mathbf{S}^2$. Thus equation $(\ast)$ above is equivalent to $$ \langle V'(\theta), E_1(\theta)\rangle =0\quad\text{and}\quad \langle V'(\theta), E_2(\theta)\rangle =0. \quad (\ast\ast) $$ It remains to solve this system of differential equations. Since $V(\theta)\perp V'(\theta)$, the length of $V(\theta)$ is constant, so $V(\theta)$ has unit length. So we may write $$ V(\theta)=\cos(\alpha(\theta))E_1(\theta)+\sin(\alpha(\theta))E_2(\theta), $$ for some angle function $\alpha$. Differentiation yields that $$ V'=E_1'\cos(\alpha)-\sin(\alpha)\alpha'E_1+\sin(\alpha)E_2'+\cos(\alpha)\alpha' E_2. $$ Furthermore, it is easy to compute that $$ E_1'=-\cos(\phi_0) E_2-\sin(\phi_0)E_3\quad\text{and}\quad E_2' =\cos(\phi_0) E_1, $$ where $E_3(\theta):=X(\theta,\phi_0)$. Thus we obtain: $$ V'=\sin(\alpha)(\cos(\phi_0)-\alpha')E_1+\cos(\alpha)(\alpha'-\cos(\phi_0))E_2 +(\text{some terms})\,E_3. $$ So for $(\ast\ast)$ to be satisfied, we must have $\alpha'=\cos(\phi_0)$ or $$ \alpha(\theta)=\cos(\phi_0)\theta+\alpha(0), $$ which in turn determines $V$.

Note in particular that the total rotation of $V$ with respect to the meridian $X(\theta,\phi_0)$ is given by $$ \alpha(2\pi)-\alpha(0)=\int_0^{2\pi}\alpha'\,d\theta=2\pi\cos(\phi_0). $$ Thus $$ \phi_0=\cos^{-1}\left(\frac{\alpha(2\pi)-\alpha(0)}{2\pi}\right). $$ The last equation gives the relation between the total precession of the swing plane of a pendulum during a $24$ hour period, and the latitude of the location of that pendulum on Earth (here $\phi_0$ is the polar angle, so the latitude is $\pi/2-\phi_0$), as observed by Foucault.
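
The formula $\alpha(2\pi)-\alpha(0)=2\pi\cos(\phi_0)$ can be checked numerically. The sketch below is not part of the lecture notes: it rewrites the condition that $V'$ be normal as the ODE $V'=-\langle V,E_3'\rangle E_3$ (obtained by differentiating $\langle V,E_3\rangle=0$), integrates it once around the curve $\theta\mapsto X(\theta,\phi_0)$ with a classical Runge-Kutta scheme, and reads off the angle in the frame $\{E_1,E_2\}$. All function names are hypothetical, and the step count is an arbitrary choice.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit_normal(theta, phi0):
    # E_3(theta) = X(theta, phi0), the unit normal of the sphere along the curve
    s = math.sin(phi0)
    return (math.cos(theta) * s, math.sin(theta) * s, math.cos(phi0))

def unit_normal_derivative(theta, phi0):
    # d/dtheta of E_3(theta)
    s = math.sin(phi0)
    return (-math.sin(theta) * s, math.cos(theta) * s, 0.0)

def rhs(theta, v, phi0):
    # V' = lambda * E_3 with lambda = -<V, E_3'>, so that V stays tangent
    lam = -dot(v, unit_normal_derivative(theta, phi0))
    return tuple(lam * n for n in unit_normal(theta, phi0))

def axpy(v, h, k):
    return tuple(x + h * y for x, y in zip(v, k))

def transport_once_around(v0, phi0, steps=2000):
    """Integrate the transport ODE for theta in [0, 2*pi] with classical RK4."""
    h = 2.0 * math.pi / steps
    v = tuple(v0)
    for i in range(steps):
        theta = i * h
        k1 = rhs(theta, v, phi0)
        k2 = rhs(theta + h / 2, axpy(v, h / 2, k1), phi0)
        k3 = rhs(theta + h / 2, axpy(v, h / 2, k2), phi0)
        k4 = rhs(theta + h, axpy(v, h, k3), phi0)
        v = tuple(x + h / 6 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(v, k1, k2, k3, k4))
    return v

def total_rotation(phi0, steps=2000):
    """alpha(2*pi) for alpha(0) = 0, reduced to the interval (-pi, pi]."""
    e1 = (0.0, 1.0, 0.0)                          # E_1(0) = E_1(2*pi)
    e2 = (math.cos(phi0), 0.0, -math.sin(phi0))   # E_2(0) = E_2(2*pi)
    v = transport_once_around(e1, phi0, steps)
    return math.atan2(dot(v, e2), dot(v, e1))

def angle_difference(a, b):
    """Difference a - b reduced modulo 2*pi to the interval [-pi, pi)."""
    return (a - b + math.pi) % (2.0 * math.pi) - math.pi
```

For a given `phi0`, the quantity `angle_difference(total_rotation(phi0), 2 * math.pi * math.cos(phi0))` should be zero up to integration error; the comparison is made modulo $2\pi$ because the numerical angle is only recovered up to full turns.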

Historical Note: The precession of the pendulum was mentioned by the Dutch geometer Jan Arnoldus Schouten in a letter to Levi-Civita in 1918, where he wrote "a remarkable physical application is the Foucault pendulum". This paper appears to give a nice account of the historical development of parallel transport through the correspondence between Schouten and Levi-Civita.

score 10 · Answer 3 · answered Mar 26 '21 at 19:48

I prefer to introduce the Levi-Civita connection before the concept of parallel transport. On the surface, parallel transport seems like it should be more intuitive and easier to visualize geometrically. On the other hand, as you yourself indicate, it's not so obvious.

Although I have a preference for introducing manifolds abstractly and not as a submanifold of Euclidean space, I prefer to introduce Riemannian geometry starting with the induced geometry of a hypersurface in Euclidean space. The Levi-Civita connection arises naturally when you study the directional derivative of a vector field. It arises from two important observations: First, the directional derivative splits into tangential and normal components. Second, the normal component depends only on the surface (through its second fundamental form) and not on the curve or the derivative of the vector field. So it is natural to define the directional derivative of the vector field as only the tangential component.

Now you study this directional derivative. It has (at least) two nice properties: 1) As Willie mentioned, if you define the Hessian of a function using it, the Hessian is symmetric. 2) It is compatible with the metric and torsion-free. More importantly, it is the unique connection satisfying these two properties. This means that any property of the connection is a property of the metric.
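
These properties can be read off from the splitting directly. The following is a sketch for a hypersurface with unit normal $N$, writing $D$ for the ambient directional derivative and $h$ for the second fundamental form (notation introduced here for illustration). For tangent vector fields $X,Y$, $$D_XY=\nabla_XY+h(X,Y)\,N,\qquad h(X,Y)=\langle D_XY,N\rangle=-\langle Y,D_XN\rangle,$$ and the last expression shows that $h(X,Y)$ involves only the values of $X$ and $Y$ at the point. Since $D_XY-D_YX=[X,Y]$ is tangent to the surface, comparing tangential and normal parts gives $$\nabla_XY-\nabla_YX=[X,Y],\qquad h(X,Y)=h(Y,X),$$ which is torsion-freeness. Differentiating $\langle Y,Z\rangle$ for tangent $Y,Z$ gives $$X\langle Y,Z\rangle=\langle D_XY,Z\rangle+\langle Y,D_XZ\rangle=\langle\nabla_XY,Z\rangle+\langle Y,\nabla_XZ\rangle,$$ which is compatibility with the metric.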

This now makes the concept of the Levi-Civita connection for an abstract Riemannian manifold a compelling one. Parallel transport is now a natural development from that.

score 9 · Answer 4 · answered Mar 29 '21 at 04:32

I like the bike wheel interpretation introduced by Mark Levi (A “bicycle wheel” proof of the Gauss–Bonnet theorem, Exposition. Math. 12.2 (1994), 145–164).

It takes no time in class and helps to build the right intuition.

answered by Anton Petrunin

score 1 · Answer 5 · answered Oct 29 '21 at 05:07

If intuition is what you want then I think the essence of the matter is best expressed using locally flat coordinates at each point as follows. At any point $p$ in a Riemannian (or pseudo-Riemannian, for those of us who use this in the context of general relativity) manifold, one can always choose "locally flat coordinates" (lfc) such that the metric components have vanishing first partial derivatives at $p$. The (Levi-Civita) covariant derivative of a vector (or tensor) at $p$ is simply the partial derivative with respect to any lfc. A vector is parallel transported along a curve if at each point in lfc its components have vanishing derivative with respect to the curve parameter. This is of course not a practical definition, since the lfc are different at each point. For practicality, one can use the connection components in any coordinate system. Note however that if the curve is a geodesic, then there exist coordinates that cover a neighborhood of the curve and are locally flat at every point on the curve. In that case, parallel transport just keeps all the components of the vector constant in such a coordinate system. An example is transport around the equator of the 2-sphere using standard spherical polar coordinates.
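
As a quick check of the last example, here is a sketch in coordinates $(\theta,\phi)$ on the unit 2-sphere, with $\phi$ the polar angle (a convention assumed here). The round metric is $ds^2=d\phi^2+\sin^2\phi\,d\theta^2$, and its only nonzero Christoffel symbols are $$\Gamma^\phi_{\theta\theta}=-\sin\phi\cos\phi,\qquad \Gamma^\theta_{\theta\phi}=\Gamma^\theta_{\phi\theta}=\cot\phi.$$ All of them vanish on the equator $\phi=\pi/2$, as does $\partial_\phi(\sin^2\phi)=2\sin\phi\cos\phi$, so these coordinates are locally flat at every point of the equator. Along the equator the parallel transport equation $\dot v^k+\Gamma^k_{ij}\dot x^i v^j=0$ therefore reduces to $\dot v^k=0$, i.e. the components of $v$ stay constant.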
