148

I suppose this question can be interpreted in two ways. It is often the case that two or more equivalent (but not necessarily semantically equivalent) definitions of the same idea/object are used in practice. Are there examples of equivalent definitions where one is distinctly more natural or intuitive? (I mean so much more intuitive as not to be a matter of subjective preference.)

Alternatively, what common examples are there in standard lecture courses where a particular symbolic definition obscures the concept being conveyed?

Gerry Myerson
  • 39,024
QPeng
  • 33
  • 9
    This definition of the product topology is really not so bad when you correct the typos and translate it to words: it says every point of X × Y should have an open neighborhood that's a product of open sets in X and in Y. What's wrong with that? – Jonathan Wise Dec 02 '09 at 16:52
  • 14
    Well, the natural generalization of that definition is the box topology, whereas the natural generalization of Daniel's definition is the (categorical) product topology. – Qiaochu Yuan Dec 02 '09 at 17:07
  • There's nothing wrong with it at all :) I mean I never claimed as such, I just think the latter definition encapsulates 'Why it's interesting' in addition to defining it. – QPeng Dec 02 '09 at 17:16
  • 2
    There are two separate issues here: (1) Wise's translation of the symbolic definition is an improvement because it is conceptual instead of symbolic. Symbolic definitions are harder for most of us to understand but they have the advantage that they can be checked mechanically. (But another disadvantage is that they typically contain misprints.) I will make the second point in a separate comment. – SixWingedSeraph Dec 02 '09 at 17:24
  • 16
    My second comment: (2) The definition in terms of open sets is spiritually a construction, not a definition. It may be described as "a construction in terms of open sets that works only for finite products". The definition in terms of coarsest topology is a genuine definition, and is generally accepted as the correct definition, but it doesn't give you a construction. The genuine definition gives you much more intuition about the product, but sometimes you need a construction. Some of my fellow category theorists regard that bit about needing a construction as a heresy. – SixWingedSeraph Dec 02 '09 at 17:24
  • 12
    The definition in terms of a coarsest topology gives you a perfectly valid construction: take the inverse image of every open set. – Qiaochu Yuan Dec 02 '09 at 17:28
  • Actually, I think Jonathan's definition in words above does have a misprint, and is satisfied by the trivial topology. "...should have a basis of neighborhoods which are products..." fixes it. – George Lowther Dec 02 '09 at 22:32
  • 2
    That the products $U\times V$ for $U,V$ open form a basis of the product topology is an important fact, and I do not see anything wrong in using it as a definition. If you use another definition, you will still want to know this fact.

    Of course, it is also very important to know that a map to a product is continuous if and only if all of its components are.

    – Carsten S Dec 03 '09 at 15:19
  • 2
    @Qiaochu: that is not very constructive! "Every" open set? That's quite a lot! You could get away with saying every inverse images under the projection maps but then you need to take finite intersections to get the topology which somewhat counters the simplicity of the categorical definition. – Andrew Stacey Dec 04 '09 at 15:03
  • Do constructions need to be constructive? :P It proves the existence of the initial topology and that's enough for products, subspaces, the weak topology, etc.

    Anyway, the point has already been made on MO several times that the particular way a universal object is constructed is less important than its universal property.

    – Qiaochu Yuan Dec 04 '09 at 15:51
  • 1
    Constructions need to be constructive in the context of the discussion. SWS nicely contrasted the two definitions as being constructive and intuitive. In that context, your response reads as though you think that the "proper" definition fulfills both roles. It doesn't, not least because it is tautological (one of the objects from which you are pulling back open sets is itself!). The "constructive" definition says "These are the open sets" and that's very useful for directly working with that object and for gaining intuition as to how universal objects behave in general. – Andrew Stacey Dec 04 '09 at 21:30
  • 3
    Bother this character limit! The point is that the universality of universal objects is fine once you have a good understanding of what that means. Before that, it's a load of mumbo-jumbo seemingly designed to confuse and obfuscate and keep the riff-raff out. To gain that understanding, you need to work with actual examples where the structure is plain and simple and can be manipulated. This is an excellent example to do this for the above reasons and since it obviously extends the topology on R^2 (and R^n). – Andrew Stacey Dec 04 '09 at 21:33
  • 13
    This general point about definitions needs to be made: The definition is intended to give a (more or less) minimal technical description of the concept that implies all true theorems about the concept and nothing else. It doesn't matter if the definition emphasizes technical aspects and doesn't mention some big intuitive ideas about it. That's not what definitions are for. A teacher should provide many ways to think about the concept, some of which might constitute definitions. – SixWingedSeraph Dec 05 '09 at 01:35
  • 1
    A comment about the comments: When people say ‘constructive’, they should be saying ‘predicative’. It's true that many constructive mathematicians also try to be predicative, but it's cleaner to separate the concepts. (There's a slight complication in that quantifying over functions is less impredicative than quantifying over subsets, which is a distinction that can only be made constructively, but we're way beyond that here.) – Toby Bartels Feb 12 '14 at 18:13
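A footnote to the comment thread above: Qiaochu's construction — generate the product topology from the preimages of open sets under the two projections — can be compared with the open-rectangle basis on finite toy spaces. A small sketch in Python (the helper names and the tiny example topologies are my own, purely illustrative):

```python
from itertools import combinations

def topology_from_subbasis(points, subbasis):
    """Close a subbasis under finite intersections, then arbitrary unions."""
    sub = [frozenset(s) for s in subbasis]
    basis = {frozenset(points)}  # empty intersection = whole space
    for r in range(1, len(sub) + 1):
        for combo in combinations(sub, r):
            basis.add(frozenset.intersection(*combo))
    opens = {frozenset()}        # empty union = empty set
    basis = list(basis)
    for r in range(1, len(basis) + 1):
        for combo in combinations(basis, r):
            opens.add(frozenset().union(*combo))
    return opens

X, Y = [0, 1], ['a', 'b']
TX = [set(), {0}, {0, 1}]        # Sierpinski topology on X
TY = [set(), {'a'}, {'a', 'b'}]  # and a copy of it on Y
prod = [(x, y) for x in X for y in Y]

# Qiaochu's construction: preimages of open sets under the two projections.
preimages = [{p for p in prod if p[0] in U} for U in TX] \
          + [{p for p in prod if p[1] in V} for V in TY]
T1 = topology_from_subbasis(prod, preimages)

# The open-rectangle basis: products U x V of open sets.
rectangles = [{(x, y) for x in U for y in V} for U in TX for V in TY]
T2 = topology_from_subbasis(prod, rectangles)

print(T1 == T2)  # True: the two constructions give the same topology
```

For finite products the two recipes agree, which is exactly the point of the thread; it is for infinite products that the generalizations diverge.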

31 Answers

262

Many topics in linear algebra suffer from the issue in the question. For example:

In linear algebra, one often sees the determinant of a matrix defined by some ungodly formula, often even with special diagrams and mnemonics given for how to compute it in the 3x3 case, say.

det(A) = some horrible mess of a formula

Even relatively sophisticated people will insist that det(A) is the sum over permutations, etc. with a sign for the parity, etc. Students trapped in this way of thinking do not understand the determinant.

The right definition is that det(A) is the volume of the image of the unit cube after applying the transformation determined by A. From this alone, everything follows. One sees immediately the importance of det(A)=0, the reason why elementary operations have the corresponding determinant, why diagonal and triangular matrices have their determinants.
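In dimension two the claim is easy to test numerically: the signed area of the image of the unit square, computed by the shoelace formula, agrees with the textbook 2x2 determinant formula. A small sketch in plain Python (the matrix and names are my own):

```python
def shoelace(pts):
    """Signed area of a polygon from its vertices (shoelace formula)."""
    s = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        s += x1 * y2 - x2 * y1
    return s / 2.0

def apply_map(A, p):
    """Apply the linear map with matrix A (2x2, row-major) to the point p."""
    return (A[0][0] * p[0] + A[0][1] * p[1],
            A[1][0] * p[0] + A[1][1] * p[1])

A = [[2.0, 1.0],
     [0.5, 3.0]]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]     # unit square, counterclockwise
image = [apply_map(A, p) for p in square]     # its image: a parallelogram

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # the textbook 2x2 formula
print(shoelace(image), det)                   # 5.5 5.5
```

Feeding in an orientation-reversing map (swap the two rows of A) flips the sign of both numbers together, which is the "signed" part of signed volume.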

Even matrix multiplication, if defined by the usual formula, seems arbitrary and even crazy, without some background understanding of why the definition is that way.

The larger point here is that although the question asked about having a single wrong definition, really the problem is that a limiting perspective can infect one's entire approach to a subject. Theorems, questions, exercises, and examples, as well as definitions, can all come from an incorrect view of a subject!

Too often, (undergraduate) linear algebra is taught as a subject about static objects---matrices sitting there, having complicated formulas associated with them and complex procedures carried out with them, often for no immediately discernible reason. From this perspective, many matrix rules seem completely arbitrary.

The right way to teach and to understand linear algebra is as a fully dynamic subject. The purpose is to understand transformations of space. It is exciting! We want to stretch space, skew it, reflect it, rotate it around. How can we represent these transformations? If they are linear, then we are led to consider the action on unit basis vectors, so we are led naturally to matrices. Multiplying matrices should mean composing the transformations, and from this one derives the multiplication rules. All the usual topics in elementary linear algebra have deep connection with essentially geometric concepts connected with the corresponding transformations.
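The "multiplication is composition" point can be illustrated with rotations: composing the rotations by angles a and b must give the rotation by a+b, and the usual matrix product formula reproduces exactly that. A hedged sketch in plain Python (names are mine):

```python
import math

def rotation(theta):
    """Matrix of the plane rotation by theta, read off from the images
    of the basis vectors: e1 -> (cos t, sin t), e2 -> (-sin t, cos t)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    """The usual 2x2 product formula, derived from 'apply B, then A'."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a, b = 0.7, 1.1
lhs = matmul(rotation(a), rotation(b))   # compose the two rotations
rhs = rotation(a + b)                    # rotate by the summed angle
ok = all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))
print(ok)  # True
```

The angle-addition formulas for sine and cosine fall out of this comparison, which is one of the standard payoffs of the geometric viewpoint.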

  • 123
    "Even relatively sophisticated people will insist that det(A) is the sum over permutations..." Yes indeed. How else do you prove that SL(n) is an algebraic group? How you want to think of a determinant depends on the situation. – JS Milne Dec 09 '09 at 06:34
  • 76
    Of course you are right, and perhaps my post is a bit of a rant! I apologize. (But surely it is implicit in the question that all the equivalent formulations of a definition might find a suitable usage.) My point is that in an undergraduate linear algebra class, a computational approach to the determinant obscures its fundamental geometric meaning as a measure of volume inflation. The permutation sum definition is especially curious in an undergraduate course, because the method is not feasible (exponential time), whereas other methods, such as the LU decomposition, are polynomial time. – Joel David Hamkins Dec 10 '09 at 14:09
  • Along the same lines, the way kernel is defined in linear algebra is misleading. I know I was confused at first, but I had not yet had abstract algebra. – B. Bischof May 27 '10 at 14:09
  • 3
    @B. Bischof: I don't understand. What is the misleading way to explain the kernel, and what is a better way? – Tom Ellis May 27 '10 at 15:48
  • 6
    The problem with the dynamical perspective is that it is way harder to grasp for algebraic/combinatorial-minded people than any formula, however complicated it is. I still don't get the difference between a transformation of points and a transformation of coordinates; for me, they're all endomorphisms of a vector space. – darij grinberg May 27 '10 at 16:38
  • 7
    I must comment on the determinant: yes, it is the sum over permutations - it's an entirely combinatorial object after all. More seriously: yes, it's good to know that it's a volume, but it depends an incredible lot on what you are going to do with it. The Lindström-Gessel-Viennot Lemma, the Matrix Tree Theorem, the main theorem for the Cartier Foata Monoid, etc., all don't have that much to do with volumes... – Martin Rubey May 27 '10 at 16:51
  • 14
    By the way, there are serious difficulties with "defining" the determinant as a volume: to begin with, volumes are positive... – Victor Protsak May 28 '10 at 01:30
  • 20
    Victor, of course I mean the signed volume. And I don't think it is so difficult. One can even treat it axiomatically: inflating one dimension by a factor multiplies the volume by that factor; swapping two coordinates reverses orientation; skews do not change volume. From these principles one can derive the usual formulas, while also providing a feasible means to compute it. – Joel David Hamkins Jun 02 '10 at 21:42
  • 49
    Another way to say it "axiomatically" is that the determinant is the induced endomorphism of the top exterior power of the vector space. Of course, it probably shouldn't be defined that way in a first linear algebra course! – Mike Shulman Jul 16 '10 at 03:14
  • 1
    @Mike Shulman: This is how Bourbaki defines it. In fact, before I saw your comment, I was about to post it myself =)! – Harry Gindi Jul 17 '10 at 13:45
  • 13
    To come a bit late to the party --Joel, it seems to me that the problem with 'signed volume' is that there is no such thing; if I ask you for the signed volume of the unit cube, you have no way to answer. Instead, you must know the transformation that I applied to get that unit cube; and if the notion is intrinsic to the transformation and not to the shape, then why pretend to attach it to the shape? Then the axiomatisation that you suggest is revealed as what it is --an axiomatisation, not of volume, but rather of how to attach numbers to transformations. – LSpice Apr 11 '11 at 13:20
  • 5
    The issue of multiplicativity, I think, is equally subtle; it relies on the fact that a linear transformation distorts all volumes by an equal factor. I can imagine this being disturbing for a reasonably sophisticated undergraduate; why shouldn't a 2D linear transformation distort more an ellipse with major axis parallel to an eigenvector of large eigenvalue than an ellipse with major axis parallel to an eigenvector of small eigenvalue? – LSpice Apr 11 '11 at 13:22
  • 8
    L Spice, regarding signed volume, it is associated of course to the way the shape is described, for example by the order in which the coordinates are enumerated. The issue of your second comment is exactly what would be discussed when you define the determinant the way I suggest. By tiling the ellipse with small squares which are transformed into parallelograms, for example, students thereby gain this important insight about volume transformation, which is totally lacking in the determinant-as-ugly-formula account. Such kind of knowledge and teaching is exactly what I am advocating. – Joel David Hamkins Apr 29 '11 at 01:13
  • 52
    This reminds me of a story recounted by a friend of mine in graduate school. He spent a lot of time in the department, and one evening was approached by an undergraduate taking a fancy class that had introduced the trace of a linear transformation in the slick coordinate-free manner. This undergraduate had been tasked with computing the trace of a certain $2\times 2$ matrix and had no idea how to proceed. – Ramsey Apr 25 '12 at 04:58
  • 9
    @L Spice: "if I ask you for the signed volume of the unit cube, you have no way to answer" because signed volume is not a property of parallelepipeds; it is a property of parallelepipeds with ordered sides. – Vectornaut Apr 25 '12 at 05:52
  • 1
    @Victor Protsak: Yes, volumes are positive... but volume functions, like signed volume functions, are unique up to scaling, and every volume function is the absolute value of a signed volume function. (Of course, you have to be working over an absolute value field for this to make sense.) – Vectornaut Apr 25 '12 at 05:55
  • 1
    Indeed the sum over permutations is not really a definition but rather the solution to a problem which can be formulated in a more conceptual way and which essentially has a unique solution. In this case the (historical) conceptual definition of the determinant is as the resultant of n linear homogeneous equations in n variables. This generalizes to nonlinear equations, but then who knows what the analogue of the sum over permutations is... – Abdelmalek Abdesselam Apr 25 '12 at 14:45
  • 5
    I once taught Analysis (2 Semesters five hours per week) to students who did not have Linear Algebra parallel to it. When doing integration in $\mathbb R^n$ I introduced $|det|$ as the volume of .... and derived all properties needed for the transformation formula of integrals. It worked well. – Peter Michor Dec 29 '12 at 22:08
  • 3
    FWIW, my undergrad linear algebra course introduced the determinant exactly as JDH wants: define it as a signed volume, and then deduce how it is affected by row operations. It went fine. – Sam Clearman Apr 21 '15 at 05:03
  • 1
    This has been a popular topic on math.stackexchange: http://math.stackexchange.com/questions/668/whats-an-intuitive-way-to-think-about-the-determinant – Amritanshu Prasad Apr 21 '15 at 06:35
  • 17
    I would like to add that the definition via the transformation of the volume carries another problem: While one can probably work out most problems (signs etc.) over $\mathbb{R}$ the determinant is defined for linear transformations of vector spaces over arbitrary fields and there is no obvious way to define a unit-cube and its volume in (e.g.) $\mathbb{F}_2^n$. – Sebastian Schoennenbeck Apr 21 '15 at 07:51
  • People should know that the determinant is the (oriented) volume of the image of the unit cube, but how does one explain why the determinant should be defined in the same way when the scalars are members of a finite field? Does some idea of oriented volume of the image of the unit cube work in that case? – Michael Hardy May 17 '15 at 23:21
  • 4
    I know this is a very old post, but could you please recommend ressources that teach LA the way you describe it, with geometric justifications? – jeremy radcliff Jun 01 '15 at 16:50
  • While some books and teachers do determinants this way, others approach determinants either as a natural formula to determine singularity, or else geometrically. It depends on the text. – Jim Hefferon Jul 05 '16 at 19:45
  • This was also pointed out by V. I. Arnold. While I was a PhD student, I actually did this test on my colleagues. Surprisingly, many of my fellow students failed to explain why the determinant is defined the way it is - a messy combination of easily forgettable coefficients, $\pm$, and all that. :( – RSG Jul 06 '16 at 01:23
  • 1
    The right definition is that det(A) is the volume of the image of the unit cube after applying the transformation determined by A Thanks for this! I cannot tell you how many times I've heard "the determinant is the signed volume of a parallelepiped ..." without really thinking about what that means. – Greg Nisbet Jul 06 '16 at 05:17
  • I don't understand that definition though. I am just learning linear algebra, actually understanding this definition would surely help me. – Tomáš Zato Jan 26 '17 at 09:54
  • I am, indeed, one of those people who studied linear algebra in the more "standard" way, the way in which it is usually taught today. I am thus wondering if one could suggest a more "creative" linear algebra book, where dynamical approaches are used towards the basic and more advanced concepts. I would like to read such a book to understand linear algebra more deeply. – sequence Oct 21 '19 at 17:12
  • @MikeShulman: but how do you check that $\bigwedge^n V \neq 0$ if $n = \dim V$? It is clearly generated by $e_1 \wedge \ldots \wedge e_n$ for a choice of basis, but how do we know this element is nonzero? I would use the determinant for that (or something that basically boils down to that, e.g. the antisymmetriser on $V^{\otimes n}$ if $\operatorname{char} k = 0$ or $\operatorname{char} k > n$)... – R. van Dobben de Bruyn May 15 '20 at 01:59
  • I personally like the definition with permutations. It is very suitable for the proof of Lindstrom - Gessel - Viennot lemma, for example - and how to do this with volume? – Fedor Petrov Dec 09 '20 at 22:41
  • 1
    Clearly, the correct definition is that the determinant is any antisymmetric multilinear functional $(\mathbb{R}^{d})^d \equiv \mathbb{R}^{d\times d} \rightarrow \mathbb{R}$ that assigns the value 1 to the canonical basis $e_1, e_2, \dots, e_d$. Obviously, the uniqueness is a trivial exercise left for the reader, so this need not be specified in the definition. The canonical basis is also obvious so this too need not be specified. /s – Lars Jul 20 '21 at 19:29
171

Here's another algebra peeve of mine. The definition of a normal subgroup in terms of conjugation is pretty strange until it's explained that normal subgroups are the ones you can quotient by. Again, in my opinion I think normal subgroups should be introduced as kernels of homomorphisms from the get-go.
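The equivalence of the two descriptions can be verified on a toy example: below (an illustrative sketch, with my own helper names) the kernel of the sign homomorphism on S3 is computed and checked to be closed under conjugation.

```python
from itertools import permutations

def compose(p, q):
    """(p o q)(i) = p(q(i)); permutations as tuples of images of 0..n-1."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def sign(p):
    """Sign of a permutation (count inversions): a homomorphism to {+1,-1}."""
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

S3 = list(permutations(range(3)))
kernel = [p for p in S3 if sign(p) == 1]   # A3, the kernel of sign

# Being a kernel forces closure under conjugation by any g in S3:
closed = all(compose(compose(g, k), inverse(g)) in kernel
             for g in S3 for k in kernel)
print(closed, len(kernel))  # True 3
```

The conjugation-invariance is automatic here because sign(g k g^-1) = sign(k), which is the general argument that kernels are normal, specialized to one homomorphism.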

Qiaochu Yuan
  • 114,941
  • 70
    Many textbooks define normal subgroups before even talking about homomorphisms, which is totally bonkers in my opinion. – Steven Gubkin Dec 04 '09 at 19:44
  • 61
    Right. Unlike normal subgroups, homomorphisms are totally intuitive: they turn true equations into other true equations. That's something students have been doing their whole lives. – Qiaochu Yuan Dec 04 '09 at 21:30
  • 42
    I totally agree with this and always tell students to think of "kernel of some homomorphism" as the definition and "closed under conjugation by any element of G" as a fact that can be shown to be equivalent to it. – gowers Dec 05 '09 at 22:35
  • 44
    I agree that as soon as define "normal subgroup" you should prove that they are exactly the kernels of homomorphisms, but in some situations (e.g., algebraic groups) its hard to show that normal subgroups are kernels and in other situations (e.g., group schemes) they aren't. – JS Milne Dec 09 '09 at 06:21
  • 15
    Let us also remember that homomorphisms gained a foothold more than a century after normal subgroups. You need the idea of an abstract group in order for the quotients and homomorphism theorems to make sense (which is also the metamathematical reason behind the difficulties mentioned by JS Milne). – Victor Protsak May 28 '10 at 01:27
  • 6
    I don't entirely agree with the sentiment of this answer. See my remarks here for a different view-point: http://mathoverflow.net/questions/13089/why-do-so-many-textbooks-have-so-much-technical-detail-and-so-little-enlightenmen/15629#15629 – Emerton Sep 16 '10 at 22:23
  • 1
    In algebra class I attended we started with monoids and Cayley's embedding theorem. When we moved onto groups, it thus seemed very natural to consider left regular group action and investigate the case when the equivalence classes actually form a group. – Vít Tuček Apr 04 '11 at 11:53
  • 5
    The same thing is true of ideals. – Mikola May 20 '11 at 22:22
  • 4
    And then how would you prove the commutator subgroup is normal for students who learn about that subgroup for the first time? Or prove the cosets by a normal subgroup form a group? As Milne would say, introduce multiple points of view at the start rather than worry that one is "the" definition. – KConrad May 18 '15 at 00:37
  • 3
    The examples of horrible formulas pinpoint a basic cause of obscuring definitions: they often are algorithmic or, put differently, constructive. So the question is, whether a definition should say what things are or whether they should say how to get them. – Manfred Weis May 18 '15 at 03:12
  • 1
    I think of conjugation as a “change of perspective” – just as conjugation of matrices is looking at a linear map in a different basis. Then it’s actually not that unnatural to look at subgroups which are invariant under any change of perspective. It also gives some intuition for why they might describe global symmetries. For that I kind of like the definition of normal subgroups as being stable under conjugation. The problem, of course, is that students first need to build up such intuition. (Also, it doesn’t help much with ideals.) – k.stm May 05 '16 at 08:00
  • It's worth noting that for semigroups, "normal subgroups are kernels of homomorphisms" isn't true, because the kernel isn't even defined without a unit. In that case you don't bother defining normal subsemigroups at all and just define semigroup congruences directly.

    Now, if you specialize to groups, you can view normal subgroups as the congruence classes of the unit under some congruence.

    – saolof Oct 10 '21 at 13:09
120

In my experience, introductory algebra courses never bother to clarify the difference between the direct sum and the direct product. They coincide for a finite collection of abelian groups, which in my opinion is exactly what makes the distinction confusing.

Of course, they're quite different for infinite collections. I think students should be taught sooner rather than later that the first is the coproduct and the second is the product in $\text{Ab}$. This clarifies the constructions for non-abelian groups as well, since the direct product remains a product in $\text{Grp}$ but the coproduct is very different!
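The distinction can be made concrete in code: an element of an infinite direct product of copies of the integers is an arbitrary function on the index set, while an element of the direct sum is finitely supported, so a finite dict suffices to represent it. A sketch (the representations are my own choice):

```python
def product_element(i):
    """An element of the direct product  prod_{i in N} Z  is any function
    N -> Z; the all-ones sequence is fine here, but not in the direct sum."""
    return 1

def in_direct_sum(d):
    """A dict with nonzero integer values represents an element of the
    direct sum  sum_{i in N} Z : its support is finite by construction."""
    return all(v != 0 for v in d.values())

def add(d1, d2):
    """Componentwise addition, dropping entries that cancel to zero."""
    out = {k: d1.get(k, 0) + d2.get(k, 0) for k in set(d1) | set(d2)}
    return {k: v for k, v in out.items() if v != 0}

x = {0: 2, 5: -1}
y = {5: 1, 7: 3}
z = add(x, y)                               # the index-5 entries cancel
print(z == {0: 2, 7: 3}, in_direct_sum(z))  # True True
```

The point of the representation is that the direct sum is closed under the operations while staying finitely supported, whereas no finite dict can represent the all-ones element of the product.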

Qiaochu Yuan
  • 114,941
  • 1
    I do remember having to explain the difference to many confused people. It is quite confusing to see lecturers use the two interchangeably without mention. – Sam Derbyshire Dec 02 '09 at 17:29
  • 11
    I'll vouch for this as a guinea pig: I did learn them as separate concepts my freshman year, and learned to be careful even in the case of two spaces. That gave me a good intuition about when to mistrust finite-case defined products (e.g. the box topology, above) because they "could be defined differently" in the infinite case. I didn't need to know the categorical definitions that early on. – Elizabeth S. Q. Goodman Dec 04 '09 at 05:30
  • 32
    even in the finite case, products and coproducts differ because they are NOT only the objects, but come together with structure morphisms (as every universal object). – Martin Brandenburg May 24 '10 at 21:45
  • 2
    Indeed I think one has to grasp the distinction between direct sum and direct product to truly appreciate their isomorphy in most cases. – darij grinberg May 28 '10 at 10:05
  • 1
    Not to mention that in $Grp$ the direct sum and the coproduct are also two different things. (For finitely many summands, the direct sum is again the same as the direct products; for infinitely many summands, the direct sum is neither the product nor the coproduct, although it still has a more colimitish flavour.) – Toby Bartels Apr 04 '11 at 04:22
85

I increasingly abhor the introduction of the finite ring $Z_n$ not as $\mathbb{Z}/n\mathbb{Z}$ but as the set $\{0,\ldots,n-1\}$ with "clock arithmetic". (I understand that if you want to introduce modular arithmetic at the high school level or below, this is the way to go. I am talking about undergraduate abstract algebra textbooks that introduce the concept in this way.)

Two problems:

  1. Using clocks to motivate addition modulo $n$: excellent pedagogy. Be sure to mention military time, which goes from $0$ to $23$ instead of $1$ to $12$ twice. But...using clocks to motivate multiplication modulo $n$: WTF? Time squared?? Mod $24$??? It's the worst kind of pedagogy: something that sounds like it should make sense but actually doesn't.

    Of course soon enough you stop clowning around and explain that you just want to add/subtract/multiply the numbers and take the remainder mod $n$. This brings me to:

  2. Many texts define $Z_n$ as the set $\{0,\ldots,n-1\}$ and endow it with addition and multiplication by taking the remainder mod $n$. Then they say that this gives a ring. Now why is that? For instance, why are addition and multiplication associative operations? If you think about this for a little while, you will find that all explanations must pass through the fact that $\mathbb{Z}$ is a ring under the usual addition and multiplication and the operations on $Z_n$ are induced from those on $\mathbb{Z}$ by passing to the quotient. You don't, of course, have to use these exact words, but I do not see how you can avoid using these concepts. Thus you should be peddling the homomorphism concept from the very beginning.

    As a corollary, I'm saying: the concept of a finite ring $Z_n$ for some generic $n$ is more logically complex than that of the one infinite ring $\mathbb{Z}$ (that rules them all?). A lot of people seem, implicitly, to think that the opposite is true.
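Point 2 can be made executable: define the operations on $Z_n$ by lifting to $\mathbb{Z}$, operating there, and reducing; associativity and well-definedness are then inherited from $\mathbb{Z}$, as a brute-force check for small $n$ confirms. A minimal sketch (names are illustrative):

```python
def make_Zn(n):
    """Operations on {0,...,n-1} induced from Z by 'lift, operate, reduce'."""
    add = lambda a, b: (a + b) % n     # reduce the integer sum mod n
    mul = lambda a, b: (a * b) % n     # reduce the integer product mod n
    return add, mul

n = 6
add, mul = make_Zn(n)
Zn = range(n)

# Associativity of both operations, inherited from associativity in Z:
assoc = all(add(a, add(b, c)) == add(add(a, b), c) and
            mul(a, mul(b, c)) == mul(mul(a, b), c)
            for a in Zn for b in Zn for c in Zn)

# Well-definedness: the result depends only on residue classes, i.e. the
# reduction map Z -> Z_n respects both operations (it is a homomorphism).
homo = all((x + y) % n == add(x % n, y % n) and
           (x * y) % n == mul(x % n, y % n)
           for x in range(-10, 10) for y in range(-10, 10))
print(assoc, homo)  # True True
```

The second check is exactly the "operations induced by passing to the quotient" claim: every verification routes through arithmetic in $\mathbb{Z}$, never through clock faces.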

Pete L. Clark
  • 64,763
  • 39
    While I agree with the premise here, you can motivate multiplication mod n by performing $a$ tasks which each require $b$ hours...what time will it be at the end? Of course, this is really doing (integer) * (residue) and not (residue)*(residue), but then you have them observe that if you do your task $n$ times, $b$ is irrelevant, and remarkably, all that matters is how many times you perform the task mod n!! – Cam McLeman May 27 '10 at 16:52
  • But if they already understand thinking about it in terms of equivalence, you should probably be moving to talking about it that way, I would think. – Harry Altman May 27 '10 at 21:44
  • 33
    Also, it's probably worth noting that strictly speaking addition on a clock doesn't make much sense either; an actual clock is not the group $Z_n$, but rather the action of that group on itself. – Harry Altman May 27 '10 at 22:09
  • 19
    @Harry I'm glad someone said it! Times form an affine space. You don't add times. "3 o'clock plus 4 o'clock" means nothing. The things you add are time intervals. Time intervals are measured by stopwatches. Stopwatches with hands generally don't wrap at 12 or 24 hours. – Dan Piponi Jul 15 '10 at 22:34
  • 18
    A torsor, in other words, right? – yatima2975 Aug 18 '10 at 15:47
  • 24
    I have seen students who must have been exposed to the introduction to $Z_n$ that Pete warns of and who think that they are specifying a $\mathbb{Z}$-module homomorphism $\{0,1,2\}\rightarrow \{0,1,2,3,4,5\}$ by setting $i\mapsto i$. This to me is the ultimate reason to avoid introducing $Z_n$ as the set $\{0,\ldots,n-1\}$. – Alex B. Oct 23 '10 at 16:07
  • @ Yatima: Yes, although a less ambiguous term (at least once the context is narrowed down to algebra) is ‘heap’. – Toby Bartels Apr 04 '11 at 04:30
  • A small, somewhat related remark: I consider it a similar problem when children are taught addition of fractions using slices of pizza. This way of conceptualizing rational numbers does not lend itself well to multiplication... – Benjamin Dickman Dec 31 '13 at 14:30
  • 2
    @Cam: A very belated reply: what you suggest is very good...You are bailing out of directly defining the multiplication operation and instead defining the $\mathbb{Z}$-module structure (i.e., you are using the one ring). Once they understand that, of course the next thing you say is that -- good news! -- the quantity $(a \in \mathbb{Z}) \cdot (b \in \mathbb{Z}/n\mathbb{Z})$ depends only on the residue class of $a$ modulo $n$, so it gives a multiplication operation. This is fully consistent with what I was advocating and really not "clock multiplication" (which means nothing, I fear...). – Pete L. Clark Jul 11 '14 at 23:19
  • 2
    @Pete, I doubt that the $\{0,\dots,n-1\}$ definition should be used even at high school level. Personally I was introduced (along with many other students) to the $\mathbb{Z}/n\mathbb{Z}$ definition at the 7th grade, without saying such words as "rings" or "factorsets" of course. It was very simple and we could easily prove some basic properties, like that it is really a ring or a field for prime $n$. Can't imagine proving anything with another definition. – Anton Fetisov Jul 12 '14 at 21:35
  • 4
    It's completely maddening that the IEEE standard for the definition of "b mod n" gives a number in $\{0,1,\ldots,n-1\}$ only for $b\geq 0$. For $b<0$ it's supposed to take values in $\{1-n,\ldots,0\}$! So in order to have invariance under the finite group $\pm$ they give up the infinite group of translation invariance by $n\mathbb Z$. Just horrible. – Allen Knutson Apr 21 '15 at 19:58
  • @Allen: You've missed the point: what they really want is $n = d \cdot \operatorname{quo}(n,d) + \operatorname{rem}(n,d)$. The horrible choice of remainder follows directly from choosing to round towards zero when producing the integer quotient. –  May 18 '15 at 15:26
  • 1
    Another problem with defining $\mathbf{Z}_n = \{ 0, \ldots, n-1 \}$ is that I've seen people have trouble with the idea that, for example, $13$ names an element of $\mathbf{Z}_{10}$. –  May 18 '15 at 15:31
  • 1
    Okay, I'll agree that both IEEE definitions suck, without opining as to which one sucked first. – Allen Knutson May 19 '15 at 11:52
  • \begin{gather} \overline{a}+(\overline{b}+\overline{c})= \overline{a}+\overline{(b+c)} = \overline{a+(b+c)}= \overline{(a+b)+c}= \overline{(a+b)}+\overline{c}=(\overline{a}+ \overline{b})+\overline{c} \end{gather} – Alexey Ustinov May 27 '15 at 06:00
  • 1
    @Alexey: Yes, you are using that $a \mapsto \overline{a}$ is a group homomorphism and that addition in $\mathbb{Z}$ is associative. – Pete L. Clark May 27 '15 at 17:42
  • @Pete L. Clark No, I use only definition $\bar a+\bar b:=\overline{a+b}$ and associativity in $\mathbb{Z}$. – Alexey Ustinov May 29 '15 at 04:13
  • 1
    The most annoying thing about the notation $\mathbb{Z}_p$ is that it really means something else, the ring of $p$-adic integers. Anyone who uses that in place of $\mathbb{Z}/p\mathbb{Z}$ is a truly evil person. – Vladimir Dotsenko Jan 17 '18 at 21:57
  • 2
    I suspect that only number theorists use the convention that ${\bf Z}_p$ is the ring of p-adic integers (and I have seen it used for either the localization or its completion), and they are certainly the most vocal about it. – David Handelman Jan 18 '18 at 00:42
  • 1
    @DavidHandelman I am almost sure that the two different notations for those that you saw were $\mathbb{Z}_p$ and $\mathbb{Z}_{(p)}$. Then again, I am no number theorist, I just don't want to say $\mathbb{Z}_p$ when what one means is $\mathbb{Z}/p\mathbb{Z}$... – Vladimir Dotsenko Jan 19 '18 at 14:08
  • @AlexeyUstinov, the definition $\overline a + \overline b = \overline{a + b}$ is not a definition until one shows that it is independent of the choice of representative, and that uses the fact that addition in $\mathbb Z$ is associative. (You are also using assoc. to write $\overline{a + (b + c)} = \overline{(a + b) + c}$.) – LSpice Dec 09 '20 at 23:02
78

A simple example is the two definitions for independence of events:

  1. A and B are independent iff $P(A\cap B) = P(A)P(B)$
  2. A is independent from B iff $P(A\mid B) = P(A)$

Some presentations start with Definition 1, which is entirely uninformative: nothing in it explains why on earth we bother discussing this. In contrast, Definition 2 says exactly what "independent" means: knowing that B has occurred does not change the probability that A occurs as well.

A reasonable introduction to the subject should start with Definition 2; then observe there is an issue when P(B)=0, and resolve it; then observe independence is symmetric; then derive Definition 1.
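For concreteness, here is a toy check (the events are my own choice, not from the answer) where the two definitions agree, computed exactly on the sample space of two fair dice:

```python
# Toy check on two fair dice: A = "first die is even", B = "sum is 7".
# Both definitions agree here: P(A∩B) = P(A)P(B) and P(A|B) = P(A).
from fractions import Fraction

omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] % 2 == 0
B = lambda w: w[0] + w[1] == 7
AB = lambda w: A(w) and B(w)

# Definition 1: P(A ∩ B) == P(A) * P(B)
assert prob(AB) == prob(A) * prob(B)

# Definition 2: P(A | B) == P(A)   (well-defined here since P(B) > 0)
assert prob(AB) / prob(B) == prob(A)

print(prob(A), prob(B), prob(AB))  # 1/2 1/6 1/12
```

The exact arithmetic makes the equivalence transparent; when $P(B)=0$ only Definition 1 survives unchanged, which is precisely the issue discussed in the comments.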

  • 6
    Resolve it how, short of a serious digression into conditional expectation? – Jeff Hussmann May 27 '10 at 20:02
  • 24
    One cannot "resolve" this. If B has probability zero, any conditional probability is admissible. I think a natural approach would be showing that 2. implies 1. when P(B)>0 and then abstract and generalize to 1. The second definition is simply not workable. – Michael Greinecker Jul 15 '10 at 20:33
  • 1
    My understanding is that one can make sense of conditioning on a "probability-zero event" as long as one specifies how that event is to be approached as a limit of nonzero-probability events. – Mike Shulman Jul 16 '10 at 03:18
  • 4
    @Mike: That works, but you can only do it while forgetting about sigma-algebras in nice enough spaces. See S. M. Samuels, The Radon-Nikodym Theorem as a Theorem in Probability. http://jstor.org/pss/2321055 I think "nice enough" is "Borel" but I can't verify that at this computer. – Neil Toronto Apr 11 '11 at 15:38
  • 1
    Continuing @Mike: I think Michael Greinecker's approach is the one I'd take for non-math majors. For math majors and suitably motivated students from other areas, I'd love to teach conditioning on zero-probability events as a limit, then generalize to abstract measurable spaces. – Neil Toronto Apr 11 '11 at 15:39
  • 2
    Alternatively, if you take 2 to be the definition; you must conclude that impossible events are not independent. This actually makes sense, because it is really the same thing as the catastrophe that any false statement implies everything else. So if an impossible thing happened, then it would naturally follow that everything else would happen and so they are not logically (or probabilistically) independent. – Mikola May 21 '11 at 01:05
  • 51
    @MichaelGreinecker, irrespective of its mathematical correctness, the second definition is the right way of motivating the whole thing. We mathematicians need to stop trying to present everything perfect the first-time around. Give the reader a working definition, discuss its problems, and use that to motivate the "proper" definition. That is how you engage the reader. – goblin GONE Jul 12 '14 at 09:01
  • 13
    I'm going to have to object to this; the first definition is rather natural too. If X and Y are independent random variables, that means that the random variable $(X,Y)$ behaves in the obvious way; the possible outcomes are given simply by pairing an outcome for $X$ (weighted by the probability that $X=x$) with an outcome for $Y$ (weighted by the probability that $Y=y$). –  May 18 '15 at 14:50
  • 3
    @Mikola, but thinking of a probability-$0$ event as impossible is a very bad idea for many probability spaces (like the unit interval)! – LSpice Jan 17 '18 at 23:55
74

One that I particularly dislike is the definition of an action of a group G on a set X as being a function $f:G\times X\rightarrow X$ that satisfies certain properties. I cannot understand why anybody gives this definition when "homomorphism from G to the group of permutations of X" is not only easier to understand but is also how one thinks about group actions later.
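The equivalence of the two packagings is just currying; a minimal sketch (the group, set, and names are mine) with $G = \mathbb{Z}/3$ acting on a six-element set:

```python
# Two equivalent packagings of a group action: a map f : G × X → X versus a
# homomorphism G → Sym(X).  Here G = Z/3 shifts the first three points of X.
G = [0, 1, 2]                      # Z/3 under addition mod 3
X = [0, 1, 2, 3, 4, 5]

def f(g, x):                       # the "f : G × X → X" form
    return (x + g) % 3 if x < 3 else x

def phi(g):                        # currying f gives the homomorphism form:
    return tuple(f(g, x) for x in X)   # phi(g) is a permutation, as a tuple of images

def compose(p, q):                 # (p ∘ q)(x) = p(q(x))
    return tuple(p[q[x]] for x in X)

# phi is a homomorphism: phi(g + h) = phi(g) ∘ phi(h) ...
for g in G:
    for h in G:
        assert phi((g + h) % 3) == compose(phi(g), phi(h))

# ... which is exactly the action axioms for f: f(0,x) = x, f(g, f(h,x)) = f(g+h, x)
assert all(f(0, x) == x for x in X)
```

The code makes Cam McLeman's point below concrete too: `f` is what you compute with, while `phi` is how you reason about the action structurally.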

gowers
  • 28,729
  • 99
    Why? They are really different forms of the same definition, and so might as well be given together. In some situations only the $G\times X\to X$ definition makes sense, for example, for an algebraic group G acting on a variety X (the automorphism group of X isn't an algebraic group). In other situations it's easier: when $G$ and $X$ have topologies, it's easier to say that $G\times X\to X$ is continuous than to first define a topology on Aut(X). – JS Milne Dec 09 '09 at 06:29
  • 35
    @gowers: Interesting, I think of the "$f$" version as the more natural of the two. What's an action? You take a $g\in G$ and an $x\in X$, and you get a new $x'\in X$. That's precisely encoded by f. Taking in a $g\in G$ and outputting "a function which sends $x$'s to $x'$'s" seems to me to obfuscate the matter. – Cam McLeman May 27 '10 at 16:41
  • 4
    For example, when you think of representations of $S_3$, say, do you think about the homomorphism of $S_3$ to $GL_2(\mathbb{R})$, or the various ways that permutation elements can move around points in $\mathbb{R}^2$? – Cam McLeman May 27 '10 at 16:43
  • 6
    Also the definition of a torsor via $G \times X \rightarrow X \times X$ needs the version you dislike... – Peter Arndt May 27 '10 at 19:28
  • 3
    @JS Milne: Moreover, in representation theory, where $X$ is a possibly infinite-dimensional topological vector space and the action is linear, there are several inequivalent topologies on $Aut(X)$, with perhaps the most familiar one being uniform (norm topology when $X$ is a Banach space). But the action is a (topological) representation if it is continuous in the $\mathit{strong}$ topology, i.e. $G\times X\to X$ is continuous. – Victor Protsak May 28 '10 at 01:50
  • 4
    I suspect the main reason for not using $G \to Aut(X)$ as the formalism for giving the definition of an action is that when $X$ has extra structure, giving the appropriate definition of $Aut(X)$ can require either a fair bit of maturity or background information. Say when $X$ is a smooth manifold, for example. – Ryan Budney Jun 19 '10 at 18:17
  • I agree very much with this. I also wonder why people define a module as an abelian group $M$ with a map $R \times M \to M$. I actually once got completely lost in a course on modules (at Canada/USA Mathcamp) when the teacher defined modules this way, rather than just saying that modules are things where you can multiply by scalars in a ring and which satisfy certain properties. – David Corwin Jul 15 '10 at 19:18
  • 29
    @ David: What's the difference between ‘you can multiply by scalars in a ring’ and ‘with a map $R \times M \to M$’? – Toby Bartels Apr 04 '11 at 04:33
  • 13
    One place that you must think about it this way is in the theory of Poisson actions. There, $X$ is a Poisson manifold, and you could consider the group $Aut(X)$ of ichthyomorphisms of it, but unless $G$ has a trivial Poisson structure the action of $G$ on $X$ is not by ichthyomorphisms. That is, each $g\in G$ does not preserve $X$'s Poisson structure. This is reflected in the fact that $g\in G$ is usually not a Poisson submanifold. All that one has is the map $G\times X \to X$. – Allen Knutson Mar 15 '12 at 13:57
  • 12
    @AllenKnutson I like "ichthyomorphism"; is it your invention? Google finds only a few occurrences of this word, and the only one that looks mathematical is this MO page. – Andreas Blass Apr 21 '15 at 14:55
  • 5
    I believe this terminology is due to Souriau. – Allen Knutson Apr 21 '15 at 19:49
  • 4
    "ichthyomorphism" : hilarious on the one hand but sadly obfuscating on the other.. – Jérôme JEAN-CHARLES May 17 '16 at 23:58
59

I normally won't bother with a 5 month old community wiki, but someone else bumped it and I couldn't help but notice that the significant majority of the examples are highly algebraic. I wouldn't want the casual reader to go away with the impression that everything is defined correctly all the time in analysis and geometry, so here we go...

1) "A smooth structure on a manifold is an equivalence class of atlases..." Aside from the fact that one hardly ever works directly with an explicit example of an atlas (apart from important counter-examples like stereographic projections on spheres and homogeneous coordinates on projective space), this point of view seems to obscure two important features of a smooth structure. First, the real point of a smooth structure is to produce a notion of smooth functions, and the definition should reflect that focus. With the atlas definition, one has to prove that a function which is smooth in one atlas is also smooth in any equivalent atlas (not exactly difficult, but still an irritating and largely irrelevant chore). Second, it should be clear from the definition that smoothness is really a local condition (the fact that there are global obstructions to every point being a "smooth" point is of course interesting, but also not the point). The solution to both problems is to invoke some version of the locally ringed space formalism from the get-go. Yes, it takes some work on the part of the instructor and the students, but I and a number of my peers are living proof that geometry can be taught that way to second year undergraduates. If you still don't believe there are any benefits, try the following exercise. Sit down and write out a complete proof that the quotient of a manifold by a free and properly discontinuous group action has a canonical smooth structure using (a) the maximal atlas definition and (b) the locally ringed space definition.

2) "A tangent vector on a manifold is a point derivation..." While there are absolutely a lot of advantages to having this point of view around (not the least of which is that it is a better definition in algebraic geometry), I believe that this is misleading as a definition. Indeed, the key property that a good definition should have in my opinion is an emphasis on the close relationship between tangent vectors and smooth curves. Note that such a definition is bound to involve equivalence classes of smooth curves having the same derivative at a given point, and the notion of the derivative of a smooth curve is defined by composing with a smooth function. So for those who really like point derivations, they aren't far behind. There just needs to be some mention of curves, which in many ways are really what give differential geometry its unique flavor.

3) The notion of amenability in geometric group theory particularly lends itself to misleading definitions. I think there are two reasons. The first is that modulo some mild exaggeration basically every property shared by all amenable groups is equivalent to the definition. The second is that amenability comes up in so many different contexts that it is probably impossible to say there is one and only one "right" definition. Every definition is useful for some purposes and not useful for others. For example the definition involving left invariant means is probably most useful to geometric group theorists while the definition involving the topological properties of the regular representation in the dual is probably more relevant to representation theorists. All that being said, I think I can confidently say that there are "wrong" definitions. For example, I spent about a year of my life thinking that the right definition of amenability for a group is that its reduced group C* algebra and its full group C* algebra are the same.
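One of the equivalent characterizations of amenability, via Følner sequences, at least admits a completely hands-on numerical illustration (this example, for $\mathbb{Z}$, is my own addition):

```python
# Z is amenable: one equivalent characterization asks for a Følner sequence,
# i.e. finite sets F_n with |(F_n + g) Δ F_n| / |F_n| → 0 for each g in the
# group.  The intervals F_n = {0, ..., n-1} do the job.

def folner_ratio(n, g=1):
    F = set(range(n))
    shifted = {x + g for x in F}
    return len(F ^ shifted) / len(F)   # normalized symmetric difference

ratios = [folner_ratio(n) for n in (10, 100, 1000)]
print(ratios)  # [0.2, 0.02, 0.002]
```

The same computation fails for a free group, where no sequence of finite sets is almost invariant; that contrast is part of why the Følner formulation is popular in geometric group theory.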

4) Some functional analysis books have really bad definitions of weak topologies, involving specifying certain bases of open sets. This point of view can be useful for proving certain lemmas and working with some examples, but given the plethora of weak topologies in analysis these books should really give an abstract definition of weak topologies relative to any given family of functions and from then on specify the topology by specifying the relevant family of functions.

I'm sure I could go on and on, but these four have proven to be particularly difficult and frustrating for me.

Paul Siegel
  • 28,772
  • 17
    I REALLY want to read a differential geometry course based on locally ringed spaces. Do you have one? – darij grinberg May 27 '10 at 16:37
  • 13
    Sheaves on Manifolds by Kashiwara is the only book length treatment of differential geometry from this point of view that I know, but it is far from an introductory text. The course I referred to in my answer was taught by Brian Conrad several years ago, and he still has lots of useful handouts on his web page from that course. Other than that, I can't help you. :( – Paul Siegel May 27 '10 at 17:43
  • 2
    I think the one geometric group theorist I've talked about this with considered the existence of a Følner sequence to be the right definition... – Harry Altman May 27 '10 at 21:48
  • On the other hand, I think the atlas definition is more amenable to the intuition that a manifold is a space which looks locally like $\mathbb{R}^n$. I agree that the sheaf definition might be nicer once one has fully developed this intuition, but I'm not entirely sure it's best to begin with.

    However, I see the argument that what we even mean by "looking locally like $\mathbb{R}^n$" is that locally we can define the same functions on it as differentiable functions in a neighborhood of $\mathbb{R}^n$. I realized this difficulty when explaining Riemann surfaces to those who didn't know CA.

    – David Corwin Jul 15 '10 at 19:29
  • 1
    Related to manifolds, I read once Arnold's On teaching Mathematics (http://pauli.uni-muenster.de/~munsteg/arnold.html). There he suggests that, after Whitney's Theorem, the good and intuitive definition for a manifold would be that of a submanifold of $R^N$.

    Surely, his speech is a little exaggerated, but there one can find good examples of misleading definitions. Probably, it is true that Bourbaki led us to give bad and unintuitive definitions in Mathematics.

    @Harry Altman: I think that the definition via Følner sequences is a very good one only when dealing with discrete groups.

    – Just a mathematician Jul 15 '10 at 20:21
  • Regarding amenability: I would not be so cavalier, myself. (How about a "definition" in terms of vanishing Ext, or "when are invariant subspaces reducing", a la Helemskii?) I don't think the problem is misleading definitions: I think the problem - but also the worthwhileness, somehow - is the different facets of the concept, which depending on the problem at hand can be phrased in different ways – Yemon Choi Jul 15 '10 at 21:43
  • 5
    Also, this should really be four separate answers, for the purposes of this kind of community-wiki big list question – Yemon Choi Jul 16 '10 at 05:19
  • 1
    I think amenability should be defined by either Følner sequences or invariant means. As an operator algebraist, I like the C$^{*}$ and W$^{*}$ variants as well, but I recall these more as really useful facts to remember. I like the Følner sequence definition: it's the most intuitive and hands-on (how do you find an invariant mean? usually it's not constructive), and for many approximation and dynamical purposes (entropy) it's nicest to use. The nice thing about the almost invariant vectors/pdf definition is it has many generalizations the others don't (Haagerup/weakly amenable) – Benjamin Hayes May 08 '11 at 08:26
  • 10
    The definition of tangent spaces via curves has one, very substantial, disadvantage: it is not clear that the so-defined tangent space is a vector space. You can define an addition in charts and show that it is well-defined, but that looks, unfortunately, not very natural. – Johannes Ebert Apr 25 '12 at 08:39
  • 1
    For addition of curves modulo the-derivative-at-zero-of-the-composite-with-any-smooth-function-is-the-same, you define the relation u + v = w to mean that this equation holds when you compose with any smooth function and take the derivative at zero. Then you have to prove that sums are unique when they exist (easy) and that sums always exist (harder, needs passage to charts, fails in generalizations like Frölicher spaces). (Also, scalar multiplication, but that's easy.) So the charts are there, but at least they're in the proof rather than in the definition. – Toby Bartels Feb 12 '14 at 22:59
  • 1
    Sheaves on Manifolds is by Kashiwara and Schapira. – Geordie Williamson Feb 24 '16 at 17:42
55

One of my biggest annoyances is professors or books which fail to adequately distinguish between prime and irreducible elements of a ring, Herstein if I remember correctly being a (ha ha) prime example of this. The fact that these are the same in Z, where people first learn about unique factorization, doesn't help matters.
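The classic witness to the distinction lives in $\mathbb{Z}[\sqrt{-5}]$, and a quick computational sketch (my own encoding of $a+b\sqrt{-5}$ as an integer pair) confirms it: $2$ is irreducible there but not prime, since $2$ divides $6=(1+\sqrt{-5})(1-\sqrt{-5})$ yet divides neither factor.

```python
# In Z[√-5], the element 2 is irreducible but not prime.
# Elements a + b√-5 are encoded as pairs (a, b).

def mul(x, y):
    a, b = x; c, d = y
    return (a * c - 5 * b * d, a * d + b * c)

def norm(x):
    a, b = x
    return a * a + 5 * b * b        # multiplicative: N(xy) = N(x)N(y)

def divides(x, y):                  # does x divide y in Z[√-5]?
    a, b = x
    n = norm(x)
    # candidate quotient is y * conj(x) / N(x), with conj(a, b) = (a, -b);
    # x | y iff both coordinates are divisible by N(x)
    c, d = mul(y, (a, -b))
    return n != 0 and c % n == 0 and d % n == 0

u, v = (1, 1), (1, -1)              # 1 ± √-5
assert mul(u, v) == (6, 0)          # their product is 6
assert divides((2, 0), (6, 0))      # 2 | 6
assert not divides((2, 0), u)       # but 2 ∤ 1+√-5
assert not divides((2, 0), v)       # and 2 ∤ 1-√-5  →  2 is not prime

# 2 is irreducible: a proper factorization would need an element of norm 2,
# and a^2 + 5b^2 = 2 forces b = 0, a^2 = 2 — impossible over Z.
assert all(norm((a, b)) != 2 for a in range(-2, 3) for b in range(-2, 3))
```

In $\mathbb{Z}$ the two notions coincide, which is exactly why students who only ever see $\mathbb{Z}$ never notice the difference.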

Zev Chonoles
  • 6,722
51

My biggest issue is with the coordinate-definition of tensor products. A physicist defines a rank $k$ tensor over a vector space $V$ of dimension $n$ to be an array of $n^k$ scalars associated to each basis of $V$ which satisfy certain transformation rules; in particular, if we know the array for a given basis, we can automatically determine it for a different basis. Another way to say this is that the space of tensors is the set of pairs consisting of a basis and an $n^k$ array of scalars, identified by an equivalence relation which gives the coordinate transformation law. For some strange reason, people seem to call this a coordinate-free definition. While it is in a sense coordinate-free (the transformation between coordinates lets you break free of coordinates in a sense), it is very confusing at first sight. People who use this definition will then say that certain operations are coordinate-free. What they mean by this, and it took me a long time to figure this out, is that you can do a certain algebraic operation to the coordinates of the tensor, and the formula is the same no matter which basis you work with (e.g., multiplying a covariant rank $1$ tensor with a contravariant rank $1$ tensor to get a scalar, or exterior differentiation of differential forms, or multiplying two vectors to get a rank $2$ tensor).

The much nicer definition uses tensor products. This is a coordinate-free construction, as opposed to the coordinate-full description given above. This definition is nice because it connects to multilinear maps (in particular, it has a nice universal property). It also helped me see why tensors are different from elements of some $n^k$-dimensional vector space over the same field (they are special because we are equipped not just with a vector space but with a multilinear map from $V \times \cdots \times V \to V \otimes \cdots \otimes V$). The covariant/contravariant distinction can be explained in terms of functionals. This allows you to talk about contraction of tensors without having to prove that it is coordinate-invariant! Finally, once you have all that under your belt, you can easily derive the coordinate transformation laws from the multilinearity of $\otimes$.
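That last point can be made concrete: for a rank-2 covariant tensor (a bilinear form), the "transformation rule" is forced by bilinearity rather than postulated. A small numpy sketch (the example matrices are mine):

```python
# For a bilinear form B on V, changing basis by a matrix P changes the
# component array as P^T B P — this follows from bilinearity alone:
#   B'(e'_k, e'_l) = B(sum_i P[i,k] e_i, sum_j P[j,l] e_j)
#                  = sum_{i,j} P[i,k] P[j,l] B(e_i, e_j).
import numpy as np

rng = np.random.default_rng(0)
B = rng.integers(-3, 4, size=(2, 2)).astype(float)   # components in basis e
P = np.array([[1.0, 1.0], [0.0, 1.0]])               # change of basis: e'_j = sum_i P[i,j] e_i

# expand by bilinearity, summing over the old indices i, j:
B_new = np.einsum("ik,jl,ij->kl", P, P, B)

assert np.allclose(B_new, P.T @ B @ P)   # the familiar "covariant rank-2" law
```

With the coordinate definition this identity is an axiom; with the tensor-product definition it is a two-line computation, which is the whole point of the answer above.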

David Corwin
  • 15,078
  • 45
    When I look at physics texts on tensors (even mathematically literate and careful texts like Frankel) I wonder how ANYONE understands the monstrosity they present tensors as: formulas that transform by indices that raise and lower by certain rules. No wonder only geniuses understood relativity theory before mathematicians began cleaning it up. – The Mathemagician Jul 15 '10 at 22:13
  • 3
    I'm now satisfied to know that my physics TA also agrees with me about physicists' approach to tensors. – David Corwin Dec 19 '10 at 05:43
  • 20
    I think this is outdated. Many physicists learn about tensors in a course on General Relativity and one of the standard textbooks is Wald, "General Relativity". It defines tensors in terms of multilinear maps, not as a collection of scalars obeying certain transformation rules. The same is true of Carroll, "Spacetime and Geometry." Most theoretical physicists in this day and age understand this view of tensors. – Jeff Harvey Dec 23 '10 at 15:41
  • 8
    @Jeff Harvey: When I was doing my bachelor's and master's in physics (in the '00s), I never got the impression that "most theoretical physicists in this day and age understand [the coordinate-free] view of tensors." Maybe it depends on subfield / institution? Certainly I met a lot of people who did understand the coordinate-free view, but I also met a lot of people who appeared not to. This often made life difficult for me, because I have trouble understanding the coordinate-full view, and I had a very hard time getting people to help me translate things into coordinate-free language. – Vectornaut Apr 25 '12 at 06:44
  • I've never heard this definition (the bad one) described as "coordinate-free"; I've only ever heard that term applied to the good one. – Toby Bartels Oct 06 '14 at 06:04
  • 8
    A justification for the bad definition is that physicists also sometimes deal with arrays of scalars that vary with the basis but according to some other transformation rule. So you can say: this quantity is a tensor, that one is not. An example is Christoffel symbols. Yes, these should be understood as the coordinates of a connection, but it took a while for that perspective to develop. And some people might still be thinking: who knows what transformation rules we might see next; we must remain flexible. – Toby Bartels Oct 06 '14 at 06:07
  • Gromov generalized the physicist's definition to define geometric structures on manifolds. So he seems to like their weird point of view. – Ben McKay Apr 21 '15 at 17:23
38

I'd say the standard definition of singular homology is pretty bad.

It's a historical relic in some sense -- topologists were so concerned by naturality, whether manifolds have combinatorially distinct triangulations and issues such as that, that they decided those preoccupations were more important than imparting a solid foundational intuition as to what a homology class is.

In my experience, people who see Poincare's proof of Poincare duality first vs. the people who see a singular homology exposition usually have a far better command of what is actually going on, to the point where they view Poincare duality as something light and natural, while most students that see it through the eyes of singular homology more often see it as something distant and intractable.

And all that effort is for what? So students can know Poincare duality is true on topological manifolds, when all the examples they've seen are smooth manifolds.

edit: my preferred way to describe Poincare's proof is to modernize it a tad. Your set-up is a triangulated manifold $M$, then you construct the dual polyhedral decomposition (a CW-decomposition) so that the (simplicial) $i$-cells of $M$ are in bijective correspondence with the (dual polyhedral) $(m-i)$-cells of $M$. This is much more straightforward than living in the simplicial world. Then you show that (up to a sign change) the chain complex for the simplicial homology is the chain complex for the cohomology of the dual polyhedral decomposition. The fussiest bit is keeping track of the orientations in the orientable case.

Ryan Budney
  • 43,013
  • 9
    I don't really understand, what you want to change concretely. What definition of homology do you prefer? In my opinion, your way of presenting Poincare duality is indeed more intuitive (so it is surely not wrong to give the students an idea of it), but has at least 3 disadvantages: 1) You first have to prove that smooth manifolds can be triangulated. 2) You have to show that the isomorphism does not depend on the choices (in some sense). 3) These ideas do not generalize well to other situations like more sophisticated dualities or the Thom iso (I think). – Lennart Meier May 27 '10 at 15:33
  • 7
    Another good intuitive proof of Poincare duality (in the sense of equality of Betti numbers) is via Morse theory: replace $f$ with $-f.$ – Victor Protsak May 28 '10 at 01:39
  • 6
    @Meier, Re (1) proving that manifolds have triangulations is at least as fundamental as any homology or fundamental group construction with manifolds so this seems totally natural to me. (2) depends on what applications you're interested in. After Poincare duality is set up properly there are many alternative formulations you can give it -- once there is a firm foundation in place. Re (3), the search for generality is essentially the complete opposite point of my post. To a student there's little point generalizing something for which there's little initial grasp. – Ryan Budney May 28 '10 at 22:53
  • 3
    @Victor, actually using the replace f by −f trick you see than on an oriented manifold the Morse complex is isomorphic to its dual and an orientation is required to construct the map. Depending on whether you want to carefully give the construction of the Morse complex or prove the existence of a triangulation both methods give a concrete picture of the dual cocycle but require a fair amount of geometric work. – Tom Mrowka Apr 04 '11 at 13:24
33

Another simple example is the definition for equivalence relations:

  1. R(.,.) is an equivalence relation iff R is reflexive, symmetric, and transitive.
  2. R(.,.) is an equivalence relation iff there exists a function f such that R(a,b) iff f(a)=f(b).

Most presentations start with Definition 1, which contains no hint as to why we bother discussing such relations or why we call them "equivalences". In contrast, Definition 2 (along with a couple of examples) immediately tells you that R captures one particular attribute of the elements of the domain; and, since elements with the same value for this attribute are called "equivalent", R is called an "equivalence".

A reasonable introduction should start with Definition 2, then go on to prove Definition 1 is a convenient alternative characterization.
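The derivation in that direction is a one-liner in each case; here is a sketch (the set and the function are my own toy choices) checking that any relation of the form $R(a,b) \iff f(a)=f(b)$ automatically satisfies Definition 1, with equivalence classes the fibers of $f$:

```python
# Definition 2 on a small set: R(a, b) iff f(a) == f(b) is automatically
# reflexive, symmetric, and transitive (Definition 1).
S = range(10)
f = lambda x: x % 3                 # the "attribute" being compared
R = lambda a, b: f(a) == f(b)

assert all(R(a, a) for a in S)                                   # reflexive
assert all(R(b, a) for a in S for b in S if R(a, b))             # symmetric
assert all(R(a, c) for a in S for b in S for c in S
           if R(a, b) and R(b, c))                               # transitive

# the equivalence classes are exactly the fibers of f:
classes = {frozenset(x for x in S if R(x, a)) for a in S}
print(sorted(sorted(c) for c in classes))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

The converse direction (Definition 1 implies Definition 2) is where the real content lives: one must take $f$ to be the quotient map onto the set of equivalence classes, which is the subtlety raised in the comments below.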

  • 18
    I've never actually seen the second definition explicitly, although I've used it implicitly often enough. I don't completely see how that's a clearer exposition, though. – Cory Knapp Dec 06 '09 at 12:15
  • 10
    A function is a great way to capture the intuitive meaning of "some property that we want to be the same." The second definition also doesn't require introducing three new concepts. – Qiaochu Yuan Dec 06 '09 at 17:00
  • 3
    It's true, the second definition doesn't require introducing the concepts of a reflexive, symmetric, or transitive relation. But I think it is still useful to give Definition 1 eventually; after all, there are plenty of relations that are transitive (for example) without being equivalences. – Gabe Cunningham Dec 06 '09 at 22:58
  • 82
    When equivalence relations are introduced, it is usually shown that giving an equivalence relation on a set is the same as giving a partition of the set. This seems a little more natural than your (2). – JS Milne Dec 09 '09 at 06:37
  • I first saw this as Definition 2.1 in http://courses.csail.mit.edu/6.042/fall05/ln4.pdf . – Christos Dec 17 '09 at 06:26
  • @Christos: As previously commented there is the 3. about partitions of a set. There are cases where you better use def 1 to prove there is one. In fact we bother discussing many kinds of relations. Before introducing equivalence relations, you can take some time to define strict and inclusive order relations, which are antisymmetric. It prepares you to the fact that all these concepts are related. And is a good preparation for lattice theory. – ogerard May 29 '10 at 05:53
  • 30
    I always thought (1) was very nice and intuitive, after all, it says "an equivalence relation is a relation that behaves like =", which for undergrads is a nice introduction to the idea that one might care about other kinds of similarity than equality. – Ketil Tveiten Jun 02 '10 at 13:08
  • 2
    Although I like Definition 2, it's important to realise that it's weaker than Definition 1 in many categories, and it's Definition 1 that people mean when they talk about an ‘equivalence relation in’ an arbitrary category $C$. – Toby Bartels Apr 04 '11 at 04:38
  • 3
    As an example where Def 2 fails, equinumerosity of two sets is an equivalence relation. However, absent Axiom of Choice, two equinumerous sets do not have the same "size" – David Harris Apr 04 '11 at 11:51
  • Ketil, I think that Definition 2 probably makes a much stronger case (for people who aren't used to abstracting down to axioms) than Definition 1 for why equivalence relations can be said to ‘behave like’ equality! – LSpice Apr 11 '11 at 14:07
  • Actually, I've always thought that 1 was the more fundamental definition, and that functions were specific ways to `categorify' equivalence relations in specific circumstances. – Mikola May 21 '11 at 01:11
  • 32
    Definition 2 has the downside that it isn't intrinsic; you have to specify a codomain for the function $f$, and then you have to decide how to define $f$. For example, consider the set of measurable functions on $[0,1]$, with $R(g,h)$ iff $g=h$ almost everywhere. I can't off the top of my head figure out how to define $f$, or even what its codomain should be (other than "the set of equivalence classes", which begs the question). – Nate Eldredge Apr 25 '12 at 15:59
  • @Nate: I'm not an analyst so I might be making a mistake, but if $L$ and $B$ are the sets of measurable functions and bounded measurable functions on $[0,1]$, then one could define $F: L \to \{f: B \to \mathbb R\}$ by defining $(F(g))(h) = \int_0^1 g(x)h(x) dx$. If I understand right then $F(g) = F(h)$ iff $g$ and $h$ only differ on a set of measure zero. Of course, this may or may not be natural, but I think in some contexts definition (2) can be more natural than (1). – Peter Samuelson May 21 '13 at 12:43
  • 4
    I have to agree with the comment of @Ketil, and take it even further. I've been growing to feel like setoid (a set equipped with an equivalence relation, a.k.a. preordered sets where the preorder relation is invertible) is a more natural notion than set, and particularly moreso than a set of equivalence classes. There are just way too many things that are most naturally worked with in the form "Here are names for the elements. Here are pairs of names we consider to be the same." –  May 18 '15 at 15:14
24

One often sees the cumulants of a probability distribution defined by saying the cumulant-generating function is the logarithm of the moment-generating function: $$ \sum_{n=1}^\infty \kappa_n \frac {t^n}{n!} = \log \sum_{n=0}^\infty \operatorname{E}(X^n) \frac{t^n}{n!} = \log\operatorname{E}\left( e^{tX} \right). $$ This fails to explain one of the basic motivations behind such a concept as the cumulants of a probability distribution.

The variance $\operatorname{var}(X) = \operatorname{E}\left( (X - \operatorname{E}(X))^2 \right)$ is simultaneously

  • $2$nd-degree homogeneous: $\operatorname{var}(cX)=c^2\operatorname{var}(X)$;
  • translation-invariant: $\operatorname{var}(c+X) = \operatorname{var}(X)$;
  • cumulative: $\operatorname{var}(X_1+\cdots+X_n) = \operatorname{var}(X_1)+\cdots+\operatorname{var}(X_n)$ if $X_1,\ldots,X_n$ are independent.

The higher-degree central moments also enjoy the first two properties (with the appropriate degree of homogeneity in each case), but the third property fails for $4$th and higher-degree central moments. (That it works for the $3$rd-degree central moment has been known to surprise people. It's trivial to prove it.)

All of the cumulants have the three properties above (with the degree of homogeneity equal to the degree of the cumulant).

For example:

$$\text{4th cumulant} = \Big(\text{4th central moment}\Big) - 3 \cdot \Big( \text{variance}\Big)^2.$$

This is $4$th-degree homogeneous, translation-invariant, and cumulative.

Each cumulant above the $1$st degree is the unique polynomial in the central moments having those three properties and for which the coefficient of the $n$th-degree central moment in the $n$th cumulant is $1$.

Is this not a more intuitive and motivating characterization of the cumulants than is the "definition" that speaks of the logarithm of the moment-generating function?
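As a quick concrete check of the cumulative property, here is a small Python script (my own illustration, not part of the answer) that computes the $4$th cumulant of finite discrete distributions via the central-moment formula above and verifies additivity for a sum of independent variables:

```python
from itertools import product

def mean(dist):
    # dist is a list of (value, probability) pairs
    return sum(p * v for v, p in dist)

def central_moment(dist, k):
    m = mean(dist)
    return sum(p * (v - m) ** k for v, p in dist)

def fourth_cumulant(dist):
    # 4th cumulant = 4th central moment - 3 * variance^2
    return central_moment(dist, 4) - 3 * central_moment(dist, 2) ** 2

def independent_sum(d1, d2):
    # distribution of X + Y for independent X ~ d1, Y ~ d2
    out = {}
    for (v1, p1), (v2, p2) in product(d1, d2):
        out[v1 + v2] = out.get(v1 + v2, 0.0) + p1 * p2
    return list(out.items())

coin = [(0, 0.5), (1, 0.5)]
die = [(k, 1 / 6) for k in range(1, 7)]
s = independent_sum(coin, die)

# cumulative: kappa_4(X + Y) = kappa_4(X) + kappa_4(Y) for independent X, Y
print(fourth_cumulant(s), fourth_cumulant(coin) + fourth_cumulant(die))
```

The same check fails for the raw $4$th central moment, which is exactly the point of the answer.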

Michael Hardy
  • 11,922
  • 11
  • 81
  • 119
24

A function is a collection of ordered pairs such that ...

Gerald Edgar
  • 40,238
  • 54
    Really? I think the alternate definition is much more misleading: a function is a rule... by which most students immediately think "algebraic formula." – Qiaochu Yuan Dec 05 '09 at 02:05
  • 1
    When I was teaching (in the UK) a first year course recently, I defined a function as a "rule" for assigning something from the domain to something in the codomain. I then tried to give some examples throughout the course to suggest that "rule" didn't mean "formula". The abstract definition is fine in a first course in set theory, say, but at the point when my students first need to think about functions, it's waaaaay too abstract! – Matthew Daws Dec 05 '09 at 15:46
  • 35
    I would almost prefer not even to say what a function is at all. I'd just say that if f is a function from A to B and x is an element of A then f(x) is an element of B. And that's all you need to know.

    Of course, I'm exaggerating a bit, and this point of view is not sufficient after a while (e.g. how would you decide whether the set of functions from A to B is countable, how would you define function spaces, etc.?) but in some situations this is the most important fact that you need from the basic definition of functions. Of course, one would also give examples, including artificial ones.

    – gowers Dec 05 '09 at 23:25
  • 1
    When you say "function" students think "computer program" (which is "constructible function" in a university logic course). I've always wondered, why not just start with those in high school? – Ilya Nikokoshev Dec 06 '09 at 01:02
  • 21
    The nice thing about the subset of $A \times B$ definition is that it's clear what it means for one function to be equal to another. If a function is a rule you have to specify what it means for one rule to be equal to another. Similarly, things like the union and intersection of functions do not immediately make sense. – Ryan Budney Jan 11 '10 at 06:54
  • 4
    The more I learn about mathematics, the more I am tempted to think that "a rule" is the best way to think about functions. The rule may be arbitrarily complicated, but all functions that we can talk about are basically such rules, and from the constructive viewpoint these are all functions we should ever care about. – darij grinberg May 27 '10 at 16:41
  • 2
    Even in a calculus course, the "rule" point of view does plenty of damage (e.g., with respect to implicit differentiation). – Cam McLeman May 27 '10 at 16:47
  • (Actually, I agree with darij's comment, but the sort of rule you'd get for getting $y$ as a function of $x$ when $x$ and $y$ are only implicitly related is not something calculus students would think of as a rule). – Cam McLeman May 27 '10 at 16:48
  • 67
    I think that the set-of-pairs definition is a neat formal trick, but not really how anyone intuitively thinks about a function (people use mental images of "rules of correspondence" or "machine that produce an output given an input", etc). I had a friend who disagreed and claimed that he truly thought of functions as sets of pairs. A few days later I heard him talking about the graph of a function and asked him "by the graph of a function you simply mean the function, right?". After that incident he agreed with me that nobody thinks of functions as sets of pairs. :) – Omar Antolín-Camarena May 27 '10 at 20:33
  • 6
    I agree that defining a function to be what we usually think of as its graph does not accord completely with intuition. I disagree that this makes it misleading or pedagogically unsound. I think it is rather a brilliant and useful construction, which can often be souped up to show that the set of morphisms between two structures $X$ and $Y$ is itself a structure (function space topology, manifolds, schemes...). A formal definition does not have to be completely intuitive, and getting used to non-intuitive definitions (e.g. of continuity via preimages of open sets) is part of the game. – Pete L. Clark May 27 '10 at 22:35
  • @Cam: I should have written "algorithm" rather than "rule", yes... – darij grinberg May 28 '10 at 10:03
  • 2
    @Omar: This definition by subset of the product of domain and codomain is not only a trick. This is a good basis for combinatorial thinking and fits nicely with the various ways of counting fundamental objects. – ogerard May 29 '10 at 06:03
  • 2
    Ever heard of the empty function (from the empty set to the empty set)? – Amritanshu Prasad Mar 08 '11 at 06:32
  • 10
    Regarding Tim Gowers's comment on not saying what a function is, you can take functions as a basic concept (along with sets) in the foundations of mathematics, in place of the element-hood relation. This is what Lawvere's ETCS does (although there are yet other differences between ETCS and ZFC than this). – Toby Bartels Apr 04 '11 at 06:18
  • 17
    Another problem with this definition is that it's wrong -- in modern mathematics (though less so in the informal language of some analysts, IME) a function has a codomain. Under this definition a function has an image, but any superset of the image could be its codomain. As an undergraduate, I was given this definition several times, and it bothered me. A function is a triple $(A, B, R)$ where $R$ is a subset of $A\times B$ such that... – Max Apr 04 '11 at 14:01
  • 4
    The $A$ is not needed, but it makes the presentation of inverse functions more symmetrical and allows one to define partial functions with fixed domains so that functions are partial functions. – Max Apr 04 '11 at 14:02
  • 3
    One pedagogical way is to introduce functions as rules and then show that one can always reduce to the rule "for a, pick the unique b such that $(a,b)\in F$" with F being a set of ordered pairs such that... – Michael Greinecker Apr 12 '11 at 12:58
  • 1
    If we leave functions undefined as Tim Gowers suggested, then in order to do more advanced things, we only need (besides what he said: that if $x \in A$ and $f\colon A \to B$, then $f(x) \in B$) that $f = g$ iff $f(x) = g(x)$ for all $x \in A$ (if $f\colon A \to B$). Now you know the structure of a set of functions, can decide if it's countable, can put a topology on it, etc. – Toby Bartels Feb 12 '14 at 22:24
  • OK, you also need the axiom of unique choice in order to prove that a given construction defines a function. That does bring us most of the way back to sets of ordered pairs. Still, it is a perspective that works. – Toby Bartels Feb 12 '14 at 22:26
  • 8
    The reason this definition is essential is to deal with equality of functions. Are $f(x)=x-x$ and $g(x)=0$ the same function on $\mathbb{R}$? One of the key insights of 19th-century mathematics was this modern view that functions are their graphs. There are other formalizations of this idea, but you need to choose one: to clarify that the function is the graph, and not the rule you use to generate it. – Lior Silberman Apr 21 '15 at 06:29
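Lior Silberman's point about equality can be made concrete: if we model functions on a finite domain as sets of ordered pairs, two different "rules" that agree everywhere literally are the same object. A toy Python sketch (my own illustration, not part of the discussion; the domain is an arbitrary choice):

```python
DOMAIN = range(-3, 4)

def as_graph(rule, domain=DOMAIN):
    # the "set of ordered pairs" view of a function
    return frozenset((x, rule(x)) for x in domain)

f = as_graph(lambda x: x - x)   # the rule "x - x"
g = as_graph(lambda x: 0)       # the rule "0"

print(f == g)  # two different rules, one function: True
```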
20

Similar to gowers's answer about group actions, a module over a ring R is an abelian group M together with a function $f:R\times M \to M$ that satisfies certain properties. It may set the beginner's mind at ease to hear, "They're just like vector spaces except over arbitrary rings instead of only fields," which is misleading in itself but is a good mnemonic for remembering the definition. However, I usually find it more intuitive to think of a module over R as a homomorphism from R to the endomorphism ring of an abelian group, and with this definition no mnemonic is necessary.
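The "module as homomorphism into an endomorphism ring" viewpoint is easy to check in a toy example. The following Python sketch (my own illustration, not from the answer) treats the $\mathbb{Z}$-module structure on $\mathbb{Z}/6$ as the ring homomorphism $n \mapsto (x \mapsto nx \bmod 6)$ and verifies the homomorphism laws by brute force:

```python
N = 6  # the abelian group Z/6 under addition mod 6

def endo(n):
    # the endomorphism of Z/6 given by multiplication by n
    return lambda x: (n * x) % N

# endo is a ring homomorphism Z -> End(Z/6):
for a in range(-5, 6):
    for b in range(-5, 6):
        for x in range(N):
            # additivity: endo(a + b) = endo(a) + endo(b) pointwise
            assert endo(a + b)(x) == (endo(a)(x) + endo(b)(x)) % N
            # multiplicativity: endo(a * b) = endo(a) composed with endo(b)
            assert endo(a * b)(x) == endo(a)(endo(b)(x))
```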

Jonas Meyer
  • 7,279
  • 19
    I agree. It took me an incredibly long time to realize that vector spaces are fields acting on abelian groups. – Qiaochu Yuan Dec 06 '09 at 04:48
  • 23
    But describing modules as "vector spaces over a ring" most directly establishes the motivation of quite a bit of the work done in an introductory course on modules (more or less, try to see how much of the theory of vector spaces goes through). When someone first sees a module, the chances that looking at a morphism from a ring to an endomorphism algebra will sound natural are quite small. The point of view afforded by the "a module is a morphism" fits more naturally in the state of mind induced by representation theory (of groups, say), but I imagine very few people become familiar with... – Mariano Suárez-Álvarez May 27 '10 at 17:43
  • 5
    ...representation theory (of anything) soon enough that that can be used as motivation/context for modules and friends. – Mariano Suárez-Álvarez May 27 '10 at 17:44
  • Thanks Mariano, point taken. Although I now think in this way, most likely the "vector spaces over a ring" approach was appropriate for my first encounter, and I will try keep this in mind if/when I introduce students to modules. I do think that both approaches should be emphasized, even in a first course, but perhaps this is more to build intuition for later use than to provide an a priori intuitive viewpoint. – Jonas Meyer May 27 '10 at 20:23
  • 6
    I am a representation theorist and I have serious reservations about introducing modules as "homomorphisms into the endomorphism algebra of ....". For example, direct sum of modules is hardly natural in this setting and on an even more rudimentary level, addition of morphisms (i.e. the abelian group structure of a module) is hardly intuitive. More generally, geometrical perspective is irrevocably lost as soon as you adapt the "morphism" point of view (try defining a simple module=irreducible representation in the morphism setting). – Victor Protsak May 28 '10 at 02:07
  • 6
    Also, modules over commutative rings have some special properties which can't be captured easily (or at all) if you think of a module as a representation. Concerning "vector space over a ring" perspective: it's been a long time since I read van der Waerden's Algebra, but I believe, he was descriptively speaking of "groups with operators". – Victor Protsak May 28 '10 at 02:40
  • Victor, thank you. I suppose that of the several ways of thinking of modules, none is best for all situations, and this post reflects a particular bias, apparently one shared by a few readers. – Jonas Meyer May 29 '10 at 23:26
  • See my comment on gowers's answer, which essentially anticipates your post:

    "I agree very much with this. I also wonder why people define a module as an abelian group M with a map $R\times M\to M$. I actually once got completely lost in a course on modules (at Canada/USA Mathcamp) when the teacher defined modules this way, rather than just saying that modules are things where you can multiply by scalars in a ring and which satisfy certain properties."

    – David Corwin Jul 15 '10 at 19:22
  • 1
    Question: Let $M$ be an abelian topological group. Is there a natural topology on the topological endomorphism ring $\mathrm{End}(M)$ such that for all topological rings $R$, there is a bijection between continuous actions $R\times M\to M$ and continuous ring homomorphism $R\to\mathrm{End}(M)$? – Lior Silberman Apr 21 '15 at 06:32
19

I know that this comment will be somewhat controversial, but I strongly believe that the standard (algebraic) textbook definition of $d$ of a differential form is unpedagogical.

I much prefer the route taken in Arnold's GTM book on classical mechanics: just define $d$ of a form as the thing that makes Stokes' theorem true!

Then one derives the algebraic formula for $d$ of a form. Everything is motivated at every step, and the student isn't confronted with a confusing algebraic definition of unknown origin.

Jon
  • 1
  • 7
    Well, to use that as a definition, you need to show that there is a thing which makes Stokes theorem true... – Mariano Suárez-Álvarez Dec 03 '09 at 03:02
  • True, but if you define in this way, then the derivation of the algebraic form would then provide that construction, so that's not actually a problem. This would hardly be the first time such an approach has been taken... – Simon Rose Dec 03 '09 at 04:34
  • 8
    Right. That approach is essentially the same as defining functors via universal properties; the construction to prove they exist is less important than the property. – Qiaochu Yuan Dec 03 '09 at 15:07
  • 3
    The standard definition of the exterior derivative really isn't unpedagogical; it's pretty much the only sensible definition once you have agreed on skew-symmetry (in fact, anyone that accepts the determinant as sensible should think the same of the exterior derivative). – Sam Derbyshire Dec 04 '09 at 07:32
  • 24
    Personally the algebraic formula for $d$ leaves me cold. I have always found it much easier to define it on functions via $df(X) = Xf$ for a vector field $X$, and then extending it as an odd derivation over the wedge product which obeys $d^2=0$. It is easy to see that this defines it uniquely. I think that this is pedagogical and easy to remember. – José Figueroa-O'Farrill Dec 04 '09 at 19:25
  • John Hubbard's calculus textbook does what Jon is looking for. Well, not stated exactly that way -- you define $dw$ to be the thing that makes Stokes theorem true infinitesimally. You then deduce the standard formula. The proof of Stokes theorem isn't any simpler but the definition of $dw$ is more appealing. – Ryan Budney Jan 11 '10 at 06:52
  • 10
    @José: It took me a while to understand what "standard algebraic definition" Jon meant (presumably, the one given by an explicit formula with partial derivatives, signs, and omitted indices), because all along I was thinking about your definition, which I think is excellent. – Victor Protsak May 28 '10 at 02:20
  • I completely agree! In a more sophisticated way, you could define it as the operator which makes a nice dual to singular homology when you integrate.

    Div, grad, curl, and all that by H. M. Schey uses this approach in a sense, though he doesn't discuss differential forms. He defines div, for example, by taking a box around a point and then letting it shrink to $0$ and taking the limit of the flux integral divided by the volume.

    – David Corwin Jul 15 '10 at 19:21
18

Inspired by some of the comments, I would nominate the definition of infinite product topology in terms of its open sets, found in, e.g., Munkres' otherwise excellent Topology. "The product topology on $X = \prod_{\alpha \in J} X_\alpha$ is the topology generated by the sets of the form $\pi_\alpha^{-1}(U_\alpha)$, where $U_\alpha$ is an open subset of $X_\alpha$." One then proves that one can also use the basis of sets of the form $U = \prod_{\alpha \in J} U_\alpha$ where $U_\alpha$ is open in $X_\alpha$, and $U_\alpha = X_\alpha$ for all but finitely many $\alpha \in J$. This just makes it look like an annoying and unnatural modification of the box topology.

Better in my opinion is to view $X = \prod X_\alpha$ explicitly as a function space (not as some sort of tuples, though they are really functions underneath), and to use the terminology of nets. Then it becomes clear that the product topology is just the topology of pointwise convergence, i.e. a net $f_i \to f$ iff the nets $f_i(\alpha) \to f(\alpha)$ for all $\alpha \in J$.

Under this definition, Tychonoff's theorem, which previously seemed pretty obscure, has an obvious application when combined with Heine-Borel: given any set $J$ and a pointwise bounded net of functions $f_i : J \to \mathbb{R}$, there is a subnet that converges pointwise. This is maybe the most useful application, especially in functional analysis. (Indeed, I understand this was actually Tychonoff's original theorem, that an arbitrary product of closed intervals is compact.) For instance, it makes Alaoglu's theorem clear, once you see that the weak-* topology is just a topology of pointwise convergence.

It's nice then to compare this with the Arzela-Ascoli theorem, which says that if $J$ is a compact Hausdorff space and the functions $f_i$ are not only pointwise bounded but also continuous and equicontinuous, then a subnet (in fact a subsequence) converges not only pointwise but in fact uniformly.

Nate Eldredge
  • 29,204
  • It's interesting to note this is similar in spirit to item 4) in Paul Siegel's answer. – Mark Meckes May 27 '10 at 17:48
  • 2
    It is not similar only in spirit! The product topology is the weak topology for the family of natural projection maps on the product. The precise relation with Nate's answer is that if $X$ is a set equipped with the weak topology corresponding to a family of maps $f_\alpha$ then a net $x_i$ converges in $X$ if and only if $f_\alpha(x_i)$ converges for each $\alpha$.

    I don't claim that the notion of a weak topology belongs in point set topology classes (though even the subspace topology is the weak topology for the inclusion map), but it is a surprisingly convenient organizing principle.

    – Paul Siegel May 27 '10 at 18:20
  • Good point!${}$ – Mark Meckes May 28 '10 at 13:34
  • Really nice observation,Nate. – The Mathemagician May 28 '10 at 18:07
  • 12
    Actually, understanding the product topology as the one forced upon you if you want the categorical product on the particular category $Top$ was one of the first things that really sold me on the usefulness and power of category theory. – Todd Trimble Apr 04 '11 at 10:50
  • 1
    If you do the "box topology" but saying that products of closed sets are closed, then you do get the product topology. – John Wiltshire-Gordon Apr 25 '12 at 04:56
  • 5
    This is great! "An annoying and unnatural modification of the box topology" is exactly what I thought when I first saw the definition of the product topology in Munkres. On the other hand, I've never been comfortable with defining a topology by specifying its convergent nets; how do I check that what I've defined is actually a topology? – Vectornaut Apr 25 '12 at 06:25
  • 1
    I'm with @Todd Trimble on this one: I think defining the product of topological spaces categorically makes the definition very easy to use, and gives good motivation for the definition of the product topology (which is used to "implement" the categorical product, proving its existence). – Vectornaut Apr 25 '12 at 06:26
  • 1
    Nowadays, I define all of the topologies categorically: the quotient topology is the one that allows you to detect continuity by composition, etc. You almost don't even have to delve into the points to prove most stuff this way. – Jeff Strom Nov 29 '17 at 21:07
13

What about definitions that are elegantly concise to such an extent that they confound intuition? A classic of the genre:

  1. A forest is an acyclic graph;
  2. A tree is a connected forest.

(Presumably most of us would be less surprised to hear a forest defined as a disjoint union of trees.) But perhaps there is something to be said for a shocking definition: I shall never forget this, and probably I will always remember the moment I first saw it.

In a similar vein, I once saw a video of John H Conway giving a lecture on ordinals. He began, conventionally enough, by defining the notion of well-ordered set. But the definition he gave was an unconventional one:

A set $S$ equipped with a transitive relation $\mathord{(\leq)}\subseteq S\times S$ such that every non-empty $T\subseteq S$ has a unique least element $m\in T$ such that $m\leq t$ for all $t\in T$.

Notice that this implies reflexivity (take $T$ to be a singleton); totality (by existence of the least element of a two-element set); and antisymmetry (by uniqueness of the least element of a two-element set). So it’s equivalent to the usual definition. And it is certainly memorable! But I doubt I would have understood it if I wasn’t already familiar with the usual definition.
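Conway's definition can also be sanity-checked by brute force on a small set. The following Python script (my own check, not from the lecture) enumerates all 512 relations on a 3-element set and confirms that the ones satisfying his two conditions are exactly the $3! = 6$ total orders:

```python
from itertools import chain, combinations, product

S = (0, 1, 2)
PAIRS = [(a, b) for a in S for b in S]

def nonempty_subsets(s):
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))

def transitive(R):
    return all((a, c) in R
               for (a, b) in R for (b2, c) in R if b == b2)

def unique_least(R):
    # every non-empty subset T has exactly one m in T with m <= t for all t in T
    for T in nonempty_subsets(S):
        leasts = [m for m in T if all((m, t) in R for t in T)]
        if len(leasts) != 1:
            return False
    return True

conway = [frozenset(p for p, keep in zip(PAIRS, bits) if keep)
          for bits in product([False, True], repeat=len(PAIRS))]
conway = [R for R in conway if transitive(R) and unique_least(R)]

for R in conway:
    assert all((a, a) in R for a in S)                            # reflexive
    assert all((a, b) in R or (b, a) in R for a in S for b in S)  # total
    assert all(a == b for (a, b) in R if (b, a) in R)             # antisymmetric

print(len(conway))  # the 3! = 6 linear orders on a 3-element set
```

Note how the singleton subsets force reflexivity and the two-element subsets force totality and antisymmetry, exactly as argued above.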

  • I'm not immediately seeing what's unconventional about the definition of a well ordered set. Is it the fact that the relation is not required by definition to be an order, only transitive? – LSpice Aug 19 '18 at 18:20
  • 1
    @LSpice Yes, exactly. The usual definition in my experience is something like the one on wikipedia: “a well-order on a set S is a total order on S with the property that every non-empty subset of S has a least element in this ordering.” The definition Conway used (I don't know whether it's original to him) weakens the total order to a transitive relation, at the expense of requiring uniqueness of the least element. – Robin Houston Aug 19 '18 at 22:01
  • 3
    Oh, I see! I was confused because the definition emphasised the word "least", which made me think that's where the difference from the usual one lay. – LSpice Aug 19 '18 at 22:14
11

Since this is a big list, I might as well comment 5 years later.

David Corwin mentioned tensor products, and the top post is about linear algebra, so I thought I would mention that, in my opinion, coordinate definitions in general tend to obscure meaning. Before going on, I'll mention that I'm not saying coordinates are bad! I just think that introducing ideas with coordinates tends to be very unrevealing.

A few examples which I find are obscured by coordinates are:

  1. Derivatives. The easiest example is the differential of a map $f:\mathbb{R}^m \rightarrow \mathbb{R}^n$. This is often given by the Jacobian matrix, and while the Jacobian is very useful in computation, it was not at all clear to me how it generalises the ordinary derivative, until I saw the proof that it satisfies the coordinate-free definition of a derivative at a point. Namely, $D_af$ is the linear map $\mathbb{R}^m \rightarrow \mathbb{R}^n$ such that

$$\lim_{\lvert x \rvert \rightarrow 0}{{\lvert f(a+x)-f(a)-D_af(x) \rvert}\over{\lvert x \rvert}} = 0.$$

  2. Tensor products and tensors. David Corwin already covered this.

  3. Local coordinates on manifolds. I think a lot of elegant definitions and properties are lost when using local coordinates; for example, the tangent space becomes very unwieldy and unnatural when interpreted in a local coordinate setting (although it does become more intuitive).

  4. Matrices and linear maps. I recommend reading the top post. But I'll mention that I am personally most bothered by determinants: they made no sense at all to me until I learned to define the determinant via the map a linear map induces on the top exterior power!
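As a concrete companion to point 1, here is a small numerical Python check (my own, with a made-up map $f$) that a hand-computed Jacobian really satisfies the coordinate-free limit definition: the remainder ratio shrinks roughly like $\lvert x \rvert$.

```python
import math

def f(x, y):
    # an arbitrary example map from R^2 to R^3
    return (x * x, x * y, math.sin(y))

def jacobian(x, y):
    # its Jacobian matrix, computed by hand
    return [[2 * x, 0.0],
            [y, x],
            [0.0, math.cos(y)]]

def remainder_ratio(a, h):
    # |f(a+h) - f(a) - Df_a(h)| / |h|, which should tend to 0 as |h| -> 0
    fa, fah = f(*a), f(a[0] + h[0], a[1] + h[1])
    J = jacobian(*a)
    lin = [J[i][0] * h[0] + J[i][1] * h[1] for i in range(3)]
    num = math.sqrt(sum((fah[i] - fa[i] - lin[i]) ** 2 for i in range(3)))
    return num / math.hypot(h[0], h[1])

a = (1.0, 2.0)
ratios = [remainder_ratio(a, (t, -t)) for t in (1e-1, 1e-2, 1e-3, 1e-4)]
print(ratios)  # each entry roughly 10x smaller than the previous one
```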

Michael Hardy
  • 11,922
  • 11
  • 81
  • 119
  • 3
    I also prefer thinking in terms of exterior powers, but another intuitive way of introducing determinants (at least over $\mathbb{R}$) is that they measure expansion of volume by applying linear maps. A precise related result is that every continuous group homomorphism $GL_n(\mathbb{R}) \to \mathbb{R}^\times$ is essentially a power of the determinant mapping (see e.g. https://golem.ph.utexas.edu/category/2011/08/mixed_volume.html#c039412). – Todd Trimble Apr 21 '15 at 13:59
11

I see the problem crop up: a certain mathematical object has many characterizations, any one of which can be taken as the definition. Which do you use when you are introducing the subject?

The first one that comes to mind is the basis of a vector space. Perhaps this is not the best example for the title question of this thread of discussion, but I know that this confuses some students. When I last taught linear algebra, we taught them at least four characterizations. It's not really that any of the characterizations is obscuring or misleading. Rather, each one highlights some important property(-ies). Of course, the better students enjoy seeing all of the characterizations, and they appreciate every one. The less facile students get flustered because they want there to be just One Right Way of thinking about them.

A similar issue arises with the characterizations of an invertible matrix or linear transformation, though at least with a matrix it seems most reasonable to define an invertible matrix as one that has an inverse, namely another matrix that you can multiply it by to get the identity matrix.

The issue comes up in spades when introducing matroids.

  • 5
    I usually tend towards the historical definition. Usually that's the one that is best-motivated for people with the least background, since it's what motivated the creator. For example, if you look at Hassler Whitney's original papers on characteristic classes, they're extremely raw, explicit and beautiful. A very charming introduction, IMO. – Ryan Budney Dec 04 '09 at 05:50
  • 3
    I've never taught linear algebra, so I don't know what I'm talking about here. But perhaps the claim that having an inverse is the most natural characterization of "invertible" is just an artifact of the language? If we used the word "nonsingular" or "nondegenerate" for this property, other characterizations might seem more natural. – Michael Lugo Dec 04 '09 at 15:12
  • 3
    @Michael Lugo: I've never taught linear algebra either, but it seems to me that the essential property of a bijective function is that it has an inverse. One reason to believe this definition is the "right" one is that its generalization, the idea of an isomorphism, is far more important than the idea of "a morphism which is both monic and epic." – Vectornaut Apr 25 '12 at 06:14
9

Ok I'm joining very late but let me tell this.

I think the following definition of relation, that is often given, is misleading:

"Wrong" definition: A relation $R$ between the set $A$ and the set $B$ is an arbitrary subset of the cartesian product, $R\subseteq A\times B$.

In fact, I think it is a "wrong" definition:

  • it obfuscates the fact that the datum of source set and target set is important and itself part of the definition. For example, if you defined a function as a relation (in the sense of the definition above) satisfying the functional property ($\forall x\in A\exists ! y\in B: (x,y)\in R$) then the notion of codomain will not be well defined (or not explicitly defined), and one could be led to think that $x\mapsto x^2$ as a function $\mathbb R \to [0,+\infty)$ is literally equal to $x\mapsto x^2$ as a function $\mathbb R\to \mathbb R$.
  • It allows you to define, given $A$ and $B$ (in that order), the set $\mathsf{Rel}(A,B)$ of relations between $A$ and $B$, but not (immediately) the class of all relations (or the set of all relations within a given universe $U$).
  • It makes less clear that there should be a category $\mathsf{Rel}$, of sets with relations as morphisms, of which the $\mathsf{Rel}(A,B)$ are the hom-sets.

The right definition should of course be:

"Right" definition: A relation is a triple $(A,B,R)$ where $A$, $B$ are sets and $R\subseteq A\times B$.

Now the notion of codomain is well defined (or explicitly defined). And also the category $\mathsf{Rel}$ is well defined.
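The point about codomains can be made tangible with a small Python sketch (my own illustration, using a finite stand-in for $\mathbb{R}$): modelling a relation as the triple $(A, B, R)$ distinguishes two "squaring" functions that the graph-only definition would conflate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    # the "right" definition: a relation is the triple (A, B, R)
    domain: frozenset
    codomain: frozenset
    graph: frozenset  # a set of (a, b) pairs with a in domain, b in codomain

A = frozenset({-1, 0, 1})
square = frozenset((a, a * a) for a in A)

f = Relation(A, frozenset({0, 1}), square)      # squaring, viewed into {0, 1}
g = Relation(A, frozenset({-1, 0, 1}), square)  # squaring, viewed into A

print(f.graph == g.graph, f == g)  # same graph, different relations: True False
```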

LSpice
  • 11,423
Qfwfq
  • 22,715
9

Convolution: whether it is convolution of functions, measures or sequences, it is often defined by giving an explicit formula for the resulting function (or measure, etc.). While this definition makes calculations with convolutions relatively easy, it gives little intuition into what convolution really is and often seems largely unmotivated. In my opinion, the right way to define convolution (say, of two finite complex Radon measures on an LCA group $G$, which is a relatively general case) is as the unique bilinear, weak-* continuous extension of the group product to $M(G)$ (the space of measures as above), where $G$ is naturally identified with point masses. Then one can restrict the definition to $L^1 (G)$ and get the well known explicit formula for convolution of functions. Of course, a probabilist will probably prefer to think of convolution as the probability density function associated to the sum of two independent absolutely continuous random variables. And there are other possible alternative definitions (see this Mathoverflow discussion). But the formula definition is really the hardest one to get intuition for, in my opinion.
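To connect the explicit formula with the probabilistic reading mentioned above, here is a minimal Python sketch (mine, not the answer's) of convolution of finitely supported distributions on the non-negative integers; it is literally the coefficient rule for multiplying polynomials, and it produces the law of a sum of independent variables:

```python
def convolve(p, q):
    # p[k] and q[k] are the probabilities of the value k
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b  # same rule as multiplying polynomial coefficients
    return r

die = [1 / 6] * 6              # uniform on {0, ..., 5}
two_dice = convolve(die, die)  # law of the sum of two independent dice

print(two_dice[0])  # P(sum = 0) = 1/36
```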

Mark
  • 4,804
  • 5
    I agree that at first glance the explicit formula doesn't say much, but I guess you don't expect the "unique bilinear, weak-* continuous extension of the group product to M(G)" definition to appear e.g. in an undergraduate real analysis course... – Michal Kotowski Apr 11 '11 at 13:33
  • 1
    True, though there are some instances where such a definition becomes more easy (e.g. in a course on the representation theory of finite groups, where measures become functions and all continuity issues disappear). – Mark Apr 12 '11 at 09:41
  • 6
    Convolution is like multiplication of polynomials (except the exponents come from some group G and the coefficients can be density functions on G instead of finite sums). – John Wiltshire-Gordon Apr 25 '12 at 04:52
6

Induced representations can be defined in terms of tensor products of $G$-modules or in terms of vector-valued functions on $G$. It would be nice if more textbooks in representation theory stressed this more heavily. Both definitions have their advantages and disadvantages, I guess, but I personally feel more comfortable with the interpretation in terms of functions.

Marc Palm
  • 11,097
  • 3
    I don't think any of these definitions is misleading, but the fact that many books give only one of them (and some proceed to then use the other...) certainly is! – darij grinberg Apr 11 '11 at 15:03
  • 1
    This. I'm working on semigroup representations and been having problems getting a clear mental image about induced representations. I've been looking at the group-oriented literature but am wary of getting my intuition wrong for semigroups. – kastberg Apr 11 '11 at 17:22
5

I remember being confused by the multivariable calculus approach to vector fields: sometimes functions are treated just as functions, and sometimes as fields. It should be possible to convey the idea of having a space of directional derivatives attached to each point without having to talk about vector bundles.

In general I can accept that there is more going on behind the scenes than there is time for in a course, but simply knowing that there is a more general and "right" way of doing things is very helpful to me. This also tends to make the course much more interesting.

K.J. Moi
  • 988
3

The entire branch of point-set topology as taught in most textbooks has completely unintuitive definitions that obscure the entire subject. For instance, the definition of a topological space in terms of open sets tells you nothing about the meaning of point-set topology. It would be much clearer if topological spaces were instead defined in terms of topological closure operators, since intuitively we have $x\in\overline{A}$ if the set $A$ touches the point $x$ in some way. Other unnecessarily obscured concepts in point-set topology include the definitions of the product topology, subspace topology, Hausdorff spaces, regular spaces, compact spaces, and continuous functions. Furthermore, some definitions are obscured when the spaces are not required to be Hausdorff. For instance, the notions of compactness, paracompactness, regularity, and normality do not have much meaning without the Hausdorff separation axiom. If one has a non-Hausdorff space where every open cover has a finite subcover, then one should call that space quasi-compact and not compact. It is a shame that general topology is taught in such a meaningless fashion.
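The "touching" intuition is easy to make computational. In the following Python sketch (my own, on a made-up three-point space), $x \in \overline{A}$ exactly when every open set containing $x$ meets $A$, and the resulting operator satisfies the Kuratowski closure axioms:

```python
from itertools import chain, combinations

X = frozenset({0, 1, 2})
# a sample topology on X: the chain of open sets {} < {0} < {0,1} < X
OPENS = [frozenset(), frozenset({0}), frozenset({0, 1}), X]

def closure(A):
    # x touches A iff every open neighbourhood of x meets A
    return frozenset(x for x in X
                     if all(O & A for O in OPENS if x in O))

subsets = [frozenset(T) for T in
           chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))]

assert closure(frozenset()) == frozenset()                # cl of empty is empty
for A in subsets:
    assert A <= closure(A)                                # A is inside cl(A)
    assert closure(closure(A)) == closure(A)              # idempotent
    for B in subsets:
        assert closure(A | B) == closure(A) | closure(B)  # cl(A u B) = cl(A) u cl(B)
```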

  • 1
    What's a better definition of compactness for Hausdorff spaces? – Vectornaut Apr 25 '12 at 05:59
  • 2
    And a Hausdorff space is a space where every net (or filter) converges to at most one point. When we put this all together, a compact Hausdorff space is a space where every net (filter) accumulates at some point and converges to at most one point. – Joseph Van Name Dec 18 '12 at 02:45
  • And how do you define paracompactness then? If you define it in terms of partitions of unity, you lose the connection with compactness – Ostap Chervak Jan 17 '13 at 18:59
  • 1
    There are many non-trivial ways of defining paracompactness. A good topology textbook should therefore prove some of these characterizations of paracompactness. Perhaps the most intuitive one would be that a paracompact space is a $T_{1}$-space where each open cover has an open barycentric refinement. This characterization therefore says that paracompact spaces are precisely the spaces where the collection of all open covers generates a uniformity. Moreover, this uniformity is supercomplete. In fact, a space is paracompact iff it has a compatible supercomplete uniformity. – Joseph Van Name Jan 17 '13 at 22:28
  • Your proposed alternative definition of compact is equivalent to the open-cover one even for non-Hausdorff spaces. So why is this meaningless again? – Toby Bartels Feb 12 '14 at 23:19
  • It is meaningless to a student who has not seen the notion of compactness before. I just don't see how a student will quickly see the intuition behind the notion of compactness simply based on the open cover definition. At the very least, the open cover definition of compactness should be supplemented with the basic result relating compactness to convergence so that it becomes meaningful. – Joseph Van Name Feb 13 '14 at 01:28
  • Joseph what you're saying sounds interesting. Any idea where I can learn more about the approach that emphasizes closure-operators? – goblin GONE Jul 12 '14 at 09:11
  • 2
    I've always thought this same thing about the way in which topology is presented. Here's a simple example to help motivate the idea that the usual rigorous definition of a topology on a set captures the idea of the way a stretchable and bendable space is connected together: On the one hand, one could say the open subsets of the set $[0,1)$ are its intersections with open subsets of $\mathbb R$ with the usual topology; on the other hand one could allow sets containing $0$ to be considered open only if they include some subset of the form $[0,\varepsilon)\cup(1-\varepsilon,1)$. Then$,\ldots$ – Michael Hardy Mar 09 '16 at 21:43
  • 2
    $\ldots,$ one has two different topologies, and in the second one the two ends of the interval are glued together. This should convince the student that the way in which the manifold is connected together is a question of which sets are considered open. $\qquad$ – Michael Hardy Mar 09 '16 at 21:44
2

The definition "a function $f: \mathbb R\to \mathbb R$ is continuous if $f^{-1}(G)$ is open for every open $G \subseteq \mathbb R$" is less intuitive than the epsilon-delta definition of a continuous function.
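For comparison, the epsilon-delta definition lends itself to a concrete check. A sketch (illustrative only; the choice of $\delta$ is mine) verifying continuity of $f(x)=x^2$ at $x_0=1$ on sampled points:

```python
# Epsilon-delta continuity of f(x) = x^2 at x0 = 1, checked numerically.
# For |x - 1| < delta <= 1 we have |x + 1| < 3, so delta = min(1, eps/3) works.

def f(x):
    return x * x

def delta_for(eps):
    return min(1.0, eps / 3.0)

def witness_ok(eps, samples=1000):
    """Check |f(x) - f(x0)| < eps on sample points with |x - x0| < delta."""
    d, x0 = delta_for(eps), 1.0
    for i in range(samples):
        x = x0 - d + 2 * d * (i + 0.5) / samples   # strictly inside (x0 - d, x0 + d)
        if abs(f(x) - f(x0)) >= eps:
            return False
    return True

for eps in (1.0, 0.1, 0.001):
    assert witness_ok(eps)
```

The open-set definition packages exactly this game (a $\delta$-ball landing inside an $\varepsilon$-ball) into the single statement that preimages of open sets are open.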

Mykie
  • 189
  • 12
    Yes, but it is important. I think it is only taught once students have developed intuition for the epsilon delta definition. – David Corwin Jul 15 '10 at 23:41
  • 25
    "A function $f$ is continuous if and only if $\lim_n f(x_n) = f(\lim_n x_n)$ for all convergent sequences in the domain of definition of $f$." This is intuitive (once convergence of sequences is understood), convenient for proofs, and general (it holds for metric spaces). – Johannes Ebert Oct 23 '10 at 21:16
  • 7
    Maybe, but I find the following variant to be more intuitive than both of them:

    A function f is continuous at a point x if the preimage of any neighborhood of f(x) is a neighborhood of x.

    – ACL Apr 04 '11 at 07:17
  • 10
    Johannes's version generalizes beyond metric spaces if you generalize sequences to nets. – Toby Bartels Feb 12 '14 at 23:16
  • IMHO the definition of continuity via limits is not really satisfactory even for $f:A\subset\mathbb{R}\to\mathbb{R}$. You have to introduce the notions of accumulation point and deleted neighborhood, you have to talk about uniqueness of the limit, and you have to treat isolated points separately. Quite a mess, compared with the simplicity of "for any nbd $V$ of $f(x)$ there is a nbd $U$ of $x$ such that $f(U)\subset V$." – Pietro Majer Dec 09 '20 at 22:42
  • @PietroMajer: why must you introduce deleted neighborhoods? The definition "$f$ is continuous at $z$ if and only if for every net $x$ converging to $z$, the net $f\circ x$ converges." treats isolated points just fine, has built-in respect for uniqueness, and does not require deleted neighborhoods. Or are you specifically talking about the calculus-textbook definition using limits from below and above etc? – Willie Wong Dec 10 '20 at 04:58
  • It's true, and yes, I was mostly referring to teaching issues (I understood this was the topic, from the title). – Pietro Majer Dec 10 '20 at 18:32
2

When we write tensor products, it's optional to indicate the ring over which we take them; we can write $M\otimes N$ or $M \otimes_R N.$ But for elements, we always write $x\otimes y$ without reference to $R.$ You must keep the ring in mind, and that can lead to lapses. For example, $v\otimes u^2 - u\otimes uv$ may be $\ne0{:}$ it depends on the base ring, and the base ring doesn't appear in the notation.

Sometimes the problem is not the concept so much as the notation we use for it.
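A classical illustration of this base-ring dependence (my example, not from the answer above): the expression $i\otimes i + 1\otimes 1$ in $\mathbb C\otimes\mathbb C$ vanishes or not depending on whether we tensor over $\mathbb C$ or over $\mathbb R$, yet the notation for the element is identical in both cases.

```latex
% Over C, scalars move across the tensor sign, so i⊗i = i^2(1⊗1) = -(1⊗1);
% over R they do not, and i⊗i, 1⊗1 are linearly independent.
\[
i\otimes i + 1\otimes 1 = 0
  \quad\text{in } \mathbb{C}\otimes_{\mathbb{C}}\mathbb{C}\cong\mathbb{C},
\qquad
i\otimes i + 1\otimes 1 \neq 0
  \quad\text{in } \mathbb{C}\otimes_{\mathbb{R}}\mathbb{C}\cong\mathbb{C}\times\mathbb{C}.
\]
```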

LSpice
  • 11,423
1

A discrete probability distribution is often defined as the distribution of a random variable that takes only finitely or countably many values.

I prefer to define it as one for which one has $\displaystyle \sum_x \Pr(X=x) = 1,$ where the sum is over all values $x$ for which $\Pr(X=x)>0.$

(One should not define it as one for which the support is finite or countably infinite. For example, suppose a probability distribution assigns positive probability to the singleton of every rational number between $0$ and $1,$ and the sum of those probabilities is $1.$ Then every real number between $0$ and $1$ (inclusive) is in the support, since every interval about every such number gets positive probability.)
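The parenthetical example can be made concrete. A sketch (an illustrative construction, not the only one): enumerate the rationals in $(0,1)$ and give the $n$-th one probability $2^{-n}$; the masses sum to $1$, yet every point of $[0,1]$ lies in the support.

```python
from fractions import Fraction

def rationals_01():
    """Enumerate each rational in (0, 1) exactly once, in lowest terms."""
    q = 2
    while True:
        for p in range(1, q):
            frac = Fraction(p, q)
            if frac.denominator == q:      # skip non-reduced duplicates like 2/4
                yield frac
        q += 1

# Give the n-th rational in the enumeration probability 2^(-n); the rationals
# are dense in [0, 1], so every point of [0, 1] is in the support.
gen = rationals_01()
total = Fraction(0)
for n in range(1, 1001):
    next(gen)
    total += Fraction(1, 2 ** n)

print(float(total))  # 1.0 (the exact partial sum is 1 - 2**-1000)
```

The distribution satisfies the preferred definition above ($\sum_x \Pr(X=x)=1$) while its support is the whole interval, which is why "countable support" is the wrong phrasing.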

Michael Hardy
  • 11,922
  • 11
  • 81
  • 119
  • 1
    Is your definition of 'support' the usual one? I wouldn't know what the word meant without further context, since I expect the support of a function to be a subset of its domain, not of its co-domain; but, if I had to guess, I'd probably make the guess that it consisted of elements with positive-measure pre-image, not that it was given by what I think is your definition (that every neighbourhood has a positive-measure pre-image). – LSpice Aug 19 '18 at 18:24
  • @LSpice : The support of a measure on the set of Borel subsets of a topological space is the set of all points whose every open neighborhood is assigned positive measure. $\qquad$ – Michael Hardy Aug 19 '18 at 18:44
  • @LSpice : I don't see how you think the codomain is mentioned here. – Michael Hardy Aug 19 '18 at 18:45
  • It seemed that you were speaking of the support (as a subset of $\mathbb R$) of a random variable $X$, which is a function whose codomain is (presumably) $\mathbb R$. – LSpice Aug 19 '18 at 19:59
  • @LSpice : I was talking about the support of a probability distribution, whose domain is the set of all Borel subsets of $\mathbb R. \qquad$ – Michael Hardy Aug 19 '18 at 20:00
  • 1
    @LSpice : Looking at this several years later, I wonder whether I can express this more clearly. The following are two different things: (1) a random variable; (2) the probability distribution of a random variable. Different random variables, that may even be probabilistically independent of each other, can have the same probability distribution, i.e. different instances of (1) can map to the same instance of (2). I was defining the support of (2), not of (1). $\qquad$ – Michael Hardy Aug 01 '23 at 00:51
  • Re, thank you for the explanation, and I'm glad I'm not the only one who does occasional tours of my old posts to see what I used to know. – LSpice Aug 01 '23 at 01:19
1

"Prime number" is sometimes defined as a number with exactly two positive divisors, namely itself and $1.$ The only deficiency of this characterization is that it doesn't motivate the definition in the following way. $$ \begin{array}{cccccccccc} & & & & 60 \\ & & & \swarrow & & \searrow \\ & & 4 & & & & 15 \\ & \swarrow & \downarrow & & & \swarrow & & \searrow \\ 2 & & 2 & & 3 & & & & 5 \end{array} $$ One could continue factoring by pulling out $1$s, but that is uninformative in that it doesn't distinguish the number being factored from any other. The definition is motivated by the fact that the number $1$ cannot play the sort of role in this process that either composite or prime numbers play.

(For Euclid this was not problematic since he didn't consider $1$ to be a number.)
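The factoring process in the diagram can be written as a recursion in which the primes are the leaves (an illustrative sketch; splitting off the smallest factor first differs from the $60 = 4 \times 15$ split above, but the leaves are the same):

```python
# Build a factor tree: split a composite number into two nontrivial factors
# and recurse; primes are the leaves. Pulling out a factor of 1 would never
# terminate the recursion, which is why 1 is neither prime nor composite.

def smallest_factor(n):
    """Smallest divisor of n greater than 1 (equals n exactly when n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

def factor_tree(n):
    d = smallest_factor(n)
    if d == n:
        return n                      # leaf: a prime
    return (factor_tree(d), factor_tree(n // d))

print(factor_tree(60))  # (2, (2, (3, 5)))
```

Allowing $1$ as a factor would make `factor_tree(n)` loop forever on the split $n = 1 \times n$, which is exactly the point the diagram is making.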

Michael Hardy
  • 11,922
  • 11
  • 81
  • 119
  • 2
    I don't understand what deficiency in the definition your diagram illustrates. What should the definition be? – LSpice Jan 18 '18 at 17:04
  • @LSpice : Consider what happens if you extend the diagram further by pulling out $1$s, thus: $$ \begin{array}{cccccccccccc} & & & & 60 \\ & & & \swarrow & & \searrow \\ & & 4 & & & & 15 \\ & \swarrow & \downarrow & & & \swarrow & & \searrow \\ 2 & & 2 & & 3 & & & & 5 \\ & & & & & & & \swarrow & & \searrow \\ & & & & & & 5 & & & & 1 \end{array} $$ You get nothing that distinguishes the numbers you're working with from any others. In other words, the number $1$ cannot play a role in this sort of thing in the way in which prime and composite numbers do. $\qquad$ – Michael Hardy Jan 18 '18 at 17:10
  • @LSpice : The point is that this answers the naive question: "Why isn't $1$ considered a prime number?" Why does $1$ play a role that is different from that of either prime or composite numbers? $\qquad$ – Michael Hardy Jan 18 '18 at 17:12
  • 3
    I don't learn anything from either diagram about why the existing definition is bad, but I'm probably not the best judge of what's clearest to a first-time learner, so that's probably irrelevant. What should the definition be? – LSpice Jan 18 '18 at 21:01
  • @LSpice : I am undecided as to the best form in which to state a definition for beginners. Maybe I would append a comment to it, on why the number $1$ should be treated differently. – Michael Hardy Jan 19 '18 at 00:57
0

I find simplicial homology very difficult. In particular, I find the idea of a simplicial complex very hard to comprehend, except in the case of an abstract simplicial complex. Although it's not equivalent, I much prefer the idea of what Hatcher calls a $\Delta$-complex, although I still have some trouble with that definition.

-6

The first sentence in a probability talk is likely to be "Let $X$ be a random variable." As a non-probabilist who dabbles occasionally in probability, I find the notion of a random variable difficult to absorb; and then there is the expectation operator (integration), the characteristic function (with a different meaning: what probabilists call the indicator function is what measure theorists call the characteristic function), and in general it is a distinct language.

-6

$\pi=3.14$ cm. Tongue-in-cheek of course, but this can supposedly be found in books.

  • 2
    Are there indeed books that say $\pi$ is dimensionful and give it in terms of cgs units? – Todd Trimble Nov 30 '17 at 00:43
  • Let me, when I get back from travelling, try to dig up the book that says there are books which do this. From there we just have to trust the author I'm afraid. – Eivind Dahl Dec 01 '17 at 15:00