126

As every MO user knows, and can easily prove, the inverse of the matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is $\dfrac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$. This can be proved, for example, by writing the inverse as $ \begin{pmatrix} r & s \\ t & u \end{pmatrix}$ and solving the resulting system of four equations in four variables.
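
Not that one needs to, but here is a quick sympy sketch of that brute-force computation (the symbol names are of course arbitrary):

```python
import sympy as sp

a, b, c, d, r, s, t, u = sp.symbols('a b c d r s t u')
A = sp.Matrix([[a, b], [c, d]])
X = sp.Matrix([[r, s], [t, u]])

# the four equations in four variables coming from A*X = I
sol = sp.solve(list(A * X - sp.eye(2)), [r, s, t, u])
print(sol)  # {r: d/(a*d - b*c), s: -b/(a*d - b*c), t: -c/(a*d - b*c), u: a/(a*d - b*c)}
```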

As a grad student, when studying the theory of modular forms, I repeatedly forgot this formula (do you switch the $a$ and $d$ and invert the sign of $b$ and $c$ … or was it the other way around?) and continually had to rederive it. Much later, it occurred to me that it was better to remember that the formula is obvious in a couple of special cases, such as $\begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}$ and diagonal matrices, for which the geometric intuition is simple. One can also remember this as a special case of the adjugate matrix.

Is there some way to just write down $\dfrac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$, even in the case where $ad - bc = 1$, by pure thought—without having to compute? In particular, is there some geometric intuition, in terms of a linear transformation on a two-dimensional vector space, that renders this fact crystal clear?

Or might I as well be asking how to remember why $43 \times 87$ is equal to $3741$ and not $3731$?

LSpice
  • 11,423
Frank Thorne
  • 7,199
  • 2
    Frank: enclose paragraphs with matrices in

    ...

    – Benjamin Steinberg Feb 21 '12 at 02:30
  • 8
    I was just discussing this with a friend; I think this is a great pedagogical question. – Daniel Litt Feb 21 '12 at 02:40
  • 12
    I remember it in a boring form: the diagonals are easy, so they just change places, while the off-diagonals are special, so they suffer a sign "inversion" -- actually, hardly anything to remember ;-) – Suvrit Feb 21 '12 at 03:23
  • 66
    Look at it mod $20$. $3\times 7=1$. – Will Sawin Feb 21 '12 at 03:27
  • 8
    You know you want to get the determinant along the diagonal, which is ad-bc, so you know the first column has to be d,-c. Likewise for the second column. – Jonny Evans Feb 21 '12 at 07:39
  • 4
    There is a pneumonic I teach my students: "NBC is sad." For some reason the television station and the emotion are memorable. It stands for "negate B and C, switch A and D." This isn't geometric intuition, but I've never forgotten the pneumonic!

    Another pneumonic I like is "MARBLE", which is not linear algebra. It stands for "Maps at Right Blank are Left Exact." (Because I always forget the convention for left and right exact functors.)

    – Hiro Lee Tanaka Sep 28 '13 at 13:15
  • 16
    (These look like fine memory aids, but you must mean "mnemonic", not "pneumonic"...) – Noam D. Elkies Dec 29 '13 at 18:05
  • The geometry associated with Poloni's answer makes the algebra transparent, is easy to remember, and is suitable for introductory lessons on matrices and vectors in linear algebra. – Tom Copeland Apr 27 '15 at 04:17
  • 4
    I'd feel much better about this question if only it said somewhere, "assuming $ad-bc\ne0$." – Gerry Myerson Apr 30 '15 at 13:37
  • 8
    I remember it by thinking $I^{-1} = I$ (which one won't ever forget), so you can only make the off-diagonals negative. – tvk Jun 19 '16 at 23:48
  • 1
$4371$ is the dimension of the Baby Monster representation. –  May 04 '17 at 15:50

13 Answers

256

Think about $\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$ as $tI - A$ where $t=a+d$ is the trace of $A$. Since $A$ satisfies its own characteristic equation (Cayley-Hamilton), we have $A^2 - t A + \Delta \cdot I = 0$ where $\Delta = ad-bc$ is the determinant. Thus $\Delta \cdot I = t A - A^2 = A(tI - A)$. Now multiply both sides by $\Delta^{-1} A^{-1}$ to get $A^{-1} = \Delta^{-1}(tI-A)$, QED.
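
A quick symbolic check of this identity, as a sympy sketch:

```python
import sympy as sp

a, b, c, d = sp.symbols('a b c d')
A = sp.Matrix([[a, b], [c, d]])
t, Delta = A.trace(), A.det()

# Cayley-Hamilton gives Delta * A^{-1} = t*I - A
print(sp.simplify((t * sp.eye(2) - A) / Delta - A.inv()))  # zero matrix
```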

  • 12
    I sometimes give this and the $3 \times 3$ analog of this formula as an exercise: if $A$ is an invertible $3 \times 3$ matrix then $A^{-1}=\Delta^{-1}(A^2-tA +\frac{t^2-s}{2}I)$ where $s=\operatorname{tr}(A^2)$, and secretly I'm assuming $1 \neq -1$. (A quick check of this appears after these comments.) – Guillermo Mantilla Feb 21 '12 at 06:55
  • 28
    Noam, you win Linear Algebra.

    (To supplement this, maybe you could provide an entertaining linear-algebra-related anagram or two.)

    – Elizabeth S. Q. Goodman Feb 21 '12 at 07:37
  • 6
    That is awesome. – Frank Thorne Feb 21 '12 at 15:12
  • 21
    @Elizabeth S. Q. Goodman: thanks! :-) Linear-algebra anagrams, though? My heuristic for finding "list anagrams" via lattice basis reduction is linear algebra of a kind, but that's surely not what you meant. The closest I can come is something like "label ${\bf R} \oplus {\bf R}$ again", which is what a ${\rm GL}_2({\bf R})$ matrix does, and is an anagram of "linear alg$\oplus$bra". Likewise "label ${\bf R}^e/{\bf R}$ again", which works exactly if I may ignore the "/". Otherwise, try posting a "What are some good math anagrams?" question to mathoverflow ... – Noam D. Elkies Feb 26 '12 at 05:49
  • 6
    [cont'd] ..., asking not to repeat old standards like logarithm/algorithm, $\int/\Delta$, and the Banach-Tarski joke. Make it community wiki, and hope some good examples get posted before the question gets closed. – Noam D. Elkies Feb 26 '12 at 05:49
  • 6
    @F.Thorne: Thank you! And I see that I should also thank you for not accepting my answer, which made it eligible for a gold star... – Noam D. Elkies Feb 26 '12 at 05:55
  • This answer is wonderful! Though as a nit: it seems to use both juxtaposition and $\cdot$ as notation for scalar $\times$ matrix multiplication. – Geoffrey Irving Mar 07 '24 at 08:16
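
A quick sympy check of the $3 \times 3$ formula from Guillermo Mantilla's comment above (a sketch; the sample matrix is an arbitrary invertible one):

```python
import sympy as sp

A = sp.Matrix([[2, 1, 0], [1, 3, 1], [0, 1, 2]])  # any invertible 3x3 matrix
t, s = A.trace(), (A**2).trace()

# A^{-1} = Delta^{-1} * (A^2 - t*A + (t^2 - s)/2 * I), again by Cayley-Hamilton
cand = (A**2 - t * A + sp.Rational(1, 2) * (t**2 - s) * sp.eye(3)) / A.det()
print(cand == A.inv())  # True
```
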
83

EDIT (8/14/2020): A couple people have suggested that this answer should come with a warning -- this is a pretty fancy approach to an elementary question, motivated by the fact that I know the OP's interests. Some of the other answers below are probably better if you just want to invert some matrices :). I've also fixed a couple of minor typos.


My favorite way to remember this is to think of $SL_2(\mathbb{R})$ as a circle bundle over the upper half-plane, where $SL_2(\mathbb{R})$ acts on the upper half-plane via fractional linear transformations; then the map sends an element of $SL_2(\mathbb{R})$ to the image of $i$ under the corresponding fractional linear transformation. The fiber over a point is the corresponding coset of the stabilizer of $i$.

This naturally gives the Iwasawa decomposition of $SL_2(\mathbb{R})$ as $$SL_2(\mathbb{R})=NAK$$ where

$$K=\left\{\begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} , ~0\leq\theta<2\pi \right\}$$

$$A=\left\{\begin{pmatrix} r & 0\\ 0 &1/r\end{pmatrix},~ r\in \mathbb{R}\setminus\{0\}\right\}$$

$$N=\left\{\begin{pmatrix} 1 & x \\ 0 & 1\end{pmatrix},~ x\in \mathbb{R}\right\}$$

Here $K$ is the stabilizer of $i$ in the upper half-plane picture; viewed as acting on the plane via the usual action of $SL_2(\mathbb{R})$ on $\mathbb{R}^2$ it is just rotation by $\theta$ (and likewise if we view the upper half plane as the unit disk, sending $i$ to $0$ via a fractional linear transformation). $A$ is just scaling by $r^2$, in the upper half-plane picture, and is stretching in the $\mathbb{R}^2$ picture. $N$ is translation by $x$ in the upper half-plane picture, and is a skew transformation in the $\mathbb{R}^2$ picture.

In each case, the inverse is geometrically obvious: for $K$, replace $\theta$ with $-\theta$; for $A$ replace $r$ with $1/r$, and for $N$, replace $x$ with $-x$. Since $$SL_2(\mathbb{R})=NAK$$ this lets us invert every $2\times 2$ matrix by "pure thought", at least if you remember the Iwasawa decomposition (which is easy from the geometric picture, I think). Of course this easily extends to $GL_2$; if $A$ has determinant $d$, then $A^{-1}$ had better have determinant $d^{-1}$.

If you'd like to derive the formula you've written down by "pure thought" it suffices to look at any one of these cases if you remember the general form of the inverse; or you can simply put them all together to give a rigorous derivation.
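
Here is a minimal numerical sketch of this procedure (the function name and sample matrix are mine; I assume $\det g = 1$, so the image of $i$ lies in the upper half-plane):

```python
import numpy as np

def iwasawa_nak(g):
    """Split g in SL(2,R) as g = n @ a @ k, using the action on the upper
    half-plane: g sends i to (g[0,0]*i + g[0,1]) / (g[1,0]*i + g[1,1])."""
    z = (g[0, 0] * 1j + g[0, 1]) / (g[1, 0] * 1j + g[1, 1])  # image of i
    x, y = z.real, z.imag
    n = np.array([[1.0, x], [0.0, 1.0]])                        # translation by x
    a = np.array([[np.sqrt(y), 0.0], [0.0, 1.0 / np.sqrt(y)]])  # stretching
    k = np.linalg.solve(n @ a, g)                               # k = (na)^{-1} g, a rotation
    return n, a, k

g = np.array([[1.0, 2.0], [3.0, 7.0]])  # det = 1
n, a, k = iwasawa_nak(g)

# invert each factor "geometrically": x -> -x, r -> 1/r, theta -> -theta
n_inv = np.array([[1.0, -n[0, 1]], [0.0, 1.0]])
a_inv = np.array([[a[1, 1], 0.0], [0.0, a[0, 0]]])
k_inv = k.T  # a rotation is inverted by transposing

print(np.allclose(k_inv @ a_inv @ n_inv, np.linalg.inv(g)))  # True
```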

Daniel Litt
  • 22,187
  • 2
    This is a spectacular answer. +1, sir. – Frank Thorne Feb 21 '12 at 15:07
  • 66
    Daniel, you really remember the inversion for 2 x 2 matrices by this method? I remember it the same way I remember the quadratic formula: I burned it into my brain back in high school. What you describe seems more like a way to understand the formula than to remember it. – KConrad Feb 22 '12 at 15:19
  • 4
    @KConrad: In practice I do actually just recall the formula from memory; but just dredging it up from memory isn't my favorite way to remember it. In my ideal world, perhaps, we would have much less burned into our brains in high school; rather we would develop understanding and intuition (like this and other answers purport to give). On the other hand, I guess, sometimes you just gotta invert some $2\times 2$ matrices, and thinking about the upper half-plane is probably not the easiest way to do that ;-). – Daniel Litt Feb 22 '12 at 18:14
  • 7
    I also wanted to note that Frank's method of using a few special cases where the geometry was obvious (e.g. unipotents) to remember the general formula actually amounts to a geometric proof of the general formula, if one does enough geometric special cases. – Daniel Litt Feb 22 '12 at 18:19
  • 5
    Since KConrad entered the discussion, I'll mention that he wrote up a great treatment of the Iwasawa decomposition: http://www.math.uconn.edu/~kconrad/blurbs/grouptheory/SL(2,R).pdf – Frank Thorne Feb 23 '12 at 03:55
  • 1
    How does this method yield the formula for the inverse? If I understand correctly, what you suggest is (1) compute this Iwasawa decomposition of $A$ (2) invert each term one by one, to get a different decomposition of $A^{-1}$ (because the factors are in the reverse order). But at this point why not using any other classical matrix decomposition, for instance the SVD? And how does it help me computing the inverse of $\begin{bmatrix}1 & 2 \\ 3 & 4\end{bmatrix}$, assuming that I cannot compute its Iwasawa decomposition in my head? – Federico Poloni Aug 14 '20 at 15:45
  • @FedericoPoloni: write a generic matrix as a product of generic elements of N, A, K, invert each and multiply the results together. Then observe that the formula you get agrees with the standard one. I agree this certainly isn't a good computational tool! – Daniel Litt Aug 14 '20 at 16:48
  • Perhaps @FedericoPoloni's point was that you claimed that "remember the formula in special cases" could be made a rigorous proof, and that, while it certainly can, it requires first observing that the proposed formula for the inverse is anti-multiplicative—surely an unappealing recipe for intuition? – LSpice Aug 14 '20 at 18:15
  • @LSpice I've deleted my well-meant but perhaps ill-judged comment, so perhaps you can delete your response to me – Yemon Choi Aug 14 '20 at 18:46
  • @YemonChoi, done. – LSpice Aug 14 '20 at 19:14
45

Recall that the adjugate $\text{adj}(A)$ of a square matrix is a matrix that satisfies $$A \cdot \text{adj}(A) = \text{adj}(A) \cdot A = \det(A) \cdot I.$$

Like the determinant, the adjugate is multiplicative. Categorically, the reason the determinant is multiplicative is that it comes from a functor (the exterior power), so one might expect that the adjugate also comes from a functor, and indeed it does (the same functor!).

More precisely, let $T : V \to V$ be a linear transformation on a finite-dimensional vector space with basis $e_1, ... e_n$. Then the adjugate of the matrix of $T$ with respect to the basis $e_i$ is the matrix of $\Lambda^{n-1}(T) : \Lambda^{n-1}(V) \to \Lambda^{n-1}(V)$ with respect to an appropriate "dual basis" $$(-1)^{i-1} \bigwedge_{j \neq i} e_j$$ of $\Lambda^{n-1}(V)$ (it becomes an actual dual basis if you identify $\Lambda^n(V)$ with the underlying field $k$ by sending $e_1 \wedge ... \wedge e_n$ to $1$). The exterior product $V \times \Lambda^{n-1}(V) \to \Lambda^n(V)$ can then be identified with the dual pairing $V \times V^{\ast} \to k$, and the action of the exterior product on endomorphisms of $V$ and $\Lambda^{n-1}(V)$ can be identified with the composition of endomorphisms of $V$ (remembering that $\text{End}(V)$ is canonically isomorphic to $\text{End}(V^{\ast})$). This categorifies the above statement.

When $n = 2$, the dual basis is $e_2, - e_1$ but $\Lambda^1$ is the identity functor, and the formula follows. The geometric intuition comes from thinking about the exterior product in terms of oriented areas of parallelograms in $\mathbb{R}^2$.
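
A small sympy sketch of the $n = 2$ case, using the built-in adjugate:

```python
import sympy as sp

a, b, c, d = sp.symbols('a b c d')
A = sp.Matrix([[a, b], [c, d]])

adj = A.adjugate()
print(adj)                                         # Matrix([[d, -b], [-c, a]])
print(sp.simplify(A * adj - A.det() * sp.eye(2)))  # zero matrix
```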

Qiaochu Yuan
  • 114,941
  • 7
    Yeah, I think "$A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A)$" is the easiest way to remember, because for a $2 \times 2$ matrix computing the adjugate is trivial – William Feb 21 '12 at 05:22
  • I tried drawing parallelograms corresponding to various linear transformations. Although the intuition was clear for a subset of matrices generating $SL_2$ (such as those occurring in the Iwasawa decomposition, as mentioned by Daniel Litt), I was unable to "see" this for a general linear transformation. Do you have more to say about your last sentence? Thank you! – Frank Thorne Feb 21 '12 at 15:11
  • 1
    @Frank: I guess the geometric picture is something like this. Identifying $\Lambda^2(V)$ with $\mathbb{R}$ corresponds to choosing a volume form on $\mathbb{R}^2$, equivalently a symplectic form. So $\text{SL}_2(\mathbb{R})$ is isomorphic to the symplectic group and the inverse and symplectic adjoint coincide for matrices of determinant $1$. Now the symplectic adjoint satisfies $\langle Tv, w \rangle = \langle v, T^{\dagger} w \rangle$ where $\langle , \rangle$ denotes the symplectic form, and plugging $e_1, e_2$ into $v, w$ one can see what this condition means geometrically. – Qiaochu Yuan Feb 21 '12 at 18:26
  • 3
    This is fantastic! I always thought the adjugate was just another "playing with squares of numbers" trick... I'm pleasantly surprised to see that it has a "deeper meaning." – Vectornaut Feb 22 '12 at 14:37
  • @Qiaochu: 1+. This seems to be a general definition for the adjugate of an endomorphism of a locally free object of rank $n$ in a symmetric monoidal cocomplete category. – Martin Brandenburg Feb 22 '12 at 21:15
  • Very nice. I had been looking for an invariant construction of the adjugate. – Spiro Karigiannis Feb 23 '12 at 13:40
  • 4
    This is essentially how I teach Cramer's rule. – Allen Knutson Mar 03 '12 at 23:21
  • 6
    In particular, the adjugate has two weirdnesses: the size $n-1$ determinants, and the transpose. Those are coming from the $\Lambda^{n-1}$ and the ${}^*$, respectively. – Allen Knutson Oct 07 '15 at 21:29
27

I remember the inverse by looking at the corresponding linear fractional transformation. It sends $\frac{-d}{c}$ to $\infty$ and $\infty$ to $\frac{a}{c}$, so the inverse had better reverse this; it follows that the $c$ should stay put and the $a$ and $d$ should switch, and so the $b$ and $c$ get negated.
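
A quick symbolic sanity check, as a sympy sketch (it just confirms that swapping $a, d$ and negating $b, c$ undoes the fractional linear transformation):

```python
import sympy as sp

a, b, c, d, z = sp.symbols('a b c d z')
f = (a * z + b) / (c * z + d)     # the fractional linear transformation
g = (d * z - b) / (-c * z + a)    # swap a and d, negate b and c
print(sp.simplify(g.subs(z, f)))  # z  (whenever ad - bc != 0)
```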

Grant Lakeland
  • 555
  • 11
    Strictly speaking this determines the inverse only up to sign, but this is still a good way for remembering the formula. – François Brunault Feb 22 '12 at 16:44
  • 1
    1+, this is nice! Just for completeness: The linear fractional transformation is $\mathbb{P}^1 \to \mathbb{P}^1$, $z \mapsto \tfrac{az+b}{cz+d}$. – Martin Brandenburg Feb 22 '12 at 20:05
  • @Martin: Yes, thank you. @François: Yes, true; I should mention that I was assuming the "switch one pair and negate the other" comment from the original question! – Grant Lakeland Feb 22 '12 at 21:27
26

This is essentially the same as Tobias Hagge's answer and Jonny Evans's comment, but I thought that writing it up in this way would make things clearer.

Think about the product $$ \begin{bmatrix} a & b\\ c & d \end{bmatrix} \begin{bmatrix} ? & ?\\ ? & ? \end{bmatrix} =\begin{bmatrix} ad-bc & 0\\ 0 & ad-bc \end{bmatrix}. $$ Focus on the zero in position $(2,1)$ in the RHS. In order to get it with the row-by-column rule, the first column of the unknown matrix must be $\begin{bmatrix}d\\-c\end{bmatrix}$.

(Well, apart from the sign --- you could still get it wrong. But you can check that it is correct by computing the $(1,1)$ entry of the product.)

Now focus on the other zero entry in position $(1,2)$ of the RHS, and you'll see that the second column must be $\begin{bmatrix}-b\\a\end{bmatrix}$. Again, if you're confused about the sign, just check the $(2,2)$ entry.
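
In code the whole check is a couple of lines (a sympy sketch):

```python
import sympy as sp

a, b, c, d = sp.symbols('a b c d')
A = sp.Matrix([[a, b], [c, d]])

# the columns [d, -c] and [-b, a] kill the off-diagonal entries of the product
M = sp.Matrix([[d, -b], [-c, a]])
print(sp.expand(A * M))  # Matrix([[a*d - b*c, 0], [0, a*d - b*c]])
```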

  • 16
    (Warning: this answer from a numerical linear algebraist/matrix theorist. We guys do not have a dime of geometrical intuition, and like to always think about squares full of numbers.) – Federico Poloni Feb 23 '12 at 08:37
  • 1
    Geometrically, the off-diagonal elements of the resulting identity matrix being zero translates into the first column of the inverse matrix being orthogonal to the second row of the matrix to be inverted (A) and likewise for the second column of the inverse and the first row of the matrix A. Overall signs are determined by correct orientation of the orthogonal vectors to give a normalization by the signed area (determinant) to unity. – Tom Copeland Apr 23 '15 at 20:03
  • I.e., an orientation and scaling giving unity for the inner products of the first (second) column of the inverse and first (second) row of A. – Tom Copeland Apr 25 '15 at 16:42
  • 1
    You don't even need to look at the zero positions I think: just looking at the (1, 1) position, namely $\begin{bmatrix}a & b\end{bmatrix}\begin{bmatrix}?\\?\end{bmatrix} = ad-bc$, should show the first column to be $\begin{bmatrix}d\\-c\end{bmatrix}$, via some appeal to $a, b, c, d$ being "free". Similarly the (2, 2) position for the second column. – shreevatsa Mar 07 '24 at 14:50
14

$\bullet$ The sign switch is familiar from complex numbers:

The regular representation of $\mathbb{C}$ over $\mathbb{R}$ is the embedding of $\mathbb{R}$-algebras $\mathbb{C} \to M_2(\mathbb{R})$ defined by $a+ib \mapsto \begin{pmatrix} a & -b \\ b & a \end{pmatrix}$. The inverse of $a+ib$ is the conjugate $a-ib$ divided by the norm $a^2+b^2$, thus the inverse of $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$ is the adjugate $\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$ divided by the determinant $a^2+b^2$.

$\bullet$ Both the sign switch and the swap of the diagonal entries can be illustrated with quaternions:

The regular representation of $\mathbb{H}$ over $\mathbb{C}$ is the embedding $\mathbb{H} \to M_2(\mathbb{C})$ mapping $u+jv \mapsto \begin{pmatrix} u & v \\ - \overline{v} & \overline{u} \end{pmatrix}$. The inverse of $u + jv$ is the conjugate $\overline{u} - jv$ divided by the norm $|u|^2+|v|^2$. Thus, the inverse of $\begin{pmatrix} u & v \\ - \overline{v} & \overline{u} \end{pmatrix}$ is the adjugate $\begin{pmatrix} \overline{u} & -v \\ \overline{v} & u \end{pmatrix}$ divided by the determinant $|u|^2+|v|^2$.
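
A numerical sketch of the complex case (the helper `C` and the sample value are mine):

```python
import numpy as np

def C(z):
    """A complex number as a 2x2 real rotation-scaling matrix."""
    return np.array([[z.real, -z.imag], [z.imag, z.real]])

z = 3 + 4j
M = C(z)
adj = C(z.conjugate())  # the adjugate: flip the sign of the off-diagonal part
print(np.allclose(np.linalg.inv(M), adj / abs(z)**2))  # True
```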

12

My answer is not very highfaluting, but it is what I use to remember. Switch the diagonals, change the signs of the off-diagonals and divide by the determinant. Since the inverse of a diagonal matrix is easy, the switch should be easy to remember. On the other hand such mnemonics are dangerous. The critical points of a cubic $Ax^3+Bx^2+Cx+D$ are at $\frac{-B\pm\sqrt{B^2-3AC}}{3A}$, or so I remember.

Scott Carter
  • 5,244
  • 2
    Isn't that off by exactly a factor of two? Is that the point? – Will Sawin Feb 22 '12 at 22:52
  • 4
    Yes, it was off by a factor of 2. And I had remembered it incorrectly. I was too lazy to compute it at the time I wrote the post --- the computer was on my lap and the pen was across the room ;). I think that was the point. – Scott Carter Feb 23 '12 at 03:24
8

For $\mathbf{A}$ near zero we have $$ (1-\mathbf{A})^{-1}\approx 1+\mathbf{A} $$ so it has to negate the off-diagonals.

(If you want to get all fancy about it you could notice that we use $\exp$ to map from the Lie algebra to the Lie group of invertible matrices, and that negation in the Lie algebra corresponds to the inverse in the group. But the zero element of the Lie algebra maps to $1$ in the group, so the negation is only directly visible off the diagonal.)
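
A tiny numerical illustration (the sample perturbation is arbitrary):

```python
import numpy as np

A = np.array([[0.0, 0.001], [0.002, 0.0]])  # small, purely off-diagonal
M = np.eye(2) + A                           # a matrix near the identity
print(np.linalg.inv(M))                     # ~ I - A: the off-diagonals flip sign
```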

Dan Piponi
  • 8,086
  • Maybe this somehow relates to seeing the inverse as $A^{-1} = \Delta^{-1}(tI-A)$ in the answer by Noam D. Elkies above? Maybe with an analogy (for $x$ near $1$) with $x^{-1} = (1 - (1-x))^{-1} \approx 1 + (1 - x) = 2 - x$ (or $x^{-1} = (1 + (x-1))^{-1} \approx 1 - (x-1) = 2 - x$)? – shreevatsa Mar 07 '24 at 14:46
  • It's all related. But I was trying to come up with the simplest possible example because that's all you need for a mnemonic. – Dan Piponi Mar 07 '24 at 15:09
5

This question is probably too old to answer, but I couldn't resist. Consider $M_2(\mathbb{R}) = \{a+b\iota : a,b\in \mathbb{C}\}$, where a complex number acts as the usual rotation-scaling matrix. Additional relations are present: $\iota^2=1$, $\iota b=\bar b\iota$, which are enough to multiply $2\times 2$ matrices as split quaternions. For the reader's convenience, $\iota=\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$.

Now the adjugate of $a+b\iota$ is $\bar a-b\iota$. Let's calculate: $(a+b\iota)(\bar a-b\iota)=a\bar a-b\bar b+(-ab+ba)\iota=a\bar a-b\bar b$, because complex multiplication is commutative.

The determinant of the matrix $a+b\iota$ is $a\bar a-b\bar b$, so the inverse of $a+b\iota$ is $(\bar a-b\iota)/(a\bar a-b\bar b)$.
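
A numerical sketch (the helper `C` for the complex embedding and the sample values are mine):

```python
import numpy as np

def C(z):
    """A complex number as a 2x2 real rotation-scaling matrix."""
    return np.array([[z.real, -z.imag], [z.imag, z.real]])

iota = np.diag([1.0, -1.0])
a, b = 2 + 3j, 1 - 4j
M = C(a) + C(b) @ iota                # a generic element of M_2(R)
adj = C(a.conjugate()) - C(b) @ iota  # the claimed adjugate
print(M @ adj)                        # (|a|^2 - |b|^2) * I
print(abs(a)**2 - abs(b)**2, np.linalg.det(M))  # both -4.0 (up to rounding)
```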

5

Mnemonic: make the product diagonals the determinant, then scale.

The off diagonals are zero because the area of a parallelogram with planar edge vectors $c_1,c_2$ is the length of the scaled projection $|c_1 \cdot i c_2| = |c_2 \cdot i c_1|$ (here $i$ denotes a $90^\circ$ rotation of the plane), and the mnemonic sets row $r_k$ in the inverse to $(ic_{3 - k})^T$.

4

There are lots of great answers here, but they may be inaccessible to students first encountering this material, so here's my intuition for the $2\times 2$ inverse at an undergraduate (maybe even high-school) level:

First, let's define two matrices $A$ and $B$, and our goal will be to find entries for $B$ such that $B= A^{-1}$.

$$A = \begin{bmatrix} \color{#d66}a & \color{#4b2}b \\ \color{#d66}c & \color{#4b2}d \end{bmatrix}, \space B = \begin{bmatrix} \color{#28d}e & \color{#28d}f \\ \color{#a6a}g & \color{#a6a}h \end{bmatrix} $$

and we'll give special names to the columns of $A$ and the rows of $B$:

$$\color{#d66}{A_x} = \begin{bmatrix} \color{#d66}a \\ \color{#d66}c \end{bmatrix}, \space \color{#4b2}{A_y} = \begin{bmatrix} \color{#4b2}b \\ \color{#4b2}d \end{bmatrix}, \space \color{#28d}{B_x} = \begin{bmatrix} \color{#28d}e & \color{#28d}f \end{bmatrix}, \space \color{#a6a}{B_y} = \begin{bmatrix} \color{#a6a}g & \color{#a6a}h \end{bmatrix}$$

and we'll think of these as just being vectors hanging out in 2-dimensional space.

Now, by the definition of the inverse we want $BA = A^{-1}A = I$. So what is $BA$?

$$BA = \begin{bmatrix} \color{#28d}e & \color{#28d}f \\ \color{#a6a}g & \color{#a6a}h \end{bmatrix} \begin{bmatrix} \color{#d66}a & \color{#4b2}b \\ \color{#d66}c & \color{#4b2}d \end{bmatrix} = \begin{bmatrix} \color{#28d}{B_x} \cdot \color{#d66}{A_x} & \color{#28d}{B_x} \cdot \color{#4b2}{A_y} \\ \color{#a6a}{B_y} \cdot \color{#d66}{A_x} & \color{#a6a}{B_y} \cdot \color{#4b2}{A_y}\end{bmatrix}$$

(I should note here that a corresponding expression for $AB$ can be found by imagining the columns of $B$ acting on the rows of $A$, but for the sake of simplicity I'll just be showing the geometric intuition for this version.)

Now, we've said that we want this expression to equal $I$ which gives us the following equation:

$$\begin{bmatrix} \color{#28d}{B_x} \cdot \color{#d66}{A_x} & \color{#28d}{B_x} \cdot \color{#4b2}{A_y} \\ \color{#a6a}{B_y} \cdot \color{#d66}{A_x} & \color{#a6a}{B_y} \cdot \color{#4b2}{A_y}\end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

This is really just a set of four equations stating that the dot-product of a row from $B$ with a column from $A$ is equal to either $1$ or $0$. These four equations are the constraints that determine the four unknowns in the inverse matrix we are trying to find.

Each constraint has a simple geometric interpretation:
When the dot-product of two vectors equals $0$, their directions must be perpendicular. Given you know the direction of a vector, you must choose some length such that its dot-product with the other vector equals $1$. Thus we can use these four equations to constrain the direction and length of each of the rows of $B$.

For instance, in the top-right corner we have $\color{#28d}{B_x} \cdot \color{#4b2}{A_y} = 0$, which means that the first row of $B$ is perpendicular to the second column of $A$. We already know that we will need to rescale the result, so our goal right now is to merely find any vector which is perpendicular to $\color{#4b2}{A_y}$. The procedure to do so is simple enough: just swap the two components and negate one of them, giving us $\color{#28d}{B_x} = \begin{bmatrix} \color{#4b2}d & \color{#4b2}{-b} \end{bmatrix}$.

Going through a similar process for the bottom-left corner with $\color{#a6a}{B_y} \cdot \color{#d66}{A_x} = 0$, gives us $\color{#a6a}{B_y} = \begin{bmatrix} \color{#d66}{-c} & \color{#d66}{a} \end{bmatrix}$. Now there is some freedom as to which component gets negated, but it comes out in the wash when we choose the lengths so that $\color{#28d}{B_x} \cdot \color{#d66}{A_x}$ and $\color{#a6a}{B_y} \cdot \color{#4b2}{A_y}$ both equal $1$.

In this case, if you carry out the calculation, you'll see I've negated the components such that

$$\color{#28d}{B_x} \cdot \color{#d66}{A_x} = \color{#a6a}{B_y} \cdot \color{#4b2}{A_y} = \color{#d66}a \color{#4b2}d - \color{#4b2}b \color{#d66}c = \det(A)$$

In other words, with these row vectors for $B$, we have the following equation:

$$\begin{bmatrix} \color{#28d}{B_x} \cdot \color{#d66}{A_x} & \color{#28d}{B_x} \cdot \color{#4b2}{A_y} \\ \color{#a6a}{B_y} \cdot \color{#d66}{A_x} & \color{#a6a}{B_y} \cdot \color{#4b2}{A_y}\end{bmatrix} = \begin{bmatrix} \det(A) & 0 \\ 0 & \det(A) \end{bmatrix}$$

(You'll find that regardless of which component you negate, you always end up with $\pm \det(A)$, and the negative sign will be undone by the next step)

So by dividing each row vector in $B$ by $\det(A)$ we will successfully rescale their lengths such that the dot-products along the diagonal both equal $1$. And thus we arrive at the final version of $B$:

$$ B = \frac{1}{\det(A)}\begin{bmatrix} \color{#4b2}d & \color{#4b2}{-b} \\ \color{#d66}{-c} & \color{#d66}{a} \end{bmatrix} = A^{-1} $$

Well, this may seem like a lot of words and math to explain a simple formula, but now that I've given the explanation, it's quite easy to remember the intuition:
Because the diagonal entries of $I$ are $1$, the length of a row vector in $A^{-1}$ is constrained by the corresponding column vector in $A$ such that their dot product is $1$. And because the off-diagonal entries of $I$ are $0$, the direction of any row vector in $A^{-1}$ is constrained to be perpendicular to the remaining column vectors of $A$. And notice that, by the way I've laid this out, this intuition works with any size matrix, although the formula is not so simple, obviously.
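
As a final sanity check, here is a short numpy sketch of the construction (the function name and sample matrix are mine):

```python
import numpy as np

def inverse_2x2(A):
    """Each row of the inverse is perpendicular to the 'other' column of A
    (swap the components and negate one), then everything is scaled by 1/det(A)."""
    (a, b), (c, d) = A
    Bx = np.array([d, -b])   # perpendicular to the second column (b, d)
    By = np.array([-c, a])   # perpendicular to the first column (a, c)
    return np.vstack([Bx, By]) / (a * d - b * c)

A = np.array([[1.0, 2.0], [3.0, 7.0]])
print(inverse_2x2(A) @ A)  # the identity, up to rounding
```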

Michael Hardy
  • 11,922
  • 11
  • 81
  • 119
0

Let $ A $ be an $ n \times n $ invertible matrix. We're looking for an $ M $ such that $ AM = I $. Let's write columns of $ A $ as $ A_1, \ldots, A_n $ ( since $ A $ is invertible, these form a basis of $ \mathbb{R}^n $ ), and those of $ M $ as $ M_1, \ldots, M_n $.

Focusing on $ M_j $, we get $ A_1 m_{1j} + \ldots + A_n m_{nj} = e_j $ ( where $ e_1, \ldots, e_n $ is the standard basis of $ \mathbb{R}^n $ ). Hence $ m_{ij} $ is the scalar $ t $ such that $ e_j - t A_i $ lies in the span of $ \{ A_k : k \neq i \} $ [ This is the geometric part ].

So $ \det( A_1, \ldots, e_j - m_{ij} A_i , \ldots, A_n ) = 0 $, from which $ m_{ij} $ can be found [ Here determinants can be avoided when $ n = 2 $. For example $ e_1 - m_{11} A_1 $ is parallel to $ A_2 $, so equating slopes gives $ m_{11} $ ].
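
A sympy sketch of this recipe for $ n = 2 $ (the sample matrix is mine; the determinant condition is linear in $ m_{ij} $, so it can be solved directly):

```python
import sympy as sp

A = sp.Matrix([[1, 2], [3, 7]])  # det = 1
n = 2
M = sp.zeros(n, n)
for i in range(n):
    for j in range(n):
        # det(A_1, ..., e_j - m_ij * A_i, ..., A_n) = 0 is linear in m_ij:
        # m_ij = det(A with column i replaced by e_j) / det(A)
        Aij = A.copy()
        Aij[:, i] = sp.eye(n)[:, j]
        M[i, j] = Aij.det() / A.det()

print(M)      # Matrix([[7, -2], [-3, 1]])
print(A * M)  # the identity
```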

  • 1
    This doesn't seem to explain what's special about $n = 2$, where there is a reasonably memorable formula for the entries of $A^{-1}$ as rational functions of the entries of $A$; whereas I think few people remember such formulæ for larger $n$ (although of course they exist). Why should the $n = 2$ formula particularly be obvious? – LSpice Feb 21 '21 at 03:41
  • Yes this is only to answer the geometric intuition part of the question, it doesn't explain how the $ n = 2 $ formula looks special. Maybe I should've mentioned this in the beginning. ( In case the answer is more off-topic, I wouldn't mind deleting / having it deleted ) – Venkata Karthik Bandaru Feb 21 '21 at 05:37
0

$$( \operatorname{diag}(a,d))^{-1} = \operatorname{diag}\left( \frac1a, \frac1d \right) = \frac1{ad} \operatorname{diag}(d,a) $$ That answers at least part of the question.

Michael Hardy
  • 11,922
  • 11
  • 81
  • 119