What is the adjoint of a linear operator and why is it useful?

Question

The concept of linear operators and their adjoints arises frequently in some corners of signal processing, but is not particularly well documented, at least from a signal processing perspective (you could probably find a decent amount of more abstract mathematical theory over on the math SE).

So what is the adjoint of a linear operator? What is the intuition for what they do? What are they good for? What would be some examples of linear operators and their corresponding adjoints? Where could I find more information on this topic?

Edit

I've provided an answer below, but if anyone can provide any more references (links, articles, books, papers, videos, tutorials etc.) it would be appreciated.

Gillespie · Accepted Answer · 2023-11-30T01:55:42.077

Here's the best practical information I have so far on linear operators and their adjoints. There's only one book I've come across that discusses this very practically (which I reference below); perhaps others can add additional resources.

Definition

The adjoint of a linear operator is a generalization of the Hermitian transpose of a matrix to linear operators. Let $A$ represent a linear operator. Then the adjoint, $A^H$, satisfies the following relation, where $ \langle \cdot, \cdot \rangle $ refers to the inner product:

$$ \langle A x, y \rangle = \langle x, A^H y \rangle $$

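Using the standard inner product $\langle u, v \rangle = u^H v$ and expanding first the left side and then the right, we get:

$$ \langle A x, y \rangle = (A x)^H y = x^H A^H y $$

$$ \langle x, A^H y \rangle = x^H (A^H y) = x^H A^H y $$

so the two sides agree.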

The operator $A$ can be specified as an explicit matrix, in which case $A^H$ is just the complex conjugate transpose of the matrix $A$ (or $A^T$ for real matrices). But since the explicit matrix version of many operators would be very large, it is often more efficient to implement $A$ implicitly, as a function that applies the operation without ever forming the matrix. The difficulty then is obtaining a matching function for the adjoint.
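A common way to check that a forward/adjoint pair really satisfies the defining relation is the so-called dot-product test: draw random $x$ and $y$ and compare $\langle A x, y \rangle$ with $\langle x, A^H y \rangle$. Here is a minimal sketch in plain Python for an explicit matrix; the function names (`inner`, `matvec`, `adjoint_matvec`) are my own and not taken from any package.

```python
import random

def inner(u, v):
    """Inner product <u, v> = u^H v (the first argument is conjugated)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def matvec(A, x):
    """Forward operator: A x, with A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def adjoint_matvec(A, y):
    """Adjoint operator: A^H y, computed without forming A^H explicitly."""
    n_rows, n_cols = len(A), len(A[0])
    return [sum(A[i][j].conjugate() * y[i] for i in range(n_rows))
            for j in range(n_cols)]

def rand_complex_vector(n, rng):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

rng = random.Random(0)
m, n = 4, 3
A = [rand_complex_vector(n, rng) for _ in range(m)]  # random 4x3 complex matrix
x = rand_complex_vector(n, rng)
y = rand_complex_vector(m, rng)

lhs = inner(matvec(A, x), y)          # <A x, y>
rhs = inner(x, adjoint_matvec(A, y))  # <x, A^H y>
print(abs(lhs - rhs))  # expected to be at the level of floating-point round-off
```

The same test works unchanged for implicit operators: only `matvec` and `adjoint_matvec` need to be swapped for the function pair under test.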

Examples of Operators and Their Adjoints

The Discrete Fourier Transform is a familiar example of a linear operator. It can be implemented as an explicit matrix, $D$ (e.g., with MATLAB's dftmtx function). Then the DFT of a vector $x$ is $X = D x$. But it is much better to use the FFT, which is an implicit, efficient, functional implementation of the DFT.

If we had a DFT matrix $D$, then the adjoint of the DFT is just the conjugate transpose of the matrix $D$ ($D^H$). But how would we implement this as an implicit function? It turns out that the adjoint of the FFT operator is the IFFT, up to a scale factor that depends on the normalization convention (for MATLAB's unnormalized fft, $D^H y$ equals $N$ times ifft(y)).¹ You can easily verify this in MATLAB using the dftmtx and fft functions.
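For readers without MATLAB, the same check can be sketched in plain Python. This assumes the common unnormalized forward DFT with a $1/N$ factor on the inverse, and it uses a direct $O(N^2)$ sum as a stand-in for a real FFT routine; the function names are illustrative only.

```python
import cmath
import random

def dft(x):
    """Unnormalized forward DFT: X[k] = sum_n x[n] exp(-2j*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT, including the usual 1/N scale factor."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def dft_adjoint(y):
    """Adjoint of the unnormalized DFT: N times the inverse DFT."""
    N = len(y)
    return [N * v for v in idft(y)]

def inner(u, v):
    """Inner product <u, v> = u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

rng = random.Random(1)
N = 8
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
y = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]

# Dot-product test: <D x, y> should match <x, D^H y>.
gap = abs(inner(dft(x), y) - inner(x, dft_adjoint(y)))
print(gap)  # expected to be at the level of floating-point round-off
```

Dropping the factor of `N` in `dft_adjoint` makes the test fail, which is a quick way to see why the scale convention matters.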

Further examples of operators and their corresponding adjoints are shown in the table below:²

Forward operator	Adjoint
Matrix multiply	Conjugate transpose matrix multiply
Convolve	Cross-correlate
Truncate	Zero pad
Decimate	Insert zeros
Stretch	Squeeze
FFT	Inverse FFT
Plane-wave superposition	Beamform
Ray tracing	Tomography
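To make one row of the table concrete, here is a small sketch of the convolve / cross-correlate pair for real-valued signals, checked with a dot-product test. It assumes "full" convolution (output length `len(x) + len(h) - 1`); other boundary conventions need a correspondingly adjusted adjoint. The names `convolve` and `correlate` are my own.

```python
def convolve(h, x):
    """Forward operator: full convolution of x with the filter h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += hj * xi
    return y

def correlate(h, y):
    """Adjoint operator: cross-correlate y with h, returning len(y) - len(h) + 1 samples."""
    n = len(y) - len(h) + 1
    return [sum(hj * y[i + j] for j, hj in enumerate(h)) for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

h = [1.0, -1.0, 0.5]
x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, -2.0, 1.0, 3.0, -1.0, 2.0]  # arbitrary vector in the output space

lhs = dot(convolve(h, x), y)   # <A x, y>
rhs = dot(x, correlate(h, y))  # <x, A^H y>
print(lhs, rhs)  # the two values should agree
```

For complex signals the filter taps in `correlate` would also need to be conjugated.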

What are Adjoints Used For?

Solving Inverse Problems
Adjoints are often used to solve inverse problems. These arise when some forward operation ($A$) has been applied to an unknown "scene" or "channel" ($x$) to produce some measurements ($b$): $$ A x = b \tag{1} $$

The problem is to use our knowledge of the data collection operation, $A$, and the measurements, $b$, to obtain an estimate of the scene or channel which we will call $\hat{x}$.

Approximate Inverses
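One use of adjoint operators is as approximate inverse operators. If the inverse of an operator ($A^{-1}$) does not exist, the adjoint ($A^H$) can still be applied, and it often serves as a rough substitute for the inverse: it maps the measurements back into the domain of $x$, although in general it does not recover $x$ exactly.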

An example would be an operator that decimates a vector by a factor of 2, removing every other sample. The adjoint of decimation is zero insertion. So a length 10 vector decimated by a factor of 2, and then zero-inserted back to its original length would have every other sample replaced by 0. We cannot reverse the decimation operation because decimation throws information away, but the adjoint attempts to come as close as possible.
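The decimation example can be written out in a few lines of Python. This is just a sketch with hypothetical function names; it assumes decimation keeps samples 0, 2, 4, and so on.

```python
def decimate(x, factor=2):
    """Forward operator: keep every `factor`-th sample, starting with the first."""
    return x[::factor]

def insert_zeros(y, factor, length):
    """Adjoint operator: put the samples back in place, with zeros everywhere else."""
    x = [0] * length
    for i, v in enumerate(y):
        x[i * factor] = v
    return x

x = list(range(1, 11))                       # [1, 2, ..., 10]
decimated = decimate(x, 2)                   # [1, 3, 5, 7, 9]
restored = insert_zeros(decimated, 2, len(x))
print(restored)  # [1, 0, 3, 0, 5, 0, 7, 0, 9, 0]
```

The discarded samples come back as zeros rather than their original values, which is exactly the information loss described above.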

Iterative Estimation
The adjoint of an operator can be used (along with the forward operator) to solve $A x = b$ iteratively. Iterative processing can use least squares methods such as LSQR (MATLAB) or conjugate gradient. If $x$ is known to be sparse, basis pursuit (or basis pursuit denoising) techniques can produce better results (using packages such as SPGL1). Many of these iterative algorithms rely on having the adjoint operator.
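To show how an iterative solver uses the adjoint, here is a minimal sketch of CGLS (conjugate gradient applied to the least-squares problem). It only ever calls a forward function and an adjoint function and never forms a matrix. The operator here is a toy convolution with a two-tap filter, chosen purely for illustration; the function names, the filter, and the tolerance are my own assumptions, and a production solver such as LSQR handles conditioning and stopping criteria much more carefully.

```python
def forward(x, h=(1.0, 0.5)):
    """A x: full convolution of x with a short filter h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += hj * xi
    return y

def adjoint(y, h=(1.0, 0.5)):
    """A^H y: cross-correlation of y with the same filter."""
    n = len(y) - len(h) + 1
    return [sum(hj * y[i + j] for j, hj in enumerate(h)) for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cgls(forward, adjoint, b, n_iter=50, tol=1e-24):
    """Minimize ||A x - b||^2 using only the forward and adjoint functions."""
    r = list(b)               # residual b - A x, with x = 0 initially
    s = adjoint(r)            # gradient direction A^H r
    x = [0.0] * len(s)
    p = list(s)               # search direction
    gamma = dot(s, s)
    for _ in range(n_iter):
        if gamma < tol:       # gradient is numerically zero: converged
            break
        q = forward(p)
        alpha = gamma / dot(q, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = adjoint(r)
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x

x_true = [1.0, -2.0, 3.0, 0.5, 4.0]
b = forward(x_true)                      # noise-free measurements
x_hat = cgls(forward, adjoint, b)
max_err = max(abs(a - c) for a, c in zip(x_hat, x_true))
print(max_err)  # expected to be very small for this well-conditioned toy problem
```

Each iteration costs one forward and one adjoint application, which is why these methods remain practical when the explicit matrix would be far too large to store.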

Example
A data collection system can be modeled as a set of operations that interrogate a "scene." The overall operator $A$ could be a composite of many smaller operators: $A = O_1 \cdot O_2 \cdot O_3 \cdots O_N$. Example systems include:

Seismograms for seismic imaging
An MRI machine, X-ray, or CT scanner for medical imaging
A radar transmitter and receiver
An optical camera

Ideally, an inverse of the data collection operation ($A^{-1}$) could be applied to the measured data to recover an image of the scene that gave rise to the data. But often the true inverse operation does not exist, and moreover, if it did, it might be unstable and sensitive to noise.

If we instead have adjoints for each of the operations that make up the data collection scheme, we can use them to approximate the inverse operation and get an estimate of the scene: $A^\dagger \approx A^H = O_N^H \cdot O^H_{N-1} \cdot O^H_{N-2} \cdots O_1^H$. Note that the individual adjoints are applied in the reverse order of the forward operators. In many applications (e.g., many of the image formation examples above), the adjoint can actually be preferable to the true inverse because it tolerates instability or information loss in the data better than the inverse.²
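The reverse-order rule for composite operators can be sketched as follows. Each operator is represented as a (forward, adjoint) pair of functions, and a hypothetical `compose` helper chains them; the toy system is "convolve, then decimate by 2", and all names and sizes are illustrative assumptions rather than any package's API.

```python
def make_convolution(h):
    """Return (forward, adjoint) for full convolution with the real filter h."""
    def fwd(x):
        y = [0.0] * (len(x) + len(h) - 1)
        for i, xi in enumerate(x):
            for j, hj in enumerate(h):
                y[i + j] += hj * xi
        return y
    def adj(y):
        n = len(y) - len(h) + 1
        return [sum(hj * y[i + j] for j, hj in enumerate(h)) for i in range(n)]
    return fwd, adj

def make_decimation(factor, full_length):
    """Return (forward, adjoint) for decimation of a length-`full_length` vector."""
    def fwd(x):
        return x[::factor]
    def adj(y):
        x = [0.0] * full_length
        for i, v in enumerate(y):
            x[i * factor] = v
        return x
    return fwd, adj

def compose(*ops):
    """ops = (O_1, ..., O_N) with A = O_1 O_2 ... O_N, so O_N is applied first."""
    def fwd(x):
        for f, _ in reversed(ops):   # apply O_N, then O_{N-1}, ..., then O_1
            x = f(x)
        return x
    def adj(y):
        for _, a in ops:             # apply O_1^H, then O_2^H, ..., then O_N^H
            y = a(y)
        return y
    return fwd, adj

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]
conv = make_convolution((1.0, -0.5, 0.25))    # length 6 -> length 8
decim = make_decimation(2, full_length=8)     # length 8 -> length 4
A, AH = compose(decim, conv)                  # A = decimate . convolve

y = [2.0, -1.0, 0.5, 4.0]                     # arbitrary vector in the data space
lhs = dot(A(x), y)     # <A x, y>
rhs = dot(x, AH(y))    # <x, A^H y>
print(lhs, rhs)  # the two values should agree
```

If the adjoints were applied in the same order as the forward operators, the vector lengths would not even match here, which makes the ordering mistake easy to catch.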

If the adjoint is not an acceptable approximation to the inverse, it can at least be used with the forward operator to solve the problem iteratively, as noted above.

Intuition: What is an Adjoint Doing?

An adjoint can be thought of as somewhat of a half-way point to an inverse. It reverses the phase changes of an operation, but not necessarily the magnitude changes. To illustrate, consider the Moore-Penrose pseudoinverse (generalized inverse), which for an operator $A$ with linearly independent columns is: $$ A^\dagger = (A^H A)^{-1} A^H $$

The inverse is composed of two parts: $(A^H A)^{-1}$, and $A^H$. The latter is the adjoint. One can readily see therefore that applying the adjoint is the first step in inversion. The second step is taking the inverse of the autocorrelation of the operator.

In some cases the adjoint and the inverse of an operator are identical (namely, for unitary operators). For example, with the unitary $1/\sqrt{N}$ scaling, the autocorrelation matrix $D^H D$ of the DFT matrix $D$ is an identity matrix, giving $D^\dagger = I^{-1} D^H = D^H$.
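As a short worked version of the DFT case, assume the common unnormalized convention $D_{kn} = e^{-j 2 \pi k n / N}$ for $k, n = 0, \dots, N-1$ (other conventions only change the scale factors). The entries of the autocorrelation matrix are then:

$$ (D^H D)_{mn} = \sum_{k=0}^{N-1} e^{j 2 \pi k m / N} \, e^{-j 2 \pi k n / N} = \sum_{k=0}^{N-1} e^{j 2 \pi k (m - n) / N} = N \delta_{mn} $$

so $D^H D = N I$ and

$$ D^\dagger = (D^H D)^{-1} D^H = \frac{1}{N} D^H $$

which is the usual inverse DFT. The adjoint already undoes all of the phase rotations; the only thing left for the second step to fix is the magnitude factor $N$. With the $1/\sqrt{N}$ scaling on $D$, that factor becomes 1 and the adjoint is exactly the inverse.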

Code Packages

Packages implementing many linear operators and their adjoints exist in many languages. For example:

Python: PyLops (and no doubt many others)
MATLAB: SPOT (originally SPARCO)
- Unfortunately SPOT is written in object-oriented MATLAB (a horrible thing to do). The original SPARCO package was not, but it can now only be accessed through archive.org.
Julia: LinearMaps, LinearOperators, LinearMapsAA, etc.

Books and References

The only practical book I've come across that talks about operators is Geophysical Image Estimation by Example, by Jon Claerbout of Stanford.

Here is a paper that describes the concept as well: "Efficient adjoint computation for wavelet and convolution operators," by Folberth and Becker.

I'll add any other books, lectures, videos, etc. as I find them.

Footnotes

¹Note that the DFT operation is a special case, for which the inverse operator is also the adjoint (assuming the scale factors related to $N$ are chosen appropriately). This is known as a unitary operator.

²See Chapter 1 of "Geophysical Image Estimation by Example," by Jon Claerbout.

This was great. I've been on a crusade to have more folks understand linear algebra, especially in my field, since it's such a powerful set of concepts that not only provides many great results we use, but is also elegant in representation and computation — Envidia, Oct 18 '23 at 01:37
If either of you (or anyone else) has any resources on this topic that are remotely helpful, please share! I've scoured the world for info and haven't found much. — Gillespie, Oct 19 '23 at 13:31
@Gillespie Have you read Mathematical Methods and Algorithms for Signal Processing by Moon & Stirling? It dives into the importance of the adjoint operator and its relation with the Normal Equations, particularly the Gramian Matrix and the Cross-Correlation vector. — Ahsan Yousaf, Feb 20 '24 at 09:21
@AhsanYousaf no I haven't, I'll have to take a look at that. Thanks! — Gillespie, Feb 20 '24 at 20:29