Here's the best practical information I have so far on linear operators and their adjoints. There's only one book I've come across that discusses this very practically (which I reference below); perhaps others can add additional resources.
Definition
The adjoint of a linear operator is a generalization of the Hermitian transpose of a matrix to linear operators. Let $A$ represent a linear operator. Then the adjoint, $A^H$, satisfies the following relation, where $ \langle \cdot, \cdot \rangle $ refers to the inner product:
$$
\langle A x, y \rangle = \langle x, A^H y \rangle
$$
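When $A$ is an explicit matrix and the inner product is the standard one, $\langle u, v \rangle = u^H v$, expanding first the left side and then the right shows that the conjugate transpose satisfies this relation: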
$$
(A x)^H y = \langle x, A^H y \rangle
$$
$$
x^H A^H y = x^H A^H y
$$
The operator $A$ can be specified as an explicit matrix, in which case $A^H$ is just the complex conjugate transpose of the matrix $A$ (or $A^T$ for real matrices). However, the explicit matrix for many operators would be very large, so it is usually more efficient to implement $A$ implicitly, as a function that performs the operation in question. The difficulty then is obtaining the matching adjoint function.
Examples of Operators and Their Adjoints
The Discrete Fourier Transform (DFT) is a familiar example of a linear operator. It can be implemented as an explicit matrix, $D$ (e.g., with MATLAB's dftmtx function). Then the DFT of a vector $x$ is $X = D x$. But it is much better to use the FFT, which is an implicit, efficient, functional implementation of the DFT.
If we had a DFT matrix $D$, then the adjoint of the DFT is just the conjugate transpose of the matrix $D$ ($D^H$). But how would we implement this as an implicit function? It turns out that the adjoint of the FFT operator is the IFFT, up to a scale factor.1 You can easily verify this in MATLAB using the dftmtx, fft, and ifft functions.
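The same check can be sketched outside MATLAB. The block below is a minimal, standard-library-only Python sketch, assuming the usual unnormalized forward DFT and a 1/N inverse; the helper names `dft`, `idft`, and `inner` are illustrative and deliberately naive (O(N²)), standing in for a real FFT routine. It tests the defining relation $\langle D x, y \rangle = \langle x, D^H y \rangle$ with $D^H$ taken to be $N$ times the inverse DFT.

```python
import cmath

def dft(x):
    """Unnormalized forward DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT with the conventional 1/N scale factor."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def inner(u, v):
    """Inner product <u, v> = u^H v (the first argument is conjugated)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

# Arbitrary complex test vectors (made up for illustration).
x = [1 + 2j, -0.5j, 3 + 0j, 0.25 - 1j]
y = [2 + 0j, 1 - 1j, -1.5 + 0.5j, 4j]
N = len(x)

lhs = inner(dft(x), y)                    # <D x, y>
rhs = inner(x, [N * v for v in idft(y)])  # <x, D^H y> with D^H = N * IDFT

print(abs(lhs - rhs))  # agreement up to floating-point rounding
```

Under these assumptions the adjoint is the inverse transform multiplied by $N$; the two only coincide exactly when the forward and inverse transforms are each scaled by $1/\sqrt{N}$.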
Further examples of operators and their corresponding adjoints are shown in the table below:2
| Forward operator | Adjoint |
| --- | --- |
| Matrix multiply | Conjugate transpose matrix multiply |
| Convolve | Cross-correlate |
| Truncate | Zero pad |
| Decimate | Insert zeros |
| Stretch | Squeeze |
| FFT | Inverse FFT |
| Plane-wave superposition | Beamform |
| Ray tracing | Tomography |
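Any forward/adjoint pair in the table can be checked numerically with the defining relation $\langle A x, y \rangle = \langle x, A^H y \rangle$ (often called the dot-product test). As a rough sketch for the convolve / cross-correlate row, the Python below implements both operators as plain functions; `convolve`, `correlate`, and `inner` are illustrative helpers written for this example, and the filter and test vectors are made up.

```python
def convolve(h, x):
    """Forward operator: full linear convolution of x with the filter h."""
    out = [0j] * (len(h) + len(x) - 1)
    for k, xk in enumerate(x):
        for n, hn in enumerate(h):
            out[n + k] += hn * xk
    return out

def correlate(h, y):
    """Adjoint operator: cross-correlate y with h (conjugated), one output per input lag."""
    n_out = len(y) - len(h) + 1
    return [sum(hn.conjugate() * y[n + k] for n, hn in enumerate(h))
            for k in range(n_out)]

def inner(u, v):
    """Inner product <u, v> = u^H v (the first argument is conjugated)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

h = [1 + 1j, -2 + 0j, 0.5j]                           # filter (3 taps)
x = [1 + 0j, 2 - 1j, -1j, 3 + 2j]                     # model-space vector (length 4)
y = [1j, 2 + 0j, -1 + 1j, 0.5 + 0j, 1 - 2j, -3 + 0j]  # data-space vector (length 6)

lhs = inner(convolve(h, x), y)   # <A x, y>
rhs = inner(x, correlate(h, y))  # <x, A^H y>

print(abs(lhs - rhs))  # agreement up to floating-point rounding
```

Note that the adjoint maps from the data space (length 6 here) back to the model space (length 4), which is why the correlation keeps only `len(y) - len(h) + 1` lags.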
What are Adjoints Used For?
Solving Inverse Problems
Adjoints are often used to solve inverse problems. These arise when some forward operation ($A$) has been applied to an unknown "scene" or "channel" ($x$) to produce some measurements ($b$):
$$
A x = b \tag{1}
$$
The problem is to use our knowledge of the data collection operation, $A$, and the measurements, $b$, to obtain an estimate of the scene or channel which we will call $\hat{x}$.
Approximate Inverses
One use of adjoint operators is as approximate inverse operators. When the inverse of an operator ($A^{-1}$) does not exist, the adjoint ($A^H$) can still be applied: it maps the measurements back into the domain of $x$, and for many operators the result is a useful, if crude, approximation of what an inverse would give.
An example would be an operator that decimates a vector by a factor of 2, removing every other sample. The adjoint of decimation is zero insertion. So a length 10 vector decimated by a factor of 2, and then zero-inserted back to its original length would have every other sample replaced by 0. We cannot reverse the decimation operation because decimation throws information away, but the adjoint attempts to come as close as possible.
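The decimation example can be written out in a few lines of Python. This is a minimal sketch: `decimate` and `insert_zeros` are illustrative names, and `insert_zeros` assumes the original length was an exact multiple of the decimation factor.

```python
def decimate(x, factor=2):
    """Forward operator: keep every `factor`-th sample, starting with the first."""
    return x[::factor]

def insert_zeros(y, factor=2):
    """Adjoint operator: put each sample back in its slot and fill the gaps with zeros."""
    out = [0] * (len(y) * factor)
    out[::factor] = y
    return out

x = list(range(1, 11))               # length-10 vector: 1, 2, ..., 10
decimated = decimate(x)              # [1, 3, 5, 7, 9]
restored = insert_zeros(decimated)   # [1, 0, 3, 0, 5, 0, 7, 0, 9, 0]

# Dot-product test: <A x, y> should equal <x, A^H y> for any y of length 5.
y = [2, -1, 4, 0, 3]
lhs = sum(a * b for a, b in zip(decimate(x), y))
rhs = sum(a * b for a, b in zip(x, insert_zeros(y)))
print(decimated, restored, lhs, rhs)
```

The surviving samples come back in their original positions, while the discarded ones are simply zero, which is all the adjoint can do without extra information about $x$.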
Iterative Estimation
The adjoint of an operator can be used (along with the forward operator) to solve $A x = b$ iteratively. Iterative processing can use least squares methods such as LSQR (MATLAB) or conjugate gradient. If $x$ is known to be sparse, basis pursuit (or basis pursuit denoising) techniques can produce better results (using packages such as SPGL1). Many of these iterative algorithms rely on having the adjoint operator.
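To show how the forward operator and its adjoint work together inside an iteration, here is a small Python sketch. It uses the Landweber iteration, $x \leftarrow x + \mu A^H (b - A x)$, which is swapped in here because it is much simpler than LSQR or conjugate gradient; it is not what those packages do internally. The operator (a two-tap convolution), the step size, and all function names are assumptions made for illustration.

```python
def forward(x, h):
    """A: full linear convolution of x with the real filter h."""
    out = [0.0] * (len(h) + len(x) - 1)
    for k, xk in enumerate(x):
        for n, hn in enumerate(h):
            out[n + k] += hn * xk
    return out

def adjoint(y, h):
    """A^H: cross-correlation of y with h (real data, so no conjugate is needed)."""
    return [sum(hn * y[n + k] for n, hn in enumerate(h))
            for k in range(len(y) - len(h) + 1)]

def landweber(b, h, n_unknowns, step, n_iter):
    """Estimate x from b = A x using only calls to the forward and adjoint functions."""
    x = [0.0] * n_unknowns
    for _ in range(n_iter):
        residual = [bi - ai for bi, ai in zip(b, forward(x, h))]  # b - A x
        gradient = adjoint(residual, h)                           # A^H (b - A x)
        x = [xi + step * gi for xi, gi in zip(x, gradient)]
    return x

h = [1.0, 0.5]                    # hypothetical blurring filter
x_true = [1.0, -2.0, 0.5, 3.0]    # the unknown "scene"
b = forward(x_true, h)            # noise-free measurements

# The step must stay below 2 / ||A||^2; here ||A||^2 <= (sum |h|)^2 = 2.25.
x_hat = landweber(b, h, n_unknowns=len(x_true), step=0.5, n_iter=200)
print(x_hat)
```

The explicit matrix for $A$ is never formed: the solver only needs two functions, one that applies $A$ and one that applies $A^H$, which is the same interface the more sophisticated iterative methods rely on.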
Example
A data collection system can be modeled as a set of operations that interrogate a "scene." The overall operator $A$ could be a composite of many smaller operators: $A = O_1 \cdot O_2 \cdot O_3 ... O_N $. Example systems include:
- Seismograms for seismic imaging
- An MRI machine, X-ray, or CT scanner for medical imaging
- A radar transmitter and receiver
- An optical camera
Ideally, an inverse of the data collection operation ($A^{-1}$) could be applied to the measured data to recover an image of the scene that gave rise to the data. But often the true inverse operation does not exist, and moreover, if it did, it might be unstable and sensitive to noise.
If we instead have adjoints for each of the operations that make up the data collection scheme, we can use them to approximate the inverse operation and get an estimate of the scene: $A^\dagger \approx A^H = O_N^H \cdot O^H_{N-1} \cdot O^H_{N-2} ... O_1^H $. In many applications (e.g., many of the image formation examples above), the adjoint can actually be preferable to the true inverse because it tolerates instability or information loss in the data better than the inverse.2
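The reversed order of the factors in $A^H$ follows directly from the definition of the adjoint. As a short worked step for two operators (the general case repeats the same move $N$ times), apply the defining relation once for $O_1$ and once for $O_2$:
$$
\langle O_1 O_2 x, y \rangle = \langle O_2 x, O_1^H y \rangle = \langle x, O_2^H O_1^H y \rangle
$$
so $(O_1 O_2)^H = O_2^H O_1^H$: the last operation applied in the forward direction is the first one whose adjoint is applied on the way back.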
If the adjoint is not an acceptable approximation to the inverse, it can at least be used with the forward operator to solve the problem iteratively, as noted above.
Intuition: What is an Adjoint Doing?
An adjoint can be thought of as something of a halfway point to an inverse. It reverses the phase changes of an operation, but not necessarily the magnitude changes. To illustrate, consider the definition of the Moore-Penrose pseudoinverse (generalized inverse), written here for an operator $A$ whose columns are linearly independent so that $A^H A$ is invertible:
$$
A^\dagger = (A^H A)^{-1} A^H
$$
The pseudoinverse is composed of two parts: $(A^H A)^{-1}$ and $A^H$. The latter is the adjoint, so applying the adjoint is the first step in inversion. The second step is applying the inverse of $A^H A$, the autocorrelation of the operator.
In some cases the adjoint and the inverse of an operator are identical (namely, for unitary operators). For example, when the DFT matrix $D$ is scaled by $1/\sqrt{N}$, its autocorrelation matrix $D^H D$ is an identity matrix, giving $D^\dagger = I^{-1} D^H = D^H$.
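The role of the scale factor can be made concrete with a short worked case. Assuming the unnormalized DFT matrix (the convention used by MATLAB's fft, with no factor on the forward transform), the columns of $D$ are orthogonal with squared norm $N$, so:
$$
D^H D = N I \quad \Rightarrow \quad D^\dagger = (N I)^{-1} D^H = \frac{1}{N} D^H
$$
This is the usual inverse DFT with its $1/N$ factor: the adjoint $D^H$ undoes the phase rotations, and the $(D^H D)^{-1}$ term corrects the remaining magnitude scaling.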
Code Packages
Packages implementing many linear operators and their adjoints exist in many languages. For example:
Books and References
The only practical book I've come across that talks about operators is Geophysical Image Estimation by Example, by Jon Claerbout of Stanford.
Here is a paper that describes the concept as well: "Efficient adjoint computation for wavelet and convolution operators," by Folberth and Becker.
I'll add any other books, lectures, videos, etc. as I find them.
Footnotes
1Note that the DFT operation is a special case, for which the inverse operator is also the adjoint (assuming the scale factors related to $N$ are chosen appropriately). This is known as a unitary operator.
2See Chapter 1 of "Geophysical Image Estimation by Example," by Jon Claerbout