9

I have recently been reading a paper on Highway Neural Networks and found the following:

$y=H(x,W_H)$

$H$ is usually an affine transform followed by a non-linear activation function, but in general it may take other forms.

After Googling "affine transform", I can't say I fully understand what it means. Can somebody please elaborate?

minerals

4 Answers

13

An affine transformation is of the form $$ g(\vec{v}) = A\vec{v} + \vec{b}, $$ where $A$ is a matrix representing a linear transformation and $\vec{b}$ is a translation vector.

In other words, an affine transformation is the combination of a linear transformation with a translation.

A linear transformation always carries the zero vector of the source space to the zero vector of the target space.

E.g. $y = 3x + 4$: in school we called this a linear equation, but strictly speaking it is not a linear transformation, because it contains a translation ($+4$), and linear transformations do not do that.

So every linear transformation is affine (just set $\vec{b}$ to the zero vector), but not every affine transformation is linear.
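As a quick sanity check of that zero-vector property, here is a minimal PyTorch sketch (the particular $A$ and $\vec{b}$ are made up for illustration):

import torch

A = torch.tensor([[3.0, 0.0], [0.0, 3.0]])  # linear part
b = torch.tensor([4.0, 4.0])                # translation part

zero = torch.zeros(2)
print(A @ zero)      # tensor([0., 0.]) -- a linear map keeps 0 at 0
print(A @ zero + b)  # tensor([4., 4.]) -- an affine map may move it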

Now, in the context of machine learning, linear regression attempts to fit a line to data in an optimal way, the line being defined as $y = mx + b$. As explained above, this is not actually a linear function but an affine one, and it should probably be renamed; it's good to get the terminology right.

Similarly, a single layer of a neural network is often expressed mathematically as: $$y(\vec{x})=W\vec{x}+\vec{b}$$

where $W$ is the weight matrix and $\vec{b}$ is the bias vector. This function is also usually referred to as linear, although it is actually affine.
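For instance, PyTorch's nn.Linear implements exactly this affine map; a minimal check (the layer sizes here are arbitrary):

import torch
import torch.nn as nn

layer = nn.Linear(in_features=3, out_features=2)
x = torch.randn(3)

# the layer computes W x + b -- affine, not strictly linear
assert torch.allclose(layer(x), layer.weight @ x + layer.bias)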

Pratik_Katte
  • Any source for this? – user3352632 Oct 28 '21 at 09:16
  • I have a follow-up question. We know that, when the matrices involved are invertible, a composition of affine transformations is an affine transformation. Does this hold for neural nets? – SSaha Nov 19 '21 at 18:25
  • In feed-forward neural nets, we have affine transformations, but the point is that before feeding the output to the next layer we should apply a kind of nonlinearity. Consequently, I guess you may have figured out the answer. – Green Falcon Nov 19 '21 at 18:50
  • But we can consider it linear if we include the bias term in $W$ (bias trick)? – ado sar Jan 28 '24 at 14:56
8

It is a linear transformation combined with a translation. For example, lines that were parallel before the transformation are still parallel afterwards; scaling, rotation, and reflection are typical examples. With regard to neural networks, it is usually just the input multiplied by the weight matrix, plus a bias vector.
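To see why parallelism is preserved (a one-line derivation added here for completeness): a line through a point $\vec{p}$ with direction $\vec{d}$ maps under $\vec{x} \mapsto A\vec{x} + \vec{b}$ to

$$ A(\vec{p} + t\vec{d}) + \vec{b} = (A\vec{p} + \vec{b}) + t\,A\vec{d}, $$

i.e. a line with direction $A\vec{d}$. Two parallel lines share $\vec{d}$, so their images share $A\vec{d}$ and remain parallel.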

Jan van der Vegt
2

If $x$ is your input vector, an affine transformation over $x$ will have this form:

$$ y = Ax+b $$

where the coefficients of the matrix $A$ and the vector $\vec{b}$ are the parameters of the transformation. So it is like a linear function of $x$, but it is not a linear mapping in the vector-space sense, although you can bring it into the form of a linear mapping using homogeneous coordinates.
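A minimal sketch of that homogeneous-coordinates trick (the particular $A$, $b$, and $x$ below are made up for illustration): append a constant $1$ to $x$ and fold $b$ into an extra column of $A$, so the affine map becomes a single matrix product.

import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([5.0, 6.0])
x = torch.tensor([1.0, 1.0])

A_aug = torch.cat([A, b.unsqueeze(1)], dim=1)  # b becomes the last column
x_aug = torch.cat([x, torch.ones(1)])          # x gets a trailing 1

assert torch.allclose(A @ x + b, A_aug @ x_aug)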

Martín
0

To understand mathematically what linear and affine transformations are, read Pratik_Katte's answer.

But in machine learning, what are called linear layers are actually, mathematically, affine transformations $\mathbb{R}^n\rightarrow\mathbb{R}^m$ (i.e. transformations on the features as a vector), while what are called affine layers are actually $n$ affine transformations $\mathbb{R}\rightarrow\mathbb{R}$ (i.e. transformations on the individual coordinates of the features):

import torch
import torch.nn as nn

class Affine(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # one scale and one shift per coordinate: y_i = alpha_i * x_i + beta_i
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.alpha * x + self.beta

https://github.com/facebookresearch/deit/blob/main/resmlp_models.py

Whereas the pseudo-code for a linear layer might go:

import torch
from torch.nn import Module, Parameter

class Linear(Module):
    def __init__(self, in_features: int, out_features: int):
        super(Linear, self).__init__()
        self.weight = Parameter(torch.rand(out_features, in_features))
        self.bias = Parameter(torch.rand(out_features))

    def forward(self, x):  # one affine map on the whole vector: x W^T + b
        return x @ self.weight.t() + self.bias
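
Assuming the two classes above, a quick shape check makes the distinction concrete (the batch size and dimensions are arbitrary):

x = torch.randn(4, 8)            # a batch of 4 feature vectors of dimension 8

print(Affine(dim=8)(x).shape)    # torch.Size([4, 8])  -- per-coordinate map
print(Linear(8, 16)(x).shape)    # torch.Size([4, 16]) -- mixes coordinates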