9

I have recently been reading a paper on Highway Neural Networks and found the following:

$y=H(x,W_H)$

$H$ is usually an affine transform followed by a non-linear activation function, but in general it may take other forms.

After Googling "affine transform", I can't say I fully understand what it means. Can somebody please elaborate?

minerals

4 Answers

13

An affine transformation is of the form $$ g(\vec{v}) = A\vec{v} + \vec{b}, $$ where $A$ is a matrix representing a linear transformation and $\vec{b}$ is a translation vector.

In other words, an affine transformation is the combination of a linear transformation with a translation.

A linear transformation always carries the zero vector of the source space to the zero vector of the target space.

E.g. $y = 3x + 4$: in school we called this a linear equation, but strictly speaking it is not a linear transformation, because it contains a translation ($+4$), and linear transformations do not do that.

So every linear transformation is affine (just set $\vec{b}$ to the zero vector), but not every affine transformation is linear.
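As a quick sanity check of that zero-vector property, here is a minimal PyTorch sketch (the particular $A$ and $\vec{b}$ are made up for illustration):

import torch

A = torch.tensor([[3.0, 0.0], [0.0, 3.0]])  # linear part
b = torch.tensor([4.0, 4.0])                # translation part

zero = torch.zeros(2)
print(A @ zero)      # tensor([0., 0.]) -- a linear map keeps 0 at 0
print(A @ zero + b)  # tensor([4., 4.]) -- an affine map may move it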

Now, in the context of machine learning, linear regression attempts to fit a line to data in an optimal way, the line being defined as $y = mx + b$. As explained above, this is not actually a linear function but an affine one, and it should probably be renamed; it's good to get the terminology right.

Similarly, a single layer of a neural network is often expressed mathematically as: $$y(\vec{x})=W\vec{x}+\vec{b}$$

where $W$ is the weight matrix and $\vec{b}$ is the bias vector. This function is also usually referred to as linear, although it is actually affine.
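For instance, PyTorch's nn.Linear implements exactly this affine map; a minimal check (the layer sizes here are arbitrary):

import torch
import torch.nn as nn

layer = nn.Linear(in_features=3, out_features=2)
x = torch.randn(3)

# the layer computes W x + b -- affine, not strictly linear
assert torch.allclose(layer(x), layer.weight @ x + layer.bias)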

Pratik_Katte
  • Any source for this? – user3352632 Oct 28 '21 at 09:16
  • I have a follow-up question. We know that, when the matrices involved are invertible, a composition of affine transformations is an affine transformation. Does this hold for neural nets? – SSaha Nov 19 '21 at 18:25
  • In feed-forward neural nets, we have affine transformations, but the point is that before feeding the output to the next layer we should apply a kind of nonlinearity. Consequently, I guess you may have figured out the answer. – Green Falcon Nov 19 '21 at 18:50
  • But we can consider it linear if we include the bias term in $W$ (bias trick)? – ado sar Jan 28 '24 at 14:56
8

It is a linear transformation combined with a translation. For example, lines that were parallel before the transformation are still parallel afterwards; scaling, rotation, and reflection are typical examples. With regard to neural networks, it is usually just the input multiplied by the weight matrix, plus a bias vector.
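To see why parallelism is preserved (a one-line derivation added here for completeness): a line through a point $\vec{p}$ with direction $\vec{d}$ maps under $\vec{x} \mapsto A\vec{x} + \vec{b}$ to

$$ A(\vec{p} + t\vec{d}) + \vec{b} = (A\vec{p} + \vec{b}) + t\,A\vec{d}, $$

i.e. a line with direction $A\vec{d}$. Two parallel lines share $\vec{d}$, so their images share $A\vec{d}$ and remain parallel.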

Jan van der Vegt
2

If $x$ is your input vector, an affine transformation over $x$ will have this form:

$$ y = Ax+b $$

where the coefficients of the matrix $A$ and the vector $\vec{b}$ are the parameters of the transformation. So it is like a linear function of $x$, but it is not a linear mapping in the vector-space sense, although you can bring it into the form of a linear mapping using homogeneous coordinates.
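A minimal sketch of that homogeneous-coordinates trick (the particular $A$, $b$, and $x$ below are made up for illustration): append a constant $1$ to $x$ and fold $b$ into an extra column of $A$, so the affine map becomes a single matrix product.

import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([5.0, 6.0])
x = torch.tensor([1.0, 1.0])

A_aug = torch.cat([A, b.unsqueeze(1)], dim=1)  # b becomes the last column
x_aug = torch.cat([x, torch.ones(1)])          # x gets a trailing 1

assert torch.allclose(A @ x + b, A_aug @ x_aug)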

Martín
0

To understand mathematically what linear and affine transformations are, read Pratik_Katte's answer.

But in machine learning, what are called linear layers are actually, mathematically, affine transformations $\mathbb{R}^n\rightarrow\mathbb{R}^m$ (i.e. transformations on the features as a vector), while what are called affine layers are actually $n$ affine transformations $\mathbb{R}\rightarrow\mathbb{R}$ (i.e. transformations on the individual coordinates of the features):

import torch
import torch.nn as nn

class Affine(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # one scale and one shift per coordinate: y_i = alpha_i * x_i + beta_i
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.alpha * x + self.beta

https://github.com/facebookresearch/deit/blob/main/resmlp_models.py

Whereas the pseudo-code for a linear layer might go:

import torch
from torch.nn import Module, Parameter

class Linear(Module):
    def __init__(self, in_features: int, out_features: int):
        super(Linear, self).__init__()
        self.weight = Parameter(torch.rand(out_features, in_features))
        self.bias = Parameter(torch.rand(out_features))

    def forward(self, x):  # one affine map on the whole vector: x W^T + b
        return x @ self.weight.t() + self.bias
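
Assuming the two classes above, a quick shape check makes the distinction concrete (the batch size and dimensions are arbitrary):

x = torch.randn(4, 8)            # a batch of 4 feature vectors of dimension 8

print(Affine(dim=8)(x).shape)    # torch.Size([4, 8])  -- per-coordinate map
print(Linear(8, 16)(x).shape)    # torch.Size([4, 16]) -- mixes coordinates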