
I am trying to implement a simple logistic regression algorithm from scratch in Python (for learning purposes). Every article I've seen online so far presents the following expression for $z$ (the argument of the sigmoid function):

$$z = \theta^T\cdot x$$

However, when they implement it in code, they always use np.dot(X, theta), which is not the same thing. I have carefully tried to trace the dimensions of all the arrays as follows:

Dot product property: $\theta^T\cdot x = x^T\cdot\theta$

  • $x$ is of dimension $l\times (d+1)$ where $l$ is the number of records and $d+1$ the number of features (including the $x_0$ ones vector).

  • $\theta$ is of dimension $1 \times (d+1)$ (every column in $x$ gets a weight, including the ones vector).

  • $z$ is the dot product of $\theta^T$ and $x$; it would have dimensions $((d+1)\times 1)\cdot(l \times (d+1)) = ??$ --> DOES NOT WORK! (See the shape check below.)
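
For instance, a quick NumPy shape check (with arbitrary made-up values $l = 5$, $d = 2$) reproduces the mismatch:

```python
import numpy as np

l, d = 5, 2                     # arbitrary: 5 records, 2 features
x = np.ones((l, d + 1))         # design matrix, shape (5, 3)
theta = np.ones((1, d + 1))     # weights as a row vector, shape (1, 3)

z = np.dot(theta.T, x)          # (3, 1) . (5, 3) -> ValueError: shapes not aligned
```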

There appears to be no way to arrange the last statement so that it remains mathematically correct and yields an array of dimension $l\times1$ or $1\times l$ without fundamentally changing either:

  • the shape of $x$
  • the equation itself (as they do in the code)

So which is correct? The code or the mathematical statement? Why are they contradictory? Please help. Thanks.

user32882
  • Could you point at a reference for the sigmoid function that has the z=...? I'm looking at https://scicomp.stackexchange.com/questions/4826/logistic-regression-with-python and https://www.geeksforgeeks.org/understanding-logistic-regression to get into the right area. Already reading this implementation https://github.com/Selim78/variance-reduction-methods as well. – Michael McGarrah Mar 10 '19 at 16:24
  • In https://www.johnwittenauer.net/machine-learning-exercises-in-python-part-3/ I'm seeing them use "sigmoid(X * theta.T)" for their implementations. – Michael McGarrah Mar 10 '19 at 16:30
  • 1
    It's almost certainly the case that $\theta^{T}x$ is supposed to be the dot product of two vectors (stored as column vectors.) – Brian Borchers Mar 10 '19 at 21:37

1 Answer


They're doing a dot product across each record with a single call to np.dot(.,.). This takes advantage of fast, vectorized NumPy operations on the whole dataset instead of having you loop through the records and compute $z$ for each one in explicit Python code. In effect, they stack the records as rows, $X = [x_1, x_2, \cdots, x_n]^T$, and compute $Z = X \theta$, so that $Z = [z_1, \cdots, z_n]^T$ with $z_k = \theta^T x_k$. The math is written for a single record; the code applies it to all records at once.
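
Here is a minimal sketch (with made-up shapes and random data) showing that the vectorized call and the per-record loop agree:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))     # 5 records, 3 features (incl. the ones column)
theta = rng.normal(size=3)      # weight vector, shape (3,)

# Vectorized: one call computes z for every record at once.
Z = np.dot(X, theta)            # shape (5,)

# Equivalent explicit loop: z_k = theta^T x_k for each record x_k.
Z_loop = np.array([np.dot(theta, x_k) for x_k in X])

assert np.allclose(Z, Z_loop)
```

Note that storing $\theta$ as a flat 1-D array here, rather than the $1 \times (d+1)$ row vector in the question, is what makes np.dot(X, theta) line up without any transposing.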

spektr