
Let us suppose a matrix $\mathbf{A}$

$$ A = \begin{bmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix} $$

and a vector $\mathbf{b}$ $$b = \begin{bmatrix}b_{1} & b_{2} & b_{3}\end{bmatrix}$$

I want to create a matrix $\mathbf{C}$ whose rows are the rows of $\mathbf{A}$ multiplied elementwise by $\mathbf{b}$, yielding

$$ C = \begin{bmatrix}a_{11}b_1 & a_{12}b_2 & a_{13}b_3\\ a_{21}b_1 & a_{22}b_2 & a_{23}b_3\\ a_{31}b_1 & a_{32}b_2 & a_{33}b_3 \end{bmatrix} $$

To do this, I am using a Do loop, as follows:

Do[C[[i, ;;]] = A[[i, ;;]]*b, {i, 1, Length@A}]

Does anyone know a built-in function able to do that?
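(Note that `C` is a Protected built-in symbol in Mathematica — it denotes constants of integration — so assigning to its parts as above will fail. A minimal working variant of the loop, using a different symbol and preallocating the target:)

```mathematica
(* C is Protected, so use another name, e.g. c, and
   preallocate it before assigning to its parts *)
c = ConstantArray[0, Dimensions[A]];
Do[c[[i]] = A[[i]]*b, {i, Length[A]}]
```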

J. M.'s missing motivation

1 Answer


Here is a brief comparison of several approaches for medium-sized, dense numerical arrays:

n = 5000;
A = RandomReal[{-1, 1}, {n, n}];
b = RandomReal[{-1, 1}, {n}];

r1 = b # & /@ A; // RepeatedTiming // First

r2 = A.DiagonalMatrix[b]; // RepeatedTiming // First

r3 = A.DiagonalMatrix[SparseArray[b]]; // RepeatedTiming // First

r4 = Compile[{{u, _Real, 1}, {v, _Real, 1}}, u v,
      RuntimeAttributes -> {Listable}, Parallelization -> True
      ][A, b]; // RepeatedTiming // First

r1 == r2 == r3 == r4

r1: 0.320
r2: 1.70
r3: 0.094
r4: 0.080

True

The problem with the second method is that it has complexity $O(n^3)$; all the others are of order $O(n^2)$. The last method is both vectorized and parallelized. That gives it a slight edge over the third method in this example and on my machine (a somewhat dated 4-core Haswell CPU).
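As an aside (not benchmarked above), another $O(n^2)$ route that uses only built-ins is to scale the rows of the transpose and transpose back:

```mathematica
(* b Transpose[A] multiplies row j of Transpose[A], i.e. column j of A,
   by b[[j]]; transposing back gives c[[i, j]] == A[[i, j]] b[[j]] *)
r5 = Transpose[b Transpose[A]];
r5 == r1  (* should give True *)
```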

Henrik Schumacher
  • Thank you for answering. Adding RuntimeOptions -> "Speed" I got 0.088 s, while without it I got 0.073 s. Another thing I noticed was that adding CompilationTarget -> "C" gave a worse time, too. Do you know why? – Riobaldo Tatarana May 06 '20 at 20:50
  • Hm. To my knowledge, RuntimeOptions -> "Speed" should have no effect unless CompilationTarget -> "C" is set and the code has a specific structure (e.g., when Compile`GetElement occurs). So I would chalk that up to "fluctuations". Or maybe your computer ran hot and the CPU was throttling... ^^ CompilationTarget -> "C" should do no good because Times, when called on packed arrays, already calls optimized, vectorized libraries; so compiling the call to that library once more just adds calling overhead. And maybe a couple of superfluous internal copy operations... – Henrik Schumacher May 06 '20 at 20:56
  • I hadn't run it with CompilationTarget -> "C" and RuntimeOptions -> "Speed" together. Using both I got 0.22 s, the same time as r1. Did my computer go crazy? :O – Riobaldo Tatarana May 06 '20 at 21:02
  • Ah, also keep in mind that CompilationTarget -> "C" has quite a lot of overhead, because it has to compile the function with your system's C compiler. By default, Compile just runs the code in the WVM (Wolfram Virtual Machine); that is not as fast as running a compiled library, but preparing the code for it takes only a couple of milliseconds. – Henrik Schumacher May 06 '20 at 21:05
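(The compilation overhead mentioned in the comments can be measured directly; the sketch below assumes a C compiler is installed and configured for use with Compile:)

```mathematica
(* compilation time only, not execution: WVM target vs. external C compiler *)
First@AbsoluteTiming[
  Compile[{{u, _Real, 1}, {v, _Real, 1}}, u v];]

First@AbsoluteTiming[
  Compile[{{u, _Real, 1}, {v, _Real, 1}}, u v,
    CompilationTarget -> "C"];]
```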