3

I'm looking for tips on quickly applying the following update to $B$ pairs of $(w_i,x_i)$

$$\text{step}(w_i,x_i)=w_i-a x_i \langle w_i, x_i \rangle$$

Below is a readable but very slow version batchStep which uses MapThread which seems unsupported by FunctionCompile.

What's a good pattern for compiling a function like this, For loops writing into pre-allocated array?

d = 2;
h = ConstantArray[1., d];
B = 200;
a = 1.;

step[w_, x_] = w - a x x . w; dist = MultinormalDistribution[DiagonalMatrix[h]]; batchStep[W_] := With[{X = RandomVariate[dist, B]}, MapThread[step, {W, X}]];

W0 = RandomVariate[NormalDistribution[], {B, d}]; Nest[batchStep, W0, 100]; // Timing

Here's an attempt with FunctionCompile which fails with Cannot find a definition for the function Closure CreateRaw that \ takes an argument with the type PackedArray[Real64, 1:Integer64]

  1. Is For loop that writes into pre-allocated array a correct replacement for MapThread in this setting?
  2. Is there a way to specify all intermediate arguments to be Real32 (if it matters for performance)
d = 2;
h = ConstantArray[1., d];
B = 200;
a = 1.;
fun = Function[{Typed[W, "Real32"]},
   Module[{data, out, i, X, diagSqrt},
    diagSqrt = Sqrt[h];
    X = diagSqrt*# & /@ RandomVariate[NormalDistribution[], {B, d}];
    out = ConstantArray[0., {B, d}];
    For[i = 1, i <= B, i += 1,
     out[[i]] = step[W[[i]], X[[i]]]];
    out
    ]
   ];
dec = FunctionDeclaration[fun, Typed[{"Real32"} -> "Real32"]@fun];
cf = FunctionCompile[dec, fun]
Yaroslav Bulatov
  • 7,793
  • 1
  • 19
  • 44
  • Haven't looked into it, but I'd guess the real barrier is not MapThread, but RandomVariate, see: https://mathematica.stackexchange.com/q/1124/1871 – xzczd Mar 29 '23 at 03:05
  • RandomVariate is about half of the time when using MKL initialization – Yaroslav Bulatov Mar 29 '23 at 03:39
  • I believe, currently, MapThread is only defined for "DenseArray". Check for detail in system files here: SystemFiles\Components\Compile\TypeSystem\Declarations\RectangularArray\DenseArray\Functional\MapThread.m. – Silvia Aug 27 '23 at 06:14

1 Answers1

1

It appears MapThread is so optimized that there's no point in using compilation in this case. Every modification I tried makes code slower.

ClearAll["Global`*"];
d = 1000;
B = 1000;
a = 2/(d + 1);
numSteps = 100;
sample := RandomVariate[NormalDistribution[], {B, d}];
W0 = sample;

Print["matrix"]; SeedRandom[1, Method -> "MKL"]; batchStep[W_] := With[{s = sample}, W - as (Total[Ws, {2}])]; Nest[batchStep, W0, numSteps] // First // First // AbsoluteTiming

Print["mapthread"]; SeedRandom[1, Method -> "MKL"]; step[w_, x_] = w - a x x . w; batchStep[W_] := MapThread[step, {W, sample}]; Nest[batchStep, W0, numSteps] // First // First // AbsoluteTiming

Print["compiled mapthread"]; d = 1000; B = 1000; a = 2/(d + 1); numSteps = 100; SeedRandom[1, Method -> "MKL"]; s = sample; batchStep = Compile[{{W, _Real, 2}}, With[{a0 = a}, MapThread[#1 - a0 #2#2 . #1 &, {W, RandomVariate[NormalDistribution[], {B, d}]}] ], CompilationTarget -> "C"]; Nest[batchStep, W0, numSteps] // First // First // AbsoluteTiming

Yaroslav Bulatov
  • 7,793
  • 1
  • 19
  • 44