
I am trying to fine-tune a procedure to reduce its running time. It involves two large arrays, "data" and "positions":

size = 10^6;
data = RandomReal[1, size];
positions = RandomInteger[{1, size}, 24*size];

Here, "positions" never changes. However, "data" is constantly changing along the procedure and every time this happens, I need to produce the array

evalData=data[[positions]];

My question is: considering both the sizes and the fact that positions is fixed, what would be the fastest way of refreshing evalData every time data changes?

I have tried compiling Part, but it takes about the same time as using Part directly. Also, ParallelMap and ParallelTable are very slow (even on a computer with 10 cores). I also thought about compiling Part together with positions built in, but this seems to take too much memory. Any advice?
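
For reference, the compiled version I mean looks roughly like this (a minimal sketch; cPart is just an illustrative name):

cPart = Compile[{{d, _Real, 1}, {p, _Integer, 1}},
  d[[p]],
  RuntimeOptions -> "Speed"];

RepeatedTiming[data[[positions]];] // First
RepeatedTiming[cPart[data, positions];] // First

Both timings come out about the same.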

UPDATE 1: Perhaps my first question was overly simplified. This is in the context of a gradient-descent algorithm, and the array positions actually has rank two:

size = 10^6;
data = RandomReal[1, size];
positions = RandomInteger[{1, size}, {6*size, 4}];

I am trying to optimize the following code:

Do[
  dataTemp = Transpose[Map[data[[#]] &, Transpose[positions]]];
  dataDescent = compiledFunction[dataTemp];
  data = data - 0.5*dataDescent,
  1000
]

where compiledFunction is listable and performs a computation, in parallel, on each length-4 row of dataTemp. It is currently much faster than the line

dataTemp = Transpose[Map[data[[#]] &, Transpose[positions]]];

I could store transposedPositions = Transpose[positions]; and save some time by instead calling

dataTemp = Transpose[Map[data[[#]] &, transposedPositions]];

but it is not a huge improvement.
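
For context, compiledFunction has roughly the following shape; the body here is only a placeholder, and the actual computation is different:

compiledFunction = Compile[{{row, _Real, 1}},
  (* placeholder body; the real computation on each length-4 row goes here *)
  Total[row],
  RuntimeAttributes -> {Listable},
  Parallelization -> True,
  RuntimeOptions -> "Speed"];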

UPDATE 2:

I found the following related unanswered question.

  • This is a tricky situation to optimize, since the best approach for it probably depends on what exactly you're doing with evalData. – eyorble Sep 24 '22 at 15:18
  • The evalData=data[[positions]] call takes about 0.5 seconds on my machine. For comparison, pure allocation of an array of this size using evalData=ConstantArray[0.,24*size] takes close to 0.1 seconds, a similar order of magnitude. Btw, I do wonder how this call can be the bottleneck of any computation. Unless of course you do not actually use most of the entries in evalData, in which case, why construct it at all? – user293787 Sep 24 '22 at 15:18
  • I have updated the question to add more context. – mmen Sep 24 '22 at 15:41
  • Have you tried Extract? – Michael E2 Sep 24 '22 at 16:32
  • Here is a try with Extract: data = RandomReal[1, 10^6]; positions = RandomInteger[{1, Length[data]}, 6*10^6]; AbsoluteTiming[data[[positions]];] // First; positions = positions /. i_Integer :> {i}; AbsoluteTiming[Extract[data, positions];] // First gives 0.085491 and 0.554953, so Extract is slower than Part. – mmen Sep 24 '22 at 16:44
  • Here is something puzzling: data = RandomReal[1, 10^6]; positions = RandomInteger[{1, Length[data]}, {4, 6*10^6}]; positionsFlat = Flatten[positions]; then RepeatedTiming[Map[data[[#]] &, positions];] // First gives 0.436071, RepeatedTiming[{data[[positions[[1]]]], data[[positions[[2]]]], data[[positions[[3]]]], data[[positions[[4]]]]};] // First gives 0.45696, and RepeatedTiming[data[[positionsFlat]];] // First gives 0.554789. The third case consistently gets larger times, despite performing the same evaluations with fewer function calls. – mmen Sep 24 '22 at 16:58
  • Use a packed array for positions when using Extract[]. Then Extract[] is twice as fast as Part[] in Mma online. (Your data is not very big. Is that the typical size?) – Michael E2 Sep 25 '22 at 04:25
  • Oh, I didn't realize that ReplaceAll unpacks the array. Fixed it, but it only makes Extract take the same time as Part on my machine. (Also, in my application I would have multiple data arrays, say 100 of them, and they would all go through the same procedure in parallel.) – mmen Sep 25 '22 at 09:43
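
Following up on the last two comments, here is a minimal sketch of the packed-positions variant for Extract, using Partition instead of ReplaceAll so the index array stays packed:

data = RandomReal[1, 10^6];
positions = RandomInteger[{1, Length[data]}, 6*10^6];
packedPositions = Developer`ToPackedArray[Partition[positions, 1]];
Developer`PackedArrayQ[packedPositions] (* True *)
RepeatedTiming[Extract[data, packedPositions];] // First
RepeatedTiming[data[[positions]];] // First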

0 Answers