Faster code for specific problem

Question

Sorry, but I need the following code much faster and I'm running out of ideas. I do appreciate any help. I post two of my fastest ideas (tab and mat), although they are not really fast. Thanks in advance.

Clear[bNum,nNum,weights,newMat,sortMat,tab];

bNum=20;
nNum=100;

weights=Table[RandomReal[{-10,10}],{b,1,bNum},{n,1,nNum}];
newMat=Table[RandomReal[{-10,10}],{b,1,bNum},{n,1,nNum}];
sortMat=Table[i,{i, -10, 10, 0.1}];

tab=Table[
Sum[weights[[k,i]]*If[newMat[[k,i]] <= sortMat[[j]],1.,0.],{i,1,nNum}],{k,1,bNum}, 
{j,1,Length[sortMat]}];//AbsoluteTiming

mat=Block[{k=1},
Reap[Do[
Sow[Table[Sum[weights[[k,i]]*If[newMat[[k,i]] <= sortMat[[j]],1.,0.],{i,1,nNum}],{j,1,Length[sortMat]}]];
k++,{bNum}]]][[2,1]];//AbsoluteTiming

Rephrase your question to be an actual question, format your code (press the golden question mark in the top right of the input text field to read about formatting, it leads to http://mathematica.stackexchange.com/editing-help ), and explain what you are trying to do rather then just post a code snippet with the request of "make it faster". — jVincent, Oct 05 '12 at 14:10
To restate jVincent's plea: please state your actual problem, format your code, and then we can be more helpful... — J. M.'s missing motivation, Oct 05 '12 at 14:23
You can compile it. There is one Wolfram help link http://reference.wolfram.com/mathematica/tutorial/CompilingMathematicaExpressions.html — Murta, Oct 05 '12 at 14:37
Hi Chris, and welcome to the Mathematica StackExchange. Sorry if some of the first responses you received seem overly critical. We sometimes forget that new folks need some time to understand just how the site works. Don't forget to upvote good answers (and other people's questions) using the triangle above the number next to the post, and use the checkmark to "accept" the answer to your question that you think best answers it. — DavidC, Oct 05 '12 at 14:48
This is faster: Timing[tab2 = Transpose[Table[ Map[Total,weights*UnitStep[sortMat[[j]]-newMat]], {j,Length[sortMat]}]];] In general you want to use Mathematica's list-based operations for this sort of thing. — Daniel Lichtblau, Oct 05 '12 at 16:46
Will sortMat always contain an evenly spaced list of numbers? If so, a significant speedup is possible. Otherwise, the best solutions will likely first sort the rows of newMat and weights in parallel and accumulate the weights before doing any testing: that should get a few orders of magnitude speedup :-). — whuber, Oct 05 '12 at 16:55
@whuber: yes the list contains always evenly spaced numbers. — chris, Oct 06 '12 at 05:52

score 9 · Answer 1 · edited Apr 13 '17 at 12:55

9

As I have advocated in another thread, usually the best way (by far) to speed up a calculation is first to improve the algorithm. Then--if it remains necessary--you can focus on platform-specific methods to speed up execution.

Here, newMat and weights are viewed as lists of vectors to be processed in parallel and (according to a comment to the question) sortMat is really a sequence $(y_0, y_0 + dy, y_0 + 2 dy, \ldots, y_0 + (n-1) dy)$. For each $y$ in this sequence, the algorithm seeks the sum of weights associated with elements $x$ of newMat less than or equal to $y$.

[To find those elements, we change the units of measurement of x and sortMat so that the new sortMat is the integral sequence $(1, 2, \ldots, n)$ and we sort x (and sort the weights in parallel with x to keep their association intact). After these preliminaries, the integer part of any entry in x tells us precisely how many elements of sortMat are less than it. Thus--work a few simple examples to check--the differences in this sorted, re-expressed x tell us how many times each cumulative weight ought to appear in the output. The larger sortMat is relative to x, the more computation this approach saves.]

An efficient approach would capitalize on a preliminary sorting of newMat (which requires sorting the corresponding vector weights in parallel). Do this with Ordering. This reduces the amount of computation to the length of each row of newMat rather than the length of sortMat and it reduces a triple nested loop to a double nested loop (which is the minimum possible, given that the output is a rank 2 array).

ClearAll[process];
process[x_, w_, {y0_, dy_, ny_}] := Module[{x0, w0, v, i, f},
   x0 = Min[ny, Max[0, Ceiling[(# - y0)/dy]]] & /@ x;
   x0 = {0}~Join~Sort[x0]~Join~{ny};
   w0 = {0}~Join~Accumulate[w[[Ordering[x]]]];
   f := Function[{v, i}, Flatten[ConstantArray[#[[1]], #[[2]]] & /@ ({v, i}\[Transpose])]];
   f[w0, Differences[x0]]
   ];

Let's check the timing and the correctness of this solution. Because it's much faster, let's make newMat have ten times as many rows as before:

bNum = 200;
nNum = 100;
weights = Table[RandomReal[{-10, 10}], {b, 1, bNum}, {n, 1, nNum}];
newMat = Table[RandomReal[{-10, 10}], {b, 1, bNum}, {n, 1, nNum}];
sortMat = Table[i, {i, -10, 10, 0.1}];

Now:

tab = Table[Sum[weights[[k, i]]*
      If[newMat[[k, i]] <= sortMat[[j]], 1., 0.], {i, 1, nNum}], 
          {k, 1, bNum}, {j, 1, Length[sortMat]}]; // AbsoluteTiming

{8.2744732, Null}

and (describing sortMat with the list $(y_0, dy, n)$ = $(-10, 0.1, 201)$)

tab0 = MapThread[process[#1, #2, {-10, 0.1, 201}] &, {newMat, weights}]; // AbsoluteTiming

{0.0920053, Null}

It's 89 times faster, yet produces the same result to within floating point error:

Plus @@ (Abs[Chop[#]] & /@ (Flatten[tab] - Flatten[tab0])) == 0.0

True

Now, if you wish to speed up the improved algorithm with a compilation, go ahead.

Edit

The comments and a further chat session (linked in the comments) uncovered some implicit assumptions in the original solution and led to an improvement that more closely matches the original. The two algorithms are not completely identical when applied to machine-precision numbers, though, due to differences in how the values in sortMat are computed and compared to the values in newMat. Testing indicates they do agree except when elements of newMat appear to equal elements of sortMat: an equality test might or might not return True in such cases.

edited Apr 13 '17 at 12:55

Community

1

answered Oct 07 '12 at 18:11

whuber

20,544
2
59
111

I'd replace If[newMat[[k, i]] <= sortMat[[j]], 1., 0.] with Boole[newMat[[k, i]] <= sortMat[[j]]]. – J. M.'s missing motivation Oct 08 '12 at 03:30
1

@J.M. You do see that's in the non-optimized part, right? – Mr.Wizard Oct 08 '12 at 03:33
@Mr. Wizard: Ah, right. After looking at the method in more detail, I think a judicious use of Dot[] might lead to a better route... – J. M.'s missing motivation Oct 08 '12 at 03:36
@J.M. I'd like to see another good use of Dot. Please. – Mr.Wizard Oct 08 '12 at 03:39
I suppose something like Outer[First[#1].Boole[Thread[Last[#1] <= #2]] &, Transpose[{weights, newMat}], sortMat, 1] or MapThread[Table[#1.Boole[Thread[#2 <= k]], {k, sortMat}] &, {weights, newMat}], @Mr.Wizard – J. M.'s missing motivation Oct 08 '12 at 03:57
1

@whuber: thanks a lot for this impressive speed improvement. This is exactly what I was looking for.
Thanks to all the others as well.
– chris Oct 08 '12 at 08:24
@whuber: I did compile it and it works just fine. Thanks again. However, I would very much appreciate if you could explain your idea behind the vector x0. I just can't get a grip on it. – chris Oct 09 '12 at 16:52
I inserted a third paragraph outlining the idea. – whuber Oct 09 '12 at 17:04
@whuber: Thanks for the enlightenment. However, I still have trouble adapting the code once the values of sortMathave smaller range than the values in newMat. For example, if you change (in my previous example) sortMat to sortMat=Table[i,{i, -9, 9, 0.1}]; the code breaks down. – chris Oct 10 '12 at 14:52
Chris, I am surprised at that statement, because I developed and tested this method on a much shorter version of sortMat. Nevertheless, I went back and compared my results with yours using various ranges for sortMat and obtained exactly the same results as your code. Could you please then tell us precisely what "breaks down" means? Is it possible you forgot to change the argument {-10, 0.1, 201} to {-9, 0.1, 181} to reflect the newer version of sortMat? – whuber Oct 10 '12 at 15:06
Thanks for your patience. If you change sortMat to sortMat=Table[i,{i, -9, 9, 0.1}] negative entries in Differences[x0] are generated. Thus, mapping the constant array produces an error. I have the arguments as they should be {-9,0.1,181} . – chris Oct 10 '12 at 15:21
I see the problem. Although sorting makes negative entries impossible, I assumed that sortMat would span the range of the x values. That assumption is reflected in prepending $0$ and appending ny in the first line of the solution. When this assumption is false--and I apologize for not pointing this out and implementing it in the code--you need to clamp the values of x0 to the range from 0 to ny. I'll insert that line. (It doesn't change the timing at all, fortunately.) – whuber Oct 10 '12 at 15:35
Thank you. While trying to truly understand your code I worked with the following simple example: x = {1., 1.9, 1.6, 1.7, .5}; w = {.4, .3, .2, .05, .05}; ny = 20; y0 = 0.4; dy = .1; Executing process[x, w, {y0, dy, ny}]however, leads to a different result than my tab function using newMat=xand weights=wand end = y0 + (ny - 1)*dy; sortMat = Range[y0, end, dy]; How can this be? It seems that I'm missing something. – chris Oct 10 '12 at 15:55
That's a great way to test, Chris. The differences occur when some values of x appear to equal some values of sortMat. We are running into floating point error. To see this, set x = {1, 19, 16, 17, 5}/10 and ny = 20; y0 = 4/10; dy = 1/10 so that MMA uses exact arithmetic: you will find that process and your code now get exactly the same results. – whuber Oct 10 '12 at 16:05
I just tested it. Unfortunately, the results differ. I do also think that once the code is used in a compiled function the floating point issue will still be present, right? – chris Oct 10 '12 at 16:21
let us continue this discussion in chat – whuber Oct 10 '12 at 16:26

Mr.Wizard · Accepted Answer · 2012-10-10T23:24:52.650

7

Updated with whuber's fix, and added `process3`

Murta showed Compile and whuber showed a great algorithm improvement.
Let me show some syntax tweaking.

whuber's process function can be written more efficiently and concisely like this:

process2[x_, w_, {y0_, dy_, ny_}] :=
  Module[{x0, w0},
    x0 = Ceiling[(x - y0)/dy] ~Clip~ {0, ny};
    x0 = Join[{0}, Sort @ x0, {ny}];
    w0 = {0} ~Join~ Accumulate[ w[[Ordering @ x]] ];
    Join @@ MapThread[ConstantArray, {w0, Differences@x0}]
  ]

Your code producing the random tables can be made much more efficient, as there is a specific syntax within RandomReal (and RandomInteger, etc.) for this.

bNum = 200;
nNum = 100;
weights = RandomReal[{-10, 10}, {bNum, nNum}];
newMat  = RandomReal[{-10, 10}, {bNum, nNum}];

Timing comparison with process:

MapThread[process[#1, #2, {-10, 0.1, 201}] &, {newMat, weights}]; //Timing //First

MapThread[process2[#1, #2, {-10, 0.1, 201}] &, {newMat, weights}]; //Timing //First

0.0842

0.01744

(My code originally showed about a doubling of performance over whuber's. It's now nearly 5X due to the poor performance of Min[ny, Max[0, Ceiling[(# - y0)/dy]]] & /@ x in the fix, which I handled with Clip.)

I decided to implement a form of the function that operates on newMat and weights directly, that is moving the outer MapThread inside the function. This allows certain optimizations at the cost of additional memory consumption.

process3[x_, w_, {y0_, dy_, ny_}] :=
  Module[{xm, wm},
    xm = Ceiling[(x - y0)/dy] ~Clip~ {0, ny};
    xm = ArrayFlatten @ {{0, Sort /@ xm, ny}};
    wm = ArrayFlatten @ {{0, MapThread[Accumulate @ #2[[Ordering@#]] &, {x, w}]}};
    Join @@ MapThread[ConstantArray, {#2, Differences@#}] & ~MapThread~ {xm, wm}
  ]

process3[newMat, weights, {-10, 0.1, 201}] //Timing //First

0.01496

edited Oct 10 '12 at 23:24

answered Oct 08 '12 at 03:18

Mr.Wizard

271,378
34
587
1,371

Might as well do FoldList[Plus, 0, w[[Ordering@x]]] instead of {0} ~Join~ Accumulate[w[[Ordering@x]]]... :) – J. M.'s missing motivation Oct 08 '12 at 03:29
@J.M. prior experience suggests that will be slower, but I'll test it now. Yup, slower, at least on version 7. – Mr.Wizard Oct 08 '12 at 03:30
BTW: you don't need to generate sortMat anymore. :) – J. M.'s missing motivation Oct 08 '12 at 03:50
@Mr.Wizard: Thank you very much for another great improvement. – chris Oct 08 '12 at 08:28
Thank you: collectively, we're now up to about a factor of 150 improvement over the original. Perhaps the last challenge remaining would be to compile this code, thereby combining both ideas. Compile doesn't like the last step (of joining constant arrays of varying lengths). If there is a way to do this efficiently and get it compiled, possibly the timing can be improved further (but probably not dramatically). – whuber Oct 08 '12 at 22:35
@whuber you could also get a slight improvement processing all of newMat in Floor[(x - y0)/dy + 1] at once. – Mr.Wizard Oct 08 '12 at 23:53
Good idea. For simplicity, I decided at the outset to process the input row by row rather than comprehensively, but in some circumstances a little precomputation of this nature could save some time. Before going in such directions, we would want to find out the intended ranges of the input data sizes, such as values for bNum, nNum, and the length of sortMat. Those would indicate whether such measures would be worthwhile. – whuber Oct 09 '12 at 04:35

score 4 · Answer 3 · answered Oct 06 '12 at 13:01

Here is one demonstration on how to optimize your code using compile. I see that you are not using the functional programin mindset yet, this could be improved in your code too, but here I will focus just on compile.

Here we create your original matrices:

Clear[bNum,nNum,weights,newMat,sortMat,tab];

bNum=20;
nNum=100;

weights = RandomReal[{-10, 10}, {bNum, nNum}];
newMat = RandomReal[{-10, 10}, {bNum, nNum}];
sortMat = Range[-10, 10, 0.1];

Now I created 3 exemples of your tab computation:

tab[weights_,newMat_,sortMat_]:=Module[{bNum,nNum},
    {bNum,nNum}=Dimensions[weights];
    Table[
            Sum[
                weights[[k,i]]*If[newMat[[k,i]]<=sortMat[[j]],1.,0.]
                ,{i,1,nNum}
                ]
            ,{k,1,bNum},{j,1,Length[sortMat]}
    ]
]

tabCompiled=Compile[{{weights,_Real,2},{newMat,_Real,2},{sortMat,_Real,1}},

    Module[{bNum,nNum},
    {bNum,nNum}=Dimensions[weights];
    Table[
            Sum[
                weights[[k,i]]*If[newMat[[k,i]]<=sortMat[[j]],1.,0.]
                ,{i,1,nNum}
                ]
            ,{k,1,bNum},{j,1,Length[sortMat]}
        ]
    ]
];

tabCompiledC=Compile[{{weights,_Real,2},{newMat,_Real,2},{sortMat,_Real,1}},

    Module[{bNum,nNum},
    {bNum,nNum}=Dimensions[weights];
    Table[
            Sum[
                weights[[k,i]]*If[newMat[[k,i]]<=sortMat[[j]],1.,0.]
                ,{i,1,nNum}
                ]
            ,{k,1,bNum},{j,1,Length[sortMat]}
        ]
    ]
    ,CompilationTarget->"C"
];

The first without compile, the second compiled, the third compiled in C. Compare the diferences:

tab[weights, newMat, sortMat]; // AbsoluteTiming
tabCompiled[weights, newMat, sortMat]; // AbsoluteTiming
tabCompiledC[weights, newMat, sortMat]; // AbsoluteTiming
{0.833, Null}
{0.046, Null}
{0.016, Null}

I hope that it could help you.

score 1 · Answer 4 · answered Oct 10 '12 at 21:37

You can get a decent speed up with a little tweak of the method suggested in Daniel Lichtblau's comment, by transposing newMat and weights so that Total can be used more efficiently:

tab2 = Module[{nt = Transpose[newMat], wt = Transpose[weights]}, 
  Transpose[Total[UnitStep[# - nt] wt] & /@ sortMat]];

With the array sizes in the question this gives me a speedup factor of about 120.

The code can be dropped directly into Compile for another factor of 2:

c = Compile[{{newMat, _Real, 2}, {weights, _Real, 2}, {sortMat, _Real, 1}},
   Module[{nt = Transpose[newMat], wt = Transpose[weights]}, 
    Transpose[Total[UnitStep[# - nt] wt] & /@ sortMat]]];

tab3 = c[newMat, weights, sortMat];

Faster code for specific problem

4 Answers4

Edit

Updated with whuber's fix, and added process3

Updated with whuber's fix, and added `process3`