
Edit: I changed the sample code to get at my question better. Slowdown due to accessing a shared variable is a much more relevant issue (to me at least) than slowdown due to writing to a shared variable.

I've been frustrated to find that parallelizing my code often makes it slower. After wondering for a long time why this is, I think I've found that SetSharedVariable and SetSharedFunction are to blame.

Naïvely, I would expect these functions to work efficiently and add little overhead, yet they seem to be hugely problematic. Here's a basic example:

SetSharedVariable[foo]
foo = 6;
ParallelTable[
  foo^foo^foo;
  , {j, 1, 2}] // AbsoluteTiming

ParallelTable[
  Module[{foo2 = 6},
   foo2^foo2^foo2;]
  , {j, 1, 2}] // AbsoluteTiming

(*
{0.053959, {Null, Null}}
{0.015549, {Null, Null}}
*)

Can someone explain why the overhead for accessing a value every iteration is so much larger just because it's a shared variable? Even better, can someone explain or link to a resource for understanding how to use SetSharedVariable without huge slowdowns (this happens for my longer calculations too), and perhaps the truly correct situations in which to use it? (I often use it just to shut up Mathematica from spewing errors, which I'm sure is not wise.)

(I use Mathematica 11.0.0.0 on a dual-core MacBook.)

Alexey Popkov
Max

2 Answers


Do SetSharedVariable/SetSharedFunction ruin the benefits of ParallelTable?

Yes, they do. This has been discussed here many times.

Mathematica uses separate processes for parallelization, which means that the parallel kernels cannot share any memory. What SetSharedVariable really does is cause the variable to always be evaluated and set on the main kernel, so every single access involves a callback from the subkernel to the main kernel. Communication between the main kernel and the subkernels is already a major bottleneck in the parallel tools; forcing it for every evaluation will typically kill all speed benefits. (Note that otherwise communication may happen as few times as the number of subkernels. This is the case with Method -> "CoarsestGrained".)
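As a rough way to see this (the names reads and shared below are my own illustration, not part of the parallel tools): since every access of a shared variable is evaluated on the main kernel, the main kernel can count the accesses.

reads = 0;
SetSharedVariable[shared];
shared := (reads++; 6)  (* runs on the main kernel for every subkernel access *)
ParallelTable[shared, {j, 1, 8}];
reads
(* should be 8 here: one main-kernel callback per access *)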

The only exception is when the evaluation on the subkernel takes significantly longer than the callback to the main kernel. For example, take

list={};
SetSharedVariable[list];
ParallelDo[AppendTo[list, f[i]], {i, 100}]

This is effective only if f[i] takes a long time to evaluate (say, a second or more) and does not return a lot of data (say, a number instead of a large array). The subkernel evaluations should take significantly longer than the communication between kernels.
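A pattern that avoids the per-iteration callback entirely is to let the parallel construct collect the results itself and perform a single assignment on the main kernel; here is a sketch, with f the same placeholder as above:

list = ParallelTable[f[i], {i, 100}];
(* same elements as the AppendTo version, but returned in bulk, in
   deterministic order, with no shared variable involved *)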


Because of this, the key to effective parallelization in Mathematica is to fully separate the tasks of subkernels and avoid any communication between them. If they need to access the same variable, things get much more difficult.

Functional programming is much more amenable to parallelization because it avoids mutable data structures and side effects. To put it in simple terms, a problem is well parallelizable if you can phrase it in terms of Map (ParallelMap) or ParallelCombine.
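For instance (a toy example of mine, not from the question), a list of fully independent tasks maps cleanly onto the subkernels:

(* each primality test is independent, so the only communication is
   distributing the inputs and collecting the results *)
ParallelMap[PrimeQ[2^# - 1] &, Range[1000, 1020]]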

Szabolcs
  • Thanks, good to know it's not just my coding style at fault. I have one basic, related question then: If I have a variable that I want to be created and used locally for each parallel kernel, what's the best way to do that? – Max Mar 02 '17 at 00:46
  • To answer my own question... I believe the best answer is "Module". Other suggestions are welcome though. – Max Mar 02 '17 at 06:29
  • @Max It depends on what you mean exactly. ParallelTable[Module[...], ...] is good. Module[..., ParallelTable[...]] is wrong (see the sketch after these comments). – Szabolcs Mar 02 '17 at 08:30
  • @Szabolcs: I completely agree with what you say about the fact that extra processes with no shared memory is the reason why the effect of SetSharedVariable is so dramatic in Mathematica. On the other hand the given scenario is a worst case use of a shared variable: when one process/thread writes the others have to wait due to the necessary locking with no chance of doing anything useful. It would cause performance issues (= slower than sequential) in any system/language. Of course that slowdown might be on a different scale, but it is a very general rule and not restricted to Mathematica... – Albert Retey Mar 02 '17 at 08:49
  • @Max: While Mathematica is certainly especially inefficient in that case, you should still rethink your coding style: if you want to see speedup, avoid shared variables and communication wherever you can. That is a very basic (probably the most basic?) rule for writing parallel code in whatever language/system. – Albert Retey Mar 02 '17 at 08:52
  • @Albert Yes, you are right, but I don't fully understand: is there something I should correct and clarify? – Szabolcs Mar 02 '17 at 11:21
  • Probably you could make it clearer that the shown code would have a performance problem even if the parallelization implementation used a more efficient scheme. As you could see from @Max's reaction, he concluded that "it's not my coding style at fault", but actually at least the given example code would be an anti-pattern for parallel code independently of how parallelism is implemented... – Albert Retey Mar 02 '17 at 13:31
  • @Szabolcs: as I think of it, I probably should just give my own alternative answer instead of asking you to formulate what I am trying to say :-). I might do that later... – Albert Retey Mar 02 '17 at 13:45
  • @AlbertRetey That's a good point, and that's my fault for giving a poor example. When I referred to my coding style I meant my actual code (which is unrelated and much more parallelizable). Here I was just trying to use one line that would take a non-trivial time to compute, but I could have thought it through more. My main point was to show the magnitude of the shared variable overhead on running time. In my actual code it's usually a ~15% slowdown between sequential and parallel+SetSharedVariable. – Max Mar 02 '17 at 15:14
  • @Max: actually, as already shown in my answer to your other related question, I was not looking closely enough at what you were actually computing. For that particular example I think it really is just the communication overhead of the last two results, which are much larger than I noticed in my first look at the question, that kills your speedup... – Albert Retey Mar 02 '17 at 21:25
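To illustrate the Module placement point from Szabolcs' comment above, here is a minimal sketch (my own example):

(* each iteration gets its own local variable inside the subkernel;
   wrapping the Module around ParallelTable instead would localize
   the variable on the main kernel, not once per subkernel *)
ParallelTable[Module[{local = j^2}, local + 1], {j, 1, 4}]
(* {2, 5, 10, 17} *)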

This is more a summary of a lengthy discussion in comments than an answer:

The example as given in the question is actually a worst case for a parallel program: the worker processes do not have a lot of work to do, but they need to return quite large results (for j = 6, 7) back to the master, and most probably get in each other's way while doing so. Using the following version you can see that almost the entire time is spent sending the j = 7 result (most probably including some time spent waiting for the j = 6 result to be returned first):

SetSharedVariable[foo]
ParallelTable[
  Module[{res},
    Print[{"calc", j} -> AbsoluteTiming[res = j^j^j;]];
    Print[{"send", j} -> AbsoluteTiming[foo = res;]];
  ],
  {j, 1, 7}
]

Of course, in this case the two orders of magnitude by which the parallel version is slower than the sequential one are almost entirely due to how inefficiently Mathematica worker kernels communicate data back to the master. On the other hand, code like this would hardly have a chance to see any speedup with whatever technology, platform, or language you used.

In general, to see good speedup from parallel code you need to minimize any communication overhead and synchronization between the parallel parts of your code. This is even more important in Mathematica than in other languages/technologies because of the high level at which it operates and some suboptimal implementation details; both make you pay an especially high price for any communication or synchronization.
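One knob for this is the Method option, which controls how many batches, and therefore how many round trips between master and subkernels, the scheduler uses. A sketch (timings will of course vary by machine):

(* "CoarsestGrained" gives each subkernel one big batch of iterations;
   "FinestGrained" schedules them one at a time, maximizing communication *)
ParallelTable[Total[Range[j]], {j, 1, 10^4}, Method -> "CoarsestGrained"]; // AbsoluteTiming
ParallelTable[Total[Range[j]], {j, 1, 10^4}, Method -> "FinestGrained"]; // AbsoluteTiming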

I understand that the given example is just a demonstration, but if your code does contain parts that do similar things, I would suggest rethinking your coding strategies if you want to see speedup from parallelization. Following the advice in Szabolcs' answer is a good starting point.

You should also be aware that it is usually much easier to get surprisingly large speedups from the various optimization strategies for sequential Mathematica code that you can find in other questions and answers on this site.
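As a generic illustration of that point (my own example, not from the discussion), vectorized arithmetic alone can beat an explicit loop by more than parallelization typically gains:

(* element-wise arithmetic on a whole (packed) array at once *)
Table[i^2, {i, 10^6}]; // AbsoluteTiming
Range[10^6]^2; // AbsoluteTiming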

Albert Retey
  • I realize from your comments/answer now that what I really meant to complain about was the slowdown due to accessing a shared variable, not writing to it. I didn't realize the difference until you brought it up. I've improved my question to reflect this (sorry to make part of your answer irrelevant). – Max Mar 03 '17 at 05:31
  • @Max: while it doesn't fit too well into the concept of this site, this is how learning works, doesn't it? I did learn by answering your question in any case :-). As for the new situation: I'm not sure and don't have time to investigate or rethink, but it might be that reading needs to be treated similarly to writing (concerning locking) in Mathematica for cases like foo:=RandomReal[]. – Albert Retey Mar 03 '17 at 12:58
  • It's true, this was a good give-and-take learning process. :) – Max Mar 05 '17 at 20:09