
I recently tried to use ParallelMap instead of Map and, to my surprise, found that ParallelMap generally seems to be slower than Map, which does not make sense to me.

Here is a simple test case that shows the behavior on my system (tested on a 64-bit quad-core i7 under Linux and on a dual-core Core2Duo under MacOS, both running Mathematica 9.0.1):

LaunchKernels[];
f[x_] := Sin[x] + Cos[x] + Tan[x];
test = Table[i, {i, 100000}];

The timing results are as follows (they are consistent across runs; the additional kernels had been launched beforehand):

Map[f[#] &, test]; // AbsoluteTiming

{0.198230, Null}

ParallelMap[f[#] &, test]; // AbsoluteTiming

{0.650516, Null}

What am I missing here?

Murta
Wizard
  • Just to drive the point home: ParallelXYZ works best for slow functions. Often you can save time by faster functional implementations (examples galore around here), consider e.g.: Sin[#] + Cos[#] + Tan[#] &[Range[100000]] – Yves Klett Oct 07 '13 at 16:31
  • ParallelMap is a bit faster when test is replaced by test = N @ Range[1000000] and gets faster as the size grows. Approximate real versus exact results makes a 10x difference in the amount of data transferred back to the main kernel. That is surely a factor in the timings. – Michael E2 Oct 08 '13 at 02:19
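
To make the last two comments concrete, here is a minimal sketch (not from the original post; the timings will vary with the machine and the number of launched kernels):

LaunchKernels[];
f[x_] := Sin[x] + Cos[x] + Tan[x];

(* original test: exact integers, so f returns large symbolic expressions
   that all have to be shipped back to the main kernel *)
ParallelMap[f, Range[100000]]; // AbsoluteTiming

(* Michael E2's variant: packed machine reals give compact numeric results,
   so far less data is transferred *)
Map[f, N@Range[1000000]]; // AbsoluteTiming
ParallelMap[f, N@Range[1000000]]; // AbsoluteTiming

(* Yves Klett's vectorized form needs no Map (and no parallelism) at all *)
Sin[#] + Cos[#] + Tan[#] &[N@Range[1000000]]; // AbsoluteTiming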

2 Answers


I think this may be a duplicate of: How to avoid unpacking from Language`ExtendedFullDefinition

In Mathematica, parallelism only pays off when the processing takes longer than the data transfer; otherwise the overhead of that transfer makes the parallel operation slower than the plain one. Adding Method -> "CoarsestGrained" should make it somewhat faster than your original use of ParallelMap, though still not as fast as the plain Map:

"CoarsestGrained": break the computation into as many pieces as there are available kernels

ParallelMap[f, test, Method -> "CoarsestGrained"]

Note that you do not need to embed f in a Function (f[#] &) to use it.
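
For reference, a sketch of the full comparison on the question's data; the absolute numbers are machine-dependent, and the point is only the relative ordering:

LaunchKernels[];
f[x_] := Sin[x] + Cos[x] + Tan[x];
test = Range[100000];

Map[f, test]; // AbsoluteTiming
ParallelMap[f, test]; // AbsoluteTiming  (* default fine-grained scheduling *)
ParallelMap[f, test, Method -> "CoarsestGrained"]; // AbsoluteTiming  (* one batch per kernel *)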

Mr.Wizard
  • Now, are we talking to ourselves (not to worry, I do it all the time, don't I)? – Yves Klett Oct 07 '13 at 16:21
  • @Yves The White Council is secretly assembling... BTW, there is another candidate for master: Why is parallel slower? – István Zachar Oct 07 '13 at 16:33
  • @IstvánZachar who will get to play Radagast? Now there's a whacky fellow (although the movie took a lot of poetic license). And yes, quite similar questions... – Yves Klett Oct 07 '13 at 16:35
  • @Yves Not an easy part, lots of qualities to fulfill. He must be a simple fool who is also a bird-tamer. We should look around at Gardening & Landscaping. – István Zachar Oct 07 '13 at 16:46
  • @IstvánZachar but he has the know of the mushrooms :-) – Yves Klett Oct 07 '13 at 16:49
  • @Yves Oh yes, the shrooms :) As of yet, there is no hallucinogenics.StackExchange to cast someone from there. I guess Radagast was Jackson's interpretation of Tim Benzedrine. – István Zachar Oct 07 '13 at 17:02
  • What you describe is the inherent problem of parallelizing code; it is by no means specific to Mathematica. In fact, for many compiled languages it can be even harder to gain speedup with mindless brute-force parallelism because the (sequential) processing is much faster. What is Mathematica-specific is that the data transfer has some extra pitfalls, as your link shows... – Albert Retey Oct 08 '13 at 08:52
  • @Albert I thought there was a particular issue with parallelism and data transfer in Mathematica compared to other functional languages. For immutable data do not some other frameworks share one copy among all threads, whereas Mathematica makes a copy of that data for each? – Mr.Wizard Oct 08 '13 at 18:53
  • @Mr.Wizard: that's true insofar as Mathematica parallelism, AFAIK, is entirely based on message passing (shared memory being used only to speed up the message passing where possible). That does in fact have the disadvantage that it tends to copy data more than necessary, especially with the "high level" automatic parallelizing constructs. To some extent this can be mitigated, and I think there already were some Q&A about how to do so. The advantage of message passing vs. direct shared memory is that the parallel code will run just as well on a cluster of processors which don't share memory... – Albert Retey Oct 09 '13 at 11:46
  • @Albert Good points; thanks. – Mr.Wizard Oct 10 '13 at 02:33
  • good points maybe, but certainly not very supportive of my original statement :-). They were not meant to contradict it, but I have to admit there is a point in emphasizing that the way Mathematica implements parallelism is especially prone to suffer from communication overhead. Still -- the basic problem always exists at some level, and no matter which language and concepts you use, you'll find that getting a good speedup from parallelizing is hardly ever easy to achieve. – Albert Retey Oct 10 '13 at 16:20

I think that ParallelMap's distribution of the data between the kernels is implemented poorly. However, if the computation of f takes a long time, there is some speedup (tested on a Core2Duo):

LaunchKernels[];

f[x_] := Nest[Sin, x, 1000];
test = N@Range[100000];

Map[f, test]; // AbsoluteTiming
ParallelMap[f, test]; // AbsoluteTiming

{5.813099, Null}

{3.848709, Null}

Or you can distribute the data manually:

f[x_] := Sin[x];
test = Transpose@Partition[N@Range[1000000], $ProcessorCount];

Map[f, Flatten@test]; // AbsoluteTiming

{1.207334, Null}

DistributeDefinitions[f, test];
ParallelEvaluate[data[$KernelID] = test[[$KernelID]]];
ParallelEvaluate[Map[f, data[$KernelID]]]; // AbsoluteTiming

{1.124331, Null}

The speedup is small, but it exists.
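
If the results are needed on the main kernel in the original order, the interleaving introduced by Transpose@Partition has to be undone. A sketch continuing the example above; it assumes (as the code above already does) that $KernelCount equals $ProcessorCount, and additionally that ParallelEvaluate returns the kernels in $KernelID order:

results = ParallelEvaluate[Map[f, data[$KernelID]]];

(* each kernel returned a strided slice of the data; Transpose re-interleaves them *)
Flatten@Transpose[results] === Map[f, N@Range[1000000]]  (* should return True *)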

Maybe these examples are not the best, but they show that the behavior of the parallel functionality in Mathematica isn't obvious.

ybeltukov