Consider this code:

LaunchKernels[];
On["Packing"]

a = RandomReal[{0., 1.}, {64, 30000}];

ParallelMap[Fourier, a]; // AbsoluteTiming

Developer`FromPackedArray::unpack: Unpacking array in call to Language`ExtendedFullDefinition. >>
Developer`FromPackedArray::unpack: Unpacking array in call to MemberQ. >>

(*{1.853567, Null}*)

Map[Fourier, a]; // AbsoluteTiming

Developer`FromPackedArray::punpackl1: Unpacking array with dimensions {64,30000} to level 1.

(*{0.289122, Null}*)

How can I avoid the unpacking and get some speedup from parallelization?

Update

By avoiding the MemberQ unpack (fix function copied from here), we get roughly a 3X speedup over the plain ParallelMap call (1.85 s down to 0.56 s), but it is still slower than the non-parallel version:

memberQ[list_, form_] := Or @@ (MatchQ[#, form] & /@ list)
ClearAll[fix]
SetAttributes[fix, HoldAll]
fix[expr_] := Block[{MemberQ = memberQ}, expr]

fix@ParallelMap[Fourier, a]; // AbsoluteTiming

Developer`FromPackedArray::unpack: Unpacking array in call to Language`ExtendedFullDefinition. >>

(*{0.564126, Null}*)

Update 2

Using ParallelTable instead eliminates the unpacking and actually gives a speedup.

first run

fix[
   ParallelTable[
    Fourier[a[[n]]], {n, 1, Length[a]}]]; // AbsoluteTiming
(*{0.215288, Null}*)

second run

fix[
   ParallelTable[
    Fourier[a[[n]]], {n, 1, Length[a]}]]; // AbsoluteTiming
(*{0.092006, Null}*)
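
A minimal sketch for probing why the second run is faster, assuming (this is not confirmed here) that part of the first-run cost is the one-time transfer of a to the subkernels. DistributeDefinitions sends a definition explicitly, so that step can be timed separately from the computation:

```mathematica
LaunchKernels[];  (* as above, in a fresh session *)
a = RandomReal[{0., 1.}, {64, 30000}];

(* Send the definition of a to every subkernel and time only that step *)
AbsoluteTiming[DistributeDefinitions[a];]

(* Time the computation; under the stated assumption, a is already on the subkernels *)
AbsoluteTiming[ParallelTable[Fourier[a[[n]]], {n, 1, Length[a]}];]
```

If the second timing comes out close to the second-run figure above, the transfer is a plausible explanation for the gap between the two runs; if not, something else is responsible.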

Questions:

  1. What is Language`ExtendedFullDefinition, and why do I always get this warning? How can I avoid the unpacking it causes? I'm using version 9.
  2. Can you give more evidence for "Fourier is so fast that you lose any time you gain in the overhead of parallelism"?
  3. If the slowness is due to parallel overhead, why is ParallelTable 5X faster than ParallelMap? Thanks a lot!
xslittlegrass

1 Answer

I don't think unpacking is the problem. Rather, I believe that Fourier is so fast that you lose any time you gain in the overhead of parallelism.

Consider using Identity as an example. I will use withModifiedMemberQ to deal with that bug.

I use List @@ to intentionally unpack to level one, which is not a problem as subarrays remain packed.

a = List @@ RandomReal[{0., 1.}, {640, 30000}];
On["Packing"]

withModifiedMemberQ[
  ParallelMap[Identity, a]; // AbsoluteTiming
]

Map[Identity, a]; // AbsoluteTiming
{0.4930282, Null}

{0.0850049, Null}

Note that no unpacking messages are issued. However, I had to run the code above twice, because on the first pass I got an error:

list::shdw: Symbol list appears in multiple contexts {Parallel`Preferences`,Global`}; definitions in context Parallel`Preferences` may shadow or be shadowed by other definitions.

This might be a v7 bug. Does anyone else see it with the code above?
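
The remark that List @@ unpacks only to level one can be checked with Developer`PackedArrayQ. A small sketch, using a much smaller array than above purely for illustration (the names packed and b are placeholders):

```mathematica
packed = RandomReal[{0., 1.}, {4, 10}];
b = List @@ packed;  (* same trick as above: replace the head to unpack the outer level *)

Developer`PackedArrayQ[packed]              (* is the original matrix packed? *)
Developer`PackedArrayQ[b]                   (* is the outer list still packed? *)
And @@ (Developer`PackedArrayQ /@ b)        (* are all of the rows still packed? *)
```

If the remark holds, the second test should give False and the third True.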

Mr.Wizard
  • I do not see the shdw message. Version 9.0.1 on a fresh kernel (screenshots of the first and second calls not preserved). – Nasser Sep 03 '13 at 02:37
  • @Nasser Thanks; I don't know why Parallel`Preferences` would end up on the context path; that seems like a mistake. Good that it's apparently fixed. – Mr.Wizard Sep 03 '13 at 02:40
  • I guess the unpacking may be somewhat relevant, because if I use the fixed MemberQ I get a 2X speedup? See my update. – xslittlegrass Sep 03 '13 at 02:40
  • @xslittlegrass Yes, there is a known MemberQ unpacking bug as Oleksandr described, and as I already addressed in my answer. Beyond that what is your point? I don't understand. – Mr.Wizard Sep 03 '13 at 02:44
  • @Mr.Wizard I'm wondering whether the unpacking in the call to Language`ExtendedFullDefinition also has some effect. Since the a in your example is ten times larger than mine, does that make the overhead more significant in your case? Also, 1 second of overhead seems quite large, doesn't it? – xslittlegrass Sep 03 '13 at 02:47
  • @xslittlegrass So with my code above you are still getting an unpack message for the function Language`ExtendedFullDefinition? I do not see that in v7. I forgot that I made my data 10X larger. I was thinking that, if anything, a larger working set might reduce the proportional overhead; usually parallelism is more applicable to large/slow problems. – Mr.Wizard Sep 03 '13 at 02:50
  • @Mr.Wizard Yes but I only tried the MemberQ fix above in my update by Szabolcs, let me try Oleksandr's version and get back to you. – xslittlegrass Sep 03 '13 at 02:55
  • @Mr.Wizard Yes, I still get that unpack message from Language`ExtendedFullDefinition using withModifiedMemberQ. – xslittlegrass Sep 03 '13 at 03:04
  • @Mr.Wizard ParallelTable doesn't seem to unpack, and it gives about a 2X speedup compared to the non-parallel version in the second run. See my update. – xslittlegrass Sep 03 '13 at 03:22
  • @xsl I was away. Perhaps you could update the post with your questions so that I might (try to) answer them when I have time, rather than waiting for us both to be around at the same time. – Mr.Wizard Sep 03 '13 at 05:54
  • It isn't ideal in many ways to do this testing on version 7, as the Parallel` package has changed a lot in version 8. Language`ExtendedFullDefinition is new, for example (it is used to automatically distribute definitions). I do agree that unpacking to level 1 likely has no significant performance impact, but I'm not completely sure that the version 8 Parallel` package might not be doing something harmful to performance as well. Will look into this later. – Oleksandr R. Feb 25 '14 at 11:13
  • @Oleksandr Good point. I look forward to your results; please feel free to either edit this answer or post your own. – Mr.Wizard Feb 25 '14 at 21:41