Consider this code:
LaunchKernels[];
On["Packing"]
a = RandomReal[{0., 1.}, {64, 30000}];
ParallelMap[Fourier, a]; // AbsoluteTiming
Developer`FromPackedArray::unpack: Unpacking array in call to Language`ExtendedFullDefinition.
Developer`FromPackedArray::unpack: Unpacking array in call to MemberQ. >>
(*{1.853567, Null}*)
Map[Fourier, a]; // AbsoluteTiming
Developer`FromPackedArray::punpackl1: Unpacking array with dimensions {64,30000} to level 1.
(*{0.289122, Null}*)
How can I avoid the unpacking and get some speedup from parallelization?
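To narrow down where the array stops being packed, a check along the following lines may help. It is only a minimal sketch: it assumes the kernels are already launched, and the names packedBefore, rowPacked, res and resultPacked are introduced here for illustration.

```wolfram
(* Sketch: inspect packing status before and after the parallel call. *)
a = RandomReal[{0., 1.}, {64, 30000}];

(* Is the whole matrix packed, and is a single extracted row packed? *)
packedBefore = Developer`PackedArrayQ[a];
rowPacked = Developer`PackedArrayQ[a[[1]]];

(* Run the parallel version and inspect one of the returned rows. *)
res = ParallelMap[Fourier, a];
resultPacked = Developer`PackedArrayQ[res[[1]]];

{packedBefore, rowPacked, resultPacked}
```

If the input reports as packed while the messages still appear, the unpacking is happening inside the parallel machinery rather than in the data itself.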
Update
By avoiding the MemberQ unpacking (the fix function is copied from here), we get about a 2X speedup, but it is still slower than the non-parallel version:
memberQ[list_, form_] := Or @@ (MatchQ[#, form] & /@ list)
ClearAll[fix]
SetAttributes[fix, HoldAll]
fix[expr_] := Block[{MemberQ = memberQ}, expr]
fix@ParallelMap[Fourier, a]; // AbsoluteTiming
Developer`FromPackedArray::unpack: Unpacking array in call to Language`ExtendedFullDefinition. >>
(*{0.564126, Null}*)
Update 2
Using ParallelTable eliminates the unpacking and actually gives a speedup.
First run:
fix[
ParallelTable[
Fourier[a[[n]]], {n, 1, Length[a]}]]; // AbsoluteTiming
(*{0.215288, Null}*)
Second run:
fix[
ParallelTable[
Fourier[a[[n]]], {n, 1, Length[a]}]]; // AbsoluteTiming
(*{0.092006, Null}*)
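Since the first and second runs differ by more than a factor of two, a single timing is probably not reliable. The sketch below repeats each measurement several times so that any one-time cost of the first parallel call shows up separately. The helper timings and its argument reps are hypothetical names introduced here; Method -> "CoarsestGrained" is a documented option of ParallelMap, but whether it helps in this case is an assumption to be tested.

```wolfram
(* Sketch: repeat a timing several times; the expression is held so it is
   re-evaluated on every repetition. *)
ClearAll[timings]
SetAttributes[timings, HoldFirst]
timings[expr_, reps_Integer: 5] := Table[First@AbsoluteTiming[expr;], {reps}]

a = RandomReal[{0., 1.}, {64, 30000}];

(* Serial baseline. *)
serial = timings[Map[Fourier, a]];

(* Parallel variants; the first entry of each list includes any startup cost. *)
viaTable = timings[ParallelTable[Fourier[a[[n]]], {n, 1, Length[a]}]];
viaMap = timings[ParallelMap[Fourier, a, Method -> "CoarsestGrained"]];

{serial, viaTable, viaMap}
```

Comparing the later entries of each list, rather than the first, should give a fairer picture of the steady-state cost.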
Questions:
- What is Language`ExtendedFullDefinition, and why do I always get this warning? How can I avoid the unpacking it causes? I'm using version 9.
- Can you give more evidence for "Fourier is so fast that you lose any time you gain in the overhead of parallelism"?
- If the slowness is due to parallel overhead, why is ParallelTable 5X faster than ParallelMap?
Thanks a lot!
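One rough way to look for evidence on the overhead question might be to time a parallel map that does no work at all and compare it with the serial computation. This is only a sketch under assumptions: ParallelMap[Identity, a] is used here as a stand-in for pure transfer and scheduling cost, and it underestimates the real cost somewhat because the Fourier result is complex and therefore larger to send back. The names tTransfer and tCompute are introduced here for illustration.

```wolfram
(* Sketch: separate transfer/scheduling cost from compute cost. *)
a = RandomReal[{0., 1.}, {64, 30000}];

(* Ship the rows to the subkernels and back without doing any work. *)
tTransfer = First@AbsoluteTiming[ParallelMap[Identity, a];];

(* Do the actual work serially in the main kernel. *)
tCompute = First@AbsoluteTiming[Map[Fourier, a];];

{tTransfer, tCompute}
```

If tTransfer turns out comparable to or larger than tCompute, that would support the claim that the overhead outweighs the gain for this problem size.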
... shdw message. Version 9.0.1 on fresh kernel, screen shot:
... Parallel`Preferences` would end up on the context path; that seems like a mistake. Good that it's apparently fixed. – Mr.Wizard Sep 03 '13 at 02:40
... MemberQ I get 2X speedup? See my update. – xslittlegrass Sep 03 '13 at 02:40
... MemberQ unpacking bug as Oleksandr described, and as I already addressed in my answer. Beyond that what is your point? I don't understand. – Mr.Wizard Sep 03 '13 at 02:44
... a is ten times larger than mine, does that make the overhead more significant in your case? And also 1 second of overhead seems quite large, doesn't it? – xslittlegrass Sep 03 '13 at 02:47
... Language`ExtendedFullDefinition? I do not see that in v7. I forgot that I made my data 10X larger. I was thinking that if anything a larger working set might reduce the proportional overhead; usually parallelism is more applicable to large/slow problems. – Mr.Wizard Sep 03 '13 at 02:50
... MemberQ fix above in my update by Szabolcs, let me try Oleksandr's version and get back to you. – xslittlegrass Sep 03 '13 at 02:55
... Language`ExtendedFullDefinition using withModifiedMemberQ. – xslittlegrass Sep 03 '13 at 03:04
... Parallel` package has changed a lot in version 8. Language`ExtendedFullDefinition is new, for example (it is used to automatically distribute definitions). I do agree that unpacking to level 1 likely has no significant performance impact, but I'm not completely sure that the version 8 Parallel` package might not be doing something harmful to performance as well. Will look into this later. – Oleksandr R. Feb 25 '14 at 11:13