
This is a follow-up to my previous question on how to get Mathematica to use all available cores for a parallel calculation. I resolved that by rewriting the code in terms of ParallelMap, after which Mathematica started using all 16 physical cores. On a single core using Map the calculation takes ~6.5 minutes (~390 seconds), and on 16 cores using ParallelMap it takes ~43 seconds. Why is it not scaling better than this? I would have expected the calculation to take roughly 390/16 ≈ 24 seconds.

Here is a simplified version of my code:

LaunchKernels[16];

dk = 1; λ = 1; g = 1; m = 1; f = 7;
lattsize = 10; t = 1; dt = 2*10^-1;
ϕ = 1;
p[P_, α_, β_] := {P*Sin[α]*Cos[β], P*Sin[α]*Sin[β], P*Cos[α]};
q[Q_, a_] := {Q*Sin[a], 0, Q*Cos[a]};
k[X_] := {0, 0, X};
X = Interpolation[Table[{i, i}, {i, 0, lattsize, 10^-3}]];
ω[x_] := Sqrt[x.x + m^2];
(*x:=p, y:=q, z:=k, s:=k+(-)p+(-)q*)

A1[x_, y_, z_, s_] := (1 + (g*ϕ^2)/(8*ω[x]^2))*ω[x] + (1 + (g*ϕ^2)/(8*ω[y]^2))*ω[y] + (1 + (g*ϕ^2)/(8*ω[z]^2))*ω[z] + (1 + (g*ϕ^2)/(8*ω[s]^2))*ω[s];

minA1 = {};
maxA1 = {};

variablesTable = Flatten[Table[{P, i, α, β, a}, {P, 0, 1, dk}, {i, 0, 1, dk}, {α, 0, 1, dk}, {β, 0, 1, dk}, {a, 0, 1, dk}], 4];

func := (solA1 = NSolve[A1[p[#[[1]], #[[3]], #[[4]]], q[Q, #[[5]]], k[X[#[[2]]]], k[X[#[[2]]]] - p[#[[1]], #[[3]], #[[4]]] - q[Q, #[[5]]]] == f, Q, Method -> {Automatic, "SymbolicProcessing" -> 0}]) &;

ParallelMap[
  (func[#];
    If[solA1 != {},
      solA1 = Select[Q /. solA1, Positive];
      AppendTo[minA1, {#, Min[solA1] /. Infinity -> Null}];
      AppendTo[maxA1, {#, Max[solA1] /. -Infinity -> Null}];,

      AppendTo[minA1, {#, Null}];
      AppendTo[maxA1, {#, Null}];
    ]) &,
  variablesTable,
  Method -> "CoarsestGrained"
];

minA1Master = Join@@ParallelEvaluate[minA1]
maxA1Master = Join@@ParallelEvaluate[maxA1]

Any help would be much appreciated.

user35305
  • First of all, take a look at point 3.2: Performance tuning in Mathematica – Kuba Feb 28 '17 at 13:02
  • @Kuba Thanks for the info. The problem is, I'm fairly new to Mathematica, so I'm not sure how to write this differently to achieve the same result. Should I use Table inside the ParallelMap command? – user35305 Feb 28 '17 at 13:06
  • @Kuba How would I go about using Reap and Sow to create a table of values minA1 with the conditions imposed as per my original code? – user35305 Feb 28 '17 at 13:18
  • I'm not answering questions about parallel features, sorry :) I just wanted to link some general guidelines. – Kuba Feb 28 '17 at 13:30
  • @Kuba Ok fair enough. – user35305 Feb 28 '17 at 13:31
  • NSolve does not have a "SymbolicProcessing" suboption (NDSolve does). Anyway, maybe this is something FindRoot would do faster? (See the FindRoot sketch after this thread.) – Daniel Lichtblau Feb 28 '17 at 15:14
  • Can you try to reduce this example to something more manageable, while making sure that it actually runs? (The code you posted doesn't.) This is still a big piece of code that is hard to understand. Anyone who would try to answer would have to start by making it much smaller. This is something you can do. Hints: 1. Make sure it runs in a reasonable time, not 6 minutes. Map it over a shorter list if needed. 2. Do not use a huge inline function with hard to read slots like #[[1]], #[[2]]. Define a separate function with readable argument names. – Szabolcs Feb 28 '17 at 15:44
  • 3. You manage data transfer through global variables (minA1, etc.), which have different values on the main kernel and on the subkernels (even though at the beginning their values are synchronized). Don't do this. Even if it works (which is a question), it invites trouble. Manage data transfer through return values: the result should be returned by the function you map. 4. X is an interpolating function that does effectively nothing. Avoid it if you can. There are known problems with interpolating functions and parallelization. – Szabolcs Feb 28 '17 at 15:47
  • Whether this is still a problem, and whether it affects your code, is something to be tested. 5. NSolve is really meant for "nice" (mostly polynomial) symbolic functions, not numerical black boxes. I'm not sure if using it with an interpolating function is a good idea (though it does seem to work). As Daniel said, try FindRoot. Hope this helps. – Szabolcs Feb 28 '17 at 15:48
  • @DanielLichtblau Ah, I didn't know that (although NSolve isn't complaining about me using "SymbolicProcessing"). I'll try FindRoot and see if this improves things. – user35305 Feb 28 '17 at 15:51
  • I partially fixed your code by adding minA1 = {}; maxA1 = {}; ParallelEvaluate[minA1 = {}; maxA1 = {};]; at the start. I benchmarked it with Map and ParallelMap when mapping over Take[variablesTable, 500]. I got 5.8 vs 4.4 seconds. That's a small speedup, but still a speedup. I use M11.0 with 4 kernels (not 16). Do you observe the slowdown only when mapping over a longer list? Do you observe it only when using 16 kernels instead of fewer? – Szabolcs Feb 28 '17 at 15:54
  • @Szabolcs I have updated my code in my OP to try and make it more manageable (and hopefully work)! That's good to know about the interpolating function (come to think of it, you're right that it does effectively nothing - it's a redundancy left over from a previous iteration of the code). What do you mean exactly by managing data transfer through return values? – user35305 Feb 28 '17 at 15:55
  • You can define fun[x_] := (a=x^2;), then run fun[2], then read the result by checking global variable a. This is generally bad, and with parallelization I would expect it to break things. fun[x_] := x^2 returns the result directly and doesn't manipulate global variables. This is good. – Szabolcs Feb 28 '17 at 15:58
  • A more reasonable way to define func is func[{a1_, a2_, a3_, a4_, a5_}] := NSolve[A1[p[a1, a3, a4], q[Q, a5], k[X[a2]], k[X[a2]] - p[a1, a3, a4] - q[Q, a5]] == f, Q] (named arguments for readability, no messing with global variables, and directly mappable over a list of parameter lists; see the fuller sketch after this thread) – Szabolcs Feb 28 '17 at 16:01
  • @Szabolcs Why do I need to define minA1 = {}; maxA1 = {} on the master kernel as well as on the sub-kernels? To be fair I'm not sure, I'll try a smaller list now (and fewer kernels). So instead of func := (solA1 = ...)& would it be better to write solA1 := (...)&? – user35305 Feb 28 '17 at 16:02
  • I should also mention that unfortunately it is not an uncommon experience that parallelization fails to speed up things, sometimes due to long standing bugs. Debugging this is often difficult and frustrating. This is why it is important to simplify the code as much as possible. Without that it is impossible to find the cause. – Szabolcs Feb 28 '17 at 16:04
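Putting the comments together, a return-value version of the loop might look like the sketch below. It is only a sketch: the helper name getMinMax is my own, and everything else relies on the definitions from the question (A1, p, q, k, X, f, variablesTable). Each subkernel returns its {min, max} pair directly, so nothing has to be appended to global variables or collected with ParallelEvaluate afterwards:

getMinMax[{P_, i_, α_, β_, a_}] :=
  Module[{sol, pos},
    (* solve A1[...] == f for Q, exactly as in the question's func *)
    sol = NSolve[A1[p[P, α, β], q[Q, a], k[X[i]], k[X[i]] - p[P, α, β] - q[Q, a]] == f, Q];
    pos = If[sol === {}, {}, Select[Q /. sol, Positive]];
    (* Min/Max of an empty list give Infinity/-Infinity; map those to Null *)
    {Min[pos] /. Infinity -> Null, Max[pos] /. -Infinity -> Null}];

results = ParallelMap[getMinMax, variablesTable, Method -> "CoarsestGrained"];
minA1Master = Transpose[{variablesTable, results[[All, 1]]}];
maxA1Master = Transpose[{variablesTable, results[[All, 2]]}];

Because the mapped function is pure in its inputs and outputs, no subkernel initialization of minA1/maxA1 is needed, and the results arrive in the same order as variablesTable.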
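As a rough illustration of the FindRoot suggestion: FindRoot returns a single root near a starting value rather than all roots, so it is not a drop-in replacement for the min/max logic above, and the starting value Q = 1 below is an arbitrary guess:

findOneRoot[{P_, i_, α_, β_, a_}] :=
  Quiet@Check[
    (* search for one root of A1[...] == f, starting from Q = 1 *)
    Q /. FindRoot[A1[p[P, α, β], q[Q, a], k[X[i]], k[X[i]] - p[P, α, β] - q[Q, a]] == f, {Q, 1}],
    Null];

Whether this actually beats NSolve here would need benchmarking, and several starting points would be required to recover both the smallest and the largest positive root.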