
This is a follow-up to my previous question on how to get Mathematica to use all available cores for a parallel calculation. I resolved that by rewriting the code in terms of ParallelMap, after which Mathematica started using all 16 physical cores. On a single core using Map the calculation takes ~6.5 minutes (~390 seconds), and on 16 cores using ParallelMap it takes ~43 seconds. Why is it not scaling better than this? I would have expected the calculation to take roughly 390/16 ≈ 24 seconds.

Here is a simplified version of my code:

LaunchKernels[16];

dk = 1; λ = 1; g = 1; m = 1; f = 7;
lattsize = 10; t = 1; dt = 2*10^-1;
ϕ = 1;
p[P_, α_, β_] := {P*Sin[α]*Cos[β], P*Sin[α]*Sin[β], P*Cos[α]};
q[Q_, a_] := {Q*Sin[a], 0, Q*Cos[a]};
k[X_] := {0, 0, X};
X = Interpolation[Table[{i, i}, {i, 0, lattsize, 10^-3}]];
ω[x_] := Sqrt[x.x + m^2];
(*x:=p, y:=q, z:=k, s:=k+(-)p+(-)q*)

A1[x_, y_, z_, s_] := (1 + (g*ϕ^2)/(8*ω[x]^2))*ω[x] + (1 + (g*ϕ^2)/(8*ω[y]^2))*ω[y] + (1 + (g*ϕ^2)/(8*ω[z]^2))*ω[z] + (1 + (g*ϕ^2)/(8*ω[s]^2))*ω[s];

minA1 = {};
maxA1 = {};

variablesTable = Flatten[Table[{P, i, α, β, a}, {P, 0, 1, dk}, {i, 0, 1, dk}, {α, 0, 1, dk}, {β, 0, 1, dk}, {a, 0, 1, dk}], 4];

func := (solA1 = NSolve[A1[p[#[[1]], #[[3]], #[[4]]], q[Q, #[[5]]], k[X[#[[2]]]], k[X[#[[2]]]] - p[#[[1]], #[[3]], #[[4]]] - q[Q, #[[5]]]] == f, Q, Method -> {Automatic, "SymbolicProcessing" -> 0}]) &;

ParallelMap[
  (func[#];
    If[solA1 != {},
      solA1 = Select[Q /. solA1, Positive];
      AppendTo[minA1, {#, Min[solA1] /. Infinity -> Null}];
      AppendTo[maxA1, {#, Max[solA1] /. -Infinity -> Null}];,

      AppendTo[minA1, {#, Null}];
      AppendTo[maxA1, {#, Null}];
    ]) &,
  variablesTable,
  Method -> "CoarsestGrained"
];

minA1Master = Join@@ParallelEvaluate[minA1]
maxA1Master = Join@@ParallelEvaluate[maxA1]

Any help would be much appreciated.

user35305
  • First of all, take a look at point 3.2: Performance tuning in Mathematica – Kuba Feb 28 '17 at 13:02
  • @Kuba Thanks for the info. The problem is, I'm fairly new to Mathematica, so I'm not sure how to write this differently to achieve the same result. Should I use Table inside the ParallelMap command? – user35305 Feb 28 '17 at 13:06
  • @Kuba How would I go about using Reap and Sow to create a table of values minA1 with the conditions imposed as per my original code? – user35305 Feb 28 '17 at 13:18
  • I'm not answering questions about parallel features, sorry :) I just wanted to link some general guidelines. – Kuba Feb 28 '17 at 13:30
  • @Kuba Ok fair enough. – user35305 Feb 28 '17 at 13:31
  • NSolve does not have a "SymbolicProcessing" suboption (NDSolve does). Anyway, maybe this is something FindRoot would do faster? (See the FindRoot sketch after this thread.) – Daniel Lichtblau Feb 28 '17 at 15:14
  • Can you try to reduce this example to something more manageable, while making sure that it actually runs? (The code you posted doesn't.) This is still a big piece of code that is hard to understand. Anyone who would try to answer would have to start by making it much smaller. This is something you can do. Hints: 1. Make sure it runs in a reasonable time, not 6 minutes. Map it over a shorter list if needed. 2. Do not use a huge inline function with hard to read slots like #[[1]], #[[2]]. Define a separate function with readable argument names. – Szabolcs Feb 28 '17 at 15:44
  • 3. You manage data transfer through global variables (minA1, etc.), which have different values on the main kernel and on the subkernels (even though at the beginning their values are synchronized). Don't do this. Even if it works (which is a question), it invites trouble. Manage data transfer through return values: the result should be returned by the function you map. 4. X is an interpolating function that does effectively nothing. Avoid it if you can. There are known problems with interpolating functions and parallelization. – Szabolcs Feb 28 '17 at 15:47
  • Whether this is still a problem, and whether it affects your code, is something to be tested. 5. NSolve is really meant for "nice" (mostly polynomial) symbolic functions, not numerical black boxes. I'm not sure if using it with an interpolating function is a good idea (though it does seem to work). As Daniel said, try FindRoot. Hope this helps. – Szabolcs Feb 28 '17 at 15:48
  • @DanielLichtblau Ah, I didn't know that (although NSolve isn't complaining about me using "SymbolicProcessing"). I'll try FindRoot and see if this improves things. – user35305 Feb 28 '17 at 15:51
  • I partially fixed your code by adding minA1 = {}; maxA1 = {}; ParallelEvaluate[minA1 = {}; maxA1 = {};]; at the start. I benchmarked it with Map and ParallelMap when mapping over Take[variablesTable, 500]. I got 5.8 vs 4.4 seconds. That's a small speedup, but still a speedup. I use M11.0 with 4 kernels (not 16). Do you observe the slowdown only when mapping over a longer list? Do you observe it only when using 16 kernels instead of fewer? – Szabolcs Feb 28 '17 at 15:54
  • @Szabolcs I have updated my code in my OP to try and make it more manageable (and hopefully work)! That's good to know about the interpolating function (come to think of it, you're right that it does effectively nothing - it's a redundancy left over from a previous iteration of the code). What do you mean exactly by managing data transfer through return values? – user35305 Feb 28 '17 at 15:55
  • You can define fun[x_] := (a=x^2;), then run fun[2], then read the result by checking global variable a. This is generally bad, and with parallelization I would expect it to break things. fun[x_] := x^2 returns the result directly and doesn't manipulate global variables. This is good. – Szabolcs Feb 28 '17 at 15:58
  • A more reasonable way to define func is func[{a1_, a2_, a3_, a4_, a5_}] := NSolve[A1[p[a1, a3, a4], q[Q, a5], k[X[a2]], k[X[a2]] - p[a1, a3, a4] - q[Q, a5]] == f, Q] (named arguments for readability, no messing with global variables, and directly mappable over a list of parameter lists; see the fuller sketch after this thread) – Szabolcs Feb 28 '17 at 16:01
  • @Szabolcs Why do I need to define minA1 = {}; maxA1 = {} on the master kernel as well as on the sub-kernels? To be fair I'm not sure, I'll try a smaller list now (and fewer kernels). So instead of func := (solA1 = ...)& would it be better to write solA1 := (...)&? – user35305 Feb 28 '17 at 16:02
  • I should also mention that unfortunately it is not an uncommon experience that parallelization fails to speed up things, sometimes due to long standing bugs. Debugging this is often difficult and frustrating. This is why it is important to simplify the code as much as possible. Without that it is impossible to find the cause. – Szabolcs Feb 28 '17 at 16:04
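Putting the comments together, a return-value version of the loop might look like the sketch below. It is only a sketch: the helper name getMinMax is my own, and everything else relies on the definitions from the question (A1, p, q, k, X, f, variablesTable). Each subkernel returns its {min, max} pair directly, so nothing has to be appended to global variables or collected with ParallelEvaluate afterwards:

getMinMax[{P_, i_, α_, β_, a_}] :=
  Module[{sol, pos},
    (* solve A1[...] == f for Q, exactly as in the question's func *)
    sol = NSolve[A1[p[P, α, β], q[Q, a], k[X[i]], k[X[i]] - p[P, α, β] - q[Q, a]] == f, Q];
    pos = If[sol === {}, {}, Select[Q /. sol, Positive]];
    (* Min/Max of an empty list give Infinity/-Infinity; map those to Null *)
    {Min[pos] /. Infinity -> Null, Max[pos] /. -Infinity -> Null}];

results = ParallelMap[getMinMax, variablesTable, Method -> "CoarsestGrained"];
minA1Master = Transpose[{variablesTable, results[[All, 1]]}];
maxA1Master = Transpose[{variablesTable, results[[All, 2]]}];

Because the mapped function is pure in its inputs and outputs, no subkernel initialization of minA1/maxA1 is needed, and the results arrive in the same order as variablesTable.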
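As a rough illustration of the FindRoot suggestion: FindRoot returns a single root near a starting value rather than all roots, so it is not a drop-in replacement for the min/max logic above, and the starting value Q = 1 below is an arbitrary guess:

findOneRoot[{P_, i_, α_, β_, a_}] :=
  Quiet@Check[
    (* search for one root of A1[...] == f, starting from Q = 1 *)
    Q /. FindRoot[A1[p[P, α, β], q[Q, a], k[X[i]], k[X[i]] - p[P, α, β] - q[Q, a]] == f, {Q, 1}],
    Null];

Whether this actually beats NSolve here would need benchmarking, and several starting points would be required to recover both the smallest and the largest positive root.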