Let me give a different approach. One downside of WalkingRandomly's approach is that it distributes the whole list over all subkernels and then uses Part in each subcall to select the data to use. I will do this differently:
- I divide the data into chunks and define each chunk as subdata on every subkernel you want to use in the current call
- then I can simply call Map[f, subdata] on each wanted subkernel with ParallelEvaluate
The chunkenize function works whether or not Length[data] is divisible by the number of kernels used.
chunkenize[data_, nkernels_] :=
Partition[data, UpTo[Ceiling[Length[data] / nkernels]]]
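For example, splitting 10 elements over 3 kernels gives two chunks of four elements and a final shorter one:

chunkenize[Range[10], 3]
(* {{1,2,3,4},{5,6,7,8},{9,10}} *)

With chunkenize in place, the parallel map itself looks like this: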
MyParallelMap[f_, data_, kernels_] :=
 Module[{chunks = chunkenize[data, Length[kernels]]},
  Block[{subdata},
   (* assign one chunk as subdata on each selected kernel *)
   MapIndexed[
    ParallelEvaluate[subdata = #1, kernels[[First[#2]]]] &, chunks];
   DistributeDefinitions[f];
   (* let every selected kernel map f over its own chunk *)
   ParallelEvaluate[Map[f, subdata], kernels]
  ]
 ]
Trying it gives
data = Range[20];
f[x_] := {$KernelID, x^2}
kernels = LaunchKernels[];
MyParallelMap[f, data, kernels]
(*
{{{1,1},{1,4},{1,9},{1,16},{1,25}},{{2,36},{2,49},{2,64},{2,81},{2,100}},
{{3,121},{3,144},{3,169},{3,196},{3,225}},{{4,256},{4,289},{4,324},{4,361},{4,400}}}
*)
Or, if you like, use only some of the kernels:
MyParallelMap[f,data,kernels[[{2,3}]]]
(*
{{{2,1},{2,4},{2,9},{2,16},{2,25},{2,36},{2,49},{2,64},{2,81},{2,100}},
{{3,121},{3,144},{3,169},{3,196},{3,225},{3,256},{3,289},{3,324},{3,361},{3,400}}}
*)
Update
The question also asks: "I would really like to know why overriding Parallel`Protected`$kernels does not work."
When you trace a simple ParallelMap call, you can investigate what happens. What I did was create a full trace output and then check at which positions subkernels like KernelObject[1, "local"] appear.
In detail, this meant checking the FullForm of a subkernel, because then you see that it has the form
Parallel`Kernels`kernel[....]
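A quick way to check this yourself, using the kernels launched above (I omit the internal arguments since they depend on the Mathematica version):

FullForm[First[kernels]]
(* Parallel`Kernels`kernel[<internal data>] *)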
Then I launched some kernels and traced the output. Using Position you can find all positions that match a subkernel:
kernels = LaunchKernels[];
trace = Trace[ParallelMap[$KernelID &, Range[100]]];
pos = Position[trace, Parallel`Kernels`kernel, Infinity];
If you now inspect the positions where the subkernels appear, you first find what you already found: Parallel`Protected`$kernels. But soon you see
Part[trace,Sequence@@Drop[pos[[10]], -4]]
(*
{Parallel`Protected`$sortedkernels,
{KernelObject[1,local],KernelObject[2,local],
KernelObject[3,local],KernelObject[4,local]}}
*)
This brings us to the following solution:
Block[{
$KernelCount = 2,
Parallel`Protected`$sortedkernels = Take[kernels, 2]
},
ParallelMap[$KernelID &, Range[100]]
]
(*
{1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,
1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,
2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2}
*)
Since I didn't find this that quickly, I had time to do some more spelunking. You may have noticed that I set $KernelCount in the Block. This is because its value is used by the partitioner that splits the input into batches for the subkernels.
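For reference, and assuming a session in which four local kernels are running: $KernelCount normally just reflects the number of launched subkernels, which is why it has to be lowered here when only two of them should receive work.

{$KernelCount, Length[Kernels[]]}
(* {4, 4} *)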
Comments

WalkingRandomly (Nov 27 '12): You use Block[] within a Module. In fact, I never use Block[]. May I ask why you use it here? Why not just do MyParallelMap[f_, data_, kernels_] := Module[{chunks = chunkenize[data, Length[kernels]], subdata}, MapIndexed[ParallelEvaluate[subdata = #1, kernels[[First[#2]]]] &, chunks]; DistributeDefinitions[f]; ParallelEvaluate[Map[f, subdata], kernels]]?

halirutan (Nov 27 '12): Block because ParallelEvaluate has the HoldFirst attribute. I just like to make sure (even for the reader) that subdata is a new, local variable which should not interfere with any globally defined subdata. The difference between Block and Module is the scoping type: the first uses dynamic scoping, the latter lexical scoping.

Ziofil (Apr 18 '13): If one calls MyParallelMap[f, data, kernels] several times, how could this code be optimized in case data were a fixed quantity and the function f were constantly changing?
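A tiny illustration of the scoping difference mentioned in the comments above (my own addition, not part of the original thread): Block temporarily changes the value seen by already defined functions, while Module creates a fresh, lexically scoped symbol.

x = 1;
g[] := x;
Block[{x = 2}, g[]]   (* 2, dynamic scoping: g sees the temporary value *)
Module[{x = 2}, g[]]  (* 1, lexical scoping: g still sees the global x *)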