How to monitor the communication between parallel kernels

Question

Recently I increased the problem size in my code, and it scales very badly with ParallelTable. With a lot of help from this community, it seems that the problem may be caused by the communication of large data between the kernels. Is there a way to monitor the communication between these kernels, for example, to see how much data is copied from one kernel to another?

Edit

As requested, here is an example that demonstrates the problem:

Clear["`*"]
LaunchKernels[];

I have 16 kernels on my system:

$KernelCount
(* ==> 16 *)

Define two matrices:

m = 30000; n = 640;
a = RandomComplex[{0., 1. + I}, {n, m}];
b = RandomComplex[{0., 1. + I}, {n, m}];

Define a function that does a simple algebraic calculation (the details can be ignored):

SelectbyWRange[A_, {WMin_, WMax_}, {TakeWMin_, TakeWMax_}] :=
  Module[{lthA, nMax, nMin}, lthA = Length[A];
  nMin = Round[-((-WMax + lthA WMin)/(WMax - WMin)) - ((1 - 
          lthA) TakeWMin)/(WMax - WMin)];
  nMax = Round[-((-WMax + lthA WMin)/(WMax - WMin)) - ((1 - 
          lthA) TakeWMax)/(WMax - WMin)];
  Transpose[{Table[
     TakeWMin + n*(TakeWMax - TakeWMin)/(nMax - nMin), {n, 0, 
      nMax - nMin}], Take[A, {nMin, nMax}]}]]
g[{x_, y_}] := 
 SelectbyWRange[-Im[x*Conjugate[y]], {-834., 834.}, {19.5, 20.5}]

Timings for Table, ParallelTable, Map, and ParallelMap:

Table[g[{a[[n]], b[[n]]}], {n, 1, Length[a]}]; // AbsoluteTiming
(* ==> {0.390135, Null} *)

ParallelTable[g[{a[[n]], b[[n]]}], {n, 1, Length[a]}]; // AbsoluteTiming  
(* ==> {14.352067, Null} *)

Map[g, Transpose[{a, b}]]; // AbsoluteTiming
(* ==> {1.010789, Null} *)

ParallelMap[g, Transpose[{a, b}]]; // AbsoluteTiming 
(* ==> {8.101203, Null} *)

ParallelTable[
   g[{RandomComplex[{0., 1. + I}, {m}], 
     RandomComplex[{0., 1. + I}, {m}]}], {n}]; // AbsoluteTiming  
(* ==> {0.128660, Null} *)

We can see that ParallelTable[g[{a[[n]], b[[n]]}], {n, 1, Length[a]}] has the worst timing, perhaps because the whole of a and b has to be copied to each subkernel, which takes a long time. In the last ParallelTable, no data is copied from the master kernel to the subkernels, and it has the best performance. I have no clue why Table is so fast and ParallelMap is so slow. I thought that monitoring the communication between the kernels might help in understanding their behavior.
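Before reaching for a monitoring tool, one cheap way to test the "whole a and b are copied" hypothesis is to estimate the data volume and to time a transfer that does nothing else. This is only a rough proxy, not a monitor: `transferEstimateMiB` and `shipTiming` are hypothetical helper names, `ByteCount` reports the in-kernel size rather than the exact MathLink payload, and the sketch assumes the subkernels are already running.

```mathematica
(* Hypothetical helper: MiB that would move if every expression in exprs
   were copied to each of `kernels` subkernels. ByteCount gives the size
   inside the kernel, which only approximates the MathLink payload. *)
transferEstimateMiB[exprs_List, kernels_Integer] :=
  N[kernels*Total[ByteCount /@ exprs]/2^20]

(* Hypothetical helper: seconds needed to ship expr to every running
   subkernel. With injects the value into the held ParallelEvaluate body,
   so the full data crosses the link; only Dimensions comes back. *)
shipTiming[expr_] :=
  With[{data = expr}, First@AbsoluteTiming[ParallelEvaluate[Dimensions[data]]]]

transferEstimateMiB[{a, b}, $KernelCount]
shipTiming[a]
```

For the matrices in the question, each holds 640 × 30000 machine complex numbers at 16 bytes apiece, about 307 MB, so copying both to 16 subkernels would mean moving on the order of 10 GB. If `shipTiming[a]` alone takes several seconds, transfer cost is a plausible explanation for the ParallelTable timing.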

If the kernels are running on different machines, something like http://www.wireshark.org/ might be helpful. If the data is not encrypted, you can even see the content of the packets. Otherwise, you can still monitor the size and number of the packets. — Helium, Sep 04 '13 at 04:46
If you want to monitor the communication between kernels running on the same machine and you are using Windows, this link might be useful http://stackoverflow.com/questions/8496388/sniff-inter-process-communication — Helium, Sep 04 '13 at 04:57
It seems to me that the specific question about monitoring may not help you with the real problem. While one can force communication between remote kernels, parallel kernels typically communicate only with the master kernel. If you can post a simple diagram of the structure of the calculations, maybe we can help you rethink where to stage data for parallel kernel use. — Jagra, Sep 04 '13 at 13:11

score 8 · Accepted Answer · answered Sep 04 '13 at 20:57

You can use LinkSnooper to monitor this communication. LinkSnooper is a utility Java program that is included in J/Link. You insert it between two MathLink programs and it transparently shuttles data back and forth between them, printing out the flow in both directions. The programs at each end think they have a MathLink directly to each other, but instead they each have a link to LinkSnooper.

There are a number of ways to get LinkSnooper inserted between two MathLink programs (listen, connect, etc.) but a basic launch-style setup is sufficient for the parallel computing features in Mathematica. Assuming for the moment that you are using local subkernels, Mathematica will normally do essentially this to start up a subkernel:

(* LinkLaunch["mathkernel.exe"] *)

To use LinkSnooper, you convert this into

(* LinkLaunch["java command that launches LinkSnooper and tells it to launch mathkernel.exe"] *)

Here's how I can do this on my Windows machine.

Needs["SubKernels`LocalKernels`"]

(* LinkSnooper (in JLink.jar) launches mathkernel.exe itself and relays all traffic between master and subkernel *)
$ConfiguredKernels = {LocalMachine["javaw -classpath \"" <> FileNameJoin[{$InstallationDirectory, "SystemFiles", "Links", "JLink", "JLink.jar"}] <>
   "\" com.wolfram.jlink.util.LinkSnooper -kernelmode launch -kernelname \"" <> FileNameJoin[{$InstallationDirectory, "mathkernel.exe"}] <>
   "\" -feSide Master -kernelSide Sub", 2, LowerPriority -> True]}

The 2 above is the original default number of local subkernels on my machine, which I got from the original value of $ConfiguredKernels. On Linux or Mac, use "java" instead of "javaw", and you will probably need to use ' instead of " for quoting paths that have spaces in them. The -feSide and -kernelSide parameters specify the names of the two sides of the link for use during printing.
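A Linux or Mac version of the Windows command above might look like the following sketch. It is untested here and rests on two assumptions: that `$mathkernel`, which a commenter below reports as available after loading SubKernels`LocalKernels`, holds the kernel executable path as a plain string, and that single quotes are enough to protect paths with spaces. `jlinkJar` and `snoopCommand` are just illustrative names.

```mathematica
Needs["SubKernels`LocalKernels`"]

(* JLink.jar, which contains the LinkSnooper class, lives under SystemFiles *)
jlinkJar = FileNameJoin[{$InstallationDirectory, "SystemFiles", "Links", "JLink", "JLink.jar"}];

(* "java" instead of "javaw"; single quotes around paths that may contain spaces.
   Assumption: $mathkernel is the kernel executable path as a string. *)
snoopCommand = "java -classpath '" <> jlinkJar <>
   "' com.wolfram.jlink.util.LinkSnooper -kernelmode launch -kernelname '" <>
   $mathkernel <> "' -feSide Master -kernelSide Sub";

(* 2 is a placeholder: use the count from your original $ConfiguredKernels *)
$ConfiguredKernels = {LocalMachine[snoopCommand, 2, LowerPriority -> True]};
```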

Now when you launch the kernels, you will see one LinkSnooper window open up for each one:

LaunchKernels[]

Try a parallel command to see what is being transferred:

ParallelEvaluate[2+2]
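To check whether the pattern from the question really ships whole matrices, it may help to run a miniature version while the LinkSnooper windows are open. In this sketch `aSmall` and `bSmall` are hypothetical stand-ins, kept tiny so the traffic stays readable, and a plain row sum replaces g to keep the output short. In recent versions parallel functions distribute definitions of symbols in the current context automatically, so I would expect the first call to send the full aSmall and bSmall to every subkernel and the second to send each subkernel only its rows; the windows will show what actually happens.

```mathematica
(* Tiny stand-ins for a and b; the 4 x 10 size is arbitrary *)
aSmall = RandomComplex[{0., 1. + I}, {4, 10}];
bSmall = RandomComplex[{0., 1. + I}, {4, 10}];

(* Pattern 1: index into global matrices inside ParallelTable *)
ParallelTable[aSmall[[k]] + bSmall[[k]], {k, Length[aSmall]}]

(* Pattern 2: pass the row pairs as data; Total[{x, y}] is x + y *)
ParallelMap[Total, Transpose[{aSmall, bSmall}]]
```

Both calls compute the same result, so any difference you see in the LinkSnooper windows comes from how the data reaches the subkernels, not from the work being done.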

LinkSnooper will affect the performance of your parallel computations, so you might not get meaningful performance data while it is in use, but at least you can see exactly what is being sent. The performance will improve if you periodically clear the LinkSnooper window using the button at the bottom.
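Because of that slowdown, you will probably want to switch back to ordinary subkernels before taking timings. A minimal sketch of that round trip, assuming you save the original value before making the `$ConfiguredKernels` assignment above (`originalKernels` is a hypothetical name):

```mathematica
(* Run this BEFORE assigning the LinkSnooper version of $ConfiguredKernels *)
originalKernels = $ConfiguredKernels;

(* ... LinkSnooper assignment, LaunchKernels[], inspect the traffic ... *)

(* Afterwards: shut down the snooped subkernels and restore the
   normal configuration for meaningful performance measurements *)
CloseKernels[];
$ConfiguredKernels = originalKernels;
LaunchKernels[];
```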

This same basic technique will work for launching remote subkernels, but the command line will be different and more complex. It is left as an exercise to the reader, meaning that I don't have time to work it out right now. Even if serious use of your program requires remote subkernels, you can temporarily switch to local kernels to help understand what is being transferred.

I regret that I cannot get this working on my 64-bit Windows 10 system with Mathematica 10.4, 11.0, and 11.1. Quite often Mathematica hangs when I evaluate the assignment to $ConfiguredKernels (with 2 replaced by 4), and when it does work, LinkSnooper does not turn up. — Fred Simons, Feb 10 '17 at 17:03
To make the method described above more operating-system independent, I believe one can use $mathkernel (available after loading SubKernels`LocalKernels`) for the -kernelname option. Together with changing javaw to java as described, I got LinkSnooper to work on Linux with Mathematica 12.2. — Hausdorff, Jan 31 '21 at 18:51
