19

I'm currently using Mathematica on a High Performance Computing (HPC) cluster which consists of many compute nodes, each with around 16 cores. I currently run my Mathematica script on 20 of the nodes, with each script invoking 10 cores and 10 subkernel licenses in parallel, meaning I use 20 MathKernel licenses and 200 subkernel licenses.

The problem is that we have a limited number of MathKernel licenses (36, and for me to be using 20 of them is unfair on everyone else!) but ample subkernel licenses (288). Is there a way I can use a single MathKernel license (or at least fewer of them) to invoke the 200 subkernels I need?

Currently, in each of the 20 scripts I just have

LaunchKernels[10];
ParallelTable[....];

which launches the 10 local subkernels on each node, but could I perhaps specify different nodes to launch subkernels on? That way I would only need to launch one MathKernel, which could invoke the 200 subkernels spread across the compute nodes.

fpghost
    Are you asking or boasting? :) – Dr. belisarius Feb 27 '13 at 13:54
  • Yes, this is possible, and I am doing this right now. But how it can be done is dependent on the grid engine your HPC cluster is using as well as details of the local setup. – Szabolcs Feb 27 '13 at 14:00
  • @belisarius hehe @Szabolcs The grid engine is PBS Pro. – fpghost Feb 27 '13 at 14:09
  • @fpghost I spent a day and a half on just figuring this out, so I uploaded a solution to bitbucket. It's not for PBS, but looking at it might save you some time. See my answer. – Szabolcs Feb 27 '13 at 14:17
  • @Szabolcs thanks very much, I'll have a play around and see how I get on. – fpghost Feb 27 '13 at 14:30
  • @fpghost If you get it working, can you document your solution, like I did, and put a link here? Even if it's valid for your cluster only, it would have saved me so much time to see an existing solution. – Szabolcs Feb 27 '13 at 15:08
  • @Szabolcs yes, will do. – fpghost Feb 27 '13 at 15:19

3 Answers

14

What you need in order to launch subkernels across several nodes on an HPC cluster is the following:

  1. Figure out how to request several compute nodes for the same job
  2. Find the names of the nodes that have been allocated for your job
  3. Find out how to launch subkernels on these nodes from within the main kernel

All of these depend on the grid engine your cluster is using, as well as your local setup, and you'll need to check its docs and ask your administrator about the details. I have an example for our local setup (complete with a jobfile), which might be helpful for you to study:

https://bitbucket.org/szhorvat/crc/src

Our cluster uses the Sun Grid Engine. The names of the nodes (and information about them) are listed in a "hostfile" which you can find by retrieving the value of the PE_HOSTFILE environment variable. (I think this works the same way with PBS, except the environment variable is called something else.)
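
For illustration, reading and parsing such a hostfile from the main kernel might look roughly like this (a sketch only, assuming the SGE layout with the hostname in the first column and the core count in the second; adapt it to whatever your grid engine writes):

hostfile = Environment["PE_HOSTFILE"];            (* on PBS, try PBS_NODEFILE instead *)
hosts = Import[hostfile, "Table"][[All, 1 ;; 2]]  (* a list of {hostname, cores} pairs *)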

Note that if you request multiple nodes in a single job file, the job script will be run on only one of the nodes, and you'll be launching the processes across all nodes manually (at least on SGE and PBS).

Launching processes on different nodes is usually possible with ssh: just run ssh nodename command to run command on the node nodename. You may also need to set up passphraseless authentication if it is not set up by default. To launch subkernels, you'll need to pass the -f option to ssh so that it returns immediately after it has launched the remote process.
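
Put together, an ssh-based kernel description might look something like the sketch below, built on SubKernels`RemoteKernels`RemoteMachine (see the ParallelTools connection-methods tutorial). The sshKernel helper is just a name of mine, and the ssh flags and the path to math are placeholders you will have to adapt to your cluster:

Needs["SubKernels`RemoteKernels`"]
ssh = "/usr/bin/ssh";
math = "/path/to/math";   (* absolute path to the math executable on the compute nodes *)
(* template slots: `1` = hostname, `2` = link name, `3` = user name, `4` = link protocol options *)
sshKernel[host_String, n_Integer] :=
  RemoteMachine[host,
   ssh <> " -x -f -l `3` `1` \"" <> math <>
    " -mathlink -linkmode Connect `4` -linkname '`2`' -subkernel -noinit\"", n]

LaunchKernels[sshKernel["node01", 16]]   (* e.g. 16 subkernels on a node called node01 *)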

Some setups use rsh instead of ssh. To launch a command in the background using rsh, you'll need to do

rsh -n nodename "command >& /dev/null &"

To run the remote process in the background, it is important to redirect the output (both stdout and stderr), because there's a bug in rsh (also described in its man page) that won't let it return immediately otherwise.

Another thing to keep in mind about rsh is that you can't rsh to the local machine, so the subkernels that will run on the same machine as the main kernel need to be launched without rsh.
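
Roughly, the launching loop can then distinguish the two cases like this (a sketch only, reusing the hosts list and the sshKernel launcher from above, with the rsh variant of the launch command substituted where needed; the hostname comparison may need adjusting if the hostfile uses fully qualified names):

Do[
 If[ToLowerCase[First[h]] === ToLowerCase[$MachineName],
  LaunchKernels[Last[h]],                        (* node running the main kernel: plain local subkernels *)
  LaunchKernels[sshKernel[First[h], Last[h]]]],  (* all other nodes: remote launch *)
 {h, hosts}]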

See my example for details.

Szabolcs
  • How's the performance on the cluster? I'd be very wary of scaling problems with this many subkernels... also, last I tried, MathLink begins to crap out with hundreds of simultaneous connections. – Oleksandr R. Feb 27 '13 at 14:34
  • @OleksandrR. I haven't run more than 60 kernels like this yet, and for my application the communication was minimal between the kernels, so I didn't notice any performance problems. I'll try a very big job soon, just out of curiosity. – Szabolcs Feb 27 '13 at 14:41
  • So to request several compute nodes for the same job I simply do qsub -l select=10:ncpus=10:mem=2gb subscript.sh for example, which would give me 10 nodes each with 10 cores. So far so good. – fpghost Feb 27 '13 at 15:08
  • @fpghost Szabolcs's suggestion of using ssh should work, but given that you're using PBS, you might find OSC mpiexec preferable. I find it easier to use and much more reliable than the ssh-based method (N.B.: for other programs; Mathematica isn't available on any of the clusters I have access to). Despite the name, the programs you launch do not have to be compiled against an MPI library. – Oleksandr R. Feb 27 '13 at 15:13
  • Still just figuring out what's going on in your CRC.m. Could you possibly explain the command SubKernels`RemoteKernels`RemoteMachine[ host,rsh <> " -n `1` \"" <> math <> " -mathlink -linkmode Connect `4` -linkname '`2`' -subkernel -noinit >& /dev/null &\"", cores] is this the same as described in the Manual launching section of http://reference.wolfram.com/mathematica/ParallelTools/tutorial/ConnectionMethods.html – fpghost Feb 27 '13 at 16:28
  • @fpghost Yes, it's the same. Also see here. The RemoteMachine head just contains all the information needed by Mathematica to launch a subkernel: the host, the launch command, the number of subkernels to launch on that host. Those backticked numbers are templates: the value will be automatically substituted. See the page you linked for their meaning. Here I had to use rsh to launch, but you will likely need ssh on your cluster. – Szabolcs Feb 27 '13 at 16:33
  • @Szabolcs OK, thanks. What does the <> ...<> notation mean? Is it just to dereference the var? I think I just need to get the PBSpro equivalent to the env var "PE_HOSTFILE" then. I thought it was PBS_NODEFILE but that seems to be empty, hmm. – fpghost Feb 27 '13 at 16:46
  • @fpghost <> is string join. "asd" <> "123" === "asd123". – Szabolcs Feb 27 '13 at 16:55
  • OK, I think I understand what is going on now. I have changed the rsh line to ssh="/usr/bin/ssh" and other refs to rsh to ssh in CRC.m. I have also used PBS_NODEFILE (which seems to give a path to a file when I print it now) for the location of hosts. I am getting errors like Part::take: Cannot take positions 1 through 2 in {node052}. however. – fpghost Feb 27 '13 at 17:34
  • I currently have SubKernels`RemoteKernels`RemoteMachine[ host, ssh -x -f -l <> " -n `1` \"" <> math <> " -mathlink -linkmode Connect `4` -linkname '`2`' -subkernel -noinit\"", cores] – fpghost Feb 27 '13 at 17:46
  • Actually is this error to do with reading the hostfile? – fpghost Feb 27 '13 at 17:52
  • @fpghost The format of the hostfile seems to be different for PBS. On SGE it's a table where the first column is the hostnames and the second one is the number of cores. You'll have to examine a hostfile on PBS and see what it contains. – Szabolcs Feb 27 '13 at 18:03
  • Ah, it just contains one column with node053 node055,..... the number of cores is always 16 though, so I can put that in manually by just removing the second arg of CRC. – fpghost Feb 27 '13 at 18:14
  • ParallelTable::nopar: No parallel kernels available; proceeding with sequential evaluation. is what I get now. With my hostfile in the above format, I changed things in CRC.m to be like hosts = Import[hostfile, "List"]; $ConfiguredKernels = Join[$ConfiguredKernels, CRCKernel @@@ Rest[hosts]] and changed CRCKernel to CRCKernel[host_String]:= SubKernels`RemoteKernels`RemoteMachine[ host, ssh -x -f <> " -n `1` \"" <> math <> " -mathlink -linkmode Connect `4` -linkname '`2`' -subkernel -noinit\"", 16] as well as the original definition. – fpghost Feb 27 '13 at 18:25
  • You didn't quote ssh as a string ... Also I'd recommend that you don't use CRC.m directly. Instead read it, understand what it does, and re-build it step by step, verifying that each step works. The first thing for you would be figuring out how to start commands on a remote machine using ssh from the terminal (not from within Mathematica). – Szabolcs Feb 27 '13 at 18:37
  • I do think I understand it reasonably well now, although I am a little rusty on some Mathematica syntax to do with string joins etc. How is SubKernels`RemoteKernels`RemoteMachine[ host, ssh <> " -x -f -n `1` \"" <> "/cm/shared/apps/Mathematica_8.0.4/bin/math" <> " -mathlink -linkmode Connect `4` -linkname '`2`' -subkernel -noinit\"", 16]? On this HPC, math is not an alias and the full path is needed. The error I now get is bash: math: command not found followed by KernelObject::rdead: Subkernel connected through remote[node053] appears dead. (with or without the absolute path in the above) – fpghost Feb 27 '13 at 18:51
  • I defined ssh="/usr/bin/ssh" already, in place of your rsh definition. – fpghost Feb 27 '13 at 19:04
  • @Szabolcs totally baffled as to why it's giving bash: math: command not found when I am passing the absolute path to math everywhere I can in CRC.m. Also even from the terminal simply doing ssh node055 /cm/shared/apps/Mathematica_8.0.4/bin/math gives the familiar Input[] but then times out after 15 secs. – fpghost Feb 28 '13 at 00:04
  • @Szabolcs Hi, I finally got this working. I don't know much about package writing, so I have just placed my needed kernel-launch commands in job.nb along with parsing commands for the hostfile. If I try to use your package as a template and then add it to my notebook with Needs, things no longer work for some reason; I am probably doing something stupid there. If you like, however, I can make a package from my notebook and upload it to bitbucket, and perhaps you could take a look so we could publish it there for others using PBSpro in the future? – fpghost Mar 05 '13 at 23:18
  • @OleksandrR. I think you are right; MathLink does seem to be dying for me with 190 kernels. Despite getting there in the end, it now takes vastly longer than when I had 19 single MathKernels with 10 local slaves each. – fpghost Mar 09 '13 at 21:13
  • @Szabolcs Is my understanding correct that ssh is only used once to start the subkernels, and that the communication between the subkernels and the main kernel still uses TCP/IP? Our university recently bought a big machine, but it seems that I cannot use more than one node, and I suspect that may be because TCP/IP is blocked. – xslittlegrass Feb 03 '15 at 02:12
  • @xslittlegrass Yes, that's correct. SSH is only used to start up the remote kernel, but once the kernel is started, SSH terminates. It doesn't do tunneling or that sort of thing. – Szabolcs Feb 03 '15 at 02:40
  • @Szabolcs, your links are not working. Could you please update them? – Schrodinger Dec 23 '20 at 11:52
6

Update

The node names in a job can be accessed through environment variables such as PBS_NODEFILE and HOSTNAME, so launching subkernels on the correct nodes can be automated.
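
For example, collecting the node names from within the master kernel might look roughly like this (a sketch assuming the one-hostname-per-line nodefile format; some PBS setups list a node once per allocated core, hence the DeleteDuplicates):

nodefile = Environment["PBS_NODEFILE"];
nodes = DeleteDuplicates[Import[nodefile, "List"]];
others = DeleteCases[nodes, Environment["HOSTNAME"]]  (* nodes other than the one running the master kernel *)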


I'm also trying to run more subKernels from a main kernel on an HPC cluster. I usually request an interactive job on the HPC, run math kernels on it, and then connect back to the front end on my laptop. My waiting time in the queue for an interactive job is very short, so it is convenient for me to work interactively. Here is how I did it; it may not be exactly the same on your cluster, but I hope it helps.

Request an interactive job

qsub -V -I -l walltime=01:00:00,nodes=2:ppn=16 -A hpc_atistartup

It will return something like this:

qsub: waiting for job 48488.mike3 to start
qsub: job 48488.mike3 ready

Running PBS prologue script

PBS has allocated the following nodes:

mike054 mike067

A total of 32 processors on 2 nodes allocated

Check nodes and clean them of stray processes

Checking node mike054 15:43:46
Checking node mike067 15:43:48
Done clearing all the allocated nodes


Concluding PBS prologue script - 01-Sep-2013 15:43:48

[aaa@mike054 ~]$

We can see I get nodes mike054 and mike067, and the shell is on node mike054.

Start remote master kernel

From the menu of the local front end (my laptop), go to Evaluation ==> Kernel Configuration Options and add a remote kernel; here I added one called superMike. Select "Advanced Options" and fill in "-LinkMode Listen -LinkProtocol TCPIP".

[screenshot: the Kernel Configuration Options dialog with the Advanced Options field filled in]

Then evaluate a command in a notebook, for example $Version. A window like this will pop up:

[screenshot: the connection dialog showing the link name (port@IP) to connect to]

The port and IP address will be different from mine.

With this pop-up window open, go to the shell on the HPC that we just got and run the command math to launch command-line Mathematica. Once you get the Mathematica prompt, enter

$ParentLink = LinkConnect["50013@127.0.0.1,50014@127.0.0.1", LinkProtocol->"TCPIP"]

and hit Enter. Then click the "OK" button of that pop-up window. If it connects successfully, a message window will pop up with

Out[1]= LinkObject[50013@127.0.0.1, 50014@127.0.0.1, 59, 2]

and the $Version command should return the result:

[screenshot: $Version evaluated on the remote kernel in the front end]

For details of the remote kernel connection, see the post here.

Start subKernels

Open the Remote Kernels tab in Evaluation ==> Parallel Kernel Configuration and click "Add Host" to add the other nodes we got in the interactive job. In this case I got nodes mike054 and mike067, and the shell is on node mike054, so I will add mike067 by filling in the Hostname, setting the number of kernels, and checking "Enable".

[screenshot: the Remote Kernels tab with mike067 added and enabled]

After that we can go to Evaluation ==> Parallel Kernel Status and check whether the subKernels are working. If everything went successfully, we will see something like this:

[screenshot: the Parallel Kernel Status window listing the launched subKernels]

We can see that we've launched 16 subKernels on node mike054 and 16 subKernels on node mike067.
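
The same launch can also be done in code instead of through the GUI (see also the comments below); a rough sketch, assuming the default ssh-based $RemoteCommand works between the allocated nodes:

Needs["SubKernels`RemoteKernels`"]
CloseKernels[];                               (* start from a clean slate *)
LaunchKernels[16];                            (* 16 local subKernels on mike054 *)
LaunchKernels[RemoteMachine["mike067", 16]];  (* 16 subKernels on the other allocated node *)
Length[Kernels[]]                             (* should report 32 *)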

Hope this helps.

xslittlegrass
  • Hi, xslittlegrass! Do you know how to do it without the front end? The network of the HPC I use is not fast enough to use a front end. Also, I didn't understand Szabolcs's method. Does he achieve the same effect as yours (parallel computation across different nodes)? – matheorem Nov 11 '13 at 14:52
  • @matheorem Here is the (better) way to do it in code :) – xslittlegrass Nov 11 '13 at 19:47
  • Thank you for the information! I am reading it. I have a question: I can't understand the step Start remote master kernel. In the Mathematica doc "ParallelTools/tutorial/ConnectionMethods", it doesn't say that we have to start a remote master kernel, and I can't understand why your method launches a master kernel on a remote node. Can you explain it a little? – matheorem Nov 12 '13 at 06:46
  • @matheorem that remote master kernel is just the default remote kernel we are in. Say if we get 2 nodes "mike1" and "mike2" from qsub and we are on "mike1". Then when you type "math" to launch mathematica kernel, it will launch a master kernel. This master kernel is what I was referring to. It is a "remote" kernel in the sense that the kernel is running on the hpc and the front end is running on my laptop. That's all I meant. After launching the master kernel, we can then launch the sub-kernels using the linked codes. – xslittlegrass Nov 12 '13 at 15:47
  • Now I am sure that the first step Start remote master kernel is unnecessary. Just Add Host in the Remote Kernels tab, then first CloseKernels[], then LaunchKernels[]; all the kernels will be launched. This works on my HPC. – matheorem Nov 13 '13 at 00:07
  • @matheorem I'm glad it works for you. For our HPC, we cannot access the compute nodes directly from my local computer. I think that's why I need the remote master kernel in my case. – xslittlegrass Nov 13 '13 at 01:23
0

It depends. In our setup it is like this: we have a Mathematica licence server hosting, say, 10 licences. I recently saw a screen dump of the server status, and there used kernels and subkernels were handled independently. This means that if everyone else is using Mathematica in a non-parallel way, I can take all the subkernels for myself by going to the preferences and adjusting the number of subkernels used in the Parallel tab:

[screenshot: the Parallel preferences tab with the subkernel count setting]

halirutan