tl;dr
- For multiprocessing (MPI, message passing) use `--ntasks`.
- For multithreading (OpenMP, pthreads) use `--cpus-per-task`.
- For hybrid codes you need both options and probably also want to tune `--ntasks-per-node`.
[Link to the sbatch manual](https://slurm.schedmd.com/sbatch.html)
This is somewhat complicated. It depends on whether your program needs tasks or cores. For example, an MPI-based program is launched several times and its instances communicate via message passing, while an OpenMP-based program is launched only once and then spawns several threads which communicate via shared memory.
In the case of message passing it doesn't matter on which node the tasks are launched, as long as they can communicate (InfiniBand, Ethernet, etc.). In the case of shared memory, all threads must run on the same node, because they share the same address space.
The `--ntasks` option of SLURM specifies how many tasks your program will launch; conceptually these could be threads or independent instances of an MPI program. However, SLURM assumes that when you say `--ntasks` you mean tasks which communicate by message passing, so if your nodes have 12 cores each but you requested 13 tasks, it will happily launch 12 tasks on one node and 1 on another node. (This behaviour is not guaranteed: SLURM could also put all 13 tasks on one node with 12 CPUs and let the operating system schedule them. You can get more fine-grained control using `--ntasks-per-core` and `--ntasks-per-node`.)
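For illustration, a minimal batch script for a pure MPI job might look like the following sketch (the program name `./my_mpi_app` and the numbers are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=mpi-example
#SBATCH --ntasks=13          # 13 MPI ranks; SLURM may spread them over several nodes
#SBATCH --time=00:10:00

# srun starts one instance of the program per task
srun ./my_mpi_app
```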
If you have a multithreaded program, then you want to use `--cpus-per-task` instead and set `--ntasks` to 1 (or leave it unspecified, as it defaults to 1). This way, if you request 13 CPUs but the maximum available on a node is 12, your job will just be rejected.
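Again as an illustrative sketch, assuming an OpenMP program called `./my_openmp_app`:

```bash
#!/bin/bash
#SBATCH --job-name=openmp-example
#SBATCH --ntasks=1            # one task ...
#SBATCH --cpus-per-task=12    # ... with 12 CPUs, all on the same node
#SBATCH --time=00:10:00

# tell OpenMP how many threads it may spawn
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_openmp_app
```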
With `--ntasks` you do not get any guarantee. If resources are scarce it could squeeze all 13 tasks onto a single CPU core. If you want a whole node, use `--nodes=1` (caveat: your colleagues will hate you if you do not keep the whole node under full load, because more jobs could have fit there). – Henri Menke Jul 14 '17 at 02:03
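For the hybrid MPI+OpenMP case mentioned in the tl;dr, a sketch combining all three options might look like this (program name and numbers are again placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=hybrid-example
#SBATCH --ntasks=4             # 4 MPI ranks in total
#SBATCH --ntasks-per-node=2    # 2 ranks per node, so 2 nodes are needed
#SBATCH --cpus-per-task=6      # 6 OpenMP threads per rank
#SBATCH --time=00:10:00

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_hybrid_app
```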