Specifying resources in Slurm
When submitting a job request to Slurm, you need to specify which resources your job will need. The full list of options can be found on the Slurm sbatch page, but there are many options that you will never need. This page summarises the key options you may want to adjust on M3.
Quick example
Here is an example sbatch command to run a script called my-script.slurm with 1 hour, 16 GB of memory, and 8 CPUs:
sbatch --time=1:00:00 --mem=16G --cpus-per-task=8 my-script.slurm
In general, an sbatch command always looks like:
sbatch [OPTIONS...] SCRIPT
where you can provide as many valid options as you want before placing the script's name at the very end.
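Rather than typing every option on the command line, you can also embed options directly in your submission script using #SBATCH directives. As a minimal sketch (the job name and the program it runs are just placeholders), my-script.slurm might look like:
#!/bin/bash
#SBATCH --job-name=my-job
#SBATCH --time=1:00:00
#SBATCH --mem=16G
#SBATCH --cpus-per-task=8

# Your actual commands go here
./my-program
You can then submit it with just sbatch my-script.slurm. Any options given on the command line override the #SBATCH directives in the script.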
Change your job's name
--job-name="Some interesting name"
Maximum time limit for your job
Specify the maximum time that your job might need. You must set this high enough that your job is guaranteed to finish within the limit: if your job is still running when the limit is reached, it will be automatically killed. However, a longer --time may leave your job waiting in the queue for longer. It is up to you to choose a --time that minimises your queuing time while guaranteeing that your job is not terminated early.
Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
Ask for 15 minutes:
--time=15
Ask for 1 hour, 30 minutes:
--time=1:30:00
Ask for 2 days, 12 hours:
--time=2-12:00:00
Memory
Slurm offers multiple options for configuring memory. The only ones you'll need are:
- --mem: memory per node.
- --mem-per-cpu: memory per CPU.
If your job only uses one node, then use --mem to specify the total memory. For example, to ask for 64 GB of memory (per node):
--mem=64G
To ask for 512 MB of memory (per node):
--mem=512M
Perhaps you are using MPI to run multiple tasks in parallel. In this case, you may not care so much about the total memory available, but rather how much memory each individual CPU should have. To ensure each CPU has 4 GB memory:
--mem-per-cpu=4G
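With --mem-per-cpu, your job's total memory allocation scales with the number of CPUs you request. As a rough worked example (assuming one CPU per task), asking for 8 tasks with 4 GB per CPU reserves 8 × 4 GB = 32 GB in total:
sbatch --ntasks=8 --mem-per-cpu=4G my-script.slurm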
Number of CPUs
Slurm distinguishes between CPUs, cores, and sockets. You are probably best off only ever thinking in terms of CPUs. If you're curious, see this StackOverflow post for a quick summary of these terms.
If you just want 8 CPUs for a particular job:
--cpus-per-task=8
Note you should only ask for multiple CPUs if the program you are running will actually use them! Some programs do not have any multithreading or multiprocessing and so cannot use more than one CPU at a time.
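If your program takes the number of threads as an argument, a common pattern is to pass it the SLURM_CPUS_PER_TASK environment variable that Slurm sets inside the job, so the program uses exactly the CPUs you asked for. A sketch, where my-program and its --threads flag are hypothetical placeholders:
# Inside your submission script: run the program with one thread per allocated CPU
./my-program --threads=$SLURM_CPUS_PER_TASK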
If you are using MPI, then Slurm lets you specify the number of MPI tasks and the number of CPUs per task. For example, the below asks for 6 tasks, with 2 CPUs per task, giving a total of 12 CPUs for your job:
--ntasks=6 --cpus-per-task=2
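Putting this together, a sketch of a submission script for that request might look like the following (my-mpi-program is a placeholder, and the time and memory values are purely illustrative; srun is Slurm's standard launcher for MPI tasks):
#!/bin/bash
#SBATCH --ntasks=6
#SBATCH --cpus-per-task=2
#SBATCH --time=1:00:00
#SBATCH --mem-per-cpu=4G

# srun launches one copy of the program per task (6 copies here),
# each with access to 2 CPUs
srun ./my-mpi-program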
Be careful when asking for lots of CPUs. If you ask for more CPUs than a single node can provide, you will also need to specify --nodes=2 (or greater!). Not only will a multi-node job usually wait longer in the queue before it starts, but communication between nodes is slower than communication within a single node, so it may actually be faster to simply ask for all of the CPUs on a single node.
GPUs
You can specify both the number and (optionally) the type of GPUs you want. This generally involves setting --gres and --partition. See GPUs on M3 for details.
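As a rough sketch, requesting a single GPU looks something like the following, where <gpu-partition> is a placeholder; the actual GPU types and partition names available are listed on the GPUs on M3 page:
--gres=gpu:1 --partition=<gpu-partition>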
Partitions and Quality of Service (QoS)
See Partitions and Quality of Service (QoS).
Getting emails
You can ask for an email to be sent at different stages of your job's lifecycle. Set --mail-user to your email address:
--mail-user=my-email@monash.edu
You also need to specify which events should trigger an email. See the --mail-type documentation for all the options, but for simplicity, you can just ask for emails for all events with:
--mail-type=ALL
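Putting the two options together as #SBATCH directives in your submission script:
#SBATCH --mail-user=my-email@monash.edu
#SBATCH --mail-type=ALL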
Changing the output files
By default, Slurm saves all of the output of your job (that would ordinarily be printed to the terminal) to a single file called:
slurm-<JOBID>.out
where <JOBID> is your job's ID number. This file will be placed in whichever directory you ran sbatch from. You can change this by specifying --output. You can optionally redirect error output (stderr) to a separate file using --error. Read the linked Slurm docs for more details.
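For example, Slurm replaces %j in these filenames with the job's ID, so the following (with an illustrative "my-job" prefix) sends regular output and error output to separate, uniquely named files:
--output=my-job-%j.out --error=my-job-%j.err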
Despite its name, the --error option does not guarantee that all error messages from a program will be directed to the specified file. If you suspect errors have occurred, be sure to check all of your job's output files for warnings and error messages.