Partitions and Quality of Service (QoS)
Make sure you have read Specifying resources in Slurm to understand how to use Slurm flags like `--partition` and `--qos`.
Executive summary
If you're an ordinary user on M3, you will likely only ever need to choose a partition and QoS when you want to use GPUs. In that case, see our GPUs on M3 page, though you may still want to read this page to understand what these options actually mean. If you have been granted access to a restricted partition (e.g. you are in FIT or CCEMMP), then you will want to read this page.
Partitions
What is a partition?
A partition is a collection of nodes. Generally, all of the nodes in a given partition share some property; for example, each node in the `gpu` partition has GPUs. On M3 all of the partitions are disjoint, i.e. no node belongs to two different partitions.
How do I specify a partition?
You specify a partition with `--partition`, e.g.:

```shell
sbatch --partition=gpu ...
```
What is the default partition?
If you don't specify a `--partition`, it will default to our `comp` partition.
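If you want to confirm the default yourself, standard Slurm's `sinfo` marks the default partition with a trailing `*`. The snippet below is a sketch of extracting it; the sample output lines are illustrative, not real M3 data:

```shell
# Sample output of `sinfo -o "%P %a %D"` (illustrative values, not real M3 data)
sample='PARTITION AVAIL NODES
comp* up 79
gpu up 20
short up 2'

# The default partition carries a trailing asterisk; strip it and print the name.
printf '%s\n' "$sample" | awk 'NR > 1 && $1 ~ /\*$/ { sub(/\*$/, "", $1); print $1 }'
# → comp
```

On the cluster itself you would pipe real `sinfo` output into the same `awk` filter instead of the hard-coded sample.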
Quality of Service (QoS)
What is Quality of Service (QoS)?
In Slurm, a Quality of Service (QoS) applies restrictions on what a single user's jobs can do. Importantly, on M3 some partitions require you to specify a QoS in order to use them.
How do I specify a QoS?
You specify a QoS with `--qos`. For example, if you have access to the `fitq` QoS you can specify it with:

```shell
sbatch --qos=fitq ...
```
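The same flags can also live inside a batch script as `#SBATCH` directives rather than on the command line. A minimal sketch, using the `fit`/`fitq` pair from the restricted partitions table below; the job name, wall-time, and task count are placeholders, not recommendations:

```shell
#!/bin/bash
#SBATCH --job-name=example        # placeholder job name
#SBATCH --partition=fit           # restricted FIT partition (requires matching QoS)
#SBATCH --qos=fitq                # QoS granting access to the fit partition
#SBATCH --time=01:00:00           # placeholder wall-time
#SBATCH --ntasks=1

srun hostname                     # replace with your actual workload
```

Submit it with `sbatch script.sh`; flags given on the `sbatch` command line override the corresponding `#SBATCH` directives.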
What QoS can I use?
The `mon_qos` command shows you which QoS you are allowed to use.
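`mon_qos` is an M3-specific wrapper; on any Slurm cluster the same information comes from your associations, e.g. `sacctmgr -nP show assoc user=$USER format=qos`. The snippet below sketches de-duplicating that command's output; the sample values are made up for illustration:

```shell
# Sample output of `sacctmgr -nP show assoc user=$USER format=qos`
# (illustrative values; one comma-separated QoS list per association)
sample='normal,fitq
normal'

# Split the comma-separated lists and de-duplicate the QoS names.
printf '%s\n' "$sample" | tr ',' '\n' | sort -u
# → fitq
# → normal
```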
Available partitions on M3
The most up-to-date way of seeing all partitions on M3 is to use the `show_cluster` command.
CPU-only partitions
Name | How to use? | Total nodes | Total cores | CPUs per node | Memory per node (GB) |
---|---|---|---|---|---|
General Computation | --partition=comp (default) | 79 | 1864 | Up to 96 | Up to 1532 |
High-Density CPUs | --partition=m3i | 45 | 810 | 18 | 181 |
High-Density CPUs with High Memory | --partition=m3j | 11 | 198 | 18 | 373 |
High-Density CPUs with Extra High Memory | --partition=m3m | 1 | 18 | 18 | 948 |
Short Jobs | --partition=short | 2 | 36 | 18 | 181 |
GPU partitions
GPU type | How to use? | Total nodes | Total cores | CPUs per node | Memory per node (GB) | Total GPUs | GPUs per node |
---|---|---|---|---|---|---|---|
A100, T4, A40 | --partition=gpu | 20 | 552 | Up to 28 | Up to 1020 | 52 | Up to 8 |
H100 | --partition=m3h --qos=m3h | 2 | 144 | 72 | 1010 | 8 | 4 |
V100 | --partition=m3g | 19 | 342 | 18 | Up to 373 | 56 | Up to 3 |
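As an example of putting partition, QoS, and a GPU request together, here is a sketch of a job script asking for one GPU on the H100 partition. The `--gres=gpu:1` form is generic Slurm syntax; whether M3 also accepts a typed form such as `gpu:H100:1` is an assumption, so check `show_cluster` or `scontrol show node` for the exact GRES names:

```shell
#!/bin/bash
#SBATCH --partition=m3h           # H100 partition (requires the matching QoS)
#SBATCH --qos=m3h
#SBATCH --gres=gpu:1              # request one GPU; typed names (e.g. gpu:H100:1) are an assumption
#SBATCH --time=00:30:00           # placeholder wall-time
#SBATCH --ntasks=1

nvidia-smi                        # confirm the GPU is visible to the job
```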
Desktops
Desktop nodes have their own partition, but they are reserved for use by Strudel only. Please see our Strudel docs on running remote desktops. As of December 2024, desktop nodes have a mix of P4, T4, and A40 GPUs.
Restricted partitions
Some partitions on M3 are restricted to certain groups of users. You will generally already know if one of these is relevant to you, because your supervisor or colleagues will have told you so.
Partition description | Who can access it? | How to use? |
---|---|---|
Partition for standard jobs with four hour wall-time for omics community | Genomics community members | --partition=genomics --qos=genomics |
Partition with high-RAM nodes for omics community | Genomics community members | --partition=genomicsb --qos=genomicsbq |
Intended for real-time processing of data collected from instruments | ... | --partition=rtqp --qos=rtq |
Dedicated partition for Patrick Sexton's lab | Members of the Sexton lab | --partition=sexton --qos=sexton01 |
Partition dedicated to CCEMMP | Members of CCEMMP | --partition=ccemmp --qos=ccemmp |
Dedicated to Hudson Institute of Medical Research | Members of Hudson | --partition=hudson --qos=hudson |
FIT dedicated GPU nodes | Members of Faculty of IT (FIT) | --partition=fit --qos=fitq |
FIT dedicated CPU nodes | Members of Faculty of IT (FIT) | --partition=fitc --qos=fitqc |
BDI dedicated nodes | Members of Biomedicine Discovery Institute (BDI) | --partition=bdi --qos=bdiq |
Troubleshooting
Invalid qos specification
When you submit a job, if you see the error `Invalid qos specification`, it means either:

- The QoS you specified does not actually exist. Run the command below to check whether the QoS exists, replacing `some-qos` with the QoS name:

  ```shell
  NAME=some-qos; test "$(sacctmgr show qos $NAME | wc -l)" -eq 2 && echo "QoS does NOT exist" || echo "QoS exists"
  ```
- The QoS you specified does exist, but you don't have access to it. Run `mon_qos` to see which QoS you are allowed to use, and check the restricted partitions table above to see whether you're eligible to apply for access.
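To see why the existence check above compares against 2: when no QoS matches, `sacctmgr show qos <name>` prints only its two header lines (the column names and the dashed separator). The snippet below demonstrates the counting logic on sample header-only output, so no Slurm installation is needed to follow it:

```shell
# Sample `sacctmgr show qos` output when the QoS does not exist:
# just the column-name line and the dashed separator line.
no_match='      Name   Priority
---------- ----------'

count=$(printf '%s\n' "$no_match" | wc -l)
test "$count" -eq 2 && echo "QoS does NOT exist" || echo "QoS exists"
# → QoS does NOT exist
```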