mon_qos
mon_qos is a custom script to check which QoS are available to you, and what restrictions each QoS places on your jobs. It's really just a wrapper around some sacctmgr commands.
Usage
Running mon_qos with no arguments will show you:
- all available QoS on M3, and
- the list of QoS you are allowed to use.
mon_qos
Optionally, you can specify a single QoS to see only its specifications, rather than being overwhelmed by every possible QoS. For example, to see just the specifications of the normal QoS, run:
mon_qos normal
You can specify multiple QoS at once by using commas:
mon_qos normal,desktopq
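Since mon_qos is described as a wrapper around sacctmgr, the sketch below shows one way the two sacctmgr command lines it echoes in its output could be assembled from an optional QoS argument. This is not the actual mon_qos source; the function name build_commands and its parameters are hypothetical, and nothing is executed, so it runs without Slurm installed.

```python
import getpass
import shlex


def build_commands(qos=None, user=None):
    """Return the two sacctmgr command lines as argument lists.

    qos  -- None, a comma-separated string, or a list of QoS names
    user -- username to query; defaults to the current user
    """
    user = user or getpass.getuser()

    # First command: details of every QoS, or only the named ones.
    qos_cmd = ["sacctmgr", "-p", "show", "qos"]
    if qos:
        names = qos.split(",") if isinstance(qos, str) else list(qos)
        qos_cmd.append(",".join(n.strip() for n in names if n.strip()))

    # Second command: the QoS associated with the user's account(s).
    assoc_cmd = ["sacctmgr", "-p", "show", "assoc",
                 f"user={user}", "format=account,qos"]
    return qos_cmd, assoc_cmd


if __name__ == "__main__":
    for cmd in build_commands("normal,desktopq", user="lexg"):
        print(shlex.join(cmd))
```

Returning argument lists rather than a single string means the commands could be handed to subprocess.run without going through a shell.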
Output
Running mon_qos without any arguments
The example output below omits many QoS for brevity.
[lexg@m3-login3 ~]$ mon_qos
Loading python/3.7.3-system
Loading requirement: gcc/8.1.0
sacctmgr -p show qos
+-------------+--------------------+
| Field | Value |
+-------------+--------------------+
| Name | normal |
| Priority | 50 |
| GraceTime | 00:00:00 |
| Preempt | irq |
| PreemptMode | requeue |
| UsageFactor | 1.000000 |
| MaxWall | 7-00:00:00 |
| MaxTRESPU | cpu=250,gres/gpu=4 |
| MaxJobsPU | 1000 |
+-------------+--------------------+
+-------------+----------+
| Field | Value |
+-------------+----------+
| Name | m3d |
| Priority | 0 |
| GraceTime | 00:00:00 |
| PreemptMode | cluster |
| UsageFactor | 1.000000 |
| MaxTRESPU | node=4 |
+-------------+----------+
# and many more QoS...
+-------------+--------------------+
| Field | Value |
+-------------+--------------------+
| Name | cryosparc_generalq |
| Priority | 0 |
| GraceTime | 00:00:00 |
| PreemptMode | cluster |
| UsageFactor | 1.000000 |
| MaxWall | 7-00:00:00 |
+-------------+--------------------+
sacctmgr -p show assoc user=lexg format=account,qos
+------+---------+----------------------------------------------------------------------------------------------------+
| User | Account | QOSs allocated to user |
+------+---------+----------------------------------------------------------------------------------------------------+
| lexg | nq46 | bdiq,ccemmpq,desktopq,dgx,fitcq,genomics,genomics03,irq,m3h,normal,rockq,rtq,sexton01,shortq,super |
+------+---------+----------------------------------------------------------------------------------------------------+
The output is split into two parts. First, you will see the full details of every available QoS on M3, along with the constraints each QoS applies. Many possible constraints can be set for a QoS; see the Slurm page on "Specifications for QoS" for the full list. On M3, the key specifications include:
| Field | Meaning |
|---|---|
| MaxWall | Maximum walltime allowed for any job. |
| MaxTRESPU | Maximum "Trackable RESources (TRES)" allowed per user (PU). This could specify the maximum number of GPUs, CPUs, or even nodes a user can occupy at any one time. |
| MaxTRESPA | Same as MaxTRESPU, but per account (PA) instead. Your Slurm account is equivalent to your HPC ID project, so this limit is shared among every user in your project. |
| MaxJobsPU | Maximum number of jobs each user may have running at any one time. |
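To make values such as cpu=250,gres/gpu=4 and 7-00:00:00 easier to reason about, here is a minimal sketch that turns them into Python objects. The helper names parse_tres and parse_walltime are hypothetical, and the walltime parser assumes only the [days-]hh:mm:ss form seen in the output above.

```python
from datetime import timedelta


def parse_tres(value):
    """Split a TRES string such as 'cpu=250,gres/gpu=4' into a dict."""
    limits = {}
    for item in value.split(","):
        if not item:
            continue  # an empty string means no limit is set
        name, _, amount = item.partition("=")
        # Keep non-numeric amounts as text instead of guessing a unit.
        limits[name] = int(amount) if amount.isdigit() else amount
    return limits


def parse_walltime(value):
    """Convert '[days-]hh:mm:ss' into a timedelta."""
    days, _, clock = value.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(days=int(days or 0), hours=hours,
                     minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    # Values copied from the "normal" QoS shown above.
    print(parse_tres("cpu=250,gres/gpu=4"))
    print(parse_walltime("7-00:00:00"))
```

Read this way, the normal QoS in the example output allows each user up to 250 CPUs and 4 GPUs at once, with a walltime limit of 7 days per job.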
The second part shows you the full list of QoS you personally are allowed to use. In my case, that was:
sacctmgr -p show assoc user=lexg format=account,qos
+------+---------+----------------------------------------------------------------------------------------------------+
| User | Account | QOSs allocated to user |
+------+---------+----------------------------------------------------------------------------------------------------+
| lexg | nq46 | bdiq,ccemmpq,desktopq,dgx,fitcq,genomics,genomics03,irq,m3h,normal,rockq,rtq,sexton01,shortq,super |
+------+---------+----------------------------------------------------------------------------------------------------+
You are unlikely to have access to this many QoS; most users have access to around 6. You should never need to ask for a specific QoS on M3. Instead, you will be given access to the relevant QoS if you apply for access to a restricted partition.
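If you want to check your allowed QoS from a script rather than by eye, the raw pipe-delimited text that sacctmgr -p produces can be parsed directly. The sketch below is an illustration only: SAMPLE is a shortened, hypothetical stand-in for the real command output, and the function names are made up for this example.

```python
# Hypothetical, shortened stand-in for the output of:
#   sacctmgr -p show assoc user=lexg format=account,qos
SAMPLE = "Account|QOS|\nnq46|irq,normal,shortq|\n"


def split_row(line):
    """Split one '|'-delimited line, dropping the empty trailing field."""
    parts = line.split("|")
    return parts[:-1] if line.endswith("|") else parts


def allowed_qos(parsable_text):
    """Map each account to the sorted list of QoS names it may use."""
    lines = [line for line in parsable_text.splitlines() if line.strip()]
    header = [name.lower() for name in split_row(lines[0])]
    allowed = {}
    for line in lines[1:]:
        row = dict(zip(header, split_row(line)))
        names = {q for q in row["qos"].split(",") if q}
        allowed.setdefault(row["account"], set()).update(names)
    return {account: sorted(names) for account, names in allowed.items()}


if __name__ == "__main__":
    qos_by_account = allowed_qos(SAMPLE)
    print(qos_by_account)
    print("normal" in qos_by_account["nq46"])
```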
Specifying a QoS when running mon_qos
If you specify a single QoS, you will get much less output, like so:
[lexg@m3-login3 ~]$ mon_qos normal
Loading python/3.7.3-system
Loading requirement: gcc/8.1.0
sacctmgr -p show qos normal
+-------------+--------------------+
| Field | Value |
+-------------+--------------------+
| Name | normal |
| Priority | 50 |
| GraceTime | 00:00:00 |
| Preempt | irq |
| PreemptMode | requeue |
| UsageFactor | 1.000000 |
| MaxWall | 7-00:00:00 |
| MaxTRESPU | cpu=250,gres/gpu=4 |
| MaxJobsPU | 1000 |
+-------------+--------------------+
sacctmgr -p show assoc user=lexg format=account,qos
+------+---------+----------------------------------------------------------------------------------------------------+
| User | Account | QOSs allocated to user |
+------+---------+----------------------------------------------------------------------------------------------------+
| lexg | nq46 | bdiq,ccemmpq,desktopq,dgx,fitcq,genomics,genomics03,irq,m3h,normal,rockq,rtq,sexton01,shortq,super |
+------+---------+----------------------------------------------------------------------------------------------------+
The meaning of the output is identical to that described in Running mon_qos without any arguments.
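The Field/Value tables above appear to list only the fields that are set for each QoS (for example, m3d shows no MaxWall row). Below is a minimal sketch of how that filtering could be done on pipe-delimited sacctmgr -p output. It is not the actual mon_qos code: SAMPLE is an abbreviated, hypothetical stand-in that reuses the field names and values from the tables above, and the function names are invented for this example.

```python
# Abbreviated, hypothetical stand-in for "sacctmgr -p show qos" output.
SAMPLE = (
    "Name|Priority|GraceTime|Preempt|PreemptMode|UsageFactor|"
    "MaxWall|MaxTRESPU|MaxJobsPU|\n"
    "normal|50|00:00:00|irq|requeue|1.000000|"
    "7-00:00:00|cpu=250,gres/gpu=4|1000|\n"
    "m3d|0|00:00:00||cluster|1.000000||node=4||\n"
)


def split_row(line):
    """Split one '|'-delimited line, dropping the empty trailing field."""
    parts = line.split("|")
    return parts[:-1] if line.endswith("|") else parts


def set_fields(parsable_text):
    """Return one dict per QoS containing only the non-empty fields."""
    lines = [line for line in parsable_text.splitlines() if line.strip()]
    header = split_row(lines[0])
    return [
        {field: value for field, value in zip(header, split_row(line)) if value}
        for line in lines[1:]
    ]


if __name__ == "__main__":
    for qos in set_fields(SAMPLE):
        for field, value in qos.items():
            print(f"{field:<12} {value}")
        print()
```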