
mon_qos

mon_qos is a custom script that shows which Quality of Service (QoS) levels are available to you and what restrictions each QoS places on your jobs. It is a thin wrapper around a few sacctmgr commands.

Usage

Running mon_qos with no arguments will show you:

  • all available QoS on M3, AND
  • the list of QoS you are allowed to use.

mon_qos

Optionally, you can specify a single QoS to see only its specifications, rather than being overwhelmed by every possible QoS. For example, to see only the specifications of the normal QoS, run:

mon_qos normal

You can specify multiple QoS at once by using commas:

mon_qos normal,desktopq
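
Since mon_qos wraps sacctmgr, the arguments above plausibly map onto command lines like the ones echoed in the Output section. The following Python sketch illustrates that mapping. Note that build_sacctmgr_commands is a hypothetical helper, not part of mon_qos, and that passing the comma-separated list straight through to sacctmgr is an assumption rather than something this page confirms.

```python
def build_sacctmgr_commands(user, qos=None):
    """Return the two sacctmgr command lines a mon_qos-style wrapper might run.

    Hypothetical helper for illustration only: the real mon_qos script
    may assemble its commands differently.
    """
    show_qos = ["sacctmgr", "-p", "show", "qos"]
    if qos:
        # Tidy "normal, desktopq" into "normal,desktopq" before passing it on.
        names = [name.strip() for name in qos.split(",") if name.strip()]
        show_qos.append(",".join(names))
    show_assoc = [
        "sacctmgr", "-p", "show", "assoc",
        "user={}".format(user), "format=account,qos",
    ]
    return [show_qos, show_assoc]


if __name__ == "__main__":
    # Print the command lines instead of running them, so this works
    # on a machine without Slurm installed.
    for command in build_sacctmgr_commands("lexg", "normal,desktopq"):
        print(" ".join(command))
```

Printing rather than executing keeps the sketch safe to try anywhere; on a login node you could hand each list to subprocess.run instead.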

Output

Running mon_qos without any arguments

The example output below has had many different QoSs omitted for brevity.

[lexg@m3-login3 ~]$ mon_qos
Loading python/3.7.3-system
Loading requirement: gcc/8.1.0
sacctmgr -p show qos
+-------------+--------------------+
| Field       | Value              |
+-------------+--------------------+
| Name        | normal             |
| Priority    | 50                 |
| GraceTime   | 00:00:00           |
| Preempt     | irq                |
| PreemptMode | requeue            |
| UsageFactor | 1.000000           |
| MaxWall     | 7-00:00:00         |
| MaxTRESPU   | cpu=250,gres/gpu=4 |
| MaxJobsPU   | 1000               |
+-------------+--------------------+
+-------------+----------+
| Field       | Value    |
+-------------+----------+
| Name        | m3d      |
| Priority    | 0        |
| GraceTime   | 00:00:00 |
| PreemptMode | cluster  |
| UsageFactor | 1.000000 |
| MaxTRESPU   | node=4   |
+-------------+----------+

# and many more QoS...

+-------------+--------------------+
| Field       | Value              |
+-------------+--------------------+
| Name        | cryosparc_generalq |
| Priority    | 0                  |
| GraceTime   | 00:00:00           |
| PreemptMode | cluster            |
| UsageFactor | 1.000000           |
| MaxWall     | 7-00:00:00         |
+-------------+--------------------+
sacctmgr -p show assoc user=lexg format=account,qos
+------+---------+----------------------------------------------------------------------------------------------------+
| User | Account | QOSs allocated to user                                                                             |
+------+---------+----------------------------------------------------------------------------------------------------+
| lexg | nq46    | bdiq,ccemmpq,desktopq,dgx,fitcq,genomics,genomics03,irq,m3h,normal,rockq,rtq,sexton01,shortq,super |
+------+---------+----------------------------------------------------------------------------------------------------+

The output is split into two parts. First, you will see the full details of every available QoS on M3, along with the constraints each QoS applies. Many possible constraints can be set for a QoS; see the Slurm page on "Specifications for QoS" for the full list. On M3, the key specifications are:

| Field | Meaning |
|-------|---------|
| MaxWall | Maximum walltime allowed for any job. |
| MaxTRESPU | Maximum "Trackable RESources (TRES)" allowed per user (PU). This could specify the maximum number of GPUs, CPUs, or even nodes a user can occupy at any one time. |
| MaxTRESPA | Same as MaxTRESPU, but per account (PA) instead. Your Slurm account is equivalent to your HPC ID project, i.e. this quantity is shared amongst every user in your project. |
| MaxJobsPU | Maximum number of running jobs allowed per user. |
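
To make these limits easier to reason about, the values can be turned into structured data. The Python sketch below parses a TRES string such as cpu=250,gres/gpu=4 and a walltime such as 7-00:00:00, both taken from the normal QoS shown above. The function names parse_tres and parse_walltime are hypothetical, and the walltime parser assumes only the D-HH:MM:SS and HH:MM:SS layouts.

```python
from datetime import timedelta


def parse_tres(spec):
    """Split a TRES limit like 'cpu=250,gres/gpu=4' into a dict."""
    limits = {}
    for item in spec.split(","):
        if not item:
            continue
        name, _, value = item.partition("=")
        try:
            limits[name] = int(value)
        except ValueError:
            # Keep values that are not plain integers as text.
            limits[name] = value
    return limits


def parse_walltime(spec):
    """Convert 'D-HH:MM:SS' or 'HH:MM:SS' into a timedelta.

    Assumption: only these two layouts are handled.
    """
    days, _, clock = spec.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(days=int(days or 0), hours=hours,
                     minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    print(parse_tres("cpu=250,gres/gpu=4"))
    print(parse_walltime("7-00:00:00"))
```

With the values in this form, a check such as "does my job's requested walltime fit under MaxWall?" becomes a simple comparison of two timedelta objects.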

The second part shows you the full list of QoS you personally are allowed to use. In my case, that was:

sacctmgr -p show assoc user=lexg format=account,qos 
+------+---------+----------------------------------------------------------------------------------------------------+
| User | Account | QOSs allocated to user                                                                             |
+------+---------+----------------------------------------------------------------------------------------------------+
| lexg | nq46    | bdiq,ccemmpq,desktopq,dgx,fitcq,genomics,genomics03,irq,m3h,normal,rockq,rtq,sexton01,shortq,super |
+------+---------+----------------------------------------------------------------------------------------------------+

You are unlikely to have access to this many QoS; you will generally only have access to about six. You should never need to ask for a specific QoS on M3. Instead, you will be given access to the relevant QoS when you apply for access to a restricted partition.
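
If you want to check programmatically whether a given QoS is on your list, you can parse the pipe-delimited text that sacctmgr -p produces. The Python sketch below does this for a hand-written sample. The sample text, the column names Account and QOS, and the helper names parse_parsable and allowed_qos are all assumptions made for illustration, so check the header line on your own system.

```python
# Hand-written stand-in for `sacctmgr -p show assoc ... format=account,qos`.
# The header names are an assumption; real output may label columns differently.
SAMPLE = "Account|QOS|\nnq46|bdiq,desktopq,normal,shortq|\n"


def parse_parsable(text):
    """Turn pipe-delimited text (one trailing '|' per line) into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []

    def split(line):
        # Drop only the single trailing delimiter so empty fields survive.
        if line.endswith("|"):
            line = line[:-1]
        return line.split("|")

    header = split(lines[0])
    return [dict(zip(header, split(line))) for line in lines[1:]]


def allowed_qos(rows, account_col="Account", qos_col="QOS"):
    """Map each account to the set of QoS names listed for it."""
    allowed = {}
    for row in rows:
        names = {name for name in row.get(qos_col, "").split(",") if name}
        allowed.setdefault(row.get(account_col, ""), set()).update(names)
    return allowed


if __name__ == "__main__":
    per_account = allowed_qos(parse_parsable(SAMPLE))
    print("normal" in per_account["nq46"])
```

Storing the names in a set makes the membership test independent of the order in which they are listed.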

Specifying a QoS when running mon_qos

If you specify a single QoS, you will get much less output, like so:

[lexg@m3-login3 ~]$ mon_qos normal
Loading python/3.7.3-system
Loading requirement: gcc/8.1.0
sacctmgr -p show qos normal
+-------------+--------------------+
| Field       | Value              |
+-------------+--------------------+
| Name        | normal             |
| Priority    | 50                 |
| GraceTime   | 00:00:00           |
| Preempt     | irq                |
| PreemptMode | requeue            |
| UsageFactor | 1.000000           |
| MaxWall     | 7-00:00:00         |
| MaxTRESPU   | cpu=250,gres/gpu=4 |
| MaxJobsPU   | 1000               |
+-------------+--------------------+
sacctmgr -p show assoc user=lexg format=account,qos
+------+---------+----------------------------------------------------------------------------------------------------+
| User | Account | QOSs allocated to user                                                                             |
+------+---------+----------------------------------------------------------------------------------------------------+
| lexg | nq46    | bdiq,ccemmpq,desktopq,dgx,fitcq,genomics,genomics03,irq,m3h,normal,rockq,rtq,sexton01,shortq,super |
+------+---------+----------------------------------------------------------------------------------------------------+

The meaning of the output is identical to that described in Running mon_qos without any arguments.
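
The Field/Value tables in the examples list different rows for different QoS (m3d has no MaxWall row, for instance), which suggests that unset fields are hidden. The Python sketch below shows one way such a table could be drawn from a single QoS record. This is a guess at the behaviour and not the actual mon_qos code; render_field_table is a hypothetical name.

```python
def render_field_table(record):
    """Draw a two-column Field/Value box, skipping fields with no value.

    Assumption: empty fields are hidden, as the example output suggests.
    """
    rows = [("Field", "Value")]
    rows += [(field, value) for field, value in record.items() if value]
    width_field = max(len(field) for field, _ in rows)
    width_value = max(len(value) for _, value in rows)
    border = "+" + "-" * (width_field + 2) + "+" + "-" * (width_value + 2) + "+"

    def fmt(field, value):
        # Left-justify both cells so the columns line up with the border.
        return "| " + field.ljust(width_field) + " | " + value.ljust(width_value) + " |"

    lines = [border, fmt(*rows[0]), border]
    lines += [fmt(field, value) for field, value in rows[1:]]
    lines.append(border)
    return "\n".join(lines)


if __name__ == "__main__":
    # Values copied from the m3d example; empty strings mark unset fields.
    m3d = {
        "Name": "m3d", "Priority": "0", "GraceTime": "00:00:00",
        "Preempt": "", "PreemptMode": "cluster",
        "UsageFactor": "1.000000", "MaxWall": "", "MaxTRESPU": "node=4",
    }
    print(render_field_table(m3d))
```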