squeue
squeue is a Slurm command for checking your queued and running jobs. See the official Slurm page for full details.
Usage
By default, squeue shows you all queued and running jobs on M3. You probably only care about your own jobs, so use the --me flag:
squeue --me
Some arguments you may find useful (with example invocations below) are:

- --start: show the estimated start time of a pending job.
- -O or --Format: specify which columns are shown in the output.
- -w or --nodelist: specify which nodes to show queued jobs for.
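For example, the invocations below are a rough sketch of how these flags can be combined. The -O field names used here (jobid, partition, name, statecompact, timeused, reasonlist) are standard squeue output fields, though the exact set available can vary between Slurm versions, and the node names are just placeholders taken from the example output further down:

# estimated start times for your pending jobs
squeue --me --start
# choose your own columns
squeue --me -O "jobid,partition,name,statecompact,timeused,reasonlist"
# only show your jobs queued or running on specific nodes
squeue --me -w m3j002,m3j003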
Output
See the example below:
[lexg@m3-login3 ~]$ squeue --me
   JOBID PARTITION     NAME USER ST TIME NODES NODELIST(REASON)
42016968      comp Interest lexg  R 0:06     1 m3j002
42016967      comp  test.sh lexg  R 0:20     1 m3j002
42016966      comp  test.sh lexg  R 0:39     1 m3j003
42016970       gpu  Another lexg PD 0:00     1 (Resources)
Again, see the official squeue docs for full details, but the default fields are:
| Field | Meaning |
|---|---|
| JOBID | The Slurm job's ID. |
| PARTITION | Requested partition for the job. |
| NAME | The name of the job. |
| USER | The user who submitted the job. |
| ST | The state of the job. See job states. |
| TIME | The amount of time the job has run for. |
| NODES | Requested number of nodes for the job. |
| NODELIST(REASON) | If the job is running, the list of nodes it is running on. If it is not yet running, this instead shows the reason the job has not started. See Reasons for a job not starting. |
Job states
See the official squeue docs for a full list of possible job states. Generally, you will only see one of the following:
| State | Meaning |
|---|---|
| PD (PENDING) | Job has not yet started. |
| R (RUNNING) | Job has started and is currently running. |
| CG (COMPLETING) | Job is in the process of completing. This should generally finish within a few seconds or minutes at most. |
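If you only want to see jobs in a particular state, squeue also has a standard -t/--states flag (not covered in the list above) that filters on these state codes, for example:

# only your pending jobs
squeue --me -t PD
# only your pending and running jobs, using the full state names
squeue --me --states=PENDING,RUNNING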
Reasons for a job not starting
The REASON field of squeue will sometimes show a particular reason that your job has not yet started. See the Slurm list of reason codes for every possible reason code, but here are the most common ones you will see:
| Reason | Meaning | What should you do? |
|---|---|---|
| Priority | One or more higher-priority jobs exist for this partition or advanced reservation. | Nothing. The best you could do is request fewer resources. |
| Resources | The job is waiting for resources to become available. | Nothing. The best you could do is request fewer resources. |
| QOSMaxGRESPerUser | Your job requested more GPUs than are allowed by the QoS. | Use mon_qos to check the gpu value in MaxTRESPU, and never request more than that number of GPUs (across all of your jobs). |
| QOSMaxWallDurationPerJobLimit | Your job requested more walltime than is allowed by the QoS. | Use mon_qos to check MaxWall, and never request more walltime than this. |
| QOSMaxCpuPerUserLimit | Your job requested more CPUs than are allowed by the QoS. | Use mon_qos to check the cpu value in MaxTRESPU, and don't try to use more than this number of CPUs at once (across all of your jobs). |
| QOSMaxSubmitJobPerUserLimit | You already have the maximum number of submitted jobs for this QoS. E.g. the desktopq QoS limits this to 1 submitted job at a time. | Use mon_qos to check MaxSubmitPU, and don't submit more than this number of jobs at once. |
| MaxGRESPerAccount | Your job requested more GPUs than are allowed for your account. Note that your account represents your HPC ID project, i.e. this count is shared amongst your colleagues. | Use mon_qos to check MaxTRESPA, and either wait or ask your colleagues to request fewer GPUs. |
Some of these reasons are only temporary. For example, if you have a job using 4 GPUs in the normal QoS and then submit another job requesting 1 GPU, Slurm will report QOSMaxGRESPerUser for the new job, since starting it would mean using more than 4 GPUs at once, which the normal QoS forbids. If you simply wait for your first job to finish, though, your second job will eventually stop showing QOSMaxGRESPerUser and will be able to start.
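Since temporary reasons like this clear on their own once the blocking job finishes, one low-effort approach is to re-check your queue periodically rather than resubmitting. As a rough sketch (the 60-second interval is arbitrary, and watch is a standard Linux utility rather than part of Slurm):

# refresh your queue (with estimated start times) every 60 seconds
watch -n 60 "squeue --me --start"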