
squeue

squeue is a Slurm command for checking your queued and running jobs. See the official Slurm page for full details.

Usage

By default, squeue shows you all queued and running jobs on M3. You probably only care about your jobs, so use the --me flag:

squeue --me

Some arguments you may find useful (see the man page for the full list) are:

| Argument | Meaning |
| --- | --- |
| -u, --user=<user_list> | Show jobs belonging to the listed users. |
| -p, --partition=<partition_list> | Show only jobs in the listed partitions. |
| -t, --states=<state_list> | Show only jobs in the listed states. |
| -j, --jobs=<job_id_list> | Show only the listed jobs. |
| --start | Show the expected start time of pending jobs. |
| -o, --format=<spec> | Choose which columns are shown. |
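
For example, using the username and partition that appear in the example output below (substitute your own values):

squeue -u lexg        # all jobs belonging to the user lexg
squeue --me -p comp   # only your jobs on the comp partition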

Output

See the example below:

[lexg@m3-login3 ~]$ squeue --me
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
42016968      comp Interest     lexg  R       0:06      1 m3j002
42016967      comp  test.sh     lexg  R       0:20      1 m3j002
42016966      comp  test.sh     lexg  R       0:39      1 m3j003
42016970       gpu  Another     lexg PD       0:00      1 (Resources)

Again, see the official squeue docs for full details, but the default fields are:

| Field | Meaning |
| --- | --- |
| JOBID | The Slurm job's ID. |
| PARTITION | Requested partition for the job. |
| NAME | The name of the job. |
| USER | The user who submitted the job. |
| ST | The state of the job. See job states. |
| TIME | The amount of time the job has run for. |
| NODES | Requested number of nodes for the job. |
| NODELIST(REASON) | If the job is running, the list of nodes it is running on. If it is not yet running, this instead shows the reason the job has not started. See Reasons for a job not starting. |
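
If the default columns don't suit you, you can choose your own with -o/--format. A minimal sketch, using the standard squeue format specifiers %i (job ID), %P (partition), %j (name), %t (state), %M (elapsed time) and %R (nodelist/reason); adjust the field widths to taste:

squeue --me -o "%.10i %.9P %.30j %.2t %.10M %R"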

Job states

See the official squeue docs for a full list of possible job states. Generally, you will only see one of the following:

| State | Meaning |
| --- | --- |
| PD (PENDING) | Job has not yet started. |
| R (RUNNING) | Job has started and is currently running. |
| CG (COMPLETING) | Job is in the process of completing. This should generally finish within a few seconds, or a few minutes at most. |
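
You can combine these state codes with the -t/--states argument to filter the output, for example:

squeue --me -t R    # only your running jobs
squeue --me -t PD   # only your pending jobs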

Reasons for a job not starting

The REASON field of squeue will sometimes show a specific reason why your job has not started yet. See the Slurm list of reason codes for every possible code, but these are the most common ones you will see:

| Reason | Meaning | What should you do? |
| --- | --- | --- |
| Priority | One or more higher-priority jobs exist for this partition or advanced reservation. | Nothing. The best you could do is request fewer resources. |
| Resources | The job is waiting for resources to become available. | Nothing. The best you could do is request fewer resources. |
| QOSMaxGRESPerUser | Your job requested more GPUs than the QoS allows. | Use mon_qos to check the gpu value in MaxTRESPU, and never request more than that number of GPUs (across all of your jobs). |
| QOSMaxWallDurationPerJobLimit | Your job requested more walltime than the QoS allows. | Use mon_qos to check MaxWall, and never request more walltime than this. |
| QOSMaxCpuPerUserLimit | Your job requested more CPUs than the QoS allows. | Use mon_qos to check the cpu value in MaxTRESPU, and don't try to use more than that number of CPUs at once (across all of your jobs). |
| QOSMaxSubmitJobPerUserLimit | You already have the maximum number of submitted jobs for this QoS, e.g. the desktopq QoS limits this to 1 submitted job at a time. | Use mon_qos to check MaxSubmitPU, and don't submit more than this number of jobs at once. |
| MaxGRESPerAccount | Your job requested more GPUs than your account allows. Note that your account represents your HPC ID project, i.e. this limit is shared amongst your colleagues. | Use mon_qos to check MaxTRESPA, and either wait or ask your colleagues to request fewer GPUs. |
Note: Some of these reasons are only temporary. For example, suppose you already have a job using 4 GPUs in the normal QoS and you submit another job requesting 1 GPU. Slurm will report QOSMaxGRESPerUser for the new job, because starting it would take you over the 4-GPU limit the normal QoS imposes. Once your first job finishes, the QOSMaxGRESPerUser reason should clear and your second job will be able to start.
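
If the one-word reason code isn't enough, two standard Slurm commands give more detail on a pending job. The job ID below is the pending job from the example output above; substitute one of your own:

squeue --me --start          # Slurm's estimated start times for your pending jobs
scontrol show job 42016970   # full details for one job, including its reason and requested resources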