Checking job status

There are three ways to check your job status: the show_job script, native Slurm commands, and the mon_sacct script.

Method 1: show_job

We provide a show_job script. It groups, filters, and sorts a job's information and adds summary statistics, producing a tidy, readable report.

$ show_job 3000558
-----------------------------------------------------------------------------------
JobID 3000558
USERID smichnow
USER Name Simon Michnowicz (Monash University)
Email
-----------------------------------------------------------------------------------
Job Name testV2feature
Project general
Partition comp
QoS normal
Job State PENDING
Why cant Run Resources
Running Time 00:00:00
Total Time 00:05:00
Submit Host monarch-dtn
Submit Time 2018-06-19T14:29:36
-----------------------------------------------------------------------------------
Job Resource Node=1
NumCPUs=16
CPUsPerTask=1
CPUsPerNode=1
MemoryPerNode=1000M
Constraint=Xeon-E5-2680-v3
----------------------------------------------------------------------------------
Job Working Dir:
/home/smichnow/slurm
Job Command File/Script:
/home/smichnow/slurm/testMonV2-testFeature.hc.sh
Job Output File:
/home/smichnow/slurm/hc-3000558
Job Error File:
/home/smichnow/slurm/hc-3000558
-----------------------------------------------------------------------------------
tip

To check the status of a single job, run show_job [JOBID], replacing [JOBID] with the job's numeric ID.

Method 2: Slurm commands

To display all of your running and pending jobs, use squeue -u `whoami`.

tip

whoami prints your username, so wrapping it in backticks saves you from typing the name by hand.

$ squeue -u `whoami`
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)

To view the full details of a single job, use:

$ scontrol show job [JOBID]
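
scontrol show job prints its result as KEY=value pairs, so a single field such as the job state can be pulled out with standard text tools. The following is a minimal sketch rather than part of the cluster's tooling: job_field is a hypothetical helper name, and it assumes the value contains no spaces (true for JobState and Reason, not necessarily for paths or job names).

```shell
# job_field KEY: read `scontrol show job` output on stdin and print the value
# of the first KEY=value pair found. Hypothetical helper, for illustration only.
job_field() {
    # Put each space-separated KEY=value token on its own line, then let awk
    # match the key exactly and print everything after the first "=".
    tr ' ' '\n' | awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# Example usage on a login node (3000558 is the job ID from the show_job example):
#   scontrol show job 3000558 | job_field JobState
#   scontrol show job 3000558 | job_field Reason
```

Because the helper only reads standard input, it can be tried on saved scontrol output as well as on a live job.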

squeue Status Codes and Reasons

The squeue command reports an active job's status through state codes and reason codes. A state code describes the job's current state in the queue (e.g. pending, completed). A reason code explains why the job is in that state.

The following tables outline a variety of job state and reason codes you may encounter when using squeue to check on your jobs.

squeue status codes

| Status | Code | Explanation |
|------------|------|-------------|
| COMPLETED | CD | The job has completed successfully. |
| COMPLETING | CG | The job is finishing but some processes are still active. |
| FAILED | F | The job terminated with a non-zero exit code and failed to execute. |
| PENDING | PD | The job is waiting for resource allocation. It will eventually run. |
| PREEMPTED | PR | The job was terminated because of preemption by another job. |
| RUNNING | R | The job currently is allocated to a node and is running. |
| SUSPENDED | S | A running job has been stopped with its cores released to other jobs. |
| STOPPED | ST | A running job has been stopped with its cores retained. |
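
When squeue output is post-processed in a script, the short codes from the ST column are easier to read once translated back to their full names. The sketch below is one way to do that; state_name is a hypothetical helper, and it covers only the codes listed in the table above.

```shell
# state_name CODE: translate a squeue state code (the ST column) into the
# full state name. Only the codes from the table above are handled; anything
# else is reported as UNKNOWN.
state_name() {
    case "$1" in
        CD) echo "COMPLETED" ;;
        CG) echo "COMPLETING" ;;
        F)  echo "FAILED" ;;
        PD) echo "PENDING" ;;
        PR) echo "PREEMPTED" ;;
        R)  echo "RUNNING" ;;
        S)  echo "SUSPENDED" ;;
        ST) echo "STOPPED" ;;
        *)  echo "UNKNOWN" ;;
    esac
}

# Example usage:
#   state_name PD
```

squeue can also print the long form directly through the %T format specifier (for example squeue -u `whoami` -o "%i %T"), which avoids the lookup altogether when you control the squeue call.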

Job Reason Codes

| Reason Code | Explanation |
|-------------|-------------|
| Priority | One or more higher-priority jobs are ahead of yours in the queue. Your job will eventually run. |
| Dependency | This job is waiting for a job it depends on to complete and will run afterwards. |
| Resources | The job is waiting for resources to become available and will eventually run. |
| InvalidAccount | The job's account is invalid. Cancel the job and resubmit it with the correct account. |
| InvalidQOS | The job's QoS is invalid. Cancel the job and resubmit it with the correct QoS. |
| QOSGrpCpuLimit | All CPUs assigned to your job's specified QoS are in use; the job will run eventually. |
| QOSGrpMaxJobsLimit | The maximum number of jobs for your job's QoS has been reached; the job will run eventually. |
| QOSGrpNodeLimit | All nodes assigned to your job's specified QoS are in use; the job will run eventually. |
| PartitionCpuLimit | All CPUs assigned to your job's specified partition are in use; the job will run eventually. |
| PartitionMaxJobsLimit | The maximum number of jobs for your job's partition has been reached; the job will run eventually. |
| PartitionNodeLimit | All nodes assigned to your job's specified partition are in use; the job will run eventually. |
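
The reason codes above fall into two groups: those where the job only needs to wait, and the two invalid-account and invalid-QoS codes where it must be cancelled and resubmitted. The following sketch applies that split to your pending jobs. Both function names and the advice strings are hypothetical; the squeue options used are -h (no header), -t PENDING (state filter), and -o with %i (job ID) and %r (reason).

```shell
# reason_action REASON: suggest an action for a pending job, following the
# reason code table above. Hypothetical helper, for illustration only.
reason_action() {
    case "$1" in
        InvalidAccount|InvalidQOS)
            echo "cancel and resubmit" ;;
        Priority|Dependency|Resources|\
        QOSGrpCpuLimit|QOSGrpMaxJobsLimit|QOSGrpNodeLimit|\
        PartitionCpuLimit|PartitionMaxJobsLimit|PartitionNodeLimit)
            echo "wait" ;;
        *)
            # Reasons outside the table are not interpreted here.
            echo "not listed" ;;
    esac
}

# pending_report: print job ID, reason, and suggested action for each of
# your pending jobs. Assumes squeue is available on PATH.
pending_report() {
    squeue -h -u "$(whoami)" -t PENDING -o "%i %r" |
        while read -r jobid reason; do
            printf '%s\t%s\t%s\n' "$jobid" "$reason" "$(reason_action "$reason")"
        done
}
```

Run pending_report on a login node to get one line per pending job; a reason reported as "not listed" is best looked up in the Slurm documentation.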

Method 3: mon_sacct

mon_sacct is a wrapper script around sacctmgr that prints a range of useful information in a user-friendly way.

i.e.