Checking job status
There are three methods to check your job status.
Method 1: show_job
We provide a show_job script that groups, filters, and sorts job information and presents summary statistics in a clean, tidy, and user-friendly output.
show_job 3000558
-----------------------------------------------------------------------------------
JobID 3000558
USERID smichnow
USER Name Simon Michnowicz (Monash University)
Email
-----------------------------------------------------------------------------------
Job Name testV2feature
Project general
Partition comp
QoS normal
Job State PENDING
Why cant Run Resources
Running Time 00:00:00
Total Time 00:05:00
Submit Host monarch-dtn
Submit Time 2018-06-19T14:29:36
-----------------------------------------------------------------------------------
Job Resource Node=1
NumCPUs=16
CPUsPerTask=1
CPUsPerNode=1
MemoryPerNode=1000M
Constraint=Xeon-E5-2680-v3
----------------------------------------------------------------------------------
Job Working Dir:
/home/smichnow/slurm
Job Command File/Script:
/home/smichnow/slurm/testMonV2-testFeature.hc.sh
Job Output File:
/home/smichnow/slurm/hc-3000558
Job Error File:
/home/smichnow/slurm/hc-3000558
-----------------------------------------------------------------------------------
To check the status of a single job, use show_job [JOBID].
Method 2: Slurm commands
To display all of your running/pending jobs, use squeue -u `whoami`. (whoami returns your username and is a handy shortcut.)
$ squeue -u `whoami`
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
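If you want to keep an eye on your jobs as they move through the queue, you can combine squeue with the standard watch utility (assuming GNU watch is available on the login node; the 30-second refresh interval is just an illustration):
$ watch -n 30 squeue -u $USER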
To view the status of a single job, use:
$ scontrol show job [JOBID]
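scontrol show job prints a long block of attributes. If you only care about the job's state and the reason it is (or is not) running yet, a quick filter is to grep for those fields (JobState and Reason are standard field names in scontrol output, but check them against your own output):
$ scontrol show job [JOBID] | grep -E "JobState|Reason"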
squeue Status Codes and Reasons
The squeue command reports a variety of information about an active job's status using state and reason codes. Job state codes describe a job's current state in the queue (e.g. pending, completed). Job reason codes describe why the job is in its current state. The following tables outline the job state and reason codes you may encounter when using squeue to check on your jobs.
squeue status codes
Status | Code | Explanation |
---|---|---|
COMPLETED | CD | The job has completed successfully. |
COMPLETING | CG | The job is finishing but some processes are still active. |
FAILED | F | The job terminated with a non-zero exit code and failed to execute. |
PENDING | PD | The job is waiting for resource allocation. It will eventually run. |
PREEMPTED | PR | The job was terminated because of preemption by another job. |
RUNNING | R | The job currently is allocated to a node and is running. |
SUSPENDED | S | A running job has been stopped with its cores released to other jobs. |
STOPPED | ST | A running job has been stopped with its cores retained. |
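These state codes can also be used to filter squeue output. For example, to list only your pending and running jobs you can pass the state names above to squeue's -t (--states) option (a minimal sketch; see man squeue for the full list of options):
$ squeue -u $USER -t PENDING,RUNNING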
Job Reason Codes
Reason Code | Explanation |
---|---|
Priority | One or more higher priority jobs is in queue for running. Your job will eventually run. |
Dependency | This job is waiting for a dependent job to complete and will run afterwards. |
Resources | The job is waiting for resources to become available and will eventually run. |
InvalidAccount | The job's account is invalid. Cancel the job and rerun with the correct account. |
InvalidQOS | The job's QoS is invalid. Cancel the job and rerun with the correct QoS. |
QOSGrpCpuLimit | All CPUs assigned to your job's specified QoS are in use; job will run eventually. |
QOSGrpMaxJobsLimit | Maximum number of jobs for your job's QoS have been met; job will run eventually. |
QOSGrpNodeLimit | All nodes assigned to your job's specified QoS are in use; job will run eventually. |
PartitionCpuLimit | All CPUs assigned to your job's specified partition are in use; job will run eventually. |
PartitionMaxJobsLimit | Maximum number of jobs for your job's partition have been met; job will run eventually. |
PartitionNodeLimit | All nodes assigned to your job's specified partition are in use; job will run eventually. |
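The reason code appears in the NODELIST(REASON) column of the default squeue output. If you prefer it as its own column, squeue's -o/--format option can print the reason field explicitly (the field widths below are only an example):
$ squeue -u $USER -o "%.10i %.9P %.20j %.2t %.10M %r"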
Method 3: mon_sacct
mon_sacct is a wrapper script around sacct that prints out a wide range of useful job accounting information in a user-friendly way.
For example:
mon_sacct 41087900
Loading python/3.7.3-system
Loading requirement: gcc/8.1.0
/apps/slurm-23.11.9/bin/sacct -S 2024-11-27 -p --format="User,JobID,JobName,State,Submit,Start,End,Elapsed,WorkDir%35,TimeLimit,ReqMem,Account,Partition,Priority,MaxVMSize,AveVMSize,AveRSS,MaxRSS,MaxRSSTask,MaxRSSNode,QOS,NodeList,CPUTime,UserCPU,SystemCPU,TotalCPU,minCPU,ReqTRES,Reservation,NTasks,ExitCode,Nodelist,SubmitLine,Reason,Comment,Constraints,Comment,SystemComment,AdminComment" -j 41087900
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| User | smichnow |
| JobID | 41087900 |
| JobName | batch1 |
| State | COMPLETED |
| Submit | 2024-11-29T11:58:49 |
| Start | 2024-11-29T11:59:04 |
| End | 2024-11-29T11:59:05 |
| Elapsed | 00:00:01 |
| WorkDir | /home/smichnow/slurm |
| Timelimit | 00:01:00 |
| ReqMem | 1000M |
| Account | nq46 |
| Partition | comp |
| Priority | 106917 |
| MaxVMSize | |
| AveVMSize | |
| AveRSS | |
| MaxRSS | |
| MaxRSSTask | |
| MaxRSSNode | |
| QOS | normal |
| NodeList | m3j005 |
| CPUTime | 00:00:01 |
| UserCPU | 00:00.011 |
| SystemCPU | 00:00.012 |
| TotalCPU | 00:00.023 |
| MinCPU | |
| ReqTRES | billing=1,cpu=1,mem=1000M,node=1 |
| Reservation | |
| NTasks | |
| ExitCode | 0:0 |
| NodeList | m3j005 |
| SubmitLine | sbatch rocky.slm |
| Reason | None |
| Constraints | r9 |
| AdminComment | |
| | |
+--------------+----------------------------------+
<etc>
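As the command line printed above shows, mon_sacct ultimately calls sacct, so you can also query the accounting records directly. A minimal sketch (the format fields are illustrative; see man sacct for everything that is available):
$ sacct -j [JOBID] --format=JobID,JobName,State,Elapsed,MaxRSS,ExitCode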