Checking job status
There are three methods to check your job status.
Method 1: show_job
We provide a show_job script that groups, filters, and sorts job information and presents summary statistics in a clean, tidy, and user-friendly output.
show_job 3000558
-----------------------------------------------------------------------------------
JobID 3000558
USERID smichnow
USER Name Simon Michnowicz (Monash University)
Email
-----------------------------------------------------------------------------------
Job Name testV2feature
Project general
Partition comp
QoS normal
Job State PENDING
Why cant Run Resources
Running Time 00:00:00
Total Time 00:05:00
Submit Host monarch-dtn
Submit Time 2018-06-19T14:29:36
-----------------------------------------------------------------------------------
Job Resource Node=1
NumCPUs=16
CPUsPerTask=1
CPUsPerNode=1
MemoryPerNode=1000M
Constraint=Xeon-E5-2680-v3
----------------------------------------------------------------------------------
Job Working Dir:
/home/smichnow/slurm
Job Command File/Script:
/home/smichnow/slurm/testMonV2-testFeature.hc.sh
Job Output File:
/home/smichnow/slurm/hc-3000558
Job Error File:
/home/smichnow/slurm/hc-3000558
-----------------------------------------------------------------------------------
To check the status of a single job, use show_job [JOBID].
Method 2: Slurm commands
To display all of your running/pending jobs, use squeue -u `whoami`. (whoami returns your username and is a handy shortcut.)
$ squeue -u `whoami`
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
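If you want to keep an eye on your jobs as they move through the queue, you can combine squeue with the standard watch utility (assuming GNU watch is available on the login node; the 30-second refresh interval is just an illustration):
$ watch -n 30 squeue -u $USER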
To view the status of a single job, use:
$ scontrol show job [JOBID]
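scontrol show job prints a long block of attributes. If you only care about the job's state and the reason it is (or is not) running yet, a quick filter is to grep for those fields (JobState and Reason are standard field names in scontrol output, but check them against your own output):
$ scontrol show job [JOBID] | grep -E "JobState|Reason"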
squeue Status Codes and Reasons
The squeue command reports a variety of information about an active job's status using state and reason codes. Job state codes describe a job's current state in the queue (e.g. pending, completed). Job reason codes describe why the job is in its current state. The following tables outline the job state and reason codes you may encounter when using squeue to check on your jobs.
squeue status codes
Status | Code | Explanation |
---|---|---|
COMPLETED | CD | The job has completed successfully. |
COMPLETING | CG | The job is finishing but some processes are still active. |
FAILED | F | The job terminated with a non-zero exit code and failed to execute. |
PENDING | PD | The job is waiting for resource allocation. It will eventually run. |
PREEMPTED | PR | The job was terminated because of preemption by another job. |
RUNNING | R | The job currently is allocated to a node and is running. |
SUSPENDED | S | A running job has been stopped with its cores released to other jobs. |
STOPPED | ST | A running job has been stopped with its cores retained. |
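These state codes can also be used to filter squeue output. For example, to list only your pending and running jobs you can pass the state names above to squeue's -t (--states) option (a minimal sketch; see man squeue for the full list of options):
$ squeue -u $USER -t PENDING,RUNNING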
Job Reason Codes
Reason Code | Explanation |
---|---|
Priority | One or more higher priority jobs is in queue for running. Your job will eventually run. |
Dependency | This job is waiting for a dependent job to complete and will run afterwards. |
Resources | The job is waiting for resources to become available and will eventually run. |
InvalidAccount | The job's account is invalid. Cancel the job and rerun with the correct account. |
InvalidQOS | The job's QoS is invalid. Cancel the job and rerun with the correct QoS. |
QOSGrpCpuLimit | All CPUs assigned to your job's specified QoS are in use; job will run eventually. |
QOSGrpMaxJobsLimit | Maximum number of jobs for your job's QoS have been met; job will run eventually. |
QOSGrpNodeLimit | All nodes assigned to your job's specified QoS are in use; job will run eventually. |
PartitionCpuLimit | All CPUs assigned to your job's specified partition are in use; job will run eventually. |
PartitionMaxJobsLimit | Maximum number of jobs for your job's partition have been met; job will run eventually. |
PartitionNodeLimit | All nodes assigned to your job's specified partition are in use; job will run eventually. |
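The reason code appears in the NODELIST(REASON) column of the default squeue output. If you prefer it as its own column, squeue's -o/--format option can print the reason field explicitly (the field widths below are only an example):
$ squeue -u $USER -o "%.10i %.9P %.20j %.2t %.10M %r"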
Method 3: mon_sacct
mon_sacct is a wrapper script around sacct that prints out a wide range of useful job accounting information in a user-friendly way.
For example:
mon_sacct 41087900
Loading python/3.7.3-system
Loading requirement: gcc/8.1.0
/apps/slurm-23.11.9/bin/sacct -S 2024-11-27 -p --format="User,JobID,JobName,State,Submit,Start,End,Elapsed,WorkDir%35,TimeLimit,ReqMem,Account,Partition,Priority,MaxVMSize,AveVMSize,AveRSS,MaxRSS,MaxRSSTask,MaxRSSNode,QOS,NodeList,CPUTime,UserCPU,SystemCPU,TotalCPU,minCPU,ReqTRES,Reservation,NTasks,ExitCode,Nodelist,SubmitLine,Reason,Comment,Constraints,Comment,SystemComment,AdminComment" -j 41087900
+--------------+----------------------------------+
| Field | Value |
+--------------+----------------------------------+
| User | smichnow |
| JobID | 41087900 |
| JobName | batch1 |
| State | COMPLETED |
| Submit | 2024-11-29T11:58:49 |
| Start | 2024-11-29T11:59:04 |
| End | 2024-11-29T11:59:05 |
| Elapsed | 00:00:01 |
| WorkDir | /home/smichnow/slurm |
| Timelimit | 00:01:00 |
| ReqMem | 1000M |
| Account | nq46 |
| Partition | comp |
| Priority | 106917 |
| MaxVMSize | |
| AveVMSize | |
| AveRSS | |
| MaxRSS | |
| MaxRSSTask | |
| MaxRSSNode | |
| QOS | normal |
| NodeList | m3j005 |
| CPUTime | 00:00:01 |
| UserCPU | 00:00.011 |
| SystemCPU | 00:00.012 |
| TotalCPU | 00:00.023 |
| MinCPU | |
| ReqTRES | billing=1,cpu=1,mem=1000M,node=1 |
| Reservation | |
| NTasks | |
| ExitCode | 0:0 |
| NodeList | m3j005 |
| SubmitLine | sbatch rocky.slm |
| Reason | None |
| Constraints | r9 |
| AdminComment | |
| | |
+--------------+----------------------------------+
<etc>
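As the command line printed above shows, mon_sacct ultimately calls sacct, so you can also query the accounting records directly. A minimal sketch (the format fields are illustrative; see man sacct for everything that is available):
$ sacct -j [JOBID] --format=JobID,JobName,State,Elapsed,MaxRSS,ExitCode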