Checking job status
There are three methods to check your job status.
Method 1: show_job
We provide a `show_job` script. It groups, filters, and sorts job information and adds summary statistics to produce clean, user-friendly output.
```
$ show_job 3000558
-----------------------------------------------------------------------------------
JobID           3000558
USERID          smichnow
USER Name       Simon Michnowicz (Monash University)
Email
-----------------------------------------------------------------------------------
Job Name        testV2feature
Project         general
Partition       comp
QoS             normal
Job State       PENDING
Why cant Run    Resources
Running Time    00:00:00
Total Time      00:05:00
Submit Host     monarch-dtn
Submit Time     2018-06-19T14:29:36
-----------------------------------------------------------------------------------
Job Resource    Node=1
                NumCPUs=16
                CPUsPerTask=1
                CPUsPerNode=1
                MemoryPerNode=1000M
                Constraint=Xeon-E5-2680-v3
----------------------------------------------------------------------------------
Job Working Dir:
    /home/smichnow/slurm
Job Command File/Script:
    /home/smichnow/slurm/testMonV2-testFeature.hc.sh
Job Output File:
    /home/smichnow/slurm/hc-3000558
Job Error File:
    /home/smichnow/slurm/hc-3000558
-----------------------------------------------------------------------------------
```
To check the status of a single job, use `show_job [JOBID]`.
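Because `show_job` takes one job ID at a time, you may want to run it over every job you currently have queued. The function below is a minimal sketch of that idea; the name `show_my_jobs` is hypothetical, and it assumes `squeue` and `show_job` are both on your `PATH` (as they are on the login nodes described here).

```shell
# Hypothetical helper (not part of the provided tools): run show_job on
# every job you currently have in the queue.
# Assumes squeue and show_job are both available on PATH.
show_my_jobs() {
  local id
  # -h suppresses the header line; -o %i prints only the job ID column.
  for id in $(squeue -u "$USER" -h -o %i); do
    show_job "$id"
  done
}
```

Usage: `show_my_jobs | less` lets you page through the output when you have many jobs.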
Method 2: Slurm commands
To display all of your running and pending jobs, use ``squeue -u `whoami` ``. The `whoami` command returns your username, so it is a handy shortcut.
```
$ squeue -u `whoami`
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
```
To view the detailed status of a single job, use:

```
$ scontrol show job [JOBID]
```
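`scontrol show job` prints the job record as whitespace-separated `KEY=VALUE` pairs (for example `JobState=PENDING Reason=Resources`), which is verbose when you only need one field. The sketch below pulls a single field out of that output; the function name `job_field` is hypothetical, and it assumes the value you want contains no spaces.

```shell
# Hypothetical helper: print the value of one KEY=VALUE field from
# `scontrol show job` output read on standard input.
# Assumes the requested value contains no spaces.
job_field() {
  local key="$1"
  # Put every KEY=VALUE token on its own line, then strip "KEY=" from
  # the first token whose key matches exactly.
  tr -s ' \n' '\n' | sed -n "s/^${key}=//p" | head -n 1
}
```

Usage: `scontrol show job 3000558 | job_field JobState` prints only the job's state, and `job_field Reason` prints the reason code explained in the tables below.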
squeue Status Codes and Reasons
The `squeue` command reports an active job's status using state and reason codes. Job state codes describe a job's current state in the queue (e.g. pending, completed). Job reason codes explain why the job is in its current state. The following tables list job state and reason codes you may encounter when using `squeue` to check on your jobs.
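If you want `squeue` to show just the state and reason columns described in the tables that follow, you can choose the columns with its `-o` format option. The function below is a sketch under that approach; the name `my_job_states` and the column widths are illustrative choices, and any extra arguments are passed straight through to `squeue`.

```shell
# Hypothetical helper: list your jobs with the compact state code and
# the reason code columns. Extra arguments are passed on to squeue.
my_job_states() {
  # %i = job ID, %j = job name, %t = compact state (PD, R, CG, ...),
  # %r = reason the job is in its current state.
  # A number after "%." sets a right-justified column width.
  squeue -u "$USER" -o "%.10i %.20j %.4t %r" "$@"
}
```

Usage: `my_job_states -t PENDING` restricts the listing to pending jobs, which is usually where the reason column is most informative.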
squeue status codes
Status | Code | Explanation |
---|---|---|
COMPLETED | CD | The job has completed successfully. |
COMPLETING | CG | The job is finishing but some processes are still active. |
FAILED | F | The job terminated with a non-zero exit code or another failure condition. |
PENDING | PD | The job is waiting for resource allocation. It will eventually run. |
PREEMPTED | PR | The job was terminated because of preemption by another job. |
RUNNING | R | The job is currently allocated to a node and is running. |
SUSPENDED | S | A running job has been stopped with its cores released to other jobs. |
STOPPED | ST | A running job has been stopped with its cores retained. |
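The `ST` column in the default `squeue` output shows only the compact code. The function below is a small sketch that translates those codes into the full state names from the table above; the name `state_name` is hypothetical, and codes not listed in the table are printed back unchanged.

```shell
# Hypothetical helper: translate a compact squeue state code (ST column)
# into the full state name from the table above.
# Codes not covered by the table are echoed back unchanged.
state_name() {
  case "$1" in
    CD) echo COMPLETED ;;
    CG) echo COMPLETING ;;
    F)  echo FAILED ;;
    PD) echo PENDING ;;
    PR) echo PREEMPTED ;;
    R)  echo RUNNING ;;
    S)  echo SUSPENDED ;;
    ST) echo STOPPED ;;
    *)  echo "$1" ;;
  esac
}
```

Usage: `squeue -u "$USER" -h -o %t | while read -r st; do state_name "$st"; done` prints the full state of each of your jobs, one per line.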
Job Reason Codes
Reason Code | Explanation |
---|---|
Priority | One or more higher-priority jobs are ahead of yours in the queue. Your job will eventually run. |
Dependency | This job is waiting for a job it depends on to complete and will run afterwards. |
Resources | The job is waiting for resources to become available and will eventually run. |
InvalidAccount | The job's account is invalid. Cancel the job and resubmit it with the correct account. |
InvalidQOS | The job's QoS is invalid. Cancel the job and resubmit it with the correct QoS. |
QOSGrpCpuLimit | All CPUs assigned to your job's specified QoS are in use; job will run eventually. |
QOSGrpMaxJobsLimit | The maximum number of jobs for your job's QoS has been reached; job will run eventually. |
QOSGrpNodeLimit | All nodes assigned to your job's specified QoS are in use; job will run eventually. |
PartitionCpuLimit | All CPUs assigned to your job's specified partition are in use; job will run eventually. |
PartitionMaxJobsLimit | The maximum number of jobs for your job's partition has been reached; job will run eventually. |
PartitionNodeLimit | All nodes assigned to your job's specified partition are in use; job will run eventually. |
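The reason codes above fall into two groups: most mean the job only needs to wait, while an invalid account or QoS means it must be cancelled and resubmitted. The function below is a sketch that sorts a reason code into those groups based solely on this table; the name `reason_action` and the labels `wait`, `fix`, and `unknown` are hypothetical, and any reason not listed here is reported as `unknown` rather than guessed.

```shell
# Hypothetical helper: classify a reason code from the table above.
#   wait    -> the job will run eventually; no action needed
#   fix     -> cancel the job and resubmit with a valid account/QoS
#   unknown -> the reason is not covered by the table
reason_action() {
  case "$1" in
    InvalidAccount|InvalidQOS) echo fix ;;
    Priority|Dependency|Resources) echo wait ;;
    QOSGrpCpuLimit|QOSGrpMaxJobsLimit|QOSGrpNodeLimit) echo wait ;;
    PartitionCpuLimit|PartitionMaxJobsLimit|PartitionNodeLimit) echo wait ;;
    *) echo unknown ;;
  esac
}
```

Usage: `squeue -u "$USER" -h -t PENDING -o %r | while read -r r; do echo "$r: $(reason_action "$r")"; done` reports, for each pending job, whether it needs attention.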
Method 3: mon_sacct
mon_sacct is a wrapper script around sacctmgr that prints a lot of useful information in a user-friendly way. For example: