sacct
`sacct` is a Slurm command for viewing accounting details about previous jobs, such as:
- how long a job ran for,
- how many CPUs and GPUs the job requested,
- whether the job exited with an error or not,
- and much more...
See `mon_sacct` for a more user-friendly alternative provided by the M3 admins.
Usage
By default, `sacct` shows only your jobs from today. For example:
[lexg@m3-login3 ~]$ sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
42051183 bash comp nq46 1 COMPLETED 0:0
42051183.ex+ extern nq46 1 COMPLETED 0:0
42051183.0 bash nq46 1 COMPLETED 0:0
See the official Slurm `sacct` page for full details on usage. However, you may find these optional arguments useful:
- Add `--starttime 2025-01-13` to list all of your jobs from 13 January 2025 onwards.
- Similarly, add `--endtime` (e.g. `--endtime 2025-01-20`) to specify the latest date you care about.
- Add `--partition=gpu` to see only jobs run on the `gpu` partition.
- Add `--format=jobid,elapsed,ncpus,ntasks,state` to show the job ID, elapsed time, number of CPUs, number of tasks, and state of your jobs. See Job Accounting Fields for all possible fields.
- Add `-j 123456` to show only the record for the job with ID 123456.
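The options above can be combined. As a sketch, the shell function below lists your jobs between two dates with the fields from the `--format` example, optionally restricted to one partition. `jobs_between` is a hypothetical name of our own, not an M3 or Slurm command; it only builds and runs an ordinary `sacct` command line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm or M3): list your jobs between two
# dates, optionally only those on one partition.
# Usage: jobs_between START END [PARTITION]
jobs_between() {
    local args=(--starttime "$1" --endtime "$2" --format=jobid,elapsed,ncpus,ntasks,state)
    # Only filter by partition when a third argument is given
    if [[ -n "${3:-}" ]]; then
        args+=(--partition="$3")
    fi
    sacct "${args[@]}"
}
```

For example, `jobs_between 2025-01-13 2025-01-20 gpu` runs `sacct --starttime 2025-01-13 --endtime 2025-01-20 --format=jobid,elapsed,ncpus,ntasks,state --partition=gpu`.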
Output
See the example below:
[lexg@m3-login3 ~]$ sacct --starttime 2025-01-01
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
42006315 bash desktop nq46 1 TIMEOUT 0:0
42006315.ex+ extern nq46 1 COMPLETED 0:0
42006315.0 bash nq46 1 CANCELLED 0:9
42008526 Desktop comp nq46 1 TIMEOUT 0:0
42008526.ba+ batch nq46 1 CANCELLED 0:15
42008526.ex+ extern nq46 1 COMPLETED 0:0
42013037 interacti+ comp nq46 1 COMPLETED 0:0
42013037.ba+ batch nq46 1 COMPLETED 0:0
42013037.ex+ extern nq46 1 COMPLETED 0:0
42013037.0 tmux nq46 1 COMPLETED 0:0
42013067 interacti+ comp nq46 1 COMPLETED 0:0
42013067.ba+ batch nq46 1 COMPLETED 0:0
42013067.ex+ extern nq46 1 COMPLETED 0:0
42013067.0 tmux nq46 1 COMPLETED 0:0
42016966 test.sh comp nq46 1 COMPLETED 0:0
42016966.ba+ batch nq46 1 COMPLETED 0:0
42016966.ex+ extern nq46 1 COMPLETED 0:0
42016967 test.sh comp nq46 1 COMPLETED 0:0
42016967.ba+ batch nq46 1 COMPLETED 0:0
42016967.ex+ extern nq46 1 COMPLETED 0:0
42016968 Interesti+ comp nq46 1 COMPLETED 0:0
42016968.ba+ batch nq46 1 COMPLETED 0:0
42016968.ex+ extern nq46 1 COMPLETED 0:0
42016970 Another j+ gpu nq46 1 COMPLETED 0:0
42016970.ba+ batch nq46 1 COMPLETED 0:0
42016970.ex+ extern nq46 1 COMPLETED 0:0
42017165 Desktop comp nq46 1 CANCELLED+ 0:0
42017165.ba+ batch nq46 1 CANCELLED 0:15
42017165.ex+ extern nq46 1 COMPLETED 0:0
42018984 bash gpu nq46 0 CANCELLED+ 0:0
42018987 bash gpu nq46 0 CANCELLED+ 0:0
42051183 bash comp nq46 1 COMPLETED 0:0
42051183.ex+ extern nq46 1 COMPLETED 0:0
42051183.0 bash nq46 1 COMPLETED 0:0
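Each job appears on several lines: the job itself, then its steps (`.batch`, `.extern`, `.0`, ...), and long values are cut short with a `+`. For a quick summary, one approach is to count the final state of each job while skipping step lines, whose IDs contain a dot. This sketch assumes `sacct`'s `--parsable2` and `--noheader` options, which print `|`-separated fields with no header and no truncation. Without truncation, a cancelled job's state may read e.g. `CANCELLED by 12345`, so the helper strips the `by ...` part. `count_states` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm or M3): count jobs per final state.
# Reads "JobID|State" lines on stdin, e.g. the output of
#   sacct --parsable2 --noheader --format=jobid,state
count_states() {
    awk -F'|' '
        $1 !~ /\./ {                 # skip job steps such as 123.batch
            sub(/ by .*/, "", $2)    # "CANCELLED by 12345" -> "CANCELLED"
            n[$2]++
        }
        END { for (s in n) print s, n[s] }
    ' | sort
}
```

For example: `sacct --starttime 2025-01-01 --parsable2 --noheader --format=jobid,state | count_states`.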
Again, see the official `sacct` docs for full details, but the default fields are:
Field | Meaning |
---|---|
JobID | The Slurm job's ID. |
JobName | The job's name. |
Partition | The partition the job ran on. |
Account | The account the job ran under. This is generally your HPC project ID. |
AllocCPUS | Number of allocated CPUs for this job. |
State | The state the job exited with. See Job state codes for details. |
ExitCode | The job's exit status, in the form `CODE:SIGNAL`: the exit code returned by the job's script or command, followed by the number of the signal that terminated it, if any. `0:0` is generally good, and any other value is generally bad; for example, `0:15` means the process was killed by signal 15 (SIGTERM). Search online for your exact exit code to try to understand its meaning. |
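To read an `ExitCode` value quickly, the hypothetical helper below splits it at the colon and uses the shell's `kill -l` to turn a signal number into a name. It is a convenience sketch only and does not interpret application-specific exit codes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm or M3): describe a sacct ExitCode
# value of the form "CODE:SIGNAL", e.g. "0:0", "1:0" or "0:15".
explain_exitcode() {
    local code="${1%%:*}"   # text before the colon: the exit code
    local sig="${1##*:}"    # text after the colon: the signal number
    if [[ "$sig" != 0 ]]; then
        # kill -l turns a signal number into its name, e.g. 15 -> TERM
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    elif [[ "$code" != 0 ]]; then
        echo "exited with code $code"
    else
        echo "success"
    fi
}
```

Applied to the example output above, `explain_exitcode 0:9` prints `killed by signal 9 (SIGKILL)` and `explain_exitcode 0:15` prints `killed by signal 15 (SIGTERM)`.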
Some other fields that you may find useful are:
Field | Meaning |
---|---|
MaxRSS | "Maximum resident set size": approximately the peak memory used by any single task in each job step. Note this may be inaccurate: memory usage is sampled periodically, so sudden spikes can be missed. |
Elapsed | How long your job ran for. Useful to review! If you requested 12 hours but your job took only 1 hour, reduce the `--time` you request for similar future jobs so that they are likely to start sooner. |
WorkDir | The job's working directory: the directory you submitted the job from, unless you changed it (e.g. with `--chdir`). May be useful if you can't find your job's output files. |
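To check how much of your requested time a job actually used, you could ask `sacct` for the `Timelimit` field alongside `Elapsed`, e.g. `sacct -X --starttime 2025-01-01 --format=jobid,elapsed,timelimit` (`-X` shows only the job lines, not their steps). The hypothetical helpers below convert those durations to seconds and print the percentage used. They assume numeric values and do not handle limits such as `UNLIMITED`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Slurm or M3).

# Convert a Slurm duration ("D-HH:MM:SS", "HH:MM:SS" or "MM:SS") to seconds.
# Assumes a numeric value, not e.g. "UNLIMITED".
to_seconds() {
    local t="$1" days=0
    # Split off an optional "D-" day prefix
    if [[ "$t" == *-* ]]; then
        days="${t%%-*}"
        t="${t#*-}"
    fi
    # Split the rest on colons into $1, $2, ...
    local IFS=:
    set -- $t
    # 10# forces base 10, so fields like "08" are not read as octal
    case $# in
        3) echo $(( 10#$days * 86400 + 10#$1 * 3600 + 10#$2 * 60 + 10#$3 )) ;;
        2) echo $(( 10#$days * 86400 + 10#$1 * 60 + 10#$2 )) ;;
        *) echo $(( 10#$days * 86400 + 10#$1 )) ;;
    esac
}

# Print elapsed time as a whole-number percentage of the time limit.
# Usage: time_used_pct ELAPSED TIMELIMIT
time_used_pct() {
    local used limit
    used=$(to_seconds "$1")
    limit=$(to_seconds "$2")
    echo "$(( 100 * used / limit ))%"
}
```

For example, `time_used_pct 01:00:00 12:00:00` prints `8%`, a sign that the `--time` request for similar jobs could be reduced.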