
Running jobs on MonARCH

Launching jobs on MonARCH is controlled by Slurm (the Slurm Workload Manager), which allocates the compute nodes, resources, and time requested by the user through command-line options and batch scripts. Submitting and running script jobs on the cluster is a straightforward procedure with three basic steps:

  • Setup and launch
  • Monitor progress
  • Retrieve and analyse results
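
The three steps above map onto Slurm commands roughly as follows. This is a sketch only: `job.sh` is a placeholder name for your batch script, and `792412` is an example job ID.

```shell
# 1. Setup and launch: submit the batch script
sbatch job.sh                 # prints "Submitted batch job <jobID>"

# 2. Monitor progress: list your jobs in the queue
squeue -u $USER

# 3. Retrieve and analyse results: by default, output is written
#    to slurm-<jobID>.out in the submission directory
less slurm-792412.out
```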

Slurm: Useful Commands

| What | Slurm command | Comment |
|------|---------------|---------|
| Job submission | `sbatch <jobScript>` | Slurm directives in the job script can also be set by command-line options to `sbatch`. |
| Check queue | `squeue` (or the alias `sq`) | You can also examine an individual job: `squeue -j 792412` |
| Check cluster | `show_cluster` | A nicely printed description of the current state of the machines in our cluster, built on top of the `sinfo -lN` command. |
| Delete an existing job | `scancel <jobID>` | `jobID` is the Slurm job number. |
| Show running job | `scontrol show job <jobID>`, `mon_sacct <jobID>`, or `show_job <jobID>` | Info on a pending or running job. `jobID` is the Slurm job number. `mon_sacct` and `show_job` are our local helper scripts. |
| Show finished job | `sacct -j <jobID>`, `mon_sacct <jobID>`, or `show_job <jobID>` | Info on a finished job. |
| Suspend a job | `scontrol suspend <jobID>` | |
| Resume a job | `scontrol resume <jobID>` | |
| Delete parts 5 to 10 of a job array | `scancel <jobID>_[5-10]` | Deletes the array tasks whose indices are 5 to 10. |

Here are some sample submission scripts to get you started.
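
A minimal sketch of such a script, assuming a single-core program called `./my_program`. The resource values below (walltime, memory, CPU count) are placeholder examples; adjust them to what your job actually needs.

```shell
#!/bin/bash
# Minimal Slurm batch script (hypothetical example values).
#SBATCH --job-name=MyJob
#SBATCH --time=0-01:00:00      # walltime in D-HH:MM:SS (here: 1 hour)
#SBATCH --ntasks=1             # one task
#SBATCH --cpus-per-task=1      # one CPU core
#SBATCH --mem=4G               # memory for the job

# Commands below run on the allocated compute node.
./my_program                   # replace with your own executable
```

Submit it with `sbatch <jobScript>` as shown in the table above; Slurm prints the job ID, which you can then pass to `squeue -j`, `scancel`, or `sacct -j`.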