Running Simple Jobs in Slurm

Quick Start

A submission script is a shell script containing the list of processing tasks to be carried out, along with the commands, runtime libraries, and input and/or output files those tasks need. If you know what resources your tasks will consume, you can also tune the script with some of the common #SBATCH directives listed in the table further below.

A "Hello World" Example

Consider a file named first.slm:

#!/bin/bash
#SBATCH --job-name=MyFirstJob
#SBATCH --time=00:05:00          # maximum wallclock time (5 minutes)
#SBATCH --ntasks=1               # one task (process)
#SBATCH --mem-per-cpu=4096       # memory per CPU core, in MB
#SBATCH --cpus-per-task=1        # one CPU core per task
echo "helloworld"

To submit this job, run:

sbatch first.slm

Slurm returns the job ID of the submitted job. You can view the status of the job with the commands below (a short example session follows the list):

  • scontrol show job <jobid> This shows the status of pending and running jobs only. See here to view the status of finished jobs.
  • squeue -u $USER This command lists all of the jobs you have in the queue.
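
As a sketch of a typical session (the job ID 12345 is illustrative, and sacct is available only on installations where job accounting is enabled):

$ sbatch first.slm
Submitted batch job 12345
$ squeue -u $USER           # all of your queued and running jobs
$ scontrol show job 12345   # details while the job is pending or running
$ sacct -j 12345            # accounting record once the job has finished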

This script describes the job: it is a serial job with only one process (--ntasks=1), and it needs only one CPU core to run the echo command. The memory per CPU has been set to 4096 MB (4 GB); adjust this based on how much memory your job actually needs.

Send an email alert via Slurm

Here is another example, which enables email alerts at the beginning and end of the job, and if the job fails. It also specifies a file name for the output of the job with the --output= option. If you do not use this option, the default file name is "slurm-%j.out", where "%j" is replaced by the job ID.

#!/usr/bin/env bash
#SBATCH --job-name=sample_email2
#SBATCH --time=10:00:00
#SBATCH --mem=4000                    # total memory, in MB
#SBATCH --mail-type=BEGIN,END,FAIL    # when to send email alerts
#SBATCH --mail-user=first.last@monash.edu
#SBATCH --output=SimpleJob-%j.out     # %j is replaced by the job ID
module load modulefile                # replace with the module(s) your program needs
./myC                                 # replace with your own executable
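
If this script were saved as, say, email.slm (a hypothetical file name) and submitted, the job's output would be written to a file named after the job ID (12345 is illustrative):

$ sbatch email.slm
Submitted batch job 12345
$ ls SimpleJob-*.out
SimpleJob-12345.out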

A list of common sbatch options

To see detailed information about what the options do, type

man sbatch

in a shell window on the cluster.

important

Slurm directives (i.e. #SBATCH lines) must be at the top of your submission script. Slurm stops reading them at the first non-directive command, so any that appear later in the script are ignored.

| Short Format | Long Format            | Default           | Description                                                                          |
| ------------ | ---------------------- | ----------------- | ------------------------------------------------------------------------------------ |
| -N count     | --nodes=count          | 1                 | Number of nodes to be used; allocates count nodes to your job.                       |
| -A accountID | --account=accountID    | Your default      | The account ID for your group. You can check your available account(s) with the id command. |
| -t HH:MM:SS  | --time=HH:MM:SS        | Partition default | Always specify the maximum wallclock time for your job; the maximum is 7 days.       |
| -p name      | --partition=name       | comp              | The partition to submit to.                                                           |
| -n count     | --ntasks=count         | 1                 | The number of tasks (Unix processes) to be created for the job.                      |
| N/A          | --ntasks-per-node=count | 1                | The maximum number of tasks per allocated node.                                       |
| -c count     | --cpus-per-task=count  | 1                 | The number of CPUs allocated per task.                                                |
| N/A          | --mem-per-cpu=size     | 4096 MB           | Memory size per CPU.                                                                  |
| N/A          | --mem=size             | 4096 MB           | Total memory size.                                                                    |
| -J jobname   | --job-name=jobname     | slurm-jobid       | The job name: up to 15 printable, non-whitespace characters.                          |
| N/A          | --gres=gpu:1           | N/A               | Generic consumable resources, e.g. GPUs.                                              |
| N/A          | --no-requeue           | --requeue         | By default, the job is requeued after a node failure.                                 |
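
As a sketch of how several of these options combine in one script (the account ID, resource amounts, and program name are illustrative; substitute your own values):

#!/bin/bash
#SBATCH --job-name=gpu_example    # illustrative job name
#SBATCH --account=myaccount      # hypothetical account ID; check yours with the id command
#SBATCH --time=02:00:00          # two hours of wallclock time
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4        # four CPU cores for the single task
#SBATCH --mem-per-cpu=4096       # MB per CPU core
#SBATCH --gres=gpu:1             # request one GPU
./my_program                     # hypothetical executable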

Running Simple Batch Jobs

Submitting a job to Slurm is performed by running the sbatch command and specifying a job script.

sbatch job.script

You can also supply options (e.g. --ntasks=xx) to the sbatch command itself. If an option is already defined in the job.script file, the command-line argument overrides it.

sbatch [options] job.script
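
For example, to run the earlier hello-world script with four tasks even though first.slm declares --ntasks=1, the command-line option takes precedence:

sbatch --ntasks=4 first.slm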

Cancelling jobs

To cancel a single job:

scancel [JOBID]

To cancel all of your jobs:

scancel -u [USERID]
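
As a quick sketch (12345 is an illustrative job ID), you can confirm a cancellation by listing your remaining jobs:

$ scancel 12345
$ squeue -u $USER    # the cancelled job no longer appears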