Running Simple Jobs in Slurm
Quick Start
A submission script is a shell script that lists the processing tasks to be carried out: the commands to run, the runtime libraries they need, and their input and/or output files. If you know the resources that your tasks will consume, you can also set them explicitly with the common #SBATCH directives, as shown in the examples below.
A "Hello World" Example
Consider a file first.slm
#!/bin/bash
#SBATCH --job-name=MyFirstJob
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=4096
#SBATCH --cpus-per-task=1
echo "helloworld"
To submit this job, run:
sbatch first.slm
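If the submission succeeds, Slurm prints the job ID it has assigned, e.g. (the job number below is just an illustration):
Submitted batch job 16218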
You can view the status of the job with:
scontrol show job <jobid>
This shows you the status of pending and running jobs only. See here to view the status of finished jobs.
squeue -u $USER
This command prints out all the jobs you currently have in the queue.
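If job accounting is enabled on the cluster, finished jobs can usually be queried with sacct; the job ID below is a placeholder:
sacct -j <jobid> --format=JobID,JobName,State,Elapsed,MaxRSS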
This script describes the job: it is a serial job with only one process (--ntasks=1), so it needs only one CPU core to run the echo command. The memory per CPU has been set to 4096 MB (4 GB); adjust this based on how much memory your job actually needs.
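For example, if your program needed four cores and 8 GB of memory per core, a sketch of the adjusted directives might look like this (the values are placeholders to replace with your own requirements):
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=8192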
Send an email alert via Slurm
Here is another example that enables email alerts at the beginning and end of the job, or if the job fails. It also specifies a file name for the job's output with the --output= option. If you do not use this option, the default file name is "slurm-%j.out", where "%j" is replaced by the job ID.
#!/bin/bash
#SBATCH --job-name=sample_email2
#SBATCH --time=10:00:00
#SBATCH --mem=4000
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=first.last@monash.edu
#SBATCH --output=SimpleJob-%j.out
module load modulefile    # load the software module your program needs
./myC                     # run your program
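Assuming the script above is saved as email_job.slm (a file name chosen here for illustration), you submit it in the usual way; if Slurm assigned it, say, job ID 16300, the output would be written to SimpleJob-16300.out:
sbatch email_job.slm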
A list of some Slurm options
To see detailed information about what the options do, type
man sbatch
in a shell window on our cluster.
Slurm directives (i.e. #SBATCH lines) must be at the top of your submission script; Slurm ignores them once it encounters any non-Slurm command.
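As a minimal illustration of this rule, the second directive in the sketch below comes after an ordinary command, so Slurm ignores it:
#!/bin/bash
#SBATCH --job-name=placement_demo
echo "starting"             # first non-Slurm command
#SBATCH --time=01:00:00     # ignored: appears after an ordinary command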
Short Format | Long Format | Default | Description |
---|---|---|---|
-N count | --nodes=count | One | Number of nodes allocated to your job. |
-A accountID | --account=accountID | Your default | The account ID for your group. You can check your available account(s) with the id command. |
-t HH:MM:SS | --time=HH:MM:SS | Partition default | Always specify the maximum wallclock time for your job; the maximum is 7 days. |
-p name | --partition=name | comp | Specify your partition. |
-n count | --ntasks=count | One | Number of tasks (Unix processes) to be created for the job. |
N/A | --ntasks-per-node=count | One | Maximum number of tasks per allocated node. |
-c count | --cpus-per-task=count | One | Number of CPUs allocated per task. |
N/A | --mem-per-cpu=size | 4096MB | Memory size per CPU. |
N/A | --mem=size | 4096MB | Total memory size per node. |
-J job_name | --job-name=job_name | Script name | Job name: up to 15 printable, non-whitespace characters. |
N/A | --gres=gpu:1 | N/A | Generic consumable resources, e.g. a GPU. |
N/A | --no-requeue | --requeue | By default, the job will be requeued after a node failure. |
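Below is a sketch that combines several of the options above. The job name, program name, and resource values are placeholders, and you should check which partition on your cluster provides GPUs before requesting one:
#!/bin/bash
#SBATCH --job-name=combined_example
#SBATCH --partition=comp
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=4096
#SBATCH --time=02:00:00
#SBATCH --gres=gpu:1
./my_program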
Running Simple Batch Jobs
Submitting a job to Slurm is performed by running the sbatch command and specifying a job script.
sbatch job.script
You can supply options (e.g. --ntasks=xx) to the sbatch command. If an option is already defined in the job.script file, it will be overridden by the command-line argument.
sbatch [options] job.script
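For example, the following (with placeholder values) runs job.script but overrides any --ntasks and --time directives defined inside it:
sbatch --ntasks=4 --time=01:00:00 job.script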
Cancelling jobs
To cancel one job
scancel [JOBID]
To cancel all of your jobs
scancel -u [USERID]
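scancel also accepts filters; for example, the following standard options cancel only your pending jobs:
scancel --state=PENDING -u [USERID]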