Default Values For Selecting Hardware
A number of SLURM mechanisms are available to select different hardware:
- partitions
- QOS
- gres
- constraint
Not all of these mechanisms need to be specified when submitting a job; they are listed here for completeness. Users have a default partition and QOS, so they do not need to specify them when submitting jobs. However, doing so does no harm and can serve as a reminder of their values.
Name | Default Value |
---|---|
partition | comp |
QOS | normal |
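For example, a submission script that states these defaults explicitly might contain the lines below (a minimal sketch; the job name is a placeholder):
#SBATCH --job-name=my-job
#SBATCH --partition=comp
#SBATCH --qos=normal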
The default values are likely to change over time as we add new hardware and optimize the system. The current values can be found with the commands below. For more information on their output, please consult the manual pages.
man scontrol
man sacctmgr
Command | Output |
---|---|
scontrol show partitions | Lists all of our partitions, currently short, comp and gpu |
scontrol show partition comp | Detailed information on the comp partition, including the maximum wall time (7 days) and the default memory per CPU (4096M) |
sacctmgr show qos normal format="Name,MaxWall,MaxCPUSPerUser,MaxTresPerUser%20" | Shows the limits of the normal QOS: MaxWall 7-00:00:00, MaxCPUsPU 65, MaxTRESPU cpu=65,gres/gpu=3 |
Please note that MonARCH uses the QOS to control how much of the cluster a single user can use. Under the normal QOS a user has:
- a maximum of 65 CPUs (cores)
- a maximum of 3 GPU cards
- a maximum wall time of 7 days
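As a sketch, a job that stays within these limits might request the following; the CPU count, walltime and memory values are illustrative, not recommendations:
#SBATCH --partition=comp
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --time=2-00:00:00
#SBATCH --mem-per-cpu=4096M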
Partitions Available
MonARCH hardware is split into several partitions.
The default partition for all submitted jobs is:
- comp for compute nodes
Other partitions include:
- short for jobs with a walltime of less than 1 day. These jobs will run only on former MonV1 hardware.
- gpu for the GPU nodes
Example: To use the short partition for a job needing less than 1 hour, put this in your SLURM submission script.
#SBATCH --partition=short
Selecting a particular CPU Type
The available hardware consists of several sorts of nodes, all of which have hyper-threading turned off:
- mi* nodes are 36 core Xeon-Gold-6150 @ 2.70GHz servers with 158893MB usable memory.
- gp* nodes are 28 core Xeon-E5-2680-v4 @ 2.40GHz servers with 241660MB usable memory. Each gp server has two P100 GPU cards.
- mk* nodes are 48 core Xeon-Platinum-8260 @ 2.4GHz servers with 342000M usable memory.
- md* nodes are 48 core Xeon-Gold-5220R @ 2.20GHz servers with 735000M usable memory. Each server has two processors with 24 cores each.
- hm00. This single node is a 36 core Xeon-Gold-6150 @ 2.7GHz server with 1.4TB of usable memory.
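If you want to see which nodes carry which CPU type and how much memory they have, one way to list them (a sketch; see man sinfo for the format specifiers available on your system) is:
sinfo -N -o "%N %f %c %m"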
Sometimes users may want to restrict a job to a particular CPU type, e.g. for timing reasons. In this case, they need to specify a constraint flag in the SLURM submission script; the flag takes the required CPU type as its parameter. The CPU type of a particular node can be viewed by running this command:
scontrol show node <nodename>
and then looking for the Features field in the output.
Examples:
# this directive requests only the mi* nodes, which have Xeon-Gold-6150 processors
#SBATCH --constraint=Xeon-Gold-6150
Constraints should only be used if you must have a particular processor; jobs will schedule faster if you do not use them.
Selecting a particular server
Users can restrict their jobs to a particular server if they wish.
Example: Only run jobs on server ge00
#SBATCH --nodelist=ge00
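Before pinning a job to a single server, it can be worth checking that the node is up and available. One possible check, using the node name ge00 from the example above, is the command below; the %t field shows the node state (for example idle, mix or alloc):
sinfo -n ge00 -o "%n %t"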
Selecting a GPU Node
To request one or more GPU cards, you need to specify:
- the gpu partition
- the name and type of GPU in a gres statement. Your running program will only be allowed access to the number of cards that you specify.
You should not use the constraint feature described above.
# these directives request one P100 card on a node (a gp* machine)
#SBATCH --partition=gpu
#SBATCH --gres=gpu:P100:1
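Putting this together, a fuller sketch of a GPU submission script might look like the following; the job name, CPU count, walltime and memory values are illustrative placeholders, not recommendations:
#!/bin/bash
#SBATCH --job-name=gpu-example
#SBATCH --partition=gpu
#SBATCH --gres=gpu:P100:1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=0-01:00:00
#SBATCH --mem=16G

# nvidia-smi reports the GPU card(s) allocated to this job
nvidia-smi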