About MonARCH
MonARCH is a pioneering high-performance computing cluster built on Monash's specialist Research Cloud fabric. It was supplied by Dell with a Mellanox low-latency network and NVIDIA GPUs.
System configuration
The MonARCH cluster primarily serves the university's HPC users and remains distinct and independent from MASSIVE M3, although the two systems are closely aligned. Specifically, MonARCH features:
- two dedicated login nodes and a dedicated data transfer node (like on MASSIVE M3);
- over 60 servers, totalling over 1600 CPU cores;
- 15 GPU nodes, with a mix of NVIDIA Tesla P100 (http://www.nvidia.com/object/tesla-p100.html) cards and K80 (https://www.nvidia.com/en-gb/data-center/tesla-k80/) cards;
- a SLURM scheduler with service redundancy, offering better stability and new features to improve fair share;
- a website for MonARCH HPC user documentation; and
- a convergence to a single HPC software module environment, shared with MASSIVE M3.
Hardware
Name | CPU | Number of Cores / Server | Usable Memory / Server | Notes |
---|---|---|---|---|
mi* | Xeon-Gold 6150 @ 2.70GHz | 36 | 158893MB | |
hi* | Xeon-Gold 6150 @ 2.70GHz | 27 | 131000MB | Same hardware as mi* nodes, but with fewer cores and less memory in the VM |
ga* | Xeon-Gold-6330 @ 3.10GHz | 56 | 754178MB | Each server has two A100 GPU devices |
gd* | Xeon-Gold-6448Y @ 4.1GHz | 64 | 774551MB | Each server has two A40 GPU devices |
hm00 | Xeon-Gold-6150 @ 2.70GHz | 26 | 1419500MB | Specialist High Memory ~1.4TB machine. Please contact support to get access |
md* | Xeon-Gold-5220R @ 2.20GHz | 48 | 735000MB | The most recent MonARCH nodes, which are bare metal |
mk* | Xeon-Platinum-8260 @ 2.50GHz | 48 | 342000MB | |
ms* | Xeon-Gold-6338 @ 2.00GHz | 64 | 505700MB | The most recent MonARCH nodes |
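The table above is a point-in-time summary; the live node inventory can always be queried with standard SLURM commands. A minimal sketch (the format string is standard SLURM; the columns are node name, CPU count, memory in MB, and generic resources such as GPUs):

```bash
# List every node with its CPU count, memory (MB) and GRES (e.g. GPUs)
sinfo -N -o "%N %c %m %G"
```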
Login Information
MonARCH has two interactive login nodes and one node dedicated to data transfers. The hostnames for these are:
Hostname | Purpose |
---|---|
monarch.erc.monash.edu | This alias will take you to one of the two login nodes below |
monarch-login4.erc.monash.edu | The first login node of MonARCH |
monarch-login5.erc.monash.edu | The second login node of MonARCH |
monarch-dtn.erc.monash.edu | This alias will take you to our dedicated data transfer node for large file transfers and rsync operations |
monarch-dtn2.erc.monash.edu | A dedicated data transfer node ideal for large file transfers and rsync operations |
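As an illustration, logging in and staging data from a local machine might look like the following; the username and paths are placeholders, and any standard SSH/rsync client should work:

```bash
# Log in to one of the interactive login nodes (replace "username" with your own)
ssh username@monarch.erc.monash.edu

# Stage a local directory onto MonARCH via the data transfer node
rsync -avz --progress ./my_dataset/ username@monarch-dtn.erc.monash.edu:~/my_dataset/
```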
MonARCH vs M3
MonARCH and M3 share the same user identity system. However, users of one cluster cannot log in to the other unless they belong to an active project on that cluster.
Hyperthreading
All nodes on MonARCH V2 have hyperthreading turned off for performance reasons.
Software Stack
MonARCH V2 uses the M3 software stack:
- /usr/local for CentOS 7 software
- /apps for Rocky 9 software
This software is made available using environment modules (see the example below); this is explained further in Software on MonARCH.
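As a rough sketch of how the module environment is used (the module name below is only an example; run module avail to see what is actually installed):

```bash
# Show the modules available in the shared software stack
module avail

# Load a module into the current shell (gcc is purely illustrative)
module load gcc

# Confirm what is currently loaded
module list
```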
SLURM Partitions
MonARCH V2's SLURM scheduler currently uses a simple partition (queue) structure (see the example job script after this list):
- comp for CPU-only jobs of up to seven days
- gpu for GPU jobs of up to seven days
- short for jobs with a wall time of less than 24 hours
- himem for the high memory node only. Please contact support to get access to this partition.
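As an example only, a minimal batch script for the comp partition might look like the sketch below; the job name, resource requests, module, and program are all placeholders rather than recommendations:

```bash
#!/bin/bash
#SBATCH --job-name=example_job      # placeholder job name
#SBATCH --partition=comp            # CPU-only partition, up to seven days of wall time
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4           # example CPU request
#SBATCH --mem=16G                   # example memory request
#SBATCH --time=1-00:00:00           # one day of wall time

# Load whatever software the job needs (module name is illustrative)
module load gcc

# Run the workload
srun ./my_program
```

A GPU job would typically use --partition=gpu together with a GPU request such as --gres=gpu:1, and a high-memory job would use --partition=himem once access has been granted.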
MonARCH uses SLURM's QOS (Quality of Service) feature to control access to different parts of the cluster. All users belong to a default QOS called normal. Users may be directed to use a different QOS at times (e.g. to use a Partner Share).
How to examine the QOS:
```bash
sacctmgr show qos normal format="Name,MaxWall,MaxCPUSPerUser,MaxTresPerUser%20"
```

```
  Name     MaxWall    MaxCPUsPU   MaxTRESPU
normal  7-00:00:00           64   cpu=64,gres/gpu=3
```
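Jobs run under the normal QOS unless told otherwise. If you have been granted access to another QOS (for example a Partner Share), it can be requested with SLURM's standard --qos option; the QOS name below is a placeholder:

```bash
# Submit a job under a specific QOS (replace "partner_qos" with the name you were given)
sbatch --qos=partner_qos my_job.sh
```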
We also have a helpful script, mon_qos, that prints out the QOS values and the ones you have access to:

```bash
mon_qos
```