About MonARCH

MonARCH is a high performance computing service built on Monash's specialist Research Cloud fabric. It was supplied by Dell with a Mellanox low-latency network and NVIDIA GPUs.

System configuration​

The MonARCH cluster serves the university's HPC users as its primary community, and remains distinct and independent from MASSIVE M3. However, it is closely aligned with M3. Specifically, MonARCH features:

  • two dedicated login nodes and a dedicated data transfer node (like on MASSIVE M3);
  • over 60 servers, totalling over 1600 CPU cores;
  • 15 GPU nodes, with a mix of NVIDIA Tesla P100 (http://www.nvidia.com/object/tesla-p100.html) cards and K80 (https://www.nvidia.com/en-gb/data-center/tesla-k80/) cards;
  • a SLURM scheduler with service redundancy, with better stability and new features to improve fair share;
  • a website for MonARCH HPC user documentation; and
  • a convergence to a single HPC software module environment, shared with MASSIVE M3.

Hardware​

Name | CPU                          | Number of Cores / Server | Usable Memory / Server | Notes
mi*  | Xeon Gold 6150 @ 2.70GHz     | 36 | 158893MB  |
hi*  | Xeon Gold 6150 @ 2.70GHz     | 27 | 131000MB  | Same hardware as mi* nodes, but with fewer cores and less memory in the VM
ga*  | Xeon Gold 6330 @ 3.10GHz     | 56 | 754178MB  | Each server has two A100 GPU devices
gd*  | Xeon Gold 6448Y @ 4.1GHz     | 64 | 774551MB  | Each server has two A40 GPU devices
hm00 | Xeon Gold 6150 @ 2.70GHz     | 26 | 1419500MB | Specialist high-memory (~1.4TB) machine. Please contact support to get access
md*  | Xeon Gold 5220R @ 2.2GHz     | 48 | 735000MB  | The most recent MonARCH nodes, which are bare metal
mk*  | Xeon Platinum 8260 @ 2.50GHz | 48 | 342000MB  |
ms*  | Xeon Gold 6338 @ 2.00GHz     | 64 | 505700MB  | The most recent MonARCH nodes
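
You can inspect these node types yourself with standard SLURM commands once logged in. A minimal sketch (the node name in the second command is only a placeholder; substitute a real node name from the sinfo output):

sinfo -N -o "%N %c %m %G"    # list each node with its CPU count, memory (MB) and GPU resources
scontrol show node mi001     # detailed view of a single node (mi001 is a placeholder name)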

Login Information​

MonARCH has two interactive login nodes and dedicated data transfer nodes. The hostnames for these are:

Hostname                      | Purpose
monarch.erc.monash.edu        | This alias will take you to one of the two login nodes below
monarch-login4.erc.monash.edu | The first login node of MonARCH
monarch-login5.erc.monash.edu | The second login node of MonARCH
monarch-dtn.erc.monash.edu    | This alias will take you to our dedicated data transfer node for large file transfers and rsync operations
monarch-dtn2.erc.monash.edu   | A dedicated data transfer node ideal for large file transfers and rsync operations
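
For example, you can connect with SSH and stage data through a data transfer node. A minimal sketch (authcate and the directory names are placeholders; use your own username and paths):

ssh authcate@monarch.erc.monash.edu                                       # interactive login
rsync -avP ./my_dataset/ authcate@monarch-dtn.erc.monash.edu:my_dataset/  # copy data via the DTN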

MonARCH vs M3​

MonARCH and M3 share the same user identity system. However, users on one cluster cannot log into the other unless they belong to an active project on that cluster.

Hyperthreading​

All nodes on MonARCH V2 have hyperthreading turned off for performance reasons.

Software Stack​

MonARCH V2 uses the M3 software stack:

  • /usr/local for CentOS 7 software
  • /apps for Rocky 9 software

This software is made available using environment modules. This is explained in Software on MonARCH.
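
The usual environment modules commands apply. A minimal sketch (gcc is only an example module name; run module avail to see what is actually installed):

module avail         # list the software modules available on the cluster
module load gcc      # load a module into your environment (example name only)
module list          # show the modules currently loaded
module purge         # unload all modules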

SLURM Partitions​

MonARCH V2's SLURM scheduler currently uses a simple partition (queue) structure:

  • comp for CPU-only jobs of up to seven days long
  • gpu for GPU jobs of up to seven days long
  • short for jobs with a wall time of less than 24 hours
  • himem for the high memory node only. Please contact support to get access to this partition.
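
For example, a minimal batch script for the comp partition might look like the sketch below (the job name, resources and program are placeholders; for GPU jobs, use --partition=gpu and add a --gres=gpu:1 request):

#!/bin/bash
#SBATCH --job-name=example_job    # placeholder job name
#SBATCH --partition=comp          # CPU-only partition; up to seven days
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=1-00:00:00         # one day of wall time

module load gcc                   # example module only; load what your job needs
srun ./my_program                 # placeholder executable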

MonARCH uses SLURM's QOS (Quality of Service) feature to control access to different features of the cluster. All users belong to a default QOS called normal. Users may be directed to use a different QOS at times (e.g. to use a Partner Share).
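
If you have been directed to a different QOS, you would normally request it in your job script or on the command line. A minimal sketch (partner_qos is a placeholder; use the QOS name you have been given):

#SBATCH --qos=partner_qos          # in a batch script
sbatch --qos=partner_qos job.sh    # or when submitting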

How to examine the QOS:​

sacctmgr show qos normal format="Name,MaxWall,MaxCPUSPerUser,MaxTresPerUser%20"
   Name    MaxWall MaxCPUsPU         MaxTRESPU
 normal 7-00:00:00        64 cpu=64,gres/gpu=3

We have a helpful script, mon_qos, that prints out the QOS values and shows which ones you have access to:

mon_qos