How M3 works
M3 is a High-Performance Computing (HPC) cluster: thousands of CPUs, along with many GPUs, connected by very fast networking. This setup is ideal for workloads that can be parallelised, that is, jobs that can be split into smaller chunks that run simultaneously. Running such a workload in parallel on M3 can be dramatically faster than running it on your personal computer.
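As a rough illustration of what "parallelised" means, the sketch below splits a list of numbers into chunks, processes each chunk in its own worker process, and combines the partial results. It uses only the Python standard library; the function names and the sum-of-squares workload are made up for this example and are not part of M3 or Slurm.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The work done on one chunk; it needs nothing from the other chunks."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, workers):
    """Process each chunk in a separate worker process, then combine the results."""
    chunks = split_into_chunks(items, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    return sum(partial_results)
```

When running this as a script, call `parallel_sum_of_squares` from under an `if __name__ == "__main__":` guard so that worker processes can start cleanly on every platform. The approach only pays off because each chunk is independent of the others; work that must happen strictly in sequence gains little from extra CPUs.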
M3 is made up of many nodes. There are three main types of node:
Type of node | How many? | Purpose |
---|---|---|
Compute | Many; these make up most of M3 | Run all of your computations on these |
Login | 1-2 | You initially connect to a login node, from where you can submit jobs to run on compute nodes |
Data-transfer (DTN) | 1-2 | Use these for large file transfers to and from M3 |
The login nodes are lightweight and are shared by many users at once. You must not run heavy workloads on the login nodes, since this degrades the node's performance for every user and can even render it inaccessible. We will kill any heavyweight processes that we find on a login node and notify you when this happens. If you do this again after being warned, your access to M3 may be revoked.
Every M3 user can freely connect to the login nodes, but you cannot simply connect to a compute node to start running your workload. Instead, we rely on a job scheduler called Slurm. Slurm is responsible for managing all of the resources on M3 (CPUs, GPUs, memory, nodes, and so on) and sharing those resources fairly between all users on M3. The basic idea is:
- You connect to a login node.
- You submit a job allocation request to Slurm. This includes specifics about how many CPUs you want, how much memory, how much time, etc.
- Slurm places your job in its queue. It will quickly identify a good time to start running your job and which exact resources to allocate for your job.
- Your job will eventually run on the allocated resources, i.e. on compute nodes.
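To make the request in the second step concrete, here is a minimal sketch in Python that builds a Slurm batch script and hands it to `sbatch`, Slurm's job submission command. The `#SBATCH` options shown (`--job-name`, `--ntasks`, `--cpus-per-task`, `--mem`, `--time`) are standard Slurm options, but the helper names, the resource values, and the `python my_analysis.py` command are placeholders chosen for illustration. M3 may require further options that are not shown here.

```python
import subprocess
from pathlib import Path


def build_job_script(job_name, cpus, memory_gb, minutes, command):
    """Return the text of a Slurm batch script requesting the given resources."""
    hours, mins = divmod(minutes, 60)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --ntasks=1",                         # one task (process)
        f"#SBATCH --cpus-per-task={cpus}",            # how many CPUs you want
        f"#SBATCH --mem={memory_gb}G",                # how much memory
        f"#SBATCH --time={hours:02d}:{mins:02d}:00",  # time limit, HH:MM:SS
        "",
        command,                                      # runs on the compute node
    ]
    return "\n".join(lines) + "\n"


def submit_job(script_text, path="job.sh"):
    """Write the script to disk and submit it with sbatch (run this on a login node)."""
    Path(path).write_text(script_text)
    result = subprocess.run(
        ["sbatch", str(path)], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


# Hypothetical request: 4 CPUs, 8 GB of memory, 90 minutes.
script = build_job_script("demo", 4, 8, 90, "python my_analysis.py")
```

Calling `submit_job(script)` on a login node places the job in Slurm's queue. The command on the last line of the script only starts once Slurm has allocated the requested resources on a compute node.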
There is still a lot for you to learn about how to run jobs on M3! Dive into Running jobs on M3 to learn more.