Checking the status of MonARCH
On MonARCH, users can check the status of all nodes via the show_cluster command. The output of this command should be similar to:
show_cluster
NODE TYPE PARTITION Mem (GB) GPU STATUS
(Free) (Free)
gp00 P100 comp 26 181 0 Running
gp01 P100 comp 27 209 1 Running
gp02 P100 comp 28 236 2 Idle
gp03 P100 comp 28 236 2 Idle
gp04 P100 comp 28 236 2 Idle
gp05 P100 comp 28 236 2 Idle
hc00 CPU comp 24 98 0 Idle
hs00 CPU comp 16 98 0 Idle
hs01 CPU comp 16 98 0 Idle
hs02 CPU comp 16 98 0 Idle
hs03 CPU comp 16 98 0 Idle
hs04 CPU comp 16 98 0 Idle
hs05 CPU comp 16 98 0 Idle
mi00 CPU comp 36 155 0 Idle
mi01 CPU comp 36 155 0 Idle
mi02 CPU comp 36 155 0 Idle
mi03 CPU comp 36 155 0 Idle
mi04 CPU comp 36 155 0 Idle
mi05 CPU comp 36 155 0 Idle
mi06 CPU comp 36 155 0 Idle
mi07 CPU comp 36 155 0 Idle
mi08 CPU comp 36 155 0 Idle
mi09 CPU comp 36 155 0 Idle
mi10 CPU comp 36 155 0 Idle
mi11 CPU comp 36 155 0 Idle
Summary:
+------------+-------------+------------+------------+------------+-------------+-------------+
| | Cores | Nodes | K1 GPUs | K80 GPUs | P100 GPUs | V100 GPUs |
|------------+-------------+------------+------------+------------+-------------+-------------|
| Available | 717 (100%) | 23 (92%) | 0 ( 0%) | 0 ( 0%) | 9 (75%) | 0 ( 0%) |
| In Use | 3 ( 0%) | 2 ( 8%) | 0 ( 0%) | 0 ( 0%) | 3 (25%) | 0 ( 0%) |
| Down | 0 ( 0%) | 0 ( 0%) | 0 ( 0%) | 0 ( 0%) | 0 ( 0%) | 0 ( 0%) |
| Reserved | 0 ( 0%) | 0 ( 0%) | 0 ( 0%) | 0 ( 0%) | 0 ( 0%) | 0 ( 0%) |
| ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| Total | 720 | 25 | 0 | 0 | 12 | 0 |
+------------+-------------+------------+------------+------------+-------------+-------------+
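For scripting, the per-node rows can be parsed into records. The following is a minimal Python sketch, not part of MonARCH's own tooling. It assumes each node row has exactly seven whitespace-separated fields, and that the three numeric columns are free cores, free memory in GB and free GPUs. The header above labels only the memory and GPU columns, so reading the first numeric column as free cores is an inference from the summary totals. The names `SAMPLE` and `parse_rows` are illustrative.

```python
import re

# A few node rows copied from the sample output above.
SAMPLE = """\
gp00 P100 comp 26 181 0 Running
gp01 P100 comp 27 209 1 Running
gp02 P100 comp 28 236 2 Idle
hc00 CPU comp 24 98 0 Idle
"""

# Assumed row layout: node, type, partition, free cores, free mem (GB),
# free GPUs, status. Header and summary-table lines do not match this.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)$")


def parse_rows(text):
    """Return one dict per node row; lines that are not node rows are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue
        node, node_type, partition, cores, mem_gb, gpus, status = match.groups()
        rows.append({
            "node": node,
            "type": node_type,
            "partition": partition,
            "free_cores": int(cores),
            "free_mem_gb": int(mem_gb),
            "free_gpus": int(gpus),
            "status": status,
        })
    return rows


rows = parse_rows(SAMPLE)
free_cores = sum(r["free_cores"] for r in rows)
free_gpus = sum(r["free_gpus"] for r in rows)
print(free_cores, free_gpus)
```

To work on live data, the text could instead be captured on a login node with `subprocess.run(["show_cluster"], capture_output=True, text=True).stdout`, assuming the real output keeps the same row layout as the sample.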
The STATUS field explained
The STATUS field can show one of the following values:
- Idle - Node is completely free. No jobs are running on the node.
- Running - Some jobs are running on the node but it still has available resources for new jobs.
- Busy - Node is completely busy. There are no free resources on the node. No new jobs can start on this node.
- Offline - Node is offline and unavailable due to a system issue.
- Reserved - Node has been booked by other users and is ONLY available for them.
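The status values above can be turned into a simple check of whether a node can take new work. This is a hedged Python sketch: the function name `can_start_new_jobs` is hypothetical, and treating Reserved as unavailable assumes you are not one of the users holding the reservation.

```python
from collections import Counter

# Statuses from the list above, grouped by whether new jobs can start.
ACCEPTS_NEW_JOBS = {"Idle", "Running"}
NO_NEW_JOBS = {"Busy", "Offline", "Reserved"}  # Reserved: assumes no booking of your own


def can_start_new_jobs(status):
    """Return True if a node with this STATUS may still accept new jobs."""
    if status in ACCEPTS_NEW_JOBS:
        return True
    if status in NO_NEW_JOBS:
        return False
    raise ValueError(f"unknown status: {status!r}")


# Statuses of gp00, gp01, gp02 and hc00 in the sample output.
statuses = ["Running", "Running", "Idle", "Idle"]
counts = Counter(statuses)
usable = sum(can_start_new_jobs(s) for s in statuses)
print(dict(counts), usable)
```

Raising on an unrecognised value, rather than returning False, is a deliberate choice so that a status not covered by the list is noticed instead of being silently treated as unavailable.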