Clariden is a vcluster (versatile software-defined cluster) that's part of the Alps system.
A short summary of the hardware available on the nodes:
Partition | Nodes | GPUs per node | GPU type | GPU memory | Max time |
---|---|---|---|---|---|
normal | 1300 | 4 | GH200 | 96 GB | 12 hours |
debug | 0 | 4 | GH200 | 96 GB | 30 minutes |
Each node consists of four GH200 superchips. Each superchip is a unified-memory system pairing a Grace CPU with a Hopper GPU over a 900 GB/s NVLink-C2C interconnect. The four Grace CPUs share 512 GB of LPDDR5X memory, and each Hopper GPU has 96 GB of HBM3 memory with 3000 GB/s read/write bandwidth, for a total of 896 GB of unified memory per node.
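You can inspect this layout from a compute node yourself. A minimal sketch, assuming short jobs are accepted on the `normal` partition and that your default account applies:

```bash
# Print the GPU/interconnect topology of one node:
srun -p normal -N 1 -t 00:05:00 nvidia-smi topo -m

# Show the NUMA layout (requires numactl), which exposes the Grace memory domains:
srun -p normal -N 1 -t 00:05:00 numactl --hardware
```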
More information on the available partitions can be found with `scontrol show partitions`.
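For example, to inspect one partition's limits or get a quick view of node availability:

```bash
scontrol show partition normal   # limits, defaults, and node list for 'normal'
sinfo -p normal,debug            # current node states per partition
```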
SSH
Clariden is reached from the frontend Ela (`ssh <username>@ela.cscs.ch`). Access to CSCS services and systems requires users to authenticate using multi-factor authentication (MFA); see the CSCS documentation for details. Log in to the frontend with `ssh -A ela.cscs.ch`, and from there, `ssh clariden`.
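For a one-off connection without any configuration, OpenSSH's `-J` (ProxyJump) flag performs the same hop in a single command:

```bash
# Jump through Ela to Clariden in one step (replace <username>):
ssh -J <username>@ela.cscs.ch <username>@clariden.cscs.ch
```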
It's possible to create an SSH tunnel to Clariden via Ela. Add the following to your personal computer's `~/.ssh/config` file (replacing `<username>` with your CSCS username):
```
Host ela
    Hostname ela.cscs.ch
    User <username>
    AddKeysToAgent yes
    ForwardAgent yes

Host clariden
    Hostname clariden.cscs.ch
    User <username>
    ProxyJump ela
    IdentityFile ~/.ssh/cscs-key
```
Now you should be able to access Clariden directly with `ssh clariden`.
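A quick way to verify the setup, assuming the config above is in place:

```bash
ssh clariden hostname       # should print a Clariden login-node hostname
scp localfile clariden:~/   # file transfers also route through the jump host
```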
Some typical/helpful Slurm commands:

Command | Purpose |
---|---|
`sbatch` | submit a batch script |
`squeue` | check the status of jobs on the system |
`scancel` | delete one of your jobs from the queue |
`srun` | launch commands in an existing allocation |
`srun --interactive --jobid <jobid> --pty bash` | start an interactive session on an allocated node |
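These commands compose into a typical interactive workflow. A minimal sketch; the partition, node count, and time limit below are example values:

```bash
salloc -N 1 -p normal -t 01:00:00               # request one node; note the printed job ID
squeue -u $USER                                 # confirm the allocation is running, find <jobid>
srun --interactive --jobid <jobid> --pty bash   # open a shell on the allocated node
exit                                            # leave the node; the allocation ends when salloc exits
```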
Example of Slurm job script
```bash
#!/bin/bash -l
#SBATCH --job-name=<jobname>
#SBATCH --time=00:15:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=16

srun executable
```
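Assuming the script above is saved as `job.sh` (the name is arbitrary), submitting and monitoring it looks like:

```bash
sbatch job.sh        # prints "Submitted batch job <jobid>"
squeue -u $USER      # watch the job's state (PD = pending, R = running)
scancel <jobid>      # remove the job if it is no longer needed
```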