...
Partition | Nodes | GPUs per node | GPU | GPU memory | Max time |
---|---|---|---|---|---|
normal | 1298 | 4 | GH200 | 96GB | 24 hours |
debug | 32 | 4 | GH200 | 96GB | 30 minutes |
Each node consists of four GH200 superchips. Each superchip is a unified-memory system pairing a Grace CPU with a Hopper GPU over a 900GB/s NVLink-C2C interconnect. The four Grace CPUs share 512GB of LPDDR5X memory, and each Hopper GPU has 96GB of HBM3 memory with 3000GB/s read/write bandwidth, for a total of 896GB of unified memory available within each node.
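As a quick sanity check, the 896GB figure is simply the sum of the four GPUs' HBM3 and the shared CPU memory:

```shell
# Back-of-the-envelope check of per-node unified memory (values from the table above)
GPUS_PER_NODE=4
HBM_PER_GPU_GB=96      # HBM3 on each Hopper GPU
LPDDR_GB=512           # LPDDR5X shared by the Grace CPUs
TOTAL_GB=$((GPUS_PER_NODE * HBM_PER_GPU_GB + LPDDR_GB))
echo "${TOTAL_GB}GB per node"   # 896GB per node
```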
...
- Clariden can be reached via `ssh` from the frontend Ela (`ssh <username>@ela.cscs.ch`). Access to CSCS services and systems requires users to authenticate using multi-factor authentication (MFA); please find more information here.
- Account and Resources Management Tool: access to Clariden is managed through Waldur (https://portal.cscs.ch/). For SwissAI, your access to Clariden is managed by your respective vertical/horizontal project administrators and project managers (typically your PI).
- Usage policies
Connecting to Clariden
...
- Clariden uses the Slurm workload manager to manage job scheduling. Some typical/helpful Slurm commands are:

Command | Description |
---|---|
`sbatch` | submit a batch script |
`squeue` | check the status of jobs on the system |
`scancel` | delete one of your jobs from the queue |
`srun` | launch commands in an existing allocation |
`srun --interactive --jobid <jobid> --pty bash` | start an interactive session on an allocated node |

Example of a Slurm job script:
```bash
#!/bin/bash -l
#SBATCH --job-name=<jobname>
#SBATCH --time=00:15:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=16
#SBATCH --account=a-a**

srun executable
```
- You must specify the account attached to the project you are in. You can do this either in the submission script as above, or on the command line, for example:
`sbatch -A a-a11 myscript.sh`
- Your project account can be identified through Waldur. Simply go to https://portal.cscs.ch/ → Resources → HPC → find the vertical/horizontal you are in, and click on it to see more detailed information.
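Putting the commands above together, a minimal submit-and-monitor workflow might look like the following sketch (the account `a-a11` and `<jobid>` are illustrative placeholders; use your own project account from Waldur):

```shell
# Submit the batch script, charging your project account (illustrative: a-a11)
sbatch -A a-a11 myscript.sh    # prints: Submitted batch job <jobid>

# Check the status of your own jobs in the queue
squeue -u $USER

# Cancel a job by its ID if it is no longer needed
scancel <jobid>
```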
- Please note that the Slurm scheduling system is a shared resource that can handle only a limited number of batch jobs and interactive commands simultaneously. Users should therefore not submit large numbers of Slurm jobs and commands at the same time.
...