Clariden is a vcluster (versatile software-defined cluster) that's part of the Alps system.
A short summary of the hardware available on the compute nodes:
Partition | Nodes | GPUs per node | GPU | GPU memory | Max time
---|---|---|---|---|---
nvgpu | 18 | 4 | NVIDIA A100 | 80 GB | 1 day
amdgpu | 12 | 8 | AMD MI200 | 64 GB | 1 day
normal | 200 | 0 | - | - | 1 day
clariden | 200 | 0 | - | - | 30 minutes
More information on the partitions can be found with `scontrol show partitions`.
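For example, to inspect the configuration of a single partition (`nvgpu` here, taken from the table above):

```bash
# Show the full configuration of one partition, including limits and node list
scontrol show partition nvgpu
```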
## Access and Accounting
- Clariden can be reached via `ssh` from the frontend Ela (`ssh <username>@ela.cscs.ch`). Access to CSCS services and systems requires users to authenticate using multi-factor authentication (MFA); please find more information here.
- Account and Resources Management Tool
- Usage policies
## Connecting to Clariden
Clariden can be accessed with `ssh -A ela` (the frontend) and, from there, `ssh clariden`.
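Spelled out with the full hostname (substituting your CSCS user name), the two-hop login looks like this:

```bash
# First hop: log in to the Ela frontend with agent forwarding
ssh -A <username>@ela.cscs.ch

# Second hop: from Ela, continue on to Clariden
ssh clariden
```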
It is possible to create an SSH tunnel to Clariden via Ela. The following must be added to the `~/.ssh/config` file on your personal computer (replacing `<username>` with your CSCS user name):
```
Host ela
    Hostname ela.cscs.ch
    User <username>
    AddKeysToAgent yes
    ForwardAgent yes

Host clariden
    Hostname clariden.cscs.ch
    User <username>
    ProxyJump ela
    IdentityFile ~/.ssh/cscs-key
```
Now you should be able to access Clariden directly with `ssh clariden`.
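With this configuration in place, file transfers are proxied through Ela automatically as well; a minimal sketch (the file and directory names are just examples):

```bash
# Copy a local file to your home directory on Clariden via the ela jump host
scp results.tar.gz clariden:~/

# rsync uses the same Host alias and ProxyJump settings
rsync -av ./data/ clariden:~/data/
```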
## Running Jobs
- Clariden uses the Slurm workload manager for job scheduling.
Some typical/helpful Slurm commands are:

Command | Description
---|---
`sbatch` | submit a batch script
`squeue` | check the status of jobs on the system
`scancel` | delete one of your jobs from the queue
`srun` | launch commands in an existing allocation
`srun --interactive --jobid <jobid> --pty bash` | start an interactive session on an allocated node
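As a sketch of the interactive workflow (partition, node count, and time limit are illustrative), you can allocate a node and then attach a shell to it:

```bash
# Allocate one node on the nvgpu partition for 30 minutes (values are examples)
salloc --nodes=1 --partition=nvgpu --time=00:30:00

# Attach an interactive shell to the allocation,
# using the job id printed by salloc
srun --interactive --jobid <jobid> --pty bash
```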
Example of a Slurm job script:

```bash
#!/bin/bash -l
#SBATCH --job-name=<jobname>
#SBATCH --time=00:15:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=16
#SBATCH --partition=nvgpu
#SBATCH --account=<project>

srun executable
```
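To submit and monitor the script (assuming it is saved as `job.sh`, a hypothetical file name):

```bash
# Submit the batch script; Slurm prints the job id on success
sbatch job.sh

# Check the status of your own jobs in the queue
squeue -u $USER
```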
- Currently there is no accounting of compute time, but it is expected to be set up.
- Please note that the Slurm scheduling system is a shared resource that can handle only a limited number of batch jobs and interactive commands simultaneously. Users should therefore not submit large numbers of Slurm jobs and commands at the same time.