Clariden is a vcluster (versatile software-defined cluster) that's part of the Alps system.
A short summary of the hardware available on the compute nodes:
Partition | Nodes | GPUs per node | GPU | GPU memory | Max time
---|---|---|---|---|---
nvgpu | 18 | 4 | NVIDIA A100 | 80 GB | 1 day
amdgpu | 12 | 8 | AMD MI200 | 64 GB | 1 day
normal | 200 | 0 | - | - | 1 day
clariden | 200 | 0 | - | - | 30 minutes
More information on the partitions can be found with `scontrol show partitions`.
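For example, to inspect the configuration of a single partition (`nvgpu` here, taken from the table above):

```bash
# Show the full configuration of one partition, including limits and node list
scontrol show partition nvgpu
```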
## Access and Accounting
- Clariden can be reached via `ssh` from the frontend Ela (`ssh <username>@ela.cscs.ch`). Access to CSCS services and systems requires users to authenticate using multi-factor authentication (MFA); please find more information here.
- Account and Resources Management Tool
- Usage policies
## Connecting to Clariden
Clariden can be accessed with `ssh -A ela` (the frontend) and, from there, `ssh clariden`.
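Spelled out with the full hostname (substituting your CSCS user name), the two-hop login looks like this:

```bash
# First hop: log in to the Ela frontend with agent forwarding
ssh -A <username>@ela.cscs.ch

# Second hop: from Ela, continue on to Clariden
ssh clariden
```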
It is possible to create an SSH tunnel to Clariden via Ela. The following must be added to the `~/.ssh/config` file on your personal computer (replacing `<username>` with your CSCS user name):
```
Host ela
    Hostname ela.cscs.ch
    User <username>
    AddKeysToAgent yes
    ForwardAgent yes

Host clariden
    Hostname clariden.cscs.ch
    User <username>
    ProxyJump ela
    IdentityFile ~/.ssh/cscs-key
```
Now you should be able to access Clariden directly with `ssh clariden`.
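With this configuration in place, file transfers are proxied through Ela automatically as well; a minimal sketch (the file and directory names are just examples):

```bash
# Copy a local file to your home directory on Clariden via the ela jump host
scp results.tar.gz clariden:~/

# rsync uses the same Host alias and ProxyJump settings
rsync -av ./data/ clariden:~/data/
```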
## Running Jobs
- Clariden uses the Slurm workload manager for job scheduling.
Some typical/helpful Slurm commands are:

Command | Description
---|---
`sbatch` | submit a batch script
`squeue` | check the status of jobs on the system
`scancel` | delete one of your jobs from the queue
`srun` | launch commands in an existing allocation
`srun --interactive --jobid <jobid> --pty bash` | start an interactive session on an allocated node
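As a sketch of the interactive workflow (partition, node count, and time limit are illustrative), you can allocate a node and then attach a shell to it:

```bash
# Allocate one node on the nvgpu partition for 30 minutes (values are examples)
salloc --nodes=1 --partition=nvgpu --time=00:30:00

# Attach an interactive shell to the allocation,
# using the job id printed by salloc
srun --interactive --jobid <jobid> --pty bash
```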
Example of a Slurm job script:

```bash
#!/bin/bash -l
#SBATCH --job-name=<jobname>
#SBATCH --time=00:15:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=16
#SBATCH --partition=nvgpu
#SBATCH --account=<project>

srun executable
```
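To submit and monitor the script (assuming it is saved as `job.sh`, a hypothetical file name):

```bash
# Submit the batch script; Slurm prints the job id on success
sbatch job.sh

# Check the status of your own jobs in the queue
squeue -u $USER
```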
- Currently there is no accounting of compute time, but it is expected to be set up.
- Please note that the Slurm scheduling system is a shared resource that can handle only a limited number of batch jobs and interactive commands simultaneously. Users should therefore not submit large numbers of Slurm jobs and commands at the same time.