Clariden is a vcluster (versatile software-defined cluster) that's part of the Alps system.
A short summary of the hardware available in the nodes:
Partition | NNodes | GPUs per node | GPU | GPU memory | Max time |
---|---|---|---|---|---|
normal | 1298 | 4 | GH200 | 96GB | 24 hours |
debug | 32 | 4 | GH200 | 96GB | 30 minutes |
Each node consists of four GH200 superchips. Each superchip is a unified-memory system pairing a Grace CPU with a Hopper GPU over a 900GB/s NVLink-C2C interconnect. The four Grace CPUs in a node share 512GB of LPDDR5X memory, and each Hopper GPU has 96GB of HBM3 memory with 3000GB/s read/write bandwidth, for a total of 896GB of unified memory available within each node.
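To see this layout for yourself, the following is a minimal sketch, assuming `nvidia-smi` and `numactl` are available on the compute nodes (replace `<account>` with your project account):

```bash
# Minimal sketch: inspect a GH200 node from a short job on the debug partition.
# Assumes nvidia-smi and numactl are installed on the compute nodes.
srun -A <account> -p debug -t 00:05:00 \
    bash -c 'nvidia-smi --query-gpu=name,memory.total --format=csv; numactl --hardware'
```

On a GH200 node the GPU HBM typically appears as additional NUMA nodes alongside the CPUs' LPDDR5X memory in the `numactl --hardware` output.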
More information on the available partitions can be found with `scontrol show partitions`.
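For example, using standard Slurm query commands (the `sinfo` format string is just one illustrative choice):

```bash
sinfo --format="%P %D %l %G"      # partition, node count, time limit, generic resources (GPUs)
scontrol show partition normal    # full settings of the "normal" partition
```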
Maintenance
- We aim to restrict planned disruptive updates (during which services may be temporarily inaccessible) to Tuesday mornings (CET).
- Exceptional and non-disruptive updates may happen outside of this period.
Access and Accounting
Clariden can be reached via SSH from the frontend Ela (`ssh <username>@ela.cscs.ch`). Access to CSCS services and systems requires users to authenticate using multi-factor authentication (MFA); please find more information here. Access to Clariden is managed through Waldur (https://portal.cscs.ch/). For SwissAI, your access to Clariden is managed by your respective vertical/horizontal project administrators and project managers (typically your PI).
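As a minimal sketch, assuming your MFA-issued SSH key is stored at `~/.ssh/cscs-key` (the same path used in the SSH config below), a first login to the frontend looks like this:

```bash
ssh-add ~/.ssh/cscs-key          # load the key into your ssh-agent
ssh -A <username>@ela.cscs.ch    # -A forwards the agent so you can hop on to Clariden afterwards
```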
Connecting to Clariden
Clariden can be accessed via the frontend Ela (`ssh -A ela`) and, from there, `ssh clariden`.
It's possible to create an SSH tunnel to Clariden via Ela. Add the following to your personal computer's `~/.ssh/config` file (replacing `<username>` with your CSCS username):
```
Host ela
    Hostname ela.cscs.ch
    User <username>
    AddKeysToAgent yes
    ForwardAgent yes

Host clariden
    Hostname clariden.cscs.ch
    User <username>
    ProxyJump ela
    IdentityFile ~/.ssh/cscs-key
```
Now you should be able to access Clariden directly with `ssh clariden`.
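With this configuration in place, other SSH-based tools also go through Ela transparently; for example (assuming `rsync` is available on both ends):

```bash
ssh clariden                          # interactive login via the ProxyJump
scp ./myscript.sh clariden:~/         # copy a file to your home directory on Clariden
rsync -avz ./data/ clariden:~/data/   # sync a directory
```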
Running Jobs
- Clariden uses the Slurm workload manager for job scheduling.
Some typical/helpful Slurm commands are:

Command | Description |
---|---|
`sbatch` | submit a batch script |
`squeue` | check the status of jobs on the system |
`scancel` | delete one of your jobs from the queue |
`srun` | launch commands in an existing allocation |
`srun --interactive --jobid <jobid> --pty bash` | start an interactive session on an allocated node |

Example of a Slurm job script:
```bash
#!/bin/bash -l
#SBATCH --job-name=<jobname>
#SBATCH --time=00:15:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=16
#SBATCH --account=a-a**

srun executable
```
- You must specify the account attached to the project you are in. You can do this either in the submission script as above, or on the command line, for example:
`sbatch -A a-a11 myscript.sh`
- Your project account can be identified through Waldur. Simply go to https://portal.cscs.ch/ → Resources → HPC → find the vertical/horizontal you are in, and click on it to see more detailed information.
- Please note that the Slurm scheduling system is a shared resource that can handle only a limited number of batch jobs and interactive commands simultaneously. Users should therefore not submit arbitrarily large numbers of Slurm jobs and commands at the same time.
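Putting the commands above together, a typical submit-and-monitor cycle looks like the following sketch (replace `a-a**` and `<jobid>` with your own values):

```bash
sbatch -A a-a** myscript.sh                     # submit the batch script
squeue -u $USER                                 # check the status of your jobs
srun --interactive --jobid <jobid> --pty bash   # attach an interactive shell to a running job
scancel <jobid>                                 # cancel a job you no longer need
```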
File system
Software
Support
Your first port of call for support should be to check for related topics in the #cscs-users channel in the SwissAI Slack space (swissai-initiative.slack.com). We additionally provide a more general Slack space (cscs-users.slack.com) where CSCS engineers are also present. Note that while support may be offered there, it is not an official support channel; nevertheless, CSCS engineers are very helpful in this space. If you can't resolve your issue through the above means, the best and recommended way to get support is to create a ticket on our helpdesk (https://jira.cscs.ch/plugins/servlet/desk). We endeavor to respond to your tickets within ~3 hours.