Clariden is a vcluster (versatile software-defined cluster) that's part of the Alps system.

A short summary of the hardware available in the nodes: 

Partition   Nodes   GPUs per node   GPU     GPU memory   Max time
normal      1298    4               GH200   96GB         24 hours
debug       32      4               GH200   96GB         30 minutes


Each node consists of four GH200 superchips. Each superchip is a unified memory system consisting of a Grace CPU and a Hopper GPU connected by a 900GB/s NVLink-C2C interconnect. The Grace CPUs share 512GB of LPDDR5X memory, and each Hopper GPU has 96GB of HBM3 memory with 3000GB/s read/write bandwidth, for a total of 896GB (512GB + 4 x 96GB) of unified memory per node.

More information on the available partitions can be found with scontrol show partitions.
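As a rough sketch of how the partition limits could be checked from the command line, the helper below wraps sinfo with a compact output format. The function name partition_summary is hypothetical, and the chosen format fields (%P partition, %D node count, %l time limit, %G generic resources) are just one reasonable selection. It assumes it is run on Clariden, where the Slurm client tools are available.

```shell
# Summarise Slurm partitions: name, node count, time limit, generic resources (GPUs).
# With no argument, every partition visible to you is listed.
# partition_summary is a hypothetical helper name, not a CSCS-provided command.
partition_summary() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found: run this on Clariden, where the Slurm client tools are available" >&2
        return 1
    fi
    if [ -n "$1" ]; then
        # Restrict the listing to the partition given as first argument.
        sinfo --partition="$1" --format="%P %D %l %G"
    else
        sinfo --format="%P %D %l %G"
    fi
}
```

For example, partition_summary debug prints one summary for the debug partition, while scontrol show partition debug gives the full detail for the same partition.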

Maintenance

Access and Accounting

Connecting to Clariden

To access Clariden, first log in to the Ela frontend with ssh -A ela, then run ssh clariden from there.

It's possible to create an ssh tunnel to Clariden via Ela. The following must be added to your personal computer's ~/.ssh/config file (replacing <username> with your CSCS user name):

Host ela
 Hostname ela.cscs.ch
 User <username>
 AddKeysToAgent yes
 ForwardAgent yes
Host clariden
 Hostname clariden.cscs.ch
 User <username>
 ProxyJump ela
 IdentityFile ~/.ssh/cscs-key

Now you should be able to access Clariden directly with ssh clariden.
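Before connecting, it can help to confirm that OpenSSH resolves the clariden alias as intended. The sketch below uses ssh -G, which prints the effective configuration for a host and exits without opening a connection. The function name show_resolved_host is hypothetical, and the set of filtered keys is only a suggestion.

```shell
# Show how OpenSSH resolves a host alias without opening a connection.
# Arguments: host alias, optional config file (defaults to ~/.ssh/config).
# show_resolved_host is a hypothetical helper name used for illustration.
show_resolved_host() {
    host="$1"
    config="${2:-$HOME/.ssh/config}"
    # -G prints the effective configuration for the host and exits;
    # grep keeps only the settings relevant to the jump-host setup.
    ssh -F "$config" -G "$host" |
        grep -E '^(hostname|user|proxyjump|identityfile|forwardagent) '
}
```

Running show_resolved_host clariden on your personal computer should list clariden.cscs.ch as the hostname, ela as the proxyjump entry, and your CSCS user name. If any of these differ, the ~/.ssh/config entry is likely not being picked up.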

Running Jobs

File system

Software

Support

Your first port of call for support should be to check for related topics in the #cscs-users channel in the SwissAI Slack space (swissai-initiative.slack.com). We additionally provide a more general Slack space (cscs-users.slack.com) where CSCS engineers are also present. CSCS engineers are very helpful there, but it is not an official support channel. If you can't resolve your issue through the above means, the recommended way to get support is to create a ticket on our helpdesk (https://jira.cscs.ch/plugins/servlet/desk). We endeavor to respond to tickets within about 3 hours.