
Clariden is a vcluster (versatile software-defined cluster) that's part of the Alps system.

A short summary of the hardware available in the nodes: 

Partition | Nodes | GPUs per node | GPU   | GPU memory | Max time
normal    | 1300  | 4             | GH200 | 96 GB      | 12 hours
debug     |       | 4             | GH200 | 96 GB      | 30 minutes


Each node consists of four GH200 superchips. Each superchip is a unified-memory system pairing a Grace CPU with a Hopper GPU over a 900 GB/s NVLink-C2C interconnect. The Grace CPUs share 512 GB of LPDDR5X memory, and each Hopper GPU has 96 GB of HBM3 memory with 3000 GB/s read/write bandwidth, giving a total of 896 GB of unified memory per node.
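
Once you have a shell on a compute node (see Running Jobs below), this layout can be checked with standard tools. The following is only an illustrative sketch; numactl may not be installed in every image:

    nvidia-smi            # should report 4 Hopper GPUs with ~96 GB of HBM3 each
    nvidia-smi topo -m    # NVLink topology between the four superchips
    numactl -H            # NUMA view of the Grace CPUs and their LPDDR5X memory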

More information on the available partitions can be found with `scontrol show partitions`.
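
For example, from a login node (the exact output depends on the current system configuration):

    scontrol show partition normal   # limits, time limit and node list for a single partition
    sinfo -o "%P %D %l %G"           # per-partition overview: name, node count, time limit, GRES (GPUs)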

Access and Accounting

  • Clariden can be reached via ssh from the frontend Ela (`ssh <username>@ela.cscs.ch`). Access to CSCS services and systems requires users to authenticate using multi-factor authentication (MFA). Please find more information here.
  • Account and Resources Management Tool
  • Usage policies

Connecting to Clariden

Clariden can be accessed by first logging in to the frontend with `ssh -A <username>@ela.cscs.ch` and, from there, running `ssh clariden`.

It is possible to connect to Clariden directly through Ela by adding the following to your personal computer's ~/.ssh/config file (replace <username> with your CSCS username):

Host ela
 Hostname ela.cscs.ch
 User <username>
 AddKeysToAgent yes
 ForwardAgent yes
Host clariden
 Hostname clariden.cscs.ch
 User <username>
 ProxyJump ela
 IdentityFile ~/.ssh/cscs-key

Now you should be able to access Clariden directly with `ssh clariden`.
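
The same configuration also lets other SSH-based tools reach Clariden in one step; for example (the file, directory and destination paths below are placeholders):

    ssh clariden hostname                              # quick connectivity check
    scp myfile.txt clariden:/path/to/destination/      # copy a single file through the jump host
    rsync -av mydir/ clariden:/path/to/destination/    # synchronize a directory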

Running Jobs

  • Clariden uses the Slurm workload manager for job scheduling.
  • Some typical/helpful Slurm commands are listed below (a short end-to-end example follows this list):

    sbatch                                          submit a batch script
    squeue                                          check the status of jobs on the system
    scancel                                         delete one of your jobs from the queue
    srun                                            launch commands in an existing allocation
    srun --interactive --jobid <jobid> --pty bash   start an interactive session on an allocated node
  • Example of a Slurm job script

    #!/bin/bash -l

    #SBATCH --job-name=<jobname>      # placeholder job name
    #SBATCH --time=00:15:00           # wall-clock time limit (HH:MM:SS)
    #SBATCH --nodes=1                 # number of nodes
    #SBATCH --ntasks-per-core=1       # at most one task per physical core
    #SBATCH --ntasks-per-node=4       # e.g. one task per GPU
    #SBATCH --cpus-per-task=16        # CPU cores per task

    srun <executable>
  • Please note that the Slurm scheduling system is a shared resource that can only handle a limited number of batch jobs and interactive commands at the same time. Therefore, please do not submit arbitrarily large numbers of Slurm jobs or commands simultaneously.
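
Putting these pieces together, a minimal workflow could look like the sketch below. The script name job.sh is a placeholder, and <jobid> is the ID printed by sbatch or shown by squeue:

    sbatch job.sh                                   # submit the batch script shown above; prints the job ID
    squeue -u $USER                                 # check the status of your jobs
    scancel <jobid>                                 # remove a job from the queue if needed
    srun --interactive --jobid <jobid> --pty bash   # open an interactive shell on a node of a running job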

File system

Software
