...

Partition | Nodes | GPUs per node | GPU   | GPU memory | Max time
normal    | 1298  | 4             | GH200 | 96GB       | 24 hours
debug     | 32    | 4             | GH200 | 96GB       | 30 minutes


Each node consists of 4x GH200 superchips. Each superchip is a unified-memory system pairing a Grace CPU with a Hopper GPU over a 900GB/s NVLink-C2C interconnect. The Grace CPUs share 512GB of LPDDR5X memory, and each Hopper GPU has 96GB of HBM3 memory with 3000GB/s read/write bandwidth, giving a total of 896GB of unified memory within each node.
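This layout can be inspected directly on a compute node, for example from an interactive session. A minimal sketch, assuming nvidia-smi and numactl are available on the compute nodes:

    Code Block
    languagebash
    # List the four Hopper GPUs and the 96GB of HBM3 on each
    nvidia-smi --query-gpu=name,memory.total --format=csv

    # Show the Grace CPU NUMA domains and their LPDDR5X memory
    numactl --hardware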

...

  • Clariden can be reached via ssh from the frontend Ela (`ssh <username>@ela.cscs.ch`). Access to CSCS services and systems requires users to authenticate using multi-factor authentication (MFA). Please find more information here. A single-command way to reach Clariden through Ela is sketched at the end of this list.
  • Account and Resources Management Tool: Access to Clariden is managed through Waldur (https://portal.cscs.ch/). For SwissAI, your access to Clariden is managed by your respective vertical/horizontal project administrators and project managers (typically your PI).
  • Usage policies
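  • To reach Clariden through Ela in one step, SSH's ProxyJump option can be used. A minimal sketch, assuming the cluster is reachable as clariden.cscs.ch behind Ela (check the CSCS documentation for the exact hostname and for the SSH keys issued by the MFA workflow):

    Code Block
    languagebash
    # Jump through the Ela frontend to Clariden in a single command
    ssh -J <username>@ela.cscs.ch <username>@clariden.cscs.ch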

Connecting to Clariden

...

  • Clariden uses the Slurm workload manager for job scheduling.
  • Some typical/helpful Slurm commands are:

    `sbatch`: submit a batch script
    `squeue`: check the status of jobs on the system
    `scancel`: delete one of your jobs from the queue
    `srun`: launch commands in an existing allocation
    `srun --interactive --jobid <jobid> --pty bash`: start an interactive session on an allocated node (see the sketch below this list)
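
  • The interactive variant requires an existing allocation. A minimal sketch using the standard Slurm salloc command (the account is a placeholder; the debug partition and 30-minute limit come from the table above):

    Code Block
    languagebash
    # Request one node on the debug partition for 30 minutes
    salloc --account=<account> --partition=debug --nodes=1 --time=00:30:00

    # From the resulting shell (or from another shell, using the job id printed by salloc),
    # attach an interactive bash session to the allocated node
    srun --interactive --jobid <jobid> --pty bash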


  • Example of Slurm job script

    Code Block
    languagebash
    #!/bin/bash -l
    
    #SBATCH --job-name=<jobname>
    # Wall-clock limit (HH:MM:SS)
    #SBATCH --time=00:15:00
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-core=1
    # One task per GPU: each node provides 4 GH200 GPUs
    #SBATCH --ntasks-per-node=4
    #SBATCH --cpus-per-task=16
    # Project account (see below for how to identify yours)
    #SBATCH --account=a-a**
    
    srun executable


  • You must specify the account attached to the project you are in. You can do this either in the submission script as above, or on the command line, for example: `sbatch -A a-a11 myscript.sh`
    • Your project account can be identified through Waldur. Simply go to https://portal.cscs.ch/ → Resources → HPC → find the vertical/horizontal you are in, and click on it to see more detailed information. A command-line alternative is sketched at the end of this list.
  • Please note that the Slurm scheduling system is a shared resource that can only handle a limited number of batch jobs and interactive commands simultaneously. Therefore, users should not submit large numbers of Slurm jobs and commands at the same time.
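  • If you prefer the command line, the accounts your user may submit against can usually also be listed from Slurm's accounting database. A minimal sketch; availability and output depend on the site's Slurm configuration:

    Code Block
    languagebash
    # List the Slurm accounts (projects) associated with your user
    sacctmgr show associations user=$USER format=Account,User,Partition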

...