...
Partition | Nodes | GPUs per node | GPU | GPU memory | Max time |
---|---|---|---|---|---|
normal | 1298 | 4 | GH200 | 96GB | 24 hours |
debug | 32 | 4 | GH200 | 96GB | 30 minutes |
Each node consists of four GH200 superchips. Each superchip is a unified-memory system pairing a Grace CPU with a Hopper GPU over a 900GB/s NVLink-C2C interconnect. The four Grace CPUs share 512GB of LPDDR5X memory, and each Hopper GPU has 96GB of HBM3 memory with 3000GB/s read/write bandwidth, for a total of 896GB of unified memory available within each node.
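As a quick sanity check, the 896GB figure is simply the sum of the four GPUs' HBM3 and the shared CPU memory:

```shell
# Back-of-the-envelope check of per-node unified memory (values from the table above)
GPUS_PER_NODE=4
HBM_PER_GPU_GB=96      # HBM3 on each Hopper GPU
LPDDR_GB=512           # LPDDR5X shared by the Grace CPUs
TOTAL_GB=$((GPUS_PER_NODE * HBM_PER_GPU_GB + LPDDR_GB))
echo "${TOTAL_GB}GB per node"   # 896GB per node
```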
...
- Clariden can be reached via `ssh` from the frontend Ela (`ssh <username>@ela.cscs.ch`). Access to CSCS services and systems requires users to authenticate using multi-factor authentication (MFA); please find more information here.
- Account and Resources Management Tool: access to Clariden is managed through Waldur (https://portal.cscs.ch/). For SwissAI, your access to Clariden is managed by your respective vertical/horizontal project administrators and project managers (typically your PI).
- Usage policies
Connecting to Clariden
...
- Clariden uses the Slurm workload manager to manage job scheduling. Some typical/helpful Slurm commands are:

Command | Description |
---|---|
`sbatch` | submit a batch script |
`squeue` | check the status of jobs on the system |
`scancel` | delete one of your jobs from the queue |
`srun` | launch commands in an existing allocation |
`srun --interactive --jobid <jobid> --pty bash` | start an interactive session on an allocated node |

Example of a Slurm job script:
```bash
#!/bin/bash -l
#SBATCH --job-name=<jobname>
#SBATCH --time=00:15:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=16
#SBATCH --account=a-a**

srun executable
```
- You must specify the account attached to the project you are in. You can do this either in the submission script as above, or on the command line, for example:
`sbatch -A a-a11 myscript.sh`
- Your project account can be identified through Waldur. Simply go to https://portal.cscs.ch/ → Resources → HPC → find the vertical/horizontal you are in, and click on it to see more detailed information.
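Putting the commands above together, a minimal submit-and-monitor workflow might look like the following sketch (the account `a-a11` and `<jobid>` are illustrative placeholders; use your own project account from Waldur):

```shell
# Submit the batch script, charging your project account (illustrative: a-a11)
sbatch -A a-a11 myscript.sh    # prints: Submitted batch job <jobid>

# Check the status of your own jobs in the queue
squeue -u $USER

# Cancel a job by its ID if it is no longer needed
scancel <jobid>
```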
- Please note that the Slurm scheduling system is a shared resource that can handle only a limited number of batch jobs and interactive commands simultaneously. Users should therefore not submit large numbers of Slurm jobs and commands at the same time.
...