...
Partition | Nodes | GPUs per node | GPU | GPU memory | Max time
---|---|---|---|---|---
nvgpu | 18 | 4 | Nvidia A100 | 80GB | 1 day
amdgpu | 8 | 8 | AMD Mi200 | 64GB | 1 day
normal | 1300 | 4 | GH200 | 96GB | 12 hours
debug | 200 | 4 | GH200 | 96GB | 30 minutes
More information on the partitions can be found with `scontrol show partitions`.
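Since these limits can change, it can help to query Slurm directly on a login node. The following standard Slurm commands are a quick way to do that; the partition name `debug` is just an example taken from the table above:

```bash
# Overview of all partitions and their node states
sinfo
# Full configuration of a single partition, including its time limit
scontrol show partition debug
```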
...
It's possible to create an SSH tunnel to Clariden via Ela. The following must be added to your personal computer's ~/.ssh/config file (replacing <username> with your CSCS user name):
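As an illustration only, a minimal sketch of such ~/.ssh/config entries is shown below; the host aliases, the clariden.cscs.ch host name and the key path are assumptions and may need to be adapted to your account.

```
# Sketch of ~/.ssh/config entries; host names and key path are assumptions
Host ela
    HostName ela.cscs.ch
    User <username>
    IdentityFile ~/.ssh/id_ed25519

Host clariden
    HostName clariden.cscs.ch
    User <username>
    ProxyJump ela
    IdentityFile ~/.ssh/id_ed25519
```

With entries like these in place, `ssh clariden` opens a connection that is tunnelled through Ela in a single step.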
...
- Clariden uses the Slurm workload manager to manage job scheduling.
Some typical/helpful Slurm commands are:

Command | Description
---|---
`sbatch` | submit a batch script
`squeue` | check the status of jobs on the system
`scancel` | delete one of your jobs from the queue
`srun` | launch commands in an existing allocation
`srun --interactive --jobid <jobid> --pty bash` | start an interactive session on an allocated node

Example of Slurm job script
Currently there's no accounting of the compute time, but it's expected to be set up.

```bash
#!/bin/bash -l
#SBATCH --job-name=<jobname>
#SBATCH --time=00:15:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=16
#SBATCH --partition=nvgpu
#SBATCH --account=<project>

srun executable
```
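For illustration, assuming the script above is saved as job.sh (file name hypothetical), a typical submit-and-monitor cycle with the commands listed earlier could look like this:

```bash
sbatch job.sh        # submit the batch script; Slurm prints the assigned job ID
squeue -u $USER      # check the status of your jobs on the system
srun --interactive --jobid <jobid> --pty bash   # interactive shell on an allocated node (see above)
scancel <jobid>      # remove the job from the queue if no longer needed
```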
- Please note that the Slurm scheduling system is a shared resource that can handle only a limited number of batch jobs and interactive commands simultaneously. Users are therefore not supposed to submit arbitrary numbers of Slurm jobs and commands at the same time.
...