...
Partition | Nodes | GPUs per node | GPU | GPU memory | Max time
---|---|---|---|---|---
nvgpu | 18 | 4 | Nvidia A100 | 80GB | 1 day
amdgpu | 8 | 8 | AMD Mi200 | 64GB | 1 day
normal | 1300 | 4 | GH200 | 96GB | 12 hours
debug | 200 | 4 | GH200 | 96GB | 30 minutes
More information on the partitions can be found with `scontrol show partitions`.
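Since these limits can change, it can help to query Slurm directly on a login node. The following standard Slurm commands are a quick way to do that; the partition name `debug` is just an example taken from the table above:

```bash
# Overview of all partitions and their node states
sinfo
# Full configuration of a single partition, including its time limit
scontrol show partition debug
```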
...
It's possible to create an SSH tunnel to Clariden via Ela. The following must be added to your personal computer's ~/.ssh/config file (replacing <username> with your CSCS user name):
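As an illustration only, a minimal sketch of such ~/.ssh/config entries is shown below; the host aliases, the clariden.cscs.ch host name and the key path are assumptions and may need to be adapted to your account.

```
# Sketch of ~/.ssh/config entries; host names and key path are assumptions
Host ela
    HostName ela.cscs.ch
    User <username>
    IdentityFile ~/.ssh/id_ed25519

Host clariden
    HostName clariden.cscs.ch
    User <username>
    ProxyJump ela
    IdentityFile ~/.ssh/id_ed25519
```

With entries like these in place, `ssh clariden` opens a connection that is tunnelled through Ela in a single step.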
...
- Clariden uses the Slurm workload manager to manage job scheduling.
Some typical/helpful Slurm commands are:

Command | Description
---|---
`sbatch` | submit a batch script
`squeue` | check the status of jobs on the system
`scancel` | delete one of your jobs from the queue
`srun` | launch commands in an existing allocation
`srun --interactive --jobid <jobid> --pty bash` | start an interactive session on an allocated node

Example of Slurm job script
Currently there's no accounting of the compute time, but it's expected to be set up.

```bash
#!/bin/bash -l
#SBATCH --job-name=<jobname>
#SBATCH --time=00:15:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=16
#SBATCH --partition=nvgpu
#SBATCH --account=<project>

srun executable
```
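For illustration, assuming the script above is saved as job.sh (file name hypothetical), a typical submit-and-monitor cycle with the commands listed earlier could look like this:

```bash
sbatch job.sh        # submit the batch script; Slurm prints the assigned job ID
squeue -u $USER      # check the status of your jobs on the system
srun --interactive --jobid <jobid> --pty bash   # interactive shell on an allocated node (see above)
scancel <jobid>      # remove the job from the queue if no longer needed
```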
- Please note that the Slurm scheduling system is a shared resource that can handle only a limited number of batch jobs and interactive commands simultaneously. Users are therefore not supposed to submit arbitrary numbers of Slurm jobs and commands at the same time.
...