
Clariden is a vcluster (versatile software-defined cluster) that's part of the Alps system.

A short summary of the hardware available on the nodes:

Partition | Nodes | GPUs per node | GPU         | GPU memory | Max time
nvgpu     | 18    | 4             | Nvidia A100 | 80 GB      | 1 day
amdgpu    | 12    | 8             | AMD MI200   | 64 GB      | 1 day
normal    | 200   | 0             | -           | -          | 1 day
clariden  | 200   | 0             | -           | -          | 30 minutes

More information on the partitions can be found with `scontrol show partitions`.
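For example, the partitions and their limits can be inspected from a login shell on Clariden; a minimal sketch (the sinfo output columns chosen here are only an illustration, not part of this page):

    # Full configuration of every partition
    scontrol show partitions

    # Compact summary: partition name, node count, time limit and generic resources (GPUs)
    sinfo -o "%P %D %l %G"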

Access and Accounting

  • Clariden can be reached via ssh from the frontend Ela (`ssh <username>@ela.cscs.ch`). Access to CSCS services and systems requires users to authenticate using multi-factor authentication (MFA). Please find more information here.
  • Account and Resources Management Tool
  • Usage policies

Connecting to Clariden

Clariden can be accessed with ssh -A ela (Ela is the frontend) and, from there, ssh clariden.
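In practice the two hops look like this (replacing <username> with your CSCS user name):

    # First hop: log in to the Ela frontend with agent forwarding
    ssh -A <username>@ela.cscs.ch

    # Second hop: from Ela, continue to Clariden
    ssh clariden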

It is also possible to create an ssh tunnel to Clariden via Ela. The following must be added to your personal computer's ~/.ssh/config file (replacing <username> with your CSCS user name):

Host ela
 Hostname ela.cscs.ch
 User <username>
 AddKeysToAgent yes
 ForwardAgent yes
Host clariden
 Hostname clariden.cscs.ch
 User <username>
 ProxyJump ela
 IdentityFile ~/.ssh/cscs-key

Now you should be able to access Clariden directly with `ssh clariden`.
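With this configuration in place, file transfers can also target the clariden host alias directly; a minimal sketch (the file and directory names are only examples):

    # Copy a local file to your home directory on Clariden, tunnelling through Ela
    scp job.sbatch clariden:~/

    # Synchronise a whole directory in the same way
    rsync -av my_project/ clariden:~/my_project/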

Running Jobs

  • Clariden uses the Slurm workload manager for job scheduling.
  • Some typical/helpful Slurm commands are:

    sbatch                                        | submit a batch script
    squeue                                        | check the status of jobs on the system
    scancel                                       | delete one of your jobs from the queue
    srun                                          | launch commands in an existing allocation
    srun --interactive --jobid <jobid> --pty bash | start an interactive session on an allocated node
  • Example of a Slurm job script (see the submission sketch below):

    #!/bin/bash -l
    
    #SBATCH --job-name=<jobname>
    #SBATCH --time=00:15:00
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-core=1
    #SBATCH --ntasks-per-node=4
    #SBATCH --cpus-per-task=16
    #SBATCH --partition=nvgpu
    #SBATCH --account=<project>
    
    srun executable
  • Currently there is no accounting of compute time, but it is expected to be set up.
  • Please note that the Slurm scheduling system is a shared resource that can handle only a limited number of batch jobs and interactive commands simultaneously. Therefore, users should not submit large numbers of Slurm jobs and commands at the same time.
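  • As a concrete starting point, submitting the script above and opening an interactive session could look as follows (a minimal sketch; the script name job.sbatch, the node count and the 30-minute time limit are placeholders, not values from this page):

    # Submit the job script (saved here as job.sbatch) and check its status
    sbatch job.sbatch
    squeue -u $USER

    # Allocate one node on the nvgpu partition for 30 minutes, then open an
    # interactive shell inside that allocation (as listed in the commands above)
    salloc --nodes=1 --partition=nvgpu --time=00:30:00 --account=<project>
    srun --interactive --jobid <jobid> --pty bash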

File systems

Software
