The computing system Piz Daint is accessible as daint.cscs.ch via ssh from the front end ela.cscs.ch. The software environment on the system is controlled using the Modules framework, which provides a flexible mechanism to access compilers, tools and applications.
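
As a minimal sketch (with <username> as a placeholder for your CSCS account), logging in and inspecting the module environment looks like this:

ssh <username>@ela.cscs.ch    # log in to the front end from your workstation
ssh daint.cscs.ch             # from ela, hop to the Piz Daint login nodes

module avail                  # list the software available in the current MODULEPATH
module load daint-gpu         # for example, load the GPU-enabled software stack
module list                   # show the modules currently loaded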

Getting Started

Parallel programs compiled with Cray-MPICH (the MPI library available on this system) must be run on the compute nodes using the Slurm srun command: running applications on the login nodes is not allowed, as they are a shared resource. Slurm batch scripts should be submitted with the sbatch command from the $SCRATCH folder: do not submit jobs from other filesystems, as their lower performance would degrade your runs. A simple Slurm job submission script looks like the following:

#!/bin/bash -l
#SBATCH --job-name=job_name
#SBATCH --time=01:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-core=2
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=2
#SBATCH --partition=normal
#SBATCH --constraint=gpu
#SBATCH --account=<project>

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export CRAY_CUDA_MPS=1
module load daint-gpu
srun ./executable.x

Please replace the string <project> with the ID of the active project that will be charged for the allocation. The -l flag on the first line allows you to call the module command within the script, for instance to load the GPU-enabled software stack into the MODULEPATH with the module daint-gpu. Alternatively, the module daint-mc makes the multicore software stack available in your MODULEPATH instead.
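
For example, assuming the script above is saved as job.slurm (a hypothetical file name), a typical submission workflow from $SCRATCH would be:

cd $SCRATCH
sbatch job.slurm      # submit the batch script; Slurm prints the job ID
squeue -u $USER       # monitor the state of your jobs
scancel <jobid>       # cancel a job, if needed, using the job ID reported by sbatch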

It is required to specify either --constraint=gpu or --constraint=mc: the option --constraint=gpu ensures that the scheduler allocates the XC50 Intel Haswell 12-core nodes with GPU devices and automatically sets the option --gres=gpu:1. The option --constraint=mc instead ensures that the batch job is allocated to the multicore XC40 Intel Broadwell 2 x 18-core nodes and not to the GPU nodes: please note that --constraint=mc is required with the prepost partition and when targeting the large memory nodes with the flag --mem=120GB.
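
As a sketch, the header of a corresponding multicore job targeting the large memory nodes could look as follows (job name, walltime and task count are illustrative; the Broadwell nodes provide 2 x 18 cores per node):

#!/bin/bash -l
#SBATCH --job-name=job_name
#SBATCH --time=01:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=36
#SBATCH --partition=normal
#SBATCH --constraint=mc
#SBATCH --mem=120GB
#SBATCH --account=<project>

module load daint-mc
srun ./executable.x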

A summary report is appended at the end of the output file of any batch job (interactive jobs do not create Slurm output files). Please refer to Batch Job Summary Report for details on the information displayed in the summary report.

Slurm batch queues

Name of the queue   Max time   Max nodes               Brief description
debug               30 min     10                      Quick turnaround for test jobs (one per user)
large               12 h       4400                    Job size must be larger than normal queue, by arrangement only
long                7 days     4                       Maximum 5 long jobs in total (one per user)
low                 6 h        2400 (gpu) / 512 (mc)   Up to 130% of project's quarterly allocation
normal              24 h       2400 (gpu) / 512 (mc)   Standard queue for production work
prepost             30 min     1                       High priority pre/post processing
xfer                24 h       1                       Data transfer queue

The list of queues and partitions is available by typing sinfo or scontrol show partition. Note that not all groups are enabled on every partition: please check the AllowGroups entry of the command scontrol show partition. You can choose the queue in which to run your job with the Slurm directive --partition in your batch script: #SBATCH --partition=<partition_name>
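
For instance, to inspect the available partitions and check whether your group is enabled on one of them (the normal partition is used here as an example):

sinfo                                               # overview of partitions, limits and node states
scontrol show partition normal | grep AllowGroups   # check which groups may submit to this partition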

Please check the Slurm man pages and the official documentation for further details on the scheduler. You will also find useful information in the corresponding section of the FAQ.

Interactive Computing with Jupyter Notebooks

Along with traditional access via a terminal and ssh, you can access Piz Daint resources via your browser through a user interface based on Jupyter. This service is available at https://jupyter.cscs.ch. Further information is available at JupyterLab.

File Systems

The $SCRATCH space /scratch/snx3000/$USER is connected to the system via the Infiniband interconnect. The access type (read, write, none) of each file system from the compute and login nodes is summarized below:

                scratch   /users   /project & /store
Compute nodes   r+w       r+w      r
Login nodes     r+w       r+w      r+w
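
Since /project and /store are not writable from the compute nodes, input data is typically staged to $SCRATCH before a run and results are copied back afterwards (for example through the xfer queue). A minimal sketch, with an illustrative dataset path:

cp -r /project/<group>/<my_dataset> $SCRATCH/    # stage input data (path is illustrative)
cd $SCRATCH
sbatch job.slurm                                 # run the job from the scratch filesystem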

Please carefully read the general information on file systems at CSCS.