The computing system Piz Daint is accessible via ssh from the front end ela.cscs.ch as daint.cscs.ch. The software environment on the system is controlled using the Modules framework, which provides a flexible mechanism to access compilers, tools and applications.
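As an illustration, a first login and a few common Modules commands might look like the following (replace <username> with your CSCS username; the module shown is just an example):

```bash
# Log in to the front end first, then hop to Piz Daint.
ssh <username>@ela.cscs.ch
ssh daint.cscs.ch

# Inspect and adjust the software environment with the Modules framework.
module avail              # list the modules available in the current MODULEPATH
module list               # show the modules already loaded
module load daint-gpu     # e.g. enable the GPU software stack (see below)
```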
Getting Started
Parallel programs compiled with Cray-MPICH (the MPI library available on this system) must be run on the compute nodes using the Slurm srun command: running applications on the login nodes is not allowed, as they are a shared resource. Slurm batch scripts should be submitted with the sbatch command from the $SCRATCH folder: users should not run jobs from other filesystems because of their lower performance. A simple Slurm job submission script looks like the following:
```bash
#!/bin/bash -l
#SBATCH --job-name=job_name
#SBATCH --time=01:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-core=2
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=2
#SBATCH --partition=normal
#SBATCH --constraint=gpu
#SBATCH --account=<project>

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export CRAY_CUDA_MPS=1

module load daint-gpu
srun ./executable.x
```
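The workflow for submitting and monitoring such a job might then look as follows (the script name and job ID are illustrative):

```bash
# Submit from the $SCRATCH folder, not from other filesystems.
cd $SCRATCH
sbatch job.sbatch            # job.sbatch contains the script shown above

# Monitor the job and inspect its output once it has run.
squeue -u $USER              # list your pending and running jobs
less slurm-<jobid>.out       # Slurm output file written in the submission directory
```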
Please replace the string <project> with the ID of the active project that will be charged for the allocation. The flag -l at the beginning allows you to call the module command within the script, for instance to load the GPU-enabled software stack into the MODULEPATH with the module daint-gpu. Alternatively, the module daint-mc makes the multicore software stack available in your MODULEPATH instead.
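For example, switching to the multicore stack inside a batch script (or an interactive session) could be sketched as follows; the module only adjusts MODULEPATH, so load it before any application modules:

```bash
module load daint-mc      # multicore software stack instead of daint-gpu
echo $MODULEPATH          # verify the entries added by the module
module avail              # applications and libraries now visible for this stack
```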
It is required to specify either --constraint=gpu or --constraint=mc: the option --constraint=gpu ensures that the scheduler allocates the XC50 Intel Haswell 12-core nodes with GPU devices and automatically sets the option --gres=gpu:1. The option --constraint=mc ensures instead that the batch job is allocated to the multicore XC40 Intel Broadwell 2 x 18-core nodes and not to the GPU nodes: please note that --constraint=mc is also required with the partition prepost and when targeting the large-memory nodes with the flag --mem=120GB.
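As a sketch, the directives of a multicore pre/post-processing job might look as follows; the job name and executable are placeholders, and the limits of the prepost queue are listed in the table below:

```bash
#!/bin/bash -l
#SBATCH --job-name=prepost_job
#SBATCH --time=00:30:00          # prepost allows at most 30 minutes
#SBATCH --nodes=1                # prepost allows at most 1 node
#SBATCH --partition=prepost
#SBATCH --constraint=mc          # prepost requires the multicore nodes
#SBATCH --account=<project>
# Targeting the large-memory multicore nodes would likewise combine
# --constraint=mc with --mem=120GB.

module load daint-mc
srun ./postprocess.x             # placeholder executable
```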
A summary report is appended to the output file of any batch job (interactive jobs do not create Slurm output files). Please refer to Batch Job Summary Report for details on the information displayed in the summary report.
Slurm batch queues
| Name of the queue | Max time | Max nodes | Brief Description |
|---|---|---|---|
| debug | 30 min | 10 | Quick turnaround for test jobs (one per user) |
| large | 12 h | 4400 | Job size must be larger than in the normal queue, by arrangement only |
| long | 7 days | 4 | Maximum 5 long jobs in total (one per user) |
| low | 6 h | 2400 (gpu) / 512 (mc) | Up to 130% of the project's quarterly allocation |
| normal | 24 h | 2400 (gpu) / 512 (mc) | Standard queue for production work |
| prepost | 30 min | 1 | High priority pre/post processing |
| xfer | 24 h | 1 | Data transfer queue |
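For instance, a short test run would target the debug queue shown above, staying within its limits (the values below are illustrative):

```bash
#SBATCH --partition=debug        # quick turnaround, one debug job per user
#SBATCH --time=00:30:00          # at most 30 minutes in the debug queue
#SBATCH --nodes=2                # within the 10-node limit
```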
The list of queues and partitions is available by typing sinfo or scontrol show partition. Note that not all groups are enabled on every partition: please check the AllowGroups entry of the command scontrol show partition. You can choose the queue where your job runs by issuing the Slurm directive --partition in your batch script:
#SBATCH --partition=<partition_name>
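A minimal sketch of how to inspect the partitions and their access restrictions (normal is used here only as an example partition name):

```bash
sinfo                                  # overview of partitions and node availability
scontrol show partition                # full configuration of all partitions
scontrol show partition normal | grep AllowGroups   # groups allowed on a given partition
```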
Please check the Slurm man pages and the official documentation for further details on the scheduler. You will also find useful information in the corresponding section of the FAQ.
Interactive Computing with Jupyter Notebooks
Along with traditional access via a terminal and ssh, you can access Piz Daint resources via your browser through a user interface based on Jupyter. This service is available at https://jupyter.cscs.ch. Further information is available at JupyterLab.
File Systems
The $SCRATCH space /scratch/snx3000/$USER is connected to the system via the Infiniband interconnect. File system access types (read, write, none) from compute and login nodes are summarized below:
Please carefully read the general information on file systems at CSCS.