
This page documents known MPI issues found on Alps systems, along with workarounds when available.

Existing Issues

Cray MPICH hangs

Cray MPICH on Grace-Hopper systems may hang on larger runs.

Workaround

An application can hang for many reasons, many of them unrelated to MPICH. However, if you are experiencing hangs, the issue may be worked around by setting:

export FI_MR_CACHE_MONITOR=disabled

Performance may be negatively affected by this option.
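Because FI_MR_CACHE_MONITOR is a libfabric variable, it must be present in the environment of the MPI processes, and Slurm propagates the caller's environment to the tasks by default. The following is a minimal sketch, assuming a Bash job script launched with Slurm. The helper run_with_mr_cache_disabled is hypothetical: it sets the variable only for the single command it wraps. This confines any performance cost to that step and makes it easy to compare runs with and without the workaround when narrowing down a hang.

```shell
#!/bin/bash
# Hypothetical helper (not part of any Alps tooling): run one command with
# the libfabric memory registration cache monitor disabled.
# The VAR=value prefix applies only to this command, so the environment
# of the calling shell is left unchanged.
run_with_mr_cache_disabled() {
    FI_MR_CACHE_MONITOR=disabled "$@"
}

# Example usage inside a Slurm job (./my_app is a placeholder):
#   run_with_mr_cache_disabled srun ./my_app
```

If the hang disappears only when the helper is used, the problem is likely the one described here rather than an application bug.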

Resolved Issues

"cxil_map: write error" when doing inter-node GPU-aware MPI communication

Fix

The issue has been resolved by a system update, and the workaround is no longer needed. The issue was caused by a system misconfiguration.

When doing inter-node GPU-aware communication with Cray MPICH after the October 2024 update on Alps, applications failed with:

cxil_map: write error


Workaround (no longer required)

Until the issue was properly resolved on the system, the only workaround was to avoid inter-node GPU-aware MPI.

CP2K users encountering this issue can disable COSMA, which uses GPU-aware MPI, by placing the following in the &GLOBAL section of the input file:

&FM
TYPE_OF_MATRIX_MULTIPLICATION SCALAPACK
&END FM

Unless you run RPA calculations, this should have limited impact on performance.

MPI_THREAD_MULTIPLE does not work

Fix
The issue has been resolved in Cray MPICH version 8.1.30.

On Grace-Hopper systems, Cray MPICH may fail when using MPI_THREAD_MULTIPLE, with an assertion similar to:


Assertion failed [...]: MPIR_THREAD_GLOBAL_ALLFUNC_MUTEX.count == 0

Workaround (no longer required for Cray MPICH 8.1.30 and newer)

The issue can be worked around by falling back to a less optimized implementation of MPI_THREAD_MULTIPLE by setting:

export MPICH_OPT_THREAD_SYNC=0
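Since the fix landed in Cray MPICH 8.1.30, the workaround only matters for older versions. The sketch below sets MPICH_OPT_THREAD_SYNC=0 only when it is needed. It assumes that the Cray MPICH version string is available in the CRAY_MPICH_VERSION environment variable. This is an assumption: depending on how Cray MPICH is provided (for example through a module or a uenv), the variable may be absent or named differently. The function name version_older_than_fix is hypothetical, and the comparison relies on GNU sort -V.

```shell
#!/bin/bash
# Minimal sketch: enable the MPI_THREAD_MULTIPLE workaround only for
# Cray MPICH versions older than 8.1.30, where the issue is fixed.

fixed_version="8.1.30"

# Hypothetical helper: succeeds (returns 0) if version $1 is strictly
# older than $fixed_version, using GNU sort's version ordering.
version_older_than_fix() {
    local v="$1"
    [ "$v" != "$fixed_version" ] &&
        [ "$(printf '%s\n' "$v" "$fixed_version" | sort -V | head -n1)" = "$v" ]
}

# ASSUMPTION: the version is exposed as CRAY_MPICH_VERSION; adjust this
# to however your environment reports the Cray MPICH version.
mpich_version="${CRAY_MPICH_VERSION:-}"

if [ -n "$mpich_version" ] && version_older_than_fix "$mpich_version"; then
    export MPICH_OPT_THREAD_SYNC=0
fi
```

To have the exported variable reach the MPI processes, source the script (for example, a hypothetical `source ./thread_sync_workaround.sh`) from the job script before launching the application, rather than executing it in a subshell.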
