This page documents known MPI issues found on Alps systems, along with workarounds when available.
The issue has been resolved by a system update, and the workaround is no longer needed. The issue was caused by a system misconfiguration.
When performing inter-node GPU-aware communication with Cray MPICH after the October 2024 update on Alps, applications will fail with:

`cxil_map: write error`
Until the issue is properly resolved on the system, the only workaround is to not use GPU-aware MPI. Users of CP2K encountering this issue can disable the use of COSMA, which uses GPU-aware MPI, by placing the following in the CP2K input file:
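A minimal sketch of such an input fragment, assuming CP2K's `&FM` subsection of the `&GLOBAL` section and its `TYPE_OF_MATRIX_MULTIPLICATION` keyword for selecting the matrix-multiplication backend:

```
&GLOBAL
  &FM
    ! Use ScaLAPACK instead of COSMA for distributed matrix
    ! multiplication, avoiding GPU-aware MPI
    TYPE_OF_MATRIX_MULTIPLICATION SCALAPACK
  &END FM
&END GLOBAL
```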
Unless you run RPA calculations, this should have limited impact on performance.
The issue has been resolved in Cray MPICH version 8.1.30.
When using MPI_THREAD_MULTIPLE on Grace-Hopper systems, Cray MPICH may fail with an assertion similar to:

`Assertion failed [...]: (&MPIR_THREAD_GLOBAL_ALLFUNC_MUTEX)->count == 0`

or

`Assertion failed [...]: MPIR_THREAD_GLOBAL_ALLFUNC_MUTEX.count == 0`
The issue can be worked around by falling back to a less optimized implementation of MPI_THREAD_MULTIPLE by setting:
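A sketch of the setting, assuming the intended knob is Cray MPICH's `MPICH_OPT_THREAD_SYNC` environment variable:

```shell
# Assumption: MPICH_OPT_THREAD_SYNC selects Cray MPICH's optimized
# thread-synchronization path; setting it to 0 falls back to the
# slower, lock-based MPI_THREAD_MULTIPLE implementation.
export MPICH_OPT_THREAD_SYNC=0
```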
Cray MPICH on Grace-Hopper systems may hang on larger runs.
There are many possible reasons why an application might hang, many of them unrelated to MPICH. However, if you are experiencing hangs, the issue may be worked around by setting:
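A sketch of one such setting; it is an assumption that the option intended here is the libfabric CXI provider's tag-matching mode:

```shell
# Assumption: forcing software tag matching in the libfabric CXI
# provider avoids exhausting hardware matching resources, a known
# cause of hangs at scale, at some performance cost.
export FI_CXI_RX_MATCH_MODE=software
```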
Performance may be negatively affected by this option.