This page documents known MPI issues found on Alps systems, along with workarounds when available.
Existing issues
Cray MPICH hangs
Cray MPICH on Grace-Hopper systems may hang on larger runs.
| Info |
|---|
| There are many possible reasons why an application would hang, many of them unrelated to MPICH. However, if you are experiencing hangs, the issue may be worked around by setting:
Performance may be negatively affected by this option. |
Resolved issues
"cxil_map: write error" when doing inter-node GPU-aware MPI communication
| Tip |
|---|
| The issue has been resolved by a system update and the workaround is no longer needed. The issue was caused by a system misconfiguration. |
When doing inter-node GPU-aware communication with Cray MPICH after the October 2024 update on Alps, applications will fail with:
| Code Block |
|---|
| cxil_map: write error |
| Tip |
|---|
| Until the issue is resolved properly on the system, the only workaround is to not use inter-node GPU-aware MPI. CP2K users encountering this issue can disable the use of COSMA, which uses GPU-aware MPI, by placing the following in the
Unless you run RPA calculations, this should have limited impact on performance. |
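To confirm that a failed job hit this particular error rather than an unrelated one, the job output can be searched for the message shown above. The following is a minimal Python sketch, assuming the job output is already available as text; the function name `has_cxil_map_error` and the sample log lines are hypothetical and only serve as illustration.

```python
# Error message reported by affected applications (taken from this page).
CXIL_MAP_ERROR = "cxil_map: write error"


def has_cxil_map_error(log_text: str) -> bool:
    """Return True if any line of the job output contains the error message."""
    return any(CXIL_MAP_ERROR in line for line in log_text.splitlines())


# Hypothetical job output, for illustration only.
sample_log = "rank 3 starting\ncxil_map: write error\n"
print(has_cxil_map_error(sample_log))  # True
```

A plain substring match is used because the message may be surrounded by other text on the same line.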
MPI_THREAD_MULTIPLE does not work
| Tip |
|---|
| The issue has been resolved in Cray MPICH version 8.1.30. |
When using MPI_THREAD_MULTIPLE on Grace-Hopper systems, Cray MPICH may fail with an assertion that looks similar to:
| Code Block |
|---|
| Assertion failed [...]: MPIR_THREAD_GLOBAL_ALLFUNC_MUTEX.count == 0 |
| Tip |
|---|
| The issue can be worked around by falling back to a less optimized implementation of MPI_THREAD_MULTIPLE by setting:
|
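Since the MPI_THREAD_MULTIPLE fix landed in Cray MPICH 8.1.30, whether the workaround is still relevant depends on the version in use. The sketch below compares a version string against that release. It assumes you already know the version string of the Cray MPICH you are running and that it has the plain `major.minor.patch` form; the helper names `parse_version` and `needs_workaround` are hypothetical.

```python
import re

# Release in which the issue was resolved, as stated on this page.
FIXED_VERSION = (8, 1, 30)


def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '8.1.29' into a tuple of ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def needs_workaround(version: str) -> bool:
    """Return True if the given version predates the fixed release."""
    return parse_version(version) < FIXED_VERSION


print(needs_workaround("8.1.29"))  # True
print(needs_workaround("8.1.30"))  # False
```

Comparing integer tuples avoids the pitfall of comparing version strings lexically, where "8.1.9" would sort after "8.1.30".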