Notice the [annotations] section, which disables the Slurm and CXI hooks.
MPICH unable to allocate shared memory
We're aware of an issue that prevents open-source MPICH libraries from using shared memory when launching multiple ranks per node, and we're currently investigating it.
The problem usually manifests with the following error message:
    Abort(73000719) on node 5: Fatal error in internal_Init: Other MPI error, error stack:
    internal_Init(48306)..........: MPI_Init(argc=0xffffeba3326c, argv=0xffffeba33260) failed
    MPII_Init_thread(265).........:
    MPIR_init_comm_world(34)......:
    MPIR_Comm_commit(800).........:
    MPIR_Comm_commit_internal(585):
    MPID_Comm_commit_pre_hook(151):
    MPIDI_world_pre_init(633).....:
    MPIDU_Init_shm_init(179)......: unable to allocate shared memory
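To check whether a failed job hit this particular problem, it can help to search the job output for the last frame of the error stack. The following is a minimal sketch, not an official tool: the function name `has_shm_init_failure` is hypothetical, and the pattern assumes the message keeps the same wording as in the error stack above (the line number in parentheses is allowed to vary).

```python
import re

# Pattern built from the last line of the error stack above.
# The number in parentheses and the count of padding dots are
# allowed to vary; the rest is assumed to stay the same.
SHM_FAILURE = re.compile(
    r"MPIDU_Init_shm_init\(\d+\)\.*:\s*unable to allocate shared memory"
)


def has_shm_init_failure(log_text: str) -> bool:
    """Return True if the log text contains the shared-memory init failure."""
    return SHM_FAILURE.search(log_text) is not None


if __name__ == "__main__":
    sample = (
        "MPIDI_world_pre_init(633).....:\n"
        "MPIDU_Init_shm_init(179)......: unable to allocate shared memory\n"
    )
    print(has_shm_init_failure(sample))
```

In practice the text passed to the function would be the contents of the job's output file.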
The issue should affect only intra-node communication, not inter-node communication. In other words, running 1 MPI rank per node should work correctly.
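As an illustration of the one-rank-per-node layout, the sketch below assembles a Slurm launch line that places exactly one task on each node. It assumes jobs are started with `srun` and relies on its standard `--nodes` and `--ntasks-per-node` options; the helper name `build_srun_command` and the application path `./my_mpi_app` are hypothetical placeholders, and any site-specific or container-related options are left out.

```python
import shlex


def build_srun_command(num_nodes: int, app_args: list[str]) -> list[str]:
    """Build an srun argument list that runs one MPI rank per node."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if not app_args:
        raise ValueError("app_args must name the program to launch")
    # One task per node avoids intra-node (shared-memory) communication.
    return ["srun", f"--nodes={num_nodes}", "--ntasks-per-node=1", *app_args]


if __name__ == "__main__":
    # "./my_mpi_app" is a placeholder for the real application.
    print(shlex.join(build_srun_command(2, ["./my_mpi_app"])))
```

Returning the command as a list keeps it suitable for `subprocess.run` without shell quoting concerns; `shlex.join` is used only to display it.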