LS-DYNA mpp-dyna Cardinal: Remote access error on mlx5_0:1, RDMA_READ
You may encounter the following error when running mpp-dyna jobs on multiple nodes:
[c0054:22206:0:22206] ib_mlx5_log.c:179 Remote access error on mlx5_0:1/IB (synd 0x13 vend 0x88 hw_synd 0/0)
[c0054:22206:0:22206] ib_mlx5_log.c:179 RC QP 0xef8 wqe[365]: RDMA_READ s-- [rva 0x32a5cb38 rkey 0x20000] [va 0x319d3bf0 len 10200 lkey 0x2e5f98] [rqpn 0xfb8 dlid=2285 sl=0 port=1 src_path_bits=0]
forrtl: error (76): Abort trap signal
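To check whether a failed job hit this particular error, the job output can be searched for the signature above. The following is a minimal sketch, not an official tool: the function name find_remote_access_errors and the regular expression are illustrative assumptions based only on the log lines quoted in this article, and the host name, process ID, and device name will differ between jobs.

```python
import re
import sys

# Assumed pattern, derived only from the log line quoted in this article:
# "[host:pid:n:tid] ib_mlx5_log.c:<line> Remote access error on mlx5_<dev>:<port>"
REMOTE_ACCESS_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+):\d+:\d+\]\s+"
    r"ib_mlx5_log\.c:\d+\s+"
    r"Remote access error on (?P<device>mlx5_\d+:\d+)"
)


def find_remote_access_errors(text):
    """Return a (host, pid, device) tuple for each matching log entry."""
    return [
        (m.group("host"), int(m.group("pid")), m.group("device"))
        for m in REMOTE_ACCESS_RE.finditer(text)
    ]


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_log.py job_output.log
    with open(sys.argv[1], errors="replace") as handle:
        for host, pid, device in find_remote_access_errors(handle.read()):
            print(f"Remote access error on {device} reported by {host} (pid {pid})")
```

Running the script against a job's output file lists the nodes that reported the error, which can be useful to include when reporting the problem.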
Cause of the Error
Unknown
Affected versions
mpp-dyna versions 11 and 13, when running on multiple nodes