Cardinal

LS-DYNA mpp-dyna Cardinal: Remote access error on mlx5_0:1, RDMA_READ

You may encounter the following error when running mpp-dyna jobs on multiple nodes:

[c0054:22206:0:22206] ib_mlx5_log.c:179  Remote access error on mlx5_0:1/IB (synd 0x13 vend 0x88 hw_synd 0/0)
[c0054:22206:0:22206] ib_mlx5_log.c:179  RC QP 0xef8 wqe[365]: RDMA_READ s-- [rva 0x32a5cb38 rkey 0x20000] [va 0x319d3bf0 len 10200 lkey 0x2e5f98] [rqpn 0xfb8 dlid=2285 sl=0 port=1 src_path_bits=0]
forrtl: error (76): Abort trap signal
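To check whether a failed job hit this particular error, the job output can be scanned for the signature line. The following is a minimal sketch in Python, not an OSC-provided tool: the function name `find_remote_access_errors` and the field labels (`host`, `pid`, `device`, `port`, `synd`) are chosen for illustration, and the pattern assumes the log line has exactly the layout shown above.

```python
import re

# Pattern for the "Remote access error" line shown above.
# The group names are labels chosen for this sketch only.
REMOTE_ACCESS_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+):\d+:\d+\]\s+ib_mlx5_log\.c:\d+\s+"
    r"Remote access error on (?P<device>\w+):(?P<port>\d+)/IB "
    r"\(synd (?P<synd>0x[0-9a-fA-F]+)"
)


def find_remote_access_errors(log_text):
    """Return one dict per 'Remote access error' line found in log_text."""
    return [m.groupdict() for m in REMOTE_ACCESS_RE.finditer(log_text)]


# Sample input copied from the error message above.
sample = (
    "[c0054:22206:0:22206] ib_mlx5_log.c:179  Remote access error on mlx5_0:1/IB "
    "(synd 0x13 vend 0x88 hw_synd 0/0)\n"
    "forrtl: error (76): Abort trap signal\n"
)

hits = find_remote_access_errors(sample)
for hit in hits:
    print(f"{hit['host']}: {hit['device']} port {hit['port']} synd {hit['synd']}")
```

In practice the sample string would be replaced by the contents of the job's output file. An empty result means only that this signature was not found, not that the job succeeded.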

Cause of the Error

Unknown

Affected versions

mpp-dyna versions 11 and 13, when running on multiple nodes

STAR-CCM+ MPI job failure and workaround

STAR-CCM+ MPI jobs fail with both Intel MPI and Open MPI, displaying the following message:

ib_iface.c:1139 UCX ERROR Invalid active_speed on mlx5_0:1: 128

This issue occurs because the UCX library (v1.8) bundled with STAR-CCM+ supports only Mellanox InfiniBand EDR, while Cardinal uses Mellanox InfiniBand NDR. As a result, STAR-CCM+ cannot communicate correctly over the newer fabric.
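The number at the end of the UCX message is the port's `active_speed` code as reported by the InfiniBand verbs layer, where 32 denotes EDR and 128 denotes NDR. The sketch below decodes that number from the error line. It is an illustration only: the lookup table is written out by hand rather than read from UCX, and the names `IB_ACTIVE_SPEED` and `explain_ucx_speed_error` are hypothetical.

```python
import re

# Per-lane InfiniBand speed codes used in the verbs port attribute active_speed.
# Hand-written for this sketch; not taken from the UCX sources.
IB_ACTIVE_SPEED = {
    1: "SDR",
    2: "DDR",
    4: "QDR",
    8: "FDR10",
    16: "FDR",
    32: "EDR",
    64: "HDR",
    128: "NDR",
}

# Matches the UCX message quoted above; group names are illustrative.
UCX_SPEED_RE = re.compile(
    r"Invalid active_speed on (?P<device>\w+):(?P<port>\d+): (?P<code>\d+)"
)


def explain_ucx_speed_error(line):
    """Return a readable description of the UCX error line, or None if it does not match."""
    match = UCX_SPEED_RE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    generation = IB_ACTIVE_SPEED.get(code, "unknown")
    return (
        f"{match.group('device')}:{match.group('port')} "
        f"reports active_speed {code} ({generation})"
    )


message = "ib_iface.c:1139 UCX ERROR Invalid active_speed on mlx5_0:1: 128"
print(explain_ucx_speed_error(message))
```

Seeing the NDR code in this message is consistent with the explanation above: the bundled UCX predates the fabric speed used on Cardinal.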

Affected versions

18.18.06.006, 19.04.009 and possibly later versions