Thanks. When I run the benchmark, I get the following output:
prun IMB-MPI1 PingPong
[prun] Master compute host = c1
[prun] Resource manager = slurm
[prun] Launch cmd = mpiexec.hydra -bootstrap slurm IMB-MPI1 PingPong (family=impi)
[0] MPI startup(): Multi-threaded optimized library
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[0] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[0] MPI startup(): dapl data transfer mode
[1] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[1] MPI startup(): dapl data transfer mode
[0] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[0] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[1] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[1] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 15607 c1 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
[0] MPI startup(): 1 9944 master.localdomain {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_MAP=mlx4_0:0
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2
[0] MPI startup(): I_MPI_PIN_MAPPING=1:0 0
#------------------------------------------------------------
# Intel (R) MPI Benchmarks 2018, MPI-1 part
#------------------------------------------------------------
# Date : Mon Jan 8 14:36:35 2018
# Machine : x86_64
# System : Linux
# Release : 3.10.0-693.el7.x86_64
# Version : #1 SMP Tue Aug 22 21:09:27 UTC 2017
# MPI Version : 3.1
# MPI Thread Environment:
# Calling sequence was:
# IMB-MPI1 PingPong
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[usec] Mbytes/sec
0 1000 1.42 0.00
1 1000 1.36 0.74
2 1000 1.36 1.47
4 1000 1.33 3.00
8 1000 1.33 6.02
16 1000 1.31 12.25
32 1000 2.13 15.00
64 1000 2.10 30.49
128 1000 2.23 57.35
256 1000 2.32 110.25
512 1000 2.51 203.99
1024 1000 2.98 343.91
2048 1000 3.84 532.86
4096 1000 4.62 887.53
8192 1000 6.37 1285.44
16384 1000 9.26 1768.39
32768 1000 14.35 2283.48
65536 640 24.65 2659.08
131072 320 45.13 2904.53
262144 160 232.03 1129.76
524288 80 333.10 1573.97
1048576 40 525.58 1995.10
2097152 20 921.90 2274.82
4194304 10 1756.10 2388.43
# All processes entering MPI_Finalize
It is not using the IB connection, right?
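For reference, this is how I was thinking of verifying whether the traffic actually goes over IB — a rough sketch, where the port number in the counter paths is an assumption for this mlx4_0 adapter:

# Pin Intel MPI to the shm/DAPL fabrics only (no TCP fallback), then re-run:
export I_MPI_FABRICS=shm:dapl
prun IMB-MPI1 PingPong

# Read the HCA traffic counters before and after the run; if they increase
# while the benchmark runs, the traffic is going over IB (port 1 assumed):
cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data
cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_data

# Sanity-check that the link itself is up:
ibstat mlx4_0

If that is not the right way to check it, please let me know.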