OV - gmx_mpi - Global

Total Time (s)		18.23
Profiled Time (s)		16.60
Time in analyzed loops (%)		48.9
Time in analyzed innermost loops (%)		41.1
Time in user code (%)		52.0
Compilation Options Score (%)		100
Array Access Efficiency (%)		52.1

Potential Speedups
Perfect Flow Complexity		1.02
Perfect OpenMP + MPI + Pthread		1.23
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution		2.14
No Scalar Integer	Potential Speedup	1.03
No Scalar Integer	Nb Loops to get 80%	14
FP Vectorised	Potential Speedup	1.02
FP Vectorised	Nb Loops to get 80%	8
Fully Vectorised	Potential Speedup	1.10
Fully Vectorised	Nb Loops to get 80%	33
FP Arithmetic Only	Potential Speedup	1.10
FP Arithmetic Only	Nb Loops to get 80%	32
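To give the speedup factors above a sense of scale, here is a minimal sketch that converts a few of them into projected run times. It assumes, for illustration only, that each factor is the ratio of current time to projected time and that it applies to the whole Total Time. The report does not state this, and the function name `projected_time` is hypothetical.

```python
# Values copied from the report above.
total_time_s = 18.23
potential_speedups = {
    "Perfect Flow Complexity": 1.02,
    "Perfect OpenMP + MPI + Pthread": 1.23,
    "Perfect OpenMP + MPI + Pthread + Perfect Load Distribution": 2.14,
    "Fully Vectorised": 1.10,
}


def projected_time(total_s: float, speedup: float) -> float:
    """Projected run time, ASSUMING speedup = current time / projected time."""
    return total_s / speedup


# Rank the scenarios by how much time they would save under that assumption.
savings = {
    name: total_time_s - projected_time(total_time_s, factor)
    for name, factor in potential_speedups.items()
}
for name, saved_s in sorted(savings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: about {saved_s:.2f} s saved")
```

Under this reading, the parallel efficiency and load distribution entry dominates the vectorisation entries by a wide margin.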

Application	../../install_gcc/bin/gmx_mpi
Timestamp	2024-08-05 19:41:40	Universal Timestamp	1722879700
Number of processes observed	48	Number of threads observed	192
Experiment Type	MPI; OpenMP;
Machine	ins01.benchmarkcenter.megware.com
Model Name	AMD EPYC 9654 96-Core Processor
Architecture	x86_64	Micro Architecture	ZEN_V4
Cache Size	1024 KB	Number of Cores	96
OS Version	Linux 5.14.0-427.18.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 28 06:27:02 EDT 2024
Architecture used during static analysis	x86_64	Micro Architecture used during static analysis	ZEN_V4
Frequency Driver	acpi-cpufreq	Frequency Governor	performance
Huge Pages	always	Hyperthreading	on
Number of sockets	2	Number of cores per socket	96
Compilation Options	libgromacs_mpi.so.9.0.0: GNU C++17 13.2.0 -march=skylake-avx512 -g -O3 -std=c++17 -fno-omit-frame-pointer -fPIC -fexcess-precision=fast -funroll-all-loops -fopenmp
Comments	GROMACS 2024.2 compiled with gcc 13.2 running on two 96 cores AMD Zen 4 processors, using 48 MPI ranks and 4 OMP threads per MPI rank. Pinning is controlled by GROMACS.
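The rank and thread layout described in the comment can be cross-checked against the hardware fields of the report. The sketch below uses only numbers that appear above; the variable names are illustrative.

```python
# Values copied from the report fields above.
mpi_ranks = 48                # Number of processes observed
omp_threads_per_rank = 4      # From the Comments field
threads_observed = 192        # Number of threads observed
sockets = 2                   # Number of sockets
cores_per_socket = 96         # Number of cores per socket

total_threads = mpi_ranks * omp_threads_per_rank
physical_cores = sockets * cores_per_socket
threads_per_core = total_threads / physical_cores

# The configured layout should match what the profiler observed.
layout_matches = total_threads == threads_observed

print(f"total threads: {total_threads}")
print(f"physical cores: {physical_cores}")
print(f"threads per physical core: {threads_per_core}")
print(f"layout matches observed threads: {layout_matches}")
```

The run therefore places one software thread per physical core, even though hyperthreading is reported as on.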

Dataset
Run Command	<executable> mdrun -s ion_channel.tpr -nsteps 10000 -pin on -deffnm gcc
MPI Command	mpirun -genv I_MPI_FABRICS=shm -n <number_processes>
Number Processes	48
Number Nodes	1
Number Processes per Nodes	48
Filter	Not Used
Profile Start	Not Used
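
The Run Command and MPI Command fields are templates with placeholders. As a rough sketch, the launch line of the profiled application could be reassembled as follows, assuming `<executable>` stands for the Application path and `<number_processes>` for the Number Processes value. The helper `build_command` is hypothetical, and the report does not show how MAQAO itself was invoked or how the 4 OpenMP threads per rank were requested, so neither is reproduced here.

```python
import shlex

# Templates and values copied from the report fields above.
mpi_template = "mpirun -genv I_MPI_FABRICS=shm -n <number_processes>"
run_template = "<executable> mdrun -s ion_channel.tpr -nsteps 10000 -pin on -deffnm gcc"
executable = "../../install_gcc/bin/gmx_mpi"
number_processes = 48


def build_command(mpi_tpl: str, run_tpl: str, exe: str, nprocs: int) -> list[str]:
    """Fill both placeholders and return the command as an argument list."""
    mpi_part = mpi_tpl.replace("<number_processes>", str(nprocs))
    run_part = run_tpl.replace("<executable>", exe)
    return shlex.split(mpi_part) + shlex.split(run_part)


command = build_command(mpi_template, run_template, executable, number_processes)
print(shlex.join(command))
```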