r0: OMP1
r1: OMP2
r2: OMP4
r3: OMP8
r4: OMP16
r5: OMP24
Metric                                r0       r1       r2       r3       r4       r5
Total Time (s)                        4.73E3   2.68E3   1.64E3   979.98   628.93   508.50
Max (Thread Active Time) (s)          4.71E3   2.66E3   1.62E3   964.37   613.34   492.24
Average Active Time (s)               4.71E3   2.61E3   1.54E3   880.25   525.13   402.11
Activity Ratio (%)                    99.6     97.4     94.5     89.8     83.5     79.1
Average number of active threads      3.984    7.791    15.115   28.743   53.437   75.914
Affinity Stability (%)                100.0    100.0    99.9     99.9     99.8     99.8
GFLOPS                                9.776    19.104   36.943   69.268   125.769  139.855
Time in analyzed loops (%)            0.42     0.36     0.30     0.27     0.23     0.20
Time in analyzed innermost loops (%)  0.41     0.36     0.30     0.27     0.22     0.20
Time in user code (%)                 2.35     2.10     1.76     1.55     1.30     1.14
Compilation Options Score (%)         100      100      100      100      100      100
Array Access Efficiency (%)           55.2     55.2     55.1     55.3     54.9     55.1
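The Activity Ratio and Average number of active threads rows appear to follow from the timing rows above. The sketch below recomputes them under two assumptions that are inferred from the reported numbers and are not taken from MAQAO documentation: the activity ratio is Average Active Time divided by Total Time, and the average number of active threads is that ratio multiplied by the thread count listed under "Number of threads observed" (4, 8, 16, 32, 64, 96). Small deviations from the reported values are expected because the inputs are already rounded.

```python
# Values copied from the metrics table, runs r0..r5.
total_time = [4.73e3, 2.68e3, 1.64e3, 979.98, 628.93, 508.50]
avg_active_time = [4.71e3, 2.61e3, 1.54e3, 880.25, 525.13, 402.11]
threads = [4, 8, 16, 32, 64, 96]  # "Number of threads observed" per run


def activity_ratio(avg_active, total):
    """Assumed definition: average active time as a percentage of total time."""
    return 100.0 * avg_active / total


ratios = [activity_ratio(a, t) for a, t in zip(avg_active_time, total_time)]
# Assumed definition: fraction of active time scaled by the thread count.
active_threads = [r / 100.0 * n for r, n in zip(ratios, threads)]

for run, (r, n) in enumerate(zip(ratios, active_threads)):
    print(f"r{run}: activity ratio {r:.1f}%, about {n:.2f} active threads")
```

Read this way, the drop from 99.6% to 79.1% means that at 96 threads roughly 20 threads' worth of capacity is idle on average.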
Potential Speedups                                          r0    r1    r2    r3    r4    r5
Perfect Flow Complexity                                     1.00  1.00  1.00  1.00  1.00  1.00
Perfect OpenMP + MPI + Pthread                              1.01  1.01  1.01  1.02  1.03  1.04
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution  1.01  1.03  1.07  1.14  1.25  1.34
Scalability - Gap                                           1.00  1.13  1.38  1.66  2.13  2.58
No Scalar Integer: Potential Speedup                        1.00  1.00  1.00  1.00  1.00  1.00
No Scalar Integer: Nb Loops to get 80%                      1     1     1     1     1     1
FP Vectorised: Potential Speedup                            1.00  1.00  1.00  1.00  1.00  1.00
FP Vectorised: Nb Loops to get 80%                          1     1     1     1     1     1
Fully Vectorised: Potential Speedup                         1.00  1.00  1.00  1.00  1.00  1.00
Fully Vectorised: Nb Loops to get 80%                       3     3     3     3     3     4
Only FP Arithmetic: Potential Speedup                       1.00  1.00  1.00  1.00  1.00  1.00
Only FP Arithmetic: Nb Loops to get 80%                     3     3     3     3     3     3
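The Scalability - Gap row can be cross-checked against the Total Time values. The sketch below assumes the gap is the ideal speedup (the ratio of OpenMP threads per rank relative to r0, taken from the run labels OMP1 to OMP24) divided by the measured speedup relative to r0. This reading is inferred from the reported numbers, which it reproduces to within rounding, and is not a documented MAQAO formula; the helper name `scaling_rows` is made up for illustration.

```python
# Total Time (s) for runs r0..r5, copied from the metrics table.
total_time = [4.73e3, 2.68e3, 1.64e3, 979.98, 628.93, 508.50]
# OpenMP threads per MPI rank, read from the run labels OMP1..OMP24.
omp_threads = [1, 2, 4, 8, 16, 24]


def scaling_rows(times, units):
    """Return (speedup, efficiency, gap) per run, relative to the first run.

    gap = ideal speedup / measured speedup (assumed definition).
    """
    base_time, base_units = times[0], units[0]
    rows = []
    for t, u in zip(times, units):
        speedup = base_time / t
        ideal = u / base_units
        rows.append((speedup, speedup / ideal, ideal / speedup))
    return rows


rows = scaling_rows(total_time, omp_threads)
for run, (speedup, efficiency, gap) in enumerate(rows):
    print(f"r{run}: speedup {speedup:.2f}, "
          f"efficiency {100 * efficiency:.0f}%, gap {gap:.2f}")
```

Under this reading, the gap of 2.58 at r5 says the 24-thread run would need to be about 2.6 times faster to scale linearly from the single-thread run.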
Source Object Issue
xhpl
  HPL_lmul.c
  HPL_rand.c
  HPL_dlaswp02N.c
  HPL_bcast.c
  HPL_dlacpy.c
  HPL_dlaswp04N.c
  HPL_1ring.c
  HPL_pdmatgen.c
  HPL_setran.c
  HPL_pdlange.c
  HPL_ladd.c
  HPL_dlaswp03N.c
  HPL_dlaswp01N.c
r0 r1 r2 r3 r4 r5
Experiment Name
Application ./bin/Linux_AArch64/xhpl same as r0 same as r0 same as r0 same as r0 same as r0
Timestamp 2025-06-23 12:25:04 same as r0 same as r0 same as r0 same as r0 same as r0
Experiment Type MPI (r0); MPI + OpenMP (r1); same as r1 (r2 to r5)
Machine ip-172-31-47-249.ec2.internal same as r0 same as r0 same as r0 same as r0 same as r0
Architecture aarch64 same as r0 same as r0 same as r0 same as r0 same as r0
Micro Architecture ARM_NEOVERSE_V2 same as r0 same as r0 same as r0 same as r0 same as r0
Model Name
Cache Size
Number of Cores
Maximal Frequency 0 GHz same as r0 same as r0 same as r0 same as r0 same as r0
OS Version Linux 6.1.109-118.189.amzn2023.aarch64 #1 SMP Tue Sep 10 08:58:40 UTC 2024 same as r0 same as r0 same as r0 same as r0 same as r0
Architecture used during static analysis aarch64 same as r0 same as r0 same as r0 same as r0 same as r0
Micro Architecture used during static analysis ARM_NEOVERSE_V2 same as r0 same as r0 same as r0 same as r0 same as r0
Compilation Options
xhpl : Arm C/C++/Fortran Compiler version 24.10.1 (build number 4) (based on LLVM 19.1.0) /opt/arm/arm-linux-compiler-24.10.1_AmazonLinux-2023/llvm-bin/clang-19 -o HPL_ladd.o -c -D Add__ -D F77_INTEGER=int -D StringSunStyle -D HPL_DETAILED_TIMING -D HPL_PROGRESS_REPORT -I /home/eoseret/hpl-2.3/include -I /home/eoseret/hpl-2.3/include/Linux_AArch64 -I /opt/arm/armpl-24.10.1_AmazonLinux-2023_arm-linux-compiler/include -fopenmp -O3 -ffast-math -g -grecord-command-line -mcpu=native -Wall ../HPL_ladd.c -I /home/eoseret/openmpi_acfl2410/include same as r0 same as r0 same as r0 same as r0 same as r0
Number of processes observed 4 same as r0 same as r0 same as r0 same as r0 same as r0
Number of threads observed 4 8 16 32 64 96
Frequency Driver NA same as r0 same as r0 same as r0 same as r0 same as r0
Frequency Governor NA same as r0 same as r0 same as r0 same as r0 same as r0
Huge Pages madvise same as r0 same as r0 same as r0 same as r0 same as r0
Hyperthreading off same as r0 same as r0 same as r0 same as r0 same as r0
Number of sockets 1 same as r0 same as r0 same as r0 same as r0 same as r0
Number of cores per socket 96 same as r0 same as r0 same as r0 same as r0 same as r0
MAQAO version 2025.1.0 same as r0 same as r0 same as r0 same as r0 same as r0
MAQAO build b107544c0173fc3785aa7d997ff783dc12b975d2::20250527-133805 same as r0 same as r0 same as r0 same as r0 same as r0
Comments HPL benchmark compiled with ARM ACfL/Armpl 24.10. Matrix order: 100K, block size 384. Run on AWS Graviton 4 with 1 NUMA node and 96 cores. Using 4 MPI ranks to limit multithreading overhead same as r0 same as r0 same as r0 same as r0 same as r0