Total Time (s) 6.34
Max (Thread Active Time) (s) 4.48
Average Active Time (s) 3.21
Activity Ratio (%) 84.9
Average number of active threads 32.353
Affinity Stability (%) 87.4
Time in analyzed loops (%) 95.7
Time in analyzed innermost loops (%) 93.4
Time in user code (%) 96.0
Compilation Options Score (%) 75.0
Array Access Efficiency (%) 74.7
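The global metrics above can be cross-checked with a little arithmetic. The sketch below assumes that the average number of active threads is the observed thread count (64, from the experiment summary) scaled by average active time over total time. The report does not state this formula, so treat it as a plausibility check rather than MAQAO's definition. The max/avg ratio is likewise only an informal indicator of load imbalance.

```python
# Values copied from the report; the formulas are assumptions, not MAQAO's documented definitions.
total_time_s = 6.34       # Total Time (s)
max_active_s = 4.48       # Max (Thread Active Time) (s)
avg_active_s = 3.21       # Average Active Time (s)
threads_observed = 64     # Number of threads observed

# Assumed relation: threads * (average active time / total time)
est_active_threads = threads_observed * avg_active_s / total_time_s

# Informal imbalance indicator: busiest thread vs. average thread
imbalance = max_active_s / avg_active_s

print(f"estimated active threads: {est_active_threads:.2f}")
print(f"max/avg active time: {imbalance:.2f}")
```

Under these assumptions the estimate comes out at about 32.4, close to the reported 32.353. The small gap is consistent with the inputs being rounded to two decimals.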
Potential Speedups
Perfect Flow Complexity 1.01
Perfect OpenMP/MPI/Pthread/TBB 1.02
Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution 1.44
No Scalar Integer
  Potential Speedup 1.02
  Nb Loops to get 80% 3
FP Vectorised
  Potential Speedup 1.17
  Nb Loops to get 80% 1
Fully Vectorised
  Potential Speedup 3.64
  Nb Loops to get 80% 1
FP Arithmetic Only
  Potential Speedup 3.87
  Nb Loops to get 80% 1
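To make the potential speedups easier to compare, they can be turned into projected run times. This is a minimal sketch assuming each speedup is expressed relative to the reported Total Time of 6.34 s and applied on its own. The report does not say the speedups can be combined, and the helper name `projected_time` is hypothetical.

```python
# Values copied from the report. Assumption: each speedup applies to Total Time in isolation.
total_time_s = 6.34

potential_speedups = {
    "Perfect Flow Complexity": 1.01,
    "Perfect OpenMP/MPI/Pthread/TBB": 1.02,
    "Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution": 1.44,
    "No Scalar Integer": 1.02,
    "FP Vectorised": 1.17,
    "Fully Vectorised": 3.64,
    "FP Arithmetic Only": 3.87,
}


def projected_time(total_s: float, speedup: float) -> float:
    """Return the run time implied by a speedup factor (hypothetical helper)."""
    return total_s / speedup


for name, speedup in potential_speedups.items():
    print(f"{name}: {projected_time(total_time_s, speedup):.2f} s")
```

Read this way, full vectorisation and load distribution stand out as the largest opportunities, while the other factors would change the run time by only a few percent.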
Source Object Issue
libllama.so
  hashtable.h: -funroll-loops is missing.
libggml-cpu.so
  vec.cpp: -funroll-loops is missing.
  traits.cpp: -funroll-loops is missing.
  kai_rhs_pack_nxk_qsi4c32pscalef16_qsu4c32s16s0.c: -funroll-loops is missing.
  kai_matmul_clamp_f32_qsi8d32p4x8_qsi4c32p4x8_16x4_neon_i8mm.c: -funroll-loops is missing.
  kleidiai.cpp: -funroll-loops is missing.
  ops.cpp: -funroll-loops is missing.
  kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c: -funroll-loops is missing.
  binary-ops.cpp: -funroll-loops is missing.
  ggml-cpu.c: -funroll-loops is missing.
  quants.c: -funroll-loops is missing.
libggml-base.so
  (source file name not reported)
    -g is missing for some functions (possibly ones added by the compiler), it is needed to have more accurate reports. Other recommended flags are: -O2/-O3, -march=(target)
    -O2, -O3 or -Ofast is missing.
    -mcpu=native is missing.
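The kind of check behind the issue list above can be approximated by scanning a compilation option string for the flags the report names. The sketch below is not MAQAO's actual logic. The function `missing_flags` is hypothetical, and treating any `-mcpu=` or `-march=` value as satisfying the target-flag recommendation is an assumption made so that the result matches what the report shows for libllama.so.

```python
import shlex


def missing_flags(options: str) -> list[str]:
    """Return the recommended flags that are absent from a compiler option string.

    Hypothetical helper; the rules below are assumptions, not MAQAO's implementation.
    """
    tokens = shlex.split(options)
    missing = []
    if "-g" not in tokens:
        missing.append("-g")
    if not any(t in ("-O2", "-O3", "-Ofast") for t in tokens):
        missing.append("-O2, -O3 or -Ofast")
    # Assumption: any explicit -mcpu= or -march= target counts as present.
    if not any(t.startswith(("-mcpu=", "-march=")) for t in tokens):
        missing.append("-mcpu=native")
    if "-funroll-loops" not in tokens:
        missing.append("-funroll-loops")
    return missing


# Option string for libllama.so, copied from the report's Compilation Options entry.
libllama_options = (
    "-march=armv8.2-a+crypto -mtune=neoverse-n1 -mlittle-endian -mabi=lp64 "
    "-g -O3 -O3 -fno-omit-frame-pointer -fcf-protection=none -fPIC"
)

print(missing_flags(libllama_options))  # ['-funroll-loops']
```

For libllama.so this yields only `-funroll-loops`, which agrees with the issue listed for that library.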
Application /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-398-0068/llama.cpp/run/base_runs/defaults/gcc/exec
Timestamp 2025-11-24 10:30:56
Universal Timestamp 1763980256
Number of processes observed 1
Number of threads observed 64
Experiment Type MPI; OpenMP;
Machine ip-172-31-46-37.ec2.internal
Architecture aarch64
Micro Architecture ARM_NEOVERSE_V1
OS Version Linux 6.1.158-178.288.amzn2023.aarch64 #1 SMP Mon Nov 3 18:38:05 UTC 2025
Architecture used during static analysis aarch64
Micro Architecture used during static analysis ARM_NEOVERSE_V1
Frequency Driver NA
Frequency Governor NA
Huge Pages madvise
Hyperthreading off
Number of sockets 1
Number of cores per socket 64
Compilation Options
  libggml-base.so: N/A
  libggml-cpu.so: GNU C11 11.5.0 20240719 (Red Hat 11.5.0-5) -mcpu=zeus+crypto+sha3+sm4+noprofile+nossbs+dotprod+i8mm+sve -mlittle-endian -mabi=lp64 -g -O3 -O3 -std=gnu11 -fno-omit-frame-pointer -fcf-protection=none -fPIC -fopenmp
  libllama.so: GNU C++17 11.5.0 20240719 (Red Hat 11.5.0-5) -march=armv8.2-a+crypto -mtune=neoverse-n1 -mlittle-endian -mabi=lp64 -g -O3 -O3 -fno-omit-frame-pointer -fcf-protection=none -fPIC
Dataset
Run Command <executable> -m meta-llama-3.1-8b-instruct-Q4_0.gguf -t 64 -b 2048 -ub 512 -npp 128 -ntg 0 -npl 16 -c 16384 --seed 0 --output-format jsonl
MPI Command mpirun -n <number_processes> --bind-to none --report-bindings /usr/bin/numactl -C 0-63
Number Processes 1
Number Nodes 1
Filter Not Used
Profile Start Not Used
Profile Stop Not Used