There is no filter information to display
Total Time (s) 6.42
Max (Thread Active Time) (s) 4.51
Average Active Time (s) 3.20
Activity Ratio (%) 84.7
Average number of active threads 31.947
Affinity Stability (%) 87.1
Time in analyzed loops (%) 95.4
Time in analyzed innermost loops (%) 92.9
Time in user code (%) 95.8
Compilation Options Score (%) 100.0
Array Access Efficiency (%) 74.6
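The thread metrics above can be cross-checked against each other. The sketch below assumes, as a working hypothesis and not as MAQAO's documented definition, that the average number of active threads is roughly the thread count multiplied by the ratio of average active time to total time. The thread count of 64 is taken from the "Number of threads observed" field of this report; the function name is hypothetical.

```python
# Values copied from this report.
total_time_s = 6.42                    # Total Time (s)
avg_active_time_s = 3.20               # Average Active Time (s)
threads_observed = 64                  # Number of threads observed
reported_avg_active_threads = 31.947   # Average number of active threads


def estimate_avg_active_threads(n_threads, avg_active_s, total_s):
    """Assumed relation: each thread is active for avg_active_s out of total_s."""
    return n_threads * avg_active_s / total_s


estimate = estimate_avg_active_threads(
    threads_observed, avg_active_time_s, total_time_s
)
relative_gap = abs(estimate - reported_avg_active_threads) / reported_avg_active_threads

print(f"estimated active threads: {estimate:.2f}")
print(f"relative gap to reported value: {relative_gap:.2%}")
```

Under this assumption the estimate agrees with the reported value to within 1 percent, which suggests that on average about half of the 64 threads are active at any time.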
Potential Speedups
Perfect Flow Complexity 1.00
Perfect OpenMP/MPI/Pthread/TBB 1.02
Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution 1.45
No Scalar Integer Potential Speedup 1.01
    Nb Loops to get 80% 4
FP Vectorised Potential Speedup 1.17
    Nb Loops to get 80% 1
Fully Vectorised Potential Speedup 3.56
    Nb Loops to get 80% 1
FP Arithmetic Only Potential Speedup 3.87
    Nb Loops to get 80% 1
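To put the speedup figures in absolute terms, the sketch below divides the measured total time by each potential speedup. It assumes that each figure applies to the whole-run Total Time of 6.42 s and that the entries are independent what-if scenarios, not cumulative gains; both are assumptions about how to read the report, not statements taken from MAQAO.

```python
total_time_s = 6.42  # "Total Time (s)" from this report

# Potential speedups copied from the report.
potential_speedups = {
    "Perfect Flow Complexity": 1.00,
    "Perfect OpenMP/MPI/Pthread/TBB": 1.02,
    "Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution": 1.45,
    "No Scalar Integer": 1.01,
    "FP Vectorised": 1.17,
    "Fully Vectorised": 3.56,
    "FP Arithmetic Only": 3.87,
}


def projected_time(total_s, speedup):
    """Projected run time if the whole run were accelerated by `speedup`."""
    return total_s / speedup


for name, speedup in potential_speedups.items():
    projected = projected_time(total_time_s, speedup)
    saved = total_time_s - projected
    print(f"{name}: {projected:.2f} s (saves {saved:.2f} s)")
```

Read this way, the two vectorisation-related scenarios dominate: every other entry would save at most about 2 s of the 6.42 s run.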
Source Object / Issue
libllama.so
    hashtable.h
libggml-cpu.so
    binary-ops.cpp
    traits.cpp
    kai_rhs_pack_nxk_qsi4c32pscalef16_qsu4c32s16s0.c
    kai_matmul_clamp_f32_qsi8d32p4x8_qsi4c32p4x8_16x4_neon_i8mm.c
    ops.cpp
    kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c
    vec.cpp
    ggml-cpu.c
    quants.c
libggml-base.so
    (unnamed source object)
        -g is missing for some functions (possibly ones added by the compiler), it is needed to have more accurate reports. Other recommended flags are: -O2/-O3, -march=(target)
        -O2, -O3 or -Ofast is missing.
        -mcpu=native is missing.
Application /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-398-0068/llama.cpp/run/binaries/gcc_2/exec
Timestamp 2025-11-24 10:38:34
Universal Timestamp 1763980714
Number of processes observed 1
Number of threads observed 64
Experiment Type MPI; OpenMP;
Machine ip-172-31-46-37.ec2.internal
Architecture aarch64
Micro Architecture ARM_NEOVERSE_V1
OS Version Linux 6.1.158-178.288.amzn2023.aarch64 #1 SMP Mon Nov 3 18:38:05 UTC 2025
Architecture used during static analysis aarch64
Micro Architecture used during static analysis ARM_NEOVERSE_V1
Frequency Driver NA
Frequency Governor NA
Huge Pages madvise
Hyperthreading off
Number of sockets 1
Number of cores per socket 64
Compilation Options
    libggml-base.so : N/A
    libggml-cpu.so : GNU C11 11.5.0 20240719 (Red Hat 11.5.0-5) -mcpu=neoverse-v1 -msve-vector-bits=256 -mlittle-endian -mabi=lp64 -g -O3 -O3 -O3 -std=gnu11 -funroll-loops -ffast-math -fno-omit-frame-pointer -fcf-protection=none -fno-finite-math-only -fPIC -fopenmp
    libllama.so : GNU C++17 11.5.0 20240719 (Red Hat 11.5.0-5) -mcpu=neoverse-v1 -msve-vector-bits=256 -mlittle-endian -mabi=lp64 -g -O3 -O3 -O3 -funroll-loops -ffast-math -fno-omit-frame-pointer -fcf-protection=none -fno-finite-math-only -fPIC
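The recorded options can be compared with the flags the report recommends: -g, an -O2/-O3/-Ofast level, and a CPU target. The following is a minimal sketch, not MAQAO's own check. The option strings are abridged from the Compilation Options field, the helper name `check_flags` is hypothetical, and accepting any `-mcpu=` value (not only `-mcpu=native`) is an assumption.

```python
# Option strings abridged from the "Compilation Options" field of this report.
recorded_options = {
    "libggml-base.so": None,  # reported as N/A
    "libggml-cpu.so": "-mcpu=neoverse-v1 -msve-vector-bits=256 -g -O3 -std=gnu11 -fopenmp",
    "libllama.so": "-mcpu=neoverse-v1 -msve-vector-bits=256 -g -O3",
}

ALL_GROUPS = ["-g", "-O2/-O3/-Ofast", "-mcpu"]


def check_flags(options):
    """Return the recommended flag groups that are absent from an option string."""
    if options is None:
        # No options recorded: nothing can be confirmed, so report every group.
        return list(ALL_GROUPS)
    flags = options.split()
    missing = []
    if "-g" not in flags:
        missing.append("-g")
    if not any(flag in ("-O2", "-O3", "-Ofast") for flag in flags):
        missing.append("-O2/-O3/-Ofast")
    if not any(flag.startswith("-mcpu=") for flag in flags):
        missing.append("-mcpu")
    return missing


for library, options in recorded_options.items():
    missing = check_flags(options)
    print(f"{library}: {'missing ' + ', '.join(missing) if missing else 'ok'}")
```

With these inputs only libggml-base.so, whose options are recorded as N/A, is flagged. That appears consistent with the three compilation issues the report lists under that library.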
Dataset
Run Command <executable> -m meta-llama-3.1-8b-instruct-Q4_0.gguf -t 64 -b 2048 -ub 512 -npp 128 -ntg 0 -npl 16 -c 16384 --seed 0 --output-format jsonl
MPI Command mpirun -n <number_processes> --bind-to none --report-bindings /usr/bin/numactl -C 0-63
Number Processes 1
Number Nodes 1
Filter Not Used
Profile Start Not Used
Profile Stop Not Used