Help is available by hovering the cursor over any symbol or by checking the MAQAO website.
Filter Information
63 threads, each covering less than 1% of the profiled time (= Max (Thread Active Time)), were discarded; together they account for 18.01 seconds of CPU time. The threshold below which a thread is discarded can be adjusted with the thread-filter-threshold option.
Global Metrics
Total Time (s): 192.86
Max (Thread Active Time) (s): 179.19
Average Active Time (s): 179.19
Activity Ratio (%): 92.9
Average number of active threads: 0.929
Affinity Stability (%): 100.0
Time in analyzed loops (%): 5.49
Time in analyzed innermost loops (%): 4.81
Time in user code (%): 5.69
Compilation Options Score (%): 99.7
Array Access Efficiency (%): 69.9
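The activity figures above appear to be related by simple arithmetic: dividing the Average Active Time by the Total Time reproduces both the Activity Ratio and the Average number of active threads. The sketch below checks this with the values from this report. The relation is inferred from the numbers, not taken from MAQAO documentation, and the function name `activity_ratio` is hypothetical.

```python
def activity_ratio(avg_active_time_s, total_time_s):
    """Return active time as a percentage of total time.

    Assumption: Activity Ratio = Average Active Time / Total Time.
    This matches the figures in this report but is not confirmed by it.
    """
    if total_time_s <= 0:
        raise ValueError("total time must be positive")
    return 100.0 * avg_active_time_s / total_time_s


# Values copied from the Global Metrics above.
total_time_s = 192.86
avg_active_time_s = 179.19

ratio = activity_ratio(avg_active_time_s, total_time_s)
print(f"Activity Ratio (%): {ratio:.1f}")                    # 92.9
print(f"Average number of active threads: {ratio / 100:.3f}")  # 0.929
```

With a single retained thread, Max (Thread Active Time) and Average Active Time coincide, which is consistent with both being 179.19 s here.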
Potential Speedups
Perfect Flow Complexity: 1.00
Perfect OpenMP/MPI/Pthread/TBB: 1.00
Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution: 1.00
No Scalar Integer: Potential Speedup 1.01, Nb Loops to get 80%: 4
FP Vectorised: Potential Speedup 1.01, Nb Loops to get 80%: 4
Fully Vectorised: Potential Speedup 1.04, Nb Loops to get 80%: 5
FP Arithmetic Only: Potential Speedup 1.03, Nb Loops to get 80%: 6
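The small speedups above are what Amdahl's law would predict when only 5.49% of the time is spent in analyzed loops: even if those loops took no time at all, the whole run could not speed up by more than about 1.058x. The sketch below works through that bound and shows one plausible reading of "Nb Loops to get 80%" (the fewest loops whose combined gain reaches 80% of the total). Both function names and the per-loop gains are hypothetical; the report does not describe how MAQAO computes these columns.

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of the runtime gets `local_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def loops_to_reach(gains, share=0.8):
    """Fewest loops, taken from largest gain down, whose gains sum to `share` of the total."""
    total = sum(gains)
    if total <= 0:
        return 0
    running = 0.0
    for count, gain in enumerate(sorted(gains, reverse=True), start=1):
        running += gain
        if running >= share * total:
            return count
    return len(gains)


# Upper bound: 5.49% of time in analyzed loops, loops made infinitely fast.
bound = amdahl_speedup(0.0549, float("inf"))
print(f"Upper bound on loop-only speedup: {bound:.3f}")  # 1.058

# Hypothetical per-loop time savings in seconds (not taken from the report).
example_gains = [4.0, 3.0, 2.0, 1.0]
print(loops_to_reach(example_gains))  # 3
```

All reported potential speedups (1.01 to 1.04) fall below this bound, as expected.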
[Charts: CQA Potential Speedups Summary; Average Active Threads Count; Loop Based Profile; Innermost Loop Based Profile; Application Categorization]
Compilation Options
Source Object: Issue
libllama.so: –
    hashtable.h
    llama-vocab.cpp
libggml-cpu.so: –
    binary-ops.cpp
    ops.cpp
    vec.cpp
    ggml-cpu.c
    quants.c
libggml-base.so: –
    (unnamed): –
        (unnamed):
-g is missing for some functions (possibly ones added by the compiler); it is needed for more accurate reports. Other recommended flags are: -O2/-O3, -march=(target)
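As an illustration of the recommendation above, a build script could check a compile command for the three kinds of flags the report mentions. The sketch below is a minimal example under stated assumptions: the function name, the set of accepted debug flags, and the sample command line are all hypothetical, and it is not how MAQAO itself detects missing options.

```python
import shlex

# Assumption: these spellings count as "debug info enabled" and "optimized".
DEBUG_FLAGS = {"-g", "-g1", "-g2", "-g3", "-ggdb"}
OPT_FLAGS = {"-O2", "-O3"}


def missing_recommended_flags(command):
    """Return the recommended flag groups that are absent from a compile command."""
    args = shlex.split(command)
    missing = []
    if not any(arg in DEBUG_FLAGS for arg in args):
        missing.append("-g")
    if not any(arg in OPT_FLAGS for arg in args):
        missing.append("-O2/-O3")
    if not any(arg.startswith("-march=") for arg in args):
        missing.append("-march=(target)")
    return missing


# Hypothetical compile command, not taken from the report.
print(missing_recommended_flags("cc -O3 -c quants.c -o quants.o"))
# ['-g', '-march=(target)']
```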