Loops
kai_matmul_clamp_f32_qsi8d32p4x8_qsi4c32p4x8_16x4_neon_i8mm.c: 131 - 172.08 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | Optimizer analysis |
|---|---|---|---|---|---|---|---|
| orig_default | 2582 | 0.16 | 0.33 | 35.62 | 28.14 | 55.37 | Sum on 1 analyzed binary loop (libggml-cpu.so - 2582) |
| gcc_default | 2075 | 0.14 | 0.87 | 69.63 | 28.14 | 55.37 | Sum on 1 analyzed binary loop (libggml-cpu.so - 2075) |
| gcc_2 | 2079 | 0.15 | 0.83 | 66.84 | 28.14 | 55.37 | Sum on 1 analyzed binary loop (libggml-cpu.so - 2079) |

| Category | Analysis | Count (orig_default) | Count (gcc_default) | Count (gcc_2) |
|---|---|---|---|---|
| Data Access Issues | Presence of constant non-unit stride data access | 1 | 1 | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 | 1 | 1 |
vec.cpp: 385 - 2.92 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | Optimizer analysis |
|---|---|---|---|---|---|---|---|
| orig_default | 1015 | 0.01 | 0.01 | 0.98 | 68.18 | 82.24 | Sum on 1 analyzed binary loop (libggml-cpu.so - 1015) |
| gcc_default | 767 | 0.01 | 0.01 | 1.03 | 80 | 97.68 | Sum on 1 analyzed binary loop (libggml-cpu.so - 767) |
| gcc_2 | 790 | 0.01 | 0.01 | 0.91 | 90 | 98.67 | Sum on 1 analyzed binary loop (libggml-cpu.so - 790) |

| Category | Analysis | Count (orig_default) | Count (gcc_default) | Count (gcc_2) |
|---|---|---|---|---|
| Loop Computation Issues | Presence of expensive FP instructions | 1 | 1 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | | 1 | |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | | 1 | |
kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c: 96 - 1.20 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | Optimizer analysis |
|---|---|---|---|---|---|---|---|
| orig_default | 2545 | 0.01 | 0.00 | 0.17 | 77.23 | 96.44 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_default | 2043 | 0.01 | 0.01 | 0.58 | 69.7 | 97.1 | Sum on 1 analyzed binary loop (libggml-cpu.so - 2043) |
| gcc_2 | 2046 | 0.01 | 0.01 | 0.45 | 69.7 | 97.1 | Sum on 1 analyzed binary loop (libggml-cpu.so - 2046) |

| Category | Analysis | Count (orig_default) | Count (gcc_default) | Count (gcc_2) |
|---|---|---|---|---|
| Loop Computation Issues | Presence of expensive FP instructions | | 1 | 1 |
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | | 1 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | | 1 | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | | 1 | 1 |
quants.c: 2683 - 0.92 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | Optimizer analysis |
|---|---|---|---|---|---|---|---|
| orig_default | 2462 | 0.00 | 0.00 | 0.02 | 75.14 | 82.66 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_default | 1953 | 0.00 | 0.00 | 0.09 | 59.53 | 80.22 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_2 | 1954 | 0.01 | 0.01 | 0.82 | 69.54 | 84.89 | Sum on 1 analyzed binary loop (libggml-cpu.so - 1954) |

| Category | Analysis | Count (orig_default) | Count (gcc_default) | Count (gcc_2) |
|---|---|---|---|---|
| Control Flow Issues | Presence of 2 to 4 paths | | | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | | | 1 |
| Vectorization Roadblocks | Presence of 2 to 4 paths | | | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | | | 1 |
ggml-cpu.c: 3228 - 0.55 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | Optimizer analysis |
|---|---|---|---|---|---|---|---|
| orig_default | 0 | 0.00 | 0.00 | 0.06 | 92.5 | 98.75 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_default | 6 | 0.01 | 0.00 | 0.30 | 0 | 18.47 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_2 | 1 | 0.01 | 0.00 | 0.18 | 72.6 | 83.56 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
vec.h: 411 - 0.44 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | Optimizer analysis |
|---|---|---|---|---|---|---|---|
| orig_default | 1900 | 0.01 | 0.00 | 0.08 | 100 | 100 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_default | 1454 | 0.01 | 0.00 | 0.24 | 100 | 100 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_2 | 1501 | 0.01 | 0.00 | 0.12 | 100 | 100 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
vec.cpp: 231 - 0.41 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | Optimizer analysis |
|---|---|---|---|---|---|---|---|
| orig_default | 1007 | 0.01 | 0.00 | 0.08 | 100 | 100 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_default | 764 | 0.01 | 0.00 | 0.12 | 100 | 100 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_2 | 788 | 0.01 | 0.00 | 0.21 | 100 | 100 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
ops.cpp: 4325 - 0.34 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | Optimizer analysis |
|---|---|---|---|---|---|---|---|
| orig_default | 1398 | 0.01 | 0.00 | 0.18 | 96.97 | 98.48 | Sum on 1 analyzed binary loop (libggml-cpu.so - 1398) |
| gcc_default | 1127 | 0.00 | 0.00 | 0.03 | 0 | 26.56 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_2 | 1159 | 0.00 | 0.00 | 0.12 | 17.39 | 56.52 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |

| Category | Analysis | Count (orig_default) | Count (gcc_default) | Count (gcc_2) |
|---|---|---|---|---|
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
binary-ops.cpp: 18 - 0.30 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | Optimizer analysis |
|---|---|---|---|---|---|---|---|
| orig_default | 551 | 0.01 | 0.00 | 0.06 | 0 | 23.68 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_default | 495 | 0.00 | 0.00 | 0.15 | 25 | 100 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_2 | 501 | 0.00 | 0.00 | 0.09 | 25 | 100 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
binary-ops.cpp: 10 - 0.26 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | Optimizer analysis |
|---|---|---|---|---|---|---|---|
| orig_default | 434 | 0.01 | 0.00 | 0.11 | 0 | 25 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_default | 411 | 0.01 | 0.00 | 0.09 | 25 | 100 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_2 | 413 | 0.01 | 0.00 | 0.06 | 25 | 100 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
ops.cpp: 6446 - 0.21 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | Optimizer analysis |
|---|---|---|---|---|---|---|---|
| orig_default | 1570 | 0.00 | 0.00 | 0.03 | 0 | 25 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_default | 800 | 0.01 | 0.00 | 0.03 | 37.5 | 81.25 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_2 | 839 | 0.01 | 0.00 | 0.15 | 37.5 | 81.25 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
vec.h: 646 - 0.09 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | Optimizer analysis |
|---|---|---|---|---|---|---|---|
| orig_default | 1401 | 0.01 | 0.00 | 0.03 | 100 | 100 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_default | 1126 | 0.00 | 0.00 | 0.03 | 100 | 100 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_2 | 1161 | 0.00 | 0.00 | 0.03 | 100 | 100 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |

