Loops
kai_matmul_clamp_f32_qsi8d32p4x8_qsi4c32p4x8_16x4_neon_i8mm.c: 131 - 128.87 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 2582 | 0.03 | 0.02 | 0.77 | 28.14 | 55.37 |
| orig_default | 2579 | 1.26 | 1.29 | 41.25 | 32.79 | 60.77 |
| gcc_default | 2072 | 1.20 | 1.59 | 43.18 | 32.79 | 60.77 |
| gcc_default | 2075 | 0.03 | 0.02 | 0.61 | 28.14 | 55.37 |
| gcc_1 | 2084 | 1.23 | 1.57 | 42.41 | 32.79 | 60.77 |
| gcc_1 | 2087 | 0.03 | 0.02 | 0.64 | 28.14 | 55.37 |

| Run | Analyzed binary loops (libggml-cpu.so) | Category | Analysis | Count |
|---|---|---|---|---|
| orig_default | 2582, 2579 | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| orig_default | 2582, 2579 | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |
| gcc_default | 2072, 2075 | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| gcc_default | 2072, 2075 | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |
| gcc_1 | 2084, 2087 | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| gcc_1 | 2084, 2087 | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |

quants.c: 2506 - 11.81 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 2463 | 0.14 | 0.13 | 4.07 | 50 | 65.41 |
| gcc_default | 1951 | 0.14 | 0.15 | 4.14 | 48.91 | 69.52 |
| gcc_1 | 1960 | 0.17 | 0.13 | 3.60 | 48.65 | 69.18 |

| Run | Analyzed binary loops (libggml-cpu.so) | Category | Analysis | Count |
|---|---|---|---|---|
| orig_default | 2463 | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| orig_default | 2463 | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |
| gcc_default | 1951 | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| gcc_default | 1951 | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |
| gcc_1 | 1960 | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| gcc_1 | 1960 | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |

vec.cpp: 231 - 0.56 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 1007 | 0.02 | 0.00 | 0.15 | 100 | 100 |
| gcc_default | 764 | 0.02 | 0.01 | 0.21 | 100 | 100 |
| gcc_1 | 790 | 0.03 | 0.01 | 0.20 | 100 | 100 |

| Run | Analyzed binary loops (libggml-cpu.so) | Category | Analysis | Count |
|---|---|---|---|---|
| orig_default | 1007 | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| orig_default | 1007 | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |
| gcc_default | 764 | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| gcc_default | 764 | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |
| gcc_1 | 790 | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| gcc_1 | 790 | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |

vec.h: 411 - 0.51 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 1900 | 0.03 | 0.01 | 0.18 | 100 | 100 |
| gcc_default | 1454 | 0.02 | 0.01 | 0.16 | 100 | 100 |
| gcc_1 | 1508 | 0.02 | 0.01 | 0.18 | 100 | 100 |

| Run | Analyzed binary loops (libggml-cpu.so) | Category | Analysis | Count |
|---|---|---|---|---|
| orig_default | 1900 | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| orig_default | 1900 | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |
| gcc_default | 1454 | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| gcc_default | 1454 | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |
| gcc_1 | 1508 | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| gcc_1 | 1508 | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |

vec.cpp: 385 - 0.35 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 1015 | 0.28 | 0.01 | 0.20 | 68.18 | 82.24 |
| gcc_default | 767 | 0.09 | 0.00 | 0.08 | 80 | 97.68 |
| gcc_1 | 793 | 0.09 | 0.00 | 0.08 | 88.89 | 97.42 |

| Run | Analyzed binary loops (libggml-cpu.so) | Category | Analysis | Count |
|---|---|---|---|---|
| orig_default | 1015 | Loop Computation Issues | Presence of expensive FP instructions | 1 |

gcc_default, gcc_1: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c: 271 - 0.16 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 2543 | 0.11 | 0.00 | 0.04 | 64.41 | 84.59 |
| gcc_default | 2040 | 0.15 | 0.00 | 0.06 | 59.85 | 86.96 |
| gcc_1 | 2051 | 0.13 | 0.00 | 0.05 | 59.85 | 86.96 |

orig_default, gcc_default, gcc_1: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

ggml-cpu.c: 1125 - 0.10 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 77 | 0.01 | 0.00 | 0.04 | 0 | 46.79 |
| gcc_default | 51 | 0.01 | 0.00 | 0.03 | 0 | 48.72 |
| gcc_1 | 55 | 0.01 | 0.00 | 0.03 | 0 | 47.98 |

orig_default, gcc_default, gcc_1: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

ggml-cpu.c: 3228 - 0.09 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 0 | 0.01 | 0.00 | 0.03 | 92.5 | 98.75 |
| gcc_default | 6 | 0.01 | 0.00 | 0.05 | 0 | 18.47 |
| gcc_1 | 1 | 0.01 | 0.00 | 0.02 | 72.6 | 83.56 |

orig_default, gcc_default, gcc_1: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

ops.cpp: 6446 - 0.07 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 1570 | 0.01 | 0.00 | 0.03 | 0 | 25 |
| gcc_default | 800 | 0.01 | 0.00 | 0.02 | 37.5 | 81.25 |
| gcc_1 | 842 | 0.01 | 0.00 | 0.03 | 37.5 | 81.25 |

orig_default, gcc_default, gcc_1: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

binary-ops.cpp: 18 - 0.05 %
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 551 | 0.03 | 0.00 | 0.02 | 0 | 23.68 |
| gcc_default | 495 | 0.02 | 0.00 | 0.02 | 25 | 100 |
| gcc_1 | 504 | 0.03 | 0.00 | 0.01 | 25 | 100 |

orig_default, gcc_default, gcc_1: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

