Loops
kai_matmul_clamp_f32_qsi8d32p1x8_qsi4c32p4x8_1x4x32_neon_dotprod.c: 128 - 171.74 %

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 2411 | 1.92 | 2.42 | 56.56 | 57.14 | 39.64 |
| gcc_default | 2077 | 1.95 | 2.48 | 56.99 | 57.14 | 39.64 |
| gcc_4 | 2089 | 1.96 | 2.51 | 58.20 | 57.14 | 39.64 |

| Run | Optimizer Analysis | Category | Analysis | Count |
|---|---|---|---|---|
| orig_default | Sum on 1 analyzed binary loop (libggml-cpu.so - 2411) | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| orig_default | Sum on 1 analyzed binary loop (libggml-cpu.so - 2411) | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |
| gcc_default | Sum on 1 analyzed binary loop (libggml-cpu.so - 2077) | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| gcc_default | Sum on 1 analyzed binary loop (libggml-cpu.so - 2077) | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |
| gcc_4 | Sum on 1 analyzed binary loop (libggml-cpu.so - 2089) | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| gcc_4 | Sum on 1 analyzed binary loop (libggml-cpu.so - 2089) | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |

kai_matmul_clamp_f32_qsi8d32p4x8_qsi4c32p4x8_16x4_neon_i8mm.c: 131 - 2.37 %

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 2426 | 0.04 | 0.04 | 0.83 | 28.14 | 27.69 |
| gcc_default | 2092 | 0.04 | 0.04 | 0.84 | 28.14 | 27.69 |
| gcc_4 | 2104 | 0.04 | 0.03 | 0.70 | 28.14 | 27.69 |

| Run | Optimizer Analysis | Category | Analysis | Count |
|---|---|---|---|---|
| orig_default | Sum on 1 analyzed binary loop (libggml-cpu.so - 2426) | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| orig_default | Sum on 1 analyzed binary loop (libggml-cpu.so - 2426) | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |
| gcc_default | Sum on 1 analyzed binary loop (libggml-cpu.so - 2092) | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| gcc_default | Sum on 1 analyzed binary loop (libggml-cpu.so - 2092) | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |
| gcc_4 | Sum on 1 analyzed binary loop (libggml-cpu.so - 2104) | Data Access Issues | Presence of constant non-unit stride data access | 1 |
| gcc_4 | Sum on 1 analyzed binary loop (libggml-cpu.so - 2104) | Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |

kai_lhs_quant_pack_qsi8d32p_f32.c: 87 - 0.21 %

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 2376 | 0.08 | 0.00 | 0.04 | 0 | 9.82 |
| gcc_default | 2051 | 0.12 | 0.00 | 0.06 | 0 | 9.38 |
| gcc_4 | 2062 | 0.22 | 0.00 | 0.11 | 0 | 9.38 |

| Run | Optimizer Analysis | Category | Analysis | Count |
|---|---|---|---|---|
| orig_default | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | | | |
| gcc_default | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | | | |
| gcc_4 | Sum on 1 analyzed binary loop (libggml-cpu.so - 2062) | Loop Computation Issues | Presence of a large number of scalar integer instructions | 1 |
| gcc_4 | Sum on 1 analyzed binary loop (libggml-cpu.so - 2062) | Control Flow Issues | Presence of more than 4 paths | 1 |
| gcc_4 | Sum on 1 analyzed binary loop (libggml-cpu.so - 2062) | Vectorization Roadblocks | Presence of more than 4 paths | 1 |

ggml-cpu.c: 1125 - 0.06 %

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 65 | 0.01 | 0.00 | 0.03 | 0 | 23.77 |
| gcc_default | 48 | 0.01 | 0.00 | 0.02 | 0 | 24.36 |
| gcc_4 | 55 | 0.01 | 0.00 | 0.01 | 0 | 23.99 |

| Run | Optimizer Analysis |
|---|---|
| orig_default | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_default | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_4 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |

ops.cpp: 6446 - 0.04 %

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 1437 | 0.01 | 0.00 | 0.01 | 0 | 12.5 |
| gcc_default | 799 | 0.01 | 0.00 | 0.01 | 37.5 | 40.63 |
| gcc_4 | 848 | 0.01 | 0.00 | 0.01 | 37.5 | 40.63 |

| Run | Optimizer Analysis |
|---|---|
| orig_default | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_default | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| gcc_4 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |

