Loops
kai_matmul_clamp_f32_qsi8d32p4x8_qsi4c32p4x8_16x4_neon_i8mm.c: 131 - 170.74 %
| Run orig_default | Run gcc_default | Run gcc_4 | |||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2426 | 0.03 | 0.02 | 0.43 | 28.14 | 27.69 | 2089 | 2.89 | 3.45 | 56.67 | 32.79 | 30.39 | 2101 | 3.00 | 3.58 | 57.08 | 32.79 | 30.39 |
| 2423 | 2.93 | 3.26 | 55.67 | 32.79 | 30.39 | 2092 | 0.04 | 0.02 | 0.40 | 28.14 | 27.69 | 2104 | 0.04 | 0.03 | 0.48 | 28.14 | 27.69 |
| Sum on 2 analyzed binary loops (libggml-cpu.so - 2426, libggml-cpu.so - 2423) | Sum on 2 analyzed binary loops (libggml-cpu.so - 2089, libggml-cpu.so - 2092) | Sum on 2 analyzed binary loops (libggml-cpu.so - 2101, libggml-cpu.so - 2104) | |||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||
| Data Access Issues | Data Access Issues | Data Access Issues | |||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||
| Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | |||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||
quants.c: 2506 - 26.92 %
| Run orig_default | Run gcc_default | Run gcc_4 | |||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2308 | 0.50 | 0.55 | 9.33 | 50.56 | 33.98 | 1968 | 0.48 | 0.55 | 9.08 | 49.18 | 36.04 | 1986 | 0.46 | 0.53 | 8.51 | 47.34 | 35.41 |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2308) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1968) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1986) | |||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||
| Data Access Issues | Data Access Issues | Data Access Issues | |||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||
| Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | |||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||
kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c: 96 - 0.38 %
| Run orig_default | Run gcc_default | Run gcc_4 | |||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2389 | 0.22 | 0.01 | 0.13 | 77.23 | 48.22 | 2060 | 0.23 | 0.01 | 0.14 | 69.7 | 48.55 | 2073 | 0.17 | 0.01 | 0.11 | 69.7 | 48.55 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | |||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||
ggml-cpu.c: 3228 - 0.20 %
| Run orig_default | Run gcc_default | Run gcc_4 | |||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.02 | 0.00 | 0.06 | 86.36 | 95.61 | 4 | 0.02 | 0.00 | 0.07 | 93.18 | 94.03 | 1 | 0.02 | 0.00 | 0.07 | 72.6 | 41.78 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | |||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||
ops.cpp: 4325 - 0.19 %
| Run orig_default | Run gcc_default | Run gcc_4 | |||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1278 | 0.04 | 0.00 | 0.07 | 94.12 | 95.59 | 1132 | 0.05 | 0.01 | 0.09 | 0 | 13.28 | 1170 | 0.02 | 0.00 | 0.03 | 17.39 | 28.26 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | |||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||
binary-ops.cpp: 10 - 0.15 %
| Run orig_default | Run gcc_default | Run gcc_4 | |||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 433 | 0.05 | 0.00 | 0.07 | 0 | 12.5 | 410 | 0.03 | 0.00 | 0.04 | 25 | 50 | 416 | 0.04 | 0.00 | 0.05 | 25 | 50 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | |||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||
ops.cpp: 6446 - 0.13 %
| Run orig_default | Run gcc_default | Run gcc_4 | |||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1437 | 0.01 | 0.00 | 0.06 | 0 | 12.5 | 799 | 0.01 | 0.00 | 0.03 | 37.5 | 40.63 | 848 | 0.01 | 0.00 | 0.03 | 37.5 | 40.63 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | |||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||
binary-ops.cpp: 18 - 0.09 %
| Run orig_default | Run gcc_default | Run gcc_4 | |||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 538 | 0.03 | 0.00 | 0.04 | 0 | 11.84 | 494 | 0.03 | 0.00 | 0.02 | 25 | 50 | 502 | 0.02 | 0.00 | 0.02 | 25 | 50 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | |||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||
ggml-cpu.c: 1125 - 0.09 %
| Run orig_default | Run gcc_default | Run gcc_4 | |||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 65 | 0.01 | 0.00 | 0.03 | 0 | 23.77 | 48 | 0.01 | 0.00 | 0.03 | 0 | 24.36 | 55 | 0.01 | 0.00 | 0.02 | 0 | 23.99 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | |||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||

