Loops
kai_matmul_clamp_f32_qsi8d32p4x8_qsi4c32p4x8_16x4_neon_i8mm.c: 131 - 244.39 %

Loop source regions:

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 2426 | 0.74 | 1.28 | 72.28 | 28.14 | 27.69 |
| gcc_default | 2089 | 0.00 | 0.00 | 0.04 | 32.79 | 30.39 |
| gcc_default | 2092 | 0.77 | 1.74 | 87.05 | 28.14 | 27.69 |
| gcc_6 | 1879 | 0.75 | 1.89 | 84.96 | 28.14 | 27.69 |
| gcc_6 | 1876 | 0.00 | 0.00 | 0.06 | 32.79 | 30.39 |

Optimizer analysis (sum on 1 analyzed binary loop of libggml-cpu.so per run; loop ID in parentheses):

| Category | Issue | orig_default (2426) | gcc_default (2092) | gcc_6 (1879) |
|---|---|---|---|---|
| Data Access Issues | Presence of constant non-unit stride data access | 1 | 1 | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 | 1 | 1 |

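
The 244.39 % on the `kai_matmul_clamp_f32_qsi8d32p4x8_qsi4c32p4x8_16x4_neon_i8mm.c: 131` line exceeds 100 % because it appears to be the sum of the Cov (%) values of every assembly loop mapped to that source line, across all three runs (the same holds, to rounding, for the other source lines in this report). A minimal C check of that addition, using the five loop coverages from the table above; `sum_coverage` is a hypothetical helper, not part of MAQAO:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: adds up per-loop coverage percentages. */
static double sum_coverage(const double *cov, size_t n)
{
    assert(cov != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += cov[i];
    return total;
}

/* Cov (%) for kai_matmul_...c:131, taken from the table:
   loop 2426 (orig_default), loops 2089 and 2092 (gcc_default),
   loops 1879 and 1876 (gcc_6). */
static const double kai_matmul_cov[] = { 72.28, 0.04, 87.05, 84.96, 0.06 };
```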
vec.cpp: 385 - 2.92 %

Loop source regions:

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 908 | 0.02 | 0.02 | 1.02 | 68.18 | 79.76 |
| gcc_default | 766 | 0.01 | 0.02 | 0.89 | 80 | 97.59 |
| gcc_6 | 700 | 0.01 | 0.02 | 1.01 | 88.89 | 97.32 |

Optimizer analysis (sum on 1 analyzed binary loop of libggml-cpu.so per run; loop ID in parentheses):

| Category | Issue | orig_default (908) | gcc_default (766) | gcc_6 (700) |
|---|---|---|---|---|
| Loop Computation Issues | Presence of expensive FP instructions | 1 | 1 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | 1 | 1 | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 | 1 | 1 |

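
Most analyzed loops in this report are flagged for presence of constant non-unit stride data access, which the Optimizer also counts as a vectorization roadblock. The sketch below is a generic C illustration, not the ggml code behind `vec.cpp: 385`, and all function names are hypothetical: `scale_strided` reads every `stride`-th element, which typically forces gathers, de-interleaving loads or scalar code, while `scale_contiguous` does the same work on a packed, unit-stride copy. Packing only pays off when the packed copy is reused, since the packing loop itself is strided.

```c
#include <assert.h>
#include <stddef.h>

/* Constant non-unit stride: element i is read from src[i * stride]. */
static void scale_strided(float *restrict dst, const float *restrict src,
                          size_t n, size_t stride, float k)
{
    assert(stride > 0);
    for (size_t i = 0; i < n; ++i)
        dst[i] = k * src[i * stride];
}

/* One-off copy of the strided elements into a contiguous buffer. */
static void pack_strided(float *restrict dst, const float *restrict src,
                         size_t n, size_t stride)
{
    assert(stride > 0);
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i * stride];
}

/* Unit stride: consecutive iterations touch consecutive addresses,
   so full-width vector loads and stores can be used. */
static void scale_contiguous(float *restrict dst, const float *restrict src,
                             size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = k * src[i];
}
```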
kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c: 96 - 1.39 %

Loop source regions:

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 2389 | 0.01 | 0.01 | 0.45 | 77.23 | 48.22 |
| gcc_default | 2060 | 0.01 | 0.01 | 0.47 | 69.7 | 48.55 |
| gcc_6 | 1849 | 0.03 | 0.01 | 0.46 | 69.7 | 48.55 |

Optimizer analysis (sum on 1 analyzed binary loop of libggml-cpu.so per analyzed run; loop ID in parentheses):

| Category | Issue | orig_default (2389) | gcc_default (2060) | gcc_6 |
|---|---|---|---|---|
| Loop Computation Issues | Presence of expensive FP instructions | 1 | 1 | n/a |
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | n/a |
| Data Access Issues | Presence of constant non-unit stride data access | 1 | 1 | n/a |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 | 1 | n/a |

n/a: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

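
The orig_default and gcc_default loops for `kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c: 96` are flagged because fewer than 10 % of their FP ADD/SUB/MUL operations use FMA. As a generic illustration only (`axpy` is a hypothetical kernel, not the KleidiAI source): a multiply feeding an add, written as one expression, can be contracted into a fused multiply-add (FMADD/FMLA on AArch64) when FP contraction is enabled, as with GCC's `-ffp-contract=fast`, which GCC uses by default unless a strict ISO C mode such as `-std=c11` is selected. C99 `fmaf` from `<math.h>` requests the fusion explicitly. A fused operation rounds once instead of twice, so results may differ in the last bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AXPY kernel: y[i] = a * x[i] + y[i].
   With FP contraction enabled, the multiply and the add in this single
   expression may be fused into one FMA instruction per element. */
static void axpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```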
kai_rhs_pack_nxk_qsi4c32pscalef16_qsu4c32s16s0.c: 127 - 1.11 %

Loop source regions:

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 2409 | 0.29 | 0.01 | 0.45 | 0 | 25 |
| gcc_default | 2073 | 0.02 | 0.00 | 0.05 | 0 | 21.88 |
| gcc_default | 2074 | 0.01 | 0.00 | 0.03 | 0 | 0 |
| gcc_6 | 1857 | 0.26 | 0.01 | 0.49 | 0 | 14 |
| gcc_6 | 1858 | 0.05 | 0.00 | 0.10 | 0 | 0 |

Optimizer analysis (sum on 1 analyzed binary loop of libggml-cpu.so per analyzed run; loop ID in parentheses):

| Category | Issue | orig_default (2409) | gcc_default | gcc_6 (1857) |
|---|---|---|---|---|
| Control Flow Issues | Presence of calls | | n/a | 1 |
| Control Flow Issues | Presence of more than 4 paths | | n/a | 1 |
| Data Access Issues | Presence of indirect access | 1 | n/a | |
| Vectorization Roadblocks | Presence of calls | 0 | n/a | 1 |
| Vectorization Roadblocks | Presence of more than 4 paths | 0 | n/a | 1 |
| Vectorization Roadblocks | Presence of indirect access | 1 | n/a | 0 |

n/a: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

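
Loop 2409 (orig_default) for `kai_rhs_pack_nxk_qsi4c32pscalef16_qsu4c32s16s0.c: 127` is flagged for presence of indirect access: the address of a load depends on data loaded in the same loop, so the compiler needs gather loads or scalar code. A generic sketch, not the KleidiAI packing code; the function names and the affine table are hypothetical: when a lookup table is just an affine function of its index, computing the value directly removes the indirection and leaves only unit-stride loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Indirect access: each load address depends on idx[i]. */
static void lookup_indirect(float *restrict dst, const float *restrict table,
                            const uint8_t *restrict idx, size_t n)
{
    assert(n == 0 || table != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[i] = table[idx[i]];
}

/* Direct computation for a table defined as table[k] = scale * (k - offset):
   only idx is read, with unit stride. */
static void lookup_affine(float *restrict dst, const uint8_t *restrict idx,
                          size_t n, float scale, int offset)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = scale * (float)((int)idx[i] - offset);
}
```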
ops.cpp: 4325 - 0.90 %

Loop source regions:

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 1278 | 0.01 | 0.00 | 0.13 | 94.12 | 95.59 |
| gcc_default | 1132 | 0.01 | 0.01 | 0.45 | 0 | 13.28 |
| gcc_6 | 1046 | 0.01 | 0.01 | 0.32 | 0 | 13.28 |

Optimizer analysis (sum on 1 analyzed binary loop of libggml-cpu.so per analyzed run; loop ID in parentheses):

| Category | Issue | orig_default | gcc_default (1132) | gcc_6 |
|---|---|---|---|---|
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | n/a | 1 | n/a |

n/a: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

ggml-cpu.c: 3228 - 0.84 %

Loop source regions:

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 0 | 0.01 | 0.00 | 0.15 | 86.36 | 95.61 |
| gcc_default | 4 | 0.01 | 0.00 | 0.20 | 93.18 | 94.03 |
| gcc_6 | 5 | 0.01 | 0.01 | 0.49 | 0 | 9.16 |

Optimizer analysis (sum on 1 analyzed binary loop of libggml-cpu.so per analyzed run; loop ID in parentheses):

| Category | Issue | orig_default | gcc_default | gcc_6 (5) |
|---|---|---|---|---|
| Loop Computation Issues | Presence of a large number of scalar integer instructions | n/a | n/a | 1 |
| Control Flow Issues | Presence of 2 to 4 paths | n/a | n/a | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | n/a | n/a | 1 |
| Vectorization Roadblocks | Presence of 2 to 4 paths | n/a | n/a | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | n/a | n/a | 1 |

n/a: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

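
Loop 5 (gcc_6) for `ggml-cpu.c: 3228` has 2 to 4 execution paths, which the Optimizer lists as a vectorization roadblock. A generic illustration only; the clamp kernels are hypothetical and not the ggml code: a conditional store (`clamp_branchy`) leaves two paths per iteration, while writing every element through a select (`clamp_select`) leaves a single path that maps onto vector compare/select or min instructions. Compilers can sometimes if-convert the first form themselves, for example with masked stores on SVE, so the benefit is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Two paths per iteration: dst[i] is written only when it exceeds hi. */
static void clamp_branchy(float *dst, size_t n, float hi)
{
    assert(n == 0 || dst != NULL);
    for (size_t i = 0; i < n; ++i)
        if (dst[i] > hi)
            dst[i] = hi;
}

/* One path: every element is loaded, selected and stored unconditionally. */
static void clamp_select(float *dst, size_t n, float hi)
{
    assert(n == 0 || dst != NULL);
    for (size_t i = 0; i < n; ++i) {
        float v = dst[i];
        dst[i] = v > hi ? hi : v;
    }
}
```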
binary-ops.cpp: 10 - 0.77 %

Loop source regions:

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 433 | 0.00 | 0.00 | 0.17 | 0 | 12.5 |
| gcc_default | 410 | 0.01 | 0.01 | 0.37 | 25 | 50 |
| gcc_6 | 400 | 0.01 | 0.01 | 0.23 | 0 | 9.38 |

Optimizer analysis (sum on 1 analyzed binary loop of libggml-cpu.so per analyzed run; loop ID in parentheses):

| Category | Issue | orig_default (433) | gcc_default (410) | gcc_6 |
|---|---|---|---|---|
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | n/a |
| Data Access Issues | Presence of constant non-unit stride data access | 1 | 1 | n/a |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 | 1 | n/a |

n/a: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

binary-ops.cpp: 18 - 0.66 %

Loop source regions:

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 538 | 0.01 | 0.00 | 0.14 | 0 | 11.84 |
| gcc_default | 494 | 0.01 | 0.00 | 0.22 | 25 | 50 |
| gcc_6 | 462 | 0.01 | 0.01 | 0.30 | 0 | 9.38 |

Optimizer analysis: no Optimizer analysis found for any assembly loop in any of the three runs. More loops can be analyzed using option --optimizer-loop-count.

quants.c: 2506 - 0.65 %

Loop source regions:

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 2308 | 0.00 | 0.01 | 0.44 | 50.56 | 33.98 |
| gcc_default | 1968 | 0.00 | 0.00 | 0.08 | 49.18 | 36.04 |
| gcc_6 | 1757 | 0.00 | 0.00 | 0.12 | 48.37 | 35.87 |

Optimizer analysis (sum on 1 analyzed binary loop of libggml-cpu.so per analyzed run; loop ID in parentheses):

| Category | Issue | orig_default (2308) | gcc_default | gcc_6 |
|---|---|---|---|---|
| Data Access Issues | Presence of constant non-unit stride data access | 1 | n/a | n/a |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 | n/a | n/a |

n/a: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

vec.h: 411 - 0.54 %

Loop source regions:

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 1753 | 0.01 | 0.00 | 0.13 | 100 | 100 |
| gcc_default | 1470 | 0.01 | 0.00 | 0.12 | 100 | 100 |
| gcc_6 | 1326 | 0.01 | 0.01 | 0.29 | 100 | 100 |

Optimizer analysis: no Optimizer analysis found for any assembly loop in any of the three runs. More loops can be analyzed using option --optimizer-loop-count.

vec.cpp: 231 - 0.29 %

Loop source regions:

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 901 | 0.01 | 0.00 | 0.08 | 96 | 97 |
| gcc_default | 763 | 0.01 | 0.00 | 0.08 | 100 | 100 |
| gcc_6 | 697 | 0.01 | 0.00 | 0.13 | 100 | 100 |

Optimizer analysis: no Optimizer analysis found for any assembly loop in any of the three runs. More loops can be analyzed using option --optimizer-loop-count.

ops.cpp: 6446 - 0.23 %

Loop source regions:

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 1437 | 0.00 | 0.00 | 0.11 | 0 | 12.5 |
| gcc_default | 799 | 0.00 | 0.00 | 0.02 | 37.5 | 40.63 |
| gcc_6 | 730 | 0.00 | 0.00 | 0.10 | 0 | 9.46 |

Optimizer analysis: no Optimizer analysis found for any assembly loop in any of the three runs. More loops can be analyzed using option --optimizer-loop-count.

