Loops
quants.c: 910 - 13.95 %

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 2293 | 0.10 | 0.07 | 0.86 | 10.53 | 24.67 |
| gcc_default | 1955 | 0.09 | 0.08 | 4.68 | 16.22 | 25.68 |
| gcc_4 | 1974 | 0.09 | 4.01 | 8.41 | 16.22 | 25.68 |

Analysis counts, summed on 1 analyzed binary loop per run:

| Category | Analysis | orig_default (libggml-cpu.so - 2293) | gcc_default (libggml-cpu.so - 1955) | gcc_4 (libggml-cpu.so - 1974) |
|---|---|---|---|---|
| Loop Computation Issues | Presence of a large number of scalar integer instructions | | 1 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | 1 | 1 | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 | 1 | 1 |

binary-ops.cpp: 10 - 0.22 %

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 433 | 0.01 | 0.00 | 0.01 | 0 | 12.5 |
| gcc_default | 410 | 0.01 | 0.00 | 0.07 | 25 | 50 |
| gcc_4 | 416 | 0.01 | 0.06 | 0.13 | 25 | 50 |

Analysis counts, summed on 1 analyzed binary loop per run:

| Category | Analysis | orig_default (libggml-cpu.so - 433) | gcc_default (libggml-cpu.so - 410) | gcc_4 (libggml-cpu.so - 416) |
|---|---|---|---|---|
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | 1 | 1 | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 | 1 | 1 |

ops.cpp: 4325 - 0.16 %

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 1278 | 0.01 | 0.00 | 0.02 | 94.12 | 95.59 |
| gcc_default | 1132 | 0.01 | 0.00 | 0.07 | 0 | 13.28 |
| gcc_4 | 1170 | 0.01 | 0.03 | 0.07 | 17.39 | 28.26 |

Analysis counts, summed on 1 analyzed binary loop per run:

| Category | Analysis | orig_default (libggml-cpu.so - 1278) | gcc_default (libggml-cpu.so - 1132) | gcc_4 (libggml-cpu.so - 1170) |
|---|---|---|---|---|
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | 1 | | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 | | 1 |

ggml-cpu.c: 3228 - 0.10 %

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| orig_default | 0 | 0.01 | 0.00 | 0.01 | 86.36 | 95.61 |
| gcc_default | 4 | 0.01 | 0.00 | 0.02 | 93.18 | 94.03 |
| gcc_4 | 1 | 0.01 | 0.03 | 0.06 | 72.6 | 41.78 |

Analysis counts for orig_default, summed on 1 analyzed binary loop (libggml-cpu.so - 0):

| Category | Analysis | Count |
|---|---|---|
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | Presence of constant non-unit stride data access | 1 |

gcc_default and gcc_4: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.


