| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 910-948
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 910-910
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 928-929
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 13617-13617
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 16103-16103
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 20765-20765
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 30781-30781
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 31056-31056
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 34664-34664
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 910-910
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 928-928
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 13617-13617
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 16065-16065
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 16103-16103
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 20765-20765
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 30781-30781
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 31056-31056
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 34664-34664
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 2293 | 0.10 | 0.08 | 0.83 | 10.53 | 24.67 | 1955 | 0.09 | 0.07 | 2.50 | 16.22 | 25.68 | 1929 | 0.10 | 4.62 | 5.14 | 11.11 | 26.69 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2293) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1955) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1929) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Presence of a large number of scalar integer instructions | | Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1008-1034
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 385-387
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1009-1023
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1031-1034
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 385-387
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1009-1023
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1031-1034
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 385-387
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 908 | 0.02 | 0.01 | 0.05 | 68.18 | 79.76 | 766 | 0.02 | 0.01 | 0.33 | 80 | 97.59 | 779 | 0.02 | 0.38 | 0.42 | 90 | 98.62 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 908) | Sum on 1 analyzed binary loop (libggml-cpu.so - 766) | Sum on 1 analyzed binary loop (libggml-cpu.so - 779) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6210-6211
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6220-6230
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6238-6245
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6413-6413
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6211-6211
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6223-6231
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 789 | 0.04 | 0.01 | 0.41 | 2.33 | 10.1 | 794 | 0.03 | 0.26 | 0.29 | 0 | 12.76 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 789) | Sum on 1 analyzed binary loop (libggml-cpu.so - 794) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| | | Loop Computation Issues | | Loop Computation Issues | |
| | | Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 |
| | | Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 1 |
| | | Control Flow Issues | | Control Flow Issues | |
| | | Presence of calls | 1 | Presence of calls | 1 |
| | | Presence of more than 4 paths | 1 | Presence of more than 4 paths | 0 |
| | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| | | Presence of calls | 1 | Presence of calls | 1 |
| | | Presence of more than 4 paths | 1 | Presence of more than 4 paths | 1 |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6238-6245
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | | | | | | | 818 | 0.03 | 0.52 | 0.58 | 0 | 14.58 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 818) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| | | | | Loop Computation Issues | |
| | | | | Presence of expensive FP instructions | 1 |
| | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | | Control Flow Issues | |
| | | | | Presence of calls | 1 |
| | | | | Data Access Issues | |
| | | | | Presence of constant non-unit stride data access | 1 |
| | | | | Presence of indirect access | 1 |
| | | | | Vectorization Roadblocks | |
| | | | | Presence of calls | 1 |
| | | | | Presence of constant non-unit stride data access | 1 |
| | | | | Presence of indirect access | 1 |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 10-10
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 10-10
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 10-10
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 433 | 0.02 | 0.00 | 0.04 | 0 | 12.5 | 410 | 0.02 | 0.00 | 0.13 | 25 | 50 | 412 | 0.02 | 0.12 | 0.13 | 25 | 50 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 433) | Sum on 1 analyzed binary loop (libggml-cpu.so - 410) | Sum on 1 analyzed binary loop (libggml-cpu.so - 412) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 4325-4326
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 4325-4326
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 4325-4326
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 1278 | 0.02 | 0.00 | 0.03 | 94.12 | 95.59 | 1132 | 0.02 | 0.00 | 0.10 | 0 | 13.28 | 1153 | 0.01 | 0.12 | 0.14 | 90.91 | 75.04 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1278) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1132) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1153) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | | Presence of constant non-unit stride data access | |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | | Presence of constant non-unit stride data access | |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 18-18
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 18-18
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 18-18
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 538 | 0.01 | 0.00 | 0.03 | 0 | 11.84 | 494 | 0.02 | 0.00 | 0.08 | 25 | 50 | 498 | 0.01 | 0.11 | 0.12 | 25 | 50 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 538) | Sum on 1 analyzed binary loop (libggml-cpu.so - 494) | Sum on 1 analyzed binary loop (libggml-cpu.so - 498) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: 3228-3229
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: 3228-3229
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: 3228-3229
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 0 | 0.01 | 0.00 | 0.01 | 86.36 | 95.61 | 4 | 0.01 | 0.00 | 0.09 | 93.18 | 94.03 | 0 | 0.01 | 0.09 | 0.10 | 92.11 | 93.09 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 0) | Sum on 1 analyzed binary loop (libggml-cpu.so - 4) | Sum on 1 analyzed binary loop (libggml-cpu.so - 0) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | | Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 2483 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1648 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4199 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2473 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3728 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3796 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2634 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4140 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4388 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2650 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3907 | 0.00 | 0.00 | 0.00 | 0 | 0 | 53 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2649 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3779 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1905 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 2782 | 0.00 | 0.00 | 0.00 | 0 | 0 | 60 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1491 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2480 | 0.00 | 0.00 | 0.00 | 0 | 0 | 56 | 0.00 | 0.00 | 0.01 | 0 | 0 | 61 | 0.00 | 0.01 | 0.01 | 0 | 0 |
| 2767 | 0.02 | 0.00 | 0.00 | 0 | 0 | 1932 | 0.01 | 0.00 | 0.01 | 0 | 0 | | | | | | |
| 2886 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1466 | 0.00 | 0.00 | 0.00 | 0 | 0 | | | | | | |
| 1282 | 0.00 | 0.00 | 0.00 | 0 | 0 | 368 | 0.00 | 0.00 | 0.01 | 0 | 0 | | | | | | |
| 2268 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1131 | 0.01 | 0.00 | 0.01 | 0 | 0 | | | | | | |
| 901 | 0.01 | 0.00 | 0.01 | 0 | 0 | 19 | 0.00 | 0.00 | 0.00 | 0 | 0 | | | | | | |
| 1754 | 0.00 | 0.00 | 0.00 | 0 | 0 | | | | | | | | | | | | |
| 396 | 0.01 | 0.00 | 0.00 | 0 | 0 | | | | | | | | | | | | |
| 1437 | 0.00 | 0.00 | 0.01 | 0 | 0 | | | | | | | | | | | | |
| 66 | 0.01 | 0.00 | 0.00 | 0 | 0 | | | | | | | | | | | | |
| 314 | 0.02 | 0.00 | 0.00 | 0 | 0 | | | | | | | | | | | | |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6220-6220
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6229-6230
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6238-6245
| Loop Source Regions | | Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 1432 | 0.03 | 0.01 | 0.13 | 0 | 13.17 | | | | | | | | | | | | |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1432) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | | | | |
| Presence of expensive FP instructions | 1 | | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | | | |
| Control Flow Issues | | | | | |
| Presence of calls | 1 | | | | |
| Data Access Issues | | | | | |
| Presence of constant non-unit stride data access | 1 | | | | |
| Vectorization Roadblocks | | | | | |
| Presence of calls | 1 | | | | |
| Presence of constant non-unit stride data access | 1 | | | | |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 411-458
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 411-458
| Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 1753 | 0.01 | 0.00 | 0.01 | 100 | 100 | 1470 | 0.01 | 0.00 | 0.06 | 100 | 100 | | | | | | |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1753) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1470) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Data Access Issues | | Data Access Issues | | | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | | |
| Vectorization Roadblocks | | Vectorization Roadblocks | | | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | | |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 231-262
| Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 763 | 0.01 | 0.00 | 0.05 | 100 | 100 | | | | | | |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /usr/include/c++/11/bits/hashtable_policy.h: 287-287
- /usr/include/c++/11/bits/hashtable_policy.h: 434-434
- /usr/include/c++/11/bits/hashtable.h: 2386-2391
- /usr/include/c++/11/bits/hashtable.h: 2402-2403
| Loop Source Regions | - /usr/include/c++/11/bits/hashtable_policy.h: 287-287
- /usr/include/c++/11/bits/hashtable_policy.h: 434-434
- /usr/include/c++/11/bits/hashtable.h: 2386-2391
- /usr/include/c++/11/bits/hashtable.h: 2402-2403
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 4180 | 0.02 | 0.00 | 0.01 | 0 | 25 | 4246 | 0.02 | 0.02 | 0.02 | 0 | 25 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6446-6447
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6453-6456
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6446-6447
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6453-6456
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 799 | 0.01 | 0.00 | 0.01 | 37.5 | 40.63 | 828 | 0.00 | 0.02 | 0.02 | 42.55 | 63.3 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 646-653
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | | | | | | | 1156 | 0.01 | 0.01 | 0.01 | 100 | 100 |
| | | | | | | | | | | | | 1489 | 0.00 | 0.01 | 0.01 | 100 | 100 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/src/llama-vocab.cpp: 3216-3216
- /usr/include/c++/11/bits/hashtable_policy.h: 287-287
- /usr/include/c++/11/bits/hashtable_policy.h: 1916-1916
- /usr/include/c++/11/ext/new_allocator.h: 145-145
- /usr/include/c++/11/bits/basic_string.h: 195-195
- /usr/include/c++/11/bits/basic_string.h: 211-211
- /usr/include/c++/11/bits/basic_string.h: 239-239
- /usr/include/c++/11/bits/basic_string.h: 245-245
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/src/llama-vocab.cpp: 3216-3216
- /usr/include/c++/11/bits/hashtable_policy.h: 287-287
- /usr/include/c++/11/bits/hashtable_policy.h: 1916-1916
- /usr/include/c++/11/ext/new_allocator.h: 145-145
- /usr/include/c++/11/bits/basic_string.h: 195-195
- /usr/include/c++/11/bits/basic_string.h: 211-211
- /usr/include/c++/11/bits/basic_string.h: 239-239
- /usr/include/c++/11/bits/basic_string.h: 245-245
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 3732 | 0.02 | 0.00 | 0.01 | 0 | 25 | 3800 | 0.01 | 0.01 | 0.01 | 0 | 25 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-impl.h: 354-361
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-impl.h: 369-377
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-quants.c: 408-412
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | | | | | | | 584 | 0.02 | 0.02 | 0.02 | 39.13 | 29.21 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-409-8358/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 8825-8826
| Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 1463 | 0.01 | 0.00 | 0.02 | 0 | 0 | | | | | | |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |