| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 910-948
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 910-910
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 928-929
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 13617-13617
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 16103-16103
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 20765-20765
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 30781-30781
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 31056-31056
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 34664-34664
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 910-910
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 928-929
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 13617-13617
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 16103-16103
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 20765-20765
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 30781-30781
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 31056-31056
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 34664-34664
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 2450 | 0.04 | 0.03 | 0.11 | 10.53 | 49.34 | 1938 | 0.09 | 0.06 | 1.54 | 16.22 | 51.35 | 1940 | 0.08 | 4.20 | 2.54 | 16.22 | 51.35 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2450) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1938) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1940) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Presence of a large number of scalar integer instructions | | Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6210-6211
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6220-6230
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6238-6245
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6413-6413
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6211-6211
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6223-6231
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 790 | 0.04 | 0.02 | 0.48 | 2.33 | 20.2 | 805 | 0.03 | 0.56 | 0.34 | 0 | 25.52 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 790) | Sum on 1 analyzed binary loop (libggml-cpu.so - 805) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| | | Loop Computation Issues | | Loop Computation Issues | |
| | | Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 |
| | | Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 1 |
| | | Control Flow Issues | | Control Flow Issues | |
| | | Presence of calls | 1 | Presence of calls | 1 |
| | | Presence of more than 4 paths | 1 | Presence of more than 4 paths | 0 |
| | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| | | Presence of calls | 1 | Presence of calls | 1 |
| | | Presence of more than 4 paths | 1 | Presence of more than 4 paths | 1 |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 979-1002
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 979-979
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 986-1000
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 979-979
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 986-1000
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 2447 | 0.05 | 0.04 | 0.14 | 56.25 | 68.36 | 1940 | 0.08 | 0.01 | 0.18 | 58.06 | 69.56 | 1942 | 0.05 | 0.70 | 0.42 | 55.38 | 70.96 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2447) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1940) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1942) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 385-387
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1008-1034
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 385-387
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1009-1023
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1031-1034
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 385-387
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1009-1023
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1031-1034
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 1015 | 0.06 | 0.02 | 0.08 | 68.18 | 82.24 | 767 | 0.04 | 0.01 | 0.32 | 80 | 97.68 | 790 | 0.02 | 0.47 | 0.28 | 90 | 98.67 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1015) | Sum on 1 analyzed binary loop (libggml-cpu.so - 767) | Sum on 1 analyzed binary loop (libggml-cpu.so - 790) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6238-6245
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | | | | | | | 829 | 0.03 | 0.88 | 0.53 | 0 | 28.37 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 829) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| | | | | Loop Computation Issues | |
| | | | | Presence of expensive FP instructions | 1 |
| | | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | | Control Flow Issues | |
| | | | | Presence of calls | 1 |
| | | | | Data Access Issues | |
| | | | | Presence of constant non-unit stride data access | 1 |
| | | | | Presence of indirect access | 1 |
| | | | | Vectorization Roadblocks | |
| | | | | Presence of calls | 1 |
| | | | | Presence of constant non-unit stride data access | 1 |
| | | | | Presence of indirect access | 1 |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 4325-4326
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 4325-4326
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 4325-4326
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 1398 | 0.02 | 0.00 | 0.02 | 96.97 | 98.48 | 1127 | 0.03 | 0.01 | 0.14 | 0 | 26.56 | 1159 | 0.03 | 0.46 | 0.28 | 17.39 | 56.52 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1398) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1127) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1159) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | | Presence of constant non-unit stride data access | | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | | Presence of constant non-unit stride data access | | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: 3228-3229
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 354-354
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: 3228-3229
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 6 | 0.02 | 0.00 | 0.13 | 0 | 18.47 | 1 | 0.01 | 0.19 | 0.11 | 72.6 | 83.56 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 6) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| | | Loop Computation Issues | | Loop Computation Issues | |
| | | Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | |
| | | Data Access Issues | | Data Access Issues | |
| | | Presence of constant non-unit stride data access | | Presence of constant non-unit stride data access | 1 |
| | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| | | Presence of constant non-unit stride data access | | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 18-18
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 18-18
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 495 | 0.01 | 0.00 | 0.08 | 25 | 100 | 501 | 0.01 | 0.21 | 0.12 | 25 | 100 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 495) | Sum on 1 analyzed binary loop (libggml-cpu.so - 501) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| | | Loop Computation Issues | | Loop Computation Issues | |
| | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | Data Access Issues | | Data Access Issues | |
| | | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| | | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 10-10
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 10-10
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 10-10
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 434 | 0.03 | 0.00 | 0.01 | 0 | 25 | 411 | 0.02 | 0.00 | 0.08 | 25 | 100 | 413 | 0.01 | 0.14 | 0.09 | 25 | 100 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 434) | Sum on 1 analyzed binary loop (libggml-cpu.so - 411) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
| Data Access Issues | | Data Access Issues | | | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | | |
| Vectorization Roadblocks | | Vectorization Roadblocks | | | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | | |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 411-458
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 411-458
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 1454 | 0.02 | 0.00 | 0.07 | 100 | 100 | 1501 | 0.01 | 0.18 | 0.11 | 100 | 100 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 231-262
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 231-262
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 764 | 0.01 | 0.00 | 0.05 | 100 | 100 | 788 | 0.02 | 0.18 | 0.11 | 100 | 100 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 788) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| | | | | Data Access Issues | |
| | | | | Presence of constant non-unit stride data access | 1 |
| | | | | Vectorization Roadblocks | |
| | | | | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | | Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 2483 | 0.01 | 0.00 | 0.00 | 0 | 0 | 3764 | 0.01 | 0.00 | 0.00 | 0 | 0 | 4195 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 1151 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1648 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3220 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2457 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4472 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3799 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2634 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4127 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4538 | 0.01 | 0.01 | 0.01 | 0 | 0 |
| 2480 | 0.00 | 0.00 | 0.00 | 0 | 0 | 3732 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1674 | 0.01 | 0.01 | 0.01 | 0 | 0 |
| 2650 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4141 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1653 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2514 | 0.00 | 0.00 | 0.00 | 0 | 0 | 4180 | 0.01 | 0.00 | 0.00 | 0 | 0 | 4245 | 0.01 | 0.01 | 0.01 | 0 | 0 |
| 2767 | 0.01 | 0.00 | 0.00 | 0 | 0 | 620 | 0.01 | 0.00 | 0.01 | 0 | 0 | 4198 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2649 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1915 | 0.01 | 0.00 | 0.00 | 0 | 0 | 64 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 314 | 0.03 | 0.00 | 0.00 | 0 | 0 | 59 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1916 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 1024 | 0.00 | 0.00 | 0.00 | 0 | 0 | 51 | 0.01 | 0.00 | 0.01 | 0 | 0 | 63 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 0 | 0.01 | 0.00 | 0.01 | 0 | 0 | 63 | 0.01 | 0.00 | 0.00 | 0 | 0 | 67 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 1901 | 0.01 | 0.00 | 0.00 | 0 | 0 | 369 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1498 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| 1401 | 0.01 | 0.00 | 0.00 | 0 | 0 | 1455 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1072 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 1906 | 0.01 | 0.00 | 0.00 | 0 | 0 | | | | | | | 1193 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| 2041 | 0.00 | 0.00 | 0.00 | 0 | 0 | | | | | | | | | | | | |
| 1007 | 0.01 | 0.00 | 0.01 | 0 | 0 | | | | | | | | | | | | |
| 77 | 0.00 | 0.00 | 0.00 | 0 | 0 | | | | | | | | | | | | |
| 1483 | 0.00 | 0.00 | 0.00 | 0 | 0 | | | | | | | | | | | | |
| 400 | 0.00 | 0.00 | 0.00 | 0 | 0 | | | | | | | | | | | | |
| 1904 | 0.01 | 0.00 | 0.00 | 0 | 0 | | | | | | | | | | | | |
| 2427 | 0.00 | 0.00 | 0.00 | 0 | 0 | | | | | | | | | | | | |
| 551 | 0.02 | 0.00 | 0.01 | 0 | 0 | | | | | | | | | | | | |
| 78 | 0.01 | 0.00 | 0.00 | 0 | 0 | | | | | | | | | | | | |
| 1570 | 0.01 | 0.00 | 0.00 | 0 | 0 | | | | | | | | | | | | |
| 1900 | 0.01 | 0.00 | 0.01 | 0 | 0 | | | | | | | | | | | | |
| 96 | 0.01 | 0.00 | 0.00 | 0 | 0 | | | | | | | | | | | | |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 646-653
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 646-653
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 1126 | 0.00 | 0.00 | 0.01 | 100 | 100 | 1161 | 0.01 | 0.07 | 0.04 | 100 | 100 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6220-6220
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6229-6230
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6238-6245
| Loop Source Regions | | Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| 1565 | 0.04 | 0.01 | 0.06 | 0 | 26.34 | | | | | | | | | | | | |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1565) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | | | | |
| Presence of expensive FP instructions | 1 | | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | | | |
| Control Flow Issues | | | | | |
| Presence of calls | 1 | | | | |
| Data Access Issues | | | | | |
| Presence of constant non-unit stride data access | 1 | | | | |
| Vectorization Roadblocks | | | | | |
| Presence of calls | 1 | | | | |
| Presence of constant non-unit stride data access | 1 | | | | |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6446-6447
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6453-6456
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6446-6447
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6453-6456
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 800 | 0.01 | 0.00 | 0.02 | 37.5 | 81.25 | 839 | 0.00 | 0.03 | 0.02 | 37.5 | 81.25 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 8825-8826
| Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | 1447 | 0.01 | 0.00 | 0.02 | 0 | 0 | | | | | | |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | | Loop Source Regions | - /usr/include/c++/11/bits/char_traits.h: 374-374
- /usr/include/c++/11/bits/char_traits.h: 389-389
- /usr/include/c++/11/bits/hashtable_policy.h: 434-434
- /usr/include/c++/11/bits/hashtable_policy.h: 1251-1255
- /usr/include/c++/11/bits/hashtable_policy.h: 1621-1621
- /usr/include/c++/11/bits/hashtable.h: 1843-1843
- /usr/include/c++/11/bits/basic_string.h: 6237-6237
- /usr/include/c++/11/bits/stl_pair.h: 466-466
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | | | | | | | 4209 | 0.02 | 0.02 | 0.01 | 0 | 50 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_2 |
| Loop Source Regions | | Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-quants.c: 408-412
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-impl.h: 354-361
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-35-140.ec2.internal/176-410-1312/llama.cpp/build/llama.cpp/ggml/src/ggml-impl.h: 369-377
|
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| | | | | | | | | | | | | 592 | 0.02 | 0.02 | 0.01 | 39.13 | 58.42 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |