| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 910-948
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 910-910
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 928-929
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 13617-13617
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 16103-16103
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 20765-20765
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 30781-30781
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 31056-31056
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 34664-34664
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 910-910
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/arch/arm/quants.c: 928-929
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h: 51-51
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 13617-13617
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 16103-16103
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 20765-20765
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 30781-30781
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 31056-31056
- /usr/lib/gcc/aarch64-amazon-linux/11/include/arm_neon.h: 34664-34664
|
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | 2293 | 0.09 | 0.08 | 0.21 | 10.53 | 24.67 |
| gcc_default | 1955 | 0.09 | 0.08 | 0.68 | 16.22 | 25.68 |
| gcc_1 | 1934 | 0.09 | 4.34 | 1.21 | 16.22 | 25.68 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2293) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1955) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1934) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Presence of a large number of scalar integer instructions | | Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 385-387
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1008-1034
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 385-387
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1009-1023
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1031-1034
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 385-387
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1009-1023
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 1031-1034
|
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | 908 | 0.07 | 0.03 | 0.08 | 68.18 | 79.76 |
| gcc_default | 766 | 0.06 | 0.04 | 0.33 | 80 | 97.59 |
| gcc_1 | 783 | 0.06 | 1.55 | 0.43 | 88.89 | 97.32 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 908) | Sum on 1 analyzed binary loop (libggml-cpu.so - 766) | Sum on 1 analyzed binary loop (libggml-cpu.so - 783) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6210-6211
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6220-6230
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6238-6245
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6413-6413
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6211-6211
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6223-6231
|
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | | | | | | |
| gcc_default | 789 | 0.09 | 0.04 | 0.39 | 2.33 | 10.1 |
| gcc_1 | 798 | 0.04 | 1.16 | 0.32 | 0 | 12.76 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 789) | Sum on 1 analyzed binary loop (libggml-cpu.so - 798) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| | Loop Computation Issues | | Loop Computation Issues | |
| | Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 |
| | Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 1 |
| | Control Flow Issues | | Control Flow Issues | |
| | Presence of calls | 1 | Presence of calls | 1 |
| | Presence of more than 4 paths | 1 | Presence of more than 4 paths | 0 |
| | Vectorization Roadblocks | | Vectorization Roadblocks | |
| | Presence of calls | 1 | Presence of calls | 1 |
| | Presence of more than 4 paths | 1 | Presence of more than 4 paths | 1 |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | | Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6238-6245
|
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | | | | | | |
| gcc_default | | | | | | |
| gcc_1 | 822 | 0.05 | 1.84 | 0.51 | 0 | 14.18 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 822) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| | | | Loop Computation Issues | |
| | | | Presence of expensive FP instructions | 1 |
| | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | Control Flow Issues | |
| | | | Presence of calls | 1 |
| | | | Data Access Issues | |
| | | | Presence of constant non-unit stride data access | 1 |
| | | | Presence of indirect access | 1 |
| | | | Vectorization Roadblocks | |
| | | | Presence of calls | 1 |
| | | | Presence of constant non-unit stride data access | 1 |
| | | | Presence of indirect access | 1 |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 10-10
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 10-10
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 10-10
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
|
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | 433 | 0.04 | 0.01 | 0.03 | 0 | 12.5 |
| gcc_default | 410 | 0.03 | 0.01 | 0.10 | 25 | 50 |
| gcc_1 | 416 | 0.03 | 0.53 | 0.15 | 25 | 50 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 433) | Sum on 1 analyzed binary loop (libggml-cpu.so - 410) | Sum on 1 analyzed binary loop (libggml-cpu.so - 416) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 18-18
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 18-18
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 18-18
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/binary-ops.cpp: 31-32
|
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | 538 | 0.04 | 0.01 | 0.03 | 0 | 11.84 |
| gcc_default | 494 | 0.03 | 0.01 | 0.09 | 25 | 50 |
| gcc_1 | 502 | 0.04 | 0.60 | 0.17 | 25 | 50 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 538) | Sum on 1 analyzed binary loop (libggml-cpu.so - 494) | Sum on 1 analyzed binary loop (libggml-cpu.so - 502) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 4325-4326
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 4325-4326
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 4325-4326
|
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | 1278 | 0.04 | 0.01 | 0.03 | 94.12 | 95.59 |
| gcc_default | 1132 | 0.02 | 0.01 | 0.09 | 0 | 13.28 |
| gcc_1 | 1154 | 0.02 | 0.42 | 0.12 | 90.91 | 75.04 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1278) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1132) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1154) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | Loop Computation Issues | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 411-458
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 411-458
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 411-458
|
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | 1753 | 0.02 | 0.01 | 0.01 | 100 | 100 |
| gcc_default | 1470 | 0.02 | 0.01 | 0.06 | 100 | 100 |
| gcc_1 | 1499 | 0.03 | 0.35 | 0.10 | 100 | 100 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1753) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1470) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1499) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Data Access Issues | | Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: 3228-3229
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: 3228-3229
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: 3228-3229
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/./ggml-impl.h: 389-404
|
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | 0 | 0.03 | 0.01 | 0.02 | 86.36 | 95.61 |
| gcc_default | 4 | 0.02 | 0.01 | 0.06 | 93.18 | 94.03 |
| gcc_1 | 0 | 0.01 | 0.26 | 0.07 | 92.11 | 93.09 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 0) | Sum on 1 analyzed binary loop (libggml-cpu.so - 4) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | Loop Computation Issues | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
| Data Access Issues | | Data Access Issues | | | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | | |
| Vectorization Roadblocks | | Vectorization Roadblocks | | | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | | |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 231-262
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 231-262
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.cpp: 231-262
|
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | 901 | 0.02 | 0.00 | 0.01 | 96 | 97 |
| gcc_default | 763 | 0.02 | 0.01 | 0.05 | 100 | 100 |
| gcc_1 | 780 | 0.02 | 0.30 | 0.08 | 100 | 100 |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 901) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 780) |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Data Access Issues | | | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | | | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | | | Presence of constant non-unit stride data access | 1 |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6220-6220
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6229-6230
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6238-6245
| Loop Source Regions | | Loop Source Regions | |
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | 1432 | 0.07 | 0.04 | 0.10 | 0 | 13.17 |
| gcc_default | | | | | | |
| gcc_1 | | | | | | |
| | |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1432) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Loop Computation Issues | | | | | |
| Presence of expensive FP instructions | 1 | | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | | | |
| Control Flow Issues | | | | | |
| Presence of calls | 1 | | | | |
| Data Access Issues | | | | | |
| Presence of constant non-unit stride data access | 1 | | | | |
| Vectorization Roadblocks | | | | | |
| Presence of calls | 1 | | | | |
| Presence of constant non-unit stride data access | 1 | | | | |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | | Loop Source Regions | | Loop Source Regions | |
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | 2483 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 2473 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 1151 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 2457 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 2634 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 2650 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 2480 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 2767 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 330 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 314 | 0.03 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 1744 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 2268 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 83 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 65 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 396 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 1437 | 0.02 | 0.00 | 0.01 | 0 | 0 |
| orig_default | 1282 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 66 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 1756 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| orig_default | 1754 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 4180 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 3768 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 518 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 3907 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 4140 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 4130 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 3732 | 0.02 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 4141 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 4472 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 620 | 0.04 | 0.00 | 0.01 | 0 | 0 |
| gcc_default | 56 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 48 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 1466 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 368 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 1199 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 1932 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_default | 1468 | 0.01 | 0.00 | 0.00 | 0 | 0 |
| gcc_1 | 4261 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_1 | 4575 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_1 | 1676 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_1 | 4216 | 0.02 | 0.02 | 0.01 | 0 | 0 |
| gcc_1 | 3955 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_1 | 3796 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_1 | 3972 | 0.01 | 0.01 | 0.00 | 0 | 0 |
| gcc_1 | 3800 | 0.02 | 0.02 | 0.01 | 0 | 0 |
| gcc_1 | 1227 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_1 | 1910 | 0.00 | 0.01 | 0.00 | 0 | 0 |
| gcc_1 | 65 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_1 | 53 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_1 | 1496 | 0.00 | 0.02 | 0.01 | 0 | 0 |
| gcc_1 | 1600 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| gcc_1 | 1494 | 0.01 | 0.03 | 0.01 | 0 | 0 |
| gcc_1 | 61 | 0.00 | 0.00 | 0.00 | 0 | 0 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 646-653
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/vec.h: 646-653
|
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | | | | | | |
| gcc_default | 1131 | 0.01 | 0.00 | 0.01 | 100 | 100 |
| gcc_1 | 1157 | 0.01 | 0.16 | 0.04 | 100 | 100 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6446-6447
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6453-6456
| Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6446-6447
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 6453-6456
|
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | | | | | | |
| gcc_default | 799 | 0.01 | 0.00 | 0.01 | 37.5 | 40.63 |
| gcc_1 | 832 | 0.01 | 0.07 | 0.02 | 37.5 | 40.63 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/ops.cpp: 8825-8826
| Loop Source Regions | |
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | | | | | | |
| gcc_default | 1463 | 0.01 | 0.00 | 0.02 | 0 | 0 |
| gcc_1 | | | | | | |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |
| Run orig_default | Run gcc_default | Run gcc_1 |
| Loop Source Regions | | Loop Source Regions | | Loop Source Regions | - /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-impl.h: 354-361
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-impl.h: 369-377
- /home/eoseret/Tools/QaaS/qaas_runs/ip-172-31-46-37.ec2.internal/176-415-7045/llama.cpp/build/llama.cpp/ggml/src/ggml-quants.c: 408-412
|
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
| orig_default | | | | | | |
| gcc_default | | | | | | |
| gcc_1 | 603 | 0.05 | 0.06 | 0.02 | 39.13 | 29.21 |
| | |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| Analysis | Count | Analysis | Count | Analysis | Count |