| Run gcc | Run armclang |
| Loop Source Regions | - /home/eoseret/llm-attention/attention_v2.cpp: 30-31 | Loop Source Regions | - /home/eoseret/llm-attention/attention_v2.cpp: 30-31 |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 4 | 0.61 | 0.61 | 6.99 | 0 | 30 | 8.76 | 68 | 0.56 | 0.56 | 6.91 | 0 | 30 | 8.83 |
| 7 | 0.59 | 0.58 | 6.76 | 0 | 30 | 8.68 | 43 | 3.00 | 3.00 | 36.74 | 0 | 31.82 | 6.69 |
| 10 | 0.65 | 0.64 | 7.45 | 0 | 30 | 8.09 | 62 | 0.56 | 0.55 | 6.78 | 0 | 30 | 9.09 |
| 14 | 2.57 | 2.57 | 29.68 | 0 | 30 | 8.07 | 65 | 0.53 | 0.53 | 6.48 | 0 | 30 | 9.43 |
| 17 | 3.13 | 3.13 | 36.09 | 0 | 30 | 6.56 | 54 | 2.46 | 2.46 | 30.07 | 0 | 31.82 | 7.63 |
| |
| Sum on 5 analyzed binary loops (attention-gcc-native - 4, attention-gcc-native - 7, attention-gcc-native - 10, attention-gcc-native - 14, attention-gcc-native - 17) | Sum on 5 analyzed binary loops (attention-armclang-native - 68, attention-armclang-native - 43, attention-armclang-native - 62, attention-armclang-native - 65, attention-armclang-native - 54) |
| Analysis | Count | Analysis | Count |
| Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Run gcc | Run armclang |
| Loop Source Regions | - /home/eoseret/llm-attention/attention_v2.cpp: 163-163<br>- /usr/include/c++/14/bits/random.tcc: 333-333<br>- /usr/include/c++/14/bits/random.tcc: 458-466<br>- /usr/include/c++/14/bits/random.tcc: 3367-3371 | Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 2 | 0.04 | 0.04 | 0.52 | 0 | 40.63 | 2.83 | |
| |
| Sum on 1 analyzed binary loop (attention-gcc-native - 2) | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Loop Computation Issues | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
| Presence of a large number of scalar integer instructions | 1 | | |
| Control Flow Issues | | | |
| Presence of 2 to 4 paths | 1 | | |
| Data Access Issues | | | |
| Presence of indirect access | 1 | | |
| Vectorization Roadblocks | | | |
| Presence of 2 to 4 paths | 1 | | |
| Presence of indirect access | 1 | | |
| Run gcc | Run armclang |
| Loop Source Regions | - /usr/include/c++/14/bits/random.tcc: 412-417 | Loop Source Regions | - /usr/lib/gcc/aarch64-amazon-linux/14/../../../../include/c++/14/bits/random.tcc: 412-417 |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 42 | 0.01 | 0.02 | 0.17 | 100 | 100 | 0 | 72 | 0.00 | 0.00 | 0.06 | 91.67 | 95.83 | 0 |
| 82 | 0.02 | 0.02 | 0.24 | 95.83 | 97.92 | 0 |
| |
| No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Run gcc | Run armclang |
| Loop Source Regions | - /home/eoseret/llm-attention/attention_v2.cpp: 52-53 | Loop Source Regions | - /home/eoseret/llm-attention/attention_v2.cpp: 52-53 |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 30 | 0.02 | 0.02 | 0.23 | 0 | 25 | 6.25 | 49 | 0.01 | 0.01 | 0.12 | 75 | 87.5 | 2.5 |
| 37 | 0.00 | 0.00 | 0.06 | 50 | 62.5 | 29.5 |
| 36 | 0.00 | 0.00 | 0.06 | 100 | 100 | 25 |
| |
| No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Run gcc | Run armclang |
| Loop Source Regions | - /home/eoseret/llm-attention/attention_v2.cpp: 55-56 | Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 31 | 0.04 | 0.04 | 0.46 | 0 | 25 | 4.38 | |
| |
| Sum on 1 analyzed binary loop (attention-gcc-native - 31) | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Loop Computation Issues | | | |
| Presence of expensive FP instructions | 1 | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
| Presence of a large number of scalar integer instructions | 1 | | |
| Control Flow Issues | | | |
| Presence of calls | 1 | | |
| Vectorization Roadblocks | | | |
| Presence of calls | 1 | | |
| Run gcc | Run armclang |
| Loop Source Regions | - /usr/include/c++/14/bits/random.tcc: 404-409 | Loop Source Regions | - /usr/lib/gcc/aarch64-amazon-linux/14/../../../../include/c++/14/bits/random.tcc: 404-409 |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 41 | 0.01 | 0.01 | 0.12 | 100 | 100 | 0.38 | 81 | 0.01 | 0.01 | 0.18 | 97.87 | 98.94 | 0.25 |
| |
| No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Run gcc | Run armclang |
| Loop Source Regions | - /usr/include/c++/14/bits/stl_vector.h: 1128-1128<br>- /home/eoseret/llm-attention/attention_v2.cpp: 237-238 | Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 22 | 0.03 | 0.02 | 0.29 | 14.29 | 28.57 | 0 | |
| |
| No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Run gcc | Run armclang |
| Loop Source Regions | - /home/eoseret/llm-attention/attention_v2.cpp: 164-167<br>- /usr/include/c++/14/bits/random.tcc: 458-466<br>- /usr/include/c++/14/bits/random.tcc: 3367-3374 | Loop Source Regions | |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 24 | 0.03 | 0.02 | 0.29 | 0 | 41.18 | 4.9 | |
| |
| Sum on 1 analyzed binary loop (attention-gcc-native - 24) | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Loop Computation Issues | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
| Presence of a large number of scalar integer instructions | 1 | | |
| Control Flow Issues | | | |
| Presence of calls | 1 | | |
| Presence of more than 4 paths | 1 | | |
| Vectorization Roadblocks | | | |
| Presence of calls | 1 | | |
| Presence of more than 4 paths | 1 | | |
| Run gcc | Run armclang |
| Loop Source Regions | - /home/eoseret/llm-attention/attention_v2.cpp: 47-48 | Loop Source Regions | - /home/eoseret/llm-attention/attention_v2.cpp: 47-48 |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 29 | 0.00 | 0.00 | 0.06 | 100 | 100 | 2.25 | 51 | 0.00 | 0.00 | 0.06 | 100 | 100 | 2 |
| 32 | 0.00 | 0.00 | 0.06 | 100 | 100 | 4.25 | |
| |
| No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |