| Run gcc | Run armclang |
| Loop Source Regions: /home/eoseret/llm-attention/attention_v2.cpp: 30-31 | Loop Source Regions: /home/eoseret/llm-attention/attention_v2.cpp: 30-31 |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 4 | 0.78 | 0.78 | 7.55 | 0 | 15 | 6.92 | 60 | 0.74 | 0.74 | 7.44 | 0 | 16.67 | 6.63 |
| 7 | 0.76 | 0.76 | 7.35 | 0 | 15 | 6.49 | 39 | 3.48 | 3.48 | 34.99 | 0 | 15.91 | 5.81 |
| 10 | 0.73 | 0.73 | 7.01 | 0 | 15 | 7.1 | 50 | 3.03 | 3.03 | 30.42 | 0 | 15.91 | 5.76 |
| 14 | 3.10 | 3.10 | 30.00 | 0 | 15 | 6.69 | 63 | 0.70 | 0.70 | 7.04 | 0 | 16.67 | 6.87 |
| 17 | 3.59 | 3.59 | 34.69 | 0 | 15 | 5.8 | 57 | 0.72 | 0.72 | 7.24 | 0 | 16.67 | 6.81 |
| |
| Sum on 5 analyzed binary loops (attention-gcc-native - 4, attention-gcc-native - 7, attention-gcc-native - 10, attention-gcc-native - 14, attention-gcc-native - 17) | Sum on 5 analyzed binary loops (attention-armclang-native - 60, attention-armclang-native - 39, attention-armclang-native - 50, attention-armclang-native - 63, attention-armclang-native - 57) |
| Analysis | Count | Analysis | Count |
| Data Access Issues | | Data Access Issues | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Vectorization Roadblocks | | Vectorization Roadblocks | |
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 |
| Run gcc | Run armclang |
| Loop Source Regions: /usr/include/c++/14/bits/random.tcc: 412-417 | Loop Source Regions: /usr/lib/gcc/aarch64-amazon-linux/14/../../../../include/c++/14/bits/random.tcc: 412-417 |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 42 | 0.04 | 0.04 | 0.34 | 100 | 54.55 | 0 | 84 | 0.04 | 0.04 | 0.35 | 94.12 | 95.59 | 0 |
| |
| No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Run gcc | Run armclang |
| Loop Source Regions: /usr/include/c++/14/bits/random.h: 248-248; /usr/include/c++/14/bits/random.tcc: 333-339; /usr/include/c++/14/bits/random.tcc: 458-466; /usr/include/c++/14/bits/random.tcc: 3367-3371; /home/eoseret/llm-attention/attention_v2.cpp: 163-163 | Loop Source Regions: (none) |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 1 | 0.01 | 0.01 | 0.10 | 0 | 20.83 | 0 | | | | | | | |
| 2 | 0.06 | 0.06 | 0.58 | 0 | 20.31 | 2.35 | | | | | | | |
| |
| Sum on 1 analyzed binary loop (attention-gcc-native - 2) | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Loop Computation Issues | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
| Presence of a large number of scalar integer instructions | 1 | | |
| Control Flow Issues | | | |
| Presence of 2 to 4 paths | 1 | | |
| Data Access Issues | | | |
| Presence of indirect access | 1 | | |
| Vectorization Roadblocks | | | |
| Presence of 2 to 4 paths | 1 | | |
| Presence of indirect access | 1 | | |
| Run gcc | Run armclang |
| Loop Source Regions: /usr/include/c++/14/bits/stl_vector.h: 1128-1128; /home/eoseret/llm-attention/attention_v2.cpp: 237-238 | Loop Source Regions: /home/eoseret/llm-attention/attention_v2.cpp: 237-238 |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 22 | 0.04 | 0.04 | 0.34 | 14.29 | 14.29 | 0 | 31 | 0.03 | 0.02 | 0.25 | 25 | 18.75 | 0 |
| |
| Sum on 1 analyzed binary loop (attention-gcc-native - 22) | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Data Access Issues | | | |
| Presence of constant non-unit stride data access | 1 | | |
| Vectorization Roadblocks | | | |
| Presence of constant non-unit stride data access | 1 | | |
| Run gcc | Run armclang |
| Loop Source Regions: /home/eoseret/llm-attention/attention_v2.cpp: 52-53 | Loop Source Regions: /home/eoseret/llm-attention/attention_v2.cpp: 52-53 |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 30 | 0.03 | 0.03 | 0.29 | 0 | 12.5 | 3.13 | 45 | 0.00 | 0.00 | 0.05 | 75 | 43.75 | 10.5 |
| 33 | 0.01 | 0.01 | 0.10 | 66.67 | 93.75 | 14.13 |
| 44 | 0.00 | 0.00 | 0.05 | 37.5 | 48.44 | 23 |
| |
| No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Run gcc | Run armclang |
| Loop Source Regions: /home/eoseret/llm-attention/attention_v2.cpp: 55-56 | Loop Source Regions: /home/eoseret/llm-attention/attention_v2.cpp: 55-56 |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 31 | 0.04 | 0.04 | 0.39 | 0 | 12.5 | 3.31 | 42 | 0.01 | 0.01 | 0.10 | 85.71 | 82.14 | 23 |
| |
| Sum on 1 analyzed binary loop (attention-gcc-native - 31) | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Loop Computation Issues | | | |
| Presence of expensive FP instructions | 1 | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
| Presence of a large number of scalar integer instructions | 1 | | |
| Control Flow Issues | | | |
| Presence of calls | 1 | | |
| Vectorization Roadblocks | | | |
| Presence of calls | 1 | | |
| Run gcc | Run armclang |
| Loop Source Regions: /usr/include/c++/14/bits/random.tcc: 404-409 | Loop Source Regions: /usr/lib/gcc/aarch64-amazon-linux/14/../../../../include/c++/14/bits/random.tcc: 404-409 |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 41 | 0.02 | 0.01 | 0.15 | 100 | 100 | 0.08 | 82 | 0.01 | 0.01 | 0.15 | 93.75 | 95.31 | 0.08 |
| 66 | 0.01 | 0.01 | 0.10 | 93.75 | 95.31 | 0 |
| |
| No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Run gcc | Run armclang |
| Loop Source Regions: /home/eoseret/llm-attention/attention_v2.cpp: 164-167; /usr/include/c++/14/bits/random.tcc: 458-466; /usr/include/c++/14/bits/random.tcc: 3367-3374 | Loop Source Regions: (none) |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 24 | 0.04 | 0.04 | 0.34 | 0 | 20.59 | 3.96 | | | | | | | |
| |
| Sum on 1 analyzed binary loop (attention-gcc-native - 24) | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |
| Loop Computation Issues | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
| Presence of a large number of scalar integer instructions | 1 | | |
| Control Flow Issues | | | |
| Presence of calls | 1 | | |
| Presence of more than 4 paths | 1 | | |
| Vectorization Roadblocks | | | |
| Presence of calls | 1 | | |
| Presence of more than 4 paths | 1 | | |
| Run gcc | Run armclang |
| Loop Source Regions: /home/eoseret/llm-attention/attention_v2.cpp: 47-48 | Loop Source Regions: /home/eoseret/llm-attention/attention_v2.cpp: 47-48 |
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| 29 | 0.01 | 0.01 | 0.10 | 100 | 100 | 1.13 | 32 | 0.00 | 0.00 | 0.05 | 80 | 85 | 1.75 |
| 32 | 0.00 | 0.01 | 0.05 | 100 | 100 | 1.5 | | | | | | | |
| |
| No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. | No Loops Overview analysis found for any assembly loop. More loops can be analyzed using option --summary-loop-count. |
| Analysis | Count | Analysis | Count |