| | Run gcc-256 | Run gcc-512 | Run clang-256 | Run clang-512 | Run icx-256 | Run icx-512 |
|---|---|---|---|---|---|---|
| Loop source region | /home/eoseret/llm-attention/attention_v2.cpp: 30-31 | /home/eoseret/llm-attention/attention_v2.cpp: 30-31 | /home/eoseret/llm-attention/attention_v2.cpp: 30-31 | /home/eoseret/llm-attention/attention_v2.cpp: 30-31 | /home/eoseret/llm-attention/attention_v2.cpp: 30-31 | /home/eoseret/llm-attention/attention_v2.cpp: 30-31 |


| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|
| gcc-256 | 2 | 0.73 | 0.73 | 7.28 | 0 | 8.75 |
| gcc-256 | 5 | 0.60 | 0.60 | 5.98 | 0 | 8.75 |
| gcc-256 | 8 | 0.61 | 0.61 | 6.08 | 0 | 8.75 |
| gcc-256 | 13 | 2.90 | 2.90 | 29.13 | 0 | 8.75 |
| gcc-256 | 16 | 3.03 | 3.03 | 30.44 | 0 | 8.75 |
| gcc-512 | 2 | 0.87 | 0.87 | 8.74 | 0 | 8.75 |
| gcc-512 | 5 | 0.64 | 0.65 | 6.52 | 0 | 8.75 |
| gcc-512 | 8 | 0.55 | 0.55 | 5.56 | 0 | 8.75 |
| gcc-512 | 13 | 2.55 | 2.55 | 25.76 | 0 | 8.75 |
| gcc-512 | 16 | 3.17 | 3.17 | 32.07 | 0 | 8.75 |
| clang-256 | 53 | 4.23 | 4.23 | 36.92 | 0 | 8.33 |
| clang-256 | 64 | 2.95 | 2.95 | 25.78 | 0 | 8.33 |
| clang-256 | 43 | 0.70 | 0.70 | 6.12 | 0 | 7.5 |
| clang-256 | 41 | 0.71 | 0.71 | 6.16 | 0 | 7.5 |
| clang-256 | 42 | 0.73 | 0.73 | 6.38 | 0 | 7.5 |
| clang-512 | 53 | 4.40 | 4.40 | 37.82 | 0 | 8.33 |
| clang-512 | 44 | 0.72 | 0.72 | 6.19 | 0 | 7.5 |
| clang-512 | 43 | 0.71 | 0.71 | 6.10 | 0 | 7.5 |
| clang-512 | 42 | 0.74 | 0.74 | 6.32 | 0 | 7.5 |
| clang-512 | 65 | 2.76 | 2.76 | 23.72 | 0 | 8.33 |
| icx-256 | 28 | 2.67 | 2.67 | 28.44 | 81.82 | 34.09 |
| icx-256 | 39 | 0.63 | 0.64 | 6.78 | 77.14 | 32.5 |
| icx-256 | 20 | 2.91 | 2.91 | 31.11 | 77.14 | 32.5 |
| icx-256 | 42 | 0.60 | 0.60 | 6.35 | 77.14 | 32.5 |
| icx-256 | 36 | 0.76 | 0.76 | 8.11 | 77.14 | 32.5 |
| icx-512 | 30 | 1.00 | 1.00 | 18.33 | 85.71 | 43.75 |
| icx-512 | 41 | 0.30 | 0.30 | 5.50 | 85.71 | 43.75 |
| icx-512 | 22 | 0.99 | 1.00 | 18.24 | 90 | 48.13 |
| icx-512 | 44 | 0.28 | 0.28 | 5.13 | 85.71 | 43.75 |
| icx-512 | 38 | 0.22 | 0.22 | 4.12 | 85.71 | 43.75 |
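
As a cross-check, the per-loop figures can be summed per run. The following is a minimal Python sketch with values transcribed by hand from the loop table; the `LOOPS` layout and the `run_totals` helper are illustrative assumptions, not part of the MAQAO output. Per-loop maxima may come from different threads, so the summed time is only a rough aggregate.

```python
# Hand-transcribed from the loop table: (ASM loop ID, max time over threads in s, coverage in %).
LOOPS = {
    "gcc-256":   [(2, 0.73, 7.28), (5, 0.60, 5.98), (8, 0.61, 6.08), (13, 2.90, 29.13), (16, 3.03, 30.44)],
    "gcc-512":   [(2, 0.87, 8.74), (5, 0.64, 6.52), (8, 0.55, 5.56), (13, 2.55, 25.76), (16, 3.17, 32.07)],
    "clang-256": [(53, 4.23, 36.92), (64, 2.95, 25.78), (43, 0.70, 6.12), (41, 0.71, 6.16), (42, 0.73, 6.38)],
    "clang-512": [(53, 4.40, 37.82), (44, 0.72, 6.19), (43, 0.71, 6.10), (42, 0.74, 6.32), (65, 2.76, 23.72)],
    "icx-256":   [(28, 2.67, 28.44), (39, 0.63, 6.78), (20, 2.91, 31.11), (42, 0.60, 6.35), (36, 0.76, 8.11)],
    "icx-512":   [(30, 1.00, 18.33), (41, 0.30, 5.50), (22, 0.99, 18.24), (44, 0.28, 5.13), (38, 0.22, 4.12)],
}


def run_totals(loops):
    """Return (summed max time in s, summed coverage in %) over one run's loops."""
    time_s = sum(t for _, t, _ in loops)
    cov_pct = sum(c for _, _, c in loops)
    # Round to the table's two-decimal precision to hide float noise.
    return round(time_s, 2), round(cov_pct, 2)


for run, loops in LOOPS.items():
    time_s, cov_pct = run_totals(loops)
    print(f"{run:10s} {time_s:5.2f} s  {cov_pct:6.2f} %")
```

Summed this way, the five loops cover roughly 79 to 81% of each run's time in every build except icx-512. That build reaches only about 51% coverage and also has the smallest summed loop time, 2.79 s.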

| Analysis | Run gcc-256 | Run gcc-512 | Run clang-256 | Run clang-512 | Run icx-256 | Run icx-512 |
|---|---|---|---|---|---|---|
| Counts summed over 5 analyzed binary loops | attention-gcc-gnr-256: 2, 5, 8, 13, 16 | attention-gcc-gnr-512: 2, 5, 8, 13, 16 | attention-clang-gnr256: 53, 64, 43, 41, 42 | attention-clang-gnr512: 53, 44, 43, 42, 65 | attention-gnr-256: 28, 39, 20, 42, 36 | attention-gnr-512: 30, 41, 22, 44, 38 |
| Loop Computation Issues | | | | | | |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | | | | 0 |
| Presence of a large number of scalar integer instructions | 0 | 0 | | | | 1 |
| Data Access Issues | | | | | | |
| Presence of constant non-unit stride data access | 1 | 1 | 1 | 1 | 0 | 1 |
| Presence of indirect access | 0 | 0 | 0 | 0 | 1 | 1 |
| Presence of expensive instructions: scatter/gather | 0 | 0 | 0 | 0 | 1 | 1 |
| Presence of special instructions executing on a single port | 0 | 0 | 0 | 0 | 1 | 1 |
| Vectorization Roadblocks | | | | | | |
| Presence of constant non-unit stride data access | 1 | 1 | 1 | 1 | 0 | 1 |
| Presence of indirect access | 0 | 0 | 0 | 0 | 1 | 1 |
| Inefficient Vectorization | | | | | | |
| Presence of expensive instructions: scatter/gather | | | | | 1 | 1 |
| Presence of special instructions executing on a single port | | | | | 1 | 1 |
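
Every run except icx-256 is flagged for constant non-unit stride data access. The sketch below is a generic illustration only. It is not taken from attention_v2.cpp, whose lines 30-31 are not shown here. It computes the flat offsets visited when walking a row-major `n × n` matrix along a row versus down a column. The helper names `row_major_offset` and `strides` are hypothetical.

```python
def row_major_offset(i, j, n):
    """Flat index of element (i, j) in a row-major matrix with n columns."""
    return i * n + j


def strides(offsets):
    """Distances, in elements, between consecutive offsets."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


n = 4
# Inner loop over j with i fixed: consecutive elements, i.e. unit stride.
along_row = [row_major_offset(1, j, n) for j in range(n)]
# Inner loop over i with j fixed: jumps of n elements, i.e. constant non-unit stride.
down_column = [row_major_offset(i, 1, n) for i in range(n)]

print(along_row, strides(along_row))      # [4, 5, 6, 7] [1, 1, 1]
print(down_column, strides(down_column))  # [1, 5, 9, 13] [4, 4, 4]
```

A loop whose innermost index walks down a column touches only one element every `n` positions, so it cannot use plain contiguous vector loads. In that case a compiler typically emits scalar code or gathers instead. Interchanging the loops or storing the operand transposed are common ways to restore unit stride.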