| ID | Module | Source Location | Source Function | Level | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Coverage (% app. time) | Speedup if no scalar integer | Speedup if FP arith vectorized | Speedup if fully vectorized | Speedup if FP only | Number of paths | Vectorization Ratio (%) | Vector Length Use (%) | Flops (GFLOP/s) | CQA cycles | CQA cycles if no scalar integer | CQA cycles if FP arith vectorized | CQA cycles if fully vectorized | CQA cycles if FP only |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Loop 18 | attention-gcc-skl256 | attention_v2.cpp:30-31 | main | Innermost | 9.25 | 9.25 | 33.64 | 1.00 | 2.58 | 14.55 | 1.00 | 1 | 20.00 | 11.25 | 1.12 | 4.00 | 4.00 | 1.55 | 0.28 | 4.00 |
| Loop 15 | attention-gcc-skl256 | attention_v2.cpp:30-31 | main | Innermost | 7.33 | 7.33 | 26.68 | 1.00 | 2.58 | 14.55 | 1.00 | 1 | 20.00 | 11.25 | 1.32 | 4.00 | 4.00 | 1.55 | 0.28 | 4.00 |
| Loop 10 | attention-gcc-skl256 | attention_v2.cpp:30-31 | main | Innermost | 2.02 | 2.02 | 7.35 | 1.00 | 2.58 | 14.55 | 1.00 | 1 | 20.00 | 11.25 | 1.22 | 4.00 | 4.00 | 1.55 | 0.28 | 4.00 |
| Loop 7 | attention-gcc-skl256 | attention_v2.cpp:30-31 | main | Innermost | 2.00 | 2.00 | 7.27 | 1.00 | 2.58 | 14.55 | 1.00 | 1 | 20.00 | 11.25 | 1.35 | 4.00 | 4.00 | 1.55 | 0.28 | 4.00 |
| Loop 4 | attention-gcc-skl256 | attention_v2.cpp:30-31 | main | Innermost | 1.99 | 1.99 | 7.24 | 1.00 | 2.58 | 14.55 | 1.00 | 1 | 20.00 | 11.25 | 1.23 | 4.00 | 4.00 | 1.55 | 0.28 | 4.00 |
| Loop 32 | attention-gcc-skl256 | attention_v2.cpp:55-56 | softmax(float const*, float*, float*, int) | Innermost | 1.26 | 1.26 | 4.58 | 1.00 | 2.00 | 4.00 | 1.00 | 1 | 0.00 | 6.25 | 0.39 | 3.00 | 3.00 | 1.50 | 0.75 | 3.00 |
| Loop 16 | attention-gcc-skl256 | attention_v2.cpp:27-30,attention_v2.cpp:33-33,attention_v2.cpp:236-236 | main | InBetween | 0.69 | 0.69 | 2.53 | 1.57 | 2.75 | 14.67 | 2.75 | 1 | 20.00 | 11.25 | 1.37 | 2.75 | 1.75 | 1.00 | 0.19 | 1.00 |
| Loop 2 | attention-gcc-skl256 | attention_v2.cpp:163-163,random.tcc:458-466,random.tcc:3557-3558 | main | Innermost | 0.22 | 0.22 | 0.78 | 3.08 | 2.12 | 12.83 | 4.63 | 2 | 2.08 | 10.40 | 0.14 | 9.25 | 3.00 | 4.36 | 0.72 | 2.00 |
| Loop 11 | attention-gcc-skl256 | attention_v2.cpp:26-30,attention_v2.cpp:33-33 | main | InBetween | 0.20 | 0.20 | 0.73 | 1.67 | 1.00 | 14.55 | 2.50 | 1 | 25.00 | 12.50 | 0.30 | 2.50 | 1.50 | 2.50 | 0.17 | 1.00 |
| Loop 19 | attention-gcc-skl256 | attention_v2.cpp:26-30,attention_v2.cpp:33-33 | main | InBetween | 0.17 | 0.17 | 0.62 | 1.67 | 1.00 | 14.55 | 2.50 | 1 | 25.00 | 12.50 | 0.71 | 2.50 | 1.50 | 2.50 | 0.17 | 1.00 |
| Loop 8 | attention-gcc-skl256 | attention_v2.cpp:26-30,attention_v2.cpp:33-33 | main | InBetween | 0.15 | 0.14 | 0.53 | 1.67 | 1.00 | 14.55 | 2.50 | 1 | 25.00 | 12.50 | 1.10 | 2.50 | 1.50 | 2.50 | 0.17 | 1.00 |
| Loop 31 | attention-gcc-skl256 | attention_v2.cpp:52-53 | softmax(float const*, float*, float*, int) | Innermost | 0.15 | 0.14 | 0.53 | 1.33 | 1.33 | 16.00 | 2.00 | 1 | 0.00 | 6.25 | 1.17 | 2.00 | 1.50 | 1.50 | 0.13 | 1.00 |
| Loop 3 | attention-gcc-skl256 | attention_v2.cpp:164-167,random.tcc:406-409,random.tcc:458-459,random.tcc:462-466,random.tcc:3519-3519,random.tcc:3557-3558 | main | InBetween | 0.12 | 0.12 | 0.44 | 2.68 | 1.33 | 10.62 | 8.33 | 8 | 13.33 | 15.00 | 0.17 | 18.75 | 7.00 | 14.06 | 1.77 | 2.25 |
| Loop 5 | attention-gcc-skl256 | attention_v2.cpp:27-30,attention_v2.cpp:33-33,random.tcc:422-422 | main | InBetween | 0.11 | 0.11 | 0.40 | 1.67 | 1.00 | 14.55 | 2.50 | 1 | 25.00 | 12.50 | 0.91 | 2.50 | 1.50 | 2.50 | 0.17 | 1.00 |
| Loop 13 | attention-gcc-skl256 | attention_v2.cpp:237-238 | main | Innermost | 0.06 | 0.06 | 0.24 | 1.25 | 1.00 | 16.00 | 1.25 | 1 | 0.00 | 6.25 | 0.00 | 1.25 | 1.00 | 1.25 | 0.08 | 1.00 |
| Loop 23 | attention-gcc-skl256 | random.tcc:404-409,random.tcc:420-423,random.tcc:458-458,random.tcc:462-466,random.tcc:3557-3558 | main | InBetween | 0.05 | 0.05 | 0.20 | 2.33 | 2.17 | 11.00 | 4.43 | 2 | 10.00 | 12.03 | 0.18 | 11.63 | 5.00 | 5.36 | 1.06 | 2.63 |
| Loop 26 | attention-gcc-skl256 | random.tcc:458-466,random.tcc:3557-3558 | main | Innermost | 0.05 | 0.05 | 0.18 | 2.78 | 2.34 | 11.87 | 5.33 | 2 | 7.89 | 11.63 | 0.60 | 8.00 | 2.88 | 3.42 | 0.67 | 1.50 |
| Loop 34 | attention-gcc-skl256 | attention_v2.cpp:43-43,attention_v2.cpp:46-47,attention_v2.cpp:50-52,attention_v2.cpp:58-61 | softmax(float const*, float*, float*, int) | InBetween | 0.05 | 0.05 | 0.18 | 3.39 | 1.00 | 10.67 | 10.17 | 1 | 22.58 | 12.50 | 0.00 | 15.25 | 4.50 | 15.25 | 1.43 | 1.50 |
| Loop 25 | attention-gcc-skl256 | random.tcc:412-417 | main | Innermost | 0.04 | 0.03 | 0.13 | 1.00 | 1.00 | 2.00 | 3.50 | 1 | 100.00 | 50.00 | 0.00 | 3.50 | 3.50 | 3.50 | 1.75 | 1.00 |
| Loop 42 | attention-gcc-skl256 | random.tcc:412-417 | std::mersenne_twister_engine | Single | 0.02 | 0.02 | 0.07 | 1.00 | 1.00 | 2.00 | 3.50 | 1 | 100.00 | 50.00 | 0.00 | 3.50 | 3.50 | 3.50 | 1.75 | 1.00 |
| Loop 35 | attention-gcc-skl256 | attention_v2.cpp:47-48 | softmax(float const*, float*, float*, int) | InBetween | 0.01 | 0.01 | 0.04 | 1.43 | 1.00 | 10.55 | 4.13 | 1 | 40.00 | 12.92 | 0.00 | 8.25 | 5.75 | 8.25 | 0.78 | 2.00 |
| Loop 33 | attention-gcc-skl256 | attention_v2.cpp:47-48 | softmax(float const*, float*, float*, int) | Innermost | 0.01 | 0.01 | 0.04 | 1.00 | 1.00 | 2.00 | 1.00 | 1 | 100.00 | 50.00 | 1.00 | 4.00 | 4.00 | 4.00 | 2.00 | 4.00 |
| Loop 41 | attention-gcc-skl256 | random.tcc:404-409 | std::mersenne_twister_engine | Single | 0.01 | 0.01 | 0.04 | 1.00 | 1.00 | 2.00 | 3.50 | 1 | 100.00 | 50.00 | 0.00 | 3.50 | 3.50 | 3.50 | 1.75 | 1.00 |
| Loop 1 | attention-gcc-skl256 | random.h:585-585,random.tcc:333-339 | main | Innermost | 0.01 | 0.01 | 0.04 | 1.00 | 1.00 | 8.00 | 1.00 | 1 | 0.00 | 12.50 | 0.00 | 6.00 | 6.00 | 6.00 | 0.75 | 6.00 |
| Loop 6 | attention-gcc-skl256 | attention_v2.cpp:26-27 | main | InBetween | 0.00 | 0.00 | 0.02 | 1.00 | 1.00 | 16.00 | 1.25 | 1 | 0.00 | 6.25 | 0.00 | 1.25 | 1.25 | 1.25 | 0.08 | 1.00 |
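
The rows above appear consistent with each "Speedup if ..." column being the ratio of the baseline "CQA cycles" value to the matching "CQA cycles if ..." value, up to the two-decimal rounding of the displayed cycle counts. The Python sketch below checks that reading against a few rows; the helper name `speedup` is hypothetical and is not part of the report or of any tool.

```python
def speedup(base_cycles: float, scenario_cycles: float) -> float:
    """Ratio of baseline CQA cycles to the cycles projected for a scenario.

    Assumption: this ratio is how the "Speedup if ..." columns relate to
    the "CQA cycles ..." columns. It is inferred from the table values,
    not taken from the tool's documentation.
    """
    if scenario_cycles <= 0:
        raise ValueError("scenario_cycles must be positive")
    return base_cycles / scenario_cycles


# Loop 32 (softmax, attention_v2.cpp:55-56), values copied from the table.
loop32_fully_vectorized = speedup(3.00, 0.75)  # table reports 4.00
loop32_fp_vectorized = speedup(3.00, 1.50)     # table reports 2.00

# Loop 18 (main, attention_v2.cpp:30-31), values copied from the table.
loop18_fp_vectorized = speedup(4.00, 1.55)     # table reports 2.58

# Loop 16 (main, InBetween), values copied from the table.
loop16_no_scalar_int = speedup(2.75, 1.75)     # table reports 1.57

print(loop32_fully_vectorized, loop32_fp_vectorized)
print(round(loop18_fp_vectorized, 2), round(loop16_no_scalar_int, 2))
```

Because the cycle counts are shown with two decimals, recomputed ratios can differ slightly from the reported speedups when the denominator is small (for example, 4.00 / 0.28 gives about 14.29 where the table reports 14.55).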