| Function: __svml_expf16_z0 | Module: attention-avx512 | Source: :0-0 | Coverage (incl. loops): 0.70% | (excl. loops): 0.70% |
|---|
*** This panel is intentionally left blank because the object lacks debug symbols. ***
0x408600 ENDBR64 |
0x408604 VMOVUPS 0x97f2(%RIP),%ZMM1 |
0x40860e VMOVUPS 0x9828(%RIP),%ZMM2 |
0x408618 VFMADD213PS {rz-sae},%ZMM2,%ZMM0,%ZMM1 |
0x40861e VSUBPS {rn-sae},%ZMM2,%ZMM1,%ZMM2 |
0x408624 VMOVUPS 0x9852(%RIP),%ZMM3 |
0x40862e VMOVUPS 0x9888(%RIP),%ZMM4 |
0x408638 VFNMADD213PS {rn-sae},%ZMM0,%ZMM2,%ZMM3 |
0x40863e VFNMADD231PS {rn-sae},%ZMM4,%ZMM2,%ZMM3 |
0x408644 VPERMPS 0x9772(%RIP),%ZMM1,%ZMM0 |
0x40864e VANDPS 0x98a8(%RIP),%ZMM3,%ZMM1 |
0x408658 VMOVUPS 0x98de(%RIP),%ZMM3 |
0x408662 VMOVUPS 0x9914(%RIP),%ZMM4 |
0x40866c VFMADD231PS {rn-sae},%ZMM3,%ZMM1,%ZMM4 |
0x408672 VMULPS {rn-sae},%ZMM1,%ZMM1,%ZMM3 |
0x408678 VFMADD213PS {rn-sae},%ZMM1,%ZMM4,%ZMM3 |
0x40867e VFMADD213PS {rn-sae},%ZMM0,%ZMM0,%ZMM3 |
0x408684 VANDPS 0x9932(%RIP),%ZMM3,%ZMM0 |
0x40868e VSCALEFPS {rn-sae},%ZMM2,%ZMM0,%ZMM0 |
0x408694 RET |
0x408695 NOPW %CS:(%RAX,%RAX,1) |
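Read without symbols, the listing above is consistent with a common table-driven scheme for `exp`: scale and round the input (the first `VFMADD213PS`/`VSUBPS` pair), subtract a multiple of ln 2 in two steps (the two `VFNMADD` instructions), look up a 16-entry table (`VPERMPS`), evaluate a short polynomial, and rescale with `VSCALEFPS`. The scalar sketch below illustrates that scheme only. It is an interpretation, not the library's code: the function name `exp_sketch`, the plain Taylor coefficients, and the single-constant ln 2 are assumptions, whereas the real routine loads its own constants from RIP-relative memory.

```python
import math

LN2 = math.log(2.0)
TABLE_SIZE = 16  # assumption: one table entry per float lane of a ZMM register
TABLE = [2.0 ** (j / TABLE_SIZE) for j in range(TABLE_SIZE)]


def exp_sketch(x: float) -> float:
    """Approximate exp(x) as 2**n * TABLE[j] * (1 + p(r))."""
    # Range reduction: x = k * ln2/16 + r, with |r| <= ln2/32.
    k = round(x * TABLE_SIZE / LN2)
    n, j = divmod(k, TABLE_SIZE)  # k = 16*n + j, with 0 <= j < 16
    r = x - k * (LN2 / TABLE_SIZE)

    # Short polynomial for exp(r) - 1 (Taylor terms, not minimax coefficients).
    poly = r + r * r * (0.5 + r / 6.0)

    # Table value times (1 + poly), then scale by 2**n.
    t = TABLE[j]
    return math.ldexp(t + t * poly, n)
```

For example, `exp_sketch(0.3)` can be compared against `math.exp(0.3)`; the small reduced argument is what lets such a short polynomial suffice.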
| Coverage (%) | Name | Source Location | Module |
|---|---|---|---|
| 57.14 | main | attention.cpp:53 | attention-avx512 |
| 42.86 | main | attention.cpp:56 | attention-avx512 |
The code analyzed by CQA in this panel excludes loops and accounts for 0.70% of application time in run run_0.
| Source file and lines | |
| Module | attention-avx512 |
| nb instructions | 20 |
| nb uops | 19 |
| loop length (bytes) | 149 |
| used x86 registers | 0 |
| used mmx registers | 0 |
| used xmm registers | 0 |
| used ymm registers | 0 |
| used zmm registers | 5 |
| nb stack references | 0 |
| ADD-SUB / MUL ratio | 1.00 |
| micro-operation queue | 3.17 cycles |
| front end | 3.17 cycles |
| | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uops | 6.00 | 0.00 | 3.33 | 3.33 | 0.00 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.33 |
| cycles | 6.00 | 4.00 | 3.33 | 3.33 | 0.00 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.33 |
| Cycles executing div or sqrt instructions | NA |
| Front-end | 3.17 |
| Dispatch | 6.00 |
| Overall L1 | 6.00 |
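The summary figures above agree with a simple reading of the port table: the dispatch figure matches the busiest port, and the overall L1 figure matches the larger of front end and dispatch. The sketch below recomputes them from the reported values under that assumption. CQA's actual cost model is not described in this report, and the variable names are illustrative.

```python
# Per-port cycle counts copied from the "cycles" row of the port table.
port_cycles = {
    "P0": 6.00, "P1": 4.00, "P2": 3.33, "P3": 3.33, "P4": 0.00, "P5": 6.00,
    "P6": 1.00, "P7": 0.00, "P8": 0.00, "P9": 0.00, "P10": 0.00, "P11": 3.33,
}
front_end_cycles = 3.17

# Assumption: dispatch cost is the load on the busiest execution port.
dispatch_cycles = max(port_cycles.values())

# Assumption: the L1 estimate is the slower of front end and dispatch.
overall_l1_cycles = max(front_end_cycles, dispatch_cycles)

# Ports that reach the dispatch bound.
bottleneck_ports = sorted(p for p, c in port_cycles.items() if c == dispatch_cycles)

print(dispatch_cycles, overall_l1_cycles, bottleneck_ports)
```

Under this reading the routine is bound by P0 and P5, the two ports the instruction table assigns to the 512-bit FMA, multiply, and subtract instructions, rather than by the front end.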
| all | 100% |
| load | 100% |
| store | NA (no store vectorizable/vectorized instructions) |
| mul | 100% |
| add-sub | 100% |
| fma | 100% |
| div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
| other | 100% |
| all | 100% |
| load | 100% |
| store | NA (no store vectorizable/vectorized instructions) |
| mul | 100% |
| add-sub | 100% |
| fma | 100% |
| div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
| other | 100% |
| Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | Latency | Recip. throughput | Vectorization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENDBR64 | N/A | | | | | | | | | | | | | | | |
| VMOVUPS 0x97f2(%RIP),%ZMM1 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 | vect (100.0%) |
| VMOVUPS 0x9828(%RIP),%ZMM2 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 | vect (100.0%) |
| VFMADD213PS {rz-sae},%ZMM2,%ZMM0,%ZMM1 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VSUBPS {rn-sae},%ZMM2,%ZMM1,%ZMM2 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 | vect (100.0%) |
| VMOVUPS 0x9852(%RIP),%ZMM3 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 | vect (100.0%) |
| VMOVUPS 0x9888(%RIP),%ZMM4 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 | vect (100.0%) |
| VFNMADD213PS {rn-sae},%ZMM0,%ZMM2,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VFNMADD231PS {rn-sae},%ZMM4,%ZMM2,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VPERMPS 0x9772(%RIP),%ZMM1,%ZMM0 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0.33 | 3 | 1 | vect (100.0%) |
| VANDPS 0x98a8(%RIP),%ZMM3,%ZMM1 | 1 | 0.50 | 0 | 0.33 | 0.33 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1 | 0.67 | vect (100.0%) |
| VMOVUPS 0x98de(%RIP),%ZMM3 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 | vect (100.0%) |
| VMOVUPS 0x9914(%RIP),%ZMM4 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 | vect (100.0%) |
| VFMADD231PS {rn-sae},%ZMM3,%ZMM1,%ZMM4 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VMULPS {rn-sae},%ZMM1,%ZMM1,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VFMADD213PS {rn-sae},%ZMM1,%ZMM4,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VFMADD213PS {rn-sae},%ZMM0,%ZMM0,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VANDPS 0x9932(%RIP),%ZMM3,%ZMM0 | 1 | 0.50 | 0 | 0.33 | 0.33 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1 | 0.67 | vect (100.0%) |
| VSCALEFPS {rn-sae},%ZMM2,%ZMM0,%ZMM0 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| RET | 1 | 0.50 | 0 | 0.33 | 0.33 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0.33 | 0 | 2.13 | N/A |
| Name | Coverage (%) | Time (s) |
|---|---|---|
| __svml_expf16_z0 | 0.70 | 0.03 |
