| Function | Module | Source | Coverage (incl. loops) | Coverage (excl. loops) |
|---|---|---|---|---|
| __svml_expf16_z0 | attention-avx512 | :0-0 | 0.28% | 0.28% |

*** This panel is intentionally left blank because the given object lacks debug symbols. ***

| Address | Instruction |
|---|---|
| 0x4099b0 | ENDBR64 |
| 0x4099b4 | VMOVUPS 0x9502(%RIP),%ZMM1 |
| 0x4099be | VMOVUPS 0x9538(%RIP),%ZMM2 |
| 0x4099c8 | VFMADD213PS {rz-sae},%ZMM2,%ZMM0,%ZMM1 |
| 0x4099ce | VSUBPS {rn-sae},%ZMM2,%ZMM1,%ZMM2 |
| 0x4099d4 | VMOVUPS 0x9562(%RIP),%ZMM3 |
| 0x4099de | VMOVUPS 0x9598(%RIP),%ZMM4 |
| 0x4099e8 | VFNMADD213PS {rn-sae},%ZMM0,%ZMM2,%ZMM3 |
| 0x4099ee | VFNMADD231PS {rn-sae},%ZMM4,%ZMM2,%ZMM3 |
| 0x4099f4 | VPERMPS 0x9482(%RIP),%ZMM1,%ZMM0 |
| 0x4099fe | VANDPS 0x95b8(%RIP),%ZMM3,%ZMM1 |
| 0x409a08 | VMOVUPS 0x95ee(%RIP),%ZMM3 |
| 0x409a12 | VMOVUPS 0x9624(%RIP),%ZMM4 |
| 0x409a1c | VFMADD231PS {rn-sae},%ZMM3,%ZMM1,%ZMM4 |
| 0x409a22 | VMULPS {rn-sae},%ZMM1,%ZMM1,%ZMM3 |
| 0x409a28 | VFMADD213PS {rn-sae},%ZMM1,%ZMM4,%ZMM3 |
| 0x409a2e | VFMADD213PS {rn-sae},%ZMM0,%ZMM0,%ZMM3 |
| 0x409a34 | VANDPS 0x9642(%RIP),%ZMM3,%ZMM0 |
| 0x409a3e | VSCALEFPS {rn-sae},%ZMM2,%ZMM0,%ZMM0 |
| 0x409a44 | RET |
| 0x409a45 | NOPW %CS:(%RAX,%RAX,1) |
| 0x409a4f | NOP |
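
One plausible reading of this listing is the usual table-driven vector exponential: scale x, split off a 4-bit index that VPERMPS uses to pick a table entry, reduce the argument with two FNMADDs, evaluate a short polynomial, and rescale with VSCALEFPS, which multiplies by 2^floor(src). The Python sketch below models one lane of that reading. It is a reconstruction under stated assumptions, not SVML's code: the RIP-relative constants are not shown in the report, so the table 2^(j/16), the Taylor coefficients and round-to-nearest (in place of the `{rz-sae}` mode of the first FMA) are assumptions. The two VANDPS masks and any overflow, underflow or NaN handling are not modeled, and `expf_model` is a hypothetical name.

```python
import math

# Assumed constants: the real ones are loaded RIP-relative and are not visible
# in the report, so these are derived from first principles instead.
TABLE = [2.0 ** (j / 16) for j in range(16)]  # T[j] = 2^(j/16), one ZMM of floats
LN2 = math.log(2.0)
C2, C3 = 0.5, 1.0 / 6.0  # Taylor coefficients; SVML likely uses minimax values


def expf_model(x: float) -> float:
    """Scalar model of one lane of the presumed exp kernel (a sketch)."""
    # Presumed VFMADD213PS + VSUBPS: N = n/16 with n = round(16*x/ln2),
    # so N = k + j/16 carries the table index j in its fractional part.
    n = round(16.0 * x / LN2)
    N = n / 16.0
    j = n % 16  # low 4 bits of n select the table entry (VPERMPS)
    # Presumed VFNMADD213PS/VFNMADD231PS: reduced argument, |r| <= ln2/32.
    r = x - N * LN2
    # Presumed VFMADD231PS, VMULPS, VFMADD213PS: p(r) ~= e^r - 1.
    p = r + (r * r) * (C2 + C3 * r)
    # Presumed VFMADD213PS with ZMM0 twice: T[j]*p + T[j] = T[j]*(1 + p).
    t = TABLE[j]
    y = t * p + t
    # VSCALEFPS semantics: y * 2^floor(N); floor(k + j/16) == k since 0 <= j < 16.
    return math.ldexp(y, math.floor(N))


for x in (-10.0, -1.0, 0.0, 0.5, 1.0, 10.0):
    print(x, expf_model(x), math.exp(x))
```

Because |r| ≤ ln2/32 ≈ 0.0217, the omitted r^4/24 term stays below about 1e-8, so the sketch tracks `math.exp` closely for moderate inputs. Splitting n into k and j is what lets the kernel get away with a cubic polynomial: the table absorbs most of the range.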

| Coverage (%) | Name | Source Location | Module |
|---|---|---|---|
| 100.00 | main | attention.cpp:53 | attention-avx512 |

The code analyzed by CQA in this panel excludes loops and represents 0.28% of application time for run run_0.

| Metric | Value |
|---|---|
| Source file and lines | N/A (no debug symbols) |
| Module | attention-avx512 |
| nb instructions | 20 |
| nb uops | 19 |
| loop length | 149 bytes |
| used x86 registers | 0 |
| used mmx registers | 0 |
| used xmm registers | 0 |
| used ymm registers | 0 |
| used zmm registers | 5 |
| nb stack references | 0 |
| ADD-SUB / MUL ratio | 1.00 |
| micro-operation queue | 3.17 cycles |
| front end | 3.17 cycles |
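
The loop length of 149 reported for this block (which is not a loop; the figure covers the whole function) can be checked against the disassembly: it matches the byte distance from the entry point, ENDBR64 at 0x4099b0, to the NOPW padding that follows the one-byte RET at 0x409a44. A minimal check using only those two addresses:

```python
# Addresses copied from the disassembly listing of __svml_expf16_z0.
FIRST_INSN = 0x4099B0  # ENDBR64, the function entry point
PADDING = 0x409A45     # NOPW padding that follows RET

code_bytes = PADDING - FIRST_INSN
print(f"{code_bytes} bytes")  # 149 bytes
```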

| Port | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uops | 6.00 | 0.00 | 3.33 | 3.33 | 0.00 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.33 |
| cycles | 6.00 | 4.00 | 3.33 | 3.33 | 0.00 | 6.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.33 |

| Metric | Cycles |
|---|---|
| Cycles executing div or sqrt instructions | NA |
| Front-end | 3.17 |
| Dispatch | 6.00 |
| Overall L1 | 6.00 |
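
These estimates can be re-derived with a few lines of arithmetic from the port table and the per-instruction table of this report. The sketch below assumes a front end that delivers 6 uops per cycle (inferred from 19 uops / 3.17 cycles, not stated in the report) and that every load is spread evenly over the three load ports P2, P3 and P11, as the per-instruction table shows (0.33 on each). Dispatch is taken as the busiest port in the "cycles" row, and Overall L1 as the larger of front-end and dispatch.

```python
# (mnemonic, uops, reads_memory) for the 19 instructions after ENDBR64,
# from the "Nb FU" column and the memory operands in the listing.
INSTRUCTIONS = [
    ("VMOVUPS", 1, True), ("VMOVUPS", 1, True),
    ("VFMADD213PS", 1, False), ("VSUBPS", 1, False),
    ("VMOVUPS", 1, True), ("VMOVUPS", 1, True),
    ("VFNMADD213PS", 1, False), ("VFNMADD231PS", 1, False),
    ("VPERMPS", 1, True), ("VANDPS", 1, True),
    ("VMOVUPS", 1, True), ("VMOVUPS", 1, True),
    ("VFMADD231PS", 1, False), ("VMULPS", 1, False),
    ("VFMADD213PS", 1, False), ("VFMADD213PS", 1, False),
    ("VANDPS", 1, True), ("VSCALEFPS", 1, False),
    ("RET", 1, True),  # RET reads the return address from the stack
]
# "cycles" row of the port-pressure table.
PORT_CYCLES = {"P0": 6.00, "P1": 4.00, "P2": 3.33, "P3": 3.33, "P4": 0.00,
               "P5": 6.00, "P6": 1.00, "P7": 0.00, "P8": 0.00, "P9": 0.00,
               "P10": 0.00, "P11": 3.33}
ISSUE_WIDTH = 6  # assumption: uops per cycle delivered by the front end
LOAD_PORTS = 3   # P2, P3, P11
LANES = 16       # float32 lanes in one ZMM register

uops = sum(n for _, n, _ in INSTRUCTIONS)
loads = sum(1 for _, _, mem in INSTRUCTIONS if mem)
front_end = uops / ISSUE_WIDTH        # front-end cycles per call
load_port = loads / LOAD_PORTS        # cycles on each of P2, P3, P11
dispatch = max(PORT_CYCLES.values())  # busiest execution port
overall = max(front_end, dispatch)    # the larger bound wins

print(f"uops={uops} front-end={front_end:.2f} load ports={load_port:.2f} "
      f"dispatch={dispatch:.2f} overall={overall:.2f} "
      f"per element={overall / LANES:.3f}")
```

With these inputs the script prints `uops=19 front-end=3.17 load ports=3.33 dispatch=6.00 overall=6.00 per element=0.375`. Under this simple model the function is limited by the vector ports P0 and P5 rather than by the front end or the loads, at roughly 0.375 cycles per float.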

| Instruction class | Ratio |
|---|---|
| all | 100% |
| load | 100% |
| store | NA (no store vectorizable/vectorized instructions) |
| mul | 100% |
| add-sub | 100% |
| fma | 100% |
| div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
| other | 100% |

| Instruction class | Ratio |
|---|---|
| all | 100% |
| load | 100% |
| store | NA (no store vectorizable/vectorized instructions) |
| mul | 100% |
| add-sub | 100% |
| fma | 100% |
| div/sqrt | NA (no div/sqrt vectorizable/vectorized instructions) |
| other | 100% |

| Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | Latency | Recip. throughput | Vectorization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENDBR64 | N/A | | | | | | | | | | | | | | | |
| VMOVUPS 0x9502(%RIP),%ZMM1 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 | vect (100.0%) |
| VMOVUPS 0x9538(%RIP),%ZMM2 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 | vect (100.0%) |
| VFMADD213PS {rz-sae},%ZMM2,%ZMM0,%ZMM1 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VSUBPS {rn-sae},%ZMM2,%ZMM1,%ZMM2 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 | vect (100.0%) |
| VMOVUPS 0x9562(%RIP),%ZMM3 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 | vect (100.0%) |
| VMOVUPS 0x9598(%RIP),%ZMM4 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 | vect (100.0%) |
| VFNMADD213PS {rn-sae},%ZMM0,%ZMM2,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VFNMADD231PS {rn-sae},%ZMM4,%ZMM2,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VPERMPS 0x9482(%RIP),%ZMM1,%ZMM0 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0.33 | 3 | 1 | vect (100.0%) |
| VANDPS 0x95b8(%RIP),%ZMM3,%ZMM1 | 1 | 0.50 | 0 | 0.33 | 0.33 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1 | 0.67 | vect (100.0%) |
| VMOVUPS 0x95ee(%RIP),%ZMM3 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 | vect (100.0%) |
| VMOVUPS 0x9624(%RIP),%ZMM4 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 | vect (100.0%) |
| VFMADD231PS {rn-sae},%ZMM3,%ZMM1,%ZMM4 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VMULPS {rn-sae},%ZMM1,%ZMM1,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VFMADD213PS {rn-sae},%ZMM1,%ZMM4,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VFMADD213PS {rn-sae},%ZMM0,%ZMM0,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| VANDPS 0x9642(%RIP),%ZMM3,%ZMM0 | 1 | 0.50 | 0 | 0.33 | 0.33 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1 | 0.67 | vect (100.0%) |
| VSCALEFPS {rn-sae},%ZMM2,%ZMM0,%ZMM0 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 | vect (100.0%) |
| RET | 1 | 0.50 | 0 | 0.33 | 0.33 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0.33 | 0 | 2.13 | N/A |
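
The Latency column can also be read along the data dependencies of the listing. The sketch below walks the 18 instructions between ENDBR64 and RET in program order and records when each ZMM register's newest value becomes ready; sources are read from the AT&T operands (sources first, destination last), and for every FMA the destination is also a source. It assumes the input in ZMM0 is ready at cycle 0, loads take the lower-bound latency 0 from "0-1", ports are unlimited and there are no bypass delays, so it is a rough model, not a measurement. `CHAIN` and `critical_path` are hypothetical names.

```python
# (mnemonic, latency in cycles, destination register, source registers),
# in program order. Memory operands are constants independent of the input,
# so they add no dependency here.
CHAIN = [
    ("VMOVUPS",      0, "zmm1", []),
    ("VMOVUPS",      0, "zmm2", []),
    ("VFMADD213PS",  4, "zmm1", ["zmm0", "zmm1", "zmm2"]),
    ("VSUBPS",       3, "zmm2", ["zmm1", "zmm2"]),
    ("VMOVUPS",      0, "zmm3", []),
    ("VMOVUPS",      0, "zmm4", []),
    ("VFNMADD213PS", 4, "zmm3", ["zmm0", "zmm2", "zmm3"]),
    ("VFNMADD231PS", 4, "zmm3", ["zmm2", "zmm3", "zmm4"]),
    ("VPERMPS",      3, "zmm0", ["zmm1"]),  # ZMM1 supplies the indices
    ("VANDPS",       1, "zmm1", ["zmm3"]),
    ("VMOVUPS",      0, "zmm3", []),
    ("VMOVUPS",      0, "zmm4", []),
    ("VFMADD231PS",  4, "zmm4", ["zmm1", "zmm3", "zmm4"]),
    ("VMULPS",       4, "zmm3", ["zmm1"]),
    ("VFMADD213PS",  4, "zmm3", ["zmm1", "zmm3", "zmm4"]),
    ("VFMADD213PS",  4, "zmm3", ["zmm0", "zmm3"]),
    ("VANDPS",       1, "zmm0", ["zmm3"]),
    ("VSCALEFPS",    4, "zmm0", ["zmm0", "zmm2"]),
]


def critical_path(chain, result_reg="zmm0"):
    """Cycle at which result_reg is ready, under the assumptions above."""
    ready = {}  # register -> cycle at which its newest value is available
    for _name, latency, dest, srcs in chain:
        start = max((ready.get(r, 0) for r in srcs), default=0)
        ready[dest] = start + latency
    return ready[result_reg]


print(critical_path(CHAIN))  # 33
```

Under these assumptions one call carries a dependency chain of about 33 cycles, several times the 6-cycle throughput estimate, so back-to-back calls on independent data would depend on out-of-order execution overlapping them to approach that throughput.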

| Name | Coverage (%) | Time (s) |
|---|---|---|
| __svml_expf16_z0 | 0.28 | 0.01 |
