Function: __svml_i64rem8_z0 | Module: exec | Source: :0-0 | Coverage: 19.85% |
---|
Source panel intentionally left blank: the given object contains no debug symbols. |
0x460220 ENDBR64 |
0x460224 VMOVDQU64 0x24ed2(%RIP),%ZMM2 |
0x46022e VPCMPGTQ %ZMM1,%ZMM2,%K1 |
0x460234 VPSUBQ %ZMM1,%ZMM2,%ZMM1{%K1} |
0x46023a VMOVDQU64 0x24ffc(%RIP),%ZMM5 |
0x460244 VMOVUPD 0x24f32(%RIP),%ZMM3 |
0x46024e VMOVUPD 0x25068(%RIP),%ZMM6 |
0x460258 VMOVUPD 0x24f5e(%RIP),%ZMM4 |
0x460262 VPCMPEQQ %ZMM2,%ZMM1,%K0 |
0x460268 KORTESTB %K0,%K0 |
0x46026c JE 460275 |
0x46026e MOV $0,%EAX |
0x460273 DIV %AL |
0x460275 VPANDQ %ZMM5,%ZMM1,%ZMM7 |
0x46027b VCVTUQQ2PD {rn-sae},%ZMM7,%ZMM7 |
0x460281 VPANDNQ %ZMM1,%ZMM5,%ZMM8 |
0x460287 VCVTUQQ2PD {rn-sae},%ZMM8,%ZMM8 |
0x46028d VCVTUQQ2PD {rn-sae},%ZMM1,%ZMM9 |
0x460293 VANDPD %ZMM9,%ZMM6,%ZMM6 |
0x460299 VSUBPD {rn-sae},%ZMM6,%ZMM8,%ZMM8 |
0x46029f VADDPD {rn-sae},%ZMM8,%ZMM7,%ZMM7 |
0x4602a5 VPCMPGTQ %ZMM0,%ZMM2,%K1 |
0x4602ab VPSUBQ %ZMM0,%ZMM2,%ZMM0{%K1} |
0x4602b1 VCVTUQQ2PD {rn-sae},%ZMM0,%ZMM8 |
0x4602b7 VRCP14PD %ZMM9,%ZMM10 |
0x4602bd VFNMADD231PD {rn-sae},%ZMM9,%ZMM10,%ZMM3 |
0x4602c3 VFMADD132PD {rn-sae},%ZMM10,%ZMM10,%ZMM3 |
0x4602c9 VMULPD {rn-sae},%ZMM3,%ZMM8,%ZMM8 |
0x4602cf VRNDSCALEPD $0x3,{sae},%ZMM8,%ZMM8 |
0x4602d6 VANDPD %ZMM8,%ZMM4,%ZMM8 |
0x4602dc VPANDNQ %ZMM0,%ZMM5,%ZMM9 |
0x4602e2 VCVTUQQ2PD {rn-sae},%ZMM9,%ZMM9 |
0x4602e8 VFNMADD231PD {rn-sae},%ZMM8,%ZMM6,%ZMM9 |
0x4602ee VPANDQ %ZMM5,%ZMM0,%ZMM5 |
0x4602f4 VCVTUQQ2PD {rn-sae},%ZMM5,%ZMM5 |
0x4602fa VFNMADD231PD {rn-sae},%ZMM8,%ZMM7,%ZMM5 |
0x460300 VADDPD {rn-sae},%ZMM5,%ZMM9,%ZMM5 |
0x460306 VMULPD {rn-sae},%ZMM3,%ZMM5,%ZMM9 |
0x46030c VRNDSCALEPD $0x3,{sae},%ZMM9,%ZMM9 |
0x460313 VANDPD %ZMM9,%ZMM4,%ZMM9 |
0x460319 VFNMADD231PD {rn-sae},%ZMM9,%ZMM6,%ZMM5 |
0x46031f VFNMADD231PD {rn-sae},%ZMM9,%ZMM7,%ZMM5 |
0x460325 VMULPD {rn-sae},%ZMM3,%ZMM5,%ZMM10 |
0x46032b VRNDSCALEPD $0x3,{sae},%ZMM10,%ZMM10 |
0x460332 VANDPD %ZMM10,%ZMM4,%ZMM4 |
0x460338 VFNMADD231PD {rn-sae},%ZMM6,%ZMM4,%ZMM5 |
0x46033e VFNMADD231PD {rn-sae},%ZMM7,%ZMM4,%ZMM5 |
0x460344 VMULPD {rn-sae},%ZMM3,%ZMM5,%ZMM3 |
0x46034a VCVTPD2UQQ {rz-sae},%ZMM4,%ZMM4 |
0x460350 VCVTPD2UQQ {rz-sae},%ZMM9,%ZMM5 |
0x460356 VCVTPD2UQQ {rz-sae},%ZMM8,%ZMM6 |
0x46035c VPADDQ %ZMM5,%ZMM6,%ZMM5 |
0x460362 VCVTPD2UQQ {rz-sae},%ZMM3,%ZMM3 |
0x460368 VPADDD %ZMM4,%ZMM3,%ZMM3 |
0x46036e VPADDQ %ZMM3,%ZMM5,%ZMM3 |
0x460374 VPMULLQ %ZMM1,%ZMM3,%ZMM3 |
0x46037a VPSUBQ %ZMM3,%ZMM0,%ZMM0 |
0x460380 VPCMPNLTUQ %ZMM1,%ZMM0,%K2 |
0x460387 VPSUBQ %ZMM1,%ZMM0,%ZMM0{%K2} |
0x46038d VPSUBQ %ZMM0,%ZMM2,%ZMM0{%K1} |
0x460393 RET |
0x460394 NOPW %CS:(%RAX,%RAX,1) |
0x46039e XCHG %AX,%AX |
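As a reading aid, the listing above can be summarized as a lane-wise signed 64-bit remainder over eight lanes. The sketch below is a scalar model of that reading, not the library's implementation. It assumes that `%ZMM2` holds an all-zero constant (the loaded data is not shown in the report), so that the masked `VPSUBQ` instructions take absolute values and the final one restores the dividend's sign. The names `rem_i64_lane` and `rem_i64x8` are hypothetical.

```python
def rem_i64_lane(dividend: int, divisor: int) -> int:
    """Remainder whose sign follows the dividend (truncating division)."""
    # Work on magnitudes, as the masked 0 - x subtractions appear to do.
    magnitude = abs(dividend) % abs(divisor)
    # Negate the result only for lanes whose dividend was negative.
    return -magnitude if dividend < 0 else magnitude


def rem_i64x8(dividends, divisors):
    """Apply rem_i64_lane to eight lanes; one zero divisor faults the call."""
    if len(dividends) != 8 or len(divisors) != 8:
        raise ValueError("expected exactly eight lanes")
    if any(d == 0 for d in divisors):
        # Models the VPCMPEQQ / KORTESTB / DIV %AL sequence, which forces a
        # divide error as soon as any lane holds a zero divisor.
        raise ZeroDivisionError("zero divisor in at least one lane")
    return [rem_i64_lane(a, b) for a, b in zip(dividends, divisors)]


lanes = rem_i64x8([7, -7, 7, -7, 0, 10, -10, 1], [3, 3, -3, -3, 5, 5, 5, 1])
print(lanes)
```

The zero check is per call rather than per lane, which is why the `MOV $0,%EAX` / `DIV %AL` pair shows up as a separate, rarely taken path in the tables that follow.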
Path / |
Source file and lines | |
Module | exec |
nb instructions | 60 |
nb uops | 68.50 |
loop length | 368.50 |
used x86 registers | 0.50 |
used mmx registers | 0 |
used xmm registers | 0 |
used ymm registers | 0 |
used zmm registers | 11 |
nb stack references | 0 |
ADD-SUB / MUL ratio | 0.75 |
micro-operation queue | 11.42 cycles |
front end | 11.42 cycles |
Metric | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 28.50 | 1.50 | 2.00 | 2.00 | 0.00 | 28.50 | 2.00 | 0.00 | 0.00 | 0.00 | 0.50 | 2.00 |
cycles | 28.50 | 7.50 | 2.00 | 2.00 | 0.00 | 28.50 | 2.00 | 0.00 | 0.00 | 0.00 | 0.50 | 2.00 |
Cycles executing div or sqrt instructions | 3.00 |
FE+BE cycles | 149.63-149.73 |
Stall cycles | 138.06-138.16 |
RS full (events) | 149.11-149.20 |
Front-end | 11.42 |
Dispatch | 28.50 |
DIV/SQRT | 3.00 |
Overall L1 | 28.50 |
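The figures above are consistent with a simple bottleneck model: the dispatch bound equals the busiest port's cycle count, and the overall L1 bound equals the largest of the front-end, dispatch, and DIV/SQRT components. The sketch below reproduces that arithmetic from the numbers in the table. Treating the bound as a plain maximum is an assumption about how the tool derives it, and the variable names are illustrative.

```python
# Per-port cycle counts copied from the "cycles" row (P0 .. P11).
port_cycles = [28.50, 7.50, 2.00, 2.00, 0.00, 28.50,
               2.00, 0.00, 0.00, 0.00, 0.50, 2.00]

# Assumption: the dispatch bound is set by the most heavily loaded port.
dispatch = max(port_cycles)

components = {
    "front_end": 11.42,
    "dispatch": dispatch,
    "div_sqrt": 3.00,
}

# Assumption: the overall L1 bound is the largest component.
overall = max(components.values())
limiting = max(components, key=components.get)

print(f"dispatch bound: {dispatch:.2f} cycles")
print(f"overall L1 bound: {overall:.2f} cycles, limited by {limiting}")
```

Under this model P0 and P5 are equally saturated, so moving work off only one of them would not lower the bound.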
all | 98% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | NA (no fma vectorizable/vectorized instructions) |
other | 96% |
all | 100% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | 100% |
div/sqrt | 100% |
other | 100% |
all | 99% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | 100% |
div/sqrt | 100% |
other | 98% |
all | 98% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | NA (no fma vectorizable/vectorized instructions) |
other | 96% |
all | 100% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | 100% |
div/sqrt | 100% |
other | 100% |
all | 99% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | 100% |
div/sqrt | 100% |
other | 98% |
Source file and lines | |
Module | exec |
nb instructions | 61 |
nb uops | 71 |
loop length | 372 |
used x86 registers | 1 |
used mmx registers | 0 |
used xmm registers | 0 |
used ymm registers | 0 |
used zmm registers | 11 |
nb stack references | 0 |
ADD-SUB / MUL ratio | 0.75 |
micro-operation queue | 11.83 cycles |
front end | 11.83 cycles |
Metric | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 28.50 | 3.00 | 2.00 | 2.00 | 0.00 | 28.50 | 2.00 | 0.00 | 0.00 | 0.00 | 1.00 | 2.00 |
cycles | 28.50 | 7.50 | 2.00 | 2.00 | 0.00 | 28.50 | 2.00 | 0.00 | 0.00 | 0.00 | 1.00 | 2.00 |
Cycles executing div or sqrt instructions | 6.00 |
FE+BE cycles | 149.67-149.86 |
Stall cycles | 137.75-137.94 |
RS full (events) | 149.13-149.32 |
Front-end | 11.83 |
Dispatch | 28.50 |
DIV/SQRT | 6.00 |
Overall L1 | 28.50 |
all | 96% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | NA (no fma vectorizable/vectorized instructions) |
other | 93% |
all | 100% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | 100% |
div/sqrt | 100% |
other | 100% |
all | 98% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | 100% |
div/sqrt | 100% |
other | 96% |
all | 96% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | NA (no fma vectorizable/vectorized instructions) |
other | 93% |
all | 100% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | 100% |
div/sqrt | 100% |
other | 100% |
all | 98% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | 100% |
div/sqrt | 100% |
other | 96% |
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ENDBR64 | |||||||||||||||
VMOVDQU64 0x24ed2(%RIP),%ZMM2 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 |
VPCMPGTQ %ZMM1,%ZMM2,%K1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0-3 | 1 |
VPSUBQ %ZMM1,%ZMM2,%ZMM1{%K1} | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0-1 | 0.50 |
VMOVDQU64 0x24ffc(%RIP),%ZMM5 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 |
VMOVUPD 0x24f32(%RIP),%ZMM3 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 |
VMOVUPD 0x25068(%RIP),%ZMM6 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 |
VMOVUPD 0x24f5e(%RIP),%ZMM4 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 |
VPCMPEQQ %ZMM2,%ZMM1,%K0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 |
KORTESTB %K0,%K0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
JE 460275 <__svml_i64rem8_z0+0x55> | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 |
MOV $0,%EAX | 1 | 0.20 | 0.20 | 0 | 0 | 0 | 0.20 | 0.20 | 0 | 0 | 0 | 0.20 | 0 | 1 | 0.20 |
DIV %AL | 4 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11-16 | 6 |
VPANDQ %ZMM5,%ZMM1,%ZMM7 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VCVTUQQ2PD {rn-sae},%ZMM7,%ZMM7 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VPANDNQ %ZMM1,%ZMM5,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VCVTUQQ2PD {rn-sae},%ZMM8,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTUQQ2PD {rn-sae},%ZMM1,%ZMM9 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VANDPD %ZMM9,%ZMM6,%ZMM6 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VSUBPD {rn-sae},%ZMM6,%ZMM8,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VADDPD {rn-sae},%ZMM8,%ZMM7,%ZMM7 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VPCMPGTQ %ZMM0,%ZMM2,%K1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0-3 | 1 |
VPSUBQ %ZMM0,%ZMM2,%ZMM0{%K1} | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0-1 | 0.50 |
VCVTUQQ2PD {rn-sae},%ZMM0,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VRCP14PD %ZMM9,%ZMM10 | 3 | 2.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 2 |
VFNMADD231PD {rn-sae},%ZMM9,%ZMM10,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132PD {rn-sae},%ZMM10,%ZMM10,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMULPD {rn-sae},%ZMM3,%ZMM8,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VRNDSCALEPD $0x3,{sae},%ZMM8,%ZMM8 | 2 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 1 |
VANDPD %ZMM8,%ZMM4,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VPANDNQ %ZMM0,%ZMM5,%ZMM9 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VCVTUQQ2PD {rn-sae},%ZMM9,%ZMM9 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFNMADD231PD {rn-sae},%ZMM8,%ZMM6,%ZMM9 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VPANDQ %ZMM5,%ZMM0,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VCVTUQQ2PD {rn-sae},%ZMM5,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFNMADD231PD {rn-sae},%ZMM8,%ZMM7,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VADDPD {rn-sae},%ZMM5,%ZMM9,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VMULPD {rn-sae},%ZMM3,%ZMM5,%ZMM9 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VRNDSCALEPD $0x3,{sae},%ZMM9,%ZMM9 | 2 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 1 |
VANDPD %ZMM9,%ZMM4,%ZMM9 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VFNMADD231PD {rn-sae},%ZMM9,%ZMM6,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFNMADD231PD {rn-sae},%ZMM9,%ZMM7,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMULPD {rn-sae},%ZMM3,%ZMM5,%ZMM10 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VRNDSCALEPD $0x3,{sae},%ZMM10,%ZMM10 | 2 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 1 |
VANDPD %ZMM10,%ZMM4,%ZMM4 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VFNMADD231PD {rn-sae},%ZMM6,%ZMM4,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFNMADD231PD {rn-sae},%ZMM7,%ZMM4,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMULPD {rn-sae},%ZMM3,%ZMM5,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTPD2UQQ {rz-sae},%ZMM4,%ZMM4 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTPD2UQQ {rz-sae},%ZMM9,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTPD2UQQ {rz-sae},%ZMM8,%ZMM6 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VPADDQ %ZMM5,%ZMM6,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VCVTPD2UQQ {rz-sae},%ZMM3,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VPADDD %ZMM4,%ZMM3,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VPADDQ %ZMM3,%ZMM5,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VPMULLQ %ZMM1,%ZMM3,%ZMM3 | 5 | 1.50 | 0 | 0 | 0 | 0 | 1.50 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 1.50 |
VPSUBQ %ZMM3,%ZMM0,%ZMM0 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0-1 | 0.50 |
VPCMPNLTUQ %ZMM1,%ZMM0,%K2 | |||||||||||||||
VPSUBQ %ZMM1,%ZMM0,%ZMM0{%K2} | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0-1 | 0.50 |
VPSUBQ %ZMM0,%ZMM2,%ZMM0{%K1} | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0-1 | 0.50 |
RET | 1 | 0.50 | 0 | 0.33 | 0.33 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0.33 | 0 | 2.13 |
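The `VRCP14PD`, `VMULPD`, `VRNDSCALEPD`, and `VFNMADD231PD` rows above suggest that the quotient is estimated in double precision from a reciprocal and then refined over several rounds, with a final integer fix-up (`VPMULLQ`, `VPSUBQ`, `VPCMPNLTUQ`). The sketch below illustrates that general technique for unsigned operands. It is a simplified model, not the instruction sequence in the table: it swaps in an ordinary division for the approximate reciprocal, does not split operands into high and low parts, and uses the hypothetical name `urem_via_reciprocal`.

```python
def urem_via_reciprocal(n: int, d: int) -> int:
    """Unsigned remainder using floating-point quotient estimates."""
    if n < 0 or d <= 0:
        raise ValueError("expects n >= 0 and d > 0")
    inv = 1.0 / float(d)  # stand-in for the refined reciprocal
    r = n
    # Each round removes an estimated multiple of d. The estimate may be off
    # by a few units, so r can temporarily become negative.
    for _ in range(3):
        q = int(float(r) * inv)  # truncated quotient estimate
        r -= q * d               # exact integer update keeps r = n (mod d)
    # Final correction, in the spirit of the compare-and-subtract at the end
    # of the listing.
    while r < 0:
        r += d
    while r >= d:
        r -= d
    return r


print(urem_via_reciprocal(2**64 - 1, 3))
```

Because every update subtracts an exact integer multiple of `d`, rounding errors in the estimates affect only how much correction is needed, never the final result.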
Source file and lines | |
Module | exec |
nb instructions | 59 |
nb uops | 66 |
loop length | 365 |
used x86 registers | 0 |
used mmx registers | 0 |
used xmm registers | 0 |
used ymm registers | 0 |
used zmm registers | 11 |
nb stack references | 0 |
ADD-SUB / MUL ratio | 0.75 |
micro-operation queue | 11.00 cycles |
front end | 11.00 cycles |
Metric | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 28.50 | 0.00 | 2.00 | 2.00 | 0.00 | 28.50 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 |
cycles | 28.50 | 7.50 | 2.00 | 2.00 | 0.00 | 28.50 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2.00 |
Cycles executing div or sqrt instructions | NA |
FE+BE cycles | 149.60 |
Stall cycles | 138.38 |
RS full (events) | 149.09 |
Front-end | 11.00 |
Dispatch | 28.50 |
Overall L1 | 28.50 |
all | 100% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | NA (no fma vectorizable/vectorized instructions) |
other | 100% |
all | 100% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | 100% |
div/sqrt | 100% |
other | 100% |
all | 100% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | 100% |
div/sqrt | 100% |
other | 100% |
all | 100% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | NA (no fma vectorizable/vectorized instructions) |
other | 100% |
all | 100% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | 100% |
div/sqrt | 100% |
other | 100% |
all | 100% |
load | 100% |
store | NA (no store vectorizable/vectorized instructions) |
mul | 100% |
add-sub | 100% |
fma | 100% |
div/sqrt | 100% |
other | 100% |
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ENDBR64 | |||||||||||||||
VMOVDQU64 0x24ed2(%RIP),%ZMM2 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 |
VPCMPGTQ %ZMM1,%ZMM2,%K1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0-3 | 1 |
VPSUBQ %ZMM1,%ZMM2,%ZMM1{%K1} | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0-1 | 0.50 |
VMOVDQU64 0x24ffc(%RIP),%ZMM5 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 |
VMOVUPD 0x24f32(%RIP),%ZMM3 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 |
VMOVUPD 0x25068(%RIP),%ZMM6 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 |
VMOVUPD 0x24f5e(%RIP),%ZMM4 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0-1 | 0.50 |
VPCMPEQQ %ZMM2,%ZMM1,%K0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 |
KORTESTB %K0,%K0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
JE 460275 <__svml_i64rem8_z0+0x55> | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 |
VPANDQ %ZMM5,%ZMM1,%ZMM7 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VCVTUQQ2PD {rn-sae},%ZMM7,%ZMM7 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VPANDNQ %ZMM1,%ZMM5,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VCVTUQQ2PD {rn-sae},%ZMM8,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTUQQ2PD {rn-sae},%ZMM1,%ZMM9 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VANDPD %ZMM9,%ZMM6,%ZMM6 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VSUBPD {rn-sae},%ZMM6,%ZMM8,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VADDPD {rn-sae},%ZMM8,%ZMM7,%ZMM7 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VPCMPGTQ %ZMM0,%ZMM2,%K1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0-3 | 1 |
VPSUBQ %ZMM0,%ZMM2,%ZMM0{%K1} | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0-1 | 0.50 |
VCVTUQQ2PD {rn-sae},%ZMM0,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VRCP14PD %ZMM9,%ZMM10 | 3 | 2.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 2 |
VFNMADD231PD {rn-sae},%ZMM9,%ZMM10,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132PD {rn-sae},%ZMM10,%ZMM10,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMULPD {rn-sae},%ZMM3,%ZMM8,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VRNDSCALEPD $0x3,{sae},%ZMM8,%ZMM8 | 2 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 1 |
VANDPD %ZMM8,%ZMM4,%ZMM8 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VPANDNQ %ZMM0,%ZMM5,%ZMM9 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VCVTUQQ2PD {rn-sae},%ZMM9,%ZMM9 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFNMADD231PD {rn-sae},%ZMM8,%ZMM6,%ZMM9 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VPANDQ %ZMM5,%ZMM0,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VCVTUQQ2PD {rn-sae},%ZMM5,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFNMADD231PD {rn-sae},%ZMM8,%ZMM7,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VADDPD {rn-sae},%ZMM5,%ZMM9,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VMULPD {rn-sae},%ZMM3,%ZMM5,%ZMM9 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VRNDSCALEPD $0x3,{sae},%ZMM9,%ZMM9 | 2 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 1 |
VANDPD %ZMM9,%ZMM4,%ZMM9 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VFNMADD231PD {rn-sae},%ZMM9,%ZMM6,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFNMADD231PD {rn-sae},%ZMM9,%ZMM7,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMULPD {rn-sae},%ZMM3,%ZMM5,%ZMM10 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VRNDSCALEPD $0x3,{sae},%ZMM10,%ZMM10 | 2 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 1 |
VANDPD %ZMM10,%ZMM4,%ZMM4 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VFNMADD231PD {rn-sae},%ZMM6,%ZMM4,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFNMADD231PD {rn-sae},%ZMM7,%ZMM4,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMULPD {rn-sae},%ZMM3,%ZMM5,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTPD2UQQ {rz-sae},%ZMM4,%ZMM4 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTPD2UQQ {rz-sae},%ZMM9,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTPD2UQQ {rz-sae},%ZMM8,%ZMM6 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VPADDQ %ZMM5,%ZMM6,%ZMM5 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VCVTPD2UQQ {rz-sae},%ZMM3,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VPADDD %ZMM4,%ZMM3,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VPADDQ %ZMM3,%ZMM5,%ZMM3 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50 |
VPMULLQ %ZMM1,%ZMM3,%ZMM3 | 5 | 1.50 | 0 | 0 | 0 | 0 | 1.50 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 1.50 |
VPSUBQ %ZMM3,%ZMM0,%ZMM0 | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0-1 | 0.50 |
VPCMPNLTUQ %ZMM1,%ZMM0,%K2 | |||||||||||||||
VPSUBQ %ZMM1,%ZMM0,%ZMM0{%K2} | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0-1 | 0.50 |
VPSUBQ %ZMM0,%ZMM2,%ZMM0{%K1} | 1 | 0.50 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0-1 | 0.50 |
RET | 1 | 0.50 | 0 | 0.33 | 0.33 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0.33 | 0 | 2.13 |
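The fractional values in the first summary block (60 instructions, 68.50 uops, loop length 368.50, 0.50 x86 registers, 3.00 DIV/SQRT cycles) match the arithmetic mean of the two per-path summaries: the 61-instruction path that includes `MOV`/`DIV` and the 59-instruction path that skips them. The sketch below checks this. It assumes equal weighting of the two paths and treats the `NA` DIV/SQRT entry of the shorter path as zero. The dictionary keys are illustrative labels.

```python
# Values copied from the two per-path summary blocks.
path_with_div = {
    "nb instructions": 61, "nb uops": 71, "loop length": 372,
    "used x86 registers": 1, "div/sqrt cycles": 6.0,
}
path_without_div = {
    "nb instructions": 59, "nb uops": 66, "loop length": 365,
    "used x86 registers": 0, "div/sqrt cycles": 0.0,  # NA treated as 0 (assumption)
}


def average_paths(a: dict, b: dict) -> dict:
    """Unweighted mean of two per-path metric dictionaries."""
    return {key: (a[key] + b[key]) / 2 for key in a}


averaged = average_paths(path_with_div, path_without_div)
for name, value in averaged.items():
    print(f"{name}: {value}")
```

An unweighted mean says nothing about how often each path actually executes. Since the `DIV %AL` path only serves to raise a divide error, the 59-instruction path is presumably the one that matters at run time.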
Name | Coverage (%) | Time (s) |
---|---|---|
__svml_i64rem8_z0 | 19.85 | 13.6 |
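If the coverage figure is read as this function's share of the total profiled time, the two columns above give a rough estimate of the whole run. The sketch below does that division. The interpretation of coverage is an assumption, and the result is only as precise as the rounded inputs.

```python
coverage_percent = 19.85   # from the Coverage (%) column
function_time_s = 13.6     # from the Time (s) column

# Assumption: coverage = function time / total profiled time.
estimated_total_s = function_time_s / (coverage_percent / 100.0)
time_elsewhere_s = estimated_total_s - function_time_s

print(f"estimated total time: {estimated_total_s:.1f} s")
print(f"time outside this function: {time_elsewhere_s:.1f} s")
```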