Loop Id: 165 | Module: exec | Source: advec_mom_kernel.f90:81-177 [...] | Coverage: 4.28% |
---|
0x430220 VMOVUPD (%R15,%RDI,8),%YMM18 [2] |
0x430227 VCMPPD $0x1,%YMM16,%YMM18,%K1 |
0x43022e LEA (%RDI,%R13,1),%EAX |
0x430232 VPBROADCASTD %EAX,%XMM19 |
0x430238 VPADDD %XMM8,%XMM19,%XMM20 |
0x43023e VPADDD %XMM9,%XMM19,%XMM21 |
0x430244 VPADDD %XMM10,%XMM19,%XMM22 |
0x43024a VPBLENDMD %XMM22,%XMM20,%XMM23{%K1} |
0x430250 VMOVDQA32 %XMM20,%XMM22{%K1} |
0x430256 VPMOVSXDQ %XMM22,%YMM22 |
0x43025c VPSUBQ %YMM0,%YMM22,%YMM22 |
0x430262 KXNORW %K0,%K0,%K2 |
0x430266 VXORPD %XMM24,%XMM24,%XMM24 |
0x43026c VGATHERQPD (%RDX,%YMM22,8),%YMM24{%K2} [6] |
0x430273 KXNORW %K0,%K0,%K2 |
0x430277 VXORPD %XMM25,%XMM25,%XMM25 |
0x43027d VGATHERQPD (%RCX,%YMM22,8),%YMM25{%K2} [7] |
0x430284 VPBLENDMD %XMM20,%XMM21,%XMM20{%K1} |
0x43028a VPADDD %XMM11,%XMM19,%XMM21{%K1} |
0x430290 VANDPD %YMM12,%YMM18,%YMM19 |
0x430296 VPMOVSXDQ %XMM21,%YMM21 |
0x43029c VPSUBQ %YMM0,%YMM21,%YMM21 |
0x4302a2 KXNORW %K0,%K0,%K1 |
0x4302a6 VXORPD %XMM22,%XMM22,%XMM22 |
0x4302ac VGATHERQPD (%RCX,%YMM21,8),%YMM22{%K1} [5] |
0x4302b3 VDIVPD %YMM24,%YMM19,%YMM19 |
0x4302b9 VSUBPD %YMM22,%YMM25,%YMM21 |
0x4302bf VPMOVSXDQ %XMM23,%YMM22 |
0x4302c5 VPSUBQ %YMM0,%YMM22,%YMM22 |
0x4302cb KXNORW %K0,%K0,%K1 |
0x4302cf VPXORD %XMM23,%XMM23,%XMM23 |
0x4302d5 VGATHERQPD (%RCX,%YMM22,8),%YMM23{%K1} [8] |
0x4302dc VSUBPD %YMM25,%YMM23,%YMM22 |
0x4302e2 VMULPD %YMM21,%YMM22,%YMM23 |
0x4302e8 VCMPPD $0x1,%YMM23,%YMM16,%K1 |
0x4302ef VMOVUPD (%R10,%RDI,8),%YMM23{%K1}{z} [3] |
0x4302f6 VANDPD %YMM12,%YMM21,%YMM21 |
0x4302fc VANDPD %YMM12,%YMM22,%YMM24 |
0x430302 VSUBPD %YMM19,%YMM13,%YMM26 |
0x430308 VMULPD %YMM26,%YMM24,%YMM26 |
0x43030e VDIVPD %YMM23,%YMM26,%YMM26 |
0x430314 VPMOVSXDQ %XMM20,%YMM20 |
0x43031a VPSUBQ %YMM0,%YMM20,%YMM20 |
0x430320 KMOVQ %K1,%K2 |
0x430325 VXORPD %XMM27,%XMM27,%XMM27 |
0x43032b VGATHERQPD (%R8,%YMM20,8),%YMM27{%K2} [1] |
0x430332 VCMPPD $0x2,%YMM24,%YMM21,%K2 |
0x430339 VMOVAPD %YMM21,%YMM24{%K2} |
0x43033f VFMADD213PD %YMM21,%YMM19,%YMM21 |
0x430345 VDIVPD %YMM27,%YMM21,%YMM20 |
0x43034b VADDPD %YMM26,%YMM20,%YMM20 |
0x430351 VMULPD %YMM15,%YMM23,%YMM21 |
0x430357 VMULPD %YMM20,%YMM21,%YMM20 |
0x43035d VCMPPD $0x2,%YMM24,%YMM20,%K2 |
0x430364 VMOVAPD %YMM20,%YMM24{%K2} |
0x43036a VCMPPD $0x2,%YMM16,%YMM22,%K2 |
0x430371 VXORPD %YMM17,%YMM24,%YMM24{%K2} |
0x430377 VMOVAPD %YMM24,%YMM20{%K1}{z} |
0x43037d VSUBPD %YMM19,%YMM14,%YMM19 |
0x430383 VFMADD213PD %YMM25,%YMM20,%YMM19 |
0x430389 VMULPD %YMM18,%YMM19,%YMM18 |
0x43038f VMOVUPD %YMM18,(%R9,%RDI,8) [4] |
0x430396 ADD $0x4,%RDI |
0x43039a CMP %R14,%RDI |
0x43039d JL 430220 |
/beegfs/hackathon/users/eoseret/qaas_runs/170-861-0321/intel/CloverLeafFC/build/CloverLeafFC/CloverLeaf_ref/kernels/advec_mom_kernel.f90: 81 - 177 |
-------------------------------------------------------------------------------- |
81: IF(mom_sweep.EQ.1)THEN ! x 1 |
[...] |
152: IF(node_flux(j,k).LT.0.0)THEN |
[...] |
158: upwind=j-1 |
159: donor=j |
160: downwind=j+1 |
161: dif=upwind |
162: ENDIF |
163: sigma=ABS(node_flux(j,k))/(node_mass_pre(donor,k)) |
164: width=celldx(j) |
165: vdiffuw=vel1(donor,k)-vel1(upwind,k) |
166: vdiffdw=vel1(downwind,k)-vel1(donor,k) |
167: limiter=0.0 |
168: IF(vdiffuw*vdiffdw.GT.0.0)THEN |
169: auw=ABS(vdiffuw) |
170: adw=ABS(vdiffdw) |
171: wind=1.0_8 |
172: IF(vdiffdw.LE.0.0) wind=-1.0_8 |
173: limiter=wind*MIN(width*((2.0_8-sigma)*adw/width+(1.0_8+sigma)*auw/celldx(dif))/6.0_8,auw,adw) |
174: ENDIF |
175: advec_vel_s=vel1(donor,k)+(1.0-sigma)*limiter |
176: mom_flux(j,k)=advec_vel_s*node_flux(j,k) |
177: ENDDO |
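The limiter arithmetic on source lines 163-176 can be read point by point. The block below is a minimal Python transcription for illustration only, not the kernel itself. The function name `mom_flux_point` and its scalar arguments are hypothetical. The choice of upwind/donor/downwind/dif (partly elided in the listing above) is assumed to have been made by the caller.

```python
def mom_flux_point(node_flux, node_mass_pre_donor, celldx_j, celldx_dif,
                   vel_upwind, vel_donor, vel_downwind):
    """Per-point transcription of source lines 163-176 (hypothetical helper).

    The index selection on lines 152-162 is NOT reproduced here; the caller
    passes the already-selected upwind/donor/downwind values.
    """
    sigma = abs(node_flux) / node_mass_pre_donor           # line 163
    width = celldx_j                                       # line 164
    vdiffuw = vel_donor - vel_upwind                       # line 165
    vdiffdw = vel_downwind - vel_donor                     # line 166
    limiter = 0.0                                          # line 167
    if vdiffuw * vdiffdw > 0.0:                            # line 168
        auw = abs(vdiffuw)                                 # line 169
        adw = abs(vdiffdw)                                 # line 170
        wind = -1.0 if vdiffdw <= 0.0 else 1.0             # lines 171-172
        limiter = wind * min(                              # line 173
            width * ((2.0 - sigma) * adw / width
                     + (1.0 + sigma) * auw / celldx_dif) / 6.0,
            auw, adw)
    advec_vel_s = vel_donor + (1.0 - sigma) * limiter      # line 175
    return advec_vel_s * node_flux                         # line 176


# Velocity profile with a local extremum: the limiter stays at zero.
flux_extremum = mom_flux_point(0.5, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0)
# Monotone velocity profile: the limiter is active.
flux_monotone = mom_flux_point(0.5, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0)
```

In the vectorized loop the `IF` on line 168 does not become a branch: the comparison produces a mask register and the limiter is zeroed under that mask, which is why every lane pays for the full arithmetic.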
Path / |
Metric | Value |
---|---|
CQA speedup if no scalar integer | 1.03 |
CQA speedup if FP arith vectorized | 1.46 |
CQA speedup if fully vectorized | 2.00 |
CQA speedup if no inter-iteration dependency | NA |
CQA speedup if next bottleneck killed | 1.68 |
Bottlenecks | micro-operation queue |
Function | advec_mom_kernel_.DIR.OMP.PARALLEL.2 |
Source | advec_mom_kernel.f90:81-81,advec_mom_kernel.f90:152-152,advec_mom_kernel.f90:158-177 |
Source loop unroll info | unrolled by 4 |
Source loop unroll confidence level | max |
Unroll/vectorization loop type | main |
Unroll factor | 4 |
CQA cycles | 30.00 |
CQA cycles if no scalar integer | 29.00 |
CQA cycles if FP arith vectorized | 20.54 |
CQA cycles if fully vectorized | 15.00 |
Front-end cycles | 30.00 |
DIV/SQRT cycles | 15.00 |
P0 cycles | 0.75 |
P1 cycles | 0.75 |
P2 cycles | 0.50 |
P3 cycles | 0.50 |
P4 cycles | 0.50 |
P5 cycles | 1.00 |
P6 cycles | 1.00 |
P7 cycles | 1.00 |
P8 cycles | 17.75 |
P9 cycles | 17.75 |
P10 cycles | 17.83 |
P11 cycles | 17.67 |
P12 cycles | 15.50 |
P13 cycles | 15.50 |
Inter-iter dependencies cycles | 1 |
FE+BE cycles (UFS) | NA |
Stall cycles (UFS) | NA |
Nb insns | 65.00 |
Nb uops | 180.00 |
Nb loads | 7.00 |
Nb stores | 1.00 |
Nb stack references | 0.00 |
FLOP/cycle | 2.27 |
Nb FLOP add-sub | 20.00 |
Nb FLOP mul | 20.00 |
Nb FLOP fma | 8.00 |
Nb FLOP div | 12.00 |
Nb FLOP rcp | 0.00 |
Nb FLOP sqrt | 0.00 |
Nb FLOP rsqrt | 0.00 |
Bytes/cycle | 8.53 |
Bytes prefetched | 0.00 |
Bytes loaded | 224.00 |
Bytes stored | 32.00 |
Stride 0 | 0.00 |
Stride 1 | 3.00 |
Stride n | 0.00 |
Stride unknown | 0.00 |
Stride indirect | 4.00 |
Vectorization ratio all | 98.21 |
Vectorization ratio load | 100.00 |
Vectorization ratio store | 100.00 |
Vectorization ratio mul | 100.00 |
Vectorization ratio add_sub | 100.00 |
Vectorization ratio fma | 100.00 |
Vectorization ratio div_sqrt | 100.00 |
Vectorization ratio other | 96.67 |
Vector-efficiency ratio all | 42.08 |
Vector-efficiency ratio load | 50.00 |
Vector-efficiency ratio store | 50.00 |
Vector-efficiency ratio mul | 50.00 |
Vector-efficiency ratio add_sub | 42.31 |
Vector-efficiency ratio fma | 50.00 |
Vector-efficiency ratio div_sqrt | 50.00 |
Vector-efficiency ratio other | 38.54 |
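The derived figures in the table above can be cross-checked from its raw counts. The sketch below is a reading of how the numbers relate, not CQA's documented formulas. It assumes that each speedup is the ratio of the baseline cycle count to the what-if cycle count, that each FMA counts as two FLOPs in FLOP/cycle, and that "next bottleneck killed" refers to the busiest execution port (17.83 cycles).

```python
# Values copied from the metrics table above.
cqa_cycles = 30.00
cycles_no_scalar_int = 29.00
cycles_fp_vectorized = 20.54
cycles_fully_vectorized = 15.00
busiest_port_cycles = 17.83   # assumed to be the "next bottleneck"
flop_add_sub, flop_mul, flop_fma, flop_div = 20.0, 20.0, 8.0, 12.0
bytes_loaded, bytes_stored = 224.0, 32.0


def speedup(what_if_cycles):
    """Assumed definition: baseline cycles divided by what-if cycles."""
    return cqa_cycles / what_if_cycles


speedups = {
    "no scalar integer": round(speedup(cycles_no_scalar_int), 2),
    "FP arith vectorized": round(speedup(cycles_fp_vectorized), 2),
    "fully vectorized": round(speedup(cycles_fully_vectorized), 2),
    "next bottleneck killed": round(speedup(busiest_port_cycles), 2),
}

# ASSUMPTION: the FMA row counts one FLOP per lane, while FLOP/cycle
# counts each FMA as a multiply plus an add.
total_flop = flop_add_sub + flop_mul + 2.0 * flop_fma + flop_div
flop_per_cycle = round(total_flop / cqa_cycles, 2)
bytes_per_cycle = round((bytes_loaded + bytes_stored) / cqa_cycles, 2)

for name, value in speedups.items():
    print(f"CQA speedup if {name}: {value:.2f}")
print(f"FLOP/cycle: {flop_per_cycle:.2f}")
print(f"Bytes/cycle: {bytes_per_cycle:.2f}")
```

Under these assumptions every derived value matches the table to two decimals, so the reported 2.00 upper bound is simply 30 cycles against the 15-cycle fully vectorized estimate.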
Path / |
Function | advec_mom_kernel_.DIR.OMP.PARALLEL.2 |
Source file and lines | advec_mom_kernel.f90:81-177 |
Module | exec |
nb instructions | 65 |
nb uops | 180 |
loop length | 387 |
used x86 registers | 10 |
used mmx registers | 0 |
used xmm registers | 12 |
used ymm registers | 17 |
used zmm registers | 0 |
nb stack references | 0 |
ADD-SUB / MUL ratio | 1.00 |
micro-operation queue | 30.00 cycles |
front end | 30.00 cycles |
Metric | ALU0/BRU0 | ALU1 | ALU2 | ALU3 | BRU1 | AGU0 | AGU1 | AGU2 | FP0 | FP1 | FP2 | FP3 | FP4 | FP5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 0.75 | 0.75 | 0.50 | 0.50 | 0.50 | 1.00 | 1.00 | 1.00 | 17.75 | 17.75 | 17.83 | 17.67 | 15.50 | 15.50 |
cycles | 0.75 | 0.75 | 0.50 | 0.50 | 0.50 | 1.00 | 1.00 | 1.00 | 17.75 | 17.75 | 17.83 | 17.67 | 15.50 | 15.50 |
Cycles executing div or sqrt instructions | 15.00 |
Longest recurrence chain latency (RecMII) | 1.00 |
Front-end | 30.00 |
Dispatch | 17.83 |
DIV/SQRT | 15.00 |
Data deps. | 1.00 |
Overall L1 | 30.00 |
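The cycle summary above is consistent with taking the largest of the per-resource estimates. The sketch below reproduces that reading. It is an assumption about how the numbers combine, not a statement of CQA's implementation, and the 6 uops per cycle figure is only what the two reported values imply.

```python
# Per-resource cycle estimates copied from the summary above.
cycle_estimates = {
    "Front-end": 30.00,
    "Dispatch": 17.83,
    "DIV/SQRT": 15.00,
    "Data deps.": 1.00,
}

# Per-port cycles copied from the port table (ALU0/BRU0 ... FP5).
port_cycles = [0.75, 0.75, 0.50, 0.50, 0.50, 1.00, 1.00, 1.00,
               17.75, 17.75, 17.83, 17.67, 15.50, 15.50]

# ASSUMPTION: the overall L1 estimate is the maximum of the four bounds,
# and the dispatch bound is the busiest execution port.
bottleneck = max(cycle_estimates, key=cycle_estimates.get)
overall_l1 = cycle_estimates[bottleneck]
dispatch = max(port_cycles)

# 180 uops issued over 30 front-end cycles.
nb_uops = 180
uops_per_front_end_cycle = nb_uops / cycle_estimates["Front-end"]

print(bottleneck, overall_l1, dispatch, uops_per_front_end_cycle)
```

Read this way, the loop is limited by uop delivery rather than by the FP ports: removing the front-end limit would still leave 17.83 cycles of port pressure.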
Vectorization ratios
Type | all | load | store | mul | add-sub | fma | div/sqrt | other |
---|---|---|---|---|---|---|---|---|
INT | 94% | NA | NA | NA | 100% | NA | - | 88% |
FP | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
INT+FP | 98% | 100% | 100% | 100% | 100% | 100% | 100% | 96% |
Vector efficiency ratios
Type | all | load | store | mul | add-sub | fma | div/sqrt | other |
---|---|---|---|---|---|---|---|---|
INT | 29% | NA | NA | NA | 37% | NA | - | 22% |
FP | 47% | 50% | 50% | 50% | 50% | 50% | 50% | 45% |
INT+FP | 42% | 50% | 50% | 50% | 42% | 50% | 50% | 38% |
NA: no vectorizable/vectorized instructions of that kind. "-": row not reported.
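The add-sub efficiency figures above (37%, 50% and 42%) can be reproduced from the register widths in the assembly listing. The sketch assumes that vector efficiency is the average used width divided by a 512-bit full width. That assumption is inferred from the numbers, not taken from CQA documentation. The instruction counts are read from the listing: four `VPADDD` on XMM, four `VPSUBQ` on YMM, four `VSUBPD` and one `VADDPD` on YMM.

```python
FULL_WIDTH_BITS = 512  # ASSUMPTION: efficiency is measured against 512-bit vectors


def vector_efficiency(widths_bits):
    """Average used vector width as a percentage of the assumed full width."""
    return 100.0 * sum(widths_bits) / (FULL_WIDTH_BITS * len(widths_bits))


# Integer add/sub: 4 x VPADDD on 128-bit XMM, 4 x VPSUBQ on 256-bit YMM.
int_add_sub = [128] * 4 + [256] * 4
# FP add/sub: 4 x VSUBPD and 1 x VADDPD, all on 256-bit YMM.
fp_add_sub = [256] * 5

eff_int = vector_efficiency(int_add_sub)
eff_fp = vector_efficiency(fp_add_sub)
eff_all = vector_efficiency(int_add_sub + fp_add_sub)

print(f"INT add-sub efficiency:    {eff_int:.2f}%")
print(f"FP add-sub efficiency:     {eff_fp:.2f}%")
print(f"INT+FP add-sub efficiency: {eff_all:.2f}%")
```

Under this model the 50% ceiling on the FP rows follows directly from the loop using YMM registers only (no ZMM registers are used), and the lower integer figures come from the 128-bit index arithmetic that feeds the gathers.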
Instruction | Nb FU | ALU0/BRU0 | ALU1 | ALU2 | ALU3 | BRU1 | AGU0 | AGU1 | AGU2 | FP0 | FP1 | FP2 | FP3 | FP4 | FP5 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
VMOVUPD (%R15,%RDI,8),%YMM18 | 1 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VCMPPD $0x1,%YMM16,%YMM18,%K1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4 | 0.50 |
LEA (%RDI,%R13,1),%EAX | 1 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
VPBROADCASTD %EAX,%XMM19 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
VPADDD %XMM8,%XMM19,%XMM20 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
VPADDD %XMM9,%XMM19,%XMM21 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
VPADDD %XMM10,%XMM19,%XMM22 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
VPBLENDMD %XMM22,%XMM20,%XMM23{%K1} | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
VMOVDQA32 %XMM20,%XMM22{%K1} | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
VPMOVSXDQ %XMM22,%YMM22 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
VPSUBQ %YMM0,%YMM22,%YMM22 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
KXNORW %K0,%K0,%K2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 1 | 0.50 |
VXORPD %XMM24,%XMM24,%XMM24 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
VGATHERQPD (%RDX,%YMM22,8),%YMM24{%K2} | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.75 | 1.42 | 1.42 | 1.42 | 3 | 3 | 0-16 | 4 |
KXNORW %K0,%K0,%K2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 1 | 0.50 |
VXORPD %XMM25,%XMM25,%XMM25 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
VGATHERQPD (%RCX,%YMM22,8),%YMM25{%K2} | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.75 | 1.42 | 1.42 | 1.42 | 3 | 3 | 0-16 | 4 |
VPBLENDMD %XMM20,%XMM21,%XMM20{%K1} | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
VPADDD %XMM11,%XMM19,%XMM21{%K1} | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
VANDPD %YMM12,%YMM18,%YMM19 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
VPMOVSXDQ %XMM21,%YMM21 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
VPSUBQ %YMM0,%YMM21,%YMM21 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
KXNORW %K0,%K0,%K1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 1 | 0.50 |
VXORPD %XMM22,%XMM22,%XMM22 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
VGATHERQPD (%RCX,%YMM21,8),%YMM22{%K1} | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.75 | 1.42 | 1.42 | 1.42 | 3 | 3 | 0-16 | 4 |
VDIVPD %YMM24,%YMM19,%YMM19 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 13 | 5 |
VSUBPD %YMM22,%YMM25,%YMM21 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 3 | 0.50 |
VPMOVSXDQ %XMM23,%YMM22 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
VPSUBQ %YMM0,%YMM22,%YMM22 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
KXNORW %K0,%K0,%K1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 1 | 0.50 |
VPXORD %XMM23,%XMM23,%XMM23 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
VGATHERQPD (%RCX,%YMM22,8),%YMM23{%K1} | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.75 | 1.42 | 1.42 | 1.42 | 3 | 3 | 0-16 | 4 |
VSUBPD %YMM25,%YMM23,%YMM22 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 3 | 0.50 |
VMULPD %YMM21,%YMM22,%YMM23 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VCMPPD $0x1,%YMM23,%YMM16,%K1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMOVUPD (%R10,%RDI,8),%YMM23{%K1}{z} | 1 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VANDPD %YMM12,%YMM21,%YMM21 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
VANDPD %YMM12,%YMM22,%YMM24 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
VSUBPD %YMM19,%YMM13,%YMM26 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 3 | 0.50 |
VMULPD %YMM26,%YMM24,%YMM26 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VDIVPD %YMM23,%YMM26,%YMM26 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 13 | 5 |
VPMOVSXDQ %XMM20,%YMM20 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 2 | 0.50 |
VPSUBQ %YMM0,%YMM20,%YMM20 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
KMOVQ %K1,%K2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
VXORPD %XMM27,%XMM27,%XMM27 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
VGATHERQPD (%R8,%YMM20,8),%YMM27{%K2} | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.75 | 1.42 | 1.42 | 1.42 | 3 | 3 | 0-16 | 4 |
VCMPPD $0x2,%YMM24,%YMM21,%K2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMOVAPD %YMM21,%YMM24{%K2} | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
VFMADD213PD %YMM21,%YMM19,%YMM21 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VDIVPD %YMM27,%YMM21,%YMM20 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 13 | 5 |
VADDPD %YMM26,%YMM20,%YMM20 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 3 | 0.50 |
VMULPD %YMM15,%YMM23,%YMM21 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VMULPD %YMM20,%YMM21,%YMM20 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VCMPPD $0x2,%YMM24,%YMM20,%K2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMOVAPD %YMM20,%YMM24{%K2} | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
VCMPPD $0x2,%YMM16,%YMM22,%K2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VXORPD %YMM17,%YMM24,%YMM24{%K2} | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 1 | 0.25 |
VMOVAPD %YMM24,%YMM20{%K1}{z} | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
VSUBPD %YMM19,%YMM14,%YMM19 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 3 | 0.50 |
VFMADD213PD %YMM25,%YMM20,%YMM19 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VMULPD %YMM18,%YMM19,%YMM18 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VMOVUPD %YMM18,(%R9,%RDI,8) | 1 | 0 | 0 | 0 | 0 | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0.50 | 0.50 | 4 | 1 |
ADD $0x4,%RDI | 1 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
CMP %R14,%RDI | 1 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.25 |
JL 430220 <advec_mom_kernel_mod_mp_advec_mom_kernel_.DIR.OMP.PARALLEL.2+0x3340> | 1 | 0.50 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.50-1 |
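The FLOP counts and the DIV/SQRT cycle estimate reported for this loop can be recovered from the instruction table above. The sketch below counts the FP arithmetic instructions in the listing and assumes four 64-bit lanes per 256-bit YMM operation. The dictionary `fp_instruction_counts` is a hand count of the table rows, and the serial-divide model is an assumption that happens to match the reported 15.00 cycles.

```python
LANES = 4                  # 256-bit YMM register / 64-bit double
DIV_RECIP_THROUGHPUT = 5   # "Recip. throughput" column for VDIVPD

# Hand count of FP arithmetic instructions in the loop body.
fp_instruction_counts = {
    "VSUBPD": 4,
    "VADDPD": 1,
    "VMULPD": 5,
    "VFMADD213PD": 2,
    "VDIVPD": 3,
}

flop = {
    "add-sub": LANES * (fp_instruction_counts["VSUBPD"]
                        + fp_instruction_counts["VADDPD"]),
    "mul": LANES * fp_instruction_counts["VMULPD"],
    "fma": LANES * fp_instruction_counts["VFMADD213PD"],
    "div": LANES * fp_instruction_counts["VDIVPD"],
}

# ASSUMPTION: divides issue back to back, each occupying the divider for
# its reciprocal throughput.
div_sqrt_cycles = fp_instruction_counts["VDIVPD"] * DIV_RECIP_THROUGHPUT

print(flop, div_sqrt_cycles)
```

The three `VDIVPD` instructions correspond to the divisions on source lines 163 and 173. The division by `6.0_8` does not appear as a divide in the listing, which suggests the compiler folded it into a multiplication.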