Loop Id: 2 | Module: exec | Source: Step10_orig.c:19-35 | Coverage: 0.01% |
---|
0x401bb0 VMOVSS (%RDX,%RDI,4),%XMM31 [2] |
0x401bb7 VMOVSS (%RSI,%RDI,4),%XMM18 [6] |
0x401bbe VXORPS %XMM24,%XMM24,%XMM24 |
0x401bc4 VMOVSS (%RCX,%RDI,4),%XMM26 [4] |
0x401bcb VSUBSS %XMM1,%XMM31,%XMM28 |
0x401bd1 VSUBSS %XMM7,%XMM18,%XMM29 |
0x401bd7 VSUBSS %XMM2,%XMM26,%XMM25 |
0x401bdd VMULSS %XMM28,%XMM28,%XMM0 |
0x401be3 VFMADD231SS %XMM29,%XMM29,%XMM0 |
0x401be9 VFMADD231SS %XMM25,%XMM25,%XMM0 |
0x401bef VCOMISS %XMM0,%XMM3 |
0x401bf3 JBE 401bfc |
0x401bf5 VMOVSS (%R8,%RDI,4),%XMM24 [8] |
0x401bfc VCOMISS %XMM6,%XMM0 |
0x401c00 JBE 401c62 |
0x401c02 VMOVAPS %XMM0,%XMM23 |
0x401c08 VADDSS %XMM0,%XMM4,%XMM5 |
0x401c0c VFMADD132SS %XMM8,%XMM9,%XMM23 |
0x401c12 VCVTSS2SD %XMM5,%XMM5,%XMM5 |
0x401c16 VFMADD132SS %XMM0,%XMM10,%XMM23 |
0x401c1c VFMADD132SS %XMM0,%XMM11,%XMM23 |
0x401c22 VFMADD132SS %XMM0,%XMM12,%XMM23 |
0x401c28 VFMADD132SS %XMM0,%XMM13,%XMM23 |
0x401c2e VSQRTSD %XMM5,%XMM5,%XMM0 |
0x401c32 VMULSD %XMM0,%XMM5,%XMM5 |
0x401c36 VCVTSS2SD %XMM23,%XMM23,%XMM20 |
0x401c3c VDIVSD %XMM5,%XMM14,%XMM0 |
0x401c40 VADDSD %XMM20,%XMM0,%XMM5 |
0x401c46 VCVTSD2SS %XMM5,%XMM5,%XMM0 |
0x401c4a VMULSS %XMM24,%XMM0,%XMM5 |
0x401c50 VFMADD231SS %XMM5,%XMM29,%XMM15 |
0x401c56 VFMADD231SS %XMM5,%XMM28,%XMM16 |
0x401c5c VFMADD231SS %XMM5,%XMM25,%XMM17 |
0x401c62 INC %RDI |
0x401c65 VMOVSS (%RDX,%RDI,4),%XMM23 [1] |
0x401c6c VMOVSS (%RSI,%RDI,4),%XMM25 [5] |
0x401c73 VXORPS %XMM19,%XMM19,%XMM19 |
0x401c79 VMOVSS (%RCX,%RDI,4),%XMM22 [3] |
0x401c80 VSUBSS %XMM1,%XMM23,%XMM20 |
0x401c86 VSUBSS %XMM7,%XMM25,%XMM24 |
0x401c8c VSUBSS %XMM2,%XMM22,%XMM30 |
0x401c92 VMULSS %XMM20,%XMM20,%XMM0 |
0x401c98 VFMADD231SS %XMM24,%XMM24,%XMM0 |
0x401c9e VFMADD231SS %XMM30,%XMM30,%XMM0 |
0x401ca4 VCOMISS %XMM0,%XMM3 |
0x401ca8 JBE 401cb1 |
0x401caa VMOVSS (%R8,%RDI,4),%XMM19 [7] |
0x401cb1 VCOMISS %XMM6,%XMM0 |
0x401cb5 JBE 401d17 |
0x401cb7 VADDSS %XMM0,%XMM4,%XMM5 |
0x401cbb VMOVAPS %XMM0,%XMM21 |
0x401cc1 VFMADD132SS %XMM8,%XMM9,%XMM21 |
0x401cc7 VCVTSS2SD %XMM5,%XMM5,%XMM5 |
0x401ccb VSQRTSD %XMM5,%XMM5,%XMM27 |
0x401cd1 VMULSD %XMM27,%XMM5,%XMM5 |
0x401cd7 VFMADD132SS %XMM0,%XMM10,%XMM21 |
0x401cdd VDIVSD %XMM5,%XMM14,%XMM5 |
0x401ce1 VFMADD132SS %XMM0,%XMM11,%XMM21 |
0x401ce7 VFMADD132SS %XMM0,%XMM12,%XMM21 |
0x401ced VFMADD132SS %XMM21,%XMM13,%XMM0 |
0x401cf3 VCVTSS2SD %XMM0,%XMM0,%XMM0 |
0x401cf7 VADDSD %XMM0,%XMM5,%XMM5 |
0x401cfb VCVTSD2SS %XMM5,%XMM5,%XMM0 |
0x401cff VMULSS %XMM19,%XMM0,%XMM5 |
0x401d05 VFMADD231SS %XMM5,%XMM24,%XMM15 |
0x401d0b VFMADD231SS %XMM5,%XMM20,%XMM16 |
0x401d11 VFMADD231SS %XMM5,%XMM30,%XMM17 |
0x401d17 INC %RDI |
0x401d1a CMP %EDI,%EAX |
0x401d1c JG 401bb0 |
/scratch_na/users/xoserete/qaas_runs/171-415-2042/intel/HACCmk/build/HACCmk/src/Step10_orig.c: 19 - 35 |
-------------------------------------------------------------------------------- |
19: for ( j = 0; j < count1; j++ ) |
20: { |
21: dxc = xx1[j] - xxi; |
22: dyc = yy1[j] - yyi; |
23: dzc = zz1[j] - zzi; |
24: |
25: r2 = dxc * dxc + dyc * dyc + dzc * dzc; |
26: |
27: m = ( r2 < fsrrmax2 ) ? mass1[j] : 0.0f; |
28: |
29: f = pow( r2 + mp_rsm2, -1.5 ) - ( ma0 + r2*(ma1 + r2*(ma2 + r2*(ma3 + r2*(ma4 + r2*ma5))))); |
30: |
31: f = ( r2 > 0.0f ) ? m * f : 0.0f; |
32: |
33: xi = xi + f * dxc; |
34: yi = yi + f * dyc; |
35: zi = zi + f * dzc; |
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
63.33 | main._omp_fn.1 | main.c:144 | exec |
 | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
33.33 | main._omp_fn.1 | main.c:144 | exec |
 | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
3.33 | main._omp_fn.1 | main.c:144 | exec |
 | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
Path / |
Metric | Value |
---|---|
CQA speedup if no scalar integer | 1.00 |
CQA speedup if FP arith vectorized | 2.59 |
CQA speedup if fully vectorized | 2.59 |
CQA speedup if no inter-iteration dependency | NA |
CQA speedup if next bottleneck killed | 1.02 |
Bottlenecks | P0, |
Function | Step10_orig |
Source | Step10_orig.c:19-35 |
Source loop unroll info | not unrolled or unrolled with no peel/tail loop |
Source loop unroll confidence level | max |
Unroll/vectorization loop type | NA |
Unroll factor | NA |
CQA cycles | 22.00 |
CQA cycles if no scalar integer | 22.00 |
CQA cycles if FP arith vectorized | 8.50 |
CQA cycles if fully vectorized | 8.50 |
Front-end cycles | 12.50 |
DIV/SQRT cycles | 17.00 |
P0 cycles | 22.00 |
P1 cycles | 21.50 |
P2 cycles | 2.67 |
P3 cycles | 2.67 |
P4 cycles | 0.00 |
P5 cycles | 12.50 |
P6 cycles | 5.00 |
P7 cycles | 0.00 |
P8 cycles | 0.00 |
P9 cycles | 0.00 |
P10 cycles | 0.00 |
P11 cycles | 2.67 |
Inter-iter dependencies cycles | NA |
FE+BE cycles (UFS) | 21.71 - 22.04 |
Stall cycles (UFS) | 8.52 - 8.85 |
Nb insns | 70.00 |
Nb uops | 75.00 |
Nb loads | 8.00 |
Nb stores | 0.00 |
Nb stack references | 0.00 |
FLOP/cycle | 2.73 |
Nb FLOP add-sub | 10.00 |
Nb FLOP mul | 6.00 |
Nb FLOP fma | 20.00 |
Nb FLOP div | 2.00 |
Nb FLOP rcp | 0.00 |
Nb FLOP sqrt | 2.00 |
Nb FLOP rsqrt | 0.00 |
Bytes/cycle | 1.45 |
Bytes prefetched | 0.00 |
Bytes loaded | 32.00 |
Bytes stored | 0.00 |
Stride 0 | NA |
Stride 1 | NA |
Stride n | NA |
Stride unknown | NA |
Stride indirect | NA |
Vectorization ratio all | 6.45 |
Vectorization ratio load | 0.00 |
Vectorization ratio store | NA |
Vectorization ratio mul | 0.00 |
Vectorization ratio add_sub | 0.00 |
Vectorization ratio fma | 0.00 |
Vectorization ratio div_sqrt | 0.00 |
Vectorization ratio other | 28.57 |
Vector-efficiency ratio all | 8.47 |
Vector-efficiency ratio load | 6.25 |
Vector-efficiency ratio store | NA |
Vector-efficiency ratio mul | 8.33 |
Vector-efficiency ratio add_sub | 7.50 |
Vector-efficiency ratio fma | 6.25 |
Vector-efficiency ratio div_sqrt | 12.50 |
Vector-efficiency ratio other | 12.50 |
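The derived figures in this table are consistent with simple ratios of the raw ones. The sketch below reproduces four of them. The relations are inferred from the numbers rather than taken from CQA documentation, and the struct and function names are hypothetical. One inference worth noting: the FLOP/cycle value only matches if "Nb FLOP fma" is read as a count of FMA instructions (20 in the listing), each contributing two FLOPs.

```c
/* Raw values as they appear in the metric table (hypothetical container). */
typedef struct {
    double cycles;            /* CQA cycles                        */
    double cycles_vectorized; /* CQA cycles if fully vectorized    */
    double next_cycles;       /* next-highest cycle figure (21.50) */
    double flop_add_sub, flop_mul, fma_insns, flop_div, flop_sqrt;
    double bytes_loaded, bytes_stored;
} cqa_raw;

/* Speedup if fully vectorized: ratio of current to vectorized cycles. */
double speedup_if_vectorized(const cqa_raw *r)
{
    return r->cycles / r->cycles_vectorized;
}

/* Speedup if the current bottleneck is removed and the next one takes over. */
double speedup_next_bottleneck(const cqa_raw *r)
{
    return r->cycles / r->next_cycles;
}

/* FLOP per cycle, assuming each FMA instruction counts as two FLOPs. */
double flop_per_cycle(const cqa_raw *r)
{
    double flops = r->flop_add_sub + r->flop_mul + 2.0 * r->fma_insns
                 + r->flop_div + r->flop_sqrt;
    return flops / r->cycles;
}

/* Memory traffic per cycle over loads and stores. */
double bytes_per_cycle(const cqa_raw *r)
{
    return (r->bytes_loaded + r->bytes_stored) / r->cycles;
}
```

With the table's values (22.00, 8.50, 21.50, 10, 6, 20, 2, 2, 32, 0) these functions give 2.59, 1.02, 2.73 and 1.45 after rounding to two decimals, matching the reported speedups, FLOP/cycle and Bytes/cycle.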
Path / |
Function | Step10_orig |
Source file and lines | Step10_orig.c:19-35 |
Module | exec |
nb instructions | 70 |
nb uops | 75 |
loop length | 370 |
used x86 registers | 6 |
used mmx registers | 0 |
used xmm registers | 32 |
used ymm registers | 0 |
used zmm registers | 0 |
nb stack references | 0 |
ADD-SUB / MUL ratio | 1.67 |
micro-operation queue | 12.50 cycles |
front end | 12.50 cycles |
 | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 22.00 | 21.50 | 2.67 | 2.67 | 0.00 | 12.50 | 5.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2.67 |
cycles | 22.00 | 21.50 | 2.67 | 2.67 | 0.00 | 12.50 | 5.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2.67 |
Cycles executing div or sqrt instructions | 17.00 |
FE+BE cycles | 21.71-22.04 |
Stall cycles | 8.52-8.85 |
PRF_FLOAT full (events) | 10.23-10.72 |
Front-end | 12.50 |
Dispatch | 22.00 |
DIV/SQRT | 17.00 |
Overall L1 | 22.00 |
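Vectorization ratio all | 6% |
Vectorization ratio load | 0% |
Vectorization ratio store | NA (no store vectorizable/vectorized instructions) |
Vectorization ratio mul | 0% |
Vectorization ratio add-sub | 0% |
Vectorization ratio fma | 0% |
Vectorization ratio div/sqrt | 0% |
Vectorization ratio other | 28% |
Vector-efficiency ratio all | 8% |
Vector-efficiency ratio load | 6% |
Vector-efficiency ratio store | NA (no store vectorizable/vectorized instructions) |
Vector-efficiency ratio mul | 8% |
Vector-efficiency ratio add-sub | 7% |
Vector-efficiency ratio fma | 6% |
Vector-efficiency ratio div/sqrt | 12% |
Vector-efficiency ratio other | 12% |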
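The vector-efficiency figures reported for this loop (6.25 for load and fma, 8.33 for mul, 7.50 for add-sub, 12.50 for div/sqrt) are consistent with each scalar instruction using one element of a 512-bit register, averaged per instruction class. The sketch below works through that arithmetic. The 512-bit width is an assumption inferred from the use of XMM16 to XMM31, the averaging rule is inferred from the numbers rather than from CQA documentation, and the function names are hypothetical.

```c
/* Percentage of a vector register occupied by one element. */
double lane_efficiency(int element_bits, int vector_bits)
{
    return 100.0 * (double)element_bits / (double)vector_bits;
}

/* Mean efficiency of an instruction class made of n_float scalar 32-bit
 * and n_double scalar 64-bit instructions. Returns 0 for an empty class. */
double class_efficiency(int n_float, int n_double, int vector_bits)
{
    int n = n_float + n_double;
    if (n == 0)
        return 0.0;
    return (n_float * lane_efficiency(32, vector_bits)
          + n_double * lane_efficiency(64, vector_bits)) / n;
}
```

Counting instructions in the listing gives 4 VMULSS and 2 VMULSD for mul, 8 single-precision and 2 double-precision instructions for add-sub, 20 single-precision FMAs, and 4 double-precision div/sqrt instructions; feeding those counts to `class_efficiency` with a 512-bit width reproduces the reported values.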
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
VMOVSS (%RDX,%RDI,4),%XMM31 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1 | 0.33 |
VMOVSS (%RSI,%RDI,4),%XMM18 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1 | 0.33 |
VXORPS %XMM24,%XMM24,%XMM24 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
VMOVSS (%RCX,%RDI,4),%XMM26 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1 | 0.33 |
VSUBSS %XMM1,%XMM31,%XMM28 | 1 | 0 | 0.50 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VSUBSS %XMM7,%XMM18,%XMM29 | 1 | 0 | 0.50 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VSUBSS %XMM2,%XMM26,%XMM25 | 1 | 0 | 0.50 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VMULSS %XMM28,%XMM28,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM29,%XMM29,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM25,%XMM25,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCOMISS %XMM0,%XMM3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 |
JBE 401bfc <Step10_orig+0x58c> | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 |
VMOVSS (%R8,%RDI,4),%XMM24 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1 | 0.33 |
VCOMISS %XMM6,%XMM0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 |
JBE 401c62 <Step10_orig+0x5f2> | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 |
VMOVAPS %XMM0,%XMM23 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0-1 | 0.17 |
VADDSS %XMM0,%XMM4,%XMM5 | 1 | 0 | 0.50 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VFMADD132SS %XMM8,%XMM9,%XMM23 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTSS2SD %XMM5,%XMM5,%XMM5 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 4-5 | 1 |
VFMADD132SS %XMM0,%XMM10,%XMM23 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132SS %XMM0,%XMM11,%XMM23 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132SS %XMM0,%XMM12,%XMM23 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132SS %XMM0,%XMM13,%XMM23 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VSQRTSD %XMM5,%XMM5,%XMM0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13-19 | 4.50 |
VMULSD %XMM0,%XMM5,%XMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTSS2SD %XMM23,%XMM23,%XMM20 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 4-5 | 1 |
VDIVSD %XMM5,%XMM14,%XMM0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13-15 | 4 |
VADDSD %XMM20,%XMM0,%XMM5 | 1 | 0 | 0.50 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VCVTSD2SS %XMM5,%XMM5,%XMM0 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 1 |
VMULSS %XMM24,%XMM0,%XMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM5,%XMM29,%XMM15 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM5,%XMM28,%XMM16 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM5,%XMM25,%XMM17 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
INC %RDI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 |
VMOVSS (%RDX,%RDI,4),%XMM23 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1 | 0.33 |
VMOVSS (%RSI,%RDI,4),%XMM25 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1 | 0.33 |
VXORPS %XMM19,%XMM19,%XMM19 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
VMOVSS (%RCX,%RDI,4),%XMM22 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1 | 0.33 |
VSUBSS %XMM1,%XMM23,%XMM20 | 1 | 0 | 0.50 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VSUBSS %XMM7,%XMM25,%XMM24 | 1 | 0 | 0.50 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VSUBSS %XMM2,%XMM22,%XMM30 | 1 | 0 | 0.50 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VMULSS %XMM20,%XMM20,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM24,%XMM24,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM30,%XMM30,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCOMISS %XMM0,%XMM3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 |
JBE 401cb1 <Step10_orig+0x641> | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 |
VMOVSS (%R8,%RDI,4),%XMM19 | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1 | 0.33 |
VCOMISS %XMM6,%XMM0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 |
JBE 401d17 <Step10_orig+0x6a7> | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 |
VADDSS %XMM0,%XMM4,%XMM5 | 1 | 0 | 0.50 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VMOVAPS %XMM0,%XMM21 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0-1 | 0.17 |
VFMADD132SS %XMM8,%XMM9,%XMM21 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTSS2SD %XMM5,%XMM5,%XMM5 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 4-5 | 1 |
VSQRTSD %XMM5,%XMM5,%XMM27 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13-19 | 4.50 |
VMULSD %XMM27,%XMM5,%XMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132SS %XMM0,%XMM10,%XMM21 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VDIVSD %XMM5,%XMM14,%XMM5 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13-15 | 4 |
VFMADD132SS %XMM0,%XMM11,%XMM21 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132SS %XMM0,%XMM12,%XMM21 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD132SS %XMM21,%XMM13,%XMM0 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VCVTSS2SD %XMM0,%XMM0,%XMM0 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 4-5 | 1 |
VADDSD %XMM0,%XMM5,%XMM5 | 1 | 0 | 0.50 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0.50 |
VCVTSD2SS %XMM5,%XMM5,%XMM0 | 2 | 0.50 | 0.50 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 1 |
VMULSS %XMM19,%XMM0,%XMM5 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM5,%XMM24,%XMM15 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM5,%XMM20,%XMM16 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
VFMADD231SS %XMM5,%XMM30,%XMM17 | 1 | 0.50 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0.50 |
INC %RDI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 |
CMP %EDI,%EAX | 1 | 0.20 | 0.20 | 0 | 0 | 0 | 0.20 | 0.20 | 0 | 0 | 0 | 0.20 | 0 | 1 | 0.20 |
JG 401bb0 <Step10_orig+0x540> | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 |