Loop Id: 5 | Module: exec | Source: Step10_orig.c:19-35 | Coverage: 91.38% |
---|
0x401ef0 VMOVUPS 0x60b1f0(,%R10,4),%YMM25 [1] |
0x401efb VSUBPS %YMM22,%YMM25,%YMM25 |
0x401f01 VMOVUPS 0x66cc70(,%R10,4),%YMM26 [1] |
0x401f0c VSUBPS %YMM23,%YMM26,%YMM26 |
0x401f12 VMOVUPS 0x6ce6f0(,%R10,4),%YMM27 [1] |
0x401f1d VSUBPS %YMM24,%YMM27,%YMM27 |
0x401f23 VMULPS %YMM25,%YMM25,%YMM28 |
0x401f29 VFMADD231PS %YMM26,%YMM26,%YMM28 |
0x401f2f VFMADD231PS %YMM27,%YMM27,%YMM28 |
0x401f35 VCMPPS $0x1,%YMM2,%YMM28,%K1 |
0x401f3c VMOVUPS 0x730170(,%R10,4),%YMM29{%K1}{z} [1] |
0x401f47 VADDPS %YMM4,%YMM28,%YMM30 |
0x401f4d VEXTRACTF32X4 $0x1,%YMM30,%XMM31 |
0x401f54 VCVTPS2PD %XMM31,%YMM31 |
0x401f5a VCVTPS2PD %XMM30,%YMM30 |
0x401f60 VSQRTPD %YMM30,%YMM0 |
0x401f66 VSQRTPD %YMM31,%YMM1 |
0x401f6c VMULPD %YMM30,%YMM30,%YMM30 |
0x401f72 VDIVPD %YMM30,%YMM5,%YMM30 |
0x401f78 VMULPD %YMM31,%YMM31,%YMM31 |
0x401f7e VMOVAPS %YMM13,%YMM6 |
0x401f82 VFMADD213PS %YMM7,%YMM28,%YMM6 |
0x401f88 VFMADD213PS %YMM8,%YMM28,%YMM6 |
0x401f8e VFMADD213PS %YMM9,%YMM28,%YMM6 |
0x401f94 VFMADD213PS %YMM10,%YMM28,%YMM6 |
0x401f9a VFMADD213PS %YMM11,%YMM28,%YMM6 |
0x401fa0 VCVTPS2PD %XMM6,%YMM14 |
0x401fa4 VDIVPD %YMM31,%YMM5,%YMM31 |
0x401faa VFMADD231PD %YMM30,%YMM0,%YMM14 |
0x401fb0 VEXTRACTF128 $0x1,%YMM6,%XMM0 |
0x401fb6 VCVTPS2PD %XMM0,%YMM0 |
0x401fba VFMADD231PD %YMM31,%YMM1,%YMM0 |
0x401fc0 VCVTPD2PS %YMM14,%XMM1 |
0x401fc5 VCVTPD2PS %YMM0,%XMM0 |
0x401fc9 VINSERTF128 $0x1,%XMM0,%YMM1,%YMM0 |
0x401fcf VCMPPS $0x1,%YMM28,%YMM3,%K1 |
0x401fd6 VMULPS %YMM0,%YMM29,%YMM0{%K1}{z} |
0x401fdc VFMADD231PS %YMM25,%YMM0,%YMM17 |
0x401fe2 VFMADD231PS %YMM26,%YMM0,%YMM18 |
0x401fe8 VFMADD231PS %YMM27,%YMM0,%YMM16 |
0x401fee ADD $0x8,%R10 |
0x401ff2 CMP %R9,%R10 |
0x401ff5 JB 401ef0 |
/scratch_na/users/xoserete/qaas_runs/171-319-3146/intel/HACCmk/build/HACCmk/src/Step10_orig.c: 19 - 35 |
-------------------------------------------------------------------------------- |
19: for ( j = 0; j < count1; j++ ) |
20: { |
21: dxc = xx1[j] - xxi; |
22: dyc = yy1[j] - yyi; |
23: dzc = zz1[j] - zzi; |
24: |
25: r2 = dxc * dxc + dyc * dyc + dzc * dzc; |
26: |
27: m = ( r2 < fsrrmax2 ) ? mass1[j] : 0.0f; |
28: |
29: f = pow( r2 + mp_rsm2, -1.5 ) - ( ma0 + r2*(ma1 + r2*(ma2 + r2*(ma3 + r2*(ma4 + r2*ma5))))); |
30: |
31: f = ( r2 > 0.0f ) ? m * f : 0.0f; |
32: |
33: xi = xi + f * dxc; |
34: yi = yi + f * dyc; |
35: zi = zi + f * dzc; |
Metric | Value |
---|---|
CQA speedup if no scalar integer | NA |
CQA speedup if FP arith vectorized | NA |
CQA speedup if fully vectorized | NA |
CQA speedup if no inter-iteration dependency | NA |
CQA speedup if next bottleneck killed | NA |
Bottlenecks | NA |
Function | main.extracted.8 |
Source | Step10_orig.c:19-35 |
Source loop unroll info | NA |
Source loop unroll confidence level | NA |
Unroll/vectorization loop type | NA |
Unroll factor | NA |
CQA cycles | NA |
CQA cycles if no scalar integer | NA |
CQA cycles if FP arith vectorized | NA |
CQA cycles if fully vectorized | NA |
Front-end cycles | NA |
DIV/SQRT cycles | NA |
P0 cycles | NA |
P1 cycles | NA |
P2 cycles | NA |
P3 cycles | NA |
P4 cycles | NA |
P5 cycles | NA |
P6 cycles | NA |
P7 cycles | NA |
P8 cycles | NA |
P9 cycles | NA |
P10 cycles | NA |
P11 cycles | NA |
Inter-iter dependencies cycles | NA |
FE+BE cycles (UFS) | NA |
Stall cycles (UFS) | NA |
Nb insns | NA |
Nb uops | NA |
Nb loads | NA |
Nb stores | NA |
Nb stack references | NA |
FLOP/cycle | NA |
Nb FLOP add-sub | NA |
Nb FLOP mul | NA |
Nb FLOP fma | NA |
Nb FLOP div | NA |
Nb FLOP rcp | NA |
Nb FLOP sqrt | NA |
Nb FLOP rsqrt | NA |
Bytes/cycle | NA |
Bytes prefetched | NA |
Bytes loaded | NA |
Bytes stored | NA |
Stride 0 | NA |
Stride 1 | NA |
Stride n | NA |
Stride unknown | NA |
Stride indirect | NA |
Vectorization ratio all | NA |
Vectorization ratio load | NA |
Vectorization ratio store | NA |
Vectorization ratio mul | NA |
Vectorization ratio add_sub | NA |
Vectorization ratio fma | NA |
Vectorization ratio div_sqrt | NA |
Vectorization ratio other | NA |
Vector-efficiency ratio all | NA |
Vector-efficiency ratio load | NA |
Vector-efficiency ratio store | NA |
Vector-efficiency ratio mul | NA |
Vector-efficiency ratio add_sub | NA |
Vector-efficiency ratio fma | NA |
Vector-efficiency ratio div_sqrt | NA |
Vector-efficiency ratio other | NA |
Function | main.extracted.8 |
Source file and lines | Step10_orig.c:19-35 |
Module | exec |