Loop Id: 3 | Module: exec | Source: Step10_orig.c:19-31 | Coverage: 89.05% |
---|
0x401a60 VMOVUPS (%RDI,%R11,1),%YMM8 [4] |
0x401a66 VMOVUPS (%RSI,%R11,1),%YMM0 [2] |
0x401a6c VMOVUPS (%RDX,%R11,1),%YMM5 [3] |
0x401a72 VSUBPS %YMM26,%YMM8,%YMM9 |
0x401a78 VSUBPS %YMM25,%YMM0,%YMM8 |
0x401a7e VSUBPS %YMM24,%YMM5,%YMM30 |
0x401a84 VMULPS %YMM8,%YMM8,%YMM28 |
0x401a8a VFMADD231PS %YMM9,%YMM9,%YMM28 |
0x401a90 VFMADD231PS %YMM30,%YMM30,%YMM28 |
0x401a96 VCMPPS $0x1,%YMM15,%YMM28,%K1 |
0x401a9d VMOVAPS %YMM28,%YMM6 |
0x401aa3 VADDPS %YMM23,%YMM28,%YMM5 |
0x401aa9 VCMPPS $0x1,%YMM15,%YMM6,%YMM0 |
0x401aaf VCMPPS $0xe,%YMM16,%YMM28,%K2 |
0x401ab6 VMOVUPS (%R8,%R11,1),%YMM13{%K1} [1] |
0x401abd VCVTPS2PD %XMM5,%YMM6 |
0x401ac1 VSQRTPD %YMM6,%YMM29 |
0x401ac7 ADD $0x20,%R11 |
0x401acb VANDPS %YMM13,%YMM0,%YMM27 |
0x401ad1 VEXTRACTF128 $0x1,%YMM5,%XMM0 |
0x401ad7 VCVTPS2PD %XMM0,%YMM5 |
0x401adb VSQRTPD %YMM5,%YMM31 |
0x401ae1 VMOVAPS %YMM28,%YMM0 |
0x401ae7 VMULPD %YMM29,%YMM6,%YMM6 |
0x401aed VFMADD132PS %YMM22,%YMM21,%YMM0 |
0x401af3 VMULPD %YMM31,%YMM5,%YMM5 |
0x401af9 VDIVPD %YMM6,%YMM14,%YMM6 |
0x401afd VFMADD132PS %YMM28,%YMM20,%YMM0 |
0x401b03 VFMADD132PS %YMM28,%YMM19,%YMM0 |
0x401b09 VFMADD132PS %YMM28,%YMM18,%YMM0 |
0x401b0f VFMADD132PS %YMM28,%YMM17,%YMM0 |
0x401b15 VDIVPD %YMM5,%YMM14,%YMM5 |
0x401b19 VCVTPS2PD %XMM0,%YMM28 |
0x401b1f VEXTRACTF128 $0x1,%YMM0,%XMM0 |
0x401b25 VCVTPS2PD %XMM0,%YMM0 |
0x401b29 VADDPD %YMM28,%YMM6,%YMM6 |
0x401b2f VCVTPD2PS %YMM6,%XMM6 |
0x401b33 VADDPD %YMM0,%YMM5,%YMM5 |
0x401b37 VCVTPD2PS %YMM5,%XMM0 |
0x401b3b VINSERTF128 $0x1,%XMM0,%YMM6,%YMM5 |
0x401b41 VMULPS %YMM27,%YMM5,%YMM6 |
0x401b47 VMULPS %YMM6,%YMM9,%YMM0{%K2}{z} |
0x401b4d VMULPS %YMM6,%YMM8,%YMM9{%K2}{z} |
0x401b53 VMULPS %YMM6,%YMM30,%YMM8{%K2}{z} |
0x401b59 VADDPS %YMM0,%YMM10,%YMM10 |
0x401b5d VADDPS %YMM9,%YMM11,%YMM11 |
0x401b62 VADDPS %YMM8,%YMM12,%YMM12 |
0x401b67 CMP %R11,%RBX |
0x401b6a JNE 401a60 |
/home/eoseret/qaas_runs_CPU_9468/171-112-4218/intel/HACCmk/build/HACCmk/src/Step10_orig.c: 19 - 31 |
-------------------------------------------------------------------------------- |
19: for ( j = 0; j < count1; j++ ) |
20: { |
21: dxc = xx1[j] - xxi; |
22: dyc = yy1[j] - yyi; |
23: dzc = zz1[j] - zzi; |
24: |
25: r2 = dxc * dxc + dyc * dyc + dzc * dzc; |
26: |
27: m = ( r2 < fsrrmax2 ) ? mass1[j] : 0.0f; |
28: |
29: f = pow( r2 + mp_rsm2, -1.5 ) - ( ma0 + r2*(ma1 + r2*(ma2 + r2*(ma3 + r2*(ma4 + r2*ma5))))); |
30: |
31: f = ( r2 > 0.0f ) ? m * f : 0.0f; |
Path / |
Metric | Value |
---|---|
CQA speedup if no scalar integer | NA |
CQA speedup if FP arith vectorized | NA |
CQA speedup if fully vectorized | NA |
CQA speedup if no inter-iteration dependency | NA |
CQA speedup if next bottleneck killed | NA |
Bottlenecks | NA |
Function | Step10_orig |
Source | Step10_orig.c:19-31 |
Source loop unroll info | NA |
Source loop unroll confidence level | NA |
Unroll/vectorization loop type | NA |
Unroll factor | NA |
CQA cycles | NA |
CQA cycles if no scalar integer | NA |
CQA cycles if FP arith vectorized | NA |
CQA cycles if fully vectorized | NA |
Front-end cycles | NA |
DIV/SQRT cycles | NA |
P0 cycles | NA |
P1 cycles | NA |
P2 cycles | NA |
P3 cycles | NA |
P4 cycles | NA |
P5 cycles | NA |
P6 cycles | NA |
P7 cycles | NA |
P8 cycles | NA |
P9 cycles | NA |
P10 cycles | NA |
P11 cycles | NA |
Inter-iter dependencies cycles | NA |
FE+BE cycles (UFS) | NA |
Stall cycles (UFS) | NA |
Nb insns | NA |
Nb uops | NA |
Nb loads | NA |
Nb stores | NA |
Nb stack references | NA |
FLOP/cycle | NA |
Nb FLOP add-sub | NA |
Nb FLOP mul | NA |
Nb FLOP fma | NA |
Nb FLOP div | NA |
Nb FLOP rcp | NA |
Nb FLOP sqrt | NA |
Nb FLOP rsqrt | NA |
Bytes/cycle | NA |
Bytes prefetched | NA |
Bytes loaded | NA |
Bytes stored | NA |
Stride 0 | NA |
Stride 1 | NA |
Stride n | NA |
Stride unknown | NA |
Stride indirect | NA |
Vectorization ratio all | NA |
Vectorization ratio load | NA |
Vectorization ratio store | NA |
Vectorization ratio mul | NA |
Vectorization ratio add_sub | NA |
Vectorization ratio fma | NA |
Vectorization ratio div_sqrt | NA |
Vectorization ratio other | NA |
Vector-efficiency ratio all | NA |
Vector-efficiency ratio load | NA |
Vector-efficiency ratio store | NA |
Vector-efficiency ratio mul | NA |
Vector-efficiency ratio add_sub | NA |
Vector-efficiency ratio fma | NA |
Vector-efficiency ratio div_sqrt | NA |
Vector-efficiency ratio other | NA |
Path / |
Function | Step10_orig |
Source file and lines | Step10_orig.c:19-31 |
Module | exec |