Loops
main.cpp:117 (coverage summed over all matching loops and runs: 158.52 %)
Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| Skylake ICPX Ofast AoS (base) | 26 | 10.02 | 10.74 | 78.66 | 57.89 | 18.86 | 78.6 |
| Skylake ICPX Ofast Manual Unroll | 24 | 4.62 | 4.83 | 79.86 | 80 | 42.08 | 394.2 |

The SoA and Manual Unroll + SoA runs report no loop for this source line.
Optimizer analysis. Counts are summed over the analyzed binary loops of kmeans-icpx-Ofast: loop 26 for AoS (base) and loop 24 for Manual Unroll. No Optimizer analysis was found for any assembly loop in the SoA and Manual Unroll + SoA runs (more loops can be analyzed using option --optimizer-loop-count).

| Category | Analysis | AoS (base) | Manual Unroll |
|---|---|---|---|
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 0 |
| Loop Computation Issues | Presence of a large number of scalar integer instructions | 0 | 1 |
| Loop Computation Issues | Low iteration count | 0 | 1 |
| Control Flow Issues | Presence of more than 4 paths | 1 | 0 |
| Control Flow Issues | Low iteration count | 0 | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | 1 | 1 |
| Vectorization Roadblocks | Presence of more than 4 paths | 1 | |
| Inefficient Vectorization | Presence of special instructions executing on a single port | 1 | 1 |
| Inefficient Vectorization | Use of masked instructions | 0 | 1 |
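The Manual Unroll run is compared against the AoS baseline at this source line. The report does not include main.cpp, so the following is only a hypothetical sketch of what manually unrolling a squared-distance reduction can look like: the function name and the choice of an unroll factor of 4 are assumptions. Four independent partial sums shorten the single-accumulator dependency chain, and a scalar tail handles dimension counts that are not multiples of 4.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical manually unrolled squared Euclidean distance (unroll factor 4).
// Four independent accumulators let consecutive multiply-adds overlap instead of
// waiting on one running sum; the tail loop handles dims % 4 leftover elements.
float sq_dist_unrolled4(const float* p, const float* c, std::size_t dims) {
    assert(dims == 0 || (p != nullptr && c != nullptr));
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t f = 0;
    for (; f + 4 <= dims; f += 4) {
        const float d0 = p[f] - c[f];
        const float d1 = p[f + 1] - c[f + 1];
        const float d2 = p[f + 2] - c[f + 2];
        const float d3 = p[f + 3] - c[f + 3];
        s0 += d0 * d0;
        s1 += d1 * d1;
        s2 += d2 * d2;
        s3 += d3 * d3;
    }
    for (; f < dims; ++f) {  // remainder iterations
        const float d = p[f] - c[f];
        s0 += d * d;
    }
    return (s0 + s1) + (s2 + s3);
}
```

Under -Ofast the compiler is already allowed to reassociate floating-point reductions, so the benefit of writing the unroll by hand depends on what the compiler would otherwise generate for the specific loop.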
main.cpp:71 (coverage summed over all matching loops and runs: 91.74 %)
Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| Skylake ICPX Ofast SoA | 28 | 11.91 | 11.53 | 89.01 | 19.15 | 16.49 | 86.47 |
| Skylake ICPX Ofast SoA | 27 | 0.33 | 0.35 | 2.73 | 0 | 11.61 | 189.93 |

The AoS (base), Manual Unroll, and Manual Unroll + SoA runs report no loop for this source line.
Optimizer analysis. For the SoA run, counts are summed over 2 analyzed binary loops of kmeans-icpx-Ofast (loops 28 and 27); the export lists no analysis entries for them. No Optimizer analysis was found for any assembly loop in the other runs (more loops can be analyzed using option --optimizer-loop-count).
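The run names contrast two layouts for the point data. Since main.cpp is not part of this report, the following is a hypothetical illustration only: the function names, the flat-array layouts, and the choice of a point-to-centroid distance kernel are all assumptions. It computes the squared distance from every point to one centroid in both layouts; in the SoA version the innermost loop walks consecutive addresses across points, whereas in the AoS version consecutive points are `dims` floats apart.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical AoS layout: feature f of point i is aos[i * dims + f].
// The inner loop runs over one point's features, so vectorizing across
// points would require strided accesses.
void dist_aos(const std::vector<float>& aos, std::size_t n, std::size_t dims,
              const std::vector<float>& centroid, std::vector<float>& out) {
    assert(aos.size() == n * dims && centroid.size() == dims);
    out.assign(n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (std::size_t f = 0; f < dims; ++f) {
            const float d = aos[i * dims + f] - centroid[f];
            acc += d * d;
        }
        out[i] = acc;
    }
}

// Hypothetical SoA layout: feature f of point i is soa[f * n + i].
// For a fixed feature, the inner loop over points uses unit-stride loads.
void dist_soa(const std::vector<float>& soa, std::size_t n, std::size_t dims,
              const std::vector<float>& centroid, std::vector<float>& out) {
    assert(soa.size() == n * dims && centroid.size() == dims);
    out.assign(n, 0.0f);
    for (std::size_t f = 0; f < dims; ++f) {
        const float c = centroid[f];
        const float* col = soa.data() + f * n;  // feature f of all points
        for (std::size_t i = 0; i < n; ++i) {
            const float d = col[i] - c;
            out[i] += d * d;
        }
    }
}
```

Both functions accumulate features in the same order for each point, so they produce identical results; only the memory access pattern differs.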
main.cpp:74 (coverage summed over all matching loops and runs: 78.47 %)
Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| Skylake ICPX Ofast Manual Unroll + SoA | 28 | 4.87 | 4.83 | 78.47 | 80 | 42.08 | 419.76 |

The AoS (base), SoA, and Manual Unroll runs report no loop for this source line.
Optimizer analysis. For the Manual Unroll + SoA run, counts are summed over 1 analyzed binary loop of kmeans-icpx-Ofast (loop 28). No Optimizer analysis was found for any assembly loop in the other runs (more loops can be analyzed using option --optimizer-loop-count).

| Category | Analysis | Manual Unroll + SoA |
|---|---|---|
| Loop Computation Issues | Presence of a large number of scalar integer instructions | 1 |
| Loop Computation Issues | Low iteration count | 1 |
| Control Flow Issues | Low iteration count | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | 1 |
| Inefficient Vectorization | Presence of special instructions executing on a single port | 1 |
| Inefficient Vectorization | Use of masked instructions | 1 |
main.cpp:112 (coverage summed over all matching loops and runs: 13.66 %)
Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| Skylake ICPX Ofast Manual Unroll + SoA | 25 | 0.93 | 0.83 | 13.51 | 0 | 12.5 | 9.75 |
| Skylake ICPX Ofast Manual Unroll + SoA | 23 | 0.04 | 0.01 | 0.15 | 100 | 37.5 | 0 |

The AoS (base), SoA, and Manual Unroll runs report no loop for this source line.
Optimizer analysis. For the Manual Unroll + SoA run, counts are summed over 2 analyzed binary loops of kmeans-icpx-Ofast (loops 25 and 23). No Optimizer analysis was found for any assembly loop in the other runs (more loops can be analyzed using option --optimizer-loop-count).

| Category | Analysis | Manual Unroll + SoA |
|---|---|---|
| Data Access Issues | Presence of indirect access | 1 |
| Data Access Issues | Presence of expensive instructions: scatter/gather | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | 1 |
| Vectorization Roadblocks | Presence of indirect access | 1 |
| Inefficient Vectorization | Presence of expensive instructions: scatter/gather | 1 |
| Inefficient Vectorization | Presence of special instructions executing on a single port | 1 |
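"Presence of indirect access" together with scatter/gather means a loop loads or stores through addresses computed from other loaded data. In k-means such a pattern commonly arises in the centroid update, where each point is accumulated into the row of its assigned cluster. Whether main.cpp:112 actually does this cannot be seen from the report, so the following is a hypothetical SoA sketch: the function name, the feature-major layout of `sums`, and the `assign` array are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SoA centroid accumulation.
// Points: soa[f * n + i]. Sums: sums[f * k + cluster] (feature-major).
// The store address depends on assign[i], so vectorizing the inner loop needs
// gathers/scatters and must handle several lanes hitting the same cluster.
void accumulate_centroids(const std::vector<float>& soa, std::size_t n, std::size_t dims,
                          const std::vector<int>& assign, std::size_t k,
                          std::vector<float>& sums, std::vector<int>& counts) {
    assert(soa.size() == n * dims && assign.size() == n);
    sums.assign(k * dims, 0.0f);
    counts.assign(k, 0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(assign[i] >= 0 && static_cast<std::size_t>(assign[i]) < k);
        ++counts[static_cast<std::size_t>(assign[i])];
    }
    for (std::size_t f = 0; f < dims; ++f) {
        const float* col = soa.data() + f * n;  // feature f of all points
        float* row = sums.data() + f * k;       // feature f of all cluster sums
        for (std::size_t i = 0; i < n; ++i) {
            row[static_cast<std::size_t>(assign[i])] += col[i];  // data-dependent store
        }
    }
}
```

Dividing each sum by its cluster count afterwards would give the new centroids; that step is omitted here because it has no indirect access.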
main.cpp:156 (coverage summed over all matching loops and runs: 10.37 %)
Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| Skylake ICPX Ofast Manual Unroll | 21 | 0.94 | 0.63 | 10.37 | 0 | 11.61 | 8.8 |

The AoS (base), SoA, and Manual Unroll + SoA runs report no loop for this source line.
Optimizer analysis. For the Manual Unroll run, counts are summed over 1 analyzed binary loop of kmeans-icpx-Ofast (loop 21). No Optimizer analysis was found for any assembly loop in the other runs (more loops can be analyzed using option --optimizer-loop-count).

| Category | Analysis | Manual Unroll |
|---|---|---|
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| Loop Computation Issues | Presence of a large number of scalar integer instructions | 1 |
| Data Access Issues | Presence of indirect access | 1 |
| Vectorization Roadblocks | Presence of indirect access | 1 |
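The FMA finding (also reported for the AoS baseline at main.cpp:117) means that few floating-point additions and multiplications were fused. As a hypothetical sketch, not the code at main.cpp:156, a squared-distance accumulation can be written so that every step has the form d * d + acc and is expressed with std::fma; the function name is an assumption. std::fma computes x * y + z with a single rounding. It maps to one instruction only when the compilation target enables FMA, which Skylake supports; otherwise it may become a comparatively slow library call.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical squared distance in which each accumulation step is an explicit
// fused multiply-add: acc = d * d + acc, rounded once per step by std::fma.
float sq_dist_fma(const float* p, const float* c, std::size_t dims) {
    assert(dims == 0 || (p != nullptr && c != nullptr));
    float acc = 0.0f;
    for (std::size_t f = 0; f < dims; ++f) {
        const float d = p[f] - c[f];
        acc = std::fma(d, d, acc);
    }
    return acc;
}
```

With -Ofast the compiler may also contract a plain `acc += d * d` into an FMA on its own when the target ISA allows it, so spelling it out mainly documents the intent.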
<unknown>:0 (no source location resolved; coverage summed over all matching loops and runs: 5.30 %)
Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| (not identifiable in this export) | 21 | 0.48 | 0.31 | 2.27 | 0 | 0 | 9.51 |
| (not identifiable in this export) | 25 | 0.53 | 0.39 | 3.04 | 0 | 0 | 9.99 |

The flattened export does not show which of the four runs (AoS (base), SoA, Manual Unroll, Manual Unroll + SoA) each of these loops belongs to.
Optimizer analysis. No Optimizer analysis was found for any assembly loop in any run (more loops can be analyzed using option --optimizer-loop-count).
main.cpp:92 (coverage summed over all matching loops and runs: 0.13 %)
Loop Source Regions

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| Skylake ICPX Ofast SoA | 23 | 0.09 | 0.02 | 0.13 | 100 | 37.5 | 0 |

The AoS (base), Manual Unroll, and Manual Unroll + SoA runs report no loop for this source line.
Optimizer analysis. For the SoA run, counts are summed over 1 analyzed binary loop of kmeans-icpx-Ofast (loop 23). No Optimizer analysis was found for any assembly loop in the other runs (more loops can be analyzed using option --optimizer-loop-count).

| Category | Analysis | SoA |
|---|---|---|
| Data Access Issues | Presence of indirect access | 1 |
| Data Access Issues | Presence of expensive instructions: scatter/gather | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | 1 |
| Vectorization Roadblocks | Presence of indirect access | 1 |
| Inefficient Vectorization | Presence of expensive instructions: scatter/gather | 1 |
| Inefficient Vectorization | Presence of special instructions executing on a single port | 1 |