Run Skylake ICPX Ofast AoS (base) | Run Skylake ICPX Ofast SoA | Run Skylake ICPX Ofast Manual Unroll |
Loop Source Regions | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 117-123 | Loop Source Regions | | Loop Source Regions | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131 |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
26 | 10.02 | 10.74 | 78.66 | 57.89 | 18.86 | 78.6 | | | | | | | | 24 | 4.62 | 4.83 | 79.86 | 80 | 42.08 | 394.2 |
Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 26) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 24) |
Analysis | Count | Analysis | Count | Analysis | Count |
Loop Computation Issues | | | | Loop Computation Issues | |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 0 |
Presence of a large number of scalar integer instructions | 0 | | | Presence of a large number of scalar integer instructions | 1 |
Low iteration count | 0 | | | Low iteration count | 1 |
Control Flow Issues | | | | Control Flow Issues | |
Presence of more than 4 paths | 1 | | | Presence of more than 4 paths | 0 |
Low iteration count | 0 | | | Low iteration count | 1 |
Data Access Issues | | | | Data Access Issues | |
Presence of special instructions executing on a single port | 1 | | | Presence of special instructions executing on a single port | 1 |
Vectorization Roadblocks | | | | Vectorization Roadblocks | |
Presence of more than 4 paths | 1 | | | Presence of more than 4 paths | |
Inefficient Vectorization | | | | Inefficient Vectorization | |
Presence of special instructions executing on a single port | 1 | | | Presence of special instructions executing on a single port | 1 |
Use of masked instructions | 0 | | | Use of masked instructions | 1 |
Run Skylake ICPX Ofast AoS (base) | Run Skylake ICPX Ofast SoA | Run Skylake ICPX Ofast Manual Unroll |
Loop Source Regions | | Loop Source Regions | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 71-76 | Loop Source Regions | |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| | | | | | | 28 | 11.91 | 11.53 | 89.01 | 19.15 | 16.49 | 86.47 | | | | | | | |
| | | | | | | 27 | 0.33 | 0.35 | 2.73 | 0 | 11.61 | 189.93 | | | | | | | |
No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 2 analyzed binary loops (kmeans-icpx-Ofast - 28, kmeans-icpx-Ofast - 27) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Analysis | Count | Analysis | Count | Analysis | Count |
Run Skylake ICPX Ofast AoS (base) | Run Skylake ICPX Ofast SoA | Run Skylake ICPX Ofast Manual Unroll |
Loop Source Regions | | Loop Source Regions | | Loop Source Regions | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 156-161 |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| | | | | | | | | | | | | | 21 | 0.94 | 0.63 | 10.37 | 0 | 11.61 | 8.8 |
No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 21) |
Analysis | Count | Analysis | Count | Analysis | Count |
| | | | Loop Computation Issues | |
| | | | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | | | Presence of a large number of scalar integer instructions | 1 |
| | | | Data Access Issues | |
| | | | Presence of indirect access | 1 |
| | | | Vectorization Roadblocks | |
| | | | Presence of indirect access | 1 |
Run Skylake ICPX Ofast AoS (base) | Run Skylake ICPX Ofast SoA | Run Skylake ICPX Ofast Manual Unroll |
Loop Source Regions | | Loop Source Regions | | Loop Source Regions | |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
21 | 0.48 | 0.31 | 2.27 | 0 | 0 | 9.51 | 25 | 0.53 | 0.39 | 3.04 | 0 | 0 | 9.99 | | | | | | | |
No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Analysis | Count | Analysis | Count | Analysis | Count |
Run Skylake ICPX Ofast AoS (base) | Run Skylake ICPX Ofast SoA | Run Skylake ICPX Ofast Manual Unroll |
Loop Source Regions | | Loop Source Regions | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 92-98 | Loop Source Regions | |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| | | | | | | 23 | 0.09 | 0.02 | 0.13 | 100 | 37.5 | 0 | | | | | | | |
No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 23) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Analysis | Count | Analysis | Count | Analysis | Count |
| | Data Access Issues | | | |
| | Presence of indirect access | 1 | | |
| | Presence of expensive instructions: scatter/gather | 1 | | |
| | Presence of special instructions executing on a single port | 1 | | |
| | Vectorization Roadblocks | | | |
| | Presence of indirect access | 1 | | |
| | Inefficient Vectorization | | | |
| | Presence of expensive instructions: scatter/gather | 1 | | |
| | Presence of special instructions executing on a single port | 1 | | |