Loop Source Regions (all three runs): /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 74-87

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| Skylake GCC Ofast Manual Unroll + SoA | 0 | 12.24 | 11.99 | 94.88 | 0 | 11.61 | 86.03 |
| Skylake Clang O3 + ffast-math Manual Unroll + SoA | 8 | 11.69 | 11.31 | 93.26 | 0 | 11.25 | 93.85 |
| Skylake ICPX Ofast Manual Unroll + SoA | 28 | 4.87 | 4.83 | 78.47 | 80 | 42.08 | 419.76 |

Optimizer analysis, sum on 1 analyzed binary loop per run:

| Category | Analysis | kmeans-gcc-Ofast - 0 | kmeans-clang-O3-ffast-math - 8 | kmeans-icpx-Ofast - 28 |
|---|---|---|---|---|
| Loop Computation Issues | Presence of a large number of scalar integer instructions | | | 1 |
| Loop Computation Issues | Low iteration count | | | 1 |
| Control Flow Issues | Low iteration count | | | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | | | 1 |
| Inefficient Vectorization | Presence of special instructions executing on a single port | | | 1 |
| Inefficient Vectorization | Use of masked instructions | | | 1 |

Loop Source Regions: /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 111-117 (ICPX run only; none listed for the GCC and Clang runs)

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| Skylake ICPX Ofast Manual Unroll + SoA | 25 | 0.93 | 0.83 | 13.51 | 0 | 12.5 | 9.75 |
| Skylake ICPX Ofast Manual Unroll + SoA | 23 | 0.04 | 0.01 | 0.15 | 100 | 37.5 | 0 |

Optimizer analysis: for the GCC and Clang runs, "No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count." For the ICPX run, sum on 2 analyzed binary loops (kmeans-icpx-Ofast - 25, kmeans-icpx-Ofast - 23):

| Category | Analysis | Count (ICPX) |
|---|---|---|
| Data Access Issues | Presence of indirect access | 1 |
| Data Access Issues | Presence of expensive instructions: scatter/gather | 1 |
| Data Access Issues | Presence of special instructions executing on a single port | 1 |
| Vectorization Roadblocks | Presence of indirect access | 1 |
| Inefficient Vectorization | Presence of expensive instructions: scatter/gather | 1 |
| Inefficient Vectorization | Presence of special instructions executing on a single port | 1 |

Loop Source Regions: none listed for any run.

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| Skylake GCC Ofast Manual Unroll + SoA | 7 | 0.51 | 0.38 | 2.98 | 0 | 0 | 9.18 |
| Skylake Clang O3 + ffast-math Manual Unroll + SoA | 42 | 0.52 | 0.38 | 3.17 | 0 | 0 | 10.18 |

The ICPX run reports no loop for this group.

Optimizer analysis (all three runs): No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.