Metric | Run Skylake GCC Ofast Manual Unroll | Run Skylake Clang O3 + ffast-math Manual Unroll | Run Skylake ICPX Ofast Manual Unroll |
Loop Source Regions | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131 | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131 | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131 |
ASM Loop ID | 0 | 8 | 24 |
Max Time Over Threads (s) | 101.06 | 88.02 | 21.74 |
Time w.r.t. Wall Time (s) | 101.13 | 89.11 | 22.58 |
Coverage (%) | 96.17 | 94.40 | 80.25 |
Vect. Ratio (%) | 42.86 | 37.5 | 80 |
Vector Length Use (%) | 16.96 | 15.33 | 42.08 |
GFLOP/s | 90.88 | 88.55 | 391.09 |
Analysis (sum over 1 analyzed binary loop per run) | Count: kmeans-gcc-Ofast (loop 0) | Count: kmeans-clang-O3-ffast-math (loop 8) | Count: kmeans-icpx-Ofast (loop 24) |
Loop Computation Issues | | | |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | 1 | 0 |
Presence of a large number of scalar integer instructions | 0 | 0 | 1 |
Low iteration count | 0 | 0 | 1 |
Control Flow Issues | | | |
Low iteration count | | | 1 |
Data Access Issues | | | |
Presence of special instructions executing on a single port | 1 | 1 | 1 |
Inefficient Vectorization | | | |
Presence of special instructions executing on a single port | 1 | 1 | 1 |
Use of masked instructions | 0 | 0 | 1 |