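| Metric | Run 1 | Run 2 | Run 3 |
|---|---|---|---|
| Perfect OpenMP/MPI/Pthread/TBB + Perfect Load Distribution | 1.34 | 1.36 | 2.23 |
| No Scalar Integer: Potential Speedup | 1.01 | 1.01 | 1.13 |
| No Scalar Integer: Nb Loops to get 80% | 1 | 1 | 1 |
| FP Vectorised: Potential Speedup | 2.24 | 2.32 | 1.49 |
| FP Vectorised: Nb Loops to get 80% | 1 | 1 | 1 |
| Fully Vectorised: Potential Speedup | 7.47 | 7.38 | 2.35 |
| Fully Vectorised: Nb Loops to get 80% | 1 | 1 | 2 |
| Only FP Arithmetic: Potential Speedup | 1.01 | 1.02 | 1.52 |
| Only FP Arithmetic: Nb Loops to get 80% | 1 | 1 | 2 |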
[Figure: Cumulated Speedup If No Scalar Integer]
[Figure: Cumulated Speedup If FP Vectorized]
[Figure: Cumulated Speedup If Fully Vectorized]
[Figure: Cumulated Speedup If Only FP Arithmetic]
[Figure: Loop Based Profiles (labels: Innermost / Single Loops, Inbetween Loops, Outermost Loops, Cumulated Coverage With All Loops)]
[Figure: Innermost Loop Based Profiles (labels: Coverage, Count)]
[Figure: Application Categorization (labels: Time, Coverage)]
Compilation Options
| Source Object | Issue |
|---|---|
| kmeans-gcc-Ofast / main.cpp | -funroll-loops is missing. |
| kmeans-clang-O3-ffast-math / main.cpp | -g is missing for some functions (possibly ones added by the compiler), but debug locations are available. Some analysis may be inaccurate. Try to complement -g with -grecord-gcc-switches or -frecord-command-line. |
| kmeans-clang-O3-ffast-math / main.cpp | -O2, -O3 or -Ofast is missing. |
| kmeans-clang-O3-ffast-math / main.cpp | -march=(target) is missing. |
| kmeans-icpx-Ofast / main.cpp | |
[Figure: Path Count Profiles (labels: Coverage, Count)]
[Figure: Low Iteration Count Profiles (labels: Coverage, Count)]
Average Number of Active Threads
Run 1 - Skylake GCC Ofast Manual Unroll + SoA
Run 2 - Skylake Clang O3 + ffast-math Manual Unroll + SoA