Run Neoverse V1 ACFL Ofast Manual Unroll ONLY (no Hoisting) (250 iterations, 64 threads) | Run Neoverse V1 ACFL Ofast Hoisting ONLY (no Unroll) (250 iterations, 64 threads) |
Loop Source Regions | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 72-86 | Loop Source Regions | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 72-78 |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
9 | 16.23 | 13.06 | 70.97 | 0 | 24.72 | 5.65 | 7 | 17.03 | 13.98 | 68.96 | 0 | 23.63 | 5.06 |
| |
Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 9) | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 7) |
Analysis | Count | Analysis | Count |
Loop Computation Issues | | Loop Computation Issues | |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 0 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 0 |
Control Flow Issues | | Control Flow Issues | |
Data Access Issues | | Data Access Issues | |
Presence of constant non-unit stride data access | | Presence of constant non-unit stride data access | 1 |
Vectorization Roadblocks | | Vectorization Roadblocks | |
Presence of more than 4 paths | 1 | Presence of more than 4 paths | 0 |
Presence of constant non-unit stride data access | 0 | Presence of constant non-unit stride data access | 1 |
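As a worked check on this first block, the hottest loop pair (ASM loop 9 in the Manual Unroll run vs. ASM loop 7 in the Hoisting run) can be compared directly. The short Python sketch reuses only figures from the rows above. Treating these two ASM loops as the same source loop is an assumption based on their overlapping source regions (main.cpp lines 72-86 vs. 72-78); the helper names are illustrative.

```python
# Figures copied from the hottest loop pair in the first comparison block.
# ASSUMPTION: ASM loop 9 (Manual Unroll ONLY) and ASM loop 7 (Hoisting ONLY)
# correspond to the same source loop, since their source regions overlap.
unroll = {"max_time_s": 16.23, "wall_time_s": 13.06, "gflops": 5.65}
hoist = {"max_time_s": 17.03, "wall_time_s": 13.98, "gflops": 5.06}


def relative_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` is lower than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


def speedup(baseline: float, candidate: float) -> float:
    """Ratio baseline / candidate; a value above 1 means candidate is faster."""
    return baseline / candidate


for key in ("max_time_s", "wall_time_s"):
    print(
        f"{key}: unroll is {relative_reduction(hoist[key], unroll[key]):.1f}% lower "
        f"(speedup {speedup(hoist[key], unroll[key]):.3f}x)"
    )

print(f"GFLOP/s ratio (unroll / hoist): {unroll['gflops'] / hoist['gflops']:.3f}")
```

With these figures the unrolled loop is roughly 4.7% faster on max time over threads and roughly 6.6% faster w.r.t. wall time, while sustaining about 1.12x the GFLOP/s of the hoisted loop.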
Run Neoverse V1 ACFL Ofast Manual Unroll ONLY (no Hoisting) (250 iterations, 64 threads) | Run Neoverse V1 ACFL Ofast Hoisting ONLY (no Unroll) (250 iterations, 64 threads) |
Loop Source Regions | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 115-119 | Loop Source Regions | |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
39 | 1.76 | 1.37 | 7.44 | 0 | 19.79 | 1.72 | | | | | | | |
| |
Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 39) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Analysis | Count | Analysis | Count |
Loop Computation Issues | | | |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | | |
Presence of a large number of scalar integer instructions | 1 | | |
Data Access Issues | | | |
Presence of indirect access | 1 | | |
Vectorization Roadblocks | | | |
Presence of indirect access | 1 | | |
Run Neoverse V1 ACFL Ofast Manual Unroll ONLY (no Hoisting) (250 iterations, 64 threads) | Run Neoverse V1 ACFL Ofast Hoisting ONLY (no Unroll) (250 iterations, 64 threads) |
Loop Source Regions | | Loop Source Regions | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 96-100 |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| | | | | | | 40 | 1.76 | 1.18 | 5.83 | 0 | 19.79 | 0.94 |
| |
No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 40) |
Analysis | Count | Analysis | Count |
| | Loop Computation Issues | |
| | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
| | Presence of a large number of scalar integer instructions | 1 |
| | Data Access Issues | |
| | Presence of indirect access | 1 |
| | Vectorization Roadblocks | |
| | Presence of indirect access | 1 |
Run Neoverse V1 ACFL Ofast Manual Unroll ONLY (no Hoisting) (250 iterations, 64 threads) | Run Neoverse V1 ACFL Ofast Hoisting ONLY (no Unroll) (250 iterations, 64 threads) |
Loop Source Regions | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 68-72; /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 86-86; /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 93-98 | Loop Source Regions | /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 68-72; /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 80-80 |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
7 | 0.07 | 0.06 | 0.32 | 0 | 21.88 | 6.03 | 8 | 0.55 | 0.46 | 2.27 | 0 | 19.79 | 5.02 |
| |
Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 7) | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 8) |
Analysis | Count | Analysis | Count |
Loop Computation Issues | | Loop Computation Issues | |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | |
Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | |
Control Flow Issues | | Control Flow Issues | |
Presence of calls | 1 | Presence of calls | 1 |
Vectorization Roadblocks | | Vectorization Roadblocks | |
Presence of calls | 1 | Presence of calls | 1 |
Presence of more than 4 paths | 1 | Presence of more than 4 paths | 1 |