Run Neoverse V2 GCC O3 Manual Unroll (250 iterations, 96 threads) | Run Neoverse V2 ACFL Ofast Manual Unroll (250 iterations, 96 threads) |
Loop Source Regions | - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131 | Loop Source Regions | - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131 |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
2 | 17.60 | 13.27 | 58.19 | 12.5 | 54.69 | 101.73 | 8 | 4.85 | 4.89 | 47.33 | 35.71 | 69.64 | 111.26 |
Sum on 1 analyzed binary loop (kmeans-gcc-O3 - 2) | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 8) |
Analysis | Count | Analysis | Count |
Loop Computation Issues | | Loop Computation Issues | |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 0 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 0 |
Control Flow Issues | | Control Flow Issues | |
Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | |
Vectorization Roadblocks | | Vectorization Roadblocks | |
Presence of 2 to 4 paths | 1 | Presence of 2 to 4 paths | |
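The "Presence of 2 to 4 paths" entry means the analyzed loop body contains conditional control flow, which the report lists both as a control flow issue and as a vectorization roadblock. The source at main.cpp: 118-131 is not reproduced in this report, so the following is only a hypothetical sketch of a nearest-centroid search of the kind a k-means benchmark usually contains. The function names (`sq_dist`, `nearest_branchy`, `nearest_select`) and the row-major centroid layout are assumptions made for illustration, not the benchmark's actual code.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Squared Euclidean distance between point p and centroid c (both of length dim).
// Hypothetical helper; the real benchmark code is not shown in the report.
static float sq_dist(const float* p, const float* c, std::size_t dim) {
    float d = 0.0f;
    for (std::size_t j = 0; j < dim; ++j) {
        const float diff = p[j] - c[j];
        d += diff * diff;  // multiply-add pattern that a compiler may fuse into an FMA
    }
    return d;
}

// Variant with an explicit branch: each iteration can take one of two paths.
std::size_t nearest_branchy(const float* p, const std::vector<float>& centroids,
                            std::size_t k, std::size_t dim) {
    std::size_t best = 0;
    float best_d = std::numeric_limits<float>::max();
    for (std::size_t c = 0; c < k; ++c) {
        const float d = sq_dist(p, centroids.data() + c * dim, dim);
        if (d < best_d) {
            best_d = d;
            best = c;
        }
    }
    return best;
}

// Variant written with conditional expressions, which compilers can often
// lower to select instructions instead of branches (not guaranteed).
std::size_t nearest_select(const float* p, const std::vector<float>& centroids,
                           std::size_t k, std::size_t dim) {
    std::size_t best = 0;
    float best_d = std::numeric_limits<float>::max();
    for (std::size_t c = 0; c < k; ++c) {
        const float d = sq_dist(p, centroids.data() + c * dim, dim);
        const bool closer = d < best_d;
        best_d = closer ? d : best_d;
        best = closer ? c : best;
    }
    return best;
}
```

Both variants return the same index. Whether the second form actually removes the extra paths depends on the compiler and flags, so the generated assembly would need to be checked again in the report.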
Run Neoverse V2 GCC O3 Manual Unroll (250 iterations, 96 threads) | Run Neoverse V2 ACFL Ofast Manual Unroll (250 iterations, 96 threads) |
Loop Source Regions | - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 157-160 | Loop Source Regions | - /home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 156-160 |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
16 | 0.67 | 0.23 | 1.01 | 10 | 47.5 | 45.78 | 38 | 0.52 | 0.47 | 4.56 | 0 | 41.67 | 45.18 |
Sum on 1 analyzed binary loop (kmeans-gcc-O3 - 16) | Sum on 1 analyzed binary loop (kmeans-acfl-Ofast - 38) |
Analysis | Count | Analysis | Count |
Loop Computation Issues | | Loop Computation Issues | |
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 |
Presence of a large number of scalar integer instructions | 1 | Presence of a large number of scalar integer instructions | 1 |
Data Access Issues | | Data Access Issues | |
Presence of indirect access | 1 | Presence of indirect access | 1 |
Vectorization Roadblocks | | Vectorization Roadblocks | |
Presence of indirect access | 1 | Presence of indirect access | 1 |
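"Presence of indirect access" means the loop addresses memory through an index that is itself loaded from memory, which is why it appears under both Data Access Issues and Vectorization Roadblocks. The code at main.cpp: 156-160 is not shown in the report. The sketch below is a hypothetical centroid accumulation step, a common source of this pattern in k-means; the function name `accumulate`, its parameters, and the flat row-major layout are assumptions for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Adds every point to the running sum of its assigned cluster.
// Hypothetical layout: points and sums are row-major with dim values per row.
void accumulate(const std::vector<float>& points,
                const std::vector<int>& assignment,
                std::size_t dim,
                std::vector<float>& sums,
                std::vector<int>& counts) {
    const std::size_t n = assignment.size();
    for (std::size_t i = 0; i < n; ++i) {
        // The cluster index is read from memory first ...
        const std::size_t c = static_cast<std::size_t>(assignment[i]);
        for (std::size_t j = 0; j < dim; ++j) {
            // ... and then used to form the store address (indirect access).
            sums[c * dim + j] += points[i * dim + j];
        }
        ++counts[c];
    }
}
```

Because two points may map to the same cluster, the compiler cannot treat iterations over `i` as independent, which is consistent with the low vectorization ratios reported for this loop.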
Run Neoverse V2 GCC O3 Manual Unroll (250 iterations, 96 threads) | Run Neoverse V2 ACFL Ofast Manual Unroll (250 iterations, 96 threads) |
Loop Source Regions | | Loop Source Regions | |
ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
13 | 0.00 | 0.00 | 0.00 | 0 | 0 | 0 | 12 | 0.00 | 0.00 | 0.00 | 0 | 0 | 0 |
| | | | | | | 13 | 0.00 | 0.00 | 0.00 | 0 | 0 | 0 |
No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
Analysis | Count | Analysis | Count |