OV - Compare Loops

MAQAO

Loops

▶main.cpp: 118 - 270.82 %

Run	Loop Source Regions	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake GCC Ofast Manual Unroll	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131	0	101.06	101.13	96.17	42.86	16.96	90.88
Run Skylake Clang O3 + ffast-math Manual Unroll	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131	8	88.02	89.11	94.40	37.5	15.33	88.55
Run Skylake ICPX Ofast Manual Unroll	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131	24	21.74	22.58	80.25	80	42.08	391.09

Sum on 1 analyzed binary loop per run
Analysis	Count (kmeans-gcc-Ofast - 0)	Count (kmeans-clang-O3-ffast-math - 8)	Count (kmeans-icpx-Ofast - 24)
Loop Computation Issues
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA	1	1	0
Presence of a large number of scalar integer instructions	0	0	1
Low iteration count	0	0	1
Control Flow Issues
Low iteration count			1
Data Access Issues
Presence of special instructions executing on a single port	1	1	1
Inefficient Vectorization
Presence of special instructions executing on a single port	1	1	1
Use of masked instructions	0	0	1

▶main.cpp: 156 - 12.74 %

Run	Loop Source Regions	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake GCC Ofast Manual Unroll
Run Skylake Clang O3 + ffast-math Manual Unroll	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 156-160	42	3.83	2.10	2.22	0	11.61	9.51
Run Skylake ICPX Ofast Manual Unroll	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 156-161	21	4.70	2.96	10.51	0	11.61	9.62

Run Skylake GCC Ofast Manual Unroll: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

Sum on 1 analyzed binary loop per run
Analysis	Count (kmeans-clang-O3-ffast-math - 42)	Count (kmeans-icpx-Ofast - 21)
Loop Computation Issues
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA	1	1
Presence of a large number of scalar integer instructions	1	1
Data Access Issues
Presence of indirect access	1	1
Vectorization Roadblocks
Presence of indirect access	1	1

▶main.cpp: 158 - 2.27 %

Run	Loop Source Regions	ASM Loop ID	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake GCC Ofast Manual Unroll	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 158-160	7	3.98	2.39	2.27	0	11.61	10.19
Run Skylake Clang O3 + ffast-math Manual Unroll
Run Skylake ICPX Ofast Manual Unroll

Run Skylake Clang O3 + ffast-math Manual Unroll: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Run Skylake ICPX Ofast Manual Unroll: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

Sum on 1 analyzed binary loop
Analysis	Count (kmeans-gcc-Ofast - 7)
Loop Computation Issues
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA	1
Presence of a large number of scalar integer instructions	1
Data Access Issues
Presence of indirect access	1
Vectorization Roadblocks
Presence of indirect access	1
