OV - Compare Loops

Loops

main.cpp: 117 - 158.52 % (Cov summed over the loops listed below)

Run	ASM Loop ID	Loop Source Regions	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake ICPX Ofast AoS (base)	26	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 117-123	10.02	10.74	78.66	57.89	18.86	78.6
Run Skylake ICPX Ofast Manual Unroll	24	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 118-131	4.62	4.83	79.86	80	42.08	394.2
Run Skylake ICPX Ofast SoA: no loop listed.

Optimizer analysis
Run Skylake ICPX Ofast AoS (base): Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 26)
Run Skylake ICPX Ofast SoA: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Run Skylake ICPX Ofast Manual Unroll: Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 24)

Analysis	Count, AoS (base)	Count, Manual Unroll
Loop Computation Issues
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA	1	0
Presence of a large number of scalar integer instructions	0	1
Low iteration count	0	1
Control Flow Issues
Presence of more than 4 paths	1	0
Low iteration count	0	1
Data Access Issues
Presence of special instructions executing on a single port	1	1
Vectorization Roadblocks
Presence of more than 4 paths	1	(no count shown)
Inefficient Vectorization
Presence of special instructions executing on a single port	1	1
Use of masked instructions	0	1
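
The runs compared in this report are named after a data layout (AoS, SoA) or a manual unroll. As a reminder of what the layout change means for a loop over points, here is a minimal sketch. The 2-D point shape and all names (`PointAoS`, `PointsSoA`, `sum_sq_dist_aos`, `sum_sq_dist_soa`) are hypothetical; this is not the benchmark's code from main.cpp, and it makes no claim about which layout is faster.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D point types; the benchmark's real types are not shown in the report.

// Array of structures (AoS): x and y of one point sit next to each other in memory.
struct PointAoS {
    float x, y;
};

// Structure of arrays (SoA): all x values are contiguous, and so are all y values.
struct PointsSoA {
    std::vector<float> x, y;
};

// Sum of squared distances from every point to one centroid (cx, cy), AoS layout.
float sum_sq_dist_aos(const std::vector<PointAoS>& pts, float cx, float cy) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const float dx = pts[i].x - cx;  // strided access: x values are interleaved with y
        const float dy = pts[i].y - cy;
        acc += dx * dx + dy * dy;
    }
    return acc;
}

// Same computation, SoA layout.
float sum_sq_dist_soa(const PointsSoA& pts, float cx, float cy) {
    float acc = 0.0f;
    const std::size_t n = pts.x.size();
    for (std::size_t i = 0; i < n; ++i) {
        const float dx = pts.x[i] - cx;  // unit-stride access on each coordinate array
        const float dy = pts.y[i] - cy;
        acc += dx * dx + dy * dy;
    }
    return acc;
}
```

Both functions return the same value for the same points; only the memory access pattern seen by the compiler's vectorizer differs.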

main.cpp: 71 - 91.74 %

Run	ASM Loop ID	Loop Source Regions	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake ICPX Ofast SoA	28	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 71-76	11.91	11.53	89.01	19.15	16.49	86.47
Run Skylake ICPX Ofast SoA	27	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 71-76	0.33	0.35	2.73	0	11.61	189.93
Run Skylake ICPX Ofast AoS (base): no loop listed.
Run Skylake ICPX Ofast Manual Unroll: no loop listed.

Optimizer analysis
Run Skylake ICPX Ofast AoS (base): No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Run Skylake ICPX Ofast SoA: Sum on 2 analyzed binary loops (kmeans-icpx-Ofast - 28, kmeans-icpx-Ofast - 27); no issues listed.
Run Skylake ICPX Ofast Manual Unroll: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

main.cpp: 156 - 10.37 %

Run	ASM Loop ID	Loop Source Regions	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake ICPX Ofast Manual Unroll	21	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 156-161	0.94	0.63	10.37	0	11.61	8.8
Run Skylake ICPX Ofast AoS (base): no loop listed.
Run Skylake ICPX Ofast SoA: no loop listed.

Optimizer analysis
Run Skylake ICPX Ofast AoS (base): No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Run Skylake ICPX Ofast SoA: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Run Skylake ICPX Ofast Manual Unroll: Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 21)

Analysis	Count, Manual Unroll
Loop Computation Issues
Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA	1
Presence of a large number of scalar integer instructions	1
Data Access Issues
Presence of indirect access	1
Vectorization Roadblocks
Presence of indirect access	1

<unknown>: 0 - 5.30 %

Run	ASM Loop ID	Loop Source Regions	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake ICPX Ofast AoS (base)	21	(none listed)	0.48	0.31	2.27	0	0	9.51
Run Skylake ICPX Ofast SoA	25	(none listed)	0.53	0.39	3.04	0	0	9.99
Run Skylake ICPX Ofast Manual Unroll: no loop listed.

Optimizer analysis
All three runs: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

main.cpp: 92 - 0.13 %

Run	ASM Loop ID	Loop Source Regions	Max Time Over Threads (s)	Time w.r.t. Wall Time (s)	Cov (%)	Vect. Ratio (%)	Vector Length Use (%)	GFLOP/s
Run Skylake ICPX Ofast SoA	23	/home/fmusial/KMEANS_Benchmarks/kmeans/main.cpp: 92-98	0.09	0.02	0.13	100	37.5	0
Run Skylake ICPX Ofast AoS (base): no loop listed.
Run Skylake ICPX Ofast Manual Unroll: no loop listed.

Optimizer analysis
Run Skylake ICPX Ofast AoS (base): No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.
Run Skylake ICPX Ofast SoA: Sum on 1 analyzed binary loop (kmeans-icpx-Ofast - 23)
Run Skylake ICPX Ofast Manual Unroll: No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count.

Analysis	Count, SoA
Data Access Issues
Presence of indirect access	1
Presence of expensive instructions: scatter/gather	1
Presence of special instructions executing on a single port	1
Vectorization Roadblocks
Presence of indirect access	1
Inefficient Vectorization
Presence of expensive instructions: scatter/gather	1
Presence of special instructions executing on a single port	1
