OV - - Summary

engine_linuxa64_gf_ompi - 2024-10-17 10:08:29 - MAQAO 2.20.9

▼Stylizer

[ 4 / 4 ] Application profile is long enough (1421.90 s)

For good-quality measurements, the application profiling time should be greater than 10 seconds.

[ 3.00 / 3 ] Optimization level option is correctly used

[ 2.98 / 3 ] Most of the time spent in analyzed modules comes from functions compiled with -g and -fno-omit-frame-pointer

The -g option gives access to debugging information, such as source locations. The -fno-omit-frame-pointer option improves the accuracy of the call chains collected during application profiling.

[ 2.98 / 3 ] Architecture specific option -mcpu is used

[ 2 / 2 ] Application is correctly profiled ("Others" category represents 2.12 % of the execution time)

For representative profiling, the "Others" category should account for less than 20% of the execution time, so that as much of the user code as possible is analyzed.

[ 0 / 0 ] Fastmath not used

Consider adding -ffast-math to the compilation flags (or replacing -O3 with -Ofast) to unlock potential extra speedup by relaxing floating-point computation consistency. Warning: floating-point accuracy may be reduced and compliance with IEEE/ISO rules and specifications for math functions will be relaxed; typically, 'errno' will no longer be set after calling some math functions.

▶Strategizer

[ 4 / 4 ] A sufficient share of the experiment time is spent in analyzed loops (72.44%)

If the time spent in analyzed loops is less than 30%, standard loop optimizations will have a limited impact on application performance.

[ 4 / 4 ] Loop profile is not flat

At least one loop has a coverage greater than 4% (9.71%), which makes it a hotspot for the application.

[ 4 / 4 ] A sufficient share of the experiment time is spent in analyzed innermost loops (67.70%)

If the time spent in analyzed innermost loops is less than 15%, standard innermost loop optimizations such as vectorization will have a limited impact on application performance.

[ 3 / 3 ] Less than 10% (0.00%) is spent in BLAS1 operations

It could be more efficient to inline BLAS1 operations by hand.

[ 3 / 3 ] Cumulative Outermost/In between loops coverage (4.74%) lower than cumulative innermost loop coverage (67.70%)

If the cumulative outermost/in-between loop coverage is greater than the cumulative innermost loop coverage, loop optimization becomes more complex.

[ 2 / 2 ] Less than 10% (1.18%) is spent in Libm/SVML (special functions)

[ 2 / 2 ] Less than 10% (0.00%) is spent in BLAS2 operations

BLAS2 calls usually make poor use of the cache and could benefit from inlining.

▼Optimizer

Loop ID	Analysis	Penalty Score
►Loop 6195 - engine_linuxa64_gf_ompi+	Execution Time: 9 % - Vectorization Ratio: 71.43 % - Vector Length Use: 42.86 %
►Loop Computation Issues+		4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		6
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 3 issues ( = data accesses) costing 2 point each.	6
►Vectorization Roadblocks+		6
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 3 issues ( = data accesses) costing 2 point each.	6
►Loop 10581 - engine_linuxa64_gf_ompi+	Execution Time: 7 % - Vectorization Ratio: 46.15 % - Vector Length Use: 40.38 %
►Loop Computation Issues+		2
○	[SA] Presence of a large number of scalar integer instructions - Simplify loop structure, perform loop splitting or perform unroll and jam. This issue costs 2 points.	2
►Data Access Issues+		68
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 32 issues ( = data accesses) costing 2 point each.	64
○	[SA] Presence of indirect accesses - Use array restructuring or gather instructions to lower the cost. There are 1 issues ( = indirect data accesses) costing 4 point each.	4
►Vectorization Roadblocks+		68
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 32 issues ( = data accesses) costing 2 point each.	64
○	[SA] Presence of indirect accesses - Use array restructuring or gather instructions to lower the cost. There are 1 issues ( = indirect data accesses) costing 4 point each.	4
►Loop 10282 - engine_linuxa64_gf_ompi+	Execution Time: 3 % - Vectorization Ratio: 1.37 % - Vector Length Use: 25.34 %
►Loop Computation Issues+		2
○	[SA] Presence of a large number of scalar integer instructions - Simplify loop structure, perform loop splitting or perform unroll and jam. This issue costs 2 points.	2
►Data Access Issues+		82
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 41 issues ( = data accesses) costing 2 point each.	82
►Vectorization Roadblocks+		82
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 41 issues ( = data accesses) costing 2 point each.	82
►Loop 6410 - engine_linuxa64_gf_ompi+	Execution Time: 1 % - Vectorization Ratio: 100.00 % - Vector Length Use: 63.16 %
►Loop Computation Issues+		4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		10
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 5 issues ( = data accesses) costing 2 point each.	10
►Vectorization Roadblocks+		10
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 5 issues ( = data accesses) costing 2 point each.	10
►Loop 29120 - engine_linuxa64_gf_ompi+	Execution Time: 1 % - Vectorization Ratio: 11.11 % - Vector Length Use: 27.08 %
►Loop Computation Issues+		6
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
○	[SA] Presence of a large number of scalar integer instructions - Simplify loop structure, perform loop splitting or perform unroll and jam. This issue costs 2 points.	2
►Control Flow Issues+		2
○	[SA] Several paths (2 paths) - Simplify control structure or force the compiler to use masked instructions. There are 2 issues ( = paths) costing 1 point each.	2
►Data Access Issues+		8
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 4 issues ( = data accesses) costing 2 point each.	8
►Vectorization Roadblocks+		10
○	[SA] Several paths (2 paths) - Simplify control structure or force the compiler to use masked instructions. There are 2 issues ( = paths) costing 1 point each.	2
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 4 issues ( = data accesses) costing 2 point each.	8
►Loop 10533 - engine_linuxa64_gf_ompi+	Execution Time: 1 % - Vectorization Ratio: 0.00 % - Vector Length Use: 25.00 %
►Loop Computation Issues+		4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		32
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 16 issues ( = data accesses) costing 2 point each.	32
►Vectorization Roadblocks+		32
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 16 issues ( = data accesses) costing 2 point each.	32
►Loop 37963 - engine_linuxa64_gf_ompi+	Execution Time: 1 % - Vectorization Ratio: 7.86 % - Vector Length Use: 26.23 %
►Loop Computation Issues+		116
○	[SA] Presence of expensive FP instructions - Perform hoisting, change algorithm, use SVML or proper numerical library or perform value profiling (count the number of distinct input values). There are 5 issues (= instructions) costing 4 points each.	20
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
○	[SA] Large loop body: over microp cache size - Perform loop splitting or reduce unrolling. There are 44 issues (= chunks of 50 instructions) costing 2 point each.	88
○	[SA] Presence of a large number of scalar integer instructions - Simplify loop structure, perform loop splitting or perform unroll and jam. This issue costs 2 points.	2
○	[SA] Bottleneck in the front end - If loop size is very small (rare occurrences), perform unroll and jam. If loop size is large, perform loop splitting. This issue costs 2 points.	2
►Control Flow Issues+		4
○	[SA] Presence of calls - Inline either by compiler or by hand and use SVML for libm calls. There are 2 issues (= calls) costing 1 point each.	2
○	[SA] Non innermost loop (InBetween) - Collapse loop with innermost ones. This issue costs 2 points.	2
►Vectorization Roadblocks+		1004
○	[SA] Presence of calls - Inline either by compiler or by hand and use SVML for libm calls. There are 2 issues (= calls) costing 1 point each.	2
○	[SA] Too many paths (at least 1000 paths) - Simplify control structure. There are at least 1000 issues ( = paths) costing 1 point.	1000
○	[SA] Non innermost loop (InBetween) - Collapse loop with innermost ones. This issue costs 2 points.	2
►Loop 10312 - engine_linuxa64_gf_ompi+	Execution Time: 1 % - Vectorization Ratio: 0.00 % - Vector Length Use: 25.00 %
►Control Flow Issues+		2
○	[SA] Several paths (2 paths) - Simplify control structure or force the compiler to use masked instructions. There are 2 issues ( = paths) costing 1 point each.	2
►Data Access Issues+		24
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 12 issues ( = data accesses) costing 2 point each.	24
►Vectorization Roadblocks+		26
○	[SA] Several paths (2 paths) - Simplify control structure or force the compiler to use masked instructions. There are 2 issues ( = paths) costing 1 point each.	2
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 12 issues ( = data accesses) costing 2 point each.	24
►Loop 10472 - engine_linuxa64_gf_ompi+	Execution Time: 1 % - Vectorization Ratio: 100.00 % - Vector Length Use: 50.00 %
►Loop Computation Issues+		52
○	[SA] Presence of expensive FP instructions - Perform hoisting, change algorithm, use SVML or proper numerical library or perform value profiling (count the number of distinct input values). There are 12 issues (= instructions) costing 4 points each.	48
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		18
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 9 issues ( = data accesses) costing 2 point each.	18
►Vectorization Roadblocks+		18
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 9 issues ( = data accesses) costing 2 point each.	18
►Loop 10349 - engine_linuxa64_gf_ompi+	Execution Time: 1 % - Vectorization Ratio: 100.00 % - Vector Length Use: 50.00 %
►Loop Computation Issues+		4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		86
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 43 issues ( = data accesses) costing 2 point each.	86
►Vectorization Roadblocks+		86
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 43 issues ( = data accesses) costing 2 point each.	86
►Loop 6475 - engine_linuxa64_gf_ompi+	Execution Time: 1 % - Vectorization Ratio: 100.00 % - Vector Length Use: 64.29 %
►Loop Computation Issues+		4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		6
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 3 issues ( = data accesses) costing 2 point each.	6
►Vectorization Roadblocks+		6
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 3 issues ( = data accesses) costing 2 point each.	6
►Loop 10457 - engine_linuxa64_gf_ompi+	Execution Time: 1 % - Vectorization Ratio: 100.00 % - Vector Length Use: 50.00 %
►Loop Computation Issues+		4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		34
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 17 issues ( = data accesses) costing 2 point each.	34
►Vectorization Roadblocks+		34
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 17 issues ( = data accesses) costing 2 point each.	34
►Loop 10304 - engine_linuxa64_gf_ompi+	Execution Time: 1 % - Vectorization Ratio: 100.00 % - Vector Length Use: 50.00 %
►Loop Computation Issues+		4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		82
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 41 issues ( = data accesses) costing 2 point each.	82
►Vectorization Roadblocks+		82
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 41 issues ( = data accesses) costing 2 point each.	82
►Loop 29119 - engine_linuxa64_gf_ompi+	Execution Time: 1 % - Vectorization Ratio: 11.11 % - Vector Length Use: 25.14 %
►Loop Computation Issues+		6
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
○	[SA] Presence of a large number of scalar integer instructions - Simplify loop structure, perform loop splitting or perform unroll and jam. This issue costs 2 points.	2
►Control Flow Issues+		2
○	[SA] Several paths (2 paths) - Simplify control structure or force the compiler to use masked instructions. There are 2 issues ( = paths) costing 1 point each.	2
►Data Access Issues+		8
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 4 issues ( = data accesses) costing 2 point each.	8
►Vectorization Roadblocks+		10
○	[SA] Several paths (2 paths) - Simplify control structure or force the compiler to use masked instructions. There are 2 issues ( = paths) costing 1 point each.	2
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 4 issues ( = data accesses) costing 2 point each.	8
►Loop 29118 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 11.11 % - Vector Length Use: 27.08 %
►Loop Computation Issues+		6
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
○	[SA] Presence of a large number of scalar integer instructions - Simplify loop structure, perform loop splitting or perform unroll and jam. This issue costs 2 points.	2
►Control Flow Issues+		2
○	[SA] Several paths (2 paths) - Simplify control structure or force the compiler to use masked instructions. There are 2 issues ( = paths) costing 1 point each.	2
►Data Access Issues+		10
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 5 issues ( = data accesses) costing 2 point each.	10
►Vectorization Roadblocks+		12
○	[SA] Several paths (2 paths) - Simplify control structure or force the compiler to use masked instructions. There are 2 issues ( = paths) costing 1 point each.	2
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 5 issues ( = data accesses) costing 2 point each.	10
►Loop 10347 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 50.00 %
►Loop Computation Issues+		4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		72
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 36 issues ( = data accesses) costing 2 point each.	72
►Vectorization Roadblocks+		72
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 36 issues ( = data accesses) costing 2 point each.	72
►Loop 38138 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 50.00 %
►Loop Computation Issues+		4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		22
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 11 issues ( = data accesses) costing 2 point each.	22
►Vectorization Roadblocks+		22
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 11 issues ( = data accesses) costing 2 point each.	22
►Loop 31053 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 50.00 %
►Loop Computation Issues+		20
○	[SA] Presence of expensive FP instructions - Perform hoisting, change algorithm, use SVML or proper numerical library or perform value profiling (count the number of distinct input values). There are 4 issues (= instructions) costing 4 points each.	16
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		10
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 5 issues ( = data accesses) costing 2 point each.	10
►Vectorization Roadblocks+		10
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 5 issues ( = data accesses) costing 2 point each.	10
►Loop 38106 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 50.00 % - Vector Length Use: 37.50 %
►Loop Computation Issues+		4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		32
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 16 issues ( = data accesses) costing 2 point each.	32
►Vectorization Roadblocks+		32
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 16 issues ( = data accesses) costing 2 point each.	32
►Loop 6477 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 64.29 %
►Loop Computation Issues+		4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		6
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 3 issues ( = data accesses) costing 2 point each.	6
►Vectorization Roadblocks+		6
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 3 issues ( = data accesses) costing 2 point each.	6
►Loop 30872 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 0.00 % - Vector Length Use: 25.00 %
►Loop Computation Issues+		18
○	[SA] Presence of expensive FP instructions - Perform hoisting, change algorithm, use SVML or proper numerical library or perform value profiling (count the number of distinct input values). There are 3 issues (= instructions) costing 4 points each.	12
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
○	[SA] Presence of a large number of scalar integer instructions - Simplify loop structure, perform loop splitting or perform unroll and jam. This issue costs 2 points.	2
►Control Flow Issues+		3
○	[SA] Several paths (3 paths) - Simplify control structure or force the compiler to use masked instructions. There are 3 issues ( = paths) costing 1 point each.	3
►Data Access Issues+		22
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 11 issues ( = data accesses) costing 2 point each.	22
►Vectorization Roadblocks+		25
○	[SA] Several paths (3 paths) - Simplify control structure or force the compiler to use masked instructions. There are 3 issues ( = paths) costing 1 point each.	3
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 11 issues ( = data accesses) costing 2 point each.	22
►Loop 30875 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 0.00 % - Vector Length Use: 25.00 %
►Loop Computation Issues+		6
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
○	[SA] Presence of a large number of scalar integer instructions - Simplify loop structure, perform loop splitting or perform unroll and jam. This issue costs 2 points.	2
►Control Flow Issues+		1
○	[SA] Presence of calls - Inline either by compiler or by hand and use SVML for libm calls. There are 1 issues (= calls) costing 1 point each.	1
►Vectorization Roadblocks+		1001
○	[SA] Presence of calls - Inline either by compiler or by hand and use SVML for libm calls. There are 1 issues (= calls) costing 1 point each.	1
○	[SA] Too many paths (at least 1000 paths) - Simplify control structure. There are at least 1000 issues ( = paths) costing 1 point.	1000
►Loop 7665 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 27.38 % - Vector Length Use: 31.32 %
►Loop Computation Issues+		6
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
○	[SA] Presence of a large number of scalar integer instructions - Simplify loop structure, perform loop splitting or perform unroll and jam. This issue costs 2 points.	2
►Control Flow Issues+		3
○	[SA] Several paths (3 paths) - Simplify control structure or force the compiler to use masked instructions. There are 3 issues ( = paths) costing 1 point each.	3
►Data Access Issues+		18
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 7 issues ( = data accesses) costing 2 point each.	14
○	[SA] Presence of indirect accesses - Use array restructuring or gather instructions to lower the cost. There are 1 issues ( = indirect data accesses) costing 4 point each.	4
►Vectorization Roadblocks+		21
○	[SA] Several paths (3 paths) - Simplify control structure or force the compiler to use masked instructions. There are 3 issues ( = paths) costing 1 point each.	3
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 7 issues ( = data accesses) costing 2 point each.	14
○	[SA] Presence of indirect accesses - Use array restructuring or gather instructions to lower the cost. There are 1 issues ( = indirect data accesses) costing 4 point each.	4
►Loop 10283 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 50.00 %
►Data Access Issues+		8
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 4 issues ( = data accesses) costing 2 point each.	8
►Vectorization Roadblocks+		8
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 4 issues ( = data accesses) costing 2 point each.	8
►Loop 30878 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 50.00 %
►Loop Computation Issues+		4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		22
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 11 issues ( = data accesses) costing 2 point each.	22
►Vectorization Roadblocks+		22
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 11 issues ( = data accesses) costing 2 point each.	22
►Loop 5672 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 60.00 % - Vector Length Use: 40.00 %
►Loop Computation Issues+		10
○	[SA] Presence of expensive FP instructions - Perform hoisting, change algorithm, use SVML or proper numerical library or perform value profiling (count the number of distinct input values). There are 1 issues (= instructions) costing 4 points each.	4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
○	[SA] Presence of a large number of scalar integer instructions - Simplify loop structure, perform loop splitting or perform unroll and jam. This issue costs 2 points.	2
►Control Flow Issues+		2
○	[SA] Non innermost loop (InBetween) - Collapse loop with innermost ones. This issue costs 2 points.	2
►Vectorization Roadblocks+		1002
○	[SA] Too many paths (at least 1000 paths) - Simplify control structure. There are at least 1000 issues ( = paths) costing 1 point.	1000
○	[SA] Non innermost loop (InBetween) - Collapse loop with innermost ones. This issue costs 2 points.	2
►Loop 5677 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 25.00 % - Vector Length Use: 31.25 %
►Loop Computation Issues+		10
○	[SA] Presence of expensive FP instructions - Perform hoisting, change algorithm, use SVML or proper numerical library or perform value profiling (count the number of distinct input values). There are 1 issues (= instructions) costing 4 points each.	4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
○	[SA] Presence of a large number of scalar integer instructions - Simplify loop structure, perform loop splitting or perform unroll and jam. This issue costs 2 points.	2
►Control Flow Issues+		2
○	[SA] Several paths (2 paths) - Simplify control structure or force the compiler to use masked instructions. There are 2 issues ( = paths) costing 1 point each.	2
►Data Access Issues+		6
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 3 issues ( = data accesses) costing 2 point each.	6
►Vectorization Roadblocks+		8
○	[SA] Several paths (2 paths) - Simplify control structure or force the compiler to use masked instructions. There are 2 issues ( = paths) costing 1 point each.	2
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 3 issues ( = data accesses) costing 2 point each.	6
►Loop 31047 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 0.00 % - Vector Length Use: 22.22 %
►Loop Computation Issues+		4
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Control Flow Issues+		10
○	[SA] Too many paths (6 paths) - Simplify control structure. There are 6 issues ( = paths) costing 1 point each with a malus of 4 points.	10
►Vectorization Roadblocks+		10
○	[SA] Too many paths (6 paths) - Simplify control structure. There are 6 issues ( = paths) costing 1 point each with a malus of 4 points.	10
►Loop 30849 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 8.20 % - Vector Length Use: 23.70 %
►Loop Computation Issues+		18
○	[SA] Presence of expensive FP instructions - Perform hoisting, change algorithm, use SVML or proper numerical library or perform value profiling (count the number of distinct input values). There are 3 issues (= instructions) costing 4 points each.	12
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
○	[SA] Presence of a large number of scalar integer instructions - Simplify loop structure, perform loop splitting or perform unroll and jam. This issue costs 2 points.	2
►Control Flow Issues+		3
○	[SA] Presence of calls - Inline either by compiler or by hand and use SVML for libm calls. There are 1 issues (= calls) costing 1 point each.	1
○	[SA] Non innermost loop (InBetween) - Collapse loop with innermost ones. This issue costs 2 points.	2
►Vectorization Roadblocks+		1003
○	[SA] Presence of calls - Inline either by compiler or by hand and use SVML for libm calls. There are 1 issues (= calls) costing 1 point each.	1
○	[SA] Too many paths (at least 1000 paths) - Simplify control structure. There are at least 1000 issues ( = paths) costing 1 point.	1000
○	[SA] Non innermost loop (InBetween) - Collapse loop with innermost ones. This issue costs 2 points.	2
►Loop 10302 - engine_linuxa64_gf_ompi+	Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 50.00 %
►Loop Computation Issues+		20
○	[SA] Presence of expensive FP instructions - Perform hoisting, change algorithm, use SVML or proper numerical library or perform value profiling (count the number of distinct input values). There are 4 issues (= instructions) costing 4 points each.	16
○	[SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA - Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.	4
►Data Access Issues+		40
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 20 issues ( = data accesses) costing 2 point each.	40
►Vectorization Roadblocks+		40
○	[SA] Presence of constant non unit stride data access - Use array restructuring, perform loop interchange or use gather instructions to lower a bit the cost. There are 20 issues ( = data accesses) costing 2 point each.	40
