- r_1 - engine_NEON1M11-0001_o52_m1 - 10 analyzed loop(s)
- r_2 - engine_NEON1M11-0001_o52_m1_ifort - 10 analyzed loop(s)
| Category | Analysis | Count | Percentage | Weighted Count |
|---|---|---|---|---|
| Loop Computation Issues | | 15 | | |
| | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 11 | 55.00 | 0.98 |
| | Presence of a large number of scalar integer instructions | 4 | 20.00 | 0.70 |
| Control Flow Issues | | 8 | | |
| | Presence of calls | 4 | 20.00 | 0.22 |
| | Presence of 2 to 4 paths | 2 | 10.00 | 0.06 |
| | Presence of more than 4 paths | 2 | 10.00 | 0.04 |
| Data Access Issues | | 19 | | |
| | More than 20% of the loads are accessing the stack | 5 | 25.00 | 0.16 |
| | Presence of indirect access | 4 | 20.00 | 0.10 |
| | Presence of special instructions executing on a single port | 3 | 15.00 | 0.07 |
| | Presence of constant non-unit stride data access | 3 | 15.00 | 0.07 |
| | More than 10% of the vector loads instructions are unaligned | 3 | 15.00 | 0.07 |
| | Presence of expensive instructions: scatter/gather | 1 | 5.00 | 0.03 |
| Vectorization Roadblocks | | 17 | | |
| | Presence of calls | 4 | 20.00 | 0.22 |
| | Presence of indirect access | 4 | 20.00 | 0.10 |
| | Presence of more than 4 paths | 4 | 20.00 | 0.22 |
| | Presence of constant non-unit stride data access | 3 | 15.00 | 0.07 |
| | Presence of 2 to 4 paths | 2 | 10.00 | 0.06 |
| Inefficient Vectorization | | 4 | | |
| | Presence of special instructions executing on a single port | 3 | 15.00 | 0.07 |
| | Presence of expensive instructions: scatter/gather | 1 | 5.00 | 0.03 |
| Category | Analysis | r_1 | r_2 |
|---|---|---|---|
| Loop Computation Issues | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 6 | 5 |
| | Presence of a large number of scalar integer instructions | 1 | 3 |
| Control Flow Issues | Presence of calls | 2 | 2 |
| | Presence of 2 to 4 paths | 1 | 1 |
| | Presence of more than 4 paths | 1 | 1 |
| Data Access Issues | Presence of constant non-unit stride data access | 1 | 2 |
| | Presence of indirect access | 2 | 2 |
| | More than 10% of the vector loads instructions are unaligned | 3 | 0 |
| | Presence of expensive instructions: scatter/gather | 0 | 1 |
| | Presence of special instructions executing on a single port | 3 | 0 |
| | More than 20% of the loads are accessing the stack | 2 | 3 |
| Vectorization Roadblocks | Presence of calls | 2 | 2 |
| | Presence of 2 to 4 paths | 1 | 1 |
| | Presence of more than 4 paths | 2 | 2 |
| | Presence of constant non-unit stride data access | 1 | 2 |
| | Presence of indirect access | 2 | 2 |
| Inefficient Vectorization | Presence of expensive instructions: scatter/gather | 0 | 1 |
| | Presence of special instructions executing on a single port | 3 | 0 |
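
The figures in the two tables appear to be related in a simple way: each aggregated Count matches the sum of the r_1 and r_2 counts, and each Percentage matches the Count divided by the 20 analyzed loops (10 per run). The report itself does not state these relations, so the sketch below treats them as assumptions and only checks them against a few rows copied from the tables. The names `ROWS`, `percentage`, and `check_rows` are hypothetical helpers for illustration, not part of the reporting tool. The Weighted Count column is left out because the report does not show how it is derived.

```python
# Assumption: the percentage base is the total number of analyzed loops,
# i.e. 10 loops for r_1 plus 10 loops for r_2, as listed at the top of the report.
TOTAL_LOOPS = 10 + 10

# A few rows copied from the tables:
# label -> (aggregated count, reported percentage, r_1 count, r_2 count)
ROWS = {
    "Less than 10% of FP ADD/SUB/MUL use FMA": (11, 55.00, 6, 5),
    "Large number of scalar integer instructions": (4, 20.00, 1, 3),
    "More than 20% of the loads access the stack": (5, 25.00, 2, 3),
    "Expensive instructions: scatter/gather": (1, 5.00, 0, 1),
}


def percentage(count, total=TOTAL_LOOPS):
    """Share of analyzed loops flagged with an issue, in percent."""
    return 100.0 * count / total


def check_rows(rows):
    """Return the labels of rows whose figures do not fit the assumed relations."""
    inconsistent = []
    for label, (count, reported_pct, r1, r2) in rows.items():
        sums_match = (r1 + r2 == count)
        pct_match = abs(percentage(count) - reported_pct) < 0.005
        if not (sums_match and pct_match):
            inconsistent.append(label)
    return inconsistent


if __name__ == "__main__":
    # An empty list means every sampled row fits both assumed relations.
    print(check_rows(ROWS))
```

If a future report yields a non-empty list, either the assumed base of 20 loops is wrong for that report or a row was transcribed incorrectly.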