Function: hypre_LowerBound | Module: exec | Source: binsearch.c:95-108 | Coverage: 0.01% |
---|
/scratch_na/users/xoserete/qaas_runs/171-587-0261/intel/AMG/build/AMG/AMG/utilities/binsearch.c: 95 - 108 |
-------------------------------------------------------------------------------- |
95: { |
96: HYPRE_Int *it; |
97: size_t count = last - first, step; |
98: |
99: while (count > 0) { |
100: it = first; step = count/2; it += step; |
101: if (*it < value) { |
102: first = ++it; |
103: count -= step + 1; |
104: } |
105: else count = step; |
106: } |
107: return first; |
108: } |
0x5aeb80 SUB %RDI,%RSI |
0x5aeb83 MOV %RDI,%RAX |
0x5aeb86 SAR $0x3,%RSI |
0x5aeb8a JMP 5aeba2 |
0x5aeb8c NOPL (%RAX) |
(3170) 0x5aeb90 MOV %RSI,%RCX |
(3170) 0x5aeb93 SHR $0x1,%RCX |
(3170) 0x5aeb96 LEA (%RAX,%RCX,8),%R8 |
(3170) 0x5aeb9a CMP %RDX,(%R8) |
(3170) 0x5aeb9d JL 5aebb0 |
(3170) 0x5aeb9f MOV %RCX,%RSI |
(3170) 0x5aeba2 TEST %RSI,%RSI |
(3170) 0x5aeba5 JNE 5aeb90 |
0x5aeba7 RET |
0x5aeba8 NOPL (%RAX,%RAX,1) |
(3170) 0x5aebb0 DEC %RSI |
(3170) 0x5aebb3 LEA 0x8(%R8),%RAX |
(3170) 0x5aebb7 SUB %RCX,%RSI |
(3170) 0x5aebba JMP 5aeba2 |
0x5aebbc NOPL (%RAX) |
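For reference, the listing and disassembly above can be read side by side using a standalone C sketch of the same lower-bound search. The function body follows source lines 95-108; the `typedef` is an assumption inferred from the disassembly (8-byte scaling in `SAR $0x3` and `LEA (%RAX,%RCX,8)`, signed compare via `JL`), and the names `lower_bound_sketch` and `lower_bound_index` are hypothetical, not part of hypre.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: HYPRE_Int is a signed 64-bit integer in this build,
   as suggested by the 8-byte element scaling in the disassembly. */
typedef long long HYPRE_Int;

/* Returns a pointer to the first element of the sorted range
   [first, last) that is not less than value, or last if none is. */
static HYPRE_Int *lower_bound_sketch(HYPRE_Int *first, HYPRE_Int *last,
                                     HYPRE_Int value)
{
    assert(first <= last);                 /* precondition added for the sketch */
    size_t count = (size_t)(last - first); /* SUB + SAR $0x3 in the prologue */

    while (count > 0) {                    /* TEST %RSI,%RSI / JNE */
        size_t step = count / 2;           /* SHR $0x1,%RCX */
        HYPRE_Int *it = first + step;      /* LEA (%RAX,%RCX,8),%R8 */
        if (*it < value) {                 /* CMP %RDX,(%R8) / JL */
            first = it + 1;                /* LEA 0x8(%R8),%RAX */
            count -= step + 1;             /* DEC %RSI / SUB %RCX,%RSI */
        } else {
            count = step;                  /* MOV %RCX,%RSI */
        }
    }
    return first;
}

/* Hypothetical convenience wrapper: same search, result as an index. */
static size_t lower_bound_index(HYPRE_Int *a, size_t n, HYPRE_Int value)
{
    return (size_t)(lower_bound_sketch(a, a + n, value) - a);
}
```

With the sorted array `{1, 3, 3, 7, 9}`, `lower_bound_index` gives 1 for value 3 (the first of the two 3s), 3 for value 4, and 5 (one past the end) for value 10.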
Coverage (%) | Name | Source Location | Module |
---|---|---|---|
85.71 | hypre_CSRMatrixMatvecOutOfPlac[...] | csr_matvec.c:247 | exec |
○ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
9.53 | hypre_CSRMatrixMatvecOutOfPlac[...] | csr_matvec.c:248 | exec |
○ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
4.76 | hypre_CSRMatrixTranspose._omp_[...] | csr_matop.c:471 | exec |
○ | gomp_thread_start | team.c:130 | libgomp.so.1.0.0 |
Path / |
Source file and lines | binsearch.c:95-108 |
Module | exec |
nb instructions | 8 |
nb uops | 8 |
loop length | 29 |
used x86 registers | 3 |
used mmx registers | 0 |
used xmm registers | 0 |
used ymm registers | 0 |
used zmm registers | 0 |
nb stack references | 0 |
micro-operation queue | 1.33 cycles |
front end | 1.33 cycles |
 | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
uops | 1.00 | 0.40 | 0.33 | 0.33 | 0.00 | 0.40 | 1.00 | 0.00 | 0.00 | 0.00 | 0.20 | 0.33 |
cycles | 1.00 | 0.40 | 0.33 | 0.33 | 0.00 | 0.40 | 1.00 | 0.00 | 0.00 | 0.00 | 0.20 | 0.33 |
Cycles executing div or sqrt instructions | NA |
FE+BE cycles | 2.04-3.04 |
Stall cycles | 0.00-0.86 |
Front-end | 1.33 |
Dispatch | 1.00 |
Overall L1 | 1.33 |
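The summary figures above are consistent with a simple bottleneck reading: Dispatch (1.00) equals the busiest port's cycle count (P0 and P6), and Overall L1 (1.33) equals the larger of Front-end and Dispatch. The sketch below only illustrates that arithmetic. It is an assumption about how the numbers relate, not the profiler's documented model, and `max_of` and `l1_bound` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Largest value in a non-empty array of per-port cycle counts. */
static double max_of(const double *v, size_t n)
{
    assert(n > 0);
    double m = v[0];
    for (size_t i = 1; i < n; ++i) {
        if (v[i] > m) {
            m = v[i];
        }
    }
    return m;
}

/* Hypothetical model: the L1 bound is the slower of the front end
   and the most heavily loaded execution port (dispatch). */
static double l1_bound(double front_end, const double *port_cycles,
                       size_t n_ports)
{
    double dispatch = max_of(port_cycles, n_ports);
    return front_end > dispatch ? front_end : dispatch;
}
```

Fed the twelve per-port cycle values from the table and a front-end cost of 1.33, this returns 1.33, so under this reading the front end rather than any execution port limits the analyzed instructions.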
Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | Latency | Recip. throughput |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SUB %RDI,%RSI | 1 | 0.20 | 0.20 | 0 | 0 | 0 | 0.20 | 0.20 | 0 | 0 | 0 | 0.20 | 0 | 1 | 0.20 |
MOV %RDI,%RAX | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.17 |
SAR $0x3,%RSI | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0-2 | 0.50 |
JMP 5aeba2 <hypre_LowerBound+0x22> | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.84 |
NOPL (%RAX) | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
RET | 1 | 0.50 | 0 | 0.33 | 0.33 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0.33 | 0 | 2.13 |
NOPL (%RAX,%RAX,1) | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
NOPL (%RAX) | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.17 |
Name | Coverage (%) | Time (s) |
---|---|---|
hypre_LowerBound | 0.01 | 0 |
○Loop 3170 - binsearch.c:99-105 - exec | 0.01 | 0.01 |