| Function: _intel_fast_memset | Module: attention-gnr-256 | Source: :0-0 | Coverage (incl. loops): 0.05% | Coverage (excl. loops): 0.05% |
|---|---|---|---|---|
*** This panel is intentionally left blank because the given object lacks debug symbols. ***

| Address | Instruction |
|---|---|
| 0x4097a0 | ENDBR64 |
| 0x4097a4 | MOV 0xee2d(%RIP),%RAX |
| 0x4097ab | TEST %RAX,%RAX |
| 0x4097ae | JE 4097c0 |
| 0x4097b4 | JMP %RAX |
| 0x4097b6 | NOPW %CS:(%RAX,%RAX,1) |
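
The listing above loads a pointer, tests it for zero, and either branches to 0x4097c0 (labelled `__real_memset_impl_setup` in the CQA instruction table) or jumps through the pointer. A minimal C sketch of that pattern follows, assuming the loaded slot caches a previously selected memset implementation. `selected_impl`, `memset_setup`, and `fast_memset_sketch` are hypothetical names, and the selection logic inside the real setup path is not visible in this report.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is an illustration, not the runtime's source. */
typedef void *(*memset_fn)(void *, int, size_t);

/* Cached pointer to the chosen implementation (the slot read by the MOV). */
static memset_fn selected_impl = NULL;

/* Counts how often the slow setup path runs (for illustration only). */
static int setup_calls = 0;

/* Stand-in for the setup path reached by the JE: choose an implementation,
   cache it, then perform the requested call. */
static void *memset_setup(void *dst, int c, size_t n) {
    setup_calls++;
    selected_impl = memset; /* assumption: a real runtime may pick by CPU features */
    return selected_impl(dst, c, n);
}

/* Mirrors the stub: load the pointer, test for NULL, tail-call one of two targets. */
void *fast_memset_sketch(void *dst, int c, size_t n) {
    memset_fn f = selected_impl;        /* MOV 0xee2d(%RIP),%RAX */
    if (f == NULL)                      /* TEST %RAX,%RAX ; JE   */
        return memset_setup(dst, c, n);
    return f(dst, c, n);                /* JMP %RAX              */
}
```

Under this assumption, only the first call pays for setup; later calls cost one load, one test, and one indirect jump, which is in line with the five-instruction path reported here.
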
| Coverage (%) | Name | Source Location | Module |
|---|---|---|---|
| 100.00 | main | attention_v2.cpp:59 | attention-gnr-256 |
The code analyzed by CQA in that panel excludes loops and represents 0.05% of application time for run run_0
| Metric | Value |
|---|---|
| Source file and lines | |
| Module | attention-gnr-256 |
| nb instructions | 5 |
| nb uops | 4 |
| loop length | 22 bytes |
| used x86 registers | 1 |
| used mmx registers | 0 |
| used xmm registers | 0 |
| used ymm registers | 0 |
| used zmm registers | 0 |
| nb stack references | 0 |
| micro-operation queue | 0.67 cycles |
| front end | 0.67 cycles |

| | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| uops | 1.00 | 0.40 | 0.33 | 0.33 | 0.00 | 0.40 | 1.00 | 0.00 | 0.00 | 0.00 | 0.20 | 0.33 |
| cycles | 1.00 | 0.40 | 0.33 | 0.33 | 0.00 | 0.40 | 1.00 | 0.00 | 0.00 | 0.00 | 0.20 | 0.33 |

| Bound | Cycles |
|---|---|
| Cycles executing div or sqrt instructions | NA |
| Front-end | 0.67 |
| Dispatch | 1.00 |
| Overall L1 | 1.00 |
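
The figures above are consistent with a simple reading: the dispatch figure equals the cycle count of the busiest port (P0 and P6 at 1.00), and the overall L1 figure equals the larger of front-end and dispatch. A small C sketch of that arithmetic follows, assuming this max-based rule. It matches the values in this report but is not taken from the tool's source, and `max_of` and `l1_bound` are hypothetical helper names.

```c
#include <stddef.h>

/* Largest value in an array of per-port cycle counts (0.0 for an empty array). */
double max_of(const double *v, size_t n) {
    double best = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > best)
            best = v[i];
    }
    return best;
}

/* Assumed rule: the L1 bound is the larger of the front-end cycles and the
   dispatch cycles, where dispatch is the busiest execution port. */
double l1_bound(double front_end, const double *port_cycles, size_t n) {
    double dispatch = max_of(port_cycles, n);
    return front_end > dispatch ? front_end : dispatch;
}
```

With the per-port cycles from the table (P0 to P11) and a front-end of 0.67, this gives a dispatch value of 1.00 and an overall bound of 1.00, so the path is dispatch-bound on P0 and P6 rather than front-end bound.
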
| Instruction | Nb FU | P0 | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | Latency | Recip. throughput | Vectorization |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENDBR64 | N/A | | | | | | | | | | | | | | | |
| MOV 0xee2d(%RIP),%RAX | 1 | 0 | 0 | 0.33 | 0.33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.33 | 1 | 0.33 | N/A |
| TEST %RAX,%RAX | 1 | 0.20 | 0.20 | 0 | 0 | 0 | 0.20 | 0.20 | 0 | 0 | 0 | 0.20 | 0 | 2 | 0.20 | N/A |
| JE 4097c0 <__real_memset_impl_setup> | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 0.50 | N/A |
| JMP %RAX | 1 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0.50 | 0 | 0 | 0 | 0 | 0 | 0 | 1.75 | N/A |
| Name | Coverage (%) | Time (s) |
|---|---|---|
| _intel_fast_memset | 0.05 | 0.01 |
