|
2 | 2 | Display Core Debug tools
|
3 | 3 | ========================
|
4 | 4 |
|
| 5 | +In this section, you will find helpful information on debugging the amdgpu |
| 6 | +driver from the display perspective. This page introduces debug mechanisms and |
| 7 | +procedures to help you identify if some issues are related to display code. |
| 8 | + |
| 9 | +Narrow down display issues |
| 10 | +========================== |
| 11 | + |
| 12 | +Since the display is the driver's visual component, it is common to see users |
| 13 | +reporting issues as display issues when another component causes the problem. This |
| 14 | +section equips users to determine if a specific issue was caused by the display |
| 15 | +component or another part of the driver. |
| 16 | + |
| 17 | +DC dmesg important messages |
| 18 | +--------------------------- |
| 19 | + |
| 20 | +The dmesg log is the first source of information to be checked, and amdgpu |
| 21 | +takes advantage of this feature by logging some valuable information. When |
| 22 | +looking for the issues associated with amdgpu, remember that each component of |
| 23 | +the driver (e.g., smu, PSP, dm, etc.) is loaded one by one, and this |
| 24 | +information can be found in the dmesg log. Look for the part of |
| 25 | +the log that looks like the snippet below:: |
| 26 | + |
| 27 | + [ 4.254295] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x744C 0x1002:0x0E3B 0xC8). |
| 28 | + [ 4.254718] [drm] register mmio base: 0xFCB00000 |
| 29 | + [ 4.254918] [drm] register mmio size: 1048576 |
| 30 | + [ 4.260095] [drm] add ip block number 0 <soc21_common> |
| 31 | + [ 4.260318] [drm] add ip block number 1 <gmc_v11_0> |
| 32 | + [ 4.260510] [drm] add ip block number 2 <ih_v6_0> |
| 33 | + [ 4.260696] [drm] add ip block number 3 <psp> |
| 34 | + [ 4.260878] [drm] add ip block number 4 <smu> |
| 35 | + [ 4.261057] [drm] add ip block number 5 <dm> |
| 36 | + [ 4.261231] [drm] add ip block number 6 <gfx_v11_0> |
| 37 | + [ 4.261402] [drm] add ip block number 7 <sdma_v6_0> |
| 38 | + [ 4.261568] [drm] add ip block number 8 <vcn_v4_0> |
| 39 | + [ 4.261729] [drm] add ip block number 9 <jpeg_v4_0> |
| 40 | + [ 4.261887] [drm] add ip block number 10 <mes_v11_0> |
| 41 | + |
| 42 | +From the above example, you can see the line that reports that `<dm>` |
| 43 | +(**Display Manager**) was loaded, which means that display can be part of the |
| 44 | +issue. If you do not see that line, something else might have failed before |
| 45 | +amdgpu loads the display component, indicating that we don't have a |
| 46 | +display issue. |
| 47 | + |
| 48 | +After you have confirmed that the DM was loaded correctly, you can check for the |
| 49 | +display version of the hardware in use, which can be retrieved from the dmesg |
| 50 | +log with the command:: |
| 51 | + |
| 52 | + dmesg | grep -i 'display core' |
| 53 | + |
| 54 | +This command shows a message that looks like this:: |
| 55 | + |
| 56 | + [ 4.655828] [drm] Display Core v3.2.285 initialized on DCN 3.2 |
| 57 | + |
| 58 | +This message has two key pieces of information: |
| 59 | + |
| 60 | +* **The DC version (e.g., v3.2.285)**: Display developers release a new DC version |
| 61 | + every week, and this information can be advantageous in a situation where a |
| 62 | + user/developer must find a good point versus a bad point based on a tested |
| 63 | + version of the display code. Remember from page :ref:`Display Core <amdgpu-display-core>`, |
| 64 | + that every week the new patches for display are heavily tested with IGT and |
| 65 | + manual tests. |
| 66 | +* **The DCN version (e.g., DCN 3.2)**: The DCN block is associated with the |
| 67 | + hardware generation, and the DCN version conveys the hardware generation that |
| 68 | + the driver is currently running. This information helps to narrow down the |
| 69 | + code debug area since each DCN version has its files in the DC folder per DCN |
| 70 | + component (from the example, the developer might want to focus on the |
| 71 | + files/folders/functions/structs with the dcn32 label, since they might be executed). |
| 72 | + However, keep in mind that DC reuses code across different DCN versions; for |
| 73 | + example, it is expected to have some callbacks set in one DCN that are the same |
| 74 | + as those from another DCN. In summary, use the DCN version just as a guide. |
| 75 | + |
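When collecting this information for a bug report, it can be convenient to pull both values out of the log in one step. The following is a minimal sketch, assuming the message format shown in the example above; `parse_dc_versions` is a hypothetical helper name, not something provided by the driver.

```shell
# Print "<DC version> <DCE/DCN version>" from a dmesg line such as:
#   [drm] Display Core v3.2.285 initialized on DCN 3.2
# parse_dc_versions is a hypothetical helper; it reads the log on stdin and
# prints nothing if no line matches the assumed message format.
parse_dc_versions() {
    sed -n 's/.*Display Core v\([0-9.]*\) initialized on \(DC[EN] [0-9.]*\).*/\1 \2/p'
}

# Usage: dmesg | parse_dc_versions
```

If the message format differs on your kernel, the helper prints nothing, and the plain `grep` shown earlier remains the reference.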
| 76 | +From the dmesg file, it is also possible to get the ATOM bios code by using:: |
| 77 | + |
| 78 | + dmesg | grep -i 'ATOM BIOS' |
| 79 | + |
| 80 | +Which generates an output that looks like this:: |
| 81 | + |
| 82 | + [ 4.274534] amdgpu: ATOM BIOS: 113-D7020100-102 |
| 83 | + |
| 84 | +This type of information is useful to include in bug reports. |
| 85 | + |
| 86 | +Avoid loading display core |
| 87 | +-------------------------- |
| 88 | + |
| 89 | +Sometimes, it might be hard to figure out which part of the driver is causing |
| 90 | +the issue; if you suspect that the display is not part of the problem and your |
| 91 | +bug scenario is simple (e.g., some desktop configuration) you can try to remove |
| 92 | +the display component from the equation. First, you need to identify the `dm` ID |
| 93 | +from the dmesg log; for example, search for the following log:: |
| 94 | + |
| 95 | + [ 4.254295] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x744C 0x1002:0x0E3B 0xC8). |
| 96 | + [..] |
| 97 | + [ 4.260095] [drm] add ip block number 0 <soc21_common> |
| 98 | + [ 4.260318] [drm] add ip block number 1 <gmc_v11_0> |
| 99 | + [..] |
| 100 | + [ 4.261057] [drm] add ip block number 5 <dm> |
| 101 | + |
| 102 | +Notice from the above example that the `dm` id is 5 for this specific hardware. |
| 103 | +Next, you need to run the following binary operation to identify the IP block |
| 104 | +mask:: |
| 105 | + |
| 106 | + 0xffffffff & ~(1 << [DM ID]) |
| 107 | + |
| 108 | +From our example the IP mask is:: |
| 109 | + |
| 110 | + 0xffffffff & ~(1 << 5) = 0xffffffdf |
| 111 | + |
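The two steps above (finding the `dm` ID and computing the mask) can be sketched in shell. This is only an illustration that assumes the `add ip block number N <dm>` log format shown in the example; `find_dm_id` and `ip_block_mask` are hypothetical helper names.

```shell
# Extract the dm IP block number from a dmesg log read on stdin.
# find_dm_id is a hypothetical helper; it prints the first match only.
find_dm_id() {
    sed -n 's/.*add ip block number \([0-9][0-9]*\) <dm>.*/\1/p' | head -n 1
}

# Compute 0xffffffff & ~(1 << ID) and print the result in hexadecimal.
# ip_block_mask is a hypothetical helper; $1 is the IP block number.
ip_block_mask() {
    printf '0x%x\n' $(( 0xffffffff & ~(1 << $1) ))
}

# Usage: ip_block_mask "$(dmesg | find_dm_id)"
```

With the log from the example, where the `dm` ID is 5, the helper prints `0xffffffdf`, which matches the value computed by hand.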
| 112 | +Finally, to disable DC, you just need to add the below parameter to the kernel |
| 113 | +command line in your bootloader:: |
| 114 | + |
| 115 | + amdgpu.ip_block_mask=0xffffffdf |
| 116 | + |
| 117 | +If you can boot your system with the DC disabled and still see the issue, it |
| 118 | +means you can rule DC out of the equation. However, if the bug disappears, you |
| 119 | +still need to consider DC as part of the problem and keep narrowing down the |
| 120 | +issue. In some scenarios, disabling DC is impossible since it might be |
| 121 | +necessary to use the display component to reproduce the issue (e.g., play a |
| 122 | +game). |
| 123 | + |
| 124 | +**Note: This will probably lead to the absence of a display output.** |
| 125 | + |
| 126 | +Display flickering |
| 127 | +------------------ |
| 128 | + |
| 129 | +Display flickering might have multiple causes; one is the lack of proper power |
| 130 | +to the GPU or problems in the DPM switches. A good first generic check |
| 131 | +is to force the GPU to its highest performance level:: |
| 132 | + |
| 133 | + bash -c "echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level" |
| 134 | + |
| 135 | +The above command sets the GPU/APU to use the maximum power allowed which |
| 136 | +disables DPM switches. If forcing DPM levels high does not fix the issue, it |
| 137 | +is less likely that the issue is related to power management. If the issue |
| 138 | +disappears, there is a good chance that other components might be involved, and |
| 139 | +the display should not be ignored since this could be a DPM issue. From the |
| 140 | +display side, if the power increase fixes the issue, it is worth debugging the |
| 141 | +clock configuration and the pipe split policy used in the specific |
| 142 | +configuration. |
| 143 | + |
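Since forcing the performance level changes the power behavior of the whole GPU, it is usually worth restoring the previous setting after the test. A minimal sketch of that workflow follows; `force_dpm_level` is a hypothetical helper, and the `card0` path in the usage comment is an assumption that may not match your system.

```shell
# force_dpm_level is a hypothetical helper: it writes LEVEL into the sysfs
# file FILE and prints the previous value so that it can be restored later.
force_dpm_level() {
    file=$1
    level=$2
    previous=$(cat "$file") || return 1
    printf '%s\n' "$level" > "$file" || return 1
    printf '%s\n' "$previous"
}

# Usage (as root; card0 is an assumption, your GPU may be another card):
#   f=/sys/class/drm/card0/device/power_dpm_force_performance_level
#   old=$(force_dpm_level "$f" high)
#   ... try to reproduce the flickering ...
#   force_dpm_level "$f" "$old" > /dev/null
```

Taking the file path as an argument keeps the helper independent of the card index.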
| 144 | +Display artifacts |
| 145 | +----------------- |
| 146 | + |
| 147 | +Users may see some screen artifacts that can be categorized into two different |
| 148 | +types: localized artifacts and general artifacts. The localized artifacts |
| 149 | +happen in some specific areas, such as around the UI window corners; if you see |
| 150 | +this type of issue, there is a considerable chance that you have a userspace |
| 151 | +problem, likely Mesa or similar. The general artifacts usually happen on the |
| 152 | +entire screen. They might be caused by a misconfiguration at the driver level |
| 153 | +of the display parameters, but the userspace might also cause this issue. One |
| 154 | +way to identify the source of the problem is to take a screenshot or make a |
| 155 | +desktop video capture when the problem happens; after checking the |
| 156 | +screenshot/video recording, if you don't see any of the artifacts, it means |
| 157 | +that the issue is likely on the driver side. If you can still see the |
| 158 | +problem in the data collected, it is an issue that probably happened during |
| 159 | +rendering, and the display code just got the framebuffer already corrupted. |
| 160 | + |
| 161 | +Disabling/Enabling specific features |
| 162 | +==================================== |
| 163 | + |
| 164 | +DC has a struct named `dc_debug_options`, which is statically initialized by |
| 165 | +all DCE/DCN components based on the specific hardware characteristic. This |
| 166 | +structure usually facilitates the bring-up phase since developers can start |
| 167 | +with many disabled features and enable them individually. This is also an |
| 168 | +important debug feature since users can change it when debugging specific |
| 169 | +issues. |
| 170 | + |
| 171 | +For example, dGPU users sometimes see a problem where a horizontal strip of |
| 172 | +flickering happens in some specific part of the screen. This could be an |
| 173 | +indication of Sub-Viewport issues; after the users identified the target DCN, |
| 174 | +they can set the `force_disable_subvp` field to true in the statically |
| 175 | +initialized version of `dc_debug_options` to see if the issue gets fixed. Along |
| 176 | +the same lines, users/developers can also try to turn off `fams2_config` and |
| 177 | +`enable_single_display_2to1_odm_policy`. In summary, `dc_debug_options` is |
| 178 | +a useful tool for narrowing down the problem. |
| 179 | + |
5 | 180 | DC Visual Confirmation
|
6 | 181 | ======================
|
7 | 182 |
|
@@ -76,6 +251,18 @@ change in real-time by using something like::
|
76 | 251 | When reporting a bug related to DC, consider attaching this log before and
|
77 | 252 | after you reproduce the bug.
|
78 | 253 |
|
| 254 | +Collect Firmware information |
| 255 | +============================ |
| 256 | + |
| 257 | +When reporting issues, it is important to have the firmware information since |
| 258 | +it can be helpful for debugging purposes. To get all the firmware information, |
| 259 | +use the command:: |
| 260 | + |
| 261 | + cat /sys/kernel/debug/dri/0/amdgpu_firmware_info |
| 262 | + |
| 263 | +From the display perspective, pay attention to the firmware of the DMCU and |
| 264 | +DMCUB. |
| 265 | + |
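To keep only the display-related entries when attaching this information to a report, the output can be filtered. This is a small sketch that simply matches the DMCU and DMCUB names mentioned above, without assuming anything else about the file layout; `display_firmware` is a hypothetical helper name.

```shell
# display_firmware is a hypothetical helper that keeps only the lines that
# mention DMCU or DMCUB (case-insensitive) from the report read on stdin.
display_firmware() {
    grep -i 'dmcu'
}

# Usage (as root; the DRI index 0 is an assumption):
#   display_firmware < /sys/kernel/debug/dri/0/amdgpu_firmware_info
```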
79 | 266 | DMUB Firmware Debug
|
80 | 267 | ===================
|
81 | 268 |
|
|