Commit 028a12f

Author and committer: Ben Skeggs
drm/nouveau/gr/gp107,gp108: implement workaround for HW hanging during init
Certain boards with GP107/GP108 chipsets hang (often, but randomly) for unknown reasons during GR initialisation.

The first tell-tale symptom of this issue is

    nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 409800 [ TIMEOUT ]

appearing in dmesg, likely followed by many other failures being logged.

Karol found this workaround (WAR) for the issue a while back, but efforts to isolate the root cause and a proper fix have not yielded success so far. I've modified the original patch to include a few more details, limit it to GP107/GP108 by default, and add a config option to override this choice.

Signed-off-by: Ben Skeggs <[email protected]>
Reviewed-by: Karol Herbst <[email protected]>
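The gating described above (on by default for GP107/GP108, overridable through a config option) can be sketched in userspace C. This is an illustration, not nvkm code: `val_is()`, `sketch_boolopt()` and `gr_reset_war_enabled()` are hypothetical helpers, and the accepted spellings (0/no/off/false and 1/yes/on/true) are an assumption about how such option strings are typically parsed. Only the key name `NvGrResetWar` and the chipset IDs 0x137 (GP107) and 0x138 (GP108) come from the patch.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of a length-delimited value against a
 * NUL-terminated word; the lengths must match exactly. */
static bool
val_is(const char *val, size_t vlen, const char *word)
{
	if (strlen(word) != vlen)
		return false;
	for (size_t i = 0; i < vlen; i++) {
		if (tolower((unsigned char)val[i]) !=
		    tolower((unsigned char)word[i]))
			return false;
	}
	return true;
}

/* Hypothetical userspace stand-in for a boolean config lookup over a
 * comma-separated "Key=Value" string. A missing key or an unrecognised
 * value leaves the caller's default untouched. */
static bool
sketch_boolopt(const char *cfg, const char *key, bool dflt)
{
	size_t keylen = strlen(key);
	const char *p = cfg;

	while (p && *p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		if (len > keylen && p[keylen] == '=' &&
		    strncmp(p, key, keylen) == 0) {
			const char *val = p + keylen + 1;
			size_t vlen = len - keylen - 1;

			if (val_is(val, vlen, "0") || val_is(val, vlen, "no") ||
			    val_is(val, vlen, "off") || val_is(val, vlen, "false"))
				return false;
			if (val_is(val, vlen, "1") || val_is(val, vlen, "yes") ||
			    val_is(val, vlen, "on") || val_is(val, vlen, "true"))
				return true;
			return dflt;
		}
		p = end ? end + 1 : NULL;
	}
	return dflt;
}

/* Default on for GP107 (chipset 0x137) and GP108 (0x138); the
 * "NvGrResetWar" key overrides that choice in either direction. */
static bool
gr_reset_war_enabled(unsigned int chipset, const char *cfg)
{
	bool dflt = chipset == 0x137 || chipset == 0x138;

	return sketch_boolopt(cfg, "NvGrResetWar", dflt);
}
```

With an empty config string, `gr_reset_war_enabled(0x137, "")` returns true and `gr_reset_war_enabled(0x134, "")` returns false, while `"NvGrResetWar=0"` switches it off on a GP107. On a real system the option string reaches the driver through nouveau's `config` module parameter, so booting with something like `nouveau.config=NvGrResetWar=1` should force the workaround on for other chipsets.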
Parent commit: 434fdb5


1 file changed: 26 additions, 0 deletions

drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c

```diff
@@ -1981,8 +1981,34 @@ gf100_gr_init_(struct nvkm_gr *base)
 {
 	struct gf100_gr *gr = gf100_gr(base);
 	struct nvkm_subdev *subdev = &base->engine.subdev;
+	struct nvkm_device *device = subdev->device;
+	bool reset = device->chipset == 0x137 || device->chipset == 0x138;
 	u32 ret;
 
+	/* On certain GP107/GP108 boards, we trigger a weird issue where
+	 * GR will stop responding to PRI accesses after we've asked the
+	 * SEC2 RTOS to boot the GR falcons. This happens with far more
+	 * frequency when cold-booting a board (ie. returning from D3).
+	 *
+	 * The root cause for this is not known and has proven difficult
+	 * to isolate, with many avenues being dead-ends.
+	 *
+	 * A workaround was discovered by Karol, whereby putting GR into
+	 * reset for an extended period right before initialisation
+	 * prevents the problem from occurring.
+	 *
+	 * XXX: As RM does not require any such workaround, this is more
+	 * of a hack than a true fix.
+	 */
+	reset = nvkm_boolopt(device->cfgopt, "NvGrResetWar", reset);
+	if (reset) {
+		nvkm_mask(device, 0x000200, 0x00001000, 0x00000000);
+		nvkm_rd32(device, 0x000200);
+		msleep(50);
+		nvkm_mask(device, 0x000200, 0x00001000, 0x00001000);
+		nvkm_rd32(device, 0x000200);
+	}
+
 	nvkm_pmu_pgob(gr->base.engine.subdev.device->pmu, false);
 
 	ret = nvkm_falcon_get(&gr->fecs.falcon, subdev);
```
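The reset sequence in the hunk toggles bit 12 (0x00001000) of register 0x000200, which in nouveau's usual naming is the PMC enable register; clearing that bit holds GR in reset. The pulse can be simulated in userspace as below. `sim_rd32()`, `sim_wr32()`, `sim_mask()` and `sim_msleep()` are hypothetical stand-ins for `nvkm_rd32()`, `nvkm_wr32()`, `nvkm_mask()` and `msleep()`, and `sim_mask()` assumes `nvkm_mask()`'s read-modify-write semantics (clear `mask`, OR in `data`, return the old value). The read-back after each write mirrors the patch; it is presumably there to make sure the posted MMIO write has landed before the code moves on.

```c
#include <stddef.h>
#include <stdint.h>

#define PMC_ENABLE     0x000200u   /* register offset used by the patch */
#define PMC_ENABLE_GR  0x00001000u /* bit 12, the bit the patch toggles */

/* Simulated device state (hypothetical; stands in for real MMIO). */
static uint32_t sim_pmc_enable = 0xffffffffu;
static uint32_t sim_writes[8];
static size_t sim_nwrites;
static long sim_slept_ms;

static uint32_t
sim_rd32(uint32_t addr)
{
	(void)addr; /* only PMC_ENABLE is modelled */
	return sim_pmc_enable;
}

/* Store the new value and log it so the sequence can be inspected. */
static void
sim_wr32(uint32_t addr, uint32_t data)
{
	(void)addr;
	sim_pmc_enable = data;
	if (sim_nwrites < sizeof(sim_writes) / sizeof(sim_writes[0]))
		sim_writes[sim_nwrites++] = data;
}

/* Read-modify-write: clear the bits in mask, OR in data, return the
 * value read before the write. */
static uint32_t
sim_mask(uint32_t addr, uint32_t mask, uint32_t data)
{
	uint32_t old = sim_rd32(addr);

	sim_wr32(addr, (old & ~mask) | data);
	return old;
}

/* Stand-in for msleep(): records the delay instead of sleeping. */
static void
sim_msleep(long ms)
{
	sim_slept_ms += ms;
}

static void
gr_reset_pulse(long hold_ms)
{
	sim_mask(PMC_ENABLE, PMC_ENABLE_GR, 0x00000000u);   /* put GR into reset */
	(void)sim_rd32(PMC_ENABLE);                         /* read back, as the patch does */
	sim_msleep(hold_ms);                                /* hold GR in reset */
	sim_mask(PMC_ENABLE, PMC_ENABLE_GR, PMC_ENABLE_GR); /* take GR out of reset */
	(void)sim_rd32(PMC_ENABLE);
}
```

Calling `gr_reset_pulse(50)` against the simulated register logs two writes, 0xffffefff (bit 12 cleared) and then 0xffffffff (bit 12 set again), with a 50 ms hold in between. The patch only says the reset must last "for an extended period"; it gives no specific rationale for 50 ms.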