Hello. When using the NVIDIA open-source driver version 560.35.03 on Ubuntu 18.04, a GPU disconnection issue occurred on the RTX 3060. The system log from the error is as follows:
May 29 16:38:33 root-PC kernel: [ 8516.915898] NVRM: GPU at PCI:0000:01:00: GPU-154c347a-1b02-a5c4-a983-321c822c643f May 29 16:38:33 root-PC kernel: [ 8516.915901] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus. May 29 16:38:33 root-PC kernel: [ 8516.915912] NVRM: GPU 0000:01:00.0: ...GPU has fallen off the bus. and now pmc_boot_0 = 0xffffffff May 29 16:38:33 root-PC kernel: [ 8516.916036] NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x0000000f for fn 78! May 29 16:38:33 root-PC kernel: [ 8516.916038] NVRM: nvCheckOkFailedNoLog: Check failed: GPU lost from the bus [NV_ERR_GPU_IS_LOST] (0x0000000F) returned from nvdEngineDumpCallbackHelper(pGpu, pPrbEnc, pNvDumpState, pEngineCallback) @ nv_debug_dump.c:274 May 29 16:38:33 root-PC kernel: [ 8516.916080] NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x0000000f for fn 78! May 29 16:38:33 root-PC kernel: [ 8516.916081] NVRM: nvCheckOkFailedNoLog: Check failed: GPU lost from the bus [NV_ERR_GPU_IS_LOST] (0x0000000F) returned from nvdEngineDumpCallbackHelper(pGpu, pPrbEnc, pNvDumpState, pEngineCallback) @ nv_debug_dump.c:274 May 29 16:38:33 root-PC kernel: [ 8516.916085] NVRM: _issueRpcAndWait: rpcSendMessage failed with status 0x0000000f for fn 78! May 29 16:38:33 root-PC kernel: [ 8516.916089] NVRM: nvCheckOkFailedNoLog: Check failed: GPU lost from the bus [NV_ERR_GPU_IS_LOST] (0x0000000F) returned from nvdEngineDumpCallbackHelper(pGpu, pPrbEnc, pNvDumpState, pEngineCallback) @ nv_debug_dump.c:274 May 29 16:38:33 root-PC kernel: [ 8516.916124] NVRM: RmLogGpuCrash: RmLogGpuCrash: failed to save GPU crash data May 29 16:38:33 root-PC kernel: [ 8516.916128] NVRM: _kgspLogRpcSanityCheckFailure: GPU0 sanity check failed 0xf waiting for RPC response from GSP. Expected function 76 (GSP_RM_CONTROL) (0x2080a0d1 0x658). 
May 29 16:38:33 root-PC kernel: [ 8516.916130] NVRM: GPU0 GSP RPC buffer contains function 78 (DUMP_PROTOBUF_COMPONENT) and data 0x0000000000000000 0x0000000000000000. May 29 16:38:33 root-PC kernel: [ 8516.916131] NVRM: GPU0 RPC history (CPU -> GSP): May 29 16:38:33 root-PC kernel: [ 8516.916132] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling May 29 16:38:33 root-PC kernel: [ 8516.916134] NVRM: 0 76 GSP_RM_CONTROL 0x000000002080a0d1 0x0000000000000658 0x00063642391890b3 0x0000000000000000 y May 29 16:38:33 root-PC kernel: [ 8516.916136] NVRM: -1 76 GSP_RM_CONTROL 0x000000002080a0d1 0x0000000000000658 0x00063642391275da 0x0006364239128615 4155us May 29 16:38:33 root-PC kernel: [ 8516.916137] NVRM: -2 76 GSP_RM_CONTROL 0x000000002080a097 0x0000000000000490 0x00063642390ae706 0x00063642390af1ca 2756us May 29 16:38:33 root-PC kernel: [ 8516.916138] NVRM: -3 76 GSP_RM_CONTROL 0x000000002080a0d1 0x0000000000000658 0x00063642390578ae 0x0006364239057c0a 860us May 29 16:38:33 root-PC kernel: [ 8516.916142] NVRM: -4 76 GSP_RM_CONTROL 0x000000002080a0d1 0x0000000000000658 0x00063642390319f3 0x0006364239031dac 953us May 29 16:38:33 root-PC kernel: [ 8516.916144] NVRM: -5 76 GSP_RM_CONTROL 0x000000002080a097 0x0000000000000490 0x0006364238fba237 0x0006364238fbb31c 4325us May 29 16:38:33 root-PC kernel: [ 8516.916145] NVRM: -6 76 GSP_RM_CONTROL 0x000000002080a0d1 0x0000000000000658 0x0006364238f3bb1a 0x0006364238f3c0e8 1486us May 29 16:38:33 root-PC kernel: [ 8516.916147] NVRM: -7 76 GSP_RM_CONTROL 0x000000002080a0d1 0x0000000000000658 0x0006364238f25e54 0x0006364238f2641e 1482us May 29 16:38:33 root-PC kernel: [ 8516.916147] NVRM: GPU0 RPC event history (CPU <- GSP): May 29 16:38:33 root-PC kernel: [ 8516.916148] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc May 29 16:38:33 root-PC kernel: [ 8516.916150] NVRM: 0 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x000636403decf468 0x000636403decf468 May 29 
16:38:33 root-PC kernel: [ 8516.916152] NVRM: -1 4108 UCODE_LIBOS_PRINT 0x0000000000000000 0x0000000000000000 0x000636403decf2fa 0x000636403decf2fb 1us May 29 16:38:33 root-PC kernel: [ 8516.916153] NVRM: -2 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000027 0x000636403deccbbf 0x000636403deccbc0 1us May 29 16:38:33 root-PC kernel: [ 8516.916154] NVRM: -3 4098 GSP_RUN_CPU_SEQUENCER 0x000000000000061c 0x0000000000003fe2 0x000636403dec34bc 0x000636403dec4645 4489us May 29 16:38:33 root-PC kernel: [ 8516.916156] NVRM: -4 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000018fb82e 0x000636403de9f17c 0x000636403de9f17f 3us May 29 16:38:33 root-PC kernel: [ 8516.916160] NVRM: -5 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000016ecc1c 0x000636403de617c2 0x000636403de617c3 1us May 29 16:38:33 root-PC kernel: [ 8516.916162] NVRM: -6 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000016ecbee 0x000636403de613ce 0x000636403de613d1 3us May 29 16:38:33 root-PC kernel: [ 8516.916164] CPU: 4 PID: 4465 Comm: [vkps] Update Kdump: loaded Tainted: G OE 5.3.18+ #22 May 29 16:38:33 root-PC kernel: [ 8516.916165] Hardware name: Advantech EBC-GF68/EBC-GF68, BIOS GF68000Q060X019 10/11/2024 May 29 16:38:33 root-PC kernel: [ 8516.916166] Call Trace: May 29 16:38:33 root-PC kernel: [ 8516.916172] dump_stack+0x6d/0x95 May 29 16:38:33 root-PC kernel: [ 8516.916301] os_dump_stack+0xe/0x10 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.916434] _kgspRpcRecvPoll+0x32a/0x5f0 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.916532] _issueRpcAndWait+0x71/0x360 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.916625] rpcRmApiControl_GSP+0x757/0x9e0 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.916721] RmGssLegacyRpcCmd+0x190/0x360 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.916778] ? os_acquire_spinlock+0x12/0x30 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.916874] ? 
RmDeprecatedVidHeapControl+0x80/0x80 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.916970] _nv04ControlWithSecInfo+0x47/0xa0 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.917077] ? rmapiControlWithSecInfoTls+0xf0/0xf0 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.917182] ? _rmAllocForDeprecatedApi+0x30/0x30 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.917277] ? _rmControlForDeprecatedApi+0x30/0x30 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.917372] ? _rmFreeForDeprecatedApi+0x20/0x20 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.917467] ? RmCopyUserForDeprecatedApi+0xe0/0xe0 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.917561] ? _rmMapMemoryForDeprecatedApi+0x30/0x30 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.917659] ? _rmAllocMemForDeprecatedApi+0x10/0x10 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.917746] RmIoctl+0x64a/0xd60 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.917801] ? os_get_current_tick+0x2c/0x50 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.917857] ? os_acquire_spinlock+0x12/0x30 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.917947] rm_ioctl+0x66/0x4f0 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.917950] ? get_futex_key+0x2ff/0x3c0 May 29 16:38:33 root-PC kernel: [ 8516.918004] nvidia_unlocked_ioctl+0x633/0x930 [nvidia] May 29 16:38:33 root-PC kernel: [ 8516.918007] ? __switch_to_asm+0x40/0x70 May 29 16:38:33 root-PC kernel: [ 8516.918011] ? __switch_to_asm+0x34/0x70 May 29 16:38:33 root-PC kernel: [ 8516.918013] do_vfs_ioctl+0xa9/0x640 May 29 16:38:33 root-PC kernel: [ 8516.918015] ? 
_copy_from_user+0x3e/0x60 May 29 16:38:33 root-PC kernel: [ 8516.918016] ksys_ioctl+0x75/0x80 May 29 16:38:33 root-PC kernel: [ 8516.918017] __x64_sys_ioctl+0x1a/0x20 May 29 16:38:33 root-PC kernel: [ 8516.918019] do_syscall_64+0x5a/0x130 May 29 16:38:33 root-PC kernel: [ 8516.918021] entry_SYSCALL_64_after_hwframe+0x44/0xa9 May 29 16:38:33 root-PC kernel: [ 8516.918022] RIP: 0033:0x7fffac6cf347 May 29 16:38:33 root-PC kernel: [ 8516.918023] Code: b3 66 90 48 8b 05 41 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 11 4b 2d 00 f7 d8 64 89 01 48 May 29 16:38:33 root-PC kernel: [ 8516.918024] RSP: 002b:00007fff2f414628 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 May 29 16:38:33 root-PC kernel: [ 8516.918025] RAX: ffffffffffffffda RBX: 00007fff2f4147e0 RCX: 00007fffac6cf347 May 29 16:38:33 root-PC kernel: [ 8516.918029] RDX: 00007fff2f4147e0 RSI: 00000000c020462a RDI: 0000000000000022 May 29 16:38:33 root-PC kernel: [ 8516.918030] RBP: 00000000c020462a R08: 00007fff2f4147e0 R09: 00007fff2f4147fc May 29 16:38:33 root-PC kernel: [ 8516.918031] R10: 00007fff2f415830 R11: 0000000000000246 R12: 0000000000000022 May 29 16:38:33 root-PC kernel: [ 8516.918031] R13: 00007fff2f4147fc R14: 0000000068381d09 R15: 00007fff2f414630 May 29 16:38:34 root-PC kernel: [ 8517.047700] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_client.c:843 May 29 16:38:34 root-PC kernel: [ 8517.047707] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:257 May 29 16:38:34 root-PC kernel: [ 8517.047710] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:1287 May 29 16:38:35 root-PC kernel: [ 8518.722522] irq 16: nobody cared (try booting with the "irqpoll" option) May 29 16:38:35 root-PC kernel: [ 8518.729869] CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded Tainted: G OE 5.3.18+ #22 May 29 16:38:35 root-PC kernel: [ 8518.729870] Hardware name: 
Advantech EBC-GF68/EBC-GF68, BIOS GF68000Q060X019 10/11/2024 May 29 16:38:35 root-PC kernel: [ 8518.729870] Call Trace: May 29 16:38:35 root-PC kernel: [ 8518.729871] <IRQ> May 29 16:38:35 root-PC kernel: [ 8518.729876] dump_stack+0x6d/0x95 May 29 16:38:35 root-PC kernel: [ 8518.729878] __report_bad_irq+0x35/0xc0 May 29 16:38:35 root-PC kernel: [ 8518.729879] note_interrupt+0x24b/0x2a0 May 29 16:38:35 root-PC kernel: [ 8518.729880] handle_irq_event_percpu+0x54/0x80 May 29 16:38:35 root-PC kernel: [ 8518.729881] handle_irq_event+0x3b/0x60 May 29 16:38:35 root-PC kernel: [ 8518.729882] handle_fasteoi_irq+0x7c/0x130 May 29 16:38:35 root-PC kernel: [ 8518.729883] handle_irq+0x20/0x30 May 29 16:38:35 root-PC kernel: [ 8518.729885] do_IRQ+0x50/0xe0 May 29 16:38:35 root-PC kernel: [ 8518.729886] common_interrupt+0xf/0xf May 29 16:38:35 root-PC kernel: [ 8518.729887] </IRQ> May 29 16:38:35 root-PC kernel: [ 8518.729889] RIP: 0010:cpuidle_enter_state+0xa9/0x440 May 29 16:38:35 root-PC kernel: [ 8518.729890] Code: 3d 5c a4 3e 70 e8 47 c5 4a ff 49 89 c7 0f 1f 44 00 00 31 ff e8 78 d0 4a ff 80 7d d3 00 0f 85 e6 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 ed 0f 89 ff 01 00 00 41 c7 44 24 10 00 00 00 00 48 83 c4 18 May 29 16:38:35 root-PC kernel: [ 8518.729891] RSP: 0018:ffffa810c0133e48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde May 29 16:38:35 root-PC kernel: [ 8518.729892] RAX: ffff8b080baaa6c0 RBX: ffffffff90dc11e0 RCX: 000000000000001f May 29 16:38:35 root-PC kernel: [ 8518.729893] RDX: 000007bf6b6de6df RSI: 000000002819abac RDI: 0000000000000000 May 29 16:38:35 root-PC kernel: [ 8518.729893] RBP: ffffa810c0133e88 R08: 0000000000000002 R09: 0000000000029f40 May 29 16:38:35 root-PC kernel: [ 8518.729894] R10: ffffa810c0133e18 R11: 0000000000000006 R12: ffffc810bfc80300 May 29 16:38:35 root-PC kernel: [ 8518.729894] R13: 0000000000000001 R14: ffffffff90dc1258 R15: 000007bf6b6de6df May 29 16:38:35 root-PC kernel: [ 8518.729896] ? 
cpuidle_enter_state+0x98/0x440 May 29 16:38:35 root-PC kernel: [ 8518.729897] ? menu_select+0x370/0x600 May 29 16:38:35 root-PC kernel: [ 8518.729898] cpuidle_enter+0x2e/0x40 May 29 16:38:35 root-PC kernel: [ 8518.729900] call_cpuidle+0x23/0x40 May 29 16:38:35 root-PC kernel: [ 8518.729901] do_idle+0x1f6/0x270 May 29 16:38:35 root-PC kernel: [ 8518.729903] cpu_startup_entry+0x1d/0x20 May 29 16:38:35 root-PC kernel: [ 8518.729905] start_secondary+0x167/0x1c0 May 29 16:38:35 root-PC kernel: [ 8518.729906] secondary_startup_64+0xa4/0xb0 May 29 16:38:35 root-PC kernel: [ 8518.729907] handlers: May 29 16:38:35 root-PC kernel: [ 8518.732397] [<0000000026e0890e>] i801_isr May 29 16:38:35 root-PC kernel: [ 8518.736809] Disabling IRQ #16 May 29 16:38:38 root-PC kernel: [ 8521.816202] device wlan0 entered promiscuous mode May 29 16:38:38 root-PC kernel: [ 8521.847135] device wlan0 left promiscuous mode May 29 16:38:43 root-PC kernel: [ 8526.888758] device wlan0 entered promiscuous mode May 29 16:38:43 root-PC kernel: [ 8526.915206] device wlan0 left promiscuous mode May 29 16:38:46 root-PC kernel: [ 8529.831182] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_client.c:843 May 29 16:38:46 root-PC kernel: [ 8529.831196] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:257 May 29 16:38:46 root-PC kernel: [ 8529.831199] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:1287 May 29 16:38:46 root-PC kernel: [ 8529.831216] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_client.c:843 May 29 16:38:46 root-PC kernel: [ 8529.831222] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:257 May 29 16:38:46 root-PC kernel: [ 8529.831225] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:1287 May 29 16:38:46 root-PC kernel: [ 8529.831240] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_client.c:843 May 29 16:38:46 root-PC kernel: [ 
8529.831245] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:257 May 29 16:38:46 root-PC kernel: [ 8529.831248] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:1287 May 29 16:38:46 root-PC kernel: [ 8529.831262] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_client.c:843 May 29 16:38:46 root-PC kernel: [ 8529.831268] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:257 May 29 16:38:46 root-PC kernel: [ 8529.831270] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:1287 May 29 16:38:46 root-PC kernel: [ 8529.831278] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_client.c:843 May 29 16:38:46 root-PC kernel: [ 8529.831282] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:257 May 29 16:38:46 root-PC kernel: [ 8529.831285] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:1287 May 29 16:38:46 root-PC kernel: [ 8529.831293] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_client.c:843 May 29 16:38:46 root-PC kernel: [ 8529.831302] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:257 May 29 16:38:46 root-PC kernel: [ 8529.831304] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:1287 May 29 16:38:46 root-PC kernel: [ 8529.831312] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_client.c:843 May 29 16:38:46 root-PC kernel: [ 8529.831316] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:257 May 29 16:38:46 root-PC kernel: [ 8529.831319] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:1287 May 29 16:38:46 root-PC kernel: [ 8529.831326] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_client.c:843 May 29 16:38:46 root-PC kernel: [ 8529.831330] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:257 May 29 
16:38:46 root-PC kernel: [ 8529.831333] NVRM: nvAssertFailedNoLog: Assertion failed: status == NV_OK @ rs_server.c:1287 May 29 16:38:46 root-PC kernel: [ 8529.832762] NVRM: nvAssertOkFailedNoLog: Assertion failed: Current device is not valid [NV_ERR_INVALID_DEVICE] (0x00000026) returned from rmDeviceGpuLocksAcquire(pGpu, GPUS_LOCK_FLAGS_NONE, RM_LOCK_MODULES_MEM) @ video_mem.c:542
nvidia-bug-report.log.gz
This problem never occurred with the GTX 1660 Super graphics card, but it has happened many times with the RTX 3060. How can it be solved?
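In case it helps anyone triaging a similar log, here is a minimal sketch for pulling the Xid reports out of a kernel log so they can be counted across the repeated occurrences. It assumes the `NVRM: Xid (PCI:...): <code>,` line format seen in the log above; the function name `find_xid_events` is only illustrative and is not part of any NVIDIA tool.

```python
import re

# Matches the prefix of lines such as:
#   NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
# Assumption: the format is the one shown in the log above.
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")


def find_xid_events(log_text):
    """Return a list of (pci_address, xid_code) for every Xid report found."""
    return [(m.group(1), int(m.group(2))) for m in XID_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "May 29 16:38:33 root-PC kernel: [ 8516.915901] NVRM: Xid "
        "(PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, "
        "GPU has fallen off the bus."
    )
    for pci, code in find_xid_events(sample):
        print(pci, code)
```

Run against the full kernel log text, this would list each Xid code with its PCI address. In the log above the only one reported is Xid 79 on PCI:0000:01:00.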