Skip to content

rockchip: fix race condition and NULL ptr deref in PCIe threaded probe#9561

Closed
AlomeProg wants to merge 3 commits intoarmbian:mainfrom
AlomeProg:fix-threaded-pcie-init
Closed

rockchip: fix race condition and NULL ptr deref in PCIe threaded probe#9561
AlomeProg wants to merge 3 commits intoarmbian:mainfrom
AlomeProg:fix-threaded-pcie-init

Conversation

@AlomeProg
Copy link

@AlomeProg AlomeProg commented Mar 19, 2026

Description

This PR adds a patch to fix a critical kernel panic in the Rockchip PCIe driver (pcie-dw-rockchip.c) occurring on threaded initialization failure.

Documentation summary for feature / change

When CONFIG_PCIE_RK_THREADED_INIT is enabled, a race condition exists between the probe and remove paths. If the hardware initialization fails (e.g., power supply issues or PHY errors), rk_pcie_remove may execute concurrently with the error handling path in the probe thread.

The original implementation had multiple flaws:

  1. Unsafe Synchronization: It used a manual while loop with schedule_timeout instead of proper kernel synchronization primitives.
  2. Broken Timeout Logic: The loop condition time_before(start, start + timeout) was always true (due to start not being updated), potentially leading to infinite loops.
  3. NULL Pointer Dereference: The finish_probe flag was set to true even during error cleanup. This caused rk_pcie_remove to attempt accessing uninitialized structures (rk_pcie->pci->pp.bridge->bus), resulting in a Kernel Oops (Unable to handle kernel NULL pointer dereference at virtual address 00000000000002f8).

My solution:

The patch refactors the synchronization mechanism using standard kernel APIs:

  1. Replaces the busy-wait loop with struct completion (probe_done) to ensure safe thread synchronization.
  2. Replaces the ambiguous finish_probe flag with probe_ok to explicitly signal successful initialization.
  3. Adds a check for probe_ok in rk_pcie_remove to exit early if initialization failed, preventing access to invalid pointers.
  4. Ensures completion is initialized immediately after memory allocation to prevent usage of uninitialized data.

Testing

[x] Tested on RK3588 platform.
[x] Verified that the kernel boots successfully.
[x] Verified that no regression is introduced for successful probe scenarios.

dmesg log
[   27.170851] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   27.171511] pc : rk_pcie_remove+0x94/0x1bc
[   27.171915] lr : platform_remove+0x5c/0x74
[   27.172315] sp : ffff80000ccdbd50
[   27.172508] rk-pcie fe160000.pcie: can't get current limit.
[   27.172634] x29: ffff80000ccdbd50
[   27.173007] rk-pcie fe160000.pcie: host bridge /pcie@fe160000 ranges:
[   27.173161]  x28: 0000000000000000
[   27.173209] rk-pcie fe160000.pcie:       IO 0x00f1100000..0x00f11fffff -> 0x00f1100000
[   27.173231]  x27: 0000000000000000
[   27.173265] rk-pcie fe160000.pcie:      MEM 0x00f1200000..0x00f1ffffff -> 0x00f1200000
[   27.173306] 
[   27.173336] rk-pcie fe160000.pcie:      MEM 0x0940000000..0x097fffffff -> 0x0940000000
[   27.173382] x26: 0000000000000000
[   27.173449] rk-pcie fe160000.pcie: iATU unroll: enabled
[   27.173739]  x25: 00000000ffffffed
[   27.173764] rk-pcie fe160000.pcie: iATU regions: 8 ob, 8 ib, align 64K, limit 8G
[   27.173799]  x24: ffff00010149c000
[   27.179307] x23: 0000000000000001 x22: 0000000000000001 x21: ffff000107ee0000
[   27.179983] x20: ffff00010149c010 x19: ffff000100b04880 x18: 0000000000000000
[   27.180659] x17: 00656963702e3030 x16: 3030353165663a6d x15: 726f6674616c702b
[   27.181335] x14: 0000000000000000 x13: 64656c696166206f x12: 0000000000000000
[   27.182011] x11: ffff000100753120 x10: ffff0001003f7a10 x9 : ffff80000899f028
[   27.182687] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 736d47ff6364626d
[   27.183363] x5 : 0000008000000000 x4 : 0000000000000006 x3 : 000000000000000c
[   27.184038] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000101b7a480
[   27.184715] Call trace:
[   27.184958]  rk_pcie_remove+0x94/0x1bc
[   27.185321]  platform_remove+0x5c/0x74
[   27.185681]  device_remove+0x54/0x7c
[   27.186039]  device_release_driver_internal+0x94/0x150
[   27.186535]  device_release_driver+0x20/0x2c
[   27.186949]  rk_pcie_really_probe+0x8f0/0x918
[   27.187373]  kthread+0xc4/0xd4
[   27.187684]  ret_from_fork+0x10/0x20
[   27.159256] Unable to handle kernel NULL pointer dereference at virtual address 00000000000002f8
[   27.159263] Mem abort info:
[   27.159266]   ESR = 0x0000000096000004
[   27.159270]   EC = 0x25: DABT (current EL), IL = 32 bits
[   27.159275]   SET = 0, FnV = 0
[   27.159279]   EA = 0, S1PTW = 0
[   27.159283]   FSC = 0x04: level 0 translation fault
[   27.159288] Data abort info:
[   27.159291]   ISV = 0, ISS = 0x00000004
[   27.159295]   CM = 0, WnR = 0
[   27.159299] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000109c4d000
[   27.159305] [00000000000002f8] pgd=0000000000000000, p4d=0000000000000000
[   27.159319] Internal error: Oops: 0000000096000004 [#1] SMP

Summary by CodeRabbit

  • Bug Fixes
    • Improved stability of PCIe device initialization and removal on Rockchip systems.
    • Better synchronization of probe sequences to reduce race conditions during startup/shutdown.
    • Safer device removal with timeout-based waits and additional validation to avoid hangs or invalid teardown.

Signed-off-by: AlomeProg <alomeprog@gmail.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 19, 2026

📝 Walkthrough

Walkthrough

Replaces a boolean probe-completion flag with a struct completion in the Rockchip PCIe driver; probe routine signals completion on success/failure and removal waits via wait_for_completion_timeout(), with additional early-return checks based on probe_ok and PCI bridge presence.

Changes

Cohort / File(s) Summary
Rockchip PCIe Driver Synchronization
patch/kernel/rk35xx-vendor-6.1/fix-threaded-init.patch, drivers/pci/controller/dwc/pcie-dw-rockchip.c
Removes finish_probe; adds probe_ok and probe_done (struct completion) to struct rk_pcie; initializes and signals completion in rk_pcie_really_probe() using complete_all() on both success and failure; in rk_pcie_remove() switches from polling the boolean to wait_for_completion_timeout(&rk_pcie->probe_done, timeout) under CONFIG_PCIE_RK_THREADED_INIT, adds timeout/logging/early-return checks and verifies probe_ok and PCI bridge structures before proceeding.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I dug a burrow, fixed a race,

I swapped a flag for a safe embrace.
probe_done sings, complete_all cheers,
No more spins or waiting fears.
🥕 Hoppity sync — the driver clears.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: replacing the unsafe threaded-init mechanism with a completion-based synchronization to fix race conditions and NULL pointer dereferences in the Rockchip PCIe driver.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can customize the tone of the review comments and chat replies.

Configure the tone_instructions setting to customize the tone of the review comments and chat replies. For example, you can set the tone to Act like a strict teacher, Act like a pirate and more.

@github-actions
Copy link
Contributor

Hey @AlomeProg! 👋

Thanks for submitting your first pull request to the Armbian project — we're excited to have you contributing! 🧡
Your effort doesn’t just improve Armbian — it benefits the entire community of users and developers.

If you'd like to stay informed about project updates or collaborate more closely with the team,
you can optionally share some personal contact preferences at armbian.com/update-data.
This helps us keep in touch without relying solely on GitHub notifications.

Also, don’t forget to ⭐ star the repo if you haven’t already — and welcome aboard! 🚀

@github-actions github-actions bot added 05 Milestone: Second quarter release size/medium PR with more then 50 and less then 250 lines Needs review Seeking for review Hardware Hardware related like kernel, U-Boot, ... Patches Patches related to kernel, U-Boot, ... labels Mar 19, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@patch/kernel/rk35xx-vendor-6.1/fix-threaded-init.patch`:
- Around line 94-102: The loop condition fix is correct (change
time_before(start, start + timeout) to time_before(jiffies, start + timeout)),
but the indentation on the while body (lines with set_current_state,
schedule_timeout, rk_pcie = dev_get_drvdata) contains mixed space+tab; update
those lines to use consistent tabs to match surrounding code style and avoid
checkpatch.pl warnings, ensuring the block under the while (references: rk_pcie,
time_before, jiffies, start, timeout, dev_get_drvdata) is indented uniformly.
- Around line 34-36: The indentation for the new struct field is using spaces
instead of the kernel-required tab; update the declaration of struct completion
probe_done so its leading indentation is a single tab (matching the surrounding
fields) and ensure alignment with the existing bool probe_ok field (symbols:
probe_ok, probe_done) to satisfy kernel coding style and avoid checkpatch.pl
warnings.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fa8268f2-a4f8-4813-ac4c-ee71a05da668

📥 Commits

Reviewing files that changed from the base of the PR and between cb9509e and 9d3d5fc.

📒 Files selected for processing (1)
  • patch/kernel/rk35xx-vendor-6.1/fix-threaded-init.patch

Signed-off-by: AlomeProg <alomeprog@gmail.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
patch/kernel/rk35xx-vendor-6.1/fix-threaded-init.patch (1)

48-49: Good placement of init_completion, but fix indentation.

The initialization is correctly placed immediately after the allocation check, ensuring probe_done is initialized before any error path that could call complete_all(). However, lines 48-49 use spaces instead of tabs for indentation.

Proposed fix

Ensure the added lines use a single tab character for indentation to match kernel coding style.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@patch/kernel/rk35xx-vendor-6.1/fix-threaded-init.patch` around lines 48 - 49,
The new init_completion call for rk_pcie->probe_done is correctly placed but is
indented with spaces; update the added line so it uses a single tab character
for indentation to match kernel style (adjust the line containing
init_completion(&rk_pcie->probe_done);), ensuring the rest of the surrounding
block keeps tab-based alignment with neighboring statements.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@patch/kernel/rk35xx-vendor-6.1/fix-threaded-init.patch`:
- Around line 48-49: The new init_completion call for rk_pcie->probe_done is
correctly placed but is indented with spaces; update the added line so it uses a
single tab character for indentation to match kernel style (adjust the line
containing init_completion(&rk_pcie->probe_done);), ensuring the rest of the
surrounding block keeps tab-based alignment with neighboring statements.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c432ce02-9fea-47f9-9550-73c38b8532ad

📥 Commits

Reviewing files that changed from the base of the PR and between 9d3d5fc and 2656ad0.

📒 Files selected for processing (1)
  • patch/kernel/rk35xx-vendor-6.1/fix-threaded-init.patch

@EvilOlaf
Copy link
Member

Since we have our own rockchip bsp linux repo, you may want to pr directly against it: https://github.com/armbian/linux-rockchip

@AlomeProg
Copy link
Author

Thanks for the quick reply. I created a PR with my changes in linux-rockchip, here is the link: "armbian/linux-rockchip#458". I think it's worth closing this pull request?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

05 Milestone: Second quarter release Hardware Hardware related like kernel, U-Boot, ... Needs review Seeking for review Patches Patches related to kernel, U-Boot, ... size/medium PR with more then 50 and less then 250 lines

Development

Successfully merging this pull request may close these issues.

3 participants