Skip to content

fix(ndt_scan_matcher): remove setInputSource#837

Draft
SakodaShintaro wants to merge 5 commits intoautowarefoundation:mainfrom
tier4:fix/ndt_scan_matcher/remove_setInputSource
Draft

fix(ndt_scan_matcher): remove setInputSource#837
SakodaShintaro wants to merge 5 commits intoautowarefoundation:mainfrom
tier4:fix/ndt_scan_matcher/remove_setInputSource

Conversation

@SakodaShintaro
Copy link
Contributor

@SakodaShintaro SakodaShintaro commented Feb 12, 2026

Description

Problem

The NDT scan matcher node occasionally crashes, and a suspected data race between the sensor callback thread and the map update timer callback thread is a likely cause.

(

and

ndt_ptr_->setInputSource(sensor_points_in_baselink_frame);

)

MultiGridNormalDistributionsTransform inherits from pcl::Registration, which stores the input source point cloud as a member variable (input_; shared_ptr). The sensor callback writes to this shared_ptr via ndt_ptr_->setInputSource() under ndt_ptr_mtx_, while the map update module copies the entire NDT object (*secondary_ndt_ptr_ = *ndt_ptr_) without holding the mutex. If these operations overlap, concurrent read+write on a shared_ptr could corrupt the reference count, potentially leading to double-free or use-after-free crashes.

Solution

Remove pcl::Registration inheritance from MultiGridNormalDistributionsTransform and pass the input source point cloud as an argument to align() instead of storing it inside the NDT class. This eliminates the shared_ptr member from the NDT object, so the unsynchronized copy in the map update module no longer involves any shared_ptr operations.

How was this PR tested?

  • logging simulator

Notes for reviewers

None.

Interface changes

None.

Effects on system behavior

None.

Signed-off-by: Shintaro Sakoda <shintaro.sakoda@tier4.jp>
@SakodaShintaro SakodaShintaro added run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci) component:localization Vehicle's position determination in its environment. (auto-assigned) labels Feb 12, 2026
@github-actions
Copy link

github-actions bot commented Feb 12, 2026

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

@codecov
Copy link

codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.65%. Comparing base (e232aec) to head (d783af3).

Files with missing lines Patch % Lines
...can_matcher/src/ndt_omp/multigrid_ndt_omp_impl.hpp 42.85% 15 Missing and 1 partial ⚠️
...oware/ndt_scan_matcher/ndt_omp/multigrid_ndt_omp.h 44.44% 4 Missing and 1 partial ⚠️
...are_ndt_scan_matcher/src/ndt_scan_matcher_core.cpp 28.57% 5 Missing ⚠️
...t_scan_matcher/src/ndt_omp/estimate_covariance.cpp 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #837      +/-   ##
==========================================
+ Coverage   50.16%   50.65%   +0.49%     
==========================================
  Files         356      349       -7     
  Lines       22892    22309     -583     
  Branches    10139     9908     -231     
==========================================
- Hits        11483    11300     -183     
+ Misses      10305     9946     -359     
+ Partials     1104     1063      -41     
Flag Coverage Δ *Carryforward flag
daily-humble 50.49% <ø> (-0.02%) ⬇️ Carriedforward from ce5e832
daily-jazzy 49.92% <ø> (-0.07%) ⬇️ Carriedforward from ce5e832
differential-humble 42.45% <40.00%> (?)
total 50.22% <ø> (+0.03%) ⬆️ Carriedforward from ce5e832
total-humble 50.48% <ø> (-0.06%) ⬇️ Carriedforward from ce5e832
total-jazzy 49.93% <ø> (-0.10%) ⬇️ Carriedforward from ce5e832

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Shintaro Sakoda <shintaro.sakoda@tier4.jp>
Signed-off-by: Shintaro Sakoda <shintaro.sakoda@tier4.jp>
@sasakisasaki sasakisasaki self-assigned this Feb 12, 2026
Copy link
Contributor

@sasakisasaki sasakisasaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the code itself. Looks good. I'll do tests by using my small robot. Please wait for a while to finish the review.

@anhnv3991
Copy link
Contributor

anhnv3991 commented Mar 3, 2026

@SakodaShintaro Thank you for this PR. Let me summarize your PR.

  • The copy *secondary_ndt_ptr_ = *ndt_ptr_; (map_update_module.cpp) copies the shared pointer to the source point cloud in the NDT structure
  • The setInputSource() changes the shared pointer to the source point cloud in the NDT structure
  • If the two operations above occurs at the same time, they may mess up the shared pointer to the source point cloud. This may result in double-free memory or segmentation fault errors.
  • By removing the setInputSource, we may avoid the issue above, thus fixing the errors.

Because the copy operation on shared ptr is super fast, the chance of those two operations overlap is super rare. That may explain why the error is very hard to re-create on local PCs.

But I have a question: if we want those two operations to not overlap, why not just use a mutex for every NDT structure? The mutex is acquired when the ndt_ptr is copied, so . Is that the easier way to fix the issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component:localization Vehicle's position determination in its environment. (auto-assigned) run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants