fix(ndt_scan_matcher): remove setInputSource#837
fix(ndt_scan_matcher): remove setInputSource#837SakodaShintaro wants to merge 5 commits intoautowarefoundation:mainfrom
setInputSource#837Conversation
Signed-off-by: Shintaro Sakoda <shintaro.sakoda@tier4.jp>
|
Thank you for contributing to the Autoware project! 🚧 If your pull request is in progress, switch it to draft mode. Please ensure:
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #837 +/- ##
==========================================
+ Coverage 50.16% 50.65% +0.49%
==========================================
Files 356 349 -7
Lines 22892 22309 -583
Branches 10139 9908 -231
==========================================
- Hits 11483 11300 -183
+ Misses 10305 9946 -359
+ Partials 1104 1063 -41
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: Shintaro Sakoda <shintaro.sakoda@tier4.jp>
Signed-off-by: Shintaro Sakoda <shintaro.sakoda@tier4.jp>
sasakisasaki
left a comment
There was a problem hiding this comment.
Reviewed the code itself. Looks good. I'll do tests by using my small robot. Please wait for a while to finish the review.
|
@SakodaShintaro Thank you for this PR. Let me summarize your PR.
Because the copy operation on shared ptr is super fast, the chance of those two operations overlap is super rare. That may explain why the error is very hard to re-create on local PCs. But I have a question: if we want those two operations to not overlap, why not just use a mutex for every NDT structure? The mutex is acquired when the ndt_ptr is copied, so . Is that the easier way to fix the issue? |
Description
Problem
The NDT scan matcher node occasionally crashes, and a suspected data race between the sensor callback thread and the map update timer callback thread is a likely cause.
(
autoware_core/localization/autoware_ndt_scan_matcher/src/map_update_module.cpp
Line 221 in bd6dcc6
and
autoware_core/localization/autoware_ndt_scan_matcher/src/ndt_scan_matcher_core.cpp
Line 415 in bd6dcc6
)
MultiGridNormalDistributionsTransforminherits frompcl::Registration, which stores the input source point cloud as a member variable (input_; shared_ptr). The sensor callback writes to this shared_ptr viandt_ptr_->setInputSource()underndt_ptr_mtx_, while the map update module copies the entire NDT object (*secondary_ndt_ptr_ = *ndt_ptr_) without holding the mutex. If these operations overlap, concurrent read+write on a shared_ptr could corrupt the reference count, potentially leading to double-free or use-after-free crashes.Solution
Remove
pcl::Registrationinheritance fromMultiGridNormalDistributionsTransformand pass the input source point cloud as an argument toalign()instead of storing it inside the NDT class. This eliminates the shared_ptr member from the NDT object, so the unsynchronized copy in the map update module no longer involves any shared_ptr operations.How was this PR tested?
Notes for reviewers
None.
Interface changes
None.
Effects on system behavior
None.