-
Notifications
You must be signed in to change notification settings - Fork 7
Make Mbient.close() safe against live streaming callbacks #650
Description
Background
At normal session end, mbients are still actively streaming data when TerminateServerRequest arrives. Confirmed by tracing the session flow:
- CTR sends
PerformTaskRequestper task, followed byTasksFinished - Per-task
StopRecordingcallsstop_recording()→device_manager.stop_recording_devices()→_get_camera_devices()— only cameras are stopped - Mbients run continuously from
create_streams()through toclose_streams()on terminate - When the GUI shows "safe to terminate" and the user clicks terminate, mbient data callbacks are still firing on native threads
This means our current Mbient.close() flow (after PR #648):
- Unsubscribe from the fusion processor — while data callbacks are still firing on native threads
stop()→stop_inertial_sampling()→disable_inertial_sampling()- Disconnect
Step 1 races against in-flight native callbacks. PR #648 fixed the SIGILL from unsubscribing after disable, but did not address the unsubscribe-vs-callback race.
Three improvements
1. Stop data flow before unsubscribe (local, highest impact)
Split stop() into two phases with unsubscribe between them:
def close(self) -> None:
# Phase 1: stop data generation (processor still valid, callbacks drain)
try:
if self.streaming:
self.device_wrapper.stop_inertial_sampling()
time.sleep(0.05) # let in-flight callbacks drain
except Exception: ...
# Phase 2: unsubscribe (no active callbacks, processor still valid)
for signal in self.subscribed_signals:
try:
libmetawear.mbl_mw_datasignal_unsubscribe(signal)
except Exception: ...
self.subscribed_signals.clear()
# Phase 3: power down sensors (nothing references processor)
try:
if self.streaming:
self.device_wrapper.disable_inertial_sampling()
self.streaming = False
self.state = DeviceState.STOPPED
except Exception: ...
self.disconnect()This matches the safest MetaWear lifecycle: stop → drain → unsubscribe → disable.
2. Call mbl_mw_dataprocessor_remove after unsubscribe (local, needs verification)
The fusion processor created at mbient.py:304 via mbl_mw_dataprocessor_fuser_create is never freed. mbl_mw_datasignal_unsubscribe only detaches the callback — the processor object continues to exist on the native side. The SDK exposes mbl_mw_dataprocessor_remove(processor) for this (present in MetaWear cbindings).
One-line addition in the unsubscribe loop:
for signal in self.subscribed_signals:
try:
libmetawear.mbl_mw_datasignal_unsubscribe(signal)
libmetawear.mbl_mw_dataprocessor_remove(signal)
except Exception: ...Needs verification against MetaWear SDK examples before shipping — calling this incorrectly could introduce a new crash mode.
3. Stop mbients before close_streams (architectural, low risk)
Currently the GUI shows "safe to terminate" purely because tasks are done — but mbient streams are still live. We should explicitly stop mbients before entering the crash-prone close_streams() path.
Simplest approach: the ACQ terminate handler stops mbients synchronously before calling close_streams():
if device_manager is not None:
for name, stream in device_manager.get_mbient_streams().items():
try:
stream.stop() # stops + disables, sets streaming=False
except Exception as e:
logger.error(f'Error stopping mbient {name}: {e}')
device_manager.close_streams()By the time close_streams() runs, mbients are already stopped and no callbacks are in flight.
Alternative: new StopWearables message from CTR to each ACQ before TerminateServerRequest. More elaborate but cleaner separation.
Recommendation
Ship #1 and #3 together. They combine to give belt-and-suspenders protection:
- Read task parameters from database #3 ensures mbients are stopped before close_streams() runs
- Folder structure #1 ensures close() itself handles a live-streaming mbient safely (covers the edge case where Read task parameters from database #3 failed for some reason)
Defer #2 until we can verify the correct usage against MetaWear SDK examples — it could be the right cleanup or could introduce a new crash.
Related
- Harden shutdown path against native crashes and SSH tunnel leaks #648 (merged) — prior close() reordering and shutdown work
docs/arch/crash_analysis_mbient_2026-04.md— crash site 4 context- Switch to file logging before device teardown on terminate #649 — fallback file logging during device teardown (complementary)