[BUG] Mem leak handles in scheduler #5590
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving.
Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
Fix Memory Leak in Scheduler by Tracking Task Handles and Adding Cleanup Guard
This pull request addresses a memory leak in the scheduler module by overhauling how task handles are managed. The implementation transitions from storing a
Key Changes
• Replaced
Affected Areas
•
This summary was automatically generated by @propel-code-bot
async fn test_handle_cleaned_up() {
    let system = System::new();
    let counter = Arc::new(AtomicUsize::new(0));
    let handles_empty_after = Arc::new(AtomicBool::new(false));
    let component = OneMessageComponent::new(10, counter.clone(), handles_empty_after.clone());
    let _handle = system.start_component(component);
    // Wait for the 100ms schedule to trigger
    tokio::time::sleep(Duration::from_millis(500)).await;
    // yield to allow the component to process the messages
    tokio::task::yield_now().await;
    assert!(handles_empty_after.load(Ordering::SeqCst));
    // We should have scheduled the message once
    system.join().await;
}
[**CriticalError**]
There's a potential race condition in this test. The test verifies that the `handles` map is empty by checking a flag set from within the `OneMessageComponent::handle` method.
However, the `handle` method runs in the component's task, while the `HandleGuard`'s `drop` implementation (which cleans up the map) runs at the end of the scheduler's spawned task. These are two different tasks, and their execution order isn't guaranteed. It's possible for `handle` to be called and check `is_empty()` *before* the scheduler task has finished and removed its handle from the map. This could lead to a flaky test.
A more robust approach would be to check the state of the scheduler's `handles` map directly from the test function. This provides a deterministic way to verify the state before and after the scheduled task has executed.
With the `handles` field made `pub(crate)` (as suggested in another comment), you could rewrite the test like this:
```rust
#[tokio::test]
async fn test_handle_cleaned_up() {
    let system = System::new();
    let counter = Arc::new(AtomicUsize::new(0));
    // The OneMessageComponent can be simplified to not need `handles_empty_after`
    let component = OneMessageComponent::new(10, counter.clone());
    let handle = system.start_component(component);

    // Allow on_start to run and schedule the task.
    tokio::task::yield_now().await;
    assert_eq!(handle.ctx.scheduler.handles.read().len(), 1, "Handle should be present after scheduling");

    // Wait for the schedule to trigger and the task to be cleaned up.
    tokio::time::sleep(Duration::from_millis(500)).await;
    assert!(handle.ctx.scheduler.handles.read().is_empty(), "Handles map should be empty after task completion");
    assert_eq!(counter.load(Ordering::SeqCst), 1, "Message should have been handled once");

    system.join().await;
}
```
This would also allow simplifying `OneMessageComponent` by removing the `handles_empty_after` field and its related logic.
File: rust/system/src/scheduler.rs
Line: 350
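For readers following the thread, here is a minimal sketch of the cleanup-guard pattern the comment refers to, written against the standard library only. It is not the PR's implementation: `HandleGuard`'s fields, the `TaskId` alias, the `Handles` alias, the `&'static str` placeholder value, and `run_guarded_task` are all assumptions made for illustration, and a plain thread stands in for the scheduler's spawned task. The point it shows is that the map entry disappears only when the guard is dropped at the end of the task, so a check made from another task is only reliable after that task has finished.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical task identifier; the real `TaskId` type is not shown in this thread.
type TaskId = u64;

/// Shared registry of in-flight tasks. The `&'static str` value is a
/// placeholder for the real `SchedulerTaskHandle`.
type Handles = Arc<Mutex<HashMap<TaskId, &'static str>>>;

/// Removes its own entry from the registry when dropped.
struct HandleGuard {
    id: TaskId,
    handles: Handles,
}

impl Drop for HandleGuard {
    fn drop(&mut self) {
        // Recover the lock even if it was poisoned, so cleanup still runs
        // when the task body panicked.
        let mut map = self.handles.lock().unwrap_or_else(|p| p.into_inner());
        map.remove(&self.id);
    }
}

/// Registers a task, runs it on a worker thread, and returns the number of
/// registry entries observed (while the task is running, after it finished).
fn run_guarded_task(handles: &Handles, id: TaskId) -> (usize, usize) {
    handles.lock().unwrap().insert(id, "scheduled");
    let guard = HandleGuard { id, handles: Arc::clone(handles) };

    // The channel holds the worker open until the first observation is made.
    let (release_tx, release_rx) = mpsc::channel::<()>();
    let worker = thread::spawn(move || {
        let _guard = guard; // dropped when the task body ends
        let _ = release_rx.recv(); // stand-in for the scheduled work
    });

    let during = handles.lock().unwrap().len();
    release_tx.send(()).unwrap();
    // Joining is the synchronization point: the guard has been dropped by now.
    worker.join().unwrap();
    let after = handles.lock().unwrap().len();
    (during, after)
}
```

In this sketch the "after" observation is taken only once the worker has been joined. The test under review has no equivalent synchronization point between the component's `handle` call and the scheduler task's exit, which is the ordering gap the comment describes.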
I'll leave it to you to decide if this is worth the lift.
Co-authored-by: propel-code-bot[bot] <203372662+propel-code-bot[bot]@users.noreply.github.com>
Left two comments. Neither needs to be addressed, but I'd address both if they were asked of me.
#[derive(Clone, Debug)]
pub struct Scheduler {
-    handles: Arc<RwLock<Vec<SchedulerTaskHandle>>>,
+    handles: Arc<RwLock<HashMap<TaskId, SchedulerTaskHandle>>>,
I'd recommend not using RwLock here unless we have a heavy read path. An RwLock is typically more expensive per access if it's 100% write, and I only see read calls from tests.
Yeah, that's fine, can do. I was just leaving it as it was before to minimize churn.
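As a rough illustration of the suggestion above, the sketch below swaps the `RwLock` for a plain `Mutex` around the map. It uses only the standard library, and every name in it (`HandleRegistry`, the `u64` key, the `String` value standing in for `SchedulerTaskHandle`) is hypothetical rather than taken from the PR. Since every access in this shape either inserts or removes, there is no concurrent-reader path for an `RwLock` to speed up.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the scheduler's handle storage.
/// Cloning shares the same underlying map, as with the `Arc` in `Scheduler`.
#[derive(Clone, Debug, Default)]
struct HandleRegistry {
    handles: Arc<Mutex<HashMap<u64, String>>>,
}

impl HandleRegistry {
    /// Records a handle when a task is scheduled (write path).
    fn insert(&self, id: u64, label: &str) {
        self.handles.lock().unwrap().insert(id, label.to_string());
    }

    /// Drops a handle when its task finishes (write path).
    fn remove(&self, id: u64) -> Option<String> {
        self.handles.lock().unwrap().remove(&id)
    }

    /// Read-only access, which in the PR under review is used only by tests.
    fn len(&self) -> usize {
        self.handles.lock().unwrap().len()
    }
}
```

Whether the difference is measurable for this workload is not established in the thread; the sketch only shows that the call sites stay the same shape, with `lock()` replacing `read()` and `write()`.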
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
Added a test to ensure map is empty after schedule finished
`pytest` for python, `yarn test` for js, `cargo test` for rust
Migration plan
None required
Observability plan
None required
Documentation Changes
None required