feat: Expose shutdown options as RunConfig #186
All contributors have signed the DCO ✍️ ✅
Pull request overview
This PR exposes auto-shutdown configuration options to the DataDesigner.create() method, allowing users to control or disable the early shutdown feature that terminates dataset generation when error rates exceed a threshold. Previously, these parameters were hardcoded and could cause large-scale jobs to fail due to temporary error spikes.
- Adds three new parameters to `create()`: `enable_early_shutdown`, `shutdown_error_rate`, and `shutdown_error_window`
- Threads these parameters through `_create_dataset_builder()` to `ColumnWiseDatasetBuilder`
- Implements conditional logic to disable early shutdown by setting `shutdown_error_rate=1.0` when disabled
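The disabling approach in the last bullet can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ErrorTracker`, `effective_error_rate`, the strict `>` comparison, and the reading of the window as a minimum number of completed tasks are hypothetical and are not taken from the library's executor.

```python
from dataclasses import dataclass


@dataclass
class ErrorTracker:
    """Hypothetical error tracker, not the library's actual executor."""

    shutdown_error_rate: float
    shutdown_error_window: int
    completed: int = 0
    errors: int = 0

    def record(self, ok: bool) -> None:
        # Count every finished task, and count failures separately.
        self.completed += 1
        if not ok:
            self.errors += 1

    def should_shut_down(self) -> bool:
        # Assumption: the rate is only evaluated once at least
        # `shutdown_error_window` tasks have completed.
        if self.completed < self.shutdown_error_window:
            return False
        return self.errors / self.completed > self.shutdown_error_rate


def effective_error_rate(enable_early_shutdown: bool, shutdown_error_rate: float) -> float:
    # A threshold of 1.0 can never be exceeded by a ratio in [0, 1],
    # so under a strict comparison it switches the check off.
    return shutdown_error_rate if enable_early_shutdown else 1.0
```

Under these assumptions, a tracker built with `effective_error_rate(False, 0.5)` never requests a shutdown, even if every task fails, while the same tracker with the check enabled stops once more than half of the first ten tasks have failed.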
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/data_designer/interface/data_designer.py | Adds three new parameters to create() method and passes them through to dataset builder initialization |
| src/data_designer/engine/dataset_builders/column_wise_builder.py | Accepts and stores shutdown parameters, applies them when creating concurrent thread executor |
I have read the DCO document and I hereby sign the DCO.
johnnygreco left a comment
Thanks for exposing these parameters @eric-tramel!
One thing I'm not sure about, though, is whether we want to put these on `create()` just yet. Let's check with @mikeknep on whether there are potential issues on the MS side, which we want to mirror as much as possible.
If we want to hold off on the `create()` method, we can still expose them with a helper method like we do with the buffer size.
src/data_designer/engine/dataset_builders/column_wise_builder.py
nabinchha left a comment
One small nit: do we also need to expose it for validation generators?
Actually, I like @johnnygreco's suggestion to expose public setters for `DataDesigner` and keep the `create(...)`
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Co-authored-by: Copilot <[email protected]>
Resolves merge conflicts by combining:
- run_config support from this branch (early shutdown control)
- seed reader refactoring from main (SeedSource, SeedReaderRegistry)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Okay, I've made some significant changes to the PR to try to address concerns. The
This is now solved in the PR update.
johnnygreco left a comment
🛸
Description
This feature exposes early shutdown settings via a new `RunConfig` class. Previously, these parameters were hard-coded, so there was no way to adjust them on a case-by-case basis.

Users can now configure `RunConfig` and apply it via `set_run_config()`. Settings apply to both column generation and validation tasks.

Why?
For large-scale generation tasks, streaks of malformed inputs, intermittent backpressure from servers, or bad luck can cause momentarily high error rates. Given the tight hardcoded default window, large jobs can be blocked by unpredictable short runs of errors. Users should have the ability to turn off this feature and accept dropped records without killing entire jobs.
Usage
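A minimal sketch of the configure-then-apply flow described above, using stand-in classes so it runs on its own. The `RunConfig` field names are borrowed from the parameters listed in the pull request overview and may differ from the merged class; the default values and `DataDesignerStub` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Stand-in for the PR's RunConfig; field names are assumed."""

    enable_early_shutdown: bool = True
    shutdown_error_rate: float = 0.5  # illustrative default, not from the PR
    shutdown_error_window: int = 10  # illustrative default, not from the PR


class DataDesignerStub:
    """Hypothetical stand-in for DataDesigner, showing only the setter flow."""

    def __init__(self) -> None:
        self.run_config = RunConfig()

    def set_run_config(self, run_config: RunConfig) -> None:
        # Later column generation and validation tasks would read this config.
        self.run_config = run_config


designer = DataDesignerStub()
# Turn off early shutdown and accept dropped records instead of a failed job.
designer.set_run_config(RunConfig(enable_early_shutdown=False))
```

Keeping the settings on a separate config object, rather than on `create()`, matches the reviewers' preference for a public setter.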
Closes #185