
Conversation

@eric-tramel
Contributor

@eric-tramel eric-tramel commented Jan 8, 2026

Description

This feature exposes early shutdown settings via a new RunConfig class. Previously, these parameters were hard-coded, so there was no way to adjust them on a case-by-case basis.

Users can now configure RunConfig and apply it via set_run_config(). Settings apply to both column generation and validation tasks.

Why?

For large-scale generation tasks, streaks of malformed inputs, intermittent backpressure from servers, or bad luck can cause momentarily high error rates. Given the tight hardcoded default window, large jobs can be blocked by unpredictable short runs of errors. Users should have the ability to turn off this feature and accept dropped records without killing entire jobs.
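To make the failure mode concrete, here is a minimal sketch of a sliding-window error monitor. It is not Data Designer's actual implementation: the `ErrorWindowMonitor` class, the `first_trip` helper, the strict `>` comparison, and the 0.5 / 10 "tight" settings are all assumptions chosen for illustration. Only the 0.9 / 50 values mirror the custom thresholds shown in the usage example.

```python
from collections import deque


class ErrorWindowMonitor:
    """Hypothetical sliding-window error monitor (not the library's code)."""

    def __init__(self, error_rate, window):
        if not 0.0 <= error_rate <= 1.0:
            raise ValueError("error_rate must be in [0, 1]")
        if window < 1:
            raise ValueError("window must be at least 1")
        self.error_rate = error_rate
        self.window = window
        # Holds the most recent `window` outcomes; True means the task failed.
        self._outcomes = deque(maxlen=window)

    def record(self, failed):
        self._outcomes.append(failed)

    def should_shut_down(self):
        # Assumption: judge only full windows, and trip only when the
        # observed rate strictly exceeds the threshold.
        if len(self._outcomes) < self.window:
            return False
        return sum(self._outcomes) / self.window > self.error_rate


def first_trip(monitor, outcomes):
    """Return the 1-based task index at which the monitor trips, or None."""
    for i, failed in enumerate(outcomes, start=1):
        monitor.record(failed)
        if monitor.should_shut_down():
            return i
    return None


# 1000 tasks with one burst of 8 consecutive failures (0.8% overall).
outcomes = [False] * 1000
for i in range(500, 508):
    outcomes[i] = True

tight = first_trip(ErrorWindowMonitor(error_rate=0.5, window=10), outcomes)
loose = first_trip(ErrorWindowMonitor(error_rate=0.9, window=50), outcomes)
print(tight, loose)  # 506 None
```

Under these assumptions, the tight window kills the job at task 506 even though fewer than 1% of tasks fail overall, while the wider window with a higher threshold rides out the burst.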

Usage

```python
from data_designer.essentials import DataDesigner, RunConfig

dd = DataDesigner()

# Default behavior (unchanged)
dd.create(config, num_records=1000)

# Disable early shutdown entirely
dd.set_run_config(RunConfig(disable_early_shutdown=True))
dd.create(config, num_records=1000)

# Custom thresholds
dd.set_run_config(RunConfig(shutdown_error_rate=0.9, shutdown_error_window=50))
dd.create(config, num_records=1000)
```

Closes #185

@eric-tramel eric-tramel self-assigned this Jan 8, 2026
@eric-tramel eric-tramel marked this pull request as draft January 8, 2026 04:29
@github-actions
Contributor

github-actions bot commented Jan 8, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.


Copilot AI left a comment

Pull request overview

This PR exposes auto-shutdown configuration options to the DataDesigner.create() method, allowing users to control or disable the early shutdown feature that terminates dataset generation when error rates exceed a threshold. Previously, these parameters were hardcoded and could cause large-scale jobs to fail due to temporary error spikes.

  • Adds three new parameters to create(): enable_early_shutdown, shutdown_error_rate, and shutdown_error_window
  • Threads these parameters through _create_dataset_builder() to ColumnWiseDatasetBuilder
  • Implements conditional logic to disable early shutdown by setting shutdown_error_rate=1.0 when disabled
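The last bullet can be sketched roughly as follows. This is an illustration, not the code under review: `effective_error_rate` and `exceeds` are hypothetical names, and the sketch assumes the executor compares the observed rate against the threshold with a strict `>`, so that a threshold of 1.0 can never be exceeded.

```python
def effective_error_rate(enable_early_shutdown: bool, shutdown_error_rate: float) -> float:
    """Hypothetical helper: map the on/off flag onto the rate threshold."""
    # A threshold of 1.0 can never be strictly exceeded, so it acts as "off".
    return shutdown_error_rate if enable_early_shutdown else 1.0


def exceeds(errors: int, total: int, threshold: float) -> bool:
    """Assumed comparison: strict greater-than on the observed error rate."""
    return total > 0 and errors / total > threshold


# Even a window in which every task failed does not trip a disabled monitor.
print(exceeds(50, 50, effective_error_rate(False, 0.5)))  # False
print(exceeds(50, 50, effective_error_rate(True, 0.5)))   # True
```

If the real comparison were `>=`, a window made up entirely of failures would still trigger shutdown, so this trick depends on that detail.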

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/data_designer/interface/data_designer.py | Adds three new parameters to create() method and passes them through to dataset builder initialization |
| src/data_designer/engine/dataset_builders/column_wise_builder.py | Accepts and stores shutdown parameters, applies them when creating concurrent thread executor |


@eric-tramel
Contributor Author

I have read the DCO document and I hereby sign the DCO.

@eric-tramel eric-tramel marked this pull request as ready for review January 8, 2026 04:36
@eric-tramel eric-tramel added the "enhancement" (New feature or request) label Jan 8, 2026
Contributor

@johnnygreco johnnygreco left a comment

Thanks for exposing these parameters @eric-tramel!

One thing I'm not sure about, though, is whether we want to put these on create just yet. Let's check with @mikeknep on whether there are potential issues on the MS side, which we want to mirror as much as possible.

If we want to hold off on the create method, we can still expose them with a helper method like we do with the buffer size.

nabinchha previously approved these changes Jan 8, 2026
Contributor

@nabinchha nabinchha left a comment

One small nit: do we also need to expose it for validation generators?

@nabinchha
Contributor

Actually, I like @johnnygreco's suggestion to expose public setters for DataDesigner and keep the create(...) API as is.

@eric-tramel eric-tramel changed the title from "feat: Expose shutdown options to create(...)" to "feat: Expose shutdown options as RunConfig" Jan 8, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.



eric-tramel and others added 6 commits January 8, 2026 13:44
Resolves merge conflicts by combining:
- run_config support from this branch (early shutdown control)
- seed reader refactoring from main (SeedSource, SeedReaderRegistry)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@eric-tramel
Contributor Author

Okay, I've made some significant changes to the PR to try to address the concerns. The create(...) method is now left as is, but DataDesigner has an additional helper that takes a typed config to update run settings generically. This gives us a place to put other user-controllable execution settings in the future.
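A rough sketch of the pattern described here, for readers who have not opened the diff. `SketchRunConfig` and `SketchDesigner` are hypothetical stand-ins rather than the merged code. The field names follow the usage example in the PR description, but the default values and the validation rules are placeholder assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchRunConfig:
    """Hypothetical stand-in for RunConfig; defaults are illustrative only."""

    disable_early_shutdown: bool = False
    shutdown_error_rate: float = 0.5   # placeholder, not the library default
    shutdown_error_window: int = 10    # placeholder, not the library default

    def __post_init__(self) -> None:
        # Assumed validation: reject settings that cannot be meaningful.
        if not 0.0 <= self.shutdown_error_rate <= 1.0:
            raise ValueError("shutdown_error_rate must be between 0 and 1")
        if self.shutdown_error_window < 1:
            raise ValueError("shutdown_error_window must be at least 1")


class SketchDesigner:
    """Hypothetical stand-in for DataDesigner, showing only the setter pattern."""

    def __init__(self) -> None:
        self._run_config = SketchRunConfig()

    def set_run_config(self, run_config: SketchRunConfig) -> None:
        self._run_config = run_config

    def create(self, num_records: int) -> dict:
        # create() keeps its signature; run settings come from the stored config.
        return {"num_records": num_records, "run_config": self._run_config}


designer = SketchDesigner()
default_run = designer.create(num_records=1000)
designer.set_run_config(SketchRunConfig(shutdown_error_rate=0.9, shutdown_error_window=50))
custom_run = designer.create(num_records=1000)
```

Keeping execution settings in one typed object means future options can be added as new fields without touching the create(...) signature.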

@eric-tramel
Contributor Author

> One small nit. Do we also need to expose it for validation generators.

This is now solved in the PR update.

Contributor

@johnnygreco johnnygreco left a comment

🛸

@eric-tramel eric-tramel merged commit de417c8 into main Jan 8, 2026
13 checks passed

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Failure Threshold Shutdown Miscalculation

4 participants