I'm working on a plugin (https://github.com/JaoMarcos/data_designer_lambda_column) and ran into an issue with the strict validation in the DatasetBatchManager.update_records function: it currently enforces that the number of incoming records matches the current buffer size.
DataDesigner/packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/dataset_batch_manager.py, line 194 (commit 184348a)
The Use Case
I need to support cases where a single input record produces multiple output records (1:N), essentially "exploding" the dataframe, as in the sketch below.
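For concreteness, a minimal pandas sketch of the 1:N shape I mean (the column names are made up for illustration):

```python
import pandas as pd

# One input record fans out into several output records (1:N).
df = pd.DataFrame({"topic": ["python decorators"], "context": ["<large shared context>"]})

# Suppose a generation step returns 5 variations for the single input row.
df["variation"] = [["v1", "v2", "v3", "v4", "v5"]]

# explode() turns the list column into one row per variation,
# so 1 input record becomes 5 output records.
exploded = df.explode("variation", ignore_index=True)
print(len(exploded))  # 5 -- no longer equal to the original buffer size of 1
```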
The main driver for this is cost and efficiency with LLMs. For complex prompts with large input contexts, if I need multiple variations (e.g., "Generate 5 variations of X"), it is significantly cheaper and faster to ask the model to generate all 5 in a single API call rather than making 5 separate calls with the same large input.
Generating them in a single pass also often improves quality/variance, as the model has "in-context" awareness of the other variations it is generating, preventing duplicates.
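As a sketch of the batching I'm describing, with a stubbed call_llm standing in for the plugin's real client (both names are hypothetical):

```python
import json

def call_llm(prompt: str) -> str:
    # Stub standing in for a real chat-completion call; in the plugin this
    # would be a single request that carries the large context only once.
    return json.dumps([f"variation {i + 1}" for i in range(5)])

def generate_variations(record: dict, n: int = 5) -> list[dict]:
    # One call asking for all n variations, instead of n calls that each
    # resend the same large input context.
    prompt = (
        f"Context:\n{record['context']}\n\n"
        f"Generate {n} distinct variations of: {record['topic']}\n"
        "Return a JSON array of strings."
    )
    return [{**record, "variation": v} for v in json.loads(call_llm(prompt))]

rows = generate_variations({"topic": "python decorators", "context": "<large shared context>"})
print(len(rows))  # 5 output records from 1 input record and 1 API call
```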
Question
What is the best way to handle this in DatasetBatchManager?
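To make the question concrete, this is the rough shape of the relaxation I'm imagining; everything here (the signature, the allow_explode flag) is invented for illustration and is not the actual DataDesigner code:

```python
# Purely hypothetical sketch, not the real update_records at line 194.
def update_records(buffer: list[dict], records: list[dict], allow_explode: bool = False) -> list[dict]:
    if not allow_explode and len(records) != len(buffer):
        raise ValueError(f"Expected {len(buffer)} records, got {len(records)}")
    # With allow_explode=True, a 1:N column may legitimately grow the batch.
    return records
```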
Replies: 1 comment

Looks like @andreatgretel has started looking into this as part of issue #265! Please feel free to continue the discussion here or as part of the issue if you have any other questions / feedback 🙌