Description
What happened?
When using MIPROv2 with the metric_threshold parameter, the threshold is not applied to one of the bootstrap candidate sets (specifically the "unshuffled few-shot" case with seed == -1). As a result, examples whose metric falls below the threshold are incorrectly counted as "full traces" during bootstrapping.
Steps to reproduce
- Configure MIPROv2 with metric_threshold=1.0
- Run optimization with training data where examples have metrics < 1.0 (e.g., 0.8)
- Observe that bootstrapping reports success even though no examples should pass the threshold
Example code:
from dspy.teleprompt import MIPROv2

teleprompter = MIPROv2(
    metric=your_metric,
    prompt_model=teacher_lm,
    task_model=student_lm,
    metric_threshold=1.0,  # ← Set threshold to 1.0
    num_candidates=10,
    max_bootstrapped_demos=4,
    max_labeled_demos=4,
    seed=42,
)
optimized_agent = teleprompter.compile(
    base_agent,
    trainset=trainset,
    valset=valset,
    num_trials=10,
)
Expected output:
Bootstrapped 0 full traces after X examples...
Actual output (when examples have metric=0.8):
Bootstrapped 2 full traces after 2 examples...
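The gap between the two outputs can be illustrated with a simplified model of the acceptance check. This is a sketch only, not the library code: `counts_as_success` is a hypothetical helper, and it assumes that when no threshold is supplied, any truthy metric value is accepted.

```python
from typing import Optional


def counts_as_success(metric_val: float, metric_threshold: Optional[float]) -> bool:
    """Simplified model of the bootstrap acceptance check (an assumption, not DSPy code)."""
    if metric_threshold is not None:
        # With a threshold, the metric value must reach it.
        return metric_val >= metric_threshold
    # Without a threshold, any truthy metric value is accepted.
    return bool(metric_val)


scores = [0.8, 0.8]  # two training examples, each scoring 0.8

# Threshold dropped (the seed == -1 case described in this issue).
without_threshold = sum(counts_as_success(s, None) for s in scores)
# Threshold forwarded (the seed >= 0 case).
with_threshold = sum(counts_as_success(s, 1.0) for s in scores)

print(without_threshold)  # 2
print(with_threshold)  # 0
```

Under that assumption, a score of 0.8 is accepted whenever the threshold is missing, which is consistent with 2 full traces being reported instead of 0.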
Root cause
In dspy/teleprompt/utils.py, the function create_n_fewshot_demo_sets() creates multiple candidate sets with different seeds:
- seed = -3: zero-shot (no bootstrap)
- seed = -2: labeled examples only
- seed = -1: unshuffled bootstrap ← BUG: metric_threshold not passed
- seed >= 0: shuffled bootstrap (metric_threshold is passed)
The bug is on lines ~382-391:
elif seed == -1:
    # unshuffled few-shot
    program = BootstrapFewShot(
        metric=metric,
        max_errors=max_errors,
        max_bootstrapped_demos=max_bootstrapped_demos,
        max_labeled_demos=max_labeled_demos,
        teacher_settings=teacher_settings,
        max_rounds=max_rounds,
        # ❌ metric_threshold is NOT passed here!
    )
    program2 = program.compile(student, teacher=teacher, trainset=trainset_copy)
Compare this to the correct implementation for seed >= 0 (lines ~394-403):
else:
    # shuffled few-shot
    rng.shuffle(trainset_copy)
    size = rng.randint(min_num_samples, max_bootstrapped_demos)
    teleprompter = BootstrapFewShot(
        metric=metric,
        max_errors=max_errors,
        metric_threshold=metric_threshold,  # ✓ Correctly passed
        max_bootstrapped_demos=size,
        max_labeled_demos=max_labeled_demos,
        teacher_settings=teacher_settings,
        max_rounds=max_rounds,
    )
Expected behavior
The metric_threshold parameter should be passed to all BootstrapFewShot instances, including the unshuffled case (seed == -1).
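One way to guard this expectation is a regression check that records the keyword arguments given to each bootstrap instance and flags any seed that omits the threshold. The sketch below is self-contained and uses stand-ins only: `RecordingBootstrap`, `build_candidate_sets`, the `patched` flag, and the recorded `seed` key are all hypothetical and heavily simplified, not the actual code in dspy/teleprompt/utils.py.

```python
from typing import Any, Dict, List, Optional


class RecordingBootstrap:
    """Hypothetical stand-in for BootstrapFewShot: it only records its kwargs."""

    def __init__(self, log: List[Dict[str, Any]], **kwargs: Any) -> None:
        log.append(kwargs)


def build_candidate_sets(
    seeds: List[int], metric_threshold: Optional[float], patched: bool
) -> List[Dict[str, Any]]:
    """Simplified mirror of the seed branches described in this issue."""
    log: List[Dict[str, Any]] = []
    for seed in seeds:
        if seed in (-3, -2):
            continue  # zero-shot and labeled-only sets do not bootstrap
        kwargs: Dict[str, Any] = {"seed": seed}  # seed is recorded for reporting only
        if seed >= 0 or patched:
            # Unpatched, the seed == -1 branch never forwards the threshold.
            kwargs["metric_threshold"] = metric_threshold
        RecordingBootstrap(log, **kwargs)
    return log


def seeds_missing_threshold(log: List[Dict[str, Any]]) -> List[int]:
    """Return the seeds whose bootstrap instance was built without a threshold."""
    return [kw["seed"] for kw in log if "metric_threshold" not in kw]


seeds = [-3, -2, -1, 0, 1]
print(seeds_missing_threshold(build_candidate_sets(seeds, 1.0, patched=False)))  # [-1]
print(seeds_missing_threshold(build_candidate_sets(seeds, 1.0, patched=True)))  # []
```

A test against the real function could follow the same pattern by substituting a recording stand-in for BootstrapFewShot, though the exact patch target would need to be confirmed against the source.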
Suggested fix
Add metric_threshold=metric_threshold to the BootstrapFewShot initialization in the seed == -1 case:
elif seed == -1:
    # unshuffled few-shot
    teleprompter = BootstrapFewShot(
        metric=metric,
        max_errors=max_errors,
        metric_threshold=metric_threshold,  # ← ADD THIS LINE
        max_bootstrapped_demos=max_bootstrapped_demos,
        max_labeled_demos=max_labeled_demos,
        teacher_settings=teacher_settings,
        max_rounds=max_rounds,
    )
    program2 = teleprompter.compile(student, teacher=teacher, trainset=trainset_copy)
DSPy version
3.1.3