Conversation
Thanks for the PR @agdhruv. Can you share some sample results here?
Modifications from original PR
Results
Added a new example. The optimization progress is shown below.
Note: The optimization achieved a significant jump from a baseline score of 0.0 to 0.875 in just two iterations. It would probably go even higher, but I didn't run GEPA for longer, to limit development cost.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
| {"input": ([(1, 2), (3, 4), (5, 6)],), "expected": [(1, 2), (3, 4), (5, 6)]}, | ||
| {"input": ([(1, 10), (2, 3), (4, 5)],), "expected": [(1, 10)]}, | ||
| {"input": ([(5, 7), (1, 3), (2, 4)],), "expected": [(1, 4), (5, 7)]}, | ||
| ], |
JSON serialization loses tuple types in merge_intervals tests
Medium Severity
The `merge_intervals` test cases use tuples for interval pairs (e.g., `expected: [(1, 6), (8, 10)]`), but `serialize_test_cases` converts them via `json.dumps`, which turns all tuples into JSON arrays. When `deserialize_test_cases` uses `json.loads`, these become Python lists (`[[1, 6], [8, 10]]`). Since `(1, 6) != [1, 6]` in Python, the `result == test["expected"]` comparison in `run_tests` always fails for correct tuple-returning implementations, causing 5 of 6 `merge_intervals` test cases to always score zero.
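For concreteness, a minimal, self-contained repro of the round-trip; the `normalize` helper is a hypothetical fix, not code from the PR.

```python
import json

# Round-trip mirroring serialize_test_cases -> deserialize_test_cases.
expected = [(1, 6), (8, 10)]
roundtripped = json.loads(json.dumps(expected))

print(roundtripped)              # [[1, 6], [8, 10]] -- tuples became lists
print(roundtripped == expected)  # False, so a correct solution scores zero


def normalize(value):
    """Recursively convert tuples to lists so comparisons survive JSON."""
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    return value


print(normalize(expected) == normalize(roundtripped))  # True
```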
```python
    return_completions_object=self.return_completions_object,
)
self.backend = backend
self.backend_params = backend_params
```
Mutated backend_params stored, polluting clone's state
Low Severity
`self.backend_params = backend_params` is assigned after `_RequestProcessorFactory.create()` mutates the `backend_params` dict in place (adding `model`, `generation_params`, and `return_completions_object` keys). The `clone` method then deep-copies this polluted dict. While this is currently harmless (the factory overwrites these keys anyway), the stored `backend_params` doesn't reflect the user's original input, and any future change to the factory's mutation behavior could break `clone`.
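A minimal repro of the ordering hazard, using a stand-in for the factory (`factory_create` is hypothetical; the real `_RequestProcessorFactory.create()` adds more keys).

```python
import copy


def factory_create(backend_params):
    # Stand-in: mutates the caller's dict in place, like the real factory.
    backend_params["model"] = "resolved-model"


# Current order: mutate first, store after, so clone() deep-copies pollution.
user_params = {"max_requests_per_minute": 10}
factory_create(user_params)
stored = user_params
print(copy.deepcopy(stored))  # includes 'model', not the user's original input

# Snapshotting before the factory call preserves the user's original input.
user_params = {"max_requests_per_minute": 10}
snapshot = copy.deepcopy(user_params)
factory_create(user_params)
print(snapshot)  # {'max_requests_per_minute': 10}
```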
@shreyaspimpalgaonkar thoughts?
This PR introduces an integration between Curator and GEPA to enable automated prompt optimization.
Key Changes
The PR implements a new "optimizer" block that allows users to refine their LLM prompts through an evolutionary search process rather than manual trial and error.
- Adds `gepa` as an optional dependency in `pyproject.toml`.
- Adds a new adapter (`CuratorAdapter`) that maps Curator's `LLM` classes to GEPA's optimization loop. Among other things, it returns `EvaluationResult` objects that pair numerical scores with natural language feedback, which helps the "Reflection LLM" understand how to improve the prompt.
Example Workflow
The PR includes a comprehensive example (`gepa_example.py`) demonstrating how to optimize a math word problem generator, starting from a `curator.LLM` class with a basic "seed" prompt. A rough sketch of the pieces is below.
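The following is a hypothetical sketch of how those pieces might fit together, assembled only from names this PR mentions (`curator.LLM`, `CuratorAdapter`, `gepa.compile()`, `reflection_lm`); the import path, constructor arguments, and metric signature are assumptions, so treat `gepa_example.py` as the authoritative version.

```python
# Hypothetical sketch; see examples/optimizer/gepa_example.py for real code.
import gepa
from bespokelabs import curator
from bespokelabs.curator.blocks.gepa import CuratorAdapter  # assumed path


class MathProblemGenerator(curator.LLM):
    """Generator whose system_prompt GEPA will evolve."""

    def prompt(self, row):
        return f"Write a math word problem involving {row['topic']}."


def metric(row, response) -> float:
    """User-provided scorer; a placeholder for the example's LLM judge."""
    return 1.0 if response else 0.0


seed_llm = MathProblemGenerator(model_name="gpt-4o-mini")  # seed prompt inside
adapter = CuratorAdapter(llm=seed_llm, metric=metric)  # assumed constructor

result = gepa.compile(
    adapter=adapter,
    trainset=[{"topic": "trains"}, {"topic": "compound interest"}],
    reflection_lm="openai/gpt-4o",  # GEPA's model naming schema (see below)
)
```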
Other Approaches Considered
Points for Discussion
- The `reflection_lm` required by `gepa.compile()` follows GEPA's model naming schema, which may differ from Curator's. How do we deal with this?

Given the design decisions outlined above, this implementation should be seen as a first step. My goal is to provide a functional foundation so the team can experiment with the workflow. This will help us determine the most intuitive API to expose to users in the final release.
Note
Medium Risk
Introduces a new execution path that repeatedly calls LLMs and temporarily forces `CURATOR_DISABLE_CACHE`, plus a new `LLM.clone()` construction path; incorrect cloning or cache toggling could cause subtle behavior differences or performance issues.
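To illustrate the cache-toggling half of that risk, here is a minimal sketch of the save-and-restore pattern that keeps a temporary `CURATOR_DISABLE_CACHE` override from leaking; `run_without_cache` is a hypothetical helper, not code from this PR.

```python
import os


def run_without_cache(fn):
    """Force CURATOR_DISABLE_CACHE only for the duration of fn()."""
    previous = os.environ.get("CURATOR_DISABLE_CACHE")
    os.environ["CURATOR_DISABLE_CACHE"] = "true"
    try:
        return fn()
    finally:
        # Restore the caller's environment even if fn() raises.
        if previous is None:
            os.environ.pop("CURATOR_DISABLE_CACHE", None)
        else:
            os.environ["CURATOR_DISABLE_CACHE"] = previous
```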
Overview
Adds an optional GEPA integration to Curator by introducing `CuratorAdapter` (`curator.blocks.gepa`), which lets GEPA optimize a Curator `LLM`'s `system_prompt`, run candidate prompts against a dataset, score them via a user-provided metric, and optionally emit reflection trajectories.
Extends `LLM` with a `clone()` method (and stores `backend`/`backend_params`) so the adapter can create per-candidate LLM instances; also updates `curator.blocks` exports to be conditional when the `optimizer` extra (the new optional `gepa` dependency) is not installed. Includes new `examples/optimizer/*` scripts showing GEPA optimization for an LLM-judged math problem generator and for code generation scored by test execution plus static style checks.
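For reference, a minimal sketch of the conditional-export pattern described above, as `curator/blocks/__init__.py` might implement it; the actual code in the PR may differ.

```python
# Sketch of a conditional export guarded by an optional dependency.
__all__ = []

try:
    from .gepa import CuratorAdapter  # requires the `optimizer` extra (gepa)
except ImportError:
    pass  # gepa not installed; the optimizer block is simply unavailable
else:
    __all__.append("CuratorAdapter")
```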