feat: add parameter sweep command for variant generation and ranking #118
claytonlin1110 wants to merge 5 commits into llmsresearch:main
@dippatel1994 Would you please review this?

@dippatel1994 please review

@dippatel1994 Any update about this feature?

dippatel1994 left a comment
CI passes, nice work. A few things to fix:
- `Settings` constructed per-variant inside the loop: `load_dotenv()` should be called once before the loop, and `Settings` should be built once then copied with `model_copy(update=overrides)` per variant. Currently re-parses YAML on every iteration.
- Missing `--pdf-pages` option: Every other command that accepts `--input` with PDF support also exposes `--pdf-pages`. The sweep command calls `load_methodology_source(input_path)` without it.
- No non-dry-run test: Only `--dry-run` and validation are tested. Add a test that mocks the pipeline and verifies the sweep report structure (status, ranked results, timing). `test_ablate_retrieval_writes_report` is a good template.

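To illustrate the first point, here is a minimal sketch of the build-once, copy-per-variant pattern. It is not the project's code: the `Settings` class and its fields are hypothetical, and `dataclasses.replace` stands in for pydantic's `model_copy(update=overrides)` so the snippet runs with the standard library alone.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields, for illustration only.
    provider: str = "provider_a"
    model: str = "model_x"
    iterations: int = 3


def build_variant_settings(base, overrides_list):
    """Derive one Settings per variant from a single base instance.

    With a pydantic model the equivalent call would be
    base.model_copy(update=overrides).
    """
    return [replace(base, **overrides) for overrides in overrides_list]


# The expensive construction (env loading, YAML parsing) happens once here,
# before any per-variant work.
base = Settings()
variants = build_variant_settings(
    base,
    [{"iterations": 1}, {"provider": "provider_b", "iterations": 5}],
)
```

Each variant only carries its overrides on top of the shared base, and the base itself is left unchanged.
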
Non-blocking: The quality proxy formula (100 - 12.5 * suggestions) is undocumented and fragile — consider at minimum documenting it in --help. Also missing --budget and --auto-download-data flags for parity with generate/batch.
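
As a rough sketch of what documenting the proxy could look like, the formula quoted above can be written as a small function. The function name, the clamp at zero, and the rejection of negative counts are assumptions made for this example; the comment only states `100 - 12.5 * suggestions`.

```python
def quality_proxy(suggestions: int) -> float:
    """Score a variant from its number of critic suggestions.

    Starts at 100 and subtracts 12.5 per suggestion, so 8 suggestions
    reach 0. Clamping at 0 is an assumption of this sketch.
    """
    if suggestions < 0:
        raise ValueError("suggestions must be non-negative")
    return max(0.0, 100.0 - 12.5 * suggestions)


scores = [quality_proxy(n) for n in (0, 2, 8, 10)]
```

Without a clamp, more than 8 suggestions would produce a negative score, which is one way the formula is fragile.
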
Thank you for your feedback, @dippatel1994

dippatel1994 left a comment
All 3 points addressed. Settings built once with model_copy per variant, --pdf-pages added, non-dry-run test added. CI green. LGTM.

Thanks @dippatel1994

Summary
Closes #119
- `paperbanana sweep` CLI command to run a cartesian sweep across providers, models, iterations, and optimization/auto-refine modes
- `paperbanana/core/sweep.py` with structured variant planning, CSV axis parsing, ranking, and summary helpers
- Save outputs under `sweep_<id>/variant_<id>/` and write `sweep_report.json` with per-variant status, runtime, and ranked results

Motivation
One diagram depends on many settings (providers, models, iterations, optimize/auto-refine). Changing flags and comparing runs by hand is slow and easy to lose track of. A sweep runs those combinations in one go, writes a single ranked report, and supports dry-run so you can plan before spending API quota.
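
The planning and ranking steps named in the summary can be sketched roughly as follows. This is not the code in `paperbanana/core/sweep.py`: the function names, axis names, and result fields are hypothetical and only show the general shape of a cartesian sweep with CSV axis parsing.

```python
from itertools import product


def parse_csv_axis(raw):
    """Split a comma-separated option value into trimmed, non-empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def plan_variants(axes):
    """Return one dict per combination of axis values (cartesian product)."""
    names = list(axes)
    return [
        dict(zip(names, combo))
        for combo in product(*(axes[name] for name in names))
    ]


def rank_results(results):
    """Order successful results by score, highest first."""
    ok = [r for r in results if r["status"] == "ok"]
    return sorted(ok, key=lambda r: r["score"], reverse=True)


# A dry run only needs the plan; no variant is executed here.
axes = {
    "provider": parse_csv_axis("provider_a, provider_b"),
    "iterations": [int(x) for x in parse_csv_axis("1,3")],
}
plan = plan_variants(axes)

# Hypothetical per-variant results, as a report might record them.
ranked = rank_results([
    {"variant": 0, "status": "ok", "score": 75.0},
    {"variant": 1, "status": "failed", "score": 0.0},
    {"variant": 2, "status": "ok", "score": 87.5},
])
```

The plan grows multiplicatively with each axis, which is why seeing it in a dry run before spending API quota is useful.
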