feat: auto-generate figure captions from final pipeline output #125
Hi, @dippatel1994
dippatel1994
left a comment
Good feature design (opt-in, graceful degradation), but CI is fully broken:
- Missing prompt template files: prompts/diagram/caption.txt and prompts/plot/caption.txt are referenced in code and tests but not included in the PR. This raises FileNotFoundError and accounts for 9 test failures.
- 7 lint violations: two E501 (line too long in cli.py), two extraneous f-string prefixes (no placeholders), two isort violations in the test file, and one unused import (import time as _time).
- MCP return type change is breaking: generate_diagram/generate_plot currently return Image. With caption enabled they return list[TextContent | Image]. MCP clients expecting a single Image will break. This needs a cleaner approach (always return the same shape, or add a separate endpoint).
- 40 lines of duplicated caption logic: an identical block appears in generate() and continue_run(). Extract it to a private method.
Please add the missing prompt files and fix lint first, then address the MCP contract issue.
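For the fourth point, the extraction could look roughly like the sketch below. It is not the PR's code: the Pipeline class, the _caption_for helper, and the injected caption_fn callable are hypothetical names chosen for illustration. Only generate(), continue_run(), and the generated_caption field come from the discussion.

```python
from typing import Callable, Optional


class Pipeline:
    """Hypothetical stand-in for the real pipeline class."""

    def __init__(self, caption_fn: Callable[[str], str]):
        # Assumption: caption_fn wraps the single caption model call.
        self._caption_fn = caption_fn

    def _caption_for(self, image_path: str) -> Optional[str]:
        """Shared caption step used by both entry points."""
        try:
            return self._caption_fn(image_path)
        except Exception:
            # Graceful degradation: a caption failure must not fail the run.
            return None

    def generate(self, image_path: str) -> dict:
        # The block that used to be duplicated is now a single call.
        return {"image_path": image_path,
                "generated_caption": self._caption_for(image_path)}

    def continue_run(self, image_path: str) -> dict:
        return {"image_path": image_path,
                "generated_caption": self._caption_for(image_path)}
```

With one helper, any later change to error handling or timing happens in a single place rather than in two copies that can drift apart.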
Good catch. 👍
All fixes are complete.
dippatel1994
left a comment
All 4 points addressed. Prompt templates added, lint fixed, MCP return type preserved via PNG metadata embedding, caption logic extracted to private method. CI fully green. LGTM.
Summary
Adds automatic figure caption generation as a post-pipeline step, enabling users to get a publication-ready 1–3 sentence caption alongside their generated diagram without any extra manual work.
Closes #98
Why
The pipeline already has everything needed to write a caption — the source context, the communicative intent, the final styled description, and the output image itself — but users still had to write the caption by hand after the run finished.
This adds a single optional VLM+vision call at the end that does exactly that.
Disabled by default to avoid unnecessary API costs.
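A minimal sketch of that opt-in step, under stated assumptions: CaptionInputs, caption_step, the call_model callable, and the prompt wording are hypothetical and are not taken from the PR's agent or prompt templates. Only the generate_caption flag and its False default come from the description.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Settings:
    generate_caption: bool = False  # off by default, so no extra API cost


@dataclass
class CaptionInputs:
    """The four things the pipeline already holds at the end of a run."""
    source_context: str
    communicative_intent: str
    styled_description: str
    image_path: str


# Hypothetical signature: (prompt text, image path) -> caption text.
ModelCall = Callable[[str, str], str]


def caption_step(settings: Settings, inputs: CaptionInputs,
                 call_model: ModelCall) -> Optional[str]:
    """Make at most one model call, and none unless the user opted in."""
    if not settings.generate_caption:
        return None
    # Illustrative prompt only; the real templates are not shown in the PR text.
    prompt = "\n".join([
        "Write a 1-3 sentence figure caption.",
        f"Context: {inputs.source_context}",
        f"Intent: {inputs.communicative_intent}",
        f"Description: {inputs.styled_description}",
    ])
    return call_model(prompt, inputs.image_path)
```

Checking the flag before building the prompt means a default run never touches the model client for captions.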
Changes
New agent
- paperbanana/agents/caption.py: CaptionAgent makes one VLM+vision call after the final iteration
- Returns None on failure, so the pipeline never fails because of a caption
New prompt templates
- prompts/diagram/caption.txt and prompts/plot/caption.txt
Updated
- GenerationOutput model with optional generated_caption field
New setting
- generate_caption: bool = False in Settings and YAML config support
Pipeline
- generate() and continue_run(): call CaptionAgent after the final image is written; save the caption to metadata.json; add caption_seconds to the timing block
CLI
- --generate-caption flag added to both generate and plot commands; prints the caption below the output path when set
MCP
- generate_diagram and generate_plot expose a generate_caption param; returns [TextContent(caption), Image] when enabled
Usage