feat: auto-generate figure captions from final pipeline output by dataCenter430 · Pull Request #125 · llmsresearch/paperbanana

dataCenter430 · 2026-03-26T04:29:15Z

Summary

Adds automatic figure caption generation as a post-pipeline step, enabling users to get a publication-ready 1–3 sentence caption alongside their generated diagram without any extra manual work.

Closes #98

Why

The pipeline already has everything needed to write a caption — the source context, the communicative intent, the final styled description, and the output image itself — but users still had to write the caption by hand after the run finished.
This adds a single optional VLM+vision call at the end that does exactly that.
Disabled by default to avoid unnecessary API costs.

Changes

- New agent paperbanana/agents/caption.py:
  - CaptionAgent makes one VLM+vision call after the final iteration
  - Errors degrade gracefully to None; the pipeline never fails because of a caption
- New prompt templates prompts/diagram/caption.txt and prompts/plot/caption.txt:
  - Enforces 1–3 sentence academic style, no figure numbers, no meta-language
- Updated GenerationOutput model with optional generated_caption field
- New setting generate_caption: bool = False in Settings and YAML config support
- Pipeline (generate() and continue_run()): calls CaptionAgent after the final image is written; saves the caption to metadata.json; adds caption_seconds to the timing block
- CLI: --generate-caption flag added to both the generate and plot commands; prints the caption below the output path when set
- MCP: generate_diagram and generate_plot expose a generate_caption param; return [TextContent(caption), Image] when enabled
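The graceful-degradation contract above (one VLM call, any failure becomes None) could look roughly like the sketch below. It is not the PR's code: the `vlm_call(prompt, image_path) -> str` callable, the `run()` signature, and the `{context}`/`{intent}`/`{description}` template placeholders are all hypothetical stand-ins, since the PR does not show the agent's internals or the template variables.

```python
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Hypothetical VLM interface: takes a text prompt and an image path, returns text.
VLMCall = Callable[[str, Path], str]


@dataclass
class GenerationOutput:
    image_path: Path
    # Stays None when captioning is disabled or the caption call fails.
    generated_caption: Optional[str] = None


class CaptionAgent:
    """Sketch: one VLM+vision call after the final iteration; never raises."""

    def __init__(self, vlm_call: VLMCall, template: str) -> None:
        self._vlm_call = vlm_call
        self._template = template  # e.g. contents of prompts/diagram/caption.txt

    def run(self, image_path: Path, source_context: str,
            communicative_intent: str, description: str) -> Optional[str]:
        try:
            # Hypothetical placeholder names; the real templates may differ.
            prompt = self._template.format(
                context=source_context,
                intent=communicative_intent,
                description=description,
            )
            caption = self._vlm_call(prompt, image_path).strip()
        except Exception:  # template, provider, or network errors all degrade to None
            logger.warning("Caption generation failed; continuing without caption",
                           exc_info=True)
            return None
        # An empty response is treated the same as a failure.
        return caption or None
```

Catching broadly here is deliberate: the caption is a non-essential extra, so the sketch trades error visibility (a logged warning) for a pipeline that always finishes.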

Usage

```bash
# Caption enabled
paperbanana generate --input method.txt --caption "Overview of our framework" \
  --generate-caption

# Output:
# ✓ Done! 42.3s total · 2 iterations
#   Output: outputs/run_20260326_123456_abc123/final_output.png
#   Run ID: run_20260326_123456_abc123
```


dataCenter430 · 2026-03-26T04:32:40Z

Hi, @dippatel1994
Sorry to ping you, but could you please review this PR?
If you'd like to see how it works, I can attach a short video showing the behavior and usage.
I'd really appreciate your review.

dippatel1994

Good feature design (opt-in, graceful degradation), but CI is fully broken:

- Missing prompt template files: prompts/diagram/caption.txt and prompts/plot/caption.txt are referenced in code and tests but not included in the PR, causing 9 test failures with FileNotFoundError.
- 7 lint violations: two E501 (line too long in cli.py), two extraneous f-string prefixes (no placeholders), two isort violations in the test file, and one unused import (import time as _time).
- MCP return type change is breaking: generate_diagram/generate_plot currently return Image, but with caption enabled they return list[TextContent | Image]. MCP clients expecting a single Image will break. This needs a cleaner approach (always return the same shape, or a separate endpoint).
- 40 lines of duplicated caption logic: the same block appears in generate() and continue_run(). Extract it to a private method.

Please add the missing prompt files and fix lint first, then address the MCP contract issue.
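The extraction requested in the last point might take roughly the following shape. This is a hypothetical sketch, not the merged code: `Pipeline`, `_maybe_caption`, `_write_metadata`, and the `generated_caption`/`timing` keys inside `metadata.json` are illustrative names. Only the `generate_caption` setting, `metadata.json`, `caption_seconds`, and `final_output.png` come from this PR.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


class Pipeline:
    """Sketch: generate() and continue_run() share one caption helper."""

    def __init__(self, generate_caption: bool,
                 captioner: Callable[[Path], Optional[str]]) -> None:
        self.generate_caption = generate_caption  # mirrors Settings.generate_caption
        self._captioner = captioner  # expected to return None on failure

    def _maybe_caption(self, image_path: Path, metadata: dict) -> Optional[str]:
        """Run the optional caption step once and record it in metadata."""
        if not self.generate_caption:
            return None
        start = time.perf_counter()
        caption = self._captioner(image_path)
        metadata.setdefault("timing", {})["caption_seconds"] = round(
            time.perf_counter() - start, 2)
        metadata["generated_caption"] = caption  # hypothetical key name
        return caption

    @staticmethod
    def _write_metadata(run_dir: Path, metadata: dict) -> None:
        (run_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))

    def generate(self, run_dir: Path) -> Optional[str]:
        # Earlier stages (omitted) are assumed to have written final_output.png.
        metadata = {"mode": "generate"}
        caption = self._maybe_caption(run_dir / "final_output.png", metadata)
        self._write_metadata(run_dir, metadata)
        return caption

    def continue_run(self, run_dir: Path) -> Optional[str]:
        metadata = {"mode": "continue"}
        caption = self._maybe_caption(run_dir / "final_output.png", metadata)
        self._write_metadata(run_dir, metadata)
        return caption
```

Keeping the enable check inside the helper means neither entry point can forget it, and disabled runs leave `metadata.json` without any caption or caption timing keys.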

dataCenter430 · 2026-04-02T23:56:25Z

Good catch. 👍
Okay, I'll fix the errors, including the lint violations.
Thank you.

dataCenter430 · 2026-04-03T04:03:56Z

All fixes are done.
It should now work as you requested.
Thanks. 👍

dippatel1994

All 4 points addressed. Prompt templates added, lint fixed, MCP return type preserved via PNG metadata embedding, caption logic extracted to private method. CI fully green. LGTM.

feat: add CaptionAgent for auto-generating publication-ready figure captions

43d46a4

dippatel1994 requested changes Apr 2, 2026


fix: address review and CI failures

e4ba4dc

dataCenter430 requested a review from dippatel1994 April 3, 2026 03:53

dippatel1994 approved these changes Apr 3, 2026

dataCenter430 wants to merge 2 commits into llmsresearch:main from dataCenter430:feat/auto-generate-figure-captions-from-final-pipeline-output
