feat: auto-generate figure captions from final pipeline output #125

Open
dataCenter430 wants to merge 2 commits into llmsresearch:main from dataCenter430:feat/auto-generate-figure-captions-from-final-pipeline-output

Conversation

@dataCenter430
Summary

Adds automatic figure caption generation as a post-pipeline step, enabling users to get a publication-ready 1–3 sentence caption alongside their generated diagram without any extra manual work.

Closes #98

Why

The pipeline already has everything needed to write a caption — the source context, the communicative intent, the final styled description, and the output image itself — but users still had to write the caption by hand after the run finished.
This adds a single optional VLM+vision call at the end that does exactly that.
Disabled by default to avoid unnecessary API costs.

Changes

  • New agent paperbanana/agents/caption.py:

    • CaptionAgent makes one VLM+vision call after the final iteration
    • Errors degrade gracefully to None — the pipeline never fails because of a caption
  • New prompt templates prompts/diagram/caption.txt and prompts/plot/caption.txt:

    • Enforces 1–3 sentence academic style, no figure numbers, no meta-language
  • Updated GenerationOutput model with optional generated_caption field

  • New setting generate_caption: bool = False in Settings and YAML config support

  • Pipeline (generate() and continue_run()): calls CaptionAgent after final image
    is written; saves caption to metadata.json; adds caption_seconds to timing block

  • CLI --generate-caption flag added to both generate and plot commands;
    prints caption below the output path when set

  • MCP generate_diagram and generate_plot expose generate_caption param;
    returns [TextContent(caption), Image] when enabled
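
The "errors degrade gracefully to None" behavior and the caption_seconds timing entry described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: try_caption and call_vlm are hypothetical names, and the real CaptionAgent interface may look quite different.

```python
import logging
import time
from typing import Callable, Dict, Optional

logger = logging.getLogger("caption_sketch")


def try_caption(call_vlm: Callable[[], str], timing: Dict[str, float]) -> Optional[str]:
    """Run one caption call; return None instead of raising on any failure.

    `call_vlm` is a hypothetical zero-argument callable standing in for the
    single VLM+vision request. `timing` mimics the run's timing block.
    """
    start = time.perf_counter()
    try:
        caption = call_vlm().strip()
    except Exception:
        # A caption is optional output, so log the failure and carry on.
        logger.warning("caption generation failed; continuing without one", exc_info=True)
        caption = ""
    finally:
        # Record elapsed time whether or not the call succeeded.
        timing["caption_seconds"] = time.perf_counter() - start
    # Treat an empty or whitespace-only response the same as a failure.
    return caption or None
```

A caller would invoke this only when generate_caption is enabled and store the result in the optional generated_caption field, so a failed or skipped caption looks identical to the feature being off.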

Usage

# Caption enabled
paperbanana generate --input method.txt --caption "Overview of our framework" \
  --generate-caption

# Output:
# ✓ Done! 42.3s total · 2 iterations
#   Output: outputs/run_20260326_123456_abc123/final_output.png
#   Run ID: run_20260326_123456_abc123

@dataCenter430
Author

dataCenter430 commented Mar 26, 2026

Hi, @dippatel1994
Sorry for the ping. Could you please review this PR?
If you would like to see how it works, I can attach a short video showing the behavior and usage.
I would really appreciate your review.

Member

@dippatel1994 dippatel1994 left a comment

Good feature design (opt-in, graceful degradation), but CI is fully broken:

  1. Missing prompt template files: prompts/diagram/caption.txt and prompts/plot/caption.txt are referenced in code and tests but not included in the PR. This causes FileNotFoundError in 9 test failures.

  2. 7 lint violations — Two E501 (line too long in cli.py), two extraneous f-string prefixes (no placeholders), two isort violations in test file, one unused import (import time as _time).

  3. MCP return type change is breaking: generate_diagram/generate_plot currently return Image. With caption enabled they return list[TextContent | Image]. MCP clients expecting a single Image will break. Needs a cleaner approach (always return same shape, or separate endpoint).

  4. 40 lines of duplicated caption logic — Identical block in generate() and continue_run(). Extract to a private method.

Please add the missing prompt files and fix lint first, then address the MCP contract issue.
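
Point 4 above, extracting the duplicated block into a private method, might look something like the sketch below. Pipeline, Output, and _attach_caption are hypothetical stand-ins chosen for illustration; only generate(), continue_run(), and the generated_caption field name come from the PR description.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Output:
    """Hypothetical stand-in for the GenerationOutput model."""
    image_path: str
    generated_caption: Optional[str] = None


class Pipeline:
    """Hypothetical skeleton; not the project's actual pipeline class."""

    def __init__(self, captioner: Callable[[str], Optional[str]],
                 generate_caption: bool = False) -> None:
        self._captioner = captioner
        self._generate_caption = generate_caption

    def _attach_caption(self, output: Output) -> Output:
        # Single home for the caption step shared by both entry points.
        if self._generate_caption:
            output.generated_caption = self._captioner(output.image_path)
        return output

    def generate(self) -> Output:
        # ... the real pipeline would produce the final image here ...
        return self._attach_caption(Output("final_output.png"))

    def continue_run(self, run_id: str) -> Output:
        # ... the real pipeline would resume a previous run here ...
        return self._attach_caption(Output(f"{run_id}/final_output.png"))
```

With the logic in one place, a later change to the caption step (for example, how it is timed or stored) only has to be made once.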

@dataCenter430
Author

Good catch. 👍
Okay, I will fix the errors, including the lint violations.
Thank you.

@dataCenter430
Author

All the fixes are complete.
Everything should now work as you requested.
Thanks. 👍

Member

@dippatel1994 dippatel1994 left a comment

All 4 points addressed. Prompt templates added, lint fixed, MCP return type preserved via PNG metadata embedding, caption logic extracted to private method. CI fully green. LGTM.
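
For readers unfamiliar with the "PNG metadata embedding" approach mentioned above, here is one way a caption can travel inside the image bytes so the return type stays a single image. It is a standard-library sketch that writes an uncompressed iTXt chunk right after IHDR; the "Caption" keyword and the function names are assumptions, and the PR's actual implementation may differ (for instance, it may use an imaging library instead).

```python
import struct
import zlib
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Signature (8) + IHDR length (4) + type (4) + data (13) + CRC (4).
IHDR_END = 8 + 4 + 4 + 13 + 4


def _chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def embed_caption(png: bytes, caption: str, keyword: str = "Caption") -> bytes:
    """Return a copy of `png` with the caption stored in an iTXt chunk."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # keyword, NUL, compression flag 0, method 0, empty language tag + NUL,
    # empty translated keyword + NUL, then UTF-8 text.
    data = keyword.encode("latin-1") + b"\x00" + b"\x00\x00" + b"\x00" + b"\x00"
    data += caption.encode("utf-8")
    return png[:IHDR_END] + _chunk(b"iTXt", data) + png[IHDR_END:]


def read_caption(png: bytes, keyword: str = "Caption") -> Optional[str]:
    """Return the first uncompressed iTXt value stored under `keyword`, if any."""
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        body = png[pos + 8:pos + 8 + length]
        if chunk_type == b"iTXt":
            key, _, rest = body.partition(b"\x00")
            if key == keyword.encode("latin-1") and rest[:2] == b"\x00\x00":
                _lang, _, rest = rest[2:].partition(b"\x00")
                _translated, _, text = rest.partition(b"\x00")
                return text.decode("utf-8")
        pos += 12 + length  # length + type + data + CRC
    return None


def tiny_png() -> bytes:
    """Build a 1x1 8-bit grayscale PNG for demonstration purposes."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b"")
```

Because ancillary text chunks are ignored by decoders that do not look for them, clients expecting a plain image keep working, while caption-aware clients can call something like read_caption on the same bytes.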

Development

Successfully merging this pull request may close these issues.

[Feature]: Auto-generate figure captions from the final pipeline output

2 participants