feat: expand reference dataset with 25 new diagrams across 6 new domain categories by dev-miro26 · Pull Request #117 · llmsresearch/paperbanana

dev-miro26 · 2026-03-24T17:33:29Z

Summary

Closes #90

The Retriever currently only has 13 reference diagrams across 4 categories (agent_reasoning, vision_perception, generative_learning, science_applications). Papers outside those domains get poor few-shot examples, which degrades Planner output.

This PR hand-picks 25 new reference diagrams from PaperBananaBench and adds 6 new domain categories, bringing the total to 38 examples across 10 categories.

New categories and entries

| Category | Count | Example papers |
|---|---|---|
| `healthcare_medical` | 5 | CSBrain (EEG decoding), CHEFNMR (NMR molecular structure), C³M (gene + EHR), multiscale protein GNN, NOBLE (bio-informed neural operator) |
| `robotics_control` | 4 | DexGarmentLab (dexterous manipulation), DynaNav (visual navigation), ROBOT-R1 (RL embodied reasoning), RoboScape (physics-informed world model) |
| `nlp_language` | 4 | Adversarial paraphrasing, InSUR (instruction uncertainty), RRM (reward reasoning), SEAL (text anonymization) |
| `multimodal_fusion` | 4 | DynamicVerse (4D generation), FakeVLM (synthetic image detection), Cauvis (causal visual prompting), OmniResponse (multimodal conversation) |
| `systems_networking` | 4 | StarTrail (distributed GPU attention), GRIFFIN (speculative decoding), FreqExit (early-exit inference), IneqSearch (theorem proving pipeline) |
| `optimization_theory` | 4 | Geometric neural combinatorial optimization, TITAN (VQE parameter freezing), TensorRL-QAS (quantum circuit search), STRAP (spatio-temporal retrieval) |

What changed

data/reference_sets/index.json — 25 new ReferenceExample entries following the existing schema (id, source_context, caption, image_path, category, aspect_ratio, structure_hints). Metadata bumped: version 2.0.0 → 3.0.0, total_examples 13 → 38, categories 4 → 10.
data/reference_sets/images/ — 25 new diagram images extracted from PaperBananaBench.
prompts/diagram/retriever.txt — Line 21 domain list extended with the 6 new domains so the VLM understands the expanded domain space during ranking.
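For reference, here is a minimal sketch of the entry schema and the metadata consistency this change implies. The field names come from the list above; the field types, the top-level `examples`/`metadata` keys, and the `check_metadata` helper are assumptions for illustration, not the project's actual `ReferenceExample` definition. `source_paper` is included as optional because the original entries carry it.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical mirror of a ReferenceExample entry: field names follow the PR,
# types are guesses.
@dataclass
class ReferenceExample:
    id: str
    source_context: str
    caption: str
    image_path: str
    category: str
    aspect_ratio: Any  # string or number; the format is not stated in the PR
    structure_hints: Any = None  # shape not stated in the PR
    source_paper: Optional[str] = None  # present on the original 13 entries


def check_metadata(index: dict) -> list:
    """Parse entries and verify metadata counters (assumed index.json layout)."""
    examples = [ReferenceExample(**entry) for entry in index["examples"]]
    meta = index["metadata"]
    if meta["total_examples"] != len(examples):
        raise ValueError(
            f"total_examples={meta['total_examples']} but {len(examples)} entries found"
        )
    n_categories = len({ex.category for ex in examples})
    if meta["categories"] != n_categories:
        raise ValueError(
            f"categories={meta['categories']} but {n_categories} distinct categories found"
        )
    return examples
```

With this PR the expected values would be `total_examples == 38` and `categories == 10`, assuming `categories` stores a count rather than a list of names.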

Selection criteria

Each diagram was chosen because it:

Is clear and readable (no cluttered multi-panel figures)
Shows a distinct visual pattern the Planner can learn from (pipelines, block diagrams, multi-branch architectures, hierarchical layouts, loop diagrams)
Comes from a domain not already well-covered by the existing 13 references

Note on category naming

The existing curated_expansion.json on this branch uses slightly different category names for some overlapping concepts (e.g. systems_architecture vs systems_networking, multimodal_learning vs multimodal_fusion). These should be reconciled — happy to align in either direction based on reviewer preference.
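One low-risk way to reconcile the two naming schemes while the direction is being decided is an explicit alias table applied at load time. The sketch below assumes this PR's names win; the two pairs come from the note above, and `CATEGORY_ALIASES`, `normalize_category`, and `unmatched_categories` are hypothetical names. Flipping the direction only means inverting the dict.

```python
# Hypothetical alias table: curated_expansion.json name -> name used in this PR.
# Only the two pairs mentioned above are listed; other overlaps may exist.
CATEGORY_ALIASES = {
    "systems_architecture": "systems_networking",
    "multimodal_learning": "multimodal_fusion",
}


def normalize_category(name: str) -> str:
    """Map a legacy category name to its canonical form; unknown names pass through."""
    return CATEGORY_ALIASES.get(name, name)


def unmatched_categories(a: set, b: set) -> set:
    """Return categories that appear in only one source after normalization."""
    norm_a = {normalize_category(c) for c in a}
    norm_b = {normalize_category(c) for c in b}
    return norm_a ^ norm_b
```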

Test plan

All 59 existing tests pass (pytest tests/test_pipeline/ tests/test_agents/ tests/test_reference/ tests/test_data/)
ReferenceStore loads all 38 entries and get_by_category returns correct counts for each new category
All 25 new images exist at their referenced image_path
Reviewer spot-checks a few new entries to verify diagram quality and category fit
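The per-category count check in the test plan can be expressed without depending on the exact `ReferenceStore` API, which this PR does not show. The expected numbers are copied from the table above; `EXPECTED_NEW` and `count_mismatches` are illustrative names, and entries are assumed to be plain dicts with a `category` key.

```python
from collections import Counter

# Per-category counts for the six new categories, copied from the PR table.
EXPECTED_NEW = {
    "healthcare_medical": 5,
    "robotics_control": 4,
    "nlp_language": 4,
    "multimodal_fusion": 4,
    "systems_networking": 4,
    "optimization_theory": 4,
}

# Arithmetic behind the PR's totals: 25 new entries, 13 + 25 = 38 overall,
# and 4 existing + 6 new = 10 categories.
assert sum(EXPECTED_NEW.values()) == 25
assert 13 + sum(EXPECTED_NEW.values()) == 38
assert 4 + len(EXPECTED_NEW) == 10


def count_mismatches(entries, expected=EXPECTED_NEW):
    """Return {category: (expected, actual)} for every category whose count differs."""
    counts = Counter(entry["category"] for entry in entries)
    return {cat: (n, counts[cat]) for cat, n in expected.items() if counts[cat] != n}
```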

dev-miro26 · 2026-03-24T17:34:32Z

@dippatel1994
Could you please review my first PR and leave some feedback?
I appreciate it.

dippatel1994 · 2026-03-25T02:43:26Z

Thanks @dev-miro26! The venue-specific guidelines, reference.venue, and CLI --venue are a clean addition, and the resolution order (venue dir → root files → embedded defaults) preserves backward compatibility. The curated manifest + preset plumbing is directionally right.
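The resolution order described above can be sketched as follows, including the warning suggested in the follow-ups below. The directory layout (`<root>/<venue>/<file>`, then `<root>/<file>`), `resolve_guideline`, and `EMBEDDED_DEFAULTS` are assumptions for illustration; the project's actual loader may be structured differently.

```python
import warnings
from pathlib import Path
from typing import Optional

# Hypothetical embedded fallback; the real defaults ship with the package.
EMBEDDED_DEFAULTS = {"style_guide.md": "(embedded default style guide)"}


def resolve_guideline(root: Path, filename: str, venue: Optional[str] = None) -> str:
    """Resolve guideline text: venue dir -> root file -> embedded default."""
    if venue:
        venue_file = root / venue / filename
        if venue_file.is_file():
            return venue_file.read_text(encoding="utf-8")
        # Warn rather than silently falling back when the venue has no matching file.
        warnings.warn(
            f"no {filename!r} found for venue {venue!r} under {root}; falling back",
            stacklevel=2,
        )
    root_file = root / filename
    if root_file.is_file():
        return root_file.read_text(encoding="utf-8")
    return EMBEDDED_DEFAULTS[filename]
```

With `venue=None` the function only consults the root file and the embedded default, which is why adding the venue layer does not change behavior for existing callers.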

Before merge, please address:

preset=curated + missing/unreadable manifest: _load_curated_manifest() can return None, and _import_from_bench(..., manifest=None) currently imports the full benchmark. That’s a serious footgun — fail fast with a clear error instead of silently doing a full import.
metadata.target_total: not enforced in code while max_per_category is — either implement a cap or adjust the manifest/docs so users aren’t promised ~40 examples when the logic allows more.
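A minimal sketch of both fixes, assuming a manifest shaped like `{"ids": [...], "max_per_category": N, "metadata": {"target_total": M}}`. Apart from `max_per_category` and `metadata.target_total`, which the review names, the keys and the `select_curated` / `ManifestError` names are hypothetical. Selection walks the benchmark in order, so which entries fill each category's slots depends on that order; that is the order dependence the follow-ups ask to document.

```python
from collections import defaultdict
from typing import Iterable, Optional


class ManifestError(RuntimeError):
    """Raised when preset=curated is requested without a usable manifest."""


def select_curated(bench_entries: Iterable[dict], manifest: Optional[dict]) -> list:
    """Filter benchmark entries through a curated manifest, enforcing both caps."""
    # Fail fast instead of silently importing the full benchmark.
    if manifest is None:
        raise ManifestError(
            "preset=curated needs a readable curated manifest; refusing a full import"
        )
    allowed = set(manifest["ids"])
    max_per_category = manifest.get("max_per_category")
    target_total = manifest.get("metadata", {}).get("target_total")

    per_category = defaultdict(int)
    selected = []
    # Order-dependent: earlier entries in bench order claim each category's slots first.
    for entry in bench_entries:
        if entry["id"] not in allowed:
            continue
        category = entry["category"]
        if max_per_category is not None and per_category[category] >= max_per_category:
            continue
        selected.append(entry)
        per_category[category] += 1
        # Enforce target_total as a hard cap rather than only documenting it.
        if target_total is not None and len(selected) >= target_total:
            break
    return selected
```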

Follow-ups (non-blocking): document order-dependent “balanced” sampling; consider a warning when --venue doesn’t match any venue file; add a few unit tests for manifest filtering and venue loading.

Happy to re-check after the curated/manifest behavior is tightened.

dev-miro26 · 2026-03-25T02:43:37Z

@dippatel1994
I'm fixing these conflicts now and will push an updated version soon.

dippatel1994 · 2026-03-25T02:45:15Z

Thanks @dev-miro26, love the passion! I appreciate your contribution to PaperBanana.

dev-miro26 · 2026-03-25T04:28:58Z

@dippatel1994
Could you please check this PR again?

dev-miro26 · 2026-03-25T04:43:06Z

@dippatel1994
I have discussed this issue with @statxc.
I will update again soon.
Please wait a bit.
Thank you.


dev-miro26 · 2026-03-25T05:44:14Z

@dippatel1994
I have updated this PR.
As you know, #89 was split into three parts (#90, #91, #99).
This PR is for #90.
Please review and leave your feedback.
Thank you.

dippatel1994

CI passes, good dataset expansion. Two things to fix:

Inconsistent ID format — Existing 13 entries use arxiv IDs (e.g., 2601.03570v1). Issue #90 explicitly says "id is the arxiv ID." New entries use pb_ref_42, pb_ref_24, etc. These show up as "Paper ID" in the retriever prompt — pb_ref_42 is less meaningful than an arxiv ID. Please use arxiv IDs.
Missing source_paper field — All 13 original entries include "source_paper". None of the 25 new entries have it. Add for consistency and provenance tracking.
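A small validator for both points could look like this. The regex covers new-style arXiv identifiers (`YYMM.NNNN` or `YYMM.NNNNN`, with an optional `vN` suffix), which matches the `2601.03570v1` example; old-style IDs such as `hep-th/9901001` would be rejected, which may or may not be wanted. `entry_problems` is a hypothetical helper, and it only checks that `source_paper` is non-empty because its expected format isn't stated.

```python
import re

# New-style arXiv identifiers: YYMM.NNNN or YYMM.NNNNN, optionally followed by vN.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def entry_problems(entry: dict) -> list:
    """List consistency problems for one index.json entry (empty list means OK)."""
    problems = []
    if not ARXIV_ID.match(entry.get("id", "")):
        problems.append(f"id {entry.get('id')!r} is not an arXiv ID")
    if not entry.get("source_paper"):
        problems.append("missing source_paper")
    return problems
```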

Non-blocking: No tests added to validate the new entries load correctly. A lightweight test that loads real index.json and checks counts/image existence would prevent regressions.

dippatel1994

All 3 points addressed: arxiv IDs used, source_paper added, 10 tests added. CI green. LGTM.

dev-miro26 · 2026-04-02T21:25:45Z

Could you please merge this PR?
Or is there anything else to update?

dippatel1994 assigned dev-miro26 Mar 25, 2026

dippatel1994 added the enhancement New feature or request label Mar 25, 2026

statxc mentioned this pull request Mar 25, 2026

[Feature]: Expand built-in reference set and add multi-venue style support #89 (Open, 1 task)

dev-miro26 changed the title ~~Add multi-venue style support and curated reference expansion infrastructure~~ feat: expand reference dataset with 25 new diagrams across 6 new domain categories Mar 25, 2026

Update reference set metadata and add new images. Increased the number of curated methodology diagrams from 13 to 38, updated version to 3.0.0, and expanded categories. Added multiple new images related to various research topics.

8721e72

dev-miro26 force-pushed the feat/venue-styles-and-reference-expansion branch from 58c0b44 to 8721e72 March 25, 2026 05:38

Merge branch 'main' into feat/venue-styles-and-reference-expansion

a01bb00

dippatel1994 requested changes Apr 2, 2026


Update reference set metadata, add new images, and implement tests.

ab2a48f

dev-miro26 requested a review from dippatel1994 April 2, 2026 21:15

dippatel1994 approved these changes Apr 2, 2026

dev-miro26 wants to merge 3 commits into llmsresearch:main from dev-miro26:feat/venue-styles-and-reference-expansion
