feat: expand reference dataset with 25 new diagrams across 6 new domain categories #117

Open
dev-miro26 wants to merge 3 commits into llmsresearch:main from dev-miro26:feat/venue-styles-and-reference-expansion

Conversation

@dev-miro26 dev-miro26 commented Mar 24, 2026

Summary

Closes #90

The Retriever currently only has 13 reference diagrams across 4 categories (agent_reasoning, vision_perception, generative_learning, science_applications). Papers outside those domains get poor few-shot examples, which degrades Planner output.

This PR hand-picks 25 new reference diagrams from PaperBananaBench and adds 6 new domain categories, bringing the total to 38 examples across 10 categories.

New categories and entries

| Category | Count | Example papers |
|---|---|---|
| healthcare_medical | 5 | CSBrain (EEG decoding), CHEFNMR (NMR molecular structure), C³M (gene + EHR), multiscale protein GNN, NOBLE (bio-informed neural operator) |
| robotics_control | 4 | DexGarmentLab (dexterous manipulation), DynaNav (visual navigation), ROBOT-R1 (RL embodied reasoning), RoboScape (physics-informed world model) |
| nlp_language | 4 | Adversarial paraphrasing, InSUR (instruction uncertainty), RRM (reward reasoning), SEAL (text anonymization) |
| multimodal_fusion | 4 | DynamicVerse (4D generation), FakeVLM (synthetic image detection), Cauvis (causal visual prompting), OmniResponse (multimodal conversation) |
| systems_networking | 4 | StarTrail (distributed GPU attention), GRIFFIN (speculative decoding), FreqExit (early-exit inference), IneqSearch (theorem proving pipeline) |
| optimization_theory | 4 | Geometric neural combinatorial optimization, TITAN (VQE parameter freezing), TensorRL-QAS (quantum circuit search), STRAP (spatio-temporal retrieval) |

What changed

  • data/reference_sets/index.json — 25 new ReferenceExample entries following the existing schema (id, source_context, caption, image_path, category, aspect_ratio, structure_hints). Metadata bumped: version 2.0.0 → 3.0.0, total_examples 13 → 38, categories 4 → 10.
  • data/reference_sets/images/ — 25 new diagram images extracted from PaperBananaBench.
  • prompts/diagram/retriever.txt — Line 21 domain list extended with the 6 new domains so the VLM understands the expanded domain space during ranking.
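
As a rough illustration of the schema listed above, a field check could be sketched as follows. Only the seven field names come from this description; the `missing_fields` helper, the value types, and every value in `sample_entry` are hypothetical placeholders, not entries from the repository.

```python
# Hypothetical sketch: only the field names are taken from the PR description.
REQUIRED_FIELDS = {
    "id", "source_context", "caption", "image_path",
    "category", "aspect_ratio", "structure_hints",
}


def missing_fields(entry: dict) -> set[str]:
    """Return the required schema fields that are absent from an index entry."""
    return REQUIRED_FIELDS - set(entry)


# Placeholder values only; the real value types in index.json may differ.
sample_entry = {
    "id": "placeholder-id",
    "source_context": "Placeholder methodology text.",
    "caption": "Placeholder caption.",
    "image_path": "images/placeholder.png",
    "category": "robotics_control",
    "aspect_ratio": 1.5,
    "structure_hints": ["pipeline"],
}

print(missing_fields(sample_entry))
```

An entry that omits any of the seven fields would show up in the returned set, which makes the check easy to fold into a loader or a test.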

Selection criteria

Each diagram was chosen because it:

  • Is clear and readable (no cluttered multi-panel figures)
  • Shows a distinct visual pattern the Planner can learn from (pipelines, block diagrams, multi-branch architectures, hierarchical layouts, loop diagrams)
  • Comes from a domain not already well-covered by the existing 13 references

Note on category naming

The existing curated_expansion.json on this branch uses slightly different category names for some overlapping concepts (e.g. systems_architecture vs systems_networking, multimodal_learning vs multimodal_fusion). These should be reconciled — happy to align in either direction based on reviewer preference.
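
One possible way to reconcile the names is a small alias map applied at load time. This is only a sketch: `CATEGORY_ALIASES` and `canonical_category` are hypothetical names, and the mapping direction shown (towards the names used in index.json) is an assumption that can be flipped.

```python
# Hypothetical alias map; the direction is an assumption pending reviewer preference.
CATEGORY_ALIASES = {
    "systems_architecture": "systems_networking",
    "multimodal_learning": "multimodal_fusion",
}


def canonical_category(name: str) -> str:
    """Map a known alias to its canonical category; other names pass through unchanged."""
    return CATEGORY_ALIASES.get(name, name)
```

Renaming the entries in one of the two files would achieve the same result without a runtime lookup; the map is mainly useful if both spellings need to keep working for a while.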

Test plan

  • All 59 existing tests pass (pytest tests/test_pipeline/ tests/test_agents/ tests/test_reference/ tests/test_data/)
  • ReferenceStore loads all 38 entries and get_by_category returns correct counts for each new category
  • All 25 new images exist at their referenced image_path
  • Reviewer spot-checks a few new entries to verify diagram quality and category fit

@dev-miro26
Author

@dippatel1994
Could you please check my first PR? Leave your feedback kindly.
I appreciate you.

@dippatel1994
Member

Thanks @dev-miro26. Venue-specific guidelines, reference.venue, and the CLI --venue flag are a clean addition, and the resolution order (venue dir → root files → embedded defaults) preserves backward compatibility. The curated manifest + preset plumbing is directionally right.

Before merge, please address:

  1. preset=curated + missing/unreadable manifest: _load_curated_manifest() can return None, and _import_from_bench(..., manifest=None) currently imports the full benchmark. That’s a serious footgun — fail fast with a clear error instead of silently doing a full import.

  2. metadata.target_total: not enforced in code while max_per_category is — either implement a cap or adjust the manifest/docs so users aren’t promised ~40 examples when the logic allows more.

Follow-ups (non-blocking): document order-dependent “balanced” sampling; consider a warning when --venue doesn’t match any venue file; add a few unit tests for manifest filtering and venue loading.

Happy to re-check after the curated/manifest behavior is tightened.
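
The two blocking points above could be addressed along these lines. This is a minimal sketch under stated assumptions, not the project's code: `ManifestError`, `load_manifest_strict`, and `apply_target_total` are hypothetical names, and the manifest is assumed to be a JSON file with `target_total` nested under a `metadata` key.

```python
import json
from pathlib import Path


class ManifestError(RuntimeError):
    """Raised when the curated manifest is missing or unreadable (hypothetical type)."""


def load_manifest_strict(path: Path) -> dict:
    """Load the manifest or raise, so a None can never reach the import step."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:  # missing file, bad encoding, or invalid JSON
        raise ManifestError(f"Cannot load curated manifest at {path}: {exc}") from exc


def apply_target_total(examples: list, manifest: dict) -> list:
    """Truncate the selection to metadata.target_total when that key is present."""
    target = manifest.get("metadata", {}).get("target_total")
    return examples if target is None else examples[:target]
```

Raising instead of returning None means the curated preset cannot fall through to a full benchmark import, and the cap makes the documented total an upper bound rather than an estimate.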

@dev-miro26
Author

@dippatel1994
I am fixing these conflicts now and will push the updated result soon.

@dippatel1994 dippatel1994 added the enhancement (New feature or request) label Mar 25, 2026
@dippatel1994
Member

Thanks @dev-miro26, love the passion! Appreciate your contribution to PaperBanana.

@dev-miro26
Author

@dippatel1994
Could you please check this PR again?

@dev-miro26
Author

@dippatel1994
I have discussed this issue with @statxc.
I will update again soon.
Please wait a bit.
Thank you.

@dev-miro26 dev-miro26 changed the title from "Add multi-venue style support and curated reference expansion infrastructure" to "feat: expand reference dataset with 25 new diagrams across 6 new domain categories" Mar 25, 2026
…r of curated methodology diagrams from 13 to 38, updated version to 3.0.0, and expanded categories. Added multiple new images related to various research topics.
@dev-miro26 dev-miro26 force-pushed the feat/venue-styles-and-reference-expansion branch from 58c0b44 to 8721e72 Compare March 25, 2026 05:38
@dev-miro26
Author

@dippatel1994
I have updated this PR.
As you know, #89 was split into 3 PRs (#90, #91, #99).
This PR is for the #90.
Please review and leave your feedback kindly.
Thank you.

Member

@dippatel1994 dippatel1994 left a comment

CI passes, good dataset expansion. Two things to fix:

  1. Inconsistent ID format — Existing 13 entries use arxiv IDs (e.g., 2601.03570v1). Issue #90 explicitly says "id is the arxiv ID." New entries use pb_ref_42, pb_ref_24, etc. These show up as "Paper ID" in the retriever prompt — pb_ref_42 is less meaningful than an arxiv ID. Please use arxiv IDs.

  2. Missing source_paper field — All 13 original entries include "source_paper". None of the 25 new entries have it. Add for consistency and provenance tracking.

Non-blocking: No tests added to validate the new entries load correctly. A lightweight test that loads real index.json and checks counts/image existence would prevent regressions.
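
A lightweight check of that kind might look like the sketch below. It rests on assumptions about the file layout that are not confirmed in this thread: the entries are assumed to live under a top-level `examples` key, `total_examples` under `metadata`, and each `image_path` is assumed to be relative to the directory containing index.json. `check_index` is a hypothetical helper name.

```python
import json
from pathlib import Path


def check_index(index_path: Path) -> list[str]:
    """Collect problems found in a reference index; an empty list means it looks consistent."""
    index = json.loads(index_path.read_text(encoding="utf-8"))
    examples = index["examples"]                    # assumed top-level key
    declared = index["metadata"]["total_examples"]  # assumed nesting
    problems = []
    if len(examples) != declared:
        problems.append(
            f"total_examples is {declared} but {len(examples)} entries were found"
        )
    for entry in examples:
        # Assumption: image paths are relative to the directory holding index.json.
        image = index_path.parent / entry["image_path"]
        if not image.is_file():
            problems.append(f"missing image for {entry['id']}: {image}")
    return problems
```

Inside a pytest module this could be used as `assert check_index(Path("data/reference_sets/index.json")) == []`, which reports every mismatch at once instead of stopping at the first one.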

@dev-miro26 dev-miro26 requested a review from dippatel1994 April 2, 2026 21:15
Member

@dippatel1994 dippatel1994 left a comment

All 3 points addressed: arxiv IDs used, source_paper added, 10 tests added. CI green. LGTM.

@dev-miro26
Author

Could you please merge this PR?
Or is there anything else to update?

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

[Feature]: Curate new reference diagrams for underrepresented domains

2 participants