
@jsochava commented Oct 27, 2025

Closes #14085

This draft PR introduces RelatedWorkAnnotator, a helper class in jablib that appends contextual summaries from a citing paper’s “Related Work” section to a target BibEntry.

What and why

  1. RelatedWorkAnnotator.java
  • Appends contextual summaries from a citing paper’s “Related Work” section to a target BibEntry.
  • Uses JabRef’s comment-<username> field convention (resolved as UserSpecificCommentField).
  2. HeuristicRelatedWorkExtractor.java
  • Deterministic parser that locates author–year citations (e.g., (Vesce et al., 2016), (Bianchi, 2021)) in the Related Work text.
  • Extracts the descriptive snippet surrounding each citation.
  • Matches each citation to an existing BibEntry by first-author surname and year.
  • Implemented without AI dependencies, so its behavior is deterministic and its logic is easy to inspect.
  3. RelatedWorkHarvester.java
  • High-level orchestrator that connects the extractor and the annotator:
    - Accepts PDF-extracted or plain-text input.
    - Calls the extractor to identify citation–context pairs.
    - Invokes RelatedWorkAnnotator.appendSummaryToEntry(...) for each match.
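The matching and orchestration steps above can be sketched roughly as follows. This is a hedged illustration, not the PR's code: `HarvesterSketch`, the `Citation` and `LibraryEntry` records, and the `annotate` callback are hypothetical stand-ins for the extractor output, `BibEntry`, and `RelatedWorkAnnotator.appendSummaryToEntry(...)`. Entries are indexed by lowercase first-author surname plus year, and citations without a match are returned so they could later feed the "Reference lookup" step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch of the harvester flow; names are illustrative, not the PR's API.
class HarvesterSketch {

    // One author–year citation found in the Related Work text, with its surrounding snippet.
    record Citation(String surname, String year, String snippet) { }

    // Stand-in for a library entry: citation key, first-author surname, year.
    record LibraryEntry(String citationKey, String firstAuthorSurname, String year) { }

    static String key(String surname, String year) {
        return surname.toLowerCase(Locale.ROOT) + "|" + year;
    }

    // Index entries by "surname|year"; if two entries collide, the first one wins.
    static Map<String, LibraryEntry> buildIndex(List<LibraryEntry> library) {
        Map<String, LibraryEntry> index = new HashMap<>();
        for (LibraryEntry entry : library) {
            index.putIfAbsent(key(entry.firstAuthorSurname(), entry.year()), entry);
        }
        return index;
    }

    // Passes (entry, snippet) to the annotator callback for every matched citation
    // and returns the citations that matched nothing in the library.
    static List<Citation> harvest(List<Citation> citations,
                                  List<LibraryEntry> library,
                                  BiConsumer<LibraryEntry, String> annotate) {
        Map<String, LibraryEntry> index = buildIndex(library);
        List<Citation> unmatched = new ArrayList<>();
        for (Citation citation : citations) {
            LibraryEntry match = index.get(key(citation.surname(), citation.year()));
            if (match != null) {
                annotate.accept(match, citation.snippet());
            } else {
                unmatched.add(citation);
            }
        }
        return unmatched;
    }
}
```

Keying on surname plus year keeps the lookup deterministic; a real implementation would likely still need a rule for two entries that share both (for example, year suffixes such as 2016a/2016b).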

Next steps

  1. PDF text extraction
  • Integrate JabRef’s existing PDF parsing utilities or the LangChain4j interface to automatically extract the “Related Work” section from PDFs.
  • Focus on reliable section header detection (e.g., Related Work, Literature Review).
  2. Reference lookup
  • For each parsed in-text citation:
    - Match to an existing library entry.
    - If missing, create a new BibEntry and annotate it.
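For the header-detection step, one plain-text approach is a line-based heuristic: find a line that looks like a "Related Work" or "Literature Review" heading (optionally numbered, e.g. "2 Related Work" or "II. RELATED WORK") and take everything up to the next top-level numbered heading. This is a minimal sketch under those assumptions; `RelatedWorkSectionLocator` and both patterns are hypothetical and are not JabRef's PDF parsing utilities.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper: slices the "Related Work" section out of plain text extracted from a PDF.
class RelatedWorkSectionLocator {

    // Heading that opens the section, optionally numbered ("2 Related Work", "II. RELATED WORK").
    private static final Pattern START = Pattern.compile(
            "^\\s*(?:(?:\\d+|[IVXLC]+)\\.?\\s+)?(?:related\\s+work|literature\\s+review)\\s*$",
            Pattern.CASE_INSENSITIVE);

    // Next top-level numbered heading ("3 Method", "III. EXPERIMENTS"); "2.1 ..." subsections do not match.
    private static final Pattern NEXT_SECTION = Pattern.compile(
            "^\\s*(?:\\d+|[IVXLC]+)\\.?\\s+\\p{Lu}[\\p{L} ]{0,60}$");

    static Optional<String> locate(String fullText) {
        List<String> lines = fullText.lines().toList();
        for (int i = 0; i < lines.size(); i++) {
            if (START.matcher(lines.get(i)).matches()) {
                // Collect lines until the next top-level heading (or the end of the text).
                StringBuilder body = new StringBuilder();
                for (int j = i + 1; j < lines.size(); j++) {
                    if (NEXT_SECTION.matcher(lines.get(j)).matches()) {
                        break;
                    }
                    body.append(lines.get(j)).append('\n');
                }
                return Optional.of(body.toString().strip());
            }
        }
        return Optional.empty();
    }
}
```

In this sketch, unnumbered headings (for example, a bare "Method" line) do not end the section; handling them would need an additional rule.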

Steps to test

  1. Run the unit tests for the new feature only:
    ./gradlew :jablib:test --tests "org.jabref.logic.importer.RelatedWorkAnnotatorTest"
    ./gradlew :jablib:test --tests "org.jabref.logic.importer.relatedwork.HeuristicRelatedWorkExtractorTest"
    ./gradlew :jablib:test --tests "org.jabref.logic.importer.relatedwork.RelatedWorkHarvesterTest"

Mandatory checks

  • I own the copyright of the code submitted and I license it under the MIT license
  • I manually tested my changes in running JabRef (always required)
  • I added JUnit tests for changes (if applicable)
  • [/] I added screenshots in the PR description (if change is visible to the user)
  • [/] I described the change in CHANGELOG.md in a way that is understandable for the average user (if change is visible to the user)
  • [/] I checked the user documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request updating file(s) in https://github.com/JabRef/user-documentation/tree/main/en.

…ries to comment-<username> (JabRef#14085)

This helper takes a BibEntry, a username, the citing paper's key,
and a summary sentence, and appends a block like:

  [LunaOstos_2024]: <summary>

to the field comment-<username>. If that field already has content,
the new block is appended after a blank line.

Includes unit tests verifying first append and multi-append behavior.
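The append rule in this commit can be sketched with a plain map standing in for the entry's fields. This is a minimal sketch, not the PR's code: the class name and the `Map<String, String>` field store are assumptions (in JabRef the field would be a `UserSpecificCommentField` on a `BibEntry`).

```java
import java.util.Map;

// Hypothetical sketch of the append rule; a Map stands in for BibEntry's fields.
class AnnotatorSketch {

    // Appends "[citingKey]: summary" to the field comment-<username>.
    // Existing content is kept and separated from the new block by a blank line.
    static void appendSummary(Map<String, String> fields, String username, String citingKey, String summary) {
        String fieldName = "comment-" + username;
        String block = "[" + citingKey + "]: " + summary.strip();
        String existing = fields.get(fieldName);
        if (existing == null || existing.isBlank()) {
            fields.put(fieldName, block);
        } else {
            fields.put(fieldName, existing.stripTrailing() + "\n\n" + block);
        }
    }
}
```

Calling it twice for the same user produces two blocks separated by an empty line, matching the first-append and multi-append cases described above.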
@github-actions (Contributor)

Hey @jsochava!

Thank you for contributing to JabRef! Your help is truly appreciated ❤️.

We have automatic checks in place, based on which you will soon get automated feedback if any of them are failing. We also use TragBot with custom rules that scans your changes and provides some preliminary comments, before a maintainer takes a look. TragBot is still learning, and may not always be accurate. In the "Files changed" tab, you can go through its comments and just click on "Resolve conversation" if you are sure that it is incorrect, or comment on the conversation if you are doubtful.

Please re-check our contribution guide in case of any other doubts related to our contribution workflow.

@jsochava force-pushed the feature/related-work-annotator branch from 21d4bac to 711c3a9 on October 30, 2025 00:47
…f#14085)

Implements a deterministic extractor for author–year style citations
in "Related Work" sections and integrates it with RelatedWorkAnnotator.

- Added org.jabref.logic.importer.relatedwork package
- Introduced RelatedWorkExtractor interface
- Implemented HeuristicRelatedWorkExtractor for author–year citation parsing
- Implemented RelatedWorkHarvester orchestrator that uses the extractor
  and appends summaries via RelatedWorkAnnotator
- Added comprehensive JUnit tests verifying extraction and annotation behavior

This change completes the non-AI (LangChain4j-free) MVP for issue JabRef#14085.
Future work may introduce an AI-based RelatedWorkExtractor using LangChain4j.
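The core of a deterministic author–year extractor can be sketched as a regular expression over parenthetical citations plus the enclosing sentence as the snippet. This is an illustrative sketch only: the patterns, the sentence-boundary rule, and the `CitationSketch` names are assumptions, not the PR's `AUTHOR_YEAR_INNER` regex. It handles `(Surname, 2016)`, `(Surname et al., 2016)`, `(A and B, 2020)` and `;`-separated groups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of a deterministic author–year extractor (not the PR's actual regex).
class CitationSketch {

    record Citation(String surname, String year, String snippet) { }

    // Sentence boundary: '.', '!' or '?' followed by whitespace and an uppercase letter.
    private static final Pattern SENTENCE_SPLIT = Pattern.compile("(?<=[.!?])\\s+(?=\\p{Lu})");

    // A parenthetical group containing at least one four-digit year.
    private static final Pattern PAREN_GROUP = Pattern.compile("\\(([^()]*\\d{4}[a-z]?)\\)");

    // One citation inside a group: "Surname, 2016", "Surname et al., 2016", "A and B, 2020".
    // \p{Lu} and \p{L} accept Unicode names such as "Šimić" and all-caps names such as "CIA".
    private static final Pattern AUTHOR_YEAR = Pattern.compile(
            "(\\p{Lu}[\\p{L}'-]+)(?:\\s+et\\s+al\\.|\\s+(?:and|&)\\s+\\p{Lu}[\\p{L}'-]+)?,\\s*(\\d{4})[a-z]?");

    static List<Citation> extract(String relatedWorkText) {
        List<Citation> result = new ArrayList<>();
        for (String sentence : SENTENCE_SPLIT.split(relatedWorkText.strip())) {
            Matcher group = PAREN_GROUP.matcher(sentence);
            while (group.find()) {
                // Groups like "(Vesce et al., 2016; Bianchi, 2021)" hold several citations.
                for (String part : group.group(1).split(";")) {
                    Matcher m = AUTHOR_YEAR.matcher(part.strip());
                    if (m.matches()) {
                        result.add(new Citation(m.group(1), m.group(2), sentence.strip()));
                    }
                }
            }
        }
        return result;
    }
}
```

Splitting into sentences first keeps `et al.` from ending a snippet early, because inside a parenthetical citation it is followed by a comma rather than by whitespace and a capital letter.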
…critics (JabRef#14085)

- Updated AUTHOR_YEAR_INNER regex to allow all-caps acronyms (e.g., "CIA")
  and Unicode names (e.g., "Šimić").
- Added acronym indexing in buildIndex() so corporate or multi-word authors
  (e.g., "Central Intelligence Agency") map to their acronyms.
- Ensures citations like (CIA, 2021) correctly match entries such as
  "Central Intelligence Agency, 2021".
- Keeps deterministic behavior while improving coverage of real-world
  citation formats in Related Work sections.
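The acronym indexing described in this commit can be sketched as follows: derive an acronym from the initials of the capitalized words of a corporate author and register the entry under both the full name and the acronym. This is a minimal sketch; the `name|year` key format, the rule that lowercase function words such as "of" are skipped, and `AcronymIndexSketch` itself are assumptions, not the PR's `buildIndex()`.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of acronym indexing; not the PR's buildIndex().
class AcronymIndexSketch {

    // "Central Intelligence Agency" -> "CIA"; words starting with a lowercase letter are skipped.
    static String acronym(String corporateName) {
        StringBuilder sb = new StringBuilder();
        for (String word : corporateName.trim().split("\\s+")) {
            if (!word.isEmpty() && Character.isUpperCase(word.codePointAt(0))) {
                sb.appendCodePoint(word.codePointAt(0));
            }
        }
        return sb.toString();
    }

    static String key(String name, String year) {
        return name.toLowerCase(Locale.ROOT) + "|" + year;
    }

    // Registers the entry under its full name and, when at least two words are capitalized, its acronym.
    static void index(Map<String, String> index, String corporateName, String year, String citationKey) {
        index.putIfAbsent(key(corporateName, year), citationKey);
        String acronym = acronym(corporateName);
        if (acronym.length() >= 2) {
            index.putIfAbsent(key(acronym, year), citationKey);
        }
    }
}
```

Requiring at least two initials stops a single-word name such as "NASA" from also being indexed under "N".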

Development

Successfully merging this pull request may close these issues.

Extract text about papers from "related work" sections
