
Release/0.1.6 #7

Merged
aritraroy24 merged 5 commits into main from release/0.1.6 on Apr 2, 2026

@aritraroy24 (Owner) commented Apr 2, 2026

Summary by CodeRabbit

  • Documentation

    • Introduced API Key Guide with detailed setup instructions for configuring credentials across all supported LLM, publisher, and embedding providers.
  • Changed

    • Updated all citations to reference the published Digital Discovery journal article rather than the previous arXiv preprint.
    • Standardized model identifier prefixes across major LLM providers for improved compatibility.

@coderabbitai (bot) commented Apr 2, 2026

📝 Walkthrough

This release PR for ComProScanner version 0.1.6 updates citation references from the arXiv preprint to the published Digital Discovery article, standardizes LLM model prefix patterns, adds comprehensive API key documentation, and updates version numbers and changelog formatting across documentation and metadata files.

Changes

  • Release & Version Metadata (CHANGELOG.md, pyproject.toml, CITATION.cff, docs/about/changelog.md, docs/about/citation.md): Version bumped to 0.1.6 (2026-04-02); updated all citation/changelog references from arXiv 2025 preprint to Digital Discovery peer-reviewed publication with DOI; reformatted changelog headers and added horizontal separators between release sections.
  • Publication Citation Updates (README.md, docs/about/citation.md): Updated BibTeX entries from @misc arXiv format (2025) to @Article format (2026 Digital Discovery, RSC publisher, DOI-based URL, "Accepted" pages status).
  • API Key Documentation (docs/getting-started/api-key-guide.md): New comprehensive guide documenting external credential requirements for publisher providers (Elsevier, Springer, Wiley, IOP), LLM providers (OpenAI, Gemini, Anthropic, DeepSeek, OpenRouter, Together, Cohere, Fireworks), and embedding providers (Hugging Face), with step-by-step instructions and a recommended .env template.
  • Model Prefix Standardization (src/comproscanner/extract_flow/tools/rag_tool.py, docs/rag-config.md): Standardized LLM model identifier prefix routing: OpenAI (openai/ prefix), DeepSeek (deepseek/ prefix), Google Gemini (gemini/ prefix), Anthropic Claude (claude/ prefix), Fireworks (narrowed to fireworks/ only); updated example documentation to reflect new prefix patterns.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Poem

🐰 Hops of joy for v0.1.6!
From preprint to published, our paper takes flight,
API keys documented with clarity bright,
Model prefixes aligned, the code shines just right,
Digital Discovery spreads our ComProScanner light! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Release/0.1.6' is partially related to the changeset: it indicates a version release but does not clearly summarize the main changes (paper publication updates, new API key guide, model prefix standardization, and version bump).
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


@coderabbitai (bot) left a comment

Actionable comments posted: 7

🧹 Nitpick comments (3)
.gitignore (1)

189-191: Remove duplicated ignore patterns to keep the file clean.

CLAUDE.md and .claude already appear at lines 179 and 178, respectively. Keeping a single copy of each improves maintainability.

Suggested cleanup
-# Claude files
-CLAUDE.md
-.claude
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 189 - 191, Remove the duplicated ignore patterns for
CLAUDE.md and .claude: locate the repeated entries named "CLAUDE.md" and
".claude" in the .gitignore and delete the duplicate occurrences so only a
single entry for each pattern remains to keep the file clean and maintainable.
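To catch this kind of duplication mechanically, a minimal Python sketch can scan the .gitignore text and report non-comment patterns that occur more than once. The `duplicate_patterns` helper is hypothetical, not part of the repository, and it compares lines after stripping surrounding whitespace, which only approximates Git's own matching rules.

```python
from collections import Counter


def duplicate_patterns(gitignore_text: str) -> list[str]:
    """Return ignore patterns that appear more than once, in first-seen order.

    Blank lines and comment lines (starting with '#') are skipped, since a
    repeated section comment such as "# Claude files" is harmless.
    """
    patterns = [
        line.strip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    counts = Counter(patterns)
    duplicates: list[str] = []
    for pattern in patterns:
        if counts[pattern] > 1 and pattern not in duplicates:
            duplicates.append(pattern)
    return duplicates


# Illustrative input modeled on the duplicated entries described above.
sample = """\
.claude
CLAUDE.md
# Claude files
CLAUDE.md
.claude
"""
print(duplicate_patterns(sample))  # ['.claude', 'CLAUDE.md']
```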
docs/about/changelog.md (1)

1-1: Clarify date format convention.

As in the root CHANGELOG.md, the date "02-04-2026" is ambiguous. Consider using the ISO 8601 format "2026-04-02" for consistency and clarity.

📅 Proposed fix
-## [0.1.6] - 02-04-2026
+## [0.1.6] - 2026-04-02
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/about/changelog.md` at line 1, Replace the ambiguous date in the
changelog header by updating the line that currently reads "## [0.1.6] -
02-04-2026" in docs/about/changelog.md to use ISO 8601 format "## [0.1.6] -
2026-04-02" so it matches the root CHANGELOG.md convention and removes month/day
ambiguity.
CHANGELOG.md (1)

1-1: Clarify date format convention.

The date "02-04-2026" is ambiguous: depending on locale, it could be read as either February 4th or April 2nd. Consider using an unambiguous format such as "2026-04-02" (ISO 8601) or "April 2, 2026".

📅 Proposed fix for ISO 8601 format
-## [0.1.6] - 02-04-2026
+## [0.1.6] - 2026-04-02
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` at line 1, Update the ambiguous release date in the changelog
header "## [0.1.6] - 02-04-2026" to an unambiguous format (e.g., ISO 8601
"2026-04-02" or a full month format "April 2, 2026"); edit that header line so
the date is replaced with the chosen clear format across the file to avoid
locale confusion.
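If headers in the old day-month-year convention need converting in bulk, a small sketch like the one below could rewrite them. It assumes existing dates are DD-MM-YYYY, which matches this PR's merge date of Apr 2, 2026; `to_iso_header` and the regular expression are illustrative, not taken from the project.

```python
import re
from datetime import datetime

# Matches headers such as "## [0.1.6] - 02-04-2026" (day-month-year assumed).
HEADER_RE = re.compile(r"^(## \[[^\]]+\] - )(\d{2}-\d{2}-\d{4})\s*$")


def to_iso_header(line: str) -> str:
    """Rewrite a DD-MM-YYYY changelog header date to ISO 8601 (YYYY-MM-DD)."""
    match = HEADER_RE.match(line)
    if not match:
        return line  # leave non-matching lines untouched
    prefix, raw_date = match.groups()
    parsed = datetime.strptime(raw_date, "%d-%m-%Y")
    return prefix + parsed.date().isoformat()


print(to_iso_header("## [0.1.6] - 02-04-2026"))  # ## [0.1.6] - 2026-04-02
```

Parsing with `strptime` rather than swapping substrings means an impossible date (for example a month of 13) raises `ValueError` instead of silently producing a wrong header.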
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/getting-started/api-key-guide.md`:
- Line 235: Update the docs/getting-started/api-key-guide.md entry that lists
"fireworks_ai/..." as a typical model prefix: replace that incorrect prefix with
the actual prefixes used by the implementation (see
src/comproscanner/extract_flow/tools/rag_tool.py) — specifically use
"fireworks/" and "accounts/fireworks" as the documented model prefixes so the
guide matches the code.
- Line 139: The docs incorrectly show the Google Gemini model prefix as
"gemini/"; update the documentation text and any examples to use the correct
prefix "gemini-" to match the implementation in
src/comproscanner/extract_flow/tools/rag_tool.py (where Gemini models are
recognized by the "gemini-" prefix).
- Line 155: The docs entry showing "Typical model prefixes: `anthropic/...`" is
incorrect; update the documentation to reflect the actual Anthropic model prefix
used in the implementation (see rag_tool.py where Anthropic models are
referenced) by replacing `anthropic/...` with the correct `claude-` prefix and
provide an example (e.g., `claude-2`), ensuring consistency with the logic in
the extract_flow/tools/rag_tool.py implementation.
- Line 122: The docs line stating "Typical model prefixes: `openai/...` or
OpenAI model names directly" is incorrect; update the text to reflect the actual
OpenAI model prefix logic used by the code (see the logic in rag_tool.py that
checks for prefixes `gpt-`, `text-`, `o1`, and `o3`). Replace the example prefix
with a concise list such as "Typical OpenAI model prefixes: `gpt-`, `text-`,
`o1`, `o3` (or full OpenAI model names)" so the documentation matches the
behavior in the model detection code (refer to the model-identification checks
in rag_tool.py).
- Line 203: The documentation line "Typical model prefixes: `together_ai/...`"
is incorrect; update docs/getting-started/api-key-guide.md to use the actual
prefix `together/...` (replace `together_ai/` with `together/`) to match the
implementation referenced in src/comproscanner/extract_flow/tools/rag_tool.py
(lines ~120-124) where Together AI models are constructed with the `together/`
prefix.
- Line 171: Update the docs line that currently shows the model prefix as
`deepseek/...` to match the implementation's actual prefix `deepseek` (no
trailing slash); locate the string "Typical model prefixes: `deepseek/...`" in
the API key guide and replace it with the corrected prefix format so it aligns
with the implementation that checks for the `deepseek` prefix (as referenced in
rag_tool.py where the prefix is used).
- Around line 261-283: Update the HF_TOKEN docs to state it's optional: change
the "Default Embedding Provider - Hugging Face" section to clarify that HF_TOKEN
is only required for gated/private model downloads or rate-limited APIs and not
needed for using public models; reference the embedding implementation's use of
AutoTokenizer.from_pretrained() and AutoModel.from_pretrained() to explain that
those calls may use HF_TOKEN implicitly when accessing gated models but do not
require it for public models, and remove any wording that implies HF_TOKEN is
always required.
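As a sketch of the "optional HF_TOKEN" behavior this comment asks the guide to describe, the helper below builds keyword arguments only when a token is actually set. It is hypothetical: it assumes the embedding code would pass `token=` to `from_pretrained`, which recent `transformers` releases accept. Note that `huggingface_hub` also reads `HF_TOKEN` from the environment on its own, so passing it explicitly is a design choice rather than a requirement.

```python
import os
from typing import Mapping, Optional


def hf_token_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build optional keyword arguments for Hugging Face ``from_pretrained`` calls.

    HF_TOKEN is treated as optional: public models load without it, so an
    empty or missing value yields no ``token`` argument at all.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return {"token": token} if token else {}


# Hypothetical usage (requires `transformers`; not executed here):
# tokenizer = AutoTokenizer.from_pretrained(model_name, **hf_token_kwargs())
print(hf_token_kwargs({}))                    # {}
print(hf_token_kwargs({"HF_TOKEN": "hf_x"}))  # {'token': 'hf_x'}
```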


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8e25d1e9-97bf-44ba-8c94-8e7a1f2d96db

📥 Commits

Reviewing files that changed from the base of the PR and between c1dcbb4 and 5123a41.

📒 Files selected for processing (8)
  • .gitignore
  • CHANGELOG.md
  • CITATION.cff
  • README.md
  • docs/about/changelog.md
  • docs/about/citation.md
  • docs/getting-started/api-key-guide.md
  • pyproject.toml

Comment thread docs/getting-started/api-key-guide.md

Environment variable: `GEMINI_API_KEY`

Typical model prefixes: `gemini/...`

⚠️ Potential issue | 🔴 Critical

Incorrect model prefix documentation for Google Gemini.

The documented prefix gemini/... does not match the actual implementation. According to the code in src/comproscanner/extract_flow/tools/rag_tool.py:95-99, Gemini models use the prefix gemini-, not gemini/.

📝 Proposed fix
-Typical model prefixes: `gemini/...`
+Typical model prefixes: `gemini-*`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/getting-started/api-key-guide.md` at line 139, The docs incorrectly show
the Google Gemini model prefix as "gemini/"; update the documentation text and
any examples to use the correct prefix "gemini-" to match the implementation in
src/comproscanner/extract_flow/tools/rag_tool.py (where Gemini models are
recognized by the "gemini-" prefix).


Environment variable: `ANTHROPIC_API_KEY`

Typical model prefixes: `anthropic/...`

⚠️ Potential issue | 🔴 Critical

Incorrect model prefix documentation for Anthropic.

The documented prefix anthropic/... does not match the actual implementation. According to the code in src/comproscanner/extract_flow/tools/rag_tool.py:102-106, Anthropic models use the prefix claude-, not anthropic/.

📝 Proposed fix
-Typical model prefixes: `anthropic/...`
+Typical model prefixes: `claude-*`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/getting-started/api-key-guide.md` at line 155, The docs entry showing
"Typical model prefixes: `anthropic/...`" is incorrect; update the documentation
to reflect the actual Anthropic model prefix used in the implementation (see
rag_tool.py where Anthropic models are referenced) by replacing `anthropic/...`
with the correct `claude-` prefix and provide an example (e.g., `claude-2`),
ensuring consistency with the logic in the extract_flow/tools/rag_tool.py
implementation.


Environment variable: `DEEPSEEK_API_KEY`

Typical model prefixes: `deepseek/...`

⚠️ Potential issue | 🔴 Critical

Incorrect model prefix documentation for DeepSeek.

The documented prefix deepseek/... does not match the actual implementation. According to the code in src/comproscanner/extract_flow/tools/rag_tool.py:94, DeepSeek models use the prefix deepseek, not deepseek/ (no trailing slash).

📝 Proposed fix
-Typical model prefixes: `deepseek/...`
+Typical model prefixes: `deepseek*`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/getting-started/api-key-guide.md` at line 171, Update the docs line that
currently shows the model prefix as `deepseek/...` to match the implementation's
actual prefix `deepseek` (no trailing slash); locate the string "Typical model
prefixes: `deepseek/...`" in the API key guide and replace it with the corrected
prefix format so it aligns with the implementation that checks for the
`deepseek` prefix (as referenced in rag_tool.py where the prefix is used).
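All three threads above reduce to one question: does each documented prefix match a prefix the dispatcher accepts? A hypothetical consistency check could compare the two directly. The accepted prefixes in the example are taken from the walkthrough's summary of the new routing (gemini/, claude/, deepseek/); the exact strings in rag_tool.py should be confirmed before relying on the result.

```python
def prefix_mismatches(
    documented: dict[str, str], recognized: dict[str, tuple[str, ...]]
) -> dict[str, str]:
    """Return providers whose documented prefix is not one the dispatcher accepts.

    Documented values such as "gemini/..." or "claude-*" are reduced to their
    literal part ("gemini/", "claude-") by trimming trailing '.' and '*'.
    """
    problems: dict[str, str] = {}
    for provider, doc_prefix in documented.items():
        literal = doc_prefix.rstrip(".*")
        if literal not in recognized.get(provider, ()):
            problems[provider] = doc_prefix
    return problems


# Documented values as they appear in the guide under review.
documented = {"Gemini": "gemini/...", "Anthropic": "anthropic/...", "DeepSeek": "deepseek/..."}
# Illustrative accepted prefixes, per the walkthrough's summary of the new routing.
recognized = {"Gemini": ("gemini/",), "Anthropic": ("claude/",), "DeepSeek": ("deepseek/",)}
print(prefix_mismatches(documented, recognized))  # {'Anthropic': 'anthropic/...'}
```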

Comment thread docs/getting-started/api-key-guide.md Outdated
Comment thread docs/getting-started/api-key-guide.md Outdated
Comment thread docs/getting-started/api-key-guide.md
@codecov (bot) commented Apr 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 51.16%. Comparing base (c1dcbb4) to head (2df5f0a).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main       #7   +/-   ##
=======================================
  Coverage   51.16%   51.16%           
=======================================
  Files          47       47           
  Lines        8906     8906           
=======================================
  Hits         4557     4557           
  Misses       4349     4349           
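The coverage numbers above are internally consistent, which a quick calculation confirms: misses equal lines minus hits, and 4557 / 8906 is about 51.168%. Ordinary rounding would give 51.17%, so the reported 51.16% suggests truncation; the sketch below assumes Codecov's default of two decimal places rounded down (configurable through `coverage.precision` and `coverage.round` in codecov.yml).

```python
import math


def codecov_percent(hits: int, lines: int, precision: int = 2) -> float:
    """Coverage as a percentage, truncated (rounded down) to `precision` decimals."""
    scale = 10 ** precision
    return math.floor(hits / lines * 100 * scale) / scale


hits, lines = 4557, 8906
print(lines - hits)                  # 4349
print(codecov_percent(hits, lines))  # 51.16
```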


@coderabbitai (bot) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
docs/about/changelog.md (1)

22-22: Correct heading nesting for markdownlint compliance

Line 22 should be ## Fixed (not ### Fixed) because it is directly under # Unreleased.

Proposed markdown fix
-### Fixed
+## Fixed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/about/changelog.md` at line 22, The "Fixed" heading under the "#
Unreleased" section uses one too many # characters; change the heading text "###
Fixed" to "## Fixed" so it nests immediately under the top-level "Unreleased"
heading and satisfies markdownlint; update the heading string "### Fixed" to "##
Fixed" in the changelog.
CHANGELOG.md (1)

22-22: Fix heading level jump under Unreleased

Line 22 uses ### Fixed directly under # Unreleased, which skips one heading level and triggers markdownlint MD001. Use ## Fixed here.

Proposed markdown fix
-### Fixed
+## Fixed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` at line 22, Change the heading level under the "# Unreleased"
section from "### Fixed" to "## Fixed" to avoid skipping a heading level
(MD001); locate the "### Fixed" heading in CHANGELOG.md and replace it with "##
Fixed" so the hierarchy directly under "# Unreleased" is correct.
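A rough sketch of the MD001 rule these two nitpicks cite: it flags any ATX heading that sits more than one level deeper than the heading before it, and skips fenced code. `heading_jumps` is a hypothetical helper, a simplification of the rule rather than a port of markdownlint's implementation.

```python
def heading_jumps(markdown: str) -> list[tuple[int, int, int]]:
    """Find headings that skip a level (markdownlint MD001).

    Returns (line_number, previous_level, level) for each offending ATX heading.
    Lines inside fenced code blocks are ignored.
    """
    jumps = []
    previous = 0
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.lstrip()
        if stripped.startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence or not stripped.startswith("#"):
            continue
        hashes = len(stripped) - len(stripped.lstrip("#"))
        # An ATX heading needs 1-6 '#' followed by a space (or end of line).
        if hashes > 6 or (len(stripped) > hashes and stripped[hashes] != " "):
            continue
        if previous and hashes > previous + 1:
            jumps.append((number, previous, hashes))
        previous = hashes
    return jumps


# Mirrors the reported structure: "### Fixed" directly under "# Unreleased".
sample = "# Unreleased\n\n### Fixed\n- something\n"
print(heading_jumps(sample))  # [(3, 1, 3)]
```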
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Line 30: Replace the unhyphenated phrase in the paper title entry
"[ComProScanner: a multi-agent based framework for composition-property
structured data extraction from scientific
literature](https://doi.org/10.1039/D5DD00521C)" by changing "multi-agent based"
to "multi-agent-based" so the title reads "ComProScanner: a multi-agent-based
framework for composition-property structured data extraction from scientific
literature".

In `@docs/about/changelog.md`:
- Line 30: The changelog entry string "ComProScanner: a multi-agent based
framework for composition-property structured data extraction from scientific
literature" uses an unhyphenated compound modifier; update that string to read
"ComProScanner: a multi-agent-based framework for composition-property
structured data extraction from scientific literature" by replacing "multi-agent
based" with "multi-agent-based".


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8abc9ec2-cdc8-4ca9-b1a1-46952b51c8c4

📥 Commits

Reviewing files that changed from the base of the PR and between 5123a41 and 9c91661.

📒 Files selected for processing (3)
  • .gitignore
  • CHANGELOG.md
  • docs/about/changelog.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • .gitignore

Comment thread CHANGELOG.md
## [0.1.6] - 02-04-2026
### Changed
- Updated [README.md](README.md), [CITATION.cff](CITATION.cff) and docs with the published version (advance article) of the ComProScanner paper in _Digital Discovery_ as fully open access:
- [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)

⚠️ Potential issue | 🟡 Minor

Hyphenate compound adjective in paper title text

Line 30 should use “multi-agent-based” for correct grammar/readability.

Proposed text fix
-- [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C) 
+- [ComProScanner: a multi-agent-based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C) 
🧰 Tools
🪛 LanguageTool

[grammar] ~30-~30: Use a hyphen to join words.
Context: ...ccess: - [ComProScanner: a multi-agent based framework for composition-property...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` at line 30, Replace the unhyphenated phrase in the paper title
entry "[ComProScanner: a multi-agent based framework for composition-property
structured data extraction from scientific
literature](https://doi.org/10.1039/D5DD00521C)" by changing "multi-agent based"
to "multi-agent-based" so the title reads "ComProScanner: a multi-agent-based
framework for composition-property structured data extraction from scientific
literature".

Comment thread docs/about/changelog.md
## [0.1.6] - 02-04-2026
### Changed
- Updated [README.md](README.md), [CITATION.cff](CITATION.cff) and docs with the published version (advance article) of the ComProScanner paper in _Digital Discovery_ as fully open access:
- [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)

⚠️ Potential issue | 🟡 Minor

Use hyphenated compound modifier

Line 30 should read “multi-agent-based framework”.

Proposed text fix
-- [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C) 
+- [ComProScanner: a multi-agent-based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C) 
🧰 Tools
🪛 LanguageTool

[grammar] ~30-~30: Use a hyphen to join words.
Context: ...ccess: - [ComProScanner: a multi-agent based framework for composition-property...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/about/changelog.md` at line 30, The changelog entry string
"ComProScanner: a multi-agent based framework for composition-property
structured data extraction from scientific literature" uses an unhyphenated
compound modifier; update that string to read "ComProScanner: a
multi-agent-based framework for composition-property structured data extraction
from scientific literature" by replacing "multi-agent based" with
"multi-agent-based".

@coderabbitai (bot) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/comproscanner/extract_flow/tools/rag_tool.py (1)

92-110: ⚠️ Potential issue | 🔴 Critical

Breaking change: Old model name formats will cause runtime errors.

The prefix changes to slash-based formats (deepseek/, gemini/, claude/) break backward compatibility. Model names in the existing codebase and documentation using hyphenated formats like gemini-2.0-flash, claude-3-5-sonnet-20241022, and deepseek-chat will no longer match and will raise "Unrecognized or unsupported model name" errors at runtime.

Examples found in the codebase that will break:

  • docs/rag-config.md: gemini-2.0-flash, claude-3-5-sonnet-20241022
  • README.md: gemini-2.5-pro
  • examples/: deepseek-chat (appears twice)

Consider supporting both old and new formats:

Proposed fix: Accept both old and new model name formats
         # Deepseek models
-        if model.startswith("deepseek/"):
+        elif model.startswith(("deepseek/", "deepseek-")):
             self._check_package_exists("langchain_deepseek", model)
             from langchain_deepseek import ChatDeepSeek

             return ChatDeepSeek(model=model, request_timeout=1000, **common_params)

         # Google Gemini models
-        elif model.startswith("gemini/"):
+        elif model.startswith(("gemini/", "gemini-")):
             self._check_package_exists("langchain_google_genai", model)
             from langchain_google_genai import ChatGoogleGenerativeAI

             return ChatGoogleGenerativeAI(model=model, **common_params)

         # Anthropic Claude models
-        elif model.startswith("claude/"):
+        elif model.startswith(("claude/", "claude-")):
             self._check_package_exists("langchain_anthropic", model)
             from langchain_anthropic import ChatAnthropic

             return ChatAnthropic(model=model, **common_params)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/comproscanner/extract_flow/tools/rag_tool.py` around lines 92 - 110, The
model-name checks in the factory logic (the branch handling
deepseek/gemini/claude in rag_tool.py) only recognize slash-based prefixes and
will reject older hyphenated names; update the conditional logic in the factory
function (the method containing the shown if/elif branches) to accept both
formats (e.g., model.startswith("gemini/") OR model.startswith("gemini-") etc.)
or normalize incoming names (replace the first '-' with '/' for known providers)
before the existing _check_package_exists calls and imports (retain
ChatDeepSeek, ChatGoogleGenerativeAI, ChatAnthropic usages). Ensure both the
package checks and returned constructors use the normalized model value so
backward-compatible names like "gemini-2.0-flash", "claude-3-5-sonnet-20241022",
and "deepseek-chat" work alongside "gemini/...", "claude/...", "deepseek/...".
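One way to act on this suggestion without growing the if/elif chain is a table-driven lookup. The sketch below is hypothetical: `PROVIDER_PREFIXES` and `detect_provider` are illustrative names, and it only identifies the provider, whereas the real factory in rag_tool.py goes on to construct ChatDeepSeek, ChatGoogleGenerativeAI, or ChatAnthropic. The prefixes mirror the review's proposed fix, and the model string is left unchanged, as in that diff.

```python
# Hypothetical provider table; prefixes mirror the review's proposed fix.
PROVIDER_PREFIXES: dict[str, tuple[str, ...]] = {
    "deepseek": ("deepseek/", "deepseek-"),
    "gemini": ("gemini/", "gemini-"),
    "anthropic": ("claude/", "claude-"),
    "fireworks": ("fireworks/",),  # trailing comma: a real one-element tuple
}


def detect_provider(model: str) -> str:
    """Return the provider key for `model`, accepting slash or hyphen prefixes."""
    for provider, prefixes in PROVIDER_PREFIXES.items():
        # str.startswith accepts a tuple, so one call covers both formats.
        if model.startswith(prefixes):
            return provider
    raise ValueError(f"Unrecognized or unsupported model name: {model}")


# Hyphenated names the review lists as breaking under slash-only matching:
for name in ("gemini-2.0-flash", "claude-3-5-sonnet-20241022", "deepseek-chat"):
    print(name, "->", detect_provider(name))
```

Because every provider goes through the same loop, this layout also avoids the two nitpicks that follow: there is no standalone `if` to mismatch with later `elif` branches, and each entry is explicitly a tuple.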
🧹 Nitpick comments (2)
src/comproscanner/extract_flow/tools/rag_tool.py (2)

92-92: Inconsistent control flow: if should be elif for consistency.

Line 92 uses a standalone if statement, while subsequent provider checks (lines 99, 106, 113, etc.) use elif. Although this is functionally correct because each branch returns early, it creates a confusing control-flow structure: the elif chain starting at line 99 attaches to line 92, not to line 85.

♻️ Proposed fix
         # Deepseek models
-        if model.startswith("deepseek/"):
+        elif model.startswith("deepseek/"):
             self._check_package_exists("langchain_deepseek", model)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/comproscanner/extract_flow/tools/rag_tool.py` at line 92, Change the
standalone "if model.startswith('deepseek/')" to "elif
model.startswith('deepseek/')" so it participates in the same conditional chain
as the subsequent provider checks; update the conditional in the function
containing the model dispatch (the branch where model is matched against
providers using startswith and subsequent elifs) to keep control flow consistent
and preserve existing return behavior.

146-146: Unnecessary parentheses around single-element string.

("fireworks/") is not a tuple; it is just a parenthesized string, so the parentheses are redundant here.

♻️ Proposed fix
-        elif model.startswith(("fireworks/")):
+        elif model.startswith("fireworks/"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/comproscanner/extract_flow/tools/rag_tool.py` at line 146, The condition
uses unnecessary parentheses around a single string: locate the branch using
model.startswith(("fireworks/")) in rag_tool.py and replace it with
model.startswith("fireworks/") (or, if you intended multiple prefixes, use a
proper tuple like ("fireworks/", "other/")). Keep the same branch (the elif that
checks model.startswith) and only adjust the argument to startswith to remove
the redundant parentheses.
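The distinction this nitpick relies on is easy to confirm in Python itself: parentheses alone only group an expression, and it is the trailing comma that creates a one-element tuple. `str.startswith` accepts either a string or a tuple of strings, which is why the original line worked despite the misleading syntax. The model name below is hypothetical.

```python
prefix = ("fireworks/")     # parentheses only group: this is still a str
prefixes = ("fireworks/",)  # the trailing comma is what makes a one-element tuple

print(type(prefix).__name__, type(prefixes).__name__)  # str tuple

# str.startswith accepts a str or a tuple of str, so both forms match.
model = "fireworks/some-model"  # hypothetical model name
print(model.startswith(prefix), model.startswith(prefixes))  # True True
```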
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/about/changelog.md`:
- Around line 27-34: Add a new "### Fixed" subsection under version 0.1.6 in
docs/about/changelog.md containing the two missing bullets: one noting that
"Model prefix handling in rag_tool.py standardized to reflect the docs" and the
other clarifying "HF_TOKEN documentation clarified as optional — only required
for gated or private Hugging Face models"; ensure the section header is exactly
"### Fixed" and the two items reference rag_tool.py and HF_TOKEN as written so
they match existing CHANGELOG.md entries.

---

Outside diff comments:
In `@src/comproscanner/extract_flow/tools/rag_tool.py`:
- Around line 92-110: The model-name checks in the factory logic (the branch
handling deepseek/gemini/claude in rag_tool.py) only recognize slash-based
prefixes and will reject older hyphenated names; update the conditional logic in
the factory function (the method containing the shown if/elif branches) to
accept both formats (e.g., model.startswith("gemini/") OR
model.startswith("gemini-") etc.) or normalize incoming names (replace the first
'-' with '/' for known providers) before the existing _check_package_exists
calls and imports (retain ChatDeepSeek, ChatGoogleGenerativeAI, ChatAnthropic
usages). Ensure both the package checks and returned constructors use the
normalized model value so backward-compatible names like "gemini-2.0-flash",
"claude-3-5-sonnet-20241022", and "deepseek-chat" work alongside "gemini/...",
"claude/...", "deepseek/...".

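One way to implement the backward-compatibility suggestion is to normalize names before dispatch. The sketch below is hypothetical: `split_model_name` and `LEGACY_PREFIXES` are illustrative names. It assumes that the part after `provider/` is the provider's native model id, and that legacy names like `claude-3-5-sonnet-20241022` are already native ids. It keeps the full legacy name as the model id instead of replacing the first `-` with `/`. That replacement would turn `gemini-2.0-flash` into `gemini/2.0-flash`, dropping the `gemini-` part of the id.

```python
# Hypothetical helper: accept both "provider/model" and legacy hyphenated names.
# Assumption: the text after "provider/" is the provider's native model id, and
# legacy names such as "claude-3-5-sonnet-20241022" are already native ids.
LEGACY_PREFIXES = {
    "gemini-": "gemini",
    "claude-": "claude",
    "deepseek-": "deepseek",
}


def split_model_name(model: str) -> tuple[str, str]:
    """Return (provider, model_id) for either naming style."""
    if "/" in model:
        # Split only on the first slash so ids containing "/" survive.
        provider, model_id = model.split("/", 1)
        return provider, model_id
    for prefix, provider in LEGACY_PREFIXES.items():
        if model.startswith(prefix):
            # Keep the full legacy name as the model id.
            return provider, model
    raise ValueError(f"Unrecognized model name: {model!r}")
```

With this helper, `split_model_name("gemini-2.0-flash")` and `split_model_name("gemini/gemini-2.0-flash")` both return `("gemini", "gemini-2.0-flash")`. The package checks and constructors could then branch on the provider alone.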
---

Nitpick comments:
In `@src/comproscanner/extract_flow/tools/rag_tool.py`:
- Line 92: Change the standalone "if model.startswith('deepseek/')" to "elif
model.startswith('deepseek/')" so it participates in the same conditional chain
as the subsequent provider checks; update the conditional in the function
containing the model dispatch (the branch where model is matched against
providers using startswith and subsequent elifs) to keep control flow consistent
and preserve existing return behavior.
- Line 146: The condition uses unnecessary parentheses around a single string:
locate the branch using model.startswith(("fireworks/")) in rag_tool.py and
replace it with model.startswith("fireworks/") (or, if you intended multiple
prefixes, use a proper tuple like ("fireworks/", "other/")). Keep the same
branch (the elif that checks model.startswith) and only adjust the argument to
startswith to remove the redundant parentheses.
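On the first nitpick: every branch in `rag_tool.py` returns, so the standalone `if` does not change behavior today. The hypothetical functions below assign instead of returning. They show the hazard an `elif` chain guards against if a branch ever stops returning early.

```python
# Illustrative only: branches assign instead of returning, to show why
# mixing a standalone `if` with a later if/elif/else chain is fragile.
def dispatch_split(model: str) -> str:
    if model.startswith("deepseek/"):  # standalone `if`, separate chain
        provider = "deepseek"
    if model.startswith("gemini/"):
        provider = "gemini"
    elif model.startswith("claude/"):
        provider = "claude"
    else:
        provider = "unsupported"  # also runs for deepseek models
    return provider


def dispatch_chained(model: str) -> str:
    # One chain: exactly one branch runs for any input.
    if model.startswith("deepseek/"):
        return "deepseek"
    elif model.startswith("gemini/"):
        return "gemini"
    elif model.startswith("claude/"):
        return "claude"
    else:
        return "unsupported"


print(dispatch_split("deepseek/deepseek-chat"))    # unsupported (overwritten)
print(dispatch_chained("deepseek/deepseek-chat"))  # deepseek
```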
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c7f45f3e-0163-4f99-b568-de0af298a32d

📥 Commits

Reviewing files that changed from the base of the PR and between 9c91661 and 2df5f0a.

📒 Files selected for processing (5)
  • CHANGELOG.md
  • docs/about/changelog.md
  • docs/getting-started/api-key-guide.md
  • docs/rag-config.md
  • src/comproscanner/extract_flow/tools/rag_tool.py
✅ Files skipped from review due to trivial changes (2)
  • docs/rag-config.md
  • docs/getting-started/api-key-guide.md

Comment thread docs/about/changelog.md
Comment on lines +27 to +34
## [0.1.6] - 2026-04-02
### Changed
- Updated [README.md](README.md), [CITATION.cff](CITATION.cff) and docs with the published version (advance article) of the ComProScanner paper in _Digital Discovery_ as fully open access:
- [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)

### Added
- Guide for API key creation for various LLM providers and publisher APIs added to the documentation at `docs/getting-started/api-key-guide.md` with detailed instructions for each provider.


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== CHANGELOG.md 0.1.6 section ==="
sed -n '/## \[0.1.6\]/,/## \[0.1.5\]/p' CHANGELOG.md | head -20

echo ""
echo "=== docs/about/changelog.md 0.1.6 section ==="
sed -n '/## \[0.1.6\]/,/## \[0.1.5\]/p' docs/about/changelog.md | head -20

Repository: aritraroy24/ComProScanner

Length of output: 1587


Add missing ### Fixed section to 0.1.6 in docs/about/changelog.md.

The docs/about/changelog.md file is missing the ### Fixed section that exists in CHANGELOG.md for version 0.1.6. The following items should be added:

### Fixed
- Model prefix handling in `rag_tool.py` standardized to reflect the docs.
- `HF_TOKEN` documentation clarified as optional — only required for gated or private Hugging Face models.
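The analysis chain above compares the two files with `sed`. A small Python check along the same lines could catch this kind of drift automatically, for example in CI. This is a sketch. It assumes both files use `## [x.y.z]` release headings and `### ...` subsection headings, as in the excerpt shown. The function names are hypothetical.

```python
import re


def release_section(text: str, version: str) -> str:
    """Return the text of one '## [version]' section, up to the next '## ['."""
    pattern = rf"^## \[{re.escape(version)}\].*?(?=^## \[|\Z)"
    match = re.search(pattern, text, flags=re.MULTILINE | re.DOTALL)
    return match.group(0) if match else ""


def subsection_headers(section: str) -> list[str]:
    """List the '### ...' headers (e.g. 'Changed', 'Added', 'Fixed')."""
    return re.findall(r"^### (.+?)\s*$", section, flags=re.MULTILINE)


def missing_subsections(reference: str, other: str, version: str) -> list[str]:
    """Headers present in `reference` but absent from `other` for `version`."""
    ref = subsection_headers(release_section(reference, version))
    oth = set(subsection_headers(release_section(other, version)))
    return [h for h in ref if h not in oth]
```

Call `missing_subsections` with the contents of `CHANGELOG.md` as `reference` and `docs/about/changelog.md` as `other`. For the state reported above, it should list `Fixed` for version `0.1.6`.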
🧰 Tools
🪛 LanguageTool

[grammar] ~30-~30: Use a hyphen to join words.
Context: ...ccess: - [ComProScanner: a multi-agent based framework for composition-property...

(QB_NEW_EN_HYPHEN)

@aritraroy24 aritraroy24 merged commit ec10352 into main Apr 2, 2026
8 checks passed