Release/0.1.6#7
Conversation
📝 Walkthrough
ComProScanner version 0.1.6 release PR. It updates citation references from the arXiv preprint to the published Digital Discovery article, standardizes LLM model prefix patterns, adds comprehensive API key documentation, and updates version numbers and changelog formats across documentation and metadata files.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~15 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 7
🧹 Nitpick comments (3)
.gitignore (1)
189-191: Remove duplicated ignore patterns to keep the file clean.
`CLAUDE.md` and `.claude` are already present at Line 179 and Line 178. Keeping a single copy improves maintainability.
Suggested cleanup
-# Claude files
-CLAUDE.md
-.claude
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.gitignore around lines 189 - 191, Remove the duplicated ignore patterns for CLAUDE.md and .claude: locate the repeated entries named "CLAUDE.md" and ".claude" in the .gitignore and delete the duplicate occurrences so only a single entry for each pattern remains to keep the file clean and maintainable.
docs/about/changelog.md (1)
1-1: Clarify date format convention.
Same as the root CHANGELOG.md, the date "02-04-2026" is ambiguous. Consider using ISO 8601 format "2026-04-02" for consistency and clarity.
📅 Proposed fix
-## [0.1.6] - 02-04-2026
+## [0.1.6] - 2026-04-02
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/about/changelog.md` at line 1, Replace the ambiguous date in the changelog header by updating the line that currently reads "## [0.1.6] - 02-04-2026" in docs/about/changelog.md to use ISO 8601 format "## [0.1.6] - 2026-04-02" so it matches the root CHANGELOG.md convention and removes month/day ambiguity.
CHANGELOG.md (1)
1-1: Clarify date format convention.
The date "02-04-2026" is ambiguous: it could be interpreted as either February 4th or April 2nd depending on locale. Consider using an unambiguous format like "2026-04-02" (ISO 8601) or "April 2, 2026" for clarity.
📅 Proposed fix for ISO 8601 format
-## [0.1.6] - 02-04-2026
+## [0.1.6] - 2026-04-02
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` at line 1, Update the ambiguous release date in the changelog header "## [0.1.6] - 02-04-2026" to an unambiguous format (e.g., ISO 8601 "2026-04-02" or a full month format "April 2, 2026"); edit that header line so the date is replaced with the chosen clear format across the file to avoid locale confusion.
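The ambiguity described in the two date-format comments can be illustrated with a minimal Python sketch. It assumes the intended reading is day-first (2 April 2026), which is what the proposed ISO 8601 fix implies; the variable names are illustrative only.

```python
from datetime import date, datetime

raw = "02-04-2026"

# The same string yields two different dates depending on the assumed convention.
day_first = datetime.strptime(raw, "%d-%m-%Y").date()    # read as DD-MM-YYYY
month_first = datetime.strptime(raw, "%m-%d-%Y").date()  # read as MM-DD-YYYY

# ISO 8601 (YYYY-MM-DD) has only one reading and round-trips cleanly.
iso = day_first.isoformat()
round_trip = date.fromisoformat(iso)

print(day_first, month_first, iso)
```

Because the ISO form sorts lexicographically in date order, it also keeps changelog headers easy to scan and compare.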
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/getting-started/api-key-guide.md`:
- Line 235: Update the docs/getting-started/api-key-guide.md entry that lists
"fireworks_ai/..." as a typical model prefix: replace that incorrect prefix with
the actual prefixes used by the implementation (see
src/comproscanner/extract_flow/tools/rag_tool.py) — specifically use
"fireworks/" and "accounts/fireworks" as the documented model prefixes so the
guide matches the code.
- Line 139: The docs incorrectly show the Google Gemini model prefix as
"gemini/"; update the documentation text and any examples to use the correct
prefix "gemini-" to match the implementation in
src/comproscanner/extract_flow/tools/rag_tool.py (where Gemini models are
recognized by the "gemini-" prefix).
- Line 155: The docs entry showing "Typical model prefixes: `anthropic/...`" is
incorrect; update the documentation to reflect the actual Anthropic model prefix
used in the implementation (see rag_tool.py where Anthropic models are
referenced) by replacing `anthropic/...` with the correct `claude-` prefix and
provide an example (e.g., `claude-2`), ensuring consistency with the logic in
the extract_flow/tools/rag_tool.py implementation.
- Line 122: The docs line stating "Typical model prefixes: `openai/...` or
OpenAI model names directly" is incorrect; update the text to reflect the actual
OpenAI model prefix logic used by the code (see the logic in rag_tool.py that
checks for prefixes `gpt-`, `text-`, `o1`, and `o3`). Replace the example prefix
with a concise list such as "Typical OpenAI model prefixes: `gpt-`, `text-`,
`o1`, `o3` (or full OpenAI model names)" so the documentation matches the
behavior in the model detection code (refer to the model-identification checks
in rag_tool.py).
- Line 203: The documentation line "Typical model prefixes: `together_ai/...`"
is incorrect; update docs/getting-started/api-key-guide.md to use the actual
prefix `together/...` (replace `together_ai/` with `together/`) to match the
implementation referenced in src/comproscanner/extract_flow/tools/rag_tool.py
(lines ~120-124) where Together AI models are constructed with the `together/`
prefix.
- Line 171: Update the docs line that currently shows the model prefix as
`deepseek/...` to match the implementation's actual prefix `deepseek` (no
trailing slash); locate the string "Typical model prefixes: `deepseek/...`" in
the API key guide and replace it with the corrected prefix format so it aligns
with the implementation that checks for the `deepseek` prefix (as referenced in
rag_tool.py where the prefix is used).
- Around line 261-283: Update the HF_TOKEN docs to state it's optional: change
the "Default Embedding Provider - Hugging Face" section to clarify that HF_TOKEN
is only required for gated/private model downloads or rate-limited APIs and not
needed for using public models; reference the embedding implementation's use of
AutoTokenizer.from_pretrained() and AutoModel.from_pretrained() to explain that
those calls may use HF_TOKEN implicitly when accessing gated models but do not
require it for public models, and remove any wording that implies HF_TOKEN is
always required.
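As a rough illustration of the "HF_TOKEN is optional" point, the sketch below builds the authentication arguments only when a token is present. `hf_auth_kwargs` is a hypothetical helper, not part of ComProScanner, and the commented usage assumes a `transformers` version whose `from_pretrained` accepts a `token` argument.

```python
import os


def hf_auth_kwargs(env=os.environ):
    """Return keyword arguments for a Hugging Face `from_pretrained` call.

    Hypothetical helper: returns an empty dict when HF_TOKEN is unset or
    blank, so public models load anonymously, and {"token": ...} otherwise.
    """
    token = env.get("HF_TOKEN", "").strip()
    return {"token": token} if token else {}


# Usage sketch (requires `transformers`; the model name is a placeholder):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("some-org/some-model", **hf_auth_kwargs())
```

Passing the token explicitly is a design choice made here for clarity; `huggingface_hub` also reads `HF_TOKEN` from the environment on its own, which is why public models work without any configuration.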
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8e25d1e9-97bf-44ba-8c94-8e7a1f2d96db
📒 Files selected for processing (8)
- .gitignore
- CHANGELOG.md
- CITATION.cff
- README.md
- docs/about/changelog.md
- docs/about/citation.md
- docs/getting-started/api-key-guide.md
- pyproject.toml
docs/getting-started/api-key-guide.md (line 139)
Environment variable: `GEMINI_API_KEY`
Typical model prefixes: `gemini/...`
Incorrect model prefix documentation for Google Gemini.
The documented prefix `gemini/...` does not match the actual implementation. According to the code in src/comproscanner/extract_flow/tools/rag_tool.py:95-99, Gemini models use the prefix `gemini-`, not `gemini/`.
📝 Proposed fix
-Typical model prefixes: `gemini/...`
+Typical model prefixes: `gemini-*`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/getting-started/api-key-guide.md` at line 139, The docs incorrectly show
the Google Gemini model prefix as "gemini/"; update the documentation text and
any examples to use the correct prefix "gemini-" to match the implementation in
src/comproscanner/extract_flow/tools/rag_tool.py (where Gemini models are
recognized by the "gemini-" prefix).
docs/getting-started/api-key-guide.md (line 155)
Environment variable: `ANTHROPIC_API_KEY`
Typical model prefixes: `anthropic/...`
Incorrect model prefix documentation for Anthropic.
The documented prefix `anthropic/...` does not match the actual implementation. According to the code in src/comproscanner/extract_flow/tools/rag_tool.py:102-106, Anthropic models use the prefix `claude-`, not `anthropic/`.
📝 Proposed fix
-Typical model prefixes: `anthropic/...`
+Typical model prefixes: `claude-*`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/getting-started/api-key-guide.md` at line 155, The docs entry showing
"Typical model prefixes: `anthropic/...`" is incorrect; update the documentation
to reflect the actual Anthropic model prefix used in the implementation (see
rag_tool.py where Anthropic models are referenced) by replacing `anthropic/...`
with the correct `claude-` prefix and provide an example (e.g., `claude-2`),
ensuring consistency with the logic in the extract_flow/tools/rag_tool.py
implementation.
docs/getting-started/api-key-guide.md (line 171)
Environment variable: `DEEPSEEK_API_KEY`
Typical model prefixes: `deepseek/...`
Incorrect model prefix documentation for DeepSeek.
The documented prefix `deepseek/...` does not match the actual implementation. According to the code in src/comproscanner/extract_flow/tools/rag_tool.py:94, DeepSeek models use the prefix `deepseek`, not `deepseek/` (no trailing slash).
📝 Proposed fix
-Typical model prefixes: `deepseek/...`
+Typical model prefixes: `deepseek*`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/getting-started/api-key-guide.md` at line 171, Update the docs line that
currently shows the model prefix as `deepseek/...` to match the implementation's
actual prefix `deepseek` (no trailing slash); locate the string "Typical model
prefixes: `deepseek/...`" in the API key guide and replace it with the corrected
prefix format so it aligns with the implementation that checks for the
`deepseek` prefix (as referenced in rag_tool.py where the prefix is used).
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main       #7   +/-   ##
=======================================
  Coverage   51.16%   51.16%
=======================================
  Files          47       47
  Lines        8906     8906
=======================================
  Hits         4557     4557
  Misses       4349     4349
Actionable comments posted: 2
🧹 Nitpick comments (2)
docs/about/changelog.md (1)
22-22: Correct heading nesting for markdownlint compliance
Line 22 should be `## Fixed` (not `### Fixed`) because it is directly under `# Unreleased`.
Proposed markdown fix
-### Fixed
+## Fixed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/about/changelog.md` at line 22, The "Fixed" heading under the "# Unreleased" section uses one too many # characters; change the heading text "### Fixed" to "## Fixed" so it nests immediately under the top-level "Unreleased" heading and satisfies markdownlint; update the heading string "### Fixed" to "## Fixed" in the changelog.
CHANGELOG.md (1)
22-22: Fix heading level jump under Unreleased
Line 22 uses `### Fixed` directly under `# Unreleased`, which skips one heading level and triggers markdownlint MD001. Use `## Fixed` here.
Proposed markdown fix
-### Fixed
+## Fixed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` at line 22, Change the heading level under the "# Unreleased" section from "### Fixed" to "## Fixed" to avoid skipping a heading level (MD001); locate the "### Fixed" heading in CHANGELOG.md and replace it with "## Fixed" so the hierarchy directly under "# Unreleased" is correct.
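The heading-increment rule behind both comments can be approximated with a few lines of Python. This is only a sketch of the idea, not markdownlint's implementation: `heading_jumps` is a hypothetical helper that looks at ATX (`#`) headings only and, as a simplifying assumption, does not skip fenced code blocks.

```python
import re

# ATX heading: one to six '#' characters followed by whitespace and text.
HEADING = re.compile(r"^(#{1,6})\s+\S")


def heading_jumps(markdown_text):
    """Return (line_number, previous_level, level) for every heading that is
    more than one level deeper than the heading before it."""
    jumps = []
    previous = None
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        if previous is not None and level > previous + 1:
            jumps.append((number, previous, level))
        previous = level
    return jumps


sample = "# Unreleased\n### Fixed\n- item\n"
print(heading_jumps(sample))
```

Running the same check on the corrected text (`## Fixed` under `# Unreleased`) reports no jumps.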
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8abc9ec2-cdc8-4ca9-b1a1-46952b51c8c4
📒 Files selected for processing (3)
- .gitignore
- CHANGELOG.md
- docs/about/changelog.md
🚧 Files skipped from review as they are similar to previous changes (1)
- .gitignore
CHANGELOG.md (line 30)
## [0.1.6] - 02-04-2026
### Changed
- Updated [README.md](README.md), [CITATION.cff](CITATION.cff) and docs with the published version (advance article) of the ComProScanner paper in _Digital Discovery_ as fully open access:
  - [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)
Hyphenate compound adjective in paper title text
Line 30 should use “multi-agent-based” for correct grammar/readability.
Proposed text fix
-- [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)
+- [ComProScanner: a multi-agent-based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)
🧰 Tools
🪛 LanguageTool
[grammar] ~30-~30: Use a hyphen to join words.
Context: ...ccess: - [ComProScanner: a multi-agent based framework for composition-property...
(QB_NEW_EN_HYPHEN)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@CHANGELOG.md` at line 30, Replace the unhyphenated phrase in the paper title
entry "[ComProScanner: a multi-agent based framework for composition-property
structured data extraction from scientific
literature](https://doi.org/10.1039/D5DD00521C)" by changing "multi-agent based"
to "multi-agent-based" so the title reads "ComProScanner: a multi-agent-based
framework for composition-property structured data extraction from scientific
literature".
docs/about/changelog.md (line 30)
## [0.1.6] - 02-04-2026
### Changed
- Updated [README.md](README.md), [CITATION.cff](CITATION.cff) and docs with the published version (advance article) of the ComProScanner paper in _Digital Discovery_ as fully open access:
  - [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)
Use hyphenated compound modifier
Line 30 should read “multi-agent-based framework”.
Proposed text fix
-- [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)
+- [ComProScanner: a multi-agent-based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)
🧰 Tools
🪛 LanguageTool
[grammar] ~30-~30: Use a hyphen to join words.
Context: ...ccess: - [ComProScanner: a multi-agent based framework for composition-property...
(QB_NEW_EN_HYPHEN)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/about/changelog.md` at line 30, The changelog entry string
"ComProScanner: a multi-agent based framework for composition-property
structured data extraction from scientific literature" uses an unhyphenated
compound modifier; update that string to read "ComProScanner: a
multi-agent-based framework for composition-property structured data extraction
from scientific literature" by replacing "multi-agent based" with
"multi-agent-based".
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/comproscanner/extract_flow/tools/rag_tool.py (1)
92-110: ⚠️ Potential issue | 🔴 Critical
Breaking change: Old model name formats will cause runtime errors.
The prefix changes to slash-based formats (`deepseek/`, `gemini/`, `claude/`) break backward compatibility. Model names in the existing codebase and documentation using hyphenated formats like `gemini-2.0-flash`, `claude-3-5-sonnet-20241022`, and `deepseek-chat` will no longer match and will raise "Unrecognized or unsupported model name" errors at runtime.
Examples found in the codebase that will break:
- docs/rag-config.md: `gemini-2.0-flash`, `claude-3-5-sonnet-20241022`
- README.md: `gemini-2.5-pro`
- examples/: `deepseek-chat` (appears twice)
Consider supporting both old and new formats:
Proposed fix: Accept both old and new model name formats
         # Deepseek models
-        if model.startswith("deepseek/"):
+        elif model.startswith(("deepseek/", "deepseek-")):
             self._check_package_exists("langchain_deepseek", model)
             from langchain_deepseek import ChatDeepSeek
             return ChatDeepSeek(model=model, request_timeout=1000, **common_params)
         # Google Gemini models
-        elif model.startswith("gemini/"):
+        elif model.startswith(("gemini/", "gemini-")):
             self._check_package_exists("langchain_google_genai", model)
             from langchain_google_genai import ChatGoogleGenerativeAI
             return ChatGoogleGenerativeAI(model=model, **common_params)
         # Anthropic Claude models
-        elif model.startswith("claude/"):
+        elif model.startswith(("claude/", "claude-")):
             self._check_package_exists("langchain_anthropic", model)
             from langchain_anthropic import ChatAnthropic
             return ChatAnthropic(model=model, **common_params)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/comproscanner/extract_flow/tools/rag_tool.py` around lines 92 - 110, The model-name checks in the factory logic (the branch handling deepseek/gemini/claude in rag_tool.py) only recognize slash-based prefixes and will reject older hyphenated names; update the conditional logic in the factory function (the method containing the shown if/elif branches) to accept both formats (e.g., model.startswith("gemini/") OR model.startswith("gemini-") etc.) or normalize incoming names (replace the first '-' with '/' for known providers) before the existing _check_package_exists calls and imports (retain ChatDeepSeek, ChatGoogleGenerativeAI, ChatAnthropic usages). Ensure both the package checks and returned constructors use the normalized model value so backward-compatible names like "gemini-2.0-flash", "claude-3-5-sonnet-20241022", and "deepseek-chat" work alongside "gemini/...", "claude/...", "deepseek/...".
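The backward-compatible matching suggested above can be tried in isolation with a small sketch. It is not the code in rag_tool.py: `PROVIDER_PREFIXES` and `detect_provider` are hypothetical names, and only the three providers named in the comment are covered. The error text mirrors the message quoted in the comment.

```python
# Hypothetical lookup table; the real dispatch lives in rag_tool.py.
PROVIDER_PREFIXES = {
    "deepseek": ("deepseek/", "deepseek-"),
    "gemini": ("gemini/", "gemini-"),
    "claude": ("claude/", "claude-"),
}


def detect_provider(model: str) -> str:
    """Return the provider key for a model name, accepting both the
    slash-based and the older hyphenated spellings."""
    for provider, prefixes in PROVIDER_PREFIXES.items():
        # str.startswith accepts a tuple and is True if any prefix matches.
        if model.startswith(prefixes):
            return provider
    raise ValueError(f"Unrecognized or unsupported model name: {model}")


for name in ("gemini-2.0-flash", "claude-3-5-sonnet-20241022", "deepseek-chat", "gemini/some-model"):
    print(name, "->", detect_provider(name))
```

Keeping the prefixes in one table means the documentation and the dispatch logic can be compared against a single source when prefixes change again.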
🧹 Nitpick comments (2)
src/comproscanner/extract_flow/tools/rag_tool.py (2)
92-92: Inconsistent control flow: `if` should be `elif` for consistency.
Line 92 uses a standalone `if` statement while subsequent provider checks (lines 99, 106, 113, etc.) use `elif`. Although functionally correct due to early returns, this creates confusing control flow structure where the `elif` chain starting at line 99 is relative to line 92, not line 85.
♻️ Proposed fix
         # Deepseek models
-        if model.startswith("deepseek/"):
+        elif model.startswith("deepseek/"):
             self._check_package_exists("langchain_deepseek", model)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/comproscanner/extract_flow/tools/rag_tool.py` at line 92, Change the standalone "if model.startswith('deepseek/')" to "elif model.startswith('deepseek/')" so it participates in the same conditional chain as the subsequent provider checks; update the conditional in the function containing the model dispatch (the branch where model is matched against providers using startswith and subsequent elifs) to keep control flow consistent and preserve existing return behavior.
146-146: Unnecessary parentheses around single-element string.
`("fireworks/")` is not a tuple; it's just a parenthesized string. The parentheses are redundant here.
♻️ Proposed fix
-        elif model.startswith(("fireworks/")):
+        elif model.startswith("fireworks/"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/comproscanner/extract_flow/tools/rag_tool.py` at line 146, The condition uses unnecessary parentheses around a single string: locate the branch using model.startswith(("fireworks/")) in rag_tool.py and replace it with model.startswith("fireworks/") (or, if you intended multiple prefixes, use a proper tuple like ("fireworks/", "other/")). Keep the same branch (the elif that checks model.startswith) and only adjust the argument to startswith to remove the redundant parentheses.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c7f45f3e-0163-4f99-b568-de0af298a32d
📒 Files selected for processing (5)
- CHANGELOG.md
- docs/about/changelog.md
- docs/getting-started/api-key-guide.md
- docs/rag-config.md
- src/comproscanner/extract_flow/tools/rag_tool.py
✅ Files skipped from review due to trivial changes (2)
- docs/rag-config.md
- docs/getting-started/api-key-guide.md
docs/about/changelog.md (lines 27-34)
## [0.1.6] - 2026-04-02
### Changed
- Updated [README.md](README.md), [CITATION.cff](CITATION.cff) and docs with the published version (advance article) of the ComProScanner paper in _Digital Discovery_ as fully open access:
  - [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)
### Added
- Guide for API key creation for various LLM providers and publisher APIs added to the documentation at `docs/getting-started/api-key-guide.md` with detailed instructions for each provider.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
echo "=== CHANGELOG.md 0.1.6 section ==="
sed -n '/## \[0.1.6\]/,/## \[0.1.5\]/p' CHANGELOG.md | head -20
echo ""
echo "=== docs/about/changelog.md 0.1.6 section ==="
sed -n '/## \[0.1.6\]/,/## \[0.1.5\]/p' docs/about/changelog.md | head -20
Repository: aritraroy24/ComProScanner
Length of output: 1587
Add missing ### Fixed section to 0.1.6 in docs/about/changelog.md.
The docs/about/changelog.md file is missing the ### Fixed section that exists in CHANGELOG.md for version 0.1.6. The following items should be added:
### Fixed
- Model prefix handling in `rag_tool.py` standardized to reflect the docs.
- `HF_TOKEN` documentation clarified as optional — only required for gated or private Hugging Face models.
🧰 Tools
🪛 LanguageTool
[grammar] ~30-~30: Use a hyphen to join words.
Context: ...ccess: - [ComProScanner: a multi-agent based framework for composition-property...
(QB_NEW_EN_HYPHEN)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/about/changelog.md` around lines 27 - 34, Add a new "### Fixed"
subsection under version 0.1.6 in docs/about/changelog.md containing the two
missing bullets: one noting that "Model prefix handling in rag_tool.py
standardized to reflect the docs" and the other clarifying "HF_TOKEN
documentation clarified as optional — only required for gated or private Hugging
Face models"; ensure the section header is exactly "### Fixed" and the two items
reference rag_tool.py and HF_TOKEN as written so they match existing
CHANGELOG.md entries.
Summary by CodeRabbit
Documentation
Changed