Conversation
Signed-off-by: Luca Foppiano <luca@foppiano.org>
📝 WalkthroughWalkthroughThe PR upgrades the build system from Gradle 7.2 to 8.10.2, migrates the main project to Java 21 LTS (pubmed-glutton to Java 17), and updates the Shadow plugin to a new maintainer. Dependency versions are modernized, Apache HttpClient is replaced with JDK 11+ HttpClient in GROBID integration, and Gradle tasks are reconfigured with unified JVM argument handling via a shared Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
pubmed-glutton/build.gradle (1)
63-74: Consider prioritizing dependency updates for security.Several dependencies have known CVEs or are EOL:
log4j:1.2.17- EOL, multiple CVEsjackson-*:2.9.0.pr3- Pre-release from 2017, known CVEsxstream:1.4.8- Multiple CVEsThis is documented as a tracked follow-up in
pubmed-glutton/Readme.md, but given the severity of some vulnerabilities (especially log4j and Jackson), consider prioritizing this modernization.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pubmed-glutton/build.gradle` around lines 63 - 74, The dependency block pins insecure/outdated libraries (log4j:1.2.17, jackson-*:2.9.0.pr3, xstream:1.4.8, etc.); update these artifacts in the build.gradle implementation declarations to secure supported releases (replace log4j 1.x with Log4j2 or a maintained logging backend, upgrade jackson-core/jackson-databind/jackson-annotations/jackson-dataformat-yaml to a current patched 2.x release, and bump xstream to the latest secure version), run the test suite and dependency vulnerability scan, and update any code/config that needs API changes and the project documentation to reflect the new versions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@build.gradle`:
- Line 86: Remove the explicit dependency pin for jakarta.validation-api:2.0.2
in the build.gradle by deleting or replacing the implementation entry for
'jakarta.validation:jakarta.validation-api:2.0.2' so the project uses the
Dropwizard 4-managed version (3.0.2); if you must declare it explicitly, update
the coordinate to 'jakarta.validation:jakarta.validation-api:3.0.2' to match
Dropwizard 4's managed version.
In `@doc/Install.md`:
- Line 79: Typo in the documentation: replace the misspelled word "strating"
with "starting" in the sentence that mentions changing the `port` used by
GROBID; edit the Install.md line containing "You can change the `port` used by
GROBID when strating the docker container" to read "starting the docker
container" so the sentence is correct.
In `@settings.gradle`:
- Around line 1-3: Update the Foojay toolchain resolver plugin version used in
the Gradle build: locate the plugins block that applies id
'org.gradle.toolchains.foojay-resolver-convention' and change its version from
'0.8.0' to '1.0.0' so the plugins declaration reads the newer stable release.
In `@src/main/java/com/scienceminer/glutton/utils/grobid/GrobidClient.java`:
- Around line 96-105: parseGrobidResponse creates an XMLStreamReader2 via
inputFactory and calls StaxUtils.traverse(reader, handler) but never closes the
reader; update parseGrobidResponse to ensure the XMLStreamReader2 (reader) is
closed after parsing (e.g. use try-with-resources or a try/finally that calls
reader.close()), preserving the existing ServiceException handling (wrap
XMLStreamException as before) and still returning handler.getResponse();
reference symbols: parseGrobidResponse, XMLStreamReader2, inputFactory,
StaxUtils.traverse, GrobidResponseStaxHandler.
In `@src/main/java/com/scienceminer/glutton/utils/grobid/GrobidClientAsync.java`:
- Around line 94-103: In parseGrobidResponse, the XMLStreamReader2 created by
inputFactory.createXMLStreamReader(body) is never closed; update
parseGrobidResponse to ensure the XMLStreamReader2 (reader) is closed in a
finally block (or equivalent) after StaxUtils.traverse completes — e.g., assign
the reader to a variable before traversing, call handler.getResponse(), and in
finally call reader.close() (catching any XMLStreamException from close and
wrapping or logging as a ServiceException) so resources are always released;
reference symbols: parseGrobidResponse, XMLStreamReader2 reader,
inputFactory.createXMLStreamReader, StaxUtils.traverse,
GrobidResponseStaxHandler.
- Around line 77-92: The async branch currently throws exceptions inside the
CompletableFuture chain but the method doesn't return that future, so callers
cannot observe failures; change the method in GrobidClientAsync to return the
CompletableFuture<Void> produced by the
httpClient.sendAsync(...).thenAccept(...).exceptionally(...) chain (i.e., update
the method signature to return CompletableFuture<Void> and return the result of
the httpClient.sendAsync call) so that any ServiceException or
CompletionException propagates to callers for handling; ensure you reference the
httpClient.sendAsync(...) -> thenAccept(response -> { ... }) ->
exceptionally(...) chain when making the change.
---
Nitpick comments:
In `@pubmed-glutton/build.gradle`:
- Around line 63-74: The dependency block pins insecure/outdated libraries
(log4j:1.2.17, jackson-*:2.9.0.pr3, xstream:1.4.8, etc.); update these artifacts
in the build.gradle implementation declarations to secure supported releases
(replace log4j 1.x with Log4j2 or a maintained logging backend, upgrade
jackson-core/jackson-databind/jackson-annotations/jackson-dataformat-yaml to a
current patched 2.x release, and bump xstream to the latest secure version), run
the test suite and dependency vulnerability scan, and update any code/config
that needs API changes and the project documentation to reflect the new
versions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ad4996b4-07ad-445d-98d7-b1d3741bcf7f
📒 Files selected for processing (9)
build.gradledoc/Install.mdgradle/wrapper/gradle-wrapper.propertiespubmed-glutton/Readme.mdpubmed-glutton/build.gradlepubmed-glutton/gradle/wrapper/gradle-wrapper.propertiessettings.gradlesrc/main/java/com/scienceminer/glutton/utils/grobid/GrobidClient.javasrc/main/java/com/scienceminer/glutton/utils/grobid/GrobidClientAsync.java
| @@ -80,19 +85,31 @@ dependencies { | |||
|
|
|||
| implementation 'jakarta.validation:jakarta.validation-api:2.0.2' | |||
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check what validation API version Dropwizard 4 brings transitively
# Search for validation API usage in the codebase
echo "=== Validation imports in codebase ==="
rg -n "import.*validation" --type java | head -30Repository: kermitt2/biblio-glutton
Length of output: 345
🌐 Web query:
What jakarta.validation-api version does Dropwizard 4.0.0 require?
💡 Result:
Dropwizard 4.0.0 (via its dropwizard-dependencies BOM) pins jakarta.validation:jakarta.validation-api to version 3.0.2. (central.sonatype.com)
Citations:
Remove explicit jakarta.validation-api version pin to align with Dropwizard 4.
The explicit pin to jakarta.validation-api:2.0.2 overrides Dropwizard 4.0.0's managed version 3.0.2. While version 2.0.2 does use the Jakarta namespace, Dropwizard 4.x manages this dependency explicitly and pinning to an older version may introduce compatibility issues. Remove this explicit dependency and let Dropwizard manage the validation API version, or upgrade to 3.0.2 if an explicit version is required.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@build.gradle` at line 86, Remove the explicit dependency pin for
jakarta.validation-api:2.0.2 in the build.gradle by deleting or replacing the
implementation entry for 'jakarta.validation:jakarta.validation-api:2.0.2' so
the project uses the Dropwizard 4-managed version (3.0.2); if you must declare
it explicitly, update the coordinate to
'jakarta.validation:jakarta.validation-api:3.0.2' to match Dropwizard 4's
managed version.
| * First download and install GROBID as indicated in the [documentation](https://grobid.readthedocs.io/en/latest/Install-Grobid/), normally as a docker image to take advantage of Deep Learning models for more accurate parsing of bibliographical references. **Recommended Grobid version: the latest stable 0.8.x release** (see [Grobid releases](https://github.com/kermitt2/grobid/releases)). biblio-glutton communicates with Grobid only via HTTP (the `/api/isalive` and `/api/processCitation` endpoints), so any Grobid 0.7.x or later release is API-compatible. | ||
|
|
||
| * Start the service as documented [here](https://grobid.readthedocs.io/en/latest/Grobid-service/). You can change the `port` used by GROBID when strating the docker container, or by updating the service config file under `grobid/grobid-home/config/grobid.yaml`. | ||
| * Start the service as documented [here](https://grobid.readthedocs.io/en/latest/Grobid-service/). You can change the `port` used by GROBID when strating the docker container, or by updating the service config file under `grobid/grobid-home/config/grobid.yaml`. |
There was a problem hiding this comment.
Typo: "strating" should be "starting".
📝 Fix typo
-* Start the service as documented [here](https://grobid.readthedocs.io/en/latest/Grobid-service/). You can change the `port` used by GROBID when strating the docker container, or by updating the service config file under `grobid/grobid-home/config/grobid.yaml`.
+* Start the service as documented [here](https://grobid.readthedocs.io/en/latest/Grobid-service/). You can change the `port` used by GROBID when starting the docker container, or by updating the service config file under `grobid/grobid-home/config/grobid.yaml`.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| * Start the service as documented [here](https://grobid.readthedocs.io/en/latest/Grobid-service/). You can change the `port` used by GROBID when strating the docker container, or by updating the service config file under `grobid/grobid-home/config/grobid.yaml`. | |
| * Start the service as documented [here](https://grobid.readthedocs.io/en/latest/Grobid-service/). You can change the `port` used by GROBID when starting the docker container, or by updating the service config file under `grobid/grobid-home/config/grobid.yaml`. |
🧰 Tools
🪛 LanguageTool
[grammar] ~79-~79: Ensure spelling is correct
Context: ...n change the port used by GROBID when strating the docker container, or by updating th...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@doc/Install.md` at line 79, Typo in the documentation: replace the misspelled
word "strating" with "starting" in the sentence that mentions changing the
`port` used by GROBID; edit the Install.md line containing "You can change the
`port` used by GROBID when strating the docker container" to read "starting the
docker container" so the sentence is correct.
| plugins { | ||
| id 'org.gradle.toolchains.foojay-resolver-convention' version '0.8.0' | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What is the latest version of org.gradle.toolchains.foojay-resolver-convention plugin?
💡 Result:
The latest version of the org.gradle.toolchains.foojay-resolver-convention Gradle plugin is 1.0.0 (released May 19, 2025). [1]
Usage:
plugins {
id("org.gradle.toolchains.foojay-resolver-convention") version "1.0.0"
}Upgrade Foojay resolver plugin to version 1.0.0.
Good addition—the Foojay toolchain resolver enables automatic JDK provisioning. However, version 0.8.0 is outdated; the latest stable version is 1.0.0 (released May 2025). Update to:
plugins {
id 'org.gradle.toolchains.foojay-resolver-convention' version '1.0.0'
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@settings.gradle` around lines 1 - 3, Update the Foojay toolchain resolver
plugin version used in the Gradle build: locate the plugins block that applies
id 'org.gradle.toolchains.foojay-resolver-convention' and change its version
from '0.8.0' to '1.0.0' so the plugins declaration reads the newer stable
release.
| private GrobidResponse parseGrobidResponse(InputStream body) throws ServiceException { | ||
| try { | ||
| XMLStreamReader2 reader = (XMLStreamReader2) inputFactory.createXMLStreamReader(body); | ||
| GrobidResponseStaxHandler handler = new GrobidResponseStaxHandler(); | ||
| StaxUtils.traverse(reader, handler); | ||
| return handler.getResponse(); | ||
| } catch (XMLStreamException e) { | ||
| throw new ServiceException(502, "Cannot parse the response from GROBID", e); | ||
| } | ||
| } |
There was a problem hiding this comment.
XMLStreamReader is not closed after parsing.
The XMLStreamReader2 created at line 98 is never explicitly closed. While the underlying InputStream is closed by the caller (line 85-87), the XMLStreamReader holds its own parser resources that should be released. Per the relevant snippet in StaxUtils.java, traverse() does not close the reader.
🛡️ Proposed fix to close the reader
private GrobidResponse parseGrobidResponse(InputStream body) throws ServiceException {
+ XMLStreamReader2 reader = null;
try {
- XMLStreamReader2 reader = (XMLStreamReader2) inputFactory.createXMLStreamReader(body);
+ reader = (XMLStreamReader2) inputFactory.createXMLStreamReader(body);
GrobidResponseStaxHandler handler = new GrobidResponseStaxHandler();
StaxUtils.traverse(reader, handler);
return handler.getResponse();
} catch (XMLStreamException e) {
throw new ServiceException(502, "Cannot parse the response from GROBID", e);
+ } finally {
+ if (reader != null) {
+ try {
+ reader.close();
+ } catch (XMLStreamException ignored) {
+ // Best-effort cleanup
+ }
+ }
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private GrobidResponse parseGrobidResponse(InputStream body) throws ServiceException { | |
| try { | |
| XMLStreamReader2 reader = (XMLStreamReader2) inputFactory.createXMLStreamReader(body); | |
| GrobidResponseStaxHandler handler = new GrobidResponseStaxHandler(); | |
| StaxUtils.traverse(reader, handler); | |
| return handler.getResponse(); | |
| } catch (XMLStreamException e) { | |
| throw new ServiceException(502, "Cannot parse the response from GROBID", e); | |
| } | |
| } | |
| private GrobidResponse parseGrobidResponse(InputStream body) throws ServiceException { | |
| XMLStreamReader2 reader = null; | |
| try { | |
| reader = (XMLStreamReader2) inputFactory.createXMLStreamReader(body); | |
| GrobidResponseStaxHandler handler = new GrobidResponseStaxHandler(); | |
| StaxUtils.traverse(reader, handler); | |
| return handler.getResponse(); | |
| } catch (XMLStreamException e) { | |
| throw new ServiceException(502, "Cannot parse the response from GROBID", e); | |
| } finally { | |
| if (reader != null) { | |
| try { | |
| reader.close(); | |
| } catch (XMLStreamException ignored) { | |
| // Best-effort cleanup | |
| } | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/java/com/scienceminer/glutton/utils/grobid/GrobidClient.java` around
lines 96 - 105, parseGrobidResponse creates an XMLStreamReader2 via inputFactory
and calls StaxUtils.traverse(reader, handler) but never closes the reader;
update parseGrobidResponse to ensure the XMLStreamReader2 (reader) is closed
after parsing (e.g. use try-with-resources or a try/finally that calls
reader.close()), preserving the existing ServiceException handling (wrap
XMLStreamException as before) and still returning handler.getResponse();
reference symbols: parseGrobidResponse, XMLStreamReader2, inputFactory,
StaxUtils.traverse, GrobidResponseStaxHandler.
| httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofInputStream()) | ||
| .thenAccept(response -> { | ||
| if (response.statusCode() != HttpURLConnection.HTTP_OK) { | ||
| throw new ServiceException(502, "Error while connecting to GROBID service. Error code: " + response.statusCode()); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| @Override | ||
| public void failed(Exception ex) { | ||
| throw new ServiceException(502, "Cannot parse the response from GROBID", ex); | ||
| } | ||
|
|
||
| @Override | ||
| public void cancelled() { | ||
| throw new ServiceException(502, "Cannot parse the response from GROBID"); | ||
| } | ||
| }); | ||
|
|
||
| try (InputStream body = response.body()) { | ||
| callback.accept(parseGrobidResponse(body)); | ||
| } catch (IOException e) { | ||
| throw new ServiceException(502, "Cannot read the response from GROBID", e); | ||
| } | ||
| }) | ||
| .exceptionally(ex -> { | ||
| LOGGER.warn("Async GROBID call failed", ex); | ||
| throw new ServiceException(502, "Async GROBID call failed", ex); | ||
| }); | ||
| } |
There was a problem hiding this comment.
Async error handling: thrown exception is lost since the CompletableFuture isn't returned.
The exceptionally() stage throws ServiceException, but since the CompletableFuture isn't returned or awaited, the caller has no way to observe failures. The exception gets wrapped in a CompletionException and effectively goes nowhere.
Consider either:
- Return the
CompletableFuture<Void>so callers can handle errors, or - Pass an error callback alongside the success callback.
🔧 Option 1: Return the future
- public void processCitation(String rawCitation, String consolidation, Consumer<GrobidResponse> callback) throws ServiceException {
+ public CompletableFuture<Void> processCitation(String rawCitation, String consolidation, Consumer<GrobidResponse> callback) {
String formBody = "citations=" + URLEncoder.encode(rawCitation, StandardCharsets.UTF_8)
+ "&consolidateCitation=" + URLEncoder.encode(consolidation, StandardCharsets.UTF_8);
HttpRequest request = HttpRequest.newBuilder(URI.create(grobidPath + "/processCitation"))
.timeout(REQUEST_TIMEOUT)
.header("Content-Type", "application/x-www-form-urlencoded")
.POST(HttpRequest.BodyPublishers.ofString(formBody, StandardCharsets.UTF_8))
.build();
- httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofInputStream())
+ return httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofInputStream())
.thenAccept(response -> {
// ... existing logic
})
.exceptionally(ex -> {
LOGGER.warn("Async GROBID call failed", ex);
- throw new ServiceException(502, "Async GROBID call failed", ex);
+ return null; // Or rethrow to propagate to caller
});
}🔧 Option 2: Add error callback
- public void processCitation(String rawCitation, String consolidation, Consumer<GrobidResponse> callback) throws ServiceException {
+ public void processCitation(String rawCitation, String consolidation,
+ Consumer<GrobidResponse> onSuccess,
+ Consumer<Throwable> onError) {
// ... build request ...
httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofInputStream())
.thenAccept(response -> {
// ... call onSuccess ...
})
.exceptionally(ex -> {
LOGGER.warn("Async GROBID call failed", ex);
+ onError.accept(ex);
return null;
});
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/java/com/scienceminer/glutton/utils/grobid/GrobidClientAsync.java`
around lines 77 - 92, The async branch currently throws exceptions inside the
CompletableFuture chain but the method doesn't return that future, so callers
cannot observe failures; change the method in GrobidClientAsync to return the
CompletableFuture<Void> produced by the
httpClient.sendAsync(...).thenAccept(...).exceptionally(...) chain (i.e., update
the method signature to return CompletableFuture<Void> and return the result of
the httpClient.sendAsync call) so that any ServiceException or
CompletionException propagates to callers for handling; ensure you reference the
httpClient.sendAsync(...) -> thenAccept(response -> { ... }) ->
exceptionally(...) chain when making the change.
| private GrobidResponse parseGrobidResponse(InputStream body) throws ServiceException { | ||
| try { | ||
| XMLStreamReader2 reader = (XMLStreamReader2) inputFactory.createXMLStreamReader(body); | ||
| GrobidResponseStaxHandler handler = new GrobidResponseStaxHandler(); | ||
| StaxUtils.traverse(reader, handler); | ||
| return handler.getResponse(); | ||
| } catch (XMLStreamException e) { | ||
| throw new ServiceException(502, "Cannot parse the response from GROBID", e); | ||
| } | ||
| } |
There was a problem hiding this comment.
XMLStreamReader is not closed after parsing.
Same issue as in GrobidClient.java: the XMLStreamReader2 is never explicitly closed after traversal.
🛡️ Proposed fix to close the reader
private GrobidResponse parseGrobidResponse(InputStream body) throws ServiceException {
+ XMLStreamReader2 reader = null;
try {
- XMLStreamReader2 reader = (XMLStreamReader2) inputFactory.createXMLStreamReader(body);
+ reader = (XMLStreamReader2) inputFactory.createXMLStreamReader(body);
GrobidResponseStaxHandler handler = new GrobidResponseStaxHandler();
StaxUtils.traverse(reader, handler);
return handler.getResponse();
} catch (XMLStreamException e) {
throw new ServiceException(502, "Cannot parse the response from GROBID", e);
+ } finally {
+ if (reader != null) {
+ try {
+ reader.close();
+ } catch (XMLStreamException ignored) {
+ // Best-effort cleanup
+ }
+ }
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/java/com/scienceminer/glutton/utils/grobid/GrobidClientAsync.java`
around lines 94 - 103, In parseGrobidResponse, the XMLStreamReader2 created by
inputFactory.createXMLStreamReader(body) is never closed; update
parseGrobidResponse to ensure the XMLStreamReader2 (reader) is closed in a
finally block (or equivalent) after StaxUtils.traverse completes — e.g., assign
the reader to a variable before traversing, call handler.getResponse(), and in
finally call reader.close() (catching any XMLStreamException from close and
wrapping or logging as a ServiceException) so resources are always released;
reference symbols: parseGrobidResponse, XMLStreamReader2 reader,
inputFactory.createXMLStreamReader, StaxUtils.traverse,
GrobidResponseStaxHandler.
With the release of Grobid 0.9.0, I took the opportunity to update some libraries, including how grobid is called.
Summary by CodeRabbit
Chores
Documentation
grobidPathtogrobidHost