✨ feat(checksum calculator): Add cache to remote checksum calculator #1481

Merged
LogFlames merged 3 commits into main from avoid_duplicate_downloads_in_remote_mode
Jan 29, 2026

Conversation

@LogFlames
Member

Adds a cache to the remote checksum calculator. Part of the solution for #1476.

Reduces the time to generate the lockfile for maven-lockfile in remote mode (plugins included) from:

[INFO] maven-lockfile-parent .............................. SUCCESS [08:18 min]
[INFO] maven-lockfile-plugin .............................. SUCCESS [12:01 min]

to:

[INFO] maven-lockfile-parent .............................. SUCCESS [02:21 min]
[INFO] maven-lockfile-plugin .............................. SUCCESS [02:26 min]

@LogFlames
Member Author

@algomaster99 would love a review! This is quite low-hanging fruit in terms of speedup-to-complexity ratio.

@LogFlames LogFlames force-pushed the avoid_duplicate_downloads_in_remote_mode branch from 6d0e583 to f0c71d3 Compare January 28, 2026 15:07
@LogFlames LogFlames changed the title ✨ feat(checksum calculator): Add resolve cache to remote checksum calculator ✨ feat(checksum calculator): Add cache to remote checksum calculator Jan 28, 2026
@LogFlames LogFlames requested a review from Copilot January 28, 2026 15:35

Copilot AI left a comment

Pull request overview

This PR adds caching functionality to the RemoteChecksumCalculator to significantly improve performance when generating lockfiles in remote mode. The change addresses issue #1476, which reported extremely slow execution times (around 20 minutes) when generating lockfiles with remote checksums. According to the PR description, the implementation reduces lockfile generation time from approximately 20 minutes to around 5 minutes by caching previously fetched checksums and repository information.

Changes:

  • Added two ConcurrentHashMap caches to store checksums and repository information
  • Implemented cache lookup logic in both checksum calculation and repository resolution methods
  • Cache uses artifact coordinates as keys and stores empty strings/sentinel values for failed lookups
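
The caching pattern summarized in the list above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class name `ChecksumCacheSketch`, the method `checksumFor`, the injected `remoteLookup` function, and the coordinate string format are all hypothetical. The only details taken from the summary are the use of a `ConcurrentHashMap`, coordinates as keys, and an empty-string sentinel for failed lookups.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical sketch of a checksum cache; not the actual RemoteChecksumCalculator. */
class ChecksumCacheSketch {
    // Sentinel recorded for failed lookups. ConcurrentHashMap does not allow
    // null values, so "not found" needs an explicit marker to be cached at all.
    private static final String NOT_FOUND = "";

    // Key: artifact coordinates (assumed here to be a plain string).
    // Value: the checksum, or NOT_FOUND if the remote had none.
    private final ConcurrentMap<String, String> checksumCache = new ConcurrentHashMap<>();

    // Stand-in for the real network request (assumption for illustration).
    private final Function<String, Optional<String>> remoteLookup;

    ChecksumCacheSketch(Function<String, Optional<String>> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    /** Returns the cached checksum, querying the remote only on the first request per key. */
    Optional<String> checksumFor(String coordinates) {
        String cached = checksumCache.computeIfAbsent(
                coordinates, key -> remoteLookup.apply(key).orElse(NOT_FOUND));
        return cached.equals(NOT_FOUND) ? Optional.empty() : Optional.of(cached);
    }
}
```

With this shape, a dependency that appears several times in the dependency graph triggers one remote request, and a dependency with no remote checksum is not retried. One design caveat: `computeIfAbsent` on a `ConcurrentHashMap` may block other updates to the map while the mapping function runs, so a real implementation might prefer a separate get-then-put around the network call.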

@algomaster99
Member

The cache implemented here is useful if we compute the checksum of a dependency multiple times. For example, we can see maven-artifact duplicated multiple times - 1 and 2. The issue mentions that it is slow because of sequential computation, which these changes do not address.

I suggest measuring the performance improvement on spoon, just to find out what is actually slowing down the remote checksum calculator.

@LogFlames
Member Author

I will run tests on spoon. My plan is to implement parallel connections in another PR to keep the two changes separate.

Adding parallelism introduces quite a bit of complexity, so I thought it would be a good idea to start with this easier change, which already gives a huge improvement.

@LogFlames
Member Author

Here are my spoon tests.

Without caching:

[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  13:46 min
[INFO] Finished at: 2026-01-29T13:54:29+01:00
[INFO] ------------------------------------------------------------------------

With caching:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:28 min
[INFO] Finished at: 2026-01-29T14:00:17+01:00
[INFO] ------------------------------------------------------------------------

@algomaster99
Member

Oh this is very nice!

Adding parallelism introduces quite a bit of complexity so I thought it would be a good idea to start with this easier change which already gets huge improvements.

Sounds good, although 1 minute is also not too bad :)

@LogFlames LogFlames force-pushed the avoid_duplicate_downloads_in_remote_mode branch from 4b9c931 to e87ca10 Compare January 29, 2026 13:07
@LogFlames LogFlames enabled auto-merge (squash) January 29, 2026 13:07
@LogFlames LogFlames merged commit 30304ce into main Jan 29, 2026
14 checks passed
@LogFlames LogFlames deleted the avoid_duplicate_downloads_in_remote_mode branch January 29, 2026 13:17
@algomaster99
Member

algomaster99 commented Feb 6, 2026

Here are my spoon tests.

Without caching:

[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  13:46 min
[INFO] Finished at: 2026-01-29T13:54:29+01:00
[INFO] ------------------------------------------------------------------------

With caching:

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:28 min
[INFO] Finished at: 2026-01-29T14:00:17+01:00
[INFO] ------------------------------------------------------------------------

@LogFlames could you share the exact command to reproduce this? I feel mvn io.github.chains-project:maven-lockfile:5.13.0:generate is not it.

Even with mvn io.github.chains-project:maven-lockfile:5.13.0:generate -DchecksumMode=remote I get a lot of [INFO] Unable to find SHA-256 checksum for org.slf4j:slf4j-api:jar:1.7.36:compile on remote. Downloading and calculating locally. so remote is not working for me.
