Index prefetch packs in parallel #1843

Closed
tyrielv wants to merge 1 commit into microsoft:master from tyrielv:parallel-index-prefetch

tyrielv commented Jun 16, 2025

#1840 added local indexing of pack files (an expensive task for large repositories) to gvfs clone. This pull request improves the performance of pack file indexing by:

  1. Launching git index-pack instances asynchronously during prefetch, and
  2. Specifying a number of threads equal to processor count to git index-pack, overriding the git default of half the processor count.

The prefetch API will usually return one large pack file comprising most of the commits and trees, followed by several smaller pack files representing the new commits and trees since the last maintenance job consolidated the main pack file.

git index-pack begins with a single-threaded task to index the objects in the pack file, followed by a multi-threaded task to resolve the deltas. With these changes, the smaller incremental pack files are typically indexed in full while the primary pack file is still in its single-threaded phase, and the primary pack file's multi-threaded phase is marginally faster.
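
As a rough illustration of the approach described above (not the code in this pull request, which targets the GVFS codebase), the following Python sketch starts one git index-pack process per pack file and passes --threads with the processor count. The helper names `build_index_pack_command` and `index_packs_in_parallel`, and the injectable `run` parameter, are hypothetical and exist only for this example; `--threads=<n>` is the git index-pack option for setting the thread count.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_index_pack_command(pack_path, threads=None):
    """Build a `git index-pack` command line for one pack file (hypothetical helper)."""
    if threads is None:
        # os.cpu_count() may return None; fall back to a single thread.
        threads = os.cpu_count() or 1
    return ["git", "index-pack", f"--threads={threads}", pack_path]


def index_packs_in_parallel(pack_paths, threads=None, run=subprocess.run):
    """Start one `git index-pack` per pack file and wait for all of them.

    `run` is injectable so the orchestration can be exercised without git.
    Results are returned in the same order as `pack_paths`.
    """
    pack_paths = list(pack_paths)
    if not pack_paths:
        return []
    commands = [build_index_pack_command(p, threads) for p in pack_paths]
    # One worker per pack: each worker only blocks on its child process,
    # so small packs can finish while the large pack is still being indexed.
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        futures = [pool.submit(run, cmd, check=True) for cmd in commands]
        # result() re-raises any failure (e.g. CalledProcessError) from a worker.
        return [f.result() for f in futures]
```

With the default `run`, a call such as `index_packs_in_parallel(["big.pack", "small-1.pack"])` (file names are placeholders) would launch both processes at once. Whether oversubscribing threads this way helps depends on the machine and the pack sizes; the sketch assumes all packs are already on disk, whereas the description above launches indexing while the prefetch is still in progress.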

In testing, these changes reduced clone time for a large repository by 15-30%.

mjcheetham requested review from dscho and mjcheetham on June 17, 2025 10:41
tyrielv closed this on Jun 18, 2025

tyrielv commented Jun 18, 2025

Closing due to #1844
