Commit 7531e4b

matheustavares and jeffhostetler authored and committed
parallel-checkout: add configuration options

Make parallel checkout configurable by introducing two new settings: checkout.workers and checkout.thresholdForParallelism. The first defines the number of workers (where one means sequential checkout), and the second defines the minimum number of entries to attempt parallel checkout.

To decide the default value for checkout.workers, the parallel version was benchmarked during three operations in the linux repo, with cold cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and checking out v5.8 from v5.7 (checkout II). The four tables below show the mean run times and standard deviations for 5 runs in: a local file system on SSD, a local file system on HDD, a Linux NFS server, and Amazon EFS (all on Linux). Each parallel checkout test was executed with the number of workers that brings the best overall results in that environment.

Local SSD:

             Sequential           10 workers           Speedup
Clone        8.805 s ± 0.043 s    3.564 s ± 0.041 s    2.47 ± 0.03
Checkout I   9.678 s ± 0.057 s    4.486 s ± 0.050 s    2.16 ± 0.03
Checkout II  5.034 s ± 0.072 s    3.021 s ± 0.038 s    1.67 ± 0.03

Local HDD:

             Sequential           10 workers           Speedup
Clone        32.288 s ± 0.580 s   30.724 s ± 0.522 s   1.05 ± 0.03
Checkout I   54.172 s ± 7.119 s   54.429 s ± 6.738 s   1.00 ± 0.18
Checkout II  40.465 s ± 2.402 s   38.682 s ± 1.365 s   1.05 ± 0.07

Linux NFS server (v4.1, on EBS, single availability zone):

             Sequential            32 workers           Speedup
Clone        240.368 s ± 6.347 s   57.349 s ± 0.870 s   4.19 ± 0.13
Checkout I   242.862 s ± 2.215 s   58.700 s ± 0.904 s   4.14 ± 0.07
Checkout II   65.751 s ± 1.577 s   23.820 s ± 0.407 s   2.76 ± 0.08

EFS (v4.1, replicated over multiple availability zones):

             Sequential             32 workers            Speedup
Clone         922.321 s ± 2.274 s   210.453 s ± 3.412 s   4.38 ± 0.07
Checkout I   1011.300 s ± 7.346 s   297.828 s ± 0.964 s   3.40 ± 0.03
Checkout II   294.104 s ± 1.836 s   126.017 s ± 1.190 s   2.33 ± 0.03

The above benchmarks show that parallel checkout is most effective on repositories located on an SSD or over a distributed file system.
For local file systems on spinning disks, and/or older machines, parallelism does not always improve performance. For this reason, the default value for checkout.workers is one, a.k.a. sequential checkout.

To decide the default value for checkout.thresholdForParallelism, another benchmark was executed in the "Local SSD" setup, where parallel checkout proved beneficial. This time, we compared the runtime of a `git checkout -f`, with and without parallelism, after randomly removing an increasing number of files from the Linux working tree. The "sequential fallback" column below corresponds to the executions where checkout.workers was 10 but checkout.thresholdForParallelism was equal to the number of to-be-updated files plus one (so that we end up writing sequentially). Each test case was sampled 15 times, and each sample had a randomly different set of files removed. Here are the results:

             sequential fallback   10 workers           speedup
10   files    772.3 ms ± 12.6 ms   769.0 ms ± 13.6 ms   1.00 ± 0.02
20   files    780.5 ms ± 15.8 ms   775.2 ms ±  9.2 ms   1.01 ± 0.02
50   files    806.2 ms ± 13.8 ms   767.4 ms ±  8.5 ms   1.05 ± 0.02
100  files    833.7 ms ± 21.4 ms   750.5 ms ± 16.8 ms   1.11 ± 0.04
200  files    897.6 ms ± 30.9 ms   730.5 ms ± 14.7 ms   1.23 ± 0.05
500  files   1035.4 ms ± 48.0 ms   677.1 ms ± 22.3 ms   1.53 ± 0.09
1000 files   1244.6 ms ± 35.6 ms   654.0 ms ± 38.3 ms   1.90 ± 0.12
2000 files   1488.8 ms ± 53.4 ms   658.8 ms ± 23.8 ms   2.26 ± 0.12

From the above numbers, 100 files seems to be a reasonable default value for the threshold setting.

Note: Up to 1000 files, we observe a drop in the execution time of the parallel code with an increase in the number of files. This is a rather odd behavior, but it was observed in multiple repetitions. Above 1000 files, the execution time increases according to the number of files, as one would expect.

About the test environments: Local SSD tests were executed on an i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux.
Local HDD tests were executed on an Intel(R) Xeon(R) E3-1230 (also 4 cores with hyper-threading), HDD Seagate Barracuda 7200.14 SATA 3.1, running Debian. NFS and EFS tests were executed on an Amazon EC2 c5n.xlarge instance, with 4 vCPUs. The Linux NFS server was running on an m6g.large instance with 2 vCPUs and a 1 TB EBS GP2 volume. Before each timing, the linux repository was removed (or checked out back to its previous state), and `sync && sysctl vm.drop_caches=3` was executed.

Co-authored-by: Jeff Hostetler <[email protected]>
Signed-off-by: Matheus Tavares <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
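
The commit message does not say how the ± on the speedup figures was obtained. One reading that reproduces the reported numbers is the ratio of the two mean run times, with the uncertainty propagated to first order as for a ratio of independent measurements. This is an assumption about the method, not something the commit states. Worked for the Local SSD clone row:

```latex
S = \frac{t_{\mathrm{seq}}}{t_{\mathrm{par}}}, \qquad
\sigma_S \approx S \sqrt{\left(\frac{\sigma_{\mathrm{seq}}}{t_{\mathrm{seq}}}\right)^2
                       + \left(\frac{\sigma_{\mathrm{par}}}{t_{\mathrm{par}}}\right)^2}

S = \frac{8.805}{3.564} \approx 2.47, \qquad
\sigma_S \approx 2.47 \sqrt{\left(\frac{0.043}{8.805}\right)^2
                          + \left(\frac{0.041}{3.564}\right)^2}
         \approx 2.47 \times 0.0125 \approx 0.03
```

The same calculation on the NFS clone row (240.368 s ± 6.347 s against 57.349 s ± 0.870 s) gives roughly 4.19 ± 0.13, which also agrees with the reported value.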
1 parent e9e8adf commit 7531e4b

4 files changed, +54 -10 lines changed

Documentation/config/checkout.txt

Lines changed: 21 additions & 0 deletions
@@ -21,3 +21,24 @@ checkout.guess::
 	Provides the default value for the `--guess` or `--no-guess`
 	option in `git checkout` and `git switch`. See
 	linkgit:git-switch[1] and linkgit:git-checkout[1].
+
+checkout.workers::
+	The number of parallel workers to use when updating the working tree.
+	The default is one, i.e. sequential execution. If set to a value less
+	than one, Git will use as many workers as the number of logical cores
+	available. This setting and `checkout.thresholdForParallelism` affect
+	all commands that perform checkout. E.g. checkout, clone, reset,
+	sparse-checkout, etc.
++
+Note: parallel checkout usually delivers better performance for repositories
+located on SSDs or over NFS. For repositories on spinning disks and/or machines
+with a small number of cores, the default sequential checkout often performs
+better. The size and compression level of a repository might also influence how
+well the parallel version performs.
+
+checkout.thresholdForParallelism::
+	When running parallel checkout with a small number of files, the cost
+	of subprocess spawning and inter-process communication might outweigh
+	the parallelization gains. This setting allows to define the minimum
+	number of files for which parallel checkout should be attempted. The
+	default is 100.
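
A minimal sketch of how the two settings documented above might look in a configuration file such as `~/.gitconfig` or a repository's `.git/config`. The values are illustrative assumptions, not recommendations from the commit: `workers = 0` relies on the documented rule that a value less than one selects as many workers as there are logical cores, and `thresholdForParallelism = 100` simply restates the default.

```ini
[checkout]
	# Less than one: use as many workers as logical cores (per the text above).
	workers = 0
	# Fall back to sequential checkout below this many files (the default).
	thresholdForParallelism = 100
```

Given the benchmark results in the commit message, a setup like this is more likely to help on SSDs or network file systems than on spinning disks.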

parallel-checkout.c

Lines changed: 19 additions & 5 deletions
@@ -1,10 +1,12 @@
 #include "cache.h"
+#include "config.h"
 #include "entry.h"
 #include "parallel-checkout.h"
 #include "pkt-line.h"
 #include "run-command.h"
 #include "sigchain.h"
 #include "streaming.h"
+#include "thread-utils.h"
 
 struct pc_worker {
 	struct child_process cp;
@@ -24,6 +26,20 @@ enum pc_status parallel_checkout_status(void)
 	return parallel_checkout.status;
 }
 
+static const int DEFAULT_THRESHOLD_FOR_PARALLELISM = 100;
+static const int DEFAULT_NUM_WORKERS = 1;
+
+void get_parallel_checkout_configs(int *num_workers, int *threshold)
+{
+	if (git_config_get_int("checkout.workers", num_workers))
+		*num_workers = DEFAULT_NUM_WORKERS;
+	else if (*num_workers < 1)
+		*num_workers = online_cpus();
+
+	if (git_config_get_int("checkout.thresholdForParallelism", threshold))
+		*threshold = DEFAULT_THRESHOLD_FOR_PARALLELISM;
+}
+
 void init_parallel_checkout(void)
 {
 	if (parallel_checkout.status != PC_UNINITIALIZED)
@@ -584,11 +600,9 @@ static void write_items_sequentially(struct checkout *state)
 		write_pc_item(&parallel_checkout.items[i], state);
 }
 
-static const int DEFAULT_NUM_WORKERS = 2;
-
-int run_parallel_checkout(struct checkout *state)
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold)
 {
-	int ret, num_workers = DEFAULT_NUM_WORKERS;
+	int ret;
 
 	if (parallel_checkout.status != PC_ACCEPTING_ENTRIES)
 		BUG("cannot run parallel checkout: uninitialized or already running");
@@ -598,7 +612,7 @@ int run_parallel_checkout(struct checkout *state)
 	if (parallel_checkout.nr < num_workers)
 		num_workers = parallel_checkout.nr;
 
-	if (num_workers <= 1) {
+	if (num_workers <= 1 || parallel_checkout.nr < threshold) {
 		write_items_sequentially(state);
 	} else {
 		struct pc_worker *workers = setup_workers(state, num_workers);
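
The decision logic in the hunks above can be read in isolation from Git's internals. The following is a minimal standalone sketch of it, not the code from the patch. `resolve_workers` and `uses_parallel_path` are hypothetical names introduced for illustration, `is_set` stands in for `git_config_get_int()` finding the key, and `logical_cores` stands in for the value returned by `online_cpus()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Defaults as introduced by the patch. */
enum { DEFAULT_NUM_WORKERS = 1, DEFAULT_THRESHOLD_FOR_PARALLELISM = 100 };

/*
 * Hypothetical helper mirroring get_parallel_checkout_configs():
 * an unset key yields the default, a value below one yields the
 * number of logical cores, anything else is used as given.
 */
int resolve_workers(bool is_set, int configured, int logical_cores)
{
	assert(logical_cores >= 1);
	if (!is_set)
		return DEFAULT_NUM_WORKERS;
	if (configured < 1)
		return logical_cores;
	return configured;
}

/*
 * Hypothetical helper: true when the queued entries would be handed
 * to worker processes rather than written sequentially.
 */
bool uses_parallel_path(int num_workers, int nr_entries, int threshold)
{
	/* Callers skip parallel checkout entirely unless workers > 1. */
	if (num_workers <= 1)
		return false;
	/* Never use more workers than there are entries. */
	if (nr_entries < num_workers)
		num_workers = nr_entries;
	/* Same condition as the patched branch, negated. */
	return !(num_workers <= 1 || nr_entries < threshold);
}
```

Under these assumptions, `resolve_workers(true, 0, 8)` yields 8, and `uses_parallel_path(10, 99, DEFAULT_THRESHOLD_FOR_PARALLELISM)` is false because 99 entries fall below the default threshold, while 100 entries would take the parallel path.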

parallel-checkout.h

Lines changed: 7 additions & 2 deletions
@@ -17,6 +17,7 @@ enum pc_status {
 };
 
 enum pc_status parallel_checkout_status(void);
+void get_parallel_checkout_configs(int *num_workers, int *threshold);
 
 /*
  * Put parallel checkout into the PC_ACCEPTING_ENTRIES state. Should be used
@@ -31,8 +32,12 @@ void init_parallel_checkout(void);
  */
 int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca);
 
-/* Write all the queued entries, returning 0 on success.*/
-int run_parallel_checkout(struct checkout *state);
+/*
+ * Write all the queued entries, returning 0 on success. If the number of
+ * entries is smaller than the specified threshold, the operation is performed
+ * sequentially.
+ */
+int run_parallel_checkout(struct checkout *state, int num_workers, int threshold);
 
 /****************************************************************
  * Interface with checkout--worker

unpack-trees.c

Lines changed: 7 additions & 3 deletions
@@ -399,7 +399,7 @@ static int check_updates(struct unpack_trees_options *o,
 	int errs = 0;
 	struct progress *progress;
 	struct checkout state = CHECKOUT_INIT;
-	int i;
+	int i, pc_workers, pc_threshold;
 
 	trace_performance_enter();
 	state.force = 1;
@@ -465,8 +465,11 @@ static int check_updates(struct unpack_trees_options *o,
 		oid_array_clear(&to_fetch);
 	}
 
+	get_parallel_checkout_configs(&pc_workers, &pc_threshold);
+
 	enable_delayed_checkout(&state);
-	init_parallel_checkout();
+	if (pc_workers > 1)
+		init_parallel_checkout();
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
@@ -480,7 +483,8 @@ static int check_updates(struct unpack_trees_options *o,
 		}
 	}
 	stop_progress(&progress);
-	errs |= run_parallel_checkout(&state);
+	if (pc_workers > 1)
+		errs |= run_parallel_checkout(&state, pc_workers, pc_threshold);
 	errs |= finish_delayed_checkout(&state, NULL);
 	git_attr_set_direction(GIT_ATTR_CHECKIN);
 
