Commit 72d3a5d

vdye authored and gitster committed
scalar: convert README.md into a technical design doc
Adapt the content from 'contrib/scalar/README.md' into a design document in
'Documentation/technical/'. In addition to reformatting for asciidoc, elaborate
on the background, purpose, and design choices that went into Scalar.

Most of this document will persist in the 'Documentation/technical/' after
Scalar has been moved out of 'contrib/' and into the root of Git. Until that
time, it will also contain a temporary "Roadmap" section detailing the remaining
series needed to finish the initial version of Scalar. The section will be
removed once Scalar is moved to the repo root, but in the meantime serves as a
guide for readers to keep up with progress on the feature.

Signed-off-by: Victoria Dye <[email protected]>
Acked-by: Derrick Stolee <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent f22c95d commit 72d3a5d

2 files changed: 127 additions, 82 deletions

Documentation/technical/scalar.txt

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
Scalar
======

Scalar is a repository management tool that optimizes Git for use in large
repositories. It accomplishes this by helping users to take advantage of
advanced performance features in Git. Unlike most other Git built-in commands,
Scalar is not executed as a subcommand of 'git'; rather, it is built as a
separate executable containing its own series of subcommands.

Background
----------

Scalar was originally designed as an add-on to Git and implemented as a .NET
Core application. It was built on lessons learned from the VFS for Git
project (another application aimed at improving the experience of working with
large repositories). As part of its initial implementation, Scalar relied on
custom features in the Microsoft fork of Git that have since been integrated
into core Git:

* partial clone,
* commit graphs,
* multi-pack index,
* sparse checkout (cone mode),
* scheduled background maintenance,
* etc.

With the requisite Git functionality in place and a desire to bring the benefits
of Scalar to the larger Git community, the Scalar application itself was ported
from C# to C and integrated upstream.

Features
--------

Scalar consists of two major pieces of functionality: automatically
configuring built-in Git performance features and managing repository
enlistments.

The Git performance features configured by Scalar (see "Background" for
examples) confer substantial performance benefits to large repositories, but are
either too experimental to enable for all of Git yet, or only benefit large
repositories. As new features are introduced, Scalar should be updated
accordingly to incorporate them. This will prevent the tool from becoming stale
while also providing a path for more easily bringing features to the appropriate
users.

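As a rough illustration of what "automatically configuring" performance
features can look like, the sketch below keeps a small table of settings and
formats each one as the equivalent `git config` command line. The
`perf_setting` type, the `example_settings` table, and
`format_config_command()` are names invented for this example. The keys are
real Git config keys, but the list is illustrative only and is not claimed to
match what Scalar actually applies.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record type: one Git config setting to apply. */
struct perf_setting {
	const char *key;
	const char *value;
};

/* Illustrative only: real Git config keys, but not Scalar's actual list. */
static const struct perf_setting example_settings[] = {
	{ "core.multiPackIndex", "true" },
	{ "core.sparseCheckout", "true" },
	{ "core.sparseCheckoutCone", "true" },
	{ "maintenance.strategy", "incremental" },
};

/*
 * Write the equivalent command line ("git config <key> <value>") into buf.
 * Returns 0 on success, -1 if buf is too small to hold the whole string.
 */
static int format_config_command(const struct perf_setting *s,
				 char *buf, size_t size)
{
	int n = snprintf(buf, size, "git config %s %s", s->key, s->value);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Keeping the settings in one table means a newly introduced feature could be
rolled out to every enlistment by adding a single row, which is the update
path the paragraph above describes.
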
Enlistments are how Scalar knows which repositories on a user's system should
use Scalar-configured features. This allows it to update performance
settings when new ones are added to the tool, as well as centrally manage
repository maintenance. The enlistment structure - a root directory with a
`src/` subdirectory containing the cloned repository itself - is designed to
encourage users to route build outputs outside of the repository to avoid the
performance-limiting overhead of ignoring those files in Git.

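The enlistment layout can be made concrete with a small path helper. This is a
sketch under assumptions: `enlistment_subdir()` is a hypothetical function,
not part of Scalar, and the `out` directory mentioned afterwards is only an
example name for a sibling location that holds build outputs.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: join an enlistment root and a subdirectory name,
 * e.g. ("/work/big-repo", "src") -> "/work/big-repo/src".
 * One trailing '/' on root is tolerated.
 * Returns 0 on success, -1 on empty input or if buf is too small.
 */
static int enlistment_subdir(const char *root, const char *name,
			     char *buf, size_t size)
{
	size_t len;
	int n;

	if (!root || !*root || !name || !*name)
		return -1;
	len = strlen(root);
	if (root[len - 1] == '/')
		len--; /* drop one trailing slash so the join adds exactly one */
	n = snprintf(buf, size, "%.*s/%s", (int)len, root, name);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With a root of `/work/big-repo`, the cloned repository would live at
`/work/big-repo/src`, and a build could write to a sibling such as
`/work/big-repo/out` so that Git never has to consider those files.
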
Design
------

Scalar is implemented in C and interacts with Git via a mix of child process
invocations of Git and direct usage of `libgit.a`. Internally, it is structured
much like other built-ins with subcommands (e.g., `git stash`), containing a
`cmd_<subcommand>()` function for each subcommand, routed through a `cmd_main()`
function. Most options are unique to each subcommand, with `scalar` respecting
some "global" `git` options (e.g., `-c` and `-C`).

Because `scalar` is not invoked as a Git subcommand (that is, as `git scalar`),
it is built and installed as its own executable in the `bin/` directory,
alongside `git`, `git-gui`, etc.

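The routing described in this section can be sketched as a name-to-function
table consulted after any leading global options have been consumed. This is a
minimal sketch, not Scalar's implementation: `dispatch()` stands in for the
routing role of `cmd_main()`, `struct global_opts` is a hypothetical container,
and the handler bodies are empty stubs. A real implementation would act on `-C`
and `-c` rather than merely recording them.

```c
#include <stddef.h>
#include <string.h>

/* Signature shared by every subcommand handler in this sketch. */
typedef int (*subcommand_fn)(int argc, const char **argv);

/* Stub handlers: placeholders, not Scalar's logic. */
static int cmd_diagnose(int argc, const char **argv)
{
	(void)argc; (void)argv;
	return 0;
}

static int cmd_help(int argc, const char **argv)
{
	(void)argc; (void)argv;
	return 0;
}

/* Table mapping subcommand names to their handlers. */
static const struct {
	const char *name;
	subcommand_fn fn;
} subcommands[] = {
	{ "diagnose", cmd_diagnose },
	{ "help", cmd_help },
};

/* Global options seen before the subcommand (hypothetical container). */
struct global_opts {
	const char *directory; /* from -C <directory> */
	const char *config;    /* last -c <key>=<value> seen */
};

/*
 * Consume leading -C/-c options, then route to the matching handler.
 * argv[0] is the program name. Returns the handler's exit code, or -1 for
 * an unknown option, a missing option argument, or an unknown or missing
 * subcommand.
 */
static int dispatch(int argc, const char **argv, struct global_opts *opts)
{
	size_t i;

	argc--; argv++; /* skip the program name */
	while (argc > 0 && argv[0][0] == '-') {
		if (argc < 2)
			return -1;
		if (!strcmp(argv[0], "-C"))
			opts->directory = argv[1];
		else if (!strcmp(argv[0], "-c"))
			opts->config = argv[1];
		else
			return -1;
		argc -= 2; argv += 2;
	}
	if (argc < 1)
		return -1;
	for (i = 0; i < sizeof(subcommands) / sizeof(subcommands[0]); i++)
		if (!strcmp(argv[0], subcommands[i].name))
			return subcommands[i].fn(argc, argv);
	return -1;
}
```

Given the arguments `scalar -C /tmp/repo diagnose`, this sketch records
`/tmp/repo` in `opts->directory` and calls `cmd_diagnose()`; an unrecognized
subcommand returns -1 without calling any handler.
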
Roadmap
-------

NOTE: this section will be removed once the remaining tasks outlined in this
roadmap are complete.

Scalar is a large enough project that it is being upstreamed incrementally,
living in `contrib/` until it is feature-complete. So far, the following patch
series have been accepted:

- `scalar-the-beginning`: The initial patch series which sets up
  `contrib/scalar/` and populates it with a minimal `scalar` command that
  demonstrates the fundamental ideas.

- `scalar-c-and-C`: The `scalar` command learns about two options that can be
  specified before the command, `-c <key>=<value>` and `-C <directory>`.

- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.

Roughly speaking (and subject to change), the following series are needed to
"finish" this initial version of Scalar:

- Finish Scalar features: Enable the built-in FSMonitor in Scalar enlistments
  and implement `scalar help`. At the end of this series, Scalar should be
  feature-complete from the perspective of a user.

- Generalize features not specific to Scalar: In the spirit of making Scalar
  configure only what is needed for large repo performance, move common
  utilities into other parts of Git. Some of this will be internal-only, but one
  major change will be generalizing `scalar diagnose` for use with any Git
  repository.

- Move Scalar to toplevel: Move Scalar out of `contrib/` and into the root of
  `git`, including updates to build and install it with the rest of Git. This
  change will incorporate Scalar into the Git CI and test framework, as well as
  expand regression and performance testing to ensure the tool is stable.

Finally, there are two additional patch series that exist in Microsoft's fork of
Git, but there is no current plan to upstream them. There are some interesting
ideas there, but the implementation is too specific to Azure Repos and/or VFS
for Git to be of much help in general.

These still exist mainly because the GVFS protocol is what Azure Repos has
instead of partial clone, while Git is focused on improving partial clone:

- `scalar-with-gvfs`: The primary purpose of this patch series is to support
  existing Scalar users whose repositories are hosted in Azure Repos (which does
  not support Git's partial clones, but supports its predecessor, the GVFS
  protocol, which is used by Scalar to emulate the partial clone).

  Since the GVFS protocol will never be supported by core Git, this patch series
  will remain in Microsoft's fork of Git.

- `run-scalar-functional-tests`: The Scalar project developed quite a
  comprehensive set of integration tests (or "Functional Tests"). They are the
  sole remaining part of the original C#-based Scalar project, and this patch
  adds a GitHub workflow that runs them all.

  Since the tests partially depend on features that are only provided in the
  `scalar-with-gvfs` patch series, this patch cannot be upstreamed.

contrib/scalar/README.md

Lines changed: 0 additions & 82 deletions
This file was deleted.
