Scalar
======

Scalar is a repository management tool that optimizes Git for use in large
repositories. It does this by helping users take advantage of advanced
performance features in Git. Unlike most other Git built-in commands, Scalar is
not executed as a subcommand of `git`; rather, it is built as a separate
executable containing its own set of subcommands.

Background
----------

Scalar was originally designed as an add-on to Git and implemented as a .NET
Core application. It was built on lessons learned from the VFS for Git project
(another application aimed at improving the experience of working with large
repositories). As part of its initial implementation, Scalar relied on custom
features in the Microsoft fork of Git that have since been integrated into core
Git:

* partial clone
* commit graphs
* multi-pack index
* sparse checkout (cone mode)
* scheduled background maintenance
* etc.

With the requisite Git functionality in place and a desire to bring the benefits
of Scalar to the larger Git community, the Scalar application itself was ported
from C# to C and integrated upstream.

Features
--------

Scalar consists of two major pieces of functionality: automatically
configuring built-in Git performance features and managing repository
enlistments.

The Git performance features configured by Scalar (see "Background" for
examples) confer substantial performance benefits on large repositories, but
they are either still too experimental to enable for all Git users or only
benefit large repositories. As new features are introduced, Scalar should be
updated to incorporate them. This keeps the tool from becoming stale while also
providing an easier path for bringing those features to the users who need
them.

Enlistments are how Scalar knows which repositories on a user's system should
use Scalar-configured features. This allows it to update performance settings
when new ones are added to the tool, and to manage repository maintenance
centrally. The enlistment structure (a root directory with a `src/`
subdirectory containing the cloned repository itself) is designed to encourage
users to route build outputs outside of the repository, avoiding the
performance-limiting overhead of ignoring those files in Git.

Design
------

Scalar is implemented in C and interacts with Git via a mix of child process
invocations of Git and direct usage of `libgit.a`. Internally, it is structured
much like other built-ins with subcommands (e.g., `git stash`), containing a
`cmd_<subcommand>()` function for each subcommand, routed through a `cmd_main()`
function. Most options are unique to each subcommand, with `scalar` respecting
some "global" `git` options (e.g., `-c` and `-C`).

Because `scalar` is not invoked as a Git subcommand (that is, it is not run as
`git scalar`), it is built and installed as its own executable in the `bin/`
directory, alongside `git`, `git-gui`, etc.

Roadmap
-------

NOTE: this section will be removed once the remaining tasks outlined in this
roadmap are complete.

Scalar is a large enough project that it is being upstreamed incrementally,
living in `contrib/` until it is feature-complete. So far, the following patch
series have been accepted:

- `scalar-the-beginning`: The initial patch series, which sets up
  `contrib/scalar/` and populates it with a minimal `scalar` command that
  demonstrates the fundamental ideas.

- `scalar-c-and-C`: The `scalar` command learns about two options that can be
  specified before the command, `-c <key>=<value>` and `-C <directory>`.

- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.

Roughly speaking (and subject to change), the following series are needed to
"finish" this initial version of Scalar:

- Finish Scalar features: Enable the built-in FSMonitor in Scalar enlistments
  and implement `scalar help`. At the end of this series, Scalar should be
  feature-complete from the perspective of a user.

- Generalize features not specific to Scalar: In the spirit of making Scalar
  configure only what is needed for large repo performance, move common
  utilities into other parts of Git. Some of this will be internal-only, but one
  major change will be generalizing `scalar diagnose` for use with any Git
  repository.

- Move Scalar to toplevel: Move Scalar out of `contrib/` and into the root of
  `git`, including updates to build and install it with the rest of Git. This
  change will incorporate Scalar into the Git CI and test framework, as well as
  expand regression and performance testing to ensure the tool is stable.

Finally, there are two additional patch series that exist in Microsoft's fork
of Git, but there is no current plan to upstream them. They contain some
interesting ideas, but their implementation is too specific to Azure Repos
and/or VFS for Git to be of much help in general.

These series remain in the fork mainly because Azure Repos supports the GVFS
protocol rather than partial clone, while Git is focused on improving partial
clone:

- `scalar-with-gvfs`: The primary purpose of this patch series is to support
  existing Scalar users whose repositories are hosted in Azure Repos, which
  does not support Git's partial clone but does support its predecessor, the
  GVFS protocol; Scalar uses that protocol to emulate partial clone.

  Since the GVFS protocol will never be supported by core Git, this patch series
  will remain in Microsoft's fork of Git.

- `run-scalar-functional-tests`: The Scalar project developed a fairly
  comprehensive set of integration tests (the "Functional Tests"). They are the
  sole remaining part of the original C#-based Scalar project, and this patch
  adds a GitHub workflow that runs them all.

  Since the tests partially depend on features that are only provided in the
  `scalar-with-gvfs` patch series, this patch cannot be upstreamed.