Commit 411903c

Merge branch 'rj/doc-technical-fixes'

Documentation mark-up fixes.

* rj/doc-technical-fixes:
  doc: add large-object-promisors.adoc to the docs build
  doc: commit-graph.adoc: fix up some formatting
  doc: sparse-checkout.adoc: fix asciidoc warnings
  doc: remembering-renames.adoc: fix asciidoc warnings

2 parents 1d10771 + 1c1fc86 commit 411903c

6 files changed, +507 -412 lines changed

Documentation/Makefile

Lines changed: 1 addition & 0 deletions
@@ -123,6 +123,7 @@ TECH_DOCS += technical/bundle-uri
 TECH_DOCS += technical/commit-graph
 TECH_DOCS += technical/directory-rename-detection
 TECH_DOCS += technical/hash-function-transition
+TECH_DOCS += technical/large-object-promisors
 TECH_DOCS += technical/long-running-process-protocol
 TECH_DOCS += technical/multi-pack-index
 TECH_DOCS += technical/packfile-uri

Documentation/technical/commit-graph.adoc

Lines changed: 19 additions & 10 deletions
@@ -39,6 +39,7 @@ A consumer may load the following info for a commit from the graph:
 Values 1-4 satisfy the requirements of parse_commit_gently().

 There are two definitions of generation number:
+
 1. Corrected committer dates (generation number v2)
 2. Topological levels (generation number v1)

@@ -158,7 +159,8 @@ number of commits in the full history. By creating a "chain" of commit-graphs,
 we enable fast writes of new commit data without rewriting the entire commit
 history -- at least, most of the time.

-## File Layout
+File Layout
+~~~~~~~~~~~

 A commit-graph chain uses multiple files, and we use a fixed naming convention
 to organize these files. Each commit-graph file has a name
@@ -170,11 +172,11 @@ hashes for the files in order from "lowest" to "highest".

 For example, if the `commit-graph-chain` file contains the lines

-```
+----
 {hash0}
 {hash1}
 {hash2}
-```
+----

 then the commit-graph chain looks like the following diagram:

@@ -213,7 +215,8 @@ specifying the hashes of all files in the lower layers. In the above example,
 `graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains
 `{hash0}` and `{hash1}`.

-## Merging commit-graph files
+Merging commit-graph files
+~~~~~~~~~~~~~~~~~~~~~~~~~~

 If we only added a new commit-graph file on every write, we would run into a
 linear search problem through many commit-graph files. Instead, we use a merge
@@ -225,6 +228,7 @@ is determined by the merge strategy that the files should collapse to
 the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}`
 file.

+....
 +---------------------+
 | |
 | (new commits) |
@@ -250,21 +254,23 @@ file.
 | |
 | |
 +-----------------------+
+....

 During this process, the commits to write are combined, sorted and we write the
 contents to a temporary file, all while holding a `commit-graph-chain.lock`
 lock-file. When the file is flushed, we rename it to `graph-{hash3}`
 according to the computed `{hash3}`. Finally, we write the new chain data to
 `commit-graph-chain.lock`:

-```
+----
 {hash3}
 {hash0}
-```
+----

 We then close the lock-file.

-## Merge Strategy
+Merge Strategy
+~~~~~~~~~~~~~~

 When writing a set of commits that do not exist in the commit-graph stack of
 height N, we default to creating a new file at level N + 1. We then decide to
@@ -289,7 +295,8 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum
 number of commits) could be extracted into config settings for full
 flexibility.

-## Handling Mixed Generation Number Chains
+Handling Mixed Generation Number Chains
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 With the introduction of generation number v2 and generation data chunk, the
 following scenario is possible:
@@ -318,7 +325,8 @@ have corrected commit dates when written by compatible versions of Git. Thus,
 rewriting split commit-graph as a single file (`--split=replace`) creates a
 single layer with corrected commit dates.

-## Deleting graph-{hash} files
+Deleting graph-\{hash\} files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 After a new tip file is written, some `graph-{hash}` files may no longer
 be part of a chain. It is important to remove these files from disk, eventually.
@@ -333,7 +341,8 @@ files whose modified times are older than a given expiry window. This window
 defaults to zero, but can be changed using command-line arguments or a config
 setting.

-## Chains across multiple object directories
+Chains across multiple object directories
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 In a repo with alternates, we look for the `commit-graph-chain` file starting
 in the local object directory and then in each alternate. The first file that
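
The chain layout quoted in the commit-graph.adoc hunks above (hashes listed from "lowest" to "highest", with each `graph-{hash}.graph` file recording the hashes of all lower layers) can be illustrated with a small sketch. This is not Git's implementation: the function name `chain_layers` is hypothetical, and the sketch works on the text of a `commit-graph-chain` file rather than on a repository.

```python
def chain_layers(chain_text):
    """Map the text of a commit-graph-chain file to its layers.

    Returns a list of (graph_filename, lower_layer_hashes) tuples,
    ordered from the lowest layer to the highest. The helper name and
    return shape are illustrative assumptions, not Git's API.
    """
    # One hash per line; ignore blank lines and surrounding whitespace.
    hashes = [line.strip() for line in chain_text.splitlines() if line.strip()]
    # Layer i is stored in graph-{hash_i}.graph and refers to every
    # hash that appears before it in the chain file.
    return [(f"graph-{h}.graph", hashes[:i]) for i, h in enumerate(hashes)]


if __name__ == "__main__":
    for name, lower in chain_layers("{hash0}\n{hash1}\n{hash2}\n"):
        print(name, "refers to", lower)
```

Taking text instead of a path keeps the sketch independent of where a given repository stores its chain file.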

Documentation/technical/large-object-promisors.adoc

Lines changed: 32 additions & 32 deletions
@@ -34,8 +34,8 @@ a new object representation for large blobs as discussed in:

 https://lore.kernel.org/git/[email protected]/

-0) Non goals
-------------
+Non goals
+---------

 - We will not discuss those client side improvements here, as they
 would require changes in different parts of Git than this effort.
@@ -90,8 +90,8 @@ later in this document:
 even more to host content with larger blobs or more large blobs
 than currently.

-I) Issues with the current situation
-------------------------------------
+I Issues with the current situation
+-----------------------------------

 - Some statistics made on GitLab repos have shown that more than 75%
 of the disk space is used by blobs that are larger than 1MB and
@@ -138,8 +138,8 @@ I) Issues with the current situation
 complaining that these tools require significant effort to set up,
 learn and use correctly.

-II) Main features of the "Large Object Promisors" solution
-----------------------------------------------------------
+II Main features of the "Large Object Promisors" solution
+---------------------------------------------------------

 The main features below should give a rough overview of how the
 solution may work. Details about needed elements can be found in
@@ -166,7 +166,7 @@ format. They should be used along with main remotes that contain the
 other objects.

 Note 1
-++++++
+^^^^^^

 To clarify, a LOP is a normal promisor remote, except that:

@@ -178,21 +178,21 @@ To clarify, a LOP is a normal promisor remote, except that:
 itself.

 Note 2
-++++++
+^^^^^^

 Git already makes it possible for a main remote to also be a promisor
 remote storing both regular objects and large blobs for a client that
 clones from it with a filter on blob size. But here we explicitly want
 to avoid that.

 Rationale
-+++++++++
+^^^^^^^^^

 LOPs aim to be good at handling large blobs while main remotes are
 already good at handling other objects.

 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^

 Git already has support for multiple promisor remotes, see
 link:partial-clone.html#using-many-promisor-remotes[the partial clone documentation].
@@ -213,19 +213,19 @@ remote helper (see linkgit:gitremote-helpers[7]) which makes the
 underlying object storage appear like a remote to Git.

 Note
-++++
+^^^^

 A LOP can be a promisor remote accessed using a remote helper by
 both some clients and the main remote.

 Rationale
-+++++++++
+^^^^^^^^^

 This looks like the simplest way to create LOPs that can cheaply
 handle many large blobs.

 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^

 Remote helpers are quite easy to write as shell scripts, but it might
 be more efficient and maintainable to write them using other languages
@@ -247,7 +247,7 @@ The underlying object storage that a LOP uses could also serve as
 storage for large files handled by Git LFS.

 Rationale
-+++++++++
+^^^^^^^^^

 This would simplify the server side if it wants to both use a LOP and
 act as a Git LFS server.
@@ -259,7 +259,7 @@ On the server side, a main remote should have a way to offload to a
 LOP all its blobs with a size over a configurable threshold.

 Rationale
-+++++++++
+^^^^^^^^^

 This makes it easy to set things up and to clean things up. For
 example, an admin could use this to manually convert a repo not using
@@ -268,7 +268,7 @@ some users would sometimes push large blobs, a cron job could use this
 to regularly make sure the large blobs are moved to the LOP.

 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^

 Using something based on `git repack --filter=...` to separate the
 blobs we want to offload from the other Git objects could be a good
@@ -284,13 +284,13 @@ should have ways to prevent oversize blobs to be fetched, and also
 perhaps pushed, into it.

 Rationale
-+++++++++
+^^^^^^^^^

 A main remote containing many oversize blobs would defeat the purpose
 of LOPs.

 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^

 The way to offload to a LOP discussed in 4) above can be used to
 regularly offload oversize blobs. About preventing oversize blobs from
@@ -326,18 +326,18 @@ large blobs directly from the LOP and the server would not need to
 fetch those blobs from the LOP to be able to serve the client.

 Note
-++++
+^^^^

 For fetches instead of clones, a protocol negotiation might not always
 happen, see the "What about fetches?" FAQ entry below for details.

 Rationale
-+++++++++
+^^^^^^^^^

 Security, configurability and efficiency of setting things up.

 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^

 A "promisor-remote" protocol v2 capability looks like a good way to
 implement this. The way the client and server use this capability
@@ -356,7 +356,7 @@ the client should be able to offload some large blobs it has fetched,
 but might not need anymore, to the LOP.

 Note
-++++
+^^^^

 It might depend on the context if it should be OK or not for clients
 to offload large blobs they have created, instead of fetched, directly
@@ -367,13 +367,13 @@ This should be discussed and refined when we get closer to
 implementing this feature.

 Rationale
-+++++++++
+^^^^^^^^^

 On the client, the easiest way to deal with unneeded large blobs is to
 offload them.

 Implementation
-++++++++++++++
+^^^^^^^^^^^^^^

 This is very similar to what 4) above is about, except on the client
 side instead of the server side. So a good solution to 4) could likely
@@ -385,8 +385,8 @@ when cloning (see 6) above). Also if the large blobs were fetched from
 a LOP, it is likely, and can easily be confirmed, that the LOP still
 has them, so that they can just be removed from the client.

-III) Benefits of using LOPs
----------------------------
+III Benefits of using LOPs
+--------------------------

 Many benefits are related to the issues discussed in "I) Issues with
 the current situation" above:
@@ -406,8 +406,8 @@ the current situation" above:

 - Reduced storage needs on the client side.

-IV) FAQ
--------
+IV FAQ
+------

 What about using multiple LOPs on the server and client side?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -533,7 +533,7 @@ some objects it already knows about but doesn't have because they are
 on a promisor remote.

 Regular fetch
-+++++++++++++
+^^^^^^^^^^^^^

 In a regular fetch, the client will contact the main remote and a
 protocol negotiation will happen between them. It's a good thing that
@@ -551,7 +551,7 @@ new fetch will happen in the same way as the previous clone or fetch,
 using, or not using, the same LOP(s) as last time.

 "Backfill" or "lazy" fetch
-++++++++++++++++++++++++++
+^^^^^^^^^^^^^^^^^^^^^^^^^^

 When there is a backfill fetch, the client doesn't necessarily contact
 the main remote first. It will try to fetch from its promisor remotes
@@ -576,8 +576,8 @@ from the client when it fetches from them. The client could get the
 token when performing a protocol negotiation with the main remote (see
 section II.6 above).

-V) Future improvements
-----------------------
+V Future improvements
+---------------------

 It is expected that at the beginning using LOPs will be mostly worth
 it either in a corporate context where the Git version that clients
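
One plausible reading of the `+` to `^` underline changes in large-object-promisors.adoc is that they stop the document from skipping a section level. In legacy AsciiDoc two-line titles, the underline characters `=`, `-`, `~`, `^` and `+` mark levels 0 through 4, so a `+` title placed under a `~` section jumps from level 2 to level 4. The sketch below flags such jumps. It is a heuristic under stated assumptions, not the check asciidoc itself performs: the name `find_level_skips` is hypothetical, and a line counts as a title underline only if it is a run of one of those characters whose length is within two of the preceding line's length.

```python
# Underline characters for two-line titles, mapped to section levels
# (legacy AsciiDoc convention).
LEVELS = {"=": 0, "-": 1, "~": 2, "^": 3, "+": 4}


def find_level_skips(lines):
    """Return (title_line_number, title, previous_level, level) for every
    two-line title that is more than one level deeper than the title
    before it. Line numbers are 1-based. Hypothetical helper."""
    skips = []
    prev_level = None
    for i in range(1, len(lines)):
        title = lines[i - 1].rstrip()
        underline = lines[i].rstrip()
        if len(title) < 2 or len(underline) < 2:
            continue
        ch = underline[0]
        # The underline must be a run of a single known character.
        if ch not in LEVELS or underline != ch * len(underline):
            continue
        # Heuristic: skip block delimiters such as "----" that merely
        # follow an unrelated line of a very different length.
        if abs(len(underline) - len(title)) > 2:
            continue
        level = LEVELS[ch]
        if prev_level is not None and level > prev_level + 1:
            skips.append((i, title, prev_level, level))
        prev_level = level
    return skips
```

Run over the pre-change text, a checker like this would report each `Note`, `Rationale` and `Implementation` title underlined with `+`; after the change to `^` those titles sit at level 3 and no jump is reported.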

Documentation/technical/meson.build

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ articles = [
 'commit-graph.adoc',
 'directory-rename-detection.adoc',
 'hash-function-transition.adoc',
+'large-object-promisors.adoc',
 'long-running-process-protocol.adoc',
 'multi-pack-index.adoc',
 'packfile-uri.adoc',
