
Commit 4dd8469

Merge branch 'master' of github.com:git/git
* 'master' of github.com:git/git: (63 commits)
  Git 2.31-rc1
  Hopefully the last batch before -rc1
  Revert "commit-graph: when incompatible with graphs, indicate why"
  read-cache: make the index write buffer size 128K
  dir: fix malloc of root untracked_cache_dir
  commit-graph.c: display correct number of chunks when writing
  doc/reftable: document how to handle windows
  fetch-pack: print and use dangling .gitmodules
  fetch-pack: with packfile URIs, use index-pack arg
  http-fetch: allow custom index-pack args
  http: allow custom index-pack args
  chunk-format: add technical docs
  chunk-format: restore duplicate chunk checks
  midx: use 64-bit multiplication for chunk sizes
  midx: use chunk-format read API
  commit-graph: use chunk-format read API
  chunk-format: create read chunk API
  midx: use chunk-format API in write_midx_internal()
  midx: drop chunk progress during write
  midx: return success/failure in chunk write methods
  ...
2 parents 3ed77c4 + f01623b commit 4dd8469


48 files changed, 1948 additions and 924 deletions

Documentation/RelNotes/2.31.0.txt

Lines changed: 31 additions & 0 deletions
@@ -197,6 +197,31 @@ Performance, Internal Implementation, Development Support etc.
197197
* The code to implement "git merge-base --independent" was poorly
198198
done and was kept from the very beginning of the feature.
199199

200+
* Preliminary changes to fsmonitor integration.
201+
202+
* Performance optimization work on the rename detection continues.
203+
204+
* The common code to deal with "chunked file format" that is shared
205+
by the multi-pack-index and commit-graph files has been factored
206+
out, to help codepaths for both filetypes to become more robust.
207+
208+
* The approach to "fsck" the incoming objects in "index-pack" is
209+
attractive for performance reasons (we have them already in core,
210+
inflated and ready to be inspected), but fundamentally cannot be
211+
applied fully when we receive more than one pack stream, as a tree
212+
object in one pack may refer to a blob object in another pack as
213+
".gitmodules", when we want to inspect blobs that are used as
214+
".gitmodules" file, for example. Teach "index-pack" to emit
215+
objects that must be inspected later and check them in the calling
216+
"fetch-pack" process.
217+
218+
* The logic to handle "trailer" related placeholders in the
219+
"--format=" mechanisms in the "log" family and "for-each-ref"
220+
family is getting unified.
221+
222+
* Raise the buffer size used when writing the index file out from
223+
(obviously too small) 8kB to (clearly sufficiently large) 128kB.
224+
200225

201226
Fixes since v2.30
202227
-----------------
@@ -318,6 +343,12 @@ Fixes since v2.30
318343
corrected.
319344
(merge 20e416409f jc/push-delete-nothing later to maint).
320345

346+
* Test script modernization.
347+
(merge 488acf15df sv/t7001-modernize later to maint).
348+
349+
* An under-allocation for the untracked cache data has been corrected.
350+
(merge 6347d649bc jh/untracked-cache-fix later to maint).
351+
321352
* Other code cleanup, docfix, build fix, etc.
322353
(merge e3f5da7e60 sg/t7800-difftool-robustify later to maint).
323354
(merge 9d336655ba js/doc-proto-v2-response-end later to maint).

Documentation/git-for-each-ref.txt

Lines changed: 3 additions & 5 deletions
@@ -260,11 +260,9 @@ contents:lines=N::
260260
The first `N` lines of the message.
261261

262262
Additionally, the trailers as interpreted by linkgit:git-interpret-trailers[1]
263-
are obtained as `trailers` (or by using the historical alias
264-
`contents:trailers`). Non-trailer lines from the trailer block can be omitted
265-
with `trailers:only`. Whitespace-continuations can be removed from trailers so
266-
that each trailer appears on a line by itself with its full content with
267-
`trailers:unfold`. Both can be used together as `trailers:unfold,only`.
263+
are obtained as `trailers[:options]` (or by using the historical alias
264+
`contents:trailers[:options]`). For valid [:option] values see `trailers`
265+
section of linkgit:git-log[1].
268266

269267
For sorting purposes, fields with numeric values sort in numeric order
270268
(`objectsize`, `authordate`, `committerdate`, `creatordate`, `taggerdate`).

Documentation/git-http-fetch.txt

Lines changed: 8 additions & 2 deletions
@@ -41,11 +41,17 @@ commit-id::
4141
<commit-id>['\t'<filename-as-in--w>]
4242

4343
--packfile=<hash>::
44-
Instead of a commit id on the command line (which is not expected in
44+
For internal use only. Instead of a commit id on the command
45+
line (which is not expected in
4546
this case), 'git http-fetch' fetches the packfile directly at the given
4647
URL and uses index-pack to generate corresponding .idx and .keep files.
4748
The hash is used to determine the name of the temporary file and is
48-
arbitrary. The output of index-pack is printed to stdout.
49+
arbitrary. The output of index-pack is printed to stdout. Requires
50+
--index-pack-args.
51+
52+
--index-pack-args=<args>::
53+
For internal use only. The command to run on the contents of the
54+
downloaded pack. Arguments are URL-encoded and separated by spaces.
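
The argument format described above (URL-encoded arguments separated by spaces) can be illustrated with a small sketch. This is not Git's implementation: the helper names `encode_index_pack_args` and `decode_index_pack_args` are hypothetical, the sample arguments are made up, and the exact set of characters that Git percent-encodes is an assumption (here every reserved character is encoded).

```python
from urllib.parse import quote, unquote


def encode_index_pack_args(args):
    """Percent-encode each argument, then join the results with single spaces."""
    # safe="" means that even "/" is encoded; which characters Git itself
    # leaves unencoded is an assumption of this sketch.
    return " ".join(quote(arg, safe="") for arg in args)


def decode_index_pack_args(value):
    """Split on spaces and undo the percent-encoding of each argument."""
    return [unquote(token) for token in value.split(" ")]


# Made-up arguments; the last one contains spaces, so it must be encoded
# to survive the space-separated transport.
args = ["index-pack", "--stdin", "--keep=fetched by example host"]
encoded = encode_index_pack_args(args)
decoded = decode_index_pack_args(encoded)
```

Because spaces inside an argument become `%20`, the only literal spaces left in `encoded` are the separators, which is what makes the split in the decoder unambiguous.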
4955

5056
--recover::
5157
Verify that everything reachable from target is fetched. Used after

Documentation/git-index-pack.txt

Lines changed: 6 additions & 1 deletion
@@ -86,7 +86,12 @@ OPTIONS
8686
Die if the pack contains broken links. For internal use only.
8787

8888
--fsck-objects::
89-
Die if the pack contains broken objects. For internal use only.
89+
For internal use only.
90+
+
91+
Die if the pack contains broken objects. If the pack contains a tree
92+
pointing to a .gitmodules blob that does not exist, prints the hash of
93+
that blob (for the caller to check) after the hash that goes into the
94+
name of the pack/idx file (see "Notes").
9095

9196
--threads=<n>::
9297
Specifies the number of threads to spawn when resolving

Documentation/gitdiffcore.txt

Lines changed: 20 additions & 0 deletions
@@ -169,6 +169,26 @@ a similarity score different from the default of 50% by giving a
169169
number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
170170
8/10 = 80%).
171171

172+
Note that when rename detection is on but both copy and break
173+
detection are off, rename detection adds a preliminary step that first
174+
checks if files are moved across directories while keeping their
175+
filename the same. If there is a file added to a directory whose
176+
contents are sufficiently similar to a file with the same name that got
177+
deleted from a different directory, it will mark them as renames and
178+
exclude them from the later quadratic step (the one that pairwise
179+
compares all unmatched files to find the "best" matches, determined by
180+
the highest content similarity). So, for example, if a deleted
181+
docs/ext.txt and an added docs/config/ext.txt are similar enough, they
182+
will be marked as a rename and prevent an added docs/ext.md that may
183+
be even more similar to the deleted docs/ext.txt from being considered
184+
as the rename destination in the later step. For this reason, the
185+
preliminary "match same filename" step uses a bit higher threshold to
186+
mark a file pair as a rename and stop considering other candidates for
187+
better matches. At most, one comparison is done per file in this
188+
preliminary pass; so if there are several remaining ext.txt files
189+
throughout the directory hierarchy after exact rename detection, this
190+
preliminary step will be skipped for those files.
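
The preliminary "match same filename" step described above can be sketched roughly as follows. This is an illustration only, not diffcore's code: the function names, the `difflib`-based similarity measure, and the 0.75 threshold are all assumptions chosen for the example, and the sketch treats a basename as eligible only when exactly one deleted and one added file carry it.

```python
import os
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(old_content, new_content):
    """Stand-in similarity score in [0, 1]; diffcore uses its own measure."""
    return SequenceMatcher(None, old_content, new_content).ratio()


def basename_prepass(deleted, added, threshold=0.75):
    """Pair deleted and added files that share a basename.

    deleted and added map path -> content. The threshold is hypothetical;
    the text only says it is "a bit higher" than the normal one.
    """
    deleted_by_name = defaultdict(list)
    added_by_name = defaultdict(list)
    for path in deleted:
        deleted_by_name[os.path.basename(path)].append(path)
    for path in added:
        added_by_name[os.path.basename(path)].append(path)

    renames = []
    for name, old_paths in deleted_by_name.items():
        new_paths = added_by_name.get(name, [])
        # At most one comparison per file: skip ambiguous basenames.
        if len(old_paths) != 1 or len(new_paths) != 1:
            continue
        old, new = old_paths[0], new_paths[0]
        if similarity(deleted[old], added[new]) >= threshold:
            renames.append((old, new))
    return renames


deleted = {"docs/ext.txt": "alpha\nbeta\ngamma\n"}
added = {
    "docs/config/ext.txt": "alpha\nbeta\ngamma\ndelta\n",
    "docs/ext.md": "alpha\nbeta\ngamma\nd\n",  # closer match, different basename
}
renames = basename_prepass(deleted, added)

# A second added ext.txt makes the basename ambiguous, so the step is skipped.
skipped = basename_prepass(deleted, dict(added, **{"other/ext.txt": "alpha\n"}))
```

In the first call the pair `docs/ext.txt` and `docs/config/ext.txt` is claimed by the preliminary step, so `docs/ext.md` never gets compared against the deleted file even though its content is closer.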
191+
172192
Note. When the "-C" option is used with `--find-copies-harder`
173193
option, 'git diff-{asterisk}' commands feed unmodified filepairs to
174194
diffcore mechanism as well as modified ones. This lets the copy

Documentation/technical/chunk-format.txt

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
1+
Chunk-based file formats
2+
========================
3+
4+
Some file formats in Git use a common concept of "chunks" to describe
5+
sections of the file. This allows structured access to a large file by
6+
scanning a small "table of contents" for the remaining data. This common
7+
format is used by the `commit-graph` and `multi-pack-index` files. See
8+
link:technical/pack-format.html[the `multi-pack-index` format] and
9+
link:technical/commit-graph-format.html[the `commit-graph` format] for
10+
how they use the chunks to describe structured data.
11+
12+
A chunk-based file format begins with some header information custom to
13+
that format. That header should include enough information to identify
14+
the file type, format version, and number of chunks in the file. From this
15+
information, a reader of that file can determine the start of the chunk-based region.
16+
17+
The chunk-based region starts with a table of contents describing where
18+
each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
19+
where C is the number of chunks. Consider the following table:
20+
21+
| Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
22+
|--------------------|------------------------|
23+
| ID[0] | OFFSET[0] |
24+
| ... | ... |
25+
| ID[C] | OFFSET[C] |
26+
| 0x0000 | OFFSET[C+1] |
27+
28+
Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
29+
Each integer is stored in network-byte order.
30+
31+
The chunk identifier `ID[i]` is a label for the data stored within this
32+
file from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the
33+
size of the `i`th chunk is equal to the difference between `OFFSET[i+1]`
34+
and `OFFSET[i]`. This requires that the chunk data appears contiguously
35+
in the same order as the table of contents.
36+
37+
The final entry in the table of contents must be four zero bytes. This
38+
confirms that the table of contents is ending and provides the offset for
39+
the end of the chunk-based data.
40+
41+
Note: The chunk-based format expects that the file contains _at least_ a
42+
trailing hash after `OFFSET[C+1]`.
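
The table-of-contents layout described above can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not the chunk-format API: `parse_toc`, the 8-byte `DEMO` header, and the chunk IDs `AAAA` and `BBBB` are all invented for the example. It takes the chunk count from the caller and reads one row per chunk plus the terminating row.

```python
import struct

# One table-of-contents row: 4-byte chunk ID + 8-byte offset, network byte order.
ROW = struct.Struct(">4sQ")


def parse_toc(data, toc_start, num_chunks):
    """Return (chunk_id, offset, size) for each chunk listed in the table."""
    rows = [ROW.unpack_from(data, toc_start + i * ROW.size)
            for i in range(num_chunks + 1)]
    if rows[-1][0] != b"\x00\x00\x00\x00":
        raise ValueError("table of contents does not end with a zero chunk ID")
    chunks = []
    # Each chunk's size is the difference between the next offset and its own.
    for (chunk_id, start), (_next_id, end) in zip(rows, rows[1:]):
        if end < start:
            raise ValueError("chunk offsets must not decrease")
        chunks.append((chunk_id, start, end - start))
    return chunks


# Hypothetical file: 8-byte header, two chunks, 20 bytes standing in for a hash.
header = b"DEMO" + struct.pack(">I", 2)
payloads = [(b"AAAA", b"hello"), (b"BBBB", b"world!!")]
offset = len(header) + (len(payloads) + 1) * ROW.size
toc = b""
for chunk_id, payload in payloads:
    toc += ROW.pack(chunk_id, offset)
    offset += len(payload)
toc += ROW.pack(b"\x00\x00\x00\x00", offset)  # terminating row: end of chunk data
data = header + toc + b"".join(p for _, p in payloads) + b"\x00" * 20

chunks = parse_toc(data, len(header), len(payloads))
```

The terminating row carries no data of its own; its offset only marks where the last chunk ends, which is why sizes never need to be stored explicitly.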
43+
44+
Functions for working with chunk-based file formats are declared in
45+
`chunk-format.h`. Using these methods provides extra checks that assist
46+
developers when creating new file formats.
47+
48+
Writing chunk-based file formats
49+
--------------------------------
50+
51+
To write a chunk-based file format, create a `struct chunkfile` by
52+
calling `init_chunkfile()` and pass a `struct hashfile` pointer. The
53+
caller is responsible for opening the `hashfile` and writing header
54+
information so the file format is identifiable before the chunk-based
55+
format begins.
56+
57+
Then, call `add_chunk()` for each chunk that is intended for write. This
58+
populates the `chunkfile` with information about the order and size of
59+
each chunk to write. Provide a `chunk_write_fn` function pointer to
60+
perform the write of the chunk data upon request.
61+
62+
Call `write_chunkfile()` to write the table of contents to the `hashfile`
63+
followed by each of the chunks. This will verify that each chunk wrote
64+
the expected amount of data so the table of contents is correct.
65+
66+
Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The
67+
caller is responsible for finalizing the `hashfile` by writing the trailing
68+
hash and closing the file.
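
The write flow above (register chunks, emit the table of contents, write each chunk, verify its size) can be mimicked in Python to show why the size check matters. This is only an analogy to the C API: the `ChunkFileWriter` class, the `DEMO` header, and the use of SHA-1 over an in-memory buffer are assumptions of the sketch, not how `chunk-format.c` or `hashfile` work internally.

```python
import hashlib
import io
import struct

ROW = struct.Struct(">4sQ")  # 4-byte chunk ID + 8-byte offset, big-endian


class ChunkFileWriter:
    """Hypothetical Python stand-in for the chunkfile writing steps."""

    def __init__(self, out):
        self.out = out
        self.chunks = []  # (chunk_id, expected_size, write_fn) in file order

    def add_chunk(self, chunk_id, size, write_fn):
        self.chunks.append((chunk_id, size, write_fn))

    def write(self):
        # The first chunk starts right after the table of contents.
        offset = self.out.tell() + (len(self.chunks) + 1) * ROW.size
        for chunk_id, size, _ in self.chunks:
            self.out.write(ROW.pack(chunk_id, offset))
            offset += size
        self.out.write(ROW.pack(b"\x00" * 4, offset))  # terminating row
        for chunk_id, size, write_fn in self.chunks:
            start = self.out.tell()
            write_fn(self.out)
            written = self.out.tell() - start
            if written != size:
                raise ValueError(
                    f"chunk {chunk_id!r} wrote {written} bytes, expected {size}")


out = io.BytesIO()
out.write(b"DEMO" + struct.pack(">I", 2))  # caller writes the header first
writer = ChunkFileWriter(out)
writer.add_chunk(b"AAAA", 5, lambda f: f.write(b"hello"))
writer.add_chunk(b"BBBB", 7, lambda f: f.write(b"world!!"))
writer.write()
out.write(hashlib.sha1(out.getvalue()).digest())  # caller adds the trailing hash
data = out.getvalue()
```

Since the offsets in the table are computed from the declared sizes before any chunk data is written, a chunk callback that writes a different amount would silently corrupt every later offset; the check turns that into an immediate error.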
69+
70+
Reading chunk-based file formats
71+
--------------------------------
72+
73+
To read a chunk-based file format, the file must be opened as a
74+
memory-mapped region. The chunk-format API expects that the entire file
75+
is mapped as a contiguous memory region.
76+
77+
Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`.
78+
79+
After reading the header information from the beginning of the file,
80+
including the chunk count, call `read_table_of_contents()` to populate
81+
the `struct chunkfile` with the list of chunks, their offsets, and their
82+
sizes.
83+
84+
Extract the data information for each chunk using `pair_chunk()` or
85+
`read_chunk()`:
86+
87+
* `pair_chunk()` assigns a given pointer with the location inside the
88+
memory-mapped file corresponding to that chunk's offset. If the chunk
89+
does not exist, then the pointer is not modified.
90+
91+
* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it
92+
with the appropriate initial pointer and size information. The function
93+
is not called if the chunk does not exist. Use this method to read chunks
94+
if you need to perform immediate parsing or if you need to execute logic
95+
based on the size of the chunk.
96+
97+
After calling these methods, call `free_chunkfile()` to clear the
98+
`struct chunkfile` data. This will not close the memory-mapped region.
99+
Callers are expected to own that data for the timeframe the pointers into
100+
the region are needed.
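
The difference between the two access styles can be shown with a Python analogy. The helpers `pair_chunk_view` and `read_chunk_with` are hypothetical and do not mirror the C signatures; a `memoryview` over a bytes object stands in for the memory-mapped file, and the table of contents is assumed to have been parsed already into a dictionary.

```python
def pair_chunk_view(toc, data, chunk_id, default=None):
    """Return a zero-copy view of the chunk, or `default` if it is absent."""
    entry = toc.get(chunk_id)
    if entry is None:
        return default  # analogous to leaving the caller's pointer unmodified
    offset, size = entry
    return data[offset:offset + size]


def read_chunk_with(toc, data, chunk_id, read_fn):
    """Call read_fn(view, size) for the chunk; do nothing if it is absent."""
    entry = toc.get(chunk_id)
    if entry is None:
        return None  # the callback is not invoked for a missing chunk
    offset, size = entry
    return read_fn(data[offset:offset + size], size)


# Stand-in for a mapped file: 8 header bytes followed by two chunks.
data = memoryview(b"HEADER.." + b"hello" + b"world!!")
toc = {b"AAAA": (8, 5), b"BBBB": (13, 7)}  # chunk_id -> (offset, size)

view = pair_chunk_view(toc, data, b"AAAA")
missing = pair_chunk_view(toc, data, b"ZZZZ")

calls = []
size_seen = read_chunk_with(toc, data, b"BBBB",
                            lambda chunk, size: calls.append(bytes(chunk)) or size)
read_chunk_with(toc, data, b"ZZZZ", lambda chunk, size: calls.append(b"never"))
```

The views returned here stay valid only while `data` is alive, which parallels the requirement that callers keep the mapped region open for as long as they use pointers into it.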
101+
102+
Examples
103+
--------
104+
105+
These file formats use the chunk-format API, and can be used as examples
106+
for future formats:
107+
108+
* *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()`
109+
in `commit-graph.c` for how the chunk-format API is used to write and
110+
parse the commit-graph file format documented in
111+
link:technical/commit-graph-format.html[the commit-graph file format].
112+
113+
* *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()`
114+
in `midx.c` for how the chunk-format API is used to write and
115+
parse the multi-pack-index file format documented in
116+
link:technical/pack-format.html[the multi-pack-index file format].

Documentation/technical/commit-graph-format.txt

Lines changed: 3 additions & 0 deletions
@@ -61,6 +61,9 @@ CHUNK LOOKUP:
6161
the length using the next chunk position if necessary.) Each chunk
6262
ID appears at most once.
6363

64+
The CHUNK LOOKUP matches the table of contents from
65+
link:technical/chunk-format.html[the chunk-based file format].
66+
6467
The remaining data in the body is described one chunk at a time, and
6568
these chunks may be given in any order. Chunks are required unless
6669
otherwise specified.

Documentation/technical/pack-format.txt

Lines changed: 3 additions & 0 deletions
@@ -336,6 +336,9 @@ CHUNK LOOKUP:
336336
(Chunks are provided in file-order, so you can infer the length
337337
using the next chunk position if necessary.)
338338

339+
The CHUNK LOOKUP matches the table of contents from
340+
link:technical/chunk-format.html[the chunk-based file format].
341+
339342
The remaining data in the body is described one chunk at a time, and
340343
these chunks may be given in any order. Chunks are required unless
341344
otherwise specified.

Documentation/technical/reftable.txt

Lines changed: 26 additions & 16 deletions
@@ -872,17 +872,11 @@ A repository must set its `$GIT_DIR/config` to configure reftable:
872872
Layout
873873
^^^^^^
874874

875-
A collection of reftable files are stored in the `$GIT_DIR/reftable/`
876-
directory:
877-
878-
....
879-
00000001-00000001.log
880-
00000002-00000002.ref
881-
00000003-00000003.ref
882-
....
883-
884-
where reftable files are named by a unique name such as produced by the
885-
function `${min_update_index}-${max_update_index}.ref`.
875+
A collection of reftable files are stored in the `$GIT_DIR/reftable/` directory.
876+
Their names should have a random element, such that each filename is globally
877+
unique; this helps avoid spurious failures on Windows, where open files cannot
878+
be removed or overwritten. It is suggested to use
879+
`${min_update_index}-${max_update_index}-${random}.ref` as a naming convention.
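
A name following the suggested convention could be generated as in the sketch below. The helper `reftable_name` is hypothetical, and both the 8-digit zero padding (copied from the example listing) and the 8 hex characters of randomness are assumptions; the text only asks for a random element that makes each name unique.

```python
import secrets


def reftable_name(min_update_index, max_update_index, ext="ref"):
    """Build `${min}-${max}-${random}.${ext}` with a random hex suffix."""
    suffix = secrets.token_hex(4)  # 8 hex characters; the length is an assumption
    return f"{min_update_index:08d}-{max_update_index:08d}-{suffix}.{ext}"


name = reftable_name(2, 2)
log_name = reftable_name(1, 1, ext="log")
```

Because the suffix differs on every call, a compaction that rewrites the same update-index range produces a new filename instead of trying to overwrite a table that another process may still hold open.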
886880

887881
Log-only files use the `.log` extension, while ref-only and mixed ref
888882
and log files use the `.ref` extension.
@@ -893,9 +887,9 @@ current files, one per line, in order, from oldest (base) to newest
893887

894888
....
895889
$ cat .git/reftable/tables.list
896-
00000001-00000001.log
897-
00000002-00000002.ref
898-
00000003-00000003.ref
890+
00000001-00000001-RANDOM1.log
891+
00000002-00000002-RANDOM2.ref
892+
00000003-00000003-RANDOM3.ref
899893
....
900894

901895
Readers must read `$GIT_DIR/reftable/tables.list` to determine which
@@ -940,7 +934,7 @@ new reftable and atomically appending it to the stack:
940934
3. Select `update_index` to be most recent file's
941935
`max_update_index + 1`.
942936
4. Prepare temp reftable `tmp_XXXXXX`, including log entries.
943-
5. Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
937+
5. Rename `tmp_XXXXXX` to `${update_index}-${update_index}-${random}.ref`.
944938
6. Copy `tables.list` to `tables.list.lock`, appending file from (5).
945939
7. Rename `tables.list.lock` to `tables.list`.
946940

@@ -993,7 +987,7 @@ prevents other processes from trying to compact these files.
993987
should always be the case, assuming that other processes are adhering to
994988
the locking protocol.
995989
7. Rename `${min_update_index}-${max_update_index}_XXXXXX` to
996-
`${min_update_index}-${max_update_index}.ref`.
990+
`${min_update_index}-${max_update_index}-${random}.ref`.
997991
8. Write the new stack to `tables.list.lock`, replacing `B` and `C`
998992
with the file from (4).
999993
9. Rename `tables.list.lock` to `tables.list`.
@@ -1005,6 +999,22 @@ This strategy permits compactions to proceed independently of updates.
1005999
Each reftable (compacted or not) is uniquely identified by its name, so
10061000
open reftables can be cached by their name.
10071001

1002+
Windows
1003+
^^^^^^^
1004+
1005+
On Windows, and other systems that do not allow deleting or renaming to open
1006+
files, compaction may succeed, but other readers may prevent obsolete tables
1007+
from being deleted.
1008+
1009+
On these platforms, the following strategy can be followed: on closing a
1010+
reftable stack, reload `tables.list`, and delete any tables no longer mentioned
1011+
in `tables.list`.
1012+
1013+
Irregular program exit may still leave unused files behind. In this case, a
1014+
cleanup operation can read `tables.list`, note its modification timestamp, and
1015+
delete any unreferenced `*.ref` files that are older.
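
The cleanup operation described above might look like the following sketch. The function `cleanup_unreferenced` and the file names in the demonstration are hypothetical, and the code assumes that `tables.list` holds one table name per line; it is not taken from any reftable implementation.

```python
import os
import tempfile


def cleanup_unreferenced(reftable_dir):
    """Delete *.ref files that tables.list does not mention and that are older."""
    list_path = os.path.join(reftable_dir, "tables.list")
    list_mtime = os.stat(list_path).st_mtime  # note the timestamp first
    with open(list_path) as f:
        live = {line.strip() for line in f if line.strip()}

    removed = []
    for name in sorted(os.listdir(reftable_dir)):
        if not name.endswith(".ref") or name in live:
            continue
        path = os.path.join(reftable_dir, name)
        # Newer files may belong to a writer that has not updated the list yet.
        if os.stat(path).st_mtime < list_mtime:
            try:
                os.remove(path)
            except OSError:
                continue  # e.g. still open in another process; retry later
            removed.append(name)
    return removed


# Demonstration in a scratch directory with explicit modification times.
with tempfile.TemporaryDirectory() as d:
    for name, mtime in [("a-live.ref", 100), ("b-stale.ref", 100), ("c-new.ref", 300)]:
        path = os.path.join(d, name)
        open(path, "w").close()
        os.utime(path, (mtime, mtime))
    list_path = os.path.join(d, "tables.list")
    with open(list_path, "w") as f:
        f.write("a-live.ref\n")
    os.utime(list_path, (200, 200))

    removed = cleanup_unreferenced(d)
    remaining = sorted(os.listdir(d))
```

The age check is what protects a table that was renamed into place just before its writer updated `tables.list`: such a file is unreferenced for a moment, but it is newer than the list and is therefore left alone.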
1016+
1017+
10081018
Alternatives considered
10091019
~~~~~~~~~~~~~~~~~~~~~~~
10101020

GIT-VERSION-GEN

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
11
#!/bin/sh
22

33
GVF=GIT-VERSION-FILE
4-
DEF_VER=v2.31.0-rc0
4+
DEF_VER=v2.31.0-rc1
55

66
LF='
77
'
