Commit 479a687

Merge branch 'bc/sha1-256-interop-01' into jch
The beginning of SHA1-SHA256 interoperability work.

* bc/sha1-256-interop-01:
  t1010: use BROKEN_OBJECTS prerequisite
  t: allow specifying compatibility hash
  fsck: consider gpgsig headers expected in tags
  rev-parse: allow printing compatibility hash
  docs: add documentation for loose objects
  docs: improve ambiguous areas of pack format documentation
  docs: reflect actual double signature for tags
  docs: update offset order for pack index v3
  docs: update pack index v3 format
2 parents c7be885 + 3d50ab9 commit 479a687

15 files changed, +256 -34 lines changed

Documentation/Makefile

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@ MAN5_TXT += gitformat-bundle.adoc
 MAN5_TXT += gitformat-chunk.adoc
 MAN5_TXT += gitformat-commit-graph.adoc
 MAN5_TXT += gitformat-index.adoc
+MAN5_TXT += gitformat-loose.adoc
 MAN5_TXT += gitformat-pack.adoc
 MAN5_TXT += gitformat-signature.adoc
 MAN5_TXT += githooks.adoc

Documentation/fsck-msgids.adoc

Lines changed: 6 additions & 0 deletions
@@ -10,6 +10,12 @@
 `badFilemode`::
 (INFO) A tree contains a bad filemode entry.

+`badGpgsig`::
+(ERROR) A tag contains a bad (truncated) signature (e.g., `gpgsig`) header.
+
+`badHeaderContinuation`::
+(ERROR) A continuation header (such as for `gpgsig`) is unexpectedly truncated.
+
 `badName`::
 (ERROR) An author/committer name is empty.

Documentation/git-rev-parse.adoc

Lines changed: 6 additions & 5 deletions
@@ -324,11 +324,12 @@ The following options are unaffected by `--path-format`:
 path of the current directory relative to the top-level
 directory.

---show-object-format[=(storage|input|output)]::
-Show the object format (hash algorithm) used for the repository
-for storage inside the `.git` directory, input, or output. For
-input, multiple algorithms may be printed, space-separated.
-If not specified, the default is "storage".
+--show-object-format[=(storage|input|output|compat)]::
+Show the object format (hash algorithm) used for the repository for storage
+inside the `.git` directory, input, output, or compatibility. For input,
+multiple algorithms may be printed, space-separated. If `compat` is
+requested and no compatibility algorithm is enabled, prints an empty line. If
+not specified, the default is "storage".

 --show-ref-format::
 Show the reference storage format used for the repository.

Documentation/gitformat-loose.adoc

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+gitformat-loose(5)
+==================
+
+NAME
+----
+gitformat-loose - Git loose object format
+
+
+SYNOPSIS
+--------
+[verse]
+$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
+
+DESCRIPTION
+-----------
+
+Loose objects are how Git stores individual objects, where every object is
+written as a separate file.
+
+Over the lifetime of a repository, objects are usually written as loose objects
+initially. Eventually, these loose objects will be compacted into packfiles
+via repository maintenance to improve disk space usage and speed up the lookup
+of these objects.
+
+== Loose objects
+
+Each loose object contains a prefix, followed immediately by the data of the
+object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
+`tree`, `commit`, or `tag` and `size` is the size of the data (without the
+prefix) as a decimal integer expressed in ASCII.
+
+The entire contents, prefix and data concatenated, is then compressed with zlib
+and the compressed data is stored in the file. The object ID of the object is
+the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data.
+
+The file for the loose object is stored under the `objects` directory, with the
+first two hex characters of the object ID being the directory and the remaining
+characters being the file name. This is done to shard the data and avoid too
+many files being in one directory, since some file systems perform poorly with
+many items in a directory.
+
+As an example, the empty tree contains the data (when uncompressed) `tree 0\0`
+and, in a SHA-256 repository, would have the object ID
+`6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and would be
+stored under
+`$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`.
+
+Similarly, a blob containing the contents `abc` would have the uncompressed
+data of `blob 3\0abc`.
+
+GIT
+---
+Part of the linkgit:git[1] suite
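
The loose object layout described in gitformat-loose.adoc can be sketched in a few lines of Python. This is only an illustration of the documented format, not Git's implementation; the function name `loose_object` and its return shape are assumptions made for the example.

```python
import hashlib
import zlib


def loose_object(obj_type: str, data: bytes, algo: str = "sha256"):
    """Return (object_id, relative_path, compressed_bytes) for a loose object.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    # The prefix is "<type> <size>\0", with the size in ASCII decimal.
    prefix = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    raw = prefix + data
    # The object ID is the hash of the uncompressed prefix plus data.
    oid = hashlib.new(algo, raw).hexdigest()
    # The first two hex characters name the directory, the rest the file.
    path = f"objects/{oid[:2]}/{oid[2:]}"
    # The file on disk holds the zlib-compressed prefix plus data.
    return oid, path, zlib.compress(raw)


oid, path, compressed = loose_object("tree", b"")
print(oid)
print(path)
```

For the empty tree with SHA-256 this reproduces the object ID and path quoted in the new manual page. Passing `algo="sha1"` hashes the same bytes with SHA-1 instead.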

Documentation/gitformat-pack.adoc

Lines changed: 18 additions & 0 deletions
@@ -32,6 +32,10 @@ In a repository using the traditional SHA-1, pack checksums, index checksums,
 and object IDs (object names) mentioned below are all computed using SHA-1.
 Similarly, in SHA-256 repositories, these values are computed using SHA-256.

+CRC32 checksums are always computed over the entire packed object, including
+the header (n-byte type and length); the base object name or offset, if any;
+and the entire compressed object. The CRC32 algorithm used is that of zlib.
+
 == pack-*.pack files have the following format:

 - A header appears at the beginning and consists of the following:
@@ -80,6 +84,15 @@ Valid object types are:

 Type 5 is reserved for future expansion. Type 0 is invalid.

+=== Object encoding
+
+Unlike loose objects, packed objects do not have a prefix containing the type,
+size, and a NUL byte. These are not necessary because they can be determined by
+the n-byte type and length that prefixes the data and so they are omitted from
+the compressed and deltified data.
+
+The computation of the object ID still uses this prefix, however.
+
 === Size encoding

 This document uses the following "size encoding" of non-negative
@@ -92,6 +105,11 @@ values are more significant.
 This size encoding should not be confused with the "offset encoding",
 which is also used in this document.

+When encoding the size of an undeltified object in a pack, the size is that of
+the uncompressed raw object. For deltified objects, it is the size of the
+uncompressed delta. The base object name or offset is not included in the size
+computation.
+
 === Deltified representation

 Conceptually there are only four object types: commit, tree, tag and
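
The three clarifications added to gitformat-pack.adoc (CRC32 coverage, no loose-style prefix, size of the uncompressed object) can be illustrated with a minimal sketch for an undeltified entry. The helper names `entry_header` and `undeltified_entry` are assumptions made for this example. The header layout follows the n-byte type and length encoding described elsewhere in gitformat-pack: a 3-bit type and the low 4 size bits in the first byte, then 7 size bits per following byte.

```python
import zlib

OBJ_BLOB = 3  # pack object type number for blobs


def entry_header(obj_type: int, size: int) -> bytes:
    """Encode the n-byte type and length header of a packed object."""
    # First byte: 3-bit type in bits 4-6, low 4 bits of the size in bits 0-3.
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # MSB set: more size bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def undeltified_entry(obj_type: int, data: bytes):
    """Return (entry_bytes, crc32) for an undeltified packed object."""
    # The encoded size is that of the uncompressed raw object, and no
    # "<type> <size>\0" prefix is stored in the compressed data.
    entry = entry_header(obj_type, len(data)) + zlib.compress(data)
    # The CRC32 covers the header and the entire compressed object.
    # (A deltified entry would also include the base name or offset here.)
    return entry, zlib.crc32(entry)


entry, crc = undeltified_entry(OBJ_BLOB, b"abc")
print(entry.hex(), f"{crc:08x}")
```

A deltified entry would place the base object name or offset between the header and the compressed delta. Per the text above, that field is covered by the CRC32 but not counted in the encoded size.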

Documentation/meson.build

Lines changed: 1 addition & 0 deletions
@@ -173,6 +173,7 @@ manpages = {
   'gitformat-chunk.adoc' : 5,
   'gitformat-commit-graph.adoc' : 5,
   'gitformat-index.adoc' : 5,
+  'gitformat-loose.adoc' : 5,
   'gitformat-pack.adoc' : 5,
   'gitformat-signature.adoc' : 5,
   'githooks.adoc' : 5,

Documentation/technical/hash-function-transition.adoc

Lines changed: 25 additions & 21 deletions
@@ -227,9 +227,9 @@ network byte order):
 ** 4-byte length in bytes of shortened object names. This is the
 shortest possible length needed to make names in the shortened
 object name table unambiguous.
-** 4-byte integer, recording where tables relating to this format
+** 8-byte integer, recording where tables relating to this format
 are stored in this index file, as an offset from the beginning.
-* 4-byte offset to the trailer from the beginning of this file.
+* 8-byte offset to the trailer from the beginning of this file.
 * Zero or more additional key/value pairs (4-byte key, 4-byte
 value). Only one key is supported: 'PSRC'. See the "Loose objects
 and unreachable objects" section for supported values and how this
@@ -260,12 +260,10 @@ network byte order):
 compressed data to be copied directly from pack to pack during
 repacking without undetected data corruption.

-* A table of 4-byte offset values. For an object in the table of
-sorted shortened object names, the value at the corresponding
-index in this table indicates where that object can be found in
-the pack file. These are usually 31-bit pack file offsets, but
-large offsets are encoded as an index into the next table with the
-most significant bit set.
+* A table of 4-byte offset values. The index of this table in pack order
+indicates where that object can be found in the pack file. These are
+usually 31-bit pack file offsets, but large offsets are encoded as
+an index into the next table with the most significant bit set.

 * A table of 8-byte offset entries (empty for pack files less than
 2 GiB). Pack files are organized with heavily used objects toward
@@ -276,10 +274,14 @@ network byte order):
 up to and not including the table of CRC32 values.
 - Zero or more NUL bytes.
 - The trailer consists of the following:
-* A copy of the 20-byte SHA-256 checksum at the end of the
+* A copy of the full main hash checksum at the end of the
 corresponding packfile.

-* 20-byte SHA-256 checksum of all of the above.
+* Full main hash checksum of all of the above.
+
+The "full main hash" is a full-length hash of the main (not compatibility)
+algorithm in the repository. Thus, if the main algorithm is SHA-256, this is
+a 32-byte SHA-256 hash and for SHA-1, it's a 20-byte SHA-1 hash.

 Loose object index
 ~~~~~~~~~~~~~~~~~~
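
The offset lookup and trailer sizes described in the pack index v3 hunks above can be sketched as follows. This is only a reading of the documented layout, not Git's code; `pack_offset`, `trailer_length` and the in-memory table arguments are hypothetical names, and the tables are assumed to have already been sliced out of the index file.

```python
import struct

# Full-length hash sizes in bytes for the two supported main algorithms.
FULL_HASH_LEN = {"sha1": 20, "sha256": 32}


def pack_offset(small_table: bytes, large_table: bytes, pos: int) -> int:
    """Look up the pack file offset of the object at pack-order index `pos`."""
    # Entries in the first table are 4 bytes each, in network byte order.
    (value,) = struct.unpack_from(">I", small_table, 4 * pos)
    if value & 0x80000000:
        # MSB set: the low 31 bits index the table of 8-byte offsets.
        index = value & 0x7FFFFFFF
        (value,) = struct.unpack_from(">Q", large_table, 8 * index)
    return value


def trailer_length(main_algo: str) -> int:
    """The trailer holds two full main hash checksums."""
    return 2 * FULL_HASH_LEN[main_algo]


small = struct.pack(">II", 12, 0x80000000)
large = struct.pack(">Q", 2**32 + 5)
print(pack_offset(small, large, 0), pack_offset(small, large, 1))
```

The second entry shows the large-offset path: its most significant bit is set, so the remaining 31 bits select entry 0 of the 8-byte table.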
@@ -427,17 +429,19 @@ ordinary unsigned commit.

 Signed Tags
 ~~~~~~~~~~~
-We add a new field "gpgsig-sha256" to the tag object format to allow
-signing tags without relying on SHA-1. Its signed payload is the
-SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
-SIGNATURE-----" delimited in-body signature removed.
-
-This means tags can be signed
-
-1. using SHA-1 only, as in existing signed tag objects
-2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
-signature.
-3. using only SHA-256, by only using the gpgsig-sha256 field.
+We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to
+allow signing tags in both formats. The in-body signature is used for the
+signature in the current hash algorithm and the header is used for the
+signature in the other algorithm. Thus, a dual-signature tag will contain both
+an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an
+object or both an in-body signature and a gpgsig header for the SHA-256 format
+of and object.
+
+The signed payload of the tag is the content of the tag in the current
+algorithm with both its gpgsig and gpgsig-sha256 fields and
+"-----BEGIN PGP SIGNATURE-----" delimited in-body signature removed.
+
+This means tags can be signed using one or both algorithms.

 Mergetag embedding
 ~~~~~~~~~~~~~~~~~~
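
The signed payload rule in the "Signed Tags" hunk above can be sketched as a small filter over a raw tag object. The function name `signed_payload` is an assumption, and the sketch only recognizes the PGP marker quoted in the document. It also assumes that header continuation lines start with a single space, as the fsck.c change in this commit expects. It is an illustration of the described rule rather than Git's implementation.

```python
PGP_BEGIN = b"-----BEGIN PGP SIGNATURE-----"
SIG_HEADERS = (b"gpgsig ", b"gpgsig-sha256 ")


def signed_payload(tag: bytes) -> bytes:
    """Drop gpgsig/gpgsig-sha256 headers and the in-body PGP signature."""
    # Headers end at the first blank line; the message follows it.
    headers, sep, body = tag.partition(b"\n\n")
    kept = []
    skipping = False
    for line in headers.split(b"\n"):
        if line.startswith(SIG_HEADERS):
            skipping = True  # start of a signature header
        elif skipping and line.startswith(b" "):
            continue  # continuation line of that signature header
        else:
            skipping = False
            kept.append(line)
    # Remove the in-body signature, which runs from its marker to the end.
    if body.startswith(PGP_BEGIN):
        body = b""
    else:
        idx = body.find(b"\n" + PGP_BEGIN)
        if idx != -1:
            body = body[: idx + 1]  # keep the newline ending the message
    return b"\n".join(kept) + sep + body


# Hypothetical dual-signature tag in its SHA-1 form; all values are made up.
example = (
    b"object 1234\ntype commit\ntag v1\n"
    b"tagger A <a@example.com> 0 +0000\n"
    b"gpgsig-sha256 -----BEGIN PGP SIGNATURE-----\n \n abc\n"
    b" -----END PGP SIGNATURE-----\n"
    b"\nrelease\n"
    b"-----BEGIN PGP SIGNATURE-----\n\nxyz\n-----END PGP SIGNATURE-----\n"
)
print(signed_payload(example).decode())
```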

builtin/rev-parse.c

Lines changed: 10 additions & 1 deletion
@@ -1107,11 +1107,20 @@ int cmd_rev_parse(int argc,
 				const char *val = arg ? arg : "storage";

 				if (strcmp(val, "storage") &&
+				    strcmp(val, "compat") &&
 				    strcmp(val, "input") &&
 				    strcmp(val, "output"))
 					die(_("unknown mode for --show-object-format: %s"),
 					    arg);
-				puts(the_hash_algo->name);
+
+				if (!strcmp(val, "compat")) {
+					if (the_repository->compat_hash_algo)
+						puts(the_repository->compat_hash_algo->name);
+					else
+						putchar('\n');
+				} else {
+					puts(the_hash_algo->name);
+				}
 				continue;
 			}
 			if (!strcmp(arg, "--show-ref-format")) {

fsck.c

Lines changed: 18 additions & 0 deletions
@@ -1067,6 +1067,24 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
 	else
 		ret = fsck_ident(&buffer, oid, OBJ_TAG, options);

+	if (buffer < buffer_end && (skip_prefix(buffer, "gpgsig ", &buffer) || skip_prefix(buffer, "gpgsig-sha256 ", &buffer))) {
+		eol = memchr(buffer, '\n', buffer_end - buffer);
+		if (!eol) {
+			ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_GPGSIG, "invalid format - unexpected end after 'gpgsig' or 'gpgsig-sha256' line");
+			goto done;
+		}
+		buffer = eol + 1;
+
+		while (buffer < buffer_end && starts_with(buffer, " ")) {
+			eol = memchr(buffer, '\n', buffer_end - buffer);
+			if (!eol) {
+				ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_HEADER_CONTINUATION, "invalid format - unexpected end in 'gpgsig' or 'gpgsig-sha256' continuation line");
+				goto done;
+			}
+			buffer = eol + 1;
+		}
+	}
+
 	if (buffer < buffer_end && !starts_with(buffer, "\n")) {
 		/*
 		 * The verify_headers() check will allow
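
The control flow of the new tag check in fsck.c can be modelled outside of Git with a short sketch. It mirrors the hunk above line by line but is not the C code itself; `check_gpgsig` and its return convention (a message ID string or `None`, plus the new scan position) are assumptions made for illustration. The message IDs come from the fsck-msgids.adoc change in this commit.

```python
def check_gpgsig(buffer: bytes, pos: int = 0):
    """Scan an optional gpgsig header; return (message_id_or_None, new_pos)."""
    for prefix in (b"gpgsig ", b"gpgsig-sha256 "):
        if buffer.startswith(prefix, pos):
            pos += len(prefix)
            break
    else:
        return None, pos  # no signature header at this position
    # The header line itself must be newline-terminated.
    eol = buffer.find(b"\n", pos)
    if eol == -1:
        return "badGpgsig", pos
    pos = eol + 1
    # Continuation lines start with a space and must also be terminated.
    while buffer.startswith(b" ", pos):
        eol = buffer.find(b"\n", pos)
        if eol == -1:
            return "badHeaderContinuation", pos
        pos = eol + 1
    return None, pos


print(check_gpgsig(b"gpgsig sig\n more\n\nbody"))
print(check_gpgsig(b"gpgsig sig\n trunc"))
```

As in the C version, a buffer that ends right after the header or in the middle of a continuation line is reported, and anything that is not a signature header is left for the following checks.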

fsck.h

Lines changed: 2 additions & 0 deletions
@@ -25,9 +25,11 @@ enum fsck_msg_type {
 	FUNC(NUL_IN_HEADER, FATAL) \
 	FUNC(UNTERMINATED_HEADER, FATAL) \
 	/* errors */ \
+	FUNC(BAD_HEADER_CONTINUATION, ERROR) \
 	FUNC(BAD_DATE, ERROR) \
 	FUNC(BAD_DATE_OVERFLOW, ERROR) \
 	FUNC(BAD_EMAIL, ERROR) \
+	FUNC(BAD_GPGSIG, ERROR) \
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
