
Commit ee9681d

hanwen authored and gitster committed
reftable: define version 2 of the spec to accommodate SHA256
Version 2 appends a hash ID to the file header, making it slightly larger.

This commit also changes "SHA-1" into "object ID" in many places.

Signed-off-by: Han-Wen Nienhuys <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 10f007c commit ee9681d


1 file changed (+45, -37 lines)


Documentation/technical/reftable.txt

Lines changed: 45 additions & 37 deletions
@@ -30,7 +30,7 @@ Objectives
 
 * Near constant time lookup for any single reference, even when the
 repository is cold and not in process or kernel cache.
-* Near constant time verification if a SHA-1 is referred to by at least
+* Near constant time verification if an object name is referred to by at least
 one reference (for allow-tip-sha1-in-want).
 * Efficient enumeration of an entire namespace, such as `refs/tags/`.
 * Support atomic push with `O(size_of_update)` operations.
@@ -210,8 +210,8 @@ and non-aligned files.
 Very small files (e.g. a single ref block) may omit `padding` and the ref
 index to reduce total file size.
 
-Header
-^^^^^^
+Header (version 1)
+^^^^^^^^^^^^^^^^^^
 
 A 24-byte header appears at the beginning of the file:
 
@@ -232,6 +232,26 @@ used in a stack for link:#Update-transactions[transactions], these
 fields can order the files such that the prior file's
 `max_update_index + 1` is the next file's `min_update_index`.
 
+Header (version 2)
+^^^^^^^^^^^^^^^^^^
+
+A 28-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 2 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+uint32( hash_id )
+....
+
+The header is identical to `version_number=1`, with the 4-byte hash ID
+("sha1" for SHA-1 and "s256" for SHA-256) appended to the header.
+
+For maximum backward compatibility, it is recommended to use version 1 when
+writing SHA-1 reftables.
+
 First ref block
 ^^^^^^^^^^^^^^^
 
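The two header layouts differ only in the trailing `hash_id`. Below is a minimal C sketch of a header parser, assuming all integers are big-endian (network byte order), as the full reftable spec states, and that the version byte alone selects the 24- or 28-byte layout. The `struct reftable_header` type and the `parse_header()` and `header_size()` functions are hypothetical illustrations, not git's reftable library API. Read as a big-endian `uint32`, the ASCII bytes "s256" give `0x73323536` and "sha1" gives `0x73686131`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parsed header; not git's reftable API. */
struct reftable_header {
	uint8_t version;
	uint32_t block_size;       /* uint24 on disk */
	uint64_t min_update_index;
	uint64_t max_update_index;
	uint32_t hash_id;          /* version 2 only; 0 for version 1 */
};

/* Read an n-byte big-endian (network byte order) unsigned integer. */
static uint64_t get_be(const uint8_t *p, int n)
{
	uint64_t v = 0;
	for (int i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* 24 bytes for version 1, 28 for version 2; 0 for unknown versions. */
static size_t header_size(uint8_t version)
{
	return version == 1 ? 24 : version == 2 ? 28 : 0;
}

/* Returns 0 on success, -1 on a short buffer, bad magic or unknown version. */
static int parse_header(const uint8_t *buf, size_t len, struct reftable_header *h)
{
	assert(buf != NULL && h != NULL);
	if (len < 5 || memcmp(buf, "REFT", 4) != 0)
		return -1;
	h->version = buf[4];
	size_t need = header_size(h->version);
	if (need == 0 || len < need)
		return -1;
	h->block_size = (uint32_t)get_be(buf + 5, 3);
	h->min_update_index = get_be(buf + 8, 8);
	h->max_update_index = get_be(buf + 16, 8);
	/* "sha1" reads as 0x73686131, "s256" as 0x73323536 */
	h->hash_id = h->version == 2 ? (uint32_t)get_be(buf + 24, 4) : 0;
	return 0;
}
```

A version 1 buffer of 24 bytes parses with `hash_id` left at 0, while the same bytes marked as version 2 are rejected until the extra 4 bytes are present.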
@@ -319,8 +339,8 @@ The `value` follows. Its format is determined by `value_type`, one of
 the following:
 
 * `0x0`: deletion; no value data (see transactions, below)
-* `0x1`: one 20-byte object name; value of the ref
-* `0x2`: two 20-byte object names; value of the ref, peeled target
+* `0x1`: one object name; value of the ref
+* `0x2`: two object names; value of the ref, peeled target
 * `0x3`: symbolic reference: `varint( target_len ) target`
 
 Symbolic references use `0x3`, followed by the complete name of the
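With the fixed "20-byte" wording removed, the number of value bytes for `0x1` and `0x2` depends on the file's hash function. The sketch below assumes 20-byte names for SHA-1 and 32-byte names for SHA-256, with the hash known from the header (`hash_id` in version 2, SHA-1 for version 1 files). `value_size()` and the `HASH_SIZE_*` constants are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Raw object name sizes: 20 bytes for SHA-1, 32 bytes for SHA-256. */
enum { HASH_SIZE_SHA1 = 20, HASH_SIZE_SHA256 = 32 };

/*
 * Number of value bytes that follow for a fixed-size value_type, or -1
 * for 0x3 (its length comes from varint(target_len)) and unknown types.
 */
static int value_size(unsigned value_type, int hash_size)
{
	assert(hash_size == HASH_SIZE_SHA1 || hash_size == HASH_SIZE_SHA256);
	switch (value_type) {
	case 0x0: return 0;             /* deletion: no value data */
	case 0x1: return hash_size;     /* value of the ref */
	case 0x2: return 2 * hash_size; /* value, then peeled target */
	default:  return -1;            /* 0x3 symref or unknown */
	}
}
```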
@@ -421,9 +441,9 @@ Obj block format
 ^^^^^^^^^^^^^^^^
 
 Object blocks are optional. Writers may choose to omit object blocks,
-especially if readers will not use the SHA-1 to ref mapping.
+especially if readers will not use the object name to ref mapping.
 
-Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to
+Object blocks use unique, abbreviated 2-32 byte object name keys, mapping to
 ref blocks containing references pointing to that object directly, or as
 the peeled value of an annotated tag. Like ref blocks, object blocks use
 the file's standard block size. The abbreviation length is available in
@@ -432,7 +452,7 @@ the footer as `obj_id_len`.
 To save space in small files, object blocks may be omitted if the ref
 index is not present, as brute force search will only need to read a few
 ref blocks. When missing, readers should brute force a linear search of
-all references to lookup by SHA-1.
+all references to look up by object name.
 
 An object block is written as:
 
@@ -449,9 +469,9 @@ padding?
 Fields are identical to ref block. Binary search using the restart table
 works the same as in reference blocks.
 
-Because object names are abbreviated by writers to the shortest
-unique abbreviation within the reftable, obj key lengths are variable
-between 2 and 20 bytes. Readers must compare only for common prefix
+Because object names are abbreviated by writers to the shortest unique
+abbreviation within the reftable, obj keys have a variable length of at
+least 2 bytes. Readers must compare only for common prefix
 match within an obj block or obj index.
 
 obj record
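Because a single `obj_id_len` applies to the whole file, a writer needs the shortest prefix length that still separates every object name it indexes. The C sketch below computes it from a sorted, de-duplicated list: in sorted order a name shares its longest prefix with one of its neighbors, so checking adjacent pairs suffices. The 2-byte floor comes from the text above; `unique_abbrev_len()` and `obj_key_matches()` are hypothetical helpers, and this is not necessarily how git's writer derives the value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the common prefix of two object names of hash_size bytes. */
static size_t common_prefix(const unsigned char *a, const unsigned char *b,
			    size_t hash_size)
{
	size_t n = 0;
	while (n < hash_size && a[n] == b[n])
		n++;
	return n;
}

/*
 * Shortest abbreviation length keeping every name unique, given names
 * sorted ascending; never below 2 bytes and never above hash_size.
 */
static size_t unique_abbrev_len(const unsigned char *const *names,
				size_t count, size_t hash_size)
{
	assert(hash_size >= 2);
	size_t len = 2;
	for (size_t i = 1; i < count; i++) {
		/* one byte past the shared prefix distinguishes the pair */
		size_t need = common_prefix(names[i - 1], names[i], hash_size) + 1;
		if (need > len)
			len = need;
	}
	return len < hash_size ? len : hash_size;
}

/* Readers compare only the first obj_id_len bytes of a key. */
static int obj_key_matches(const unsigned char *key, size_t obj_id_len,
			   const unsigned char *oid)
{
	return memcmp(key, oid, obj_id_len) == 0;
}
```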
@@ -504,9 +524,9 @@ for (j = 1; j < position_count; j++) {
 ....
 
 With a position in hand, a reader must linearly scan the ref block,
-starting from the first `ref_record`, testing each reference's SHA-1s
+starting from the first `ref_record`, testing each reference's object names
 (for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
-SHA-1 within a single ref block is not supported by the reftable format.
+object name within a single ref block is not supported by the reftable format.
 Smaller block sizes reduce the number of candidates this step must
 consider.
 
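The linear scan can be sketched as follows, assuming the records have already been decoded from the prefix-compressed block into a hypothetical `struct ref_rec`. Only `value_type` `0x1` and `0x2` carry object names, and both the ref value and the peeled target are compared over the full hash length. `ref_points_at()` and `scan_block()` are illustrative names, not part of any real library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HASH_SIZE 32

/* Hypothetical, already-decoded ref record. */
struct ref_rec {
	const char *refname;
	unsigned value_type;                 /* 0x0..0x3 */
	unsigned char value[MAX_HASH_SIZE];  /* valid for 0x1 and 0x2 */
	unsigned char peeled[MAX_HASH_SIZE]; /* valid for 0x2 only */
};

/* Does this record point at oid, directly or as a peeled tag target? */
static int ref_points_at(const struct ref_rec *r, const unsigned char *oid,
			 size_t hash_size)
{
	assert(hash_size <= MAX_HASH_SIZE);
	if (r->value_type != 0x1 && r->value_type != 0x2)
		return 0; /* deletions and symrefs carry no object name */
	if (memcmp(r->value, oid, hash_size) == 0)
		return 1;
	return r->value_type == 0x2 && memcmp(r->peeled, oid, hash_size) == 0;
}

/* Scan every record in one block; store matching names in out. */
static size_t scan_block(const struct ref_rec *recs, size_t n,
			 const unsigned char *oid, size_t hash_size,
			 const char **out)
{
	size_t found = 0;
	for (size_t i = 0; i < n; i++)
		if (ref_points_at(&recs[i], oid, hash_size))
			out[found++] = recs[i].refname;
	return found;
}
```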
@@ -621,7 +641,7 @@ reflogs must treat this as a deletion.
 For `log_type = 0x1`, the `log_data` section follows
 linkgit:git-update-ref[1] logging and includes:
 
-* two 20-byte SHA-1s (old id, new id)
+* two object names (old id, new id)
 * varint string of committer's name
 * varint string of committer's email
 * varint time in seconds since epoch (Jan 1, 1970)
@@ -689,14 +709,8 @@ Footer
 After the last block of the file, a file footer is written. It begins
 like the file header, but is extended with additional data.
 
-A 68-byte footer appears at the end:
-
 ....
-'REFT'
-uint8( version_number = 1 )
-uint24( block_size )
-uint64( min_update_index )
-uint64( max_update_index )
+HEADER
 
 uint64( ref_index_position )
 uint64( (obj_position << 5) | obj_id_len )
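The second footer field packs two values into one `uint64`: `obj_position` in the upper bits and `obj_id_len` in the low 5 bits. A small C sketch of packing and unpacking it follows; `pack_obj_field()` and `unpack_obj_field()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Split uint64( (obj_position << 5) | obj_id_len ) into its parts. */
static void unpack_obj_field(uint64_t v, uint64_t *obj_position,
			     unsigned *obj_id_len)
{
	*obj_position = v >> 5;             /* upper 59 bits */
	*obj_id_len = (unsigned)(v & 0x1f); /* low 5 bits */
}

static uint64_t pack_obj_field(uint64_t obj_position, unsigned obj_id_len)
{
	assert(obj_id_len < 32);                    /* must fit in 5 bits */
	assert(obj_position < (UINT64_C(1) << 59)); /* must fit in 59 bits */
	return (obj_position << 5) | obj_id_len;
}
```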
@@ -719,12 +733,16 @@ obj blocks.
 * `obj_index_position`: byte position for the start of the obj index.
 * `log_index_position`: byte position for the start of the log index.
 
+The size of the footer is 68 bytes for version 1, and 72 bytes for
+version 2.
+
 Reading the footer
 ++++++++++++++++++
 
-Readers must seek to `file_length - 68` to access the footer. A trusted
-external source (such as `stat(2)`) is necessary to obtain
-`file_length`. When reading the footer, readers must verify:
+Readers must first read the file start to determine the version
+number. Then they seek to `file_length - FOOTER_LENGTH` to access the
+footer. A trusted external source (such as `stat(2)`) is necessary to
+obtain `file_length`. When reading the footer, readers must verify:
 
 * 4-byte magic is correct
 * 1-byte version number is recognized
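The revised reading procedure can be sketched in C over an in-memory file image. The version byte at offset 4 of the header selects a 68- or 72-byte footer, and `file_len` stands in for the trusted `stat(2)` length. `find_footer()` is a hypothetical helper that checks only the two items listed above; requiring the footer's version to equal the header's is an assumption of this sketch, not a rule quoted from the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Footer sizes from the spec: 68 bytes for version 1, 72 for version 2. */
static size_t footer_size(uint8_t version)
{
	return version == 1 ? 68 : version == 2 ? 72 : 0;
}

/*
 * Locate the footer: read the version from the file start, then look at
 * file_len - FOOTER_LENGTH. Returns the footer offset, or -1 if malformed.
 */
static long find_footer(const uint8_t *file, size_t file_len)
{
	assert(file != NULL);
	if (file_len < 5 || memcmp(file, "REFT", 4) != 0)
		return -1;
	size_t fsize = footer_size(file[4]);
	if (fsize == 0 || file_len < fsize)
		return -1; /* unknown version or truncated file */
	const uint8_t *footer = file + file_len - fsize;
	/* magic must be correct; version must match the header (assumed) */
	if (memcmp(footer, "REFT", 4) != 0 || footer[4] != file[4])
		return -1;
	return (long)(file_len - fsize);
}
```

For a version 2 file consisting of a 28-byte header and a 72-byte footer, the footer is found at offset 28.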
@@ -780,11 +798,11 @@ Lightweight refs dominate
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The reftable format assumes the vast majority of references are single
-SHA-1 valued with common prefixes, such as Gerrit Code Review's
+object name valued with common prefixes, such as Gerrit Code Review's
 `refs/changes/` namespace, GitHub's `refs/pulls/` namespace, or many
 lightweight tags in the `refs/tags/` namespace.
 
-Annotated tags storing the peeled object cost an additional 20 bytes per
+Annotated tags storing the peeled object cost an additional object name per
 reference.
 
 Low overhead
@@ -817,7 +835,7 @@ Scans and lookups dominate
 
 Scanning all references and lookup by name (or namespace such as
 `refs/heads/`) are the most common activities performed on repositories.
-SHA-1s are stored directly with references to optimize this use case.
+Object names are stored directly with references to optimize this use case.
 
 Logs are infrequently read
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1063,13 +1081,3 @@ impossible.
 
 A common format that can be supported by all major Git implementations
 (git-core, JGit, libgit2) is strongly preferred.
-
-
-Future
-~~~~~~
-
-Longer hashes
-^^^^^^^^^^^^^
-
-Version will bump (e.g. 2) to indicate `value` uses a different object
-name length other than 20. The length could be stored in an expanded file
-header, or hardcoded as part of the version.
