@@ -30,7 +30,7 @@ Objectives

* Near constant time lookup for any single reference, even when the
repository is cold and not in process or kernel cache.
- * Near constant time verification if a SHA-1 is referred to by at least
+ * Near constant time verification if an object name is referred to by at least
one reference (for allow-tip-sha1-in-want).
* Efficient enumeration of an entire namespace, such as `refs/tags/`.
* Support atomic push with `O(size_of_update)` operations.
@@ -210,8 +210,8 @@ and non-aligned files.
Very small files (e.g. a single ref block) may omit `padding` and the ref
index to reduce total file size.

- Header
- ^^^^^^
+ Header (version 1)
+ ^^^^^^^^^^^^^^^^^^

A 24-byte header appears at the beginning of the file:

@@ -232,6 +232,26 @@ used in a stack for link:#Update-transactions[transactions], these
fields can order the files such that the prior file's
`max_update_index + 1` is the next file's `min_update_index`.

+ Header (version 2)
+ ^^^^^^^^^^^^^^^^^^
+
+ A 28-byte header appears at the beginning of the file:
+
+ ....
+ 'REFT'
+ uint8( version_number = 2 )
+ uint24( block_size )
+ uint64( min_update_index )
+ uint64( max_update_index )
+ uint32( hash_id )
+ ....
+
+ The header is identical to `version_number=1`, with the 4-byte hash ID
+ ("sha1" for SHA-1 and "s256" for SHA-256) appended to the header.
+
+ For maximum backward compatibility, it is recommended to use version 1 when
+ writing SHA-1 reftables.
+
First ref block
^^^^^^^^^^^^^^^

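The version 1 and version 2 layouts above differ only in the trailing `hash_id`. A minimal C sketch of a reader that accepts either is shown below. It assumes big-endian (network byte order) integers and that `hash_id` holds the four ASCII bytes "sha1" or "s256" in order; `struct reft_header`, `parse_reft_header()` and `hash_size_for_id()` are hypothetical names for illustration, not git's reftable API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decoded header fields (hypothetical helper struct for this sketch). */
struct reft_header {
	uint8_t version;           /* 1 or 2 */
	uint32_t block_size;       /* stored as a 24-bit value */
	uint64_t min_update_index;
	uint64_t max_update_index;
	uint32_t hash_id;          /* from the file for v2; "sha1" implied for v1 */
	size_t header_len;         /* 24 for version 1, 28 for version 2 */
};

/* Read an n-byte big-endian unsigned integer (assumed encoding). */
static uint64_t get_be(const uint8_t *p, int n)
{
	uint64_t v = 0;
	assert(n <= 8);
	for (int i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Returns 0 on success, -1 on short input, bad magic or unknown version. */
int parse_reft_header(const uint8_t *buf, size_t len, struct reft_header *h)
{
	if (len < 24 || memcmp(buf, "REFT", 4) != 0)
		return -1;
	h->version = buf[4];
	if (h->version == 1) {
		h->header_len = 24;
		/* Version 1 files carry no hash ID; they are SHA-1 files. */
		h->hash_id = (uint32_t)get_be((const uint8_t *)"sha1", 4);
	} else if (h->version == 2) {
		if (len < 28)
			return -1;
		h->header_len = 28;
		h->hash_id = (uint32_t)get_be(buf + 24, 4);
	} else {
		return -1;
	}
	h->block_size = (uint32_t)get_be(buf + 5, 3);
	h->min_update_index = get_be(buf + 8, 8);
	h->max_update_index = get_be(buf + 16, 8);
	return 0;
}

/* Object name length implied by a hash ID, or 0 if it is unknown. */
int hash_size_for_id(uint32_t hash_id)
{
	if (hash_id == (uint32_t)get_be((const uint8_t *)"sha1", 4))
		return 20;
	if (hash_id == (uint32_t)get_be((const uint8_t *)"s256", 4))
		return 32;
	return 0;
}
```

For a version 1 file the sketch substitutes "sha1", which is consistent with keeping SHA-1 reftables on version 1.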
@@ -319,8 +339,8 @@ The `value` follows. Its format is determined by `value_type`, one of
the following:

* `0x0`: deletion; no value data (see transactions, below)
- * `0x1`: one 20-byte object name; value of the ref
- * `0x2`: two 20-byte object names; value of the ref, peeled target
+ * `0x1`: one object name; value of the ref
+ * `0x2`: two object names; value of the ref, peeled target
* `0x3`: symbolic reference: `varint( target_len ) target`

Symbolic references use `0x3`, followed by the complete name of the
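With `0x1` and `0x2` no longer tied to 20 bytes, the amount of fixed-size value data depends on the object name length selected by the file's hash ID. A hypothetical helper, `ref_value_len()`, could compute it as follows, assuming 20-byte SHA-1 and 32-byte SHA-256 object names.

```c
#include <assert.h>

/* Fixed-size value bytes for a ref record, given its value_type and the
 * object name length (hash_size). Returns -1 for 0x3, whose length is
 * carried in a varint prefix, and for unknown types. */
int ref_value_len(int value_type, int hash_size)
{
	assert(hash_size == 20 || hash_size == 32);
	switch (value_type) {
	case 0x0:
		return 0;             /* deletion: no value data */
	case 0x1:
		return hash_size;     /* value of the ref */
	case 0x2:
		return 2 * hash_size; /* value of the ref, then peeled target */
	default:
		return -1;            /* symbolic reference or unknown type */
	}
}
```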
@@ -421,9 +441,9 @@ Obj block format
^^^^^^^^^^^^^^^^

Object blocks are optional. Writers may choose to omit object blocks,
- especially if readers will not use the SHA-1 to ref mapping.
+ especially if readers will not use the object name to ref mapping.

- Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to
+ Object blocks use unique, abbreviated 2-32 byte object name keys, mapping to
ref blocks containing references pointing to that object directly, or as
the peeled value of an annotated tag. Like ref blocks, object blocks use
the file's standard block size. The abbrevation length is available in
@@ -432,7 +452,7 @@ the footer as `obj_id_len`.
To save space in small files, object blocks may be omitted if the ref
index is not present, as brute force search will only need to read a few
ref blocks. When missing, readers should brute force a linear search of
- all references to lookup by SHA-1 .
+ all references to look up by object name.

An object block is written as:

@@ -449,9 +469,9 @@ padding?
Fields are identical to ref block. Binary search using the restart table
works the same as in reference blocks.

- Because object names are abbreviated by writers to the shortest
- unique abbreviation within the reftable, obj key lengths are variable
- between 2 and 20 bytes. Readers must compare only for common prefix
+ Because object names are abbreviated by writers to the shortest unique
+ abbreviation within the reftable, obj keys have a variable length, which
+ must be at least 2 bytes. Readers must compare only for common prefix
match within an obj block or obj index.

obj record
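One way a writer could choose the abbreviation length is sketched below; it illustrates the rule above and is not necessarily how git's writer computes it. It assumes the object names are sorted and distinct, in which case the longest prefix shared by any two names is shared by some adjacent pair, so a single pass over neighbours suffices. `obj_id_abbrev_len()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of leading bytes two object names have in common. */
static size_t common_prefix(const uint8_t *a, const uint8_t *b, size_t n)
{
	size_t i = 0;
	while (i < n && a[i] == b[i])
		i++;
	return i;
}

/* ids: count object names of hash_size bytes each, stored back to back,
 * sorted ascending with no duplicates. Returns the shortest prefix
 * length, never below 2, that keeps every name unique. */
size_t obj_id_abbrev_len(const uint8_t *ids, size_t count, size_t hash_size)
{
	size_t max_common = 0;
	for (size_t i = 1; i < count; i++) {
		const uint8_t *prev = ids + (i - 1) * hash_size;
		const uint8_t *cur = ids + i * hash_size;
		size_t c = common_prefix(prev, cur, hash_size);
		assert(c < hash_size); /* duplicates are not allowed */
		if (c > max_common)
			max_common = c;
	}
	/* One byte past the longest shared prefix separates every pair. */
	size_t len = max_common + 1;
	if (len < 2)
		len = 2;
	return len;
}
```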
@@ -504,9 +524,9 @@ for (j = 1; j < position_count; j++) {
....

With a position in hand, a reader must linearly scan the ref block,
- starting from the first `ref_record`, testing each reference's SHA-1s
+ starting from the first `ref_record`, testing each reference's object names
(for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
- SHA-1 within a single ref block is not supported by the reftable format.
+ object name within a single ref block is not supported by the reftable format.
Smaller block sizes reduce the number of candidates this step must
consider.

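The linear scan described above can be sketched in C over already-decoded records. `struct ref_rec` and `scan_refs_for_oid()` are hypothetical; block decoding and prefix compression are omitted, and the sketch stops at the first match, whereas a real reader would typically collect every reference in the block that points at the object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_HASH 32

/* Hypothetical, already-decoded ref record. */
struct ref_rec {
	const char *name;
	int value_type;           /* 0x0 .. 0x3 */
	uint8_t value[MAX_HASH];  /* valid for 0x1 and 0x2 */
	uint8_t peeled[MAX_HASH]; /* valid for 0x2 only */
};

/* Test each record's object names for full equality with oid.
 * Returns the index of the first matching record, or -1. */
long scan_refs_for_oid(const struct ref_rec *recs, size_t n,
		       const uint8_t *oid, size_t hash_size)
{
	assert(hash_size <= MAX_HASH);
	for (size_t i = 0; i < n; i++) {
		const struct ref_rec *r = &recs[i];
		/* Deletions and symbolic refs carry no object name. */
		if (r->value_type != 0x1 && r->value_type != 0x2)
			continue;
		if (memcmp(r->value, oid, hash_size) == 0)
			return (long)i;
		if (r->value_type == 0x2 &&
		    memcmp(r->peeled, oid, hash_size) == 0)
			return (long)i;
	}
	return -1;
}
```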
@@ -621,7 +641,7 @@ reflogs must treat this as a deletion.
For `log_type = 0x1`, the `log_data` section follows
linkgit:git-update-ref[1] logging and includes:

- * two 20-byte SHA-1s (old id, new id)
+ * two object names (old id, new id)
* varint string of committer's name
* varint string of committer's email
* varint time in seconds since epoch (Jan 1, 1970)
@@ -689,14 +709,8 @@ Footer
After the last block of the file, a file footer is written. It begins
like the file header, but is extended with additional data.

- A 68-byte footer appears at the end:
-
....
- 'REFT'
- uint8( version_number = 1 )
- uint24( block_size )
- uint64( min_update_index )
- uint64( max_update_index )
+ HEADER

uint64( ref_index_position )
uint64( (obj_position << 5) | obj_id_len )
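The second footer field packs two values into one `uint64`: `obj_position` in the high bits and `obj_id_len` in the low 5 bits. A minimal sketch of packing and unpacking it follows; `pack_obj_field()` and `unpack_obj_field()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Combine obj_position and obj_id_len into one footer field. */
uint64_t pack_obj_field(uint64_t obj_position, unsigned obj_id_len)
{
	assert(obj_id_len < 32);                     /* must fit in 5 bits */
	assert(obj_position < (UINT64_C(1) << 59));  /* must survive << 5 */
	return (obj_position << 5) | obj_id_len;
}

/* Split the footer field back into its two parts. */
void unpack_obj_field(uint64_t field, uint64_t *obj_position,
		      unsigned *obj_id_len)
{
	*obj_position = field >> 5;
	*obj_id_len = (unsigned)(field & 0x1f);
}
```

Since the low 5 bits can represent at most 31, this encoding limits `obj_id_len` to 31 bytes.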
@@ -719,12 +733,16 @@ obj blocks.
* `obj_index_position`: byte position for the start of the obj index.
* `log_index_position`: byte position for the start of the log index.

+ The size of the footer is 68 bytes for version 1 and 72 bytes for
+ version 2.
+
Reading the footer
++++++++++++++++++

- Readers must seek to `file_length - 68` to access the footer. A trusted
- external source (such as `stat(2)`) is necessary to obtain
- `file_length`. When reading the footer, readers must verify:
+ Readers must first read the file header to determine the version number,
+ then seek to `file_length - FOOTER_LENGTH`, where `FOOTER_LENGTH` is the
+ footer size for that version. A trusted external source (such as `stat(2)`)
+ is necessary to obtain `file_length`. When reading the footer, readers must verify:

* 4-byte magic is correct
* 1-byte version number is recognized
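The version-dependent footer lookup can be sketched with standard C I/O as follows. `reft_footer_len()` and `read_reft_footer()` are hypothetical helpers; the caller is assumed to supply `file_length` from a trusted source such as `stat(2)`, and only the magic and version checks are shown, not the remaining verification steps.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Footer sizes given above: 68 bytes (version 1), 72 bytes (version 2). */
long reft_footer_len(int version)
{
	switch (version) {
	case 1:
		return 68;
	case 2:
		return 72;
	default:
		return -1;
	}
}

/* Reads the footer into out (at least 72 bytes). Returns the footer
 * length, or -1 on I/O error, bad magic, unknown version, or a footer
 * whose version disagrees with the header. */
long read_reft_footer(FILE *f, long file_length, uint8_t *out)
{
	uint8_t head[5];

	/* Step 1: the file start tells us the version, hence the footer size. */
	if (fseek(f, 0, SEEK_SET) != 0 || fread(head, 1, 5, f) != 5)
		return -1;
	if (memcmp(head, "REFT", 4) != 0)
		return -1;
	long footer_len = reft_footer_len(head[4]);
	if (footer_len < 0 || file_length < footer_len)
		return -1;

	/* Step 2: seek to file_length - FOOTER_LENGTH and read the footer. */
	if (fseek(f, file_length - footer_len, SEEK_SET) != 0 ||
	    fread(out, 1, (size_t)footer_len, f) != (size_t)footer_len)
		return -1;

	/* The footer begins like the header: same magic, same version. */
	if (memcmp(out, "REFT", 4) != 0 || out[4] != head[4])
		return -1;
	return footer_len;
}
```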
@@ -780,11 +798,11 @@ Lightweight refs dominate
^^^^^^^^^^^^^^^^^^^^^^^^^

The reftable format assumes the vast majority of references are single
- SHA-1 valued with common prefixes, such as Gerrit Code Review's
+ object-name valued with common prefixes, such as Gerrit Code Review's
`refs/changes/` namespace, GitHub's `refs/pulls/` namespace, or many
lightweight tags in the `refs/tags/` namespace.

- Annotated tags storing the peeled object cost an additional 20 bytes per
+ Annotated tags storing the peeled object cost an additional object name per
reference.

Low overhead
@@ -817,7 +835,7 @@ Scans and lookups dominate

Scanning all references and lookup by name (or namespace such as
`refs/heads/`) are the most common activities performed on repositories.
- SHA-1s are stored directly with references to optimize this use case.
+ Object names are stored directly with references to optimize this use case.

Logs are infrequently read
^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1063,13 +1081,3 @@ impossible.

A common format that can be supported by all major Git implementations
(git-core, JGit, libgit2) is strongly preferred.
-
- Future
- ~~~~~~
-
- Longer hashes
- ^^^^^^^^^^^^^
-
- Version will bump (e.g. 2) to indicate `value` uses a different object
- name length other than 20. The length could be stored in an expanded file
- heaer, or hardcoded as part of the version.