Commit 22ad860

derrickstolee authored and gitster committed
index-format: update preamble to cache tree extension

I had difficulty in my efforts to learn about the cache tree extension
based on the documentation and code because I had an incorrect
assumption about how it behaved. This might be due to some ambiguity in
the documentation, so this change modifies the beginning of the cache
tree format by expanding the description of the feature.

My hope is that this documentation clarifies a few things:

1. There is an in-memory recursive tree structure that is constructed
   from the extension data. This structure has a few differences, such
   as where the name is stored.

2. What does it mean for an entry to be invalid?

3. When exactly are "new" trees created?

Helped-by: Junio C Hamano <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>

1 parent 845d15d commit 22ad860

1 file changed, +27 -6 lines changed

Documentation/technical/index-format.txt

Lines changed: 27 additions & 6 deletions
@@ -138,12 +138,33 @@ Git index format
 
 === Cache tree
 
-Cache tree extension contains pre-computed hashes for trees that can
-be derived from the index. It helps speed up tree object generation
-from index for a new commit.
-
-When a path is updated in index, the path must be invalidated and
-removed from tree cache.
+Since the index does not record entries for directories, the cache
+entries cannot describe tree objects that already exist in the object
+database for regions of the index that are unchanged from an existing
+commit. The cache tree extension stores a recursive tree structure that
+describes the trees that already exist and completely match sections of
+the cache entries. This speeds up tree object generation from the index
+for a new commit by only computing the trees that are "new" to that
+commit. It also assists when comparing the index to another tree, such
+as `HEAD^{tree}`, since sections of the index can be skipped when a tree
+comparison demonstrates equality.
+
+The recursive tree structure uses nodes that store a number of cache
+entries, a list of subnodes, and an object ID (OID). The OID references
+the existing tree for that node, if it is known to exist. The subnodes
+correspond to subdirectories that themselves have cache tree nodes. The
+number of cache entries corresponds to the number of cache entries in
+the index that describe paths within that tree's directory.
+
+The extension tracks the full directory structure in the cache tree
+extension, but this is generally smaller than the full cache entry list.
+
+When a path is updated in index, Git invalidates all nodes of the
+recursive cache tree corresponding to the parent directories of that
+path. We store these tree nodes as being "invalid" by using "-1" as the
+number of cache entries. Invalid nodes still store a span of index
+entries, allowing Git to focus its efforts when reconstructing a full
+cache tree.
 
 The signature for this extension is { 'T', 'R', 'E', 'E' }.
 
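
The node structure and the invalidation rule described in the new text can be illustrated with a small sketch. This is not Git's C implementation: the names `CacheTreeNode`, `invalidate_path`, and `trees_to_rebuild` are hypothetical, the OIDs are placeholder strings, and clearing the OID on invalidation is a simplification chosen for this example. It only shows how marking every parent directory with an entry count of -1 leaves untouched subtrees reusable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CacheTreeNode:
    """Illustrative node; field names are assumptions, not Git's struct."""
    entry_count: int = -1          # -1 marks the node as invalid
    oid: Optional[str] = None      # tree OID, meaningful only when valid
    subnodes: Dict[str, "CacheTreeNode"] = field(default_factory=dict)

    @property
    def valid(self) -> bool:
        return self.entry_count >= 0


def invalidate_path(root: CacheTreeNode, path: str) -> None:
    """Invalidate the node of every parent directory of `path`."""
    node = root
    node.entry_count, node.oid = -1, None
    for name in path.split("/")[:-1]:      # directory components only
        child = node.subnodes.get(name)
        if child is None:                  # no node recorded for this directory
            return
        child.entry_count, child.oid = -1, None
        node = child


def trees_to_rebuild(node: CacheTreeNode, prefix: str = "") -> List[str]:
    """List directories whose tree objects would need to be recomputed."""
    if node.valid:
        # A valid node covers its whole subtree, so the section is skipped.
        return []
    dirty = [prefix or "."]
    for name, child in sorted(node.subnodes.items()):
        dirty += trees_to_rebuild(child, f"{prefix}{name}/")
    return dirty


# Hypothetical index: README, docs/guide.txt, src/main.c, src/lib/util.c
root = CacheTreeNode(4, "oid-root", {
    "docs": CacheTreeNode(1, "oid-docs"),
    "src": CacheTreeNode(2, "oid-src", {"lib": CacheTreeNode(1, "oid-lib")}),
})

invalidate_path(root, "src/lib/util.c")
print(trees_to_rebuild(root))   # ['.', 'src/', 'src/lib/']
```

In this sketch, `docs/` keeps its entry count and OID after the update, so only the root, `src/`, and `src/lib/` trees would count as "new" for the next commit.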