Commit 094574b

Merge branch 'nd/index-doc' into maint
* nd/index-doc:
  doc: technical details about the index file format
  doc: technical details about the index file format
2 parents 2aa5b6b + 23fcc98 commit 094574b

1 file changed: 185 additions, 0 deletions

GIT index format
================

= The git index file has the following format

  All binary numbers are in network byte order. Version 2 is described
  here unless stated otherwise.

   - A 12-byte header consisting of

     4-byte signature:
       The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")

     4-byte version number:
       The currently supported versions are 2 and 3.

     32-bit number of index entries.

   - A number of sorted index entries (see below).

   - Extensions

     Extensions are identified by signature. Optional extensions can
     be ignored if GIT does not understand them.

     GIT currently supports cached tree and resolve undo extensions.

     4-byte extension signature. If the first byte is 'A'..'Z', the
     extension is optional and can be ignored.

     32-bit size of the extension

     Extension data

   - 160-bit SHA-1 over the content of the index file before this
     checksum.
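
The overall file layout can be illustrated with a minimal Python sketch that uses the standard `struct` and `hashlib` modules. The helper names (`parse_header`, `is_optional_extension`, `iter_extensions`, `verify_checksum`) are hypothetical and are not part of Git. The sketch assumes the whole file is already in memory and that the caller knows the offset just past the last index entry.

```python
from __future__ import annotations

import hashlib
import struct

# Hypothetical helpers sketching the file-level layout; not Git's own API.
HEADER = struct.Struct(">4sII")     # signature, version, entry count (big-endian)
EXT_HEADER = struct.Struct(">4sI")  # extension signature, 32-bit payload size
SHA1_LEN = 20                       # trailing 160-bit checksum


def parse_header(data: bytes) -> tuple[int, int]:
    """Return (version, entry_count) from the 12-byte header."""
    signature, version, count = HEADER.unpack_from(data, 0)
    if signature != b"DIRC":
        raise ValueError("bad signature: not an index file")
    if version not in (2, 3):
        raise ValueError(f"unsupported index version {version}")
    return version, count


def is_optional_extension(signature: bytes) -> bool:
    """An extension whose first byte is 'A'..'Z' may be skipped if unknown."""
    return ord("A") <= signature[0] <= ord("Z")


def iter_extensions(data: bytes, offset: int):
    """Yield (signature, payload) pairs from `offset` (just past the last
    entry) up to the trailing checksum."""
    end = len(data) - SHA1_LEN
    while offset < end:
        signature, size = EXT_HEADER.unpack_from(data, offset)
        offset += EXT_HEADER.size
        if offset + size > end:
            raise ValueError(f"extension {signature!r} overruns the checksum")
        yield signature, data[offset:offset + size]
        offset += size


def verify_checksum(data: bytes) -> bool:
    """The last 20 bytes must be the SHA-1 of everything before them."""
    return hashlib.sha1(data[:-SHA1_LEN]).digest() == data[-SHA1_LEN:]


if __name__ == "__main__":
    # Synthetic index: no entries, one extension with a dummy payload.
    body = HEADER.pack(b"DIRC", 2, 0) + EXT_HEADER.pack(b"TREE", 3) + b"abc"
    index = body + hashlib.sha1(body).digest()
    print(parse_header(index), verify_checksum(index))
    for sig, payload in iter_extensions(index, HEADER.size):
        print(sig, is_optional_extension(sig), payload)
```

Skipping an unknown extension only needs the 8-byte signature/size prefix. This is why a reader can ignore optional extensions without understanding their payload.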

== Index entry

  Index entries are sorted in ascending order on the name field,
  interpreted as a string of unsigned bytes (i.e. memcmp() order, no
  localization, no special casing of directory separator '/'). Entries
  with the same name are sorted by their stage field.
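
Python compares `bytes` objects element by element as unsigned values, and a shorter prefix sorts first. The ordering rule above can therefore be expressed as a plain sort key. This is an illustrative sketch: the `(name, stage)` pair and the `sort_entries` name are assumptions, not Git's data structures.

```python
def sort_entries(entries):
    """Sort (name, stage) pairs the way the index orders them: by name as
    unsigned bytes (memcmp order, '/' not special), then by stage."""
    return sorted(entries, key=lambda entry: (entry[0], entry[1]))


if __name__ == "__main__":
    entries = [
        (b"a0", 0),                     # '0' is 0x30
        (b"a/b", 0),                    # '/' is 0x2f: no special treatment
        (b"a-b", 0),                    # '-' is 0x2d
        ("\u00e9".encode("utf-8"), 0),  # 0xc3 0xa9: sorts after all ASCII
        (b"conflict", 3),
        (b"conflict", 1),
        (b"conflict", 2),
    ]
    for name, stage in sort_entries(entries):
        print(name, stage)
```

The non-ASCII name illustrates "no localization": it is ordered purely by its byte values, not by any locale's collation.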

  32-bit ctime seconds, the last time a file's metadata changed
    this is stat(2) data

  32-bit ctime nanosecond fractions
    this is stat(2) data

  32-bit mtime seconds, the last time a file's data changed
    this is stat(2) data

  32-bit mtime nanosecond fractions
    this is stat(2) data

  32-bit dev
    this is stat(2) data

  32-bit ino
    this is stat(2) data

  32-bit mode, split into (high to low bits)

    16-bit unused, must be zero

    4-bit object type
      valid values in binary are 1000 (regular file), 1010 (symbolic link)
      and 1110 (gitlink)

    3-bit unused

    9-bit unix permission. Only 0755 and 0644 are valid for regular files.
    Symbolic links and gitlinks have value 0 in this field.

  32-bit uid
    this is stat(2) data

  32-bit gid
    this is stat(2) data

  32-bit file size
    This is the on-disk size from stat(2), truncated to 32 bits.

  160-bit SHA-1 for the represented object

  A 16-bit 'flags' field split into (high to low bits)

    1-bit assume-valid flag

    1-bit extended flag (must be zero in version 2)

    2-bit stage (during merge)

    12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
    is stored in this field.

  (Version 3) A 16-bit field, only applicable if the "extended flag"
  above is 1, split into (high to low bits)

    1-bit reserved for future use

    1-bit skip-worktree flag (used by sparse checkout)

    1-bit intent-to-add flag (used by "git add -N")

    13-bit unused, must be zero

  Entry path name (variable length) relative to the top-level directory
    (without leading slash). '/' is used as path separator. The special
    path components ".", ".." and ".git" (without quotes) are disallowed.
    A trailing slash is also disallowed.

    The exact encoding is undefined, but the '.' and '/' characters
    are encoded in 7-bit ASCII and the encoding cannot contain a NUL
    byte (in other words, this is a UNIX pathname).

  1-8 NUL bytes as necessary to pad the entry to a multiple of eight bytes
  while keeping the name NUL-terminated.
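
The entry layout can be checked with a small Python decoder. The sketch assumes the whole index is already in memory. `IndexEntry`, `parse_entry` and the test helper `pack_entry` are hypothetical names. The decoder reads the version 3 extended flags when the extended bit is set. When the 12-bit name length is saturated at 0xFFF, it finds the end of the name by scanning for the terminating NUL. It computes the padded entry size as (fixed part + name length + 8) rounded down to a multiple of 8, which always leaves 1 to 8 NUL bytes after the name.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass

# Fixed part of an entry: ten 32-bit fields (ctime s/ns, mtime s/ns, dev, ino,
# mode, uid, gid, size), the 20-byte SHA-1 and 16-bit flags: 62 bytes.
ENTRY_FIXED = struct.Struct(">10I20sH")
EXTENDED_FLAGS = struct.Struct(">H")   # version 3 only, when the extended flag is set


@dataclass
class IndexEntry:                      # hypothetical container, not Git's struct
    ctime: tuple[int, int]             # (seconds, nanosecond fraction)
    mtime: tuple[int, int]
    dev: int
    ino: int
    object_type: int                   # 0b1000 file, 0b1010 symlink, 0b1110 gitlink
    permissions: int                   # 0o755 or 0o644 for files, 0 otherwise
    uid: int
    gid: int
    size: int
    sha1: bytes
    assume_valid: bool
    stage: int
    skip_worktree: bool
    intent_to_add: bool
    name: bytes


def parse_entry(data: bytes, offset: int, version: int) -> tuple[IndexEntry, int]:
    """Decode the entry at `offset`; return it and the offset of the next one."""
    (ctime_s, ctime_ns, mtime_s, mtime_ns, dev, ino, mode,
     uid, gid, size, sha1, flags) = ENTRY_FIXED.unpack_from(data, offset)
    pos = offset + ENTRY_FIXED.size

    flags2 = 0
    if flags & 0x4000:                 # extended flag
        if version < 3:
            raise ValueError("extended flag set in a version 2 index")
        (flags2,) = EXTENDED_FLAGS.unpack_from(data, pos)
        pos += EXTENDED_FLAGS.size

    name_len = flags & 0x0FFF
    if name_len == 0x0FFF:             # length saturated: find the NUL instead
        name_len = data.index(b"\0", pos) - pos
    name = data[pos:pos + name_len]

    # Fixed part + name, padded with 1-8 NUL bytes to a multiple of 8.
    entry_len = (pos - offset + name_len + 8) & ~7

    entry = IndexEntry(
        ctime=(ctime_s, ctime_ns), mtime=(mtime_s, mtime_ns), dev=dev, ino=ino,
        object_type=(mode >> 12) & 0xF, permissions=mode & 0o777,
        uid=uid, gid=gid, size=size, sha1=sha1,
        assume_valid=bool(flags & 0x8000), stage=(flags >> 12) & 0x3,
        skip_worktree=bool(flags2 & 0x4000), intent_to_add=bool(flags2 & 0x2000),
        name=name,
    )
    return entry, offset + entry_len


def pack_entry(name: bytes, sha1: bytes, mode: int, size: int, stage: int = 0) -> bytes:
    """Encode a version-2 entry; stat fields other than mode/size are zeroed."""
    flags = (stage << 12) | min(len(name), 0x0FFF)
    fixed = ENTRY_FIXED.pack(0, 0, 0, 0, 0, 0, mode, 0, 0,
                             size & 0xFFFFFFFF,   # truncated to 32 bits
                             sha1, flags)
    entry_len = (len(fixed) + len(name) + 8) & ~7
    return fixed + name + b"\0" * (entry_len - len(fixed) - len(name))
```

For example, `pack_entry(b"dir/file.txt", b"\x11" * 20, 0o100644, 1234)` produces an 80-byte entry: 62 fixed bytes, a 12-byte name and 6 NUL bytes of padding.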

== Extensions

=== Cached tree

  The cached tree extension contains precomputed hashes for trees that
  can be derived from the index. It speeds up tree object generation
  from the index for a new commit.

  When a path is updated in the index, the path must be invalidated and
  removed from the tree cache.
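
The following Python sketch shows one possible reading of the invalidation rule. Every cached tree on the way to the updated path is marked invalid (entry_count -1). If the path itself names a cached subtree, that subtree is dropped. The `CacheTree` class and `invalidate_path` function are hypothetical and model only the in-memory tree, not the on-disk encoding.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CacheTree:                       # hypothetical in-memory node
    entry_count: int = -1              # -1 means "invalidated"
    subtrees: dict[bytes, CacheTree] = field(default_factory=dict)


def invalidate_path(tree: CacheTree, path: bytes) -> None:
    """Invalidate `tree` and every cached subtree leading to `path`."""
    tree.entry_count = -1
    head, sep, rest = path.partition(b"/")
    if not sep:
        # Last component: if it names a cached subtree, forget it entirely.
        tree.subtrees.pop(head, None)
        return
    child = tree.subtrees.get(head)
    if child is not None:
        invalidate_path(child, rest)


if __name__ == "__main__":
    root = CacheTree(3, {b"a": CacheTree(2, {b"b": CacheTree(1)}), b"z": CacheTree(1)})
    invalidate_path(root, b"a/b/c")
    print(root.entry_count, root.subtrees[b"a"].entry_count, root.subtrees[b"z"].entry_count)
```

Trees that do not contain the updated path (`z` above) keep their cached hash. Those trees can be reused as-is the next time a tree object is written from the index.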

  The signature for this extension is { 'T', 'R', 'E', 'E' }.

  A series of entries fills the entire extension; each entry consists
  of:

  - NUL-terminated path component (relative to its parent directory);

  - ASCII decimal number of entries in the index that are covered by the
    tree this entry represents (entry_count);

  - A space (ASCII 32);

  - ASCII decimal number that represents the number of subtrees this
    tree has;

  - A newline (ASCII 10); and

  - 160-bit object name for the object that would result from writing
    this span of index as a tree.

  An entry can be in an invalidated state and is represented by having
  -1 in the entry_count field. In this case there is no object name, and
  the next entry starts immediately after the newline.

  The entries are written out in top-down, depth-first order. The
  first entry represents the root level of the repository, followed by
  the first subtree of the root level (call it A, with its name relative
  to the root level), followed by the first subtree of A (with its name
  relative to A), and so on.
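
The cached tree payload can be decoded recursively, because each entry states how many subtrees follow it in depth-first order. This Python sketch uses the hypothetical names `TreeEntry` and `parse_tree_extension`. It assumes 20-byte SHA-1 object names. It also follows Git's cache-tree writer in expecting no object name after an invalidated (entry_count -1) entry.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TreeEntry:                       # hypothetical decoded node
    name: bytes                        # path component; b"" for the root
    entry_count: int                   # -1 when invalidated
    sha1: bytes | None                 # None for invalidated entries
    subtrees: list[TreeEntry] = field(default_factory=list)


def parse_tree_extension(payload: bytes) -> TreeEntry:
    """Decode a TREE payload (top-down, depth-first) into nested TreeEntry."""
    def parse_one(pos: int) -> tuple[TreeEntry, int]:
        nul = payload.index(b"\0", pos)
        name = payload[pos:nul]
        newline = payload.index(b"\n", nul + 1)
        count_text, subtree_text = payload[nul + 1:newline].split(b" ")
        entry_count, subtree_count = int(count_text), int(subtree_text)
        pos = newline + 1
        sha1 = None
        if entry_count >= 0:           # invalidated entries carry no object name
            sha1 = payload[pos:pos + 20]
            pos += 20
        node = TreeEntry(name, entry_count, sha1)
        for _ in range(subtree_count):  # children follow their parent directly
            child, pos = parse_one(pos)
            node.subtrees.append(child)
        return node, pos

    root, end = parse_one(0)
    if end != len(payload):
        raise ValueError("trailing bytes after the cached tree")
    return root


if __name__ == "__main__":
    # Root covers 2 entries and has 1 subtree; "src" is invalidated.
    payload = b"\0" + b"2 1\n" + b"\xaa" * 20 + b"src\0" + b"-1 0\n"
    root = parse_tree_extension(payload)
    print(root.entry_count, [(t.name, t.entry_count, t.sha1) for t in root.subtrees])
```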

=== Resolve undo

  A conflict is represented in the index as a set of higher-stage entries.
  When a conflict is resolved (e.g. with "git add path"), these
  higher-stage entries are removed and a stage-0 entry with the proper
  resolution is added.

  When these higher-stage entries are removed, they are saved in the
  resolve undo extension, so that conflicts can be recreated (e.g. with
  "git checkout -m"), in case users want to redo a conflict resolution
  from scratch.

  The signature for this extension is { 'R', 'E', 'U', 'C' }.

  A series of entries fills the entire extension; each entry consists
  of:

  - NUL-terminated pathname the entry describes (relative to the root of
    the repository, i.e. full pathname);

  - Three NUL-terminated ASCII octal numbers, giving the modes of the
    entries in stages 1 to 3 (a missing stage is represented by "0" in
    this field); and

  - At most three 160-bit object names of the entries in stages 1 to 3
    (nothing is written for a missing stage).
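
A REUC payload can be walked sequentially. Each entry is a path, then three octal modes, then one 20-byte object name for each non-zero mode. In this Python sketch, `parse_resolve_undo` is a hypothetical name. The result maps each path to a three-element list, with `None` marking a missing stage.

```python
from __future__ import annotations


def parse_resolve_undo(payload: bytes) -> dict[bytes, list[tuple[int, bytes] | None]]:
    """Map each path to its stage 1..3 (mode, object name) pairs."""
    result = {}
    pos = 0
    while pos < len(payload):
        nul = payload.index(b"\0", pos)
        path = payload[pos:nul]            # full pathname from the repository root
        pos = nul + 1
        modes = []
        for _ in range(3):                 # three NUL-terminated ASCII octal modes
            nul = payload.index(b"\0", pos)
            modes.append(int(payload[pos:nul], 8))
            pos = nul + 1
        stages = []
        for mode in modes:
            if mode == 0:                  # missing stage: no object name written
                stages.append(None)
            else:
                stages.append((mode, payload[pos:pos + 20]))
                pos += 20
        result[path] = stages
    return result


if __name__ == "__main__":
    # Stages 1 and 2 present, stage 3 missing.
    payload = (b"conflict.txt\0" + b"100644\0" + b"100644\0" + b"0\0"
               + b"\x01" * 20 + b"\x02" * 20)
    print(parse_resolve_undo(payload))
```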