Commit 56f8416

Merge branch 'je/doc-data-model' into seen

Add a new manual that describes the data model.

Comments?

* je/doc-data-model:
  doc: add a explanation of Git's data model
2 parents 49285c6 + 1368c08 commit 56f8416

2 files changed (+227, -0 lines)

Documentation/Makefile

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ MAN7_TXT += gitcli.adoc
 MAN7_TXT += gitcore-tutorial.adoc
 MAN7_TXT += gitcredentials.adoc
 MAN7_TXT += gitcvs-migration.adoc
+MAN7_TXT += gitdatamodel.adoc
 MAN7_TXT += gitdiffcore.adoc
 MAN7_TXT += giteveryday.adoc
 MAN7_TXT += gitfaq.adoc

Documentation/gitdatamodel.adoc

Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
gitdatamodel(7)
===============

NAME
----
gitdatamodel - Git's core data model

DESCRIPTION
-----------

It's not necessary to understand Git's data model to use Git, but it's
very helpful when reading Git's documentation so that you know what it
means when the documentation says "object", "reference", or "index".

Git's core operations use 4 kinds of data:

1. <<objects,Objects>>: commits, trees, blobs, and tag objects
2. <<references,References>>: branches, tags,
remote-tracking branches, etc.
3. <<index,The index>>, also known as the staging area
4. <<reflogs,Reflogs>>

[[objects]]
OBJECTS
-------

Commits, trees, blobs, and tag objects are all stored in Git's object database.
Every object has:

1. an *ID*, which is a hash (SHA-1 by default) of its type, size, and contents.
It's fast to look up a Git object using its ID.
The ID is usually represented in hexadecimal, like
`1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`.
2. a *type*. There are 4 types of objects:
<<commit,commits>>, <<tree,trees>>, <<blob,blobs>>,
and <<tag-object,tag objects>>.
3. *contents*. The structure of the contents depends on the type.
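
To make the ID concrete, here's a minimal Python sketch, assuming a repository that
uses the default SHA-1 object format. Git doesn't hash the contents alone: it hashes
a short header containing the object's type and size in bytes, then the contents.
`object_id` is a hypothetical helper written for this example, not part of Git.

```python
import hashlib


def object_id(obj_type: str, contents: bytes) -> str:
    """Hypothetical helper: compute a SHA-1 object ID like `git hash-object`."""
    # The hashed data is a header "<type> <size in bytes>" plus a NUL byte,
    # followed by the raw contents.
    header = f"{obj_type} {len(contents)}\0".encode("ascii")
    return hashlib.sha1(header + contents).hexdigest()


print(object_id("blob", b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(object_id("blob", b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

In a SHA-1 repository, `echo 'hello world' | git hash-object --stdin` should print
the same ID as the second call.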

Once an object is created, it can never be changed.
Here are the 4 types of objects:

[[commit]]
commits::
A commit contains:
+
1. Its *parent commit ID(s)*. A root commit, such as the first commit in a
repository, has 0 parents; a regular commit has 1 parent; and a merge commit
has 2 or more parents.
2. A *commit message*
3. All the *files* in the commit, stored as a *<<tree,tree>>*
4. An *author* and the time the commit was authored
5. A *committer* and the time the commit was committed
+
Here's how an example commit is stored:
+
----
tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647
author Maya <maya@example.com> 1759173425 -0400
committer Maya <maya@example.com> 1759173425 -0400

Add README
----
+
Like all other objects, commits can never be changed after they're created.
For example, "amending" a commit with `git commit --amend` creates a new commit
and leaves the old one as it was. Once nothing refers to the old commit any
more (including the reflog), `git gc` will eventually delete it.
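+
Because a commit object is plain text, its structure is easy to pick apart. Here's
a minimal Python sketch, assuming the simple layout shown above: one header per
line, a blank line, then the message. `parse_commit` is a hypothetical helper, not
part of Git; to see the raw text of a real commit, you can run
`git cat-file commit <commit-id>`.
+
```python
def parse_commit(raw: str) -> dict:
    """Hypothetical helper: split a commit object's text into headers and message.

    Assumes one header per line; real commits may also contain multi-line
    headers (such as a `gpgsig` signature) that this sketch does not handle.
    """
    headers, _, message = raw.partition("\n\n")
    commit = {"parents": [], "message": message}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # 0, 1, or several parent lines
        else:
            commit[key] = value
    return commit


example = """\
tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647
author Maya <maya@example.com> 1759173425 -0400
committer Maya <maya@example.com> 1759173425 -0400

Add README
"""
commit = parse_commit(example)
print(commit["parents"])  # ['4ccb6d7b8869a86aae2e84c56523f8705b50c647']
print(commit["message"])  # Add README
```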

[[tree]]
trees::
A tree is how Git represents a directory. It lists, for each item in
the tree:
+
1. The *permissions*, for example `100644`
2. The *type*: either <<blob,`blob`>> (a file), `tree` (a directory),
or <<commit,`commit`>> (a Git submodule)
3. The *object ID*
4. The *filename*
+
For example, this is a tree containing one directory (`src`) and one file
(`README.md`), in the human-readable form printed by `git cat-file -p`
(the stored form is binary):
+
----
100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md
040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src
----
+
*NOTE:* The permissions are in the same format as UNIX permissions, but
the only allowed permissions for regular files (blobs) are `100644` and
`100755` (executable). Symbolic links are also stored as blobs, with mode
`120000`.
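+
The listing above is a human-readable view. Here's a minimal Python sketch of the
binary form that Git hashes, assuming a SHA-1 repository: each entry is the mode and
name, a NUL byte, then the raw 20-byte ID, with entries sorted by name (directories
sort as if their name ended in `/`). `tree_object` and `tree_id` are hypothetical
helpers written for this example.
+
```python
import hashlib


def tree_object(entries: list) -> bytes:
    """Hypothetical helper: serialize (mode, name, hex_id) entries as a tree.

    Each entry is stored as "<mode> <name>", a NUL byte, then the 20 raw bytes
    of the SHA-1 ID. Directories use the mode "40000" here, without the leading
    zero that `git cat-file -p` displays.
    """
    def sort_key(entry):
        mode, name, _ = entry
        # Git sorts a directory as if its name ended with "/".
        return (name + "/" if mode == "40000" else name).encode()

    data = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        data += f"{mode} {name}\0".encode() + bytes.fromhex(hex_id)
    return data


def tree_id(entries: list) -> str:
    """Hash the serialized tree with its "tree <size>" header, like any object."""
    data = tree_object(entries)
    return hashlib.sha1(b"tree %d\0" % len(data) + data).hexdigest()


print(tree_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
```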

[[blob]]
blobs::
A blob is how Git represents a file. A blob object contains only the
file's contents; the filename and permissions are stored in the
<<tree,tree>> that refers to the blob.
+
Storing a new blob for every version of a file can take a lot of space,
so `git gc` periodically packs objects into packfiles in `.git/objects/pack`,
where similar objects are stored compactly as compressed deltas.
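+
Before `git gc` packs them, objects are usually stored "loose", as one compressed
file per object under `.git/objects`. Here's a minimal Python sketch of where such a
file would go and what it would contain, assuming a SHA-1 repository. `loose_object`
is a hypothetical helper; it writes nothing to disk and doesn't attempt the delta
compression used in packfiles.
+
```python
import hashlib
import zlib


def loose_object(obj_type: str, contents: bytes) -> tuple:
    """Hypothetical helper: return (path, file bytes) for a loose SHA-1 object.

    Nothing is written to disk. The file would hold the zlib-compressed header
    and contents, stored under a directory named after the ID's first 2 digits.
    """
    data = f"{obj_type} {len(contents)}\0".encode() + contents
    oid = hashlib.sha1(data).hexdigest()
    return f".git/objects/{oid[:2]}/{oid[2:]}", zlib.compress(data)


path, compressed = loose_object("blob", b"hello world\n")
print(path)                         # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
print(zlib.decompress(compressed))  # b'blob 12\x00hello world\n'
```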

[[tag-object]]
tag objects::
Tag objects (also known as "annotated tags") contain:
+
1. The *tagger* and tag date
2. A *tag message*, similar to a commit message
3. The *ID* of the object (often a commit) that they reference
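
A tag object can point at any object, including another tag object. When Git needs
the object behind a tag, it follows ("peels") the chain of tags until it reaches an
object that isn't a tag. Here's a minimal Python sketch over a hypothetical
in-memory object table, where placeholder IDs like `c1` stand in for real object IDs.

```python
# Hypothetical object database: ID -> (type, ID of the referenced object).
# Placeholder IDs like "c1" stand in for real hexadecimal IDs.
OBJECTS = {
    "t2": ("tag", "t1"),     # a tag object that points at another tag object
    "t1": ("tag", "c1"),     # a tag object that points at a commit
    "c1": ("commit", None),
}


def peel(oid: str, objects: dict) -> str:
    """Follow tag objects until reaching an object that isn't a tag."""
    # No cycle check is needed: a tag's ID depends on the ID it points to,
    # so a chain of tags can never loop back on itself.
    while objects[oid][0] == "tag":
        oid = objects[oid][1]
    return oid


print(peel("t2", OBJECTS))  # c1
```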

[[references]]
REFERENCES
----------

References are a way to give a name to a commit.
It's easier to remember "the changes I'm working on are on the `turtle`
branch" than "the changes are in commit bb69721404348e".
Git often uses "ref" as shorthand for "reference".

References that you create are stored in the `.git/refs` directory,
and Git has a few special internal references like `HEAD` that are stored
in the base `.git` directory.

A reference can point to either:

1. An object ID, usually a <<commit,commit>> ID
2. Another reference. This is called a "symbolic reference".
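
Here's a minimal Python sketch of how a symbolic reference is resolved, assuming a
hypothetical dictionary that maps each ref name to the text stored in its file (a
symbolic reference is stored as `ref: <target>`). The depth limit is an arbitrary
guard against cycles in this sketch. In a real repository, `git rev-parse HEAD` does
this lookup for you, and `git symbolic-ref HEAD` shows where `HEAD` points.

```python
# Hypothetical snapshot of ref files: ref name -> the text stored in it.
REFS = {
    "HEAD": "ref: refs/heads/main",
    "refs/heads/main": "4ccb6d7b8869a86aae2e84c56523f8705b50c647",
}


def resolve(name: str, refs: dict, max_depth: int = 5) -> str:
    """Follow symbolic references until reaching an object ID."""
    for _ in range(max_depth):  # arbitrary limit, guards against ref cycles
        value = refs[name]
        if not value.startswith("ref: "):
            return value  # a direct reference: the value is an object ID
        name = value[len("ref: "):]  # a symbolic reference: follow it
    raise ValueError("too many levels of symbolic references")


print(resolve("HEAD", REFS))  # 4ccb6d7b8869a86aae2e84c56523f8705b50c647
```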

Git handles references differently based on which subdirectory of
`.git/refs` they're stored in.
Here are the main types:

[[branch]]
branches: `.git/refs/heads/<name>`::
A branch is a name for a commit ID.
That commit is the latest commit on the branch.
Branches are stored in the `.git/refs/heads/` directory.
+
To get the history of commits on a branch, Git starts at the commit
the branch references, then looks at that commit's parent(s), their
parents, and so on.
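+
Here's a minimal Python sketch of that walk over a hypothetical commit graph, where
placeholder IDs like `c1` stand in for real commit IDs. It visits each reachable
commit once, breadth-first; `git log` orders its output differently (by default,
newest first), so this only illustrates which commits belong to the history.
+
```python
from collections import deque

# Hypothetical commit graph: commit ID -> list of parent IDs.
PARENTS = {
    "c4": ["c2", "c3"],  # a merge commit
    "c3": ["c1"],
    "c2": ["c1"],
    "c1": [],            # the root commit
}


def history(tip: str, parents: dict) -> list:
    """Return every commit reachable from `tip`, visiting each one once."""
    seen = {tip}
    order = []
    queue = deque([tip])
    while queue:
        commit = queue.popleft()
        order.append(commit)
        for parent in parents[commit]:
            if parent not in seen:  # merges can reach a commit by two paths
                seen.add(parent)
                queue.append(parent)
    return order


print(history("c4", PARENTS))  # ['c4', 'c2', 'c3', 'c1']
```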

[[tag]]
tags: `.git/refs/tags/<name>`::
A tag is a name for a commit ID, tag object ID, or other object ID.
Tags are stored in the `.git/refs/tags/` directory.
+
Even though branches and tags are both "a name for a commit ID", Git
treats them very differently.
Branches are expected to be regularly updated as you work on the branch,
but it's expected that a tag will never change after you create it.

[[HEAD]]
HEAD: `.git/HEAD`::
`HEAD` is where Git stores your current <<branch,branch>>.
`HEAD` is normally a symbolic reference to your current branch, for
example `ref: refs/heads/main` if your current branch is `main`.
`HEAD` can also be a direct reference to a commit ID;
this is called "detached HEAD state".
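+
Here's a minimal Python sketch of how the contents of `.git/HEAD` map to the current
branch. It takes the file's text as input so that it doesn't need a real repository;
`current_branch` is a hypothetical helper. From the command line,
`git branch --show-current` prints the current branch name, or nothing in detached
HEAD state.
+
```python
def current_branch(head_contents: str):
    """Return the branch name stored in .git/HEAD, or None if HEAD is detached."""
    value = head_contents.strip()
    prefix = "ref: refs/heads/"
    if value.startswith(prefix):
        return value[len(prefix):]  # symbolic reference to a branch
    # Anything else, typically a commit ID stored directly (detached HEAD).
    return None


print(current_branch("ref: refs/heads/main\n"))  # main
print(current_branch("4ccb6d7b8869a86aae2e84c56523f8705b50c647\n"))  # None
```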

[[remote-tracking-branch]]
remote-tracking branches: `.git/refs/remotes/<remote>/<branch>`::
A remote-tracking branch is a name for a commit ID.
It's how Git stores the last-known state of a branch in a remote
repository. `git fetch` updates remote-tracking branches. When
`git status` says "Your branch is up to date with 'origin/main'", it is
comparing your branch with the remote-tracking branch `origin/main`, not
with the remote repository itself.
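+
`git status` works this out from local data only, which is why its answer can be
stale until the next `git fetch`. Conceptually, it counts the commits reachable from
one side but not the other. Here's a minimal Python sketch over a hypothetical
commit graph; the two counts correspond to `git rev-list --count origin/main..main`
(ahead) and `git rev-list --count main..origin/main` (behind).
+
```python
# Hypothetical commit graph: commit ID -> list of parent IDs.
PARENTS = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],  # only on the local branch
    "c4": ["c3"],  # only on the local branch
    "r1": ["c2"],  # only on the remote-tracking branch
}


def reachable(tip: str, parents: dict) -> set:
    """All commits reachable from `tip` by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def ahead_behind(local: str, remote: str, parents: dict) -> tuple:
    """Count commits only the local side has, and commits only the remote side has."""
    mine, theirs = reachable(local, parents), reachable(remote, parents)
    return len(mine - theirs), len(theirs - mine)


print(ahead_behind("c4", "r1", PARENTS))  # (2, 1)
```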

[[other-refs]]
Other references::
Git tools may create references in any subdirectory of `.git/refs`.
For example, linkgit:git-stash[1], linkgit:git-bisect[1],
and linkgit:git-notes[1] all create their own references
in `.git/refs/stash`, `.git/refs/bisect/`, and `.git/refs/notes/`.
Third-party Git tools may also create their own references.
+
Git may also create references in the base `.git` directory
other than `HEAD`, like `ORIG_HEAD`.

*NOTE:* As an optimization, references may be stored together in the
`.git/packed-refs` file instead of as separate files in `.git/refs`.
See linkgit:git-pack-refs[1].
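
Here's a minimal Python sketch of reading packed refs, assuming the text format of
`.git/packed-refs`: one `<object-id> <refname>` pair per line. `parse_packed_refs` and
`lookup` are hypothetical helpers, and the `aaaa...` tag object ID in the example is a
placeholder. When a ref exists both as a loose file in `.git/refs` and in
`packed-refs`, the loose file wins.

```python
def parse_packed_refs(text: str) -> dict:
    """Hypothetical helper: map ref names to object IDs from .git/packed-refs.

    Lines starting with "#" are comments (the first one lists file traits), and
    a line starting with "^" records the object that the annotated tag on the
    previous line ultimately points to; this sketch skips both kinds.
    """
    refs = {}
    for line in text.splitlines():
        if not line or line.startswith(("#", "^")):
            continue
        oid, name = line.split(" ", 1)
        refs[name] = oid
    return refs


def lookup(name: str, loose: dict, packed: dict):
    """A loose ref file, when present, takes precedence over a packed entry."""
    return loose.get(name, packed.get(name))


# The "aaaa..." ID is a placeholder for an annotated tag object.
PACKED = """\
# pack-refs with: peeled fully-peeled sorted
4ccb6d7b8869a86aae2e84c56523f8705b50c647 refs/heads/main
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa refs/tags/v1.0
^4ccb6d7b8869a86aae2e84c56523f8705b50c647
"""
packed = parse_packed_refs(PACKED)
print(sorted(packed))  # ['refs/heads/main', 'refs/tags/v1.0']
```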

[[index]]
THE INDEX
---------

The index, also known as the "staging area", contains the current staged
version of every file in your Git repository. When you commit, the files
in the index are used as the files in the next commit.

Unlike a tree, the index is a flat list of files: a file in a subdirectory
is listed under its full path, such as `src/hello.py`.
Conceptually, each index entry has 4 fields:

1. The *permissions*
2. The *<<blob,blob>> ID* of the file
3. The *filename*
4. The *stage number*. This is normally 0, but during a merge conflict
the same filename can appear up to 3 times: stage 1 for the version in the
common ancestor, stage 2 for "ours", and stage 3 for "theirs".

It's extremely uncommon to look at the index directly: normally you'd
run `git status` to see a list of changes between the index and <<HEAD,HEAD>>.
But you can use `git ls-files --stage` to see the index.
Here's the output of `git ls-files --stage` in a repository with 2 files:

----
100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0 README.md
100644 665c637a360874ce43bf74018768a96d2d4d219a 0 src/hello.py
----
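
In the real output of `git ls-files --stage`, a tab (not a space) separates the stage
number from the path. Here's a minimal Python sketch that parses that output and
finds conflicted paths, assuming ordinary filenames (Git quotes unusual paths unless
you pass `-z`); `parse_ls_files_stage` and `conflicted_paths` are hypothetical helpers.

```python
def parse_ls_files_stage(output: str) -> list:
    """Parse `git ls-files --stage` output into (mode, blob_id, stage, path) tuples.

    Each line is "<mode> <blob id> <stage>", a tab, then the path.
    """
    entries = []
    for line in output.splitlines():
        info, path = line.split("\t", 1)
        mode, blob_id, stage = info.split(" ")
        entries.append((mode, blob_id, int(stage), path))
    return entries


def conflicted_paths(entries: list) -> set:
    """Any entry with a nonzero stage belongs to an unresolved merge conflict."""
    return {path for _, _, stage, path in entries if stage != 0}


sample = (
    "100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0\tREADME.md\n"
    "100644 665c637a360874ce43bf74018768a96d2d4d219a 0\tsrc/hello.py\n"
)
print(conflicted_paths(parse_ls_files_stage(sample)))  # set()
```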

[[reflogs]]
REFLOGS
-------

Git stores the history of changes to a ref, such as a branch or `HEAD`,
in a reflog (you should read "reflog" as "ref log"). Not every ref is
logged by default (tags, for example, are not), but any ref can be logged.

Each reflog entry has:

1. *Before/after commit IDs*
2. *User* who made the change, for example `Maya <maya@example.com>`
3. *Timestamp*
4. *Log message*, for example `pull: Fast-forward`
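
Reflogs are stored as text files under `.git/logs/`, for example `.git/logs/HEAD`,
with one entry per line. Here's a minimal Python sketch that parses one entry,
assuming the layout described in the comment below; the regular expression and
`parse_reflog_line` are hypothetical, and the IDs in the example are shortened
placeholders. `git reflog` shows the same information in a friendlier form.

```python
import re

# Layout of one line: "<old id> <new id> <name> <<email>> <unix time> <timezone>",
# then a tab, then the log message.
REFLOG_LINE = re.compile(
    r"(?P<old>[0-9a-f]+) (?P<new>[0-9a-f]+) "
    r"(?P<user>.*?) (?P<time>\d+) (?P<tz>[+-]\d{4})\t(?P<message>.*)"
)


def parse_reflog_line(line: str) -> dict:
    """Hypothetical helper: split one reflog line into its fields."""
    match = REFLOG_LINE.fullmatch(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a reflog entry: {line!r}")
    return match.groupdict()


# The IDs are shortened placeholders; real entries contain full object IDs.
line = "1111111 2222222 Maya <maya@example.com> 1759173425 -0400\tpull: Fast-forward"
entry = parse_reflog_line(line)
print(entry["user"], entry["message"])  # Maya <maya@example.com> pull: Fast-forward
```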

Reflogs only log changes made in your local repository.
They are not shared with remotes.

GIT
---
Part of the linkgit:git[1] suite
