
Commit 808515f

Merge branch 'je/doc-data-model' into seen
Add a new manual that describes the data model.
Comments?

* je/doc-data-model:
  fixup! doc: add a explanation of Git's data model
  doc: add a explanation of Git's data model
2 parents 650b281 + 9687be6 commit 808515f

3 files changed

+232
-0
lines changed


Documentation/Makefile

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ MAN7_TXT += gitcli.adoc
5353
MAN7_TXT += gitcore-tutorial.adoc
5454
MAN7_TXT += gitcredentials.adoc
5555
MAN7_TXT += gitcvs-migration.adoc
56+
MAN7_TXT += gitdatamodel.adoc
5657
MAN7_TXT += gitdiffcore.adoc
5758
MAN7_TXT += giteveryday.adoc
5859
MAN7_TXT += gitfaq.adoc

Documentation/gitdatamodel.adoc

Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
1+
gitdatamodel(7)
2+
===============
3+
4+
NAME
5+
----
6+
gitdatamodel - Git's core data model
7+
8+
SYNOPSIS
9+
--------
10+
Git data model
11+
12+
DESCRIPTION
13+
-----------
14+
15+
It's not necessary to understand Git's data model to use Git, but it's
16+
very helpful when reading Git's documentation so that you know what it
17+
means when the documentation says "object", "reference", or "index".
18+
19+
Git's core operations use 4 kinds of data:
20+
21+
1. <<objects,Objects>>: commits, trees, blobs, and tag objects
22+
2. <<references,References>>: branches, tags,
23+
remote-tracking branches, etc
24+
3. <<index,The index>>, also known as the staging area
25+
4. <<reflogs,Reflogs>>
26+
27+
[[objects]]
28+
OBJECTS
29+
-------
30+
31+
Commits, trees, blobs, and tag objects are all stored in Git's object database.
32+
Every object has:
33+
34+
1. an *ID*, which is the SHA-1 hash of its type, size, and contents.
35+
It's fast to look up a Git object using its ID.
36+
The ID is usually represented in hexadecimal, like
37+
`1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`.
38+
2. a *type*. There are 4 types of objects:
39+
<<commit,commits>>, <<tree,trees>>, <<blob,blobs>>,
40+
and <<tag-object,tag objects>>.
41+
3. *contents*. The structure of the contents depends on the type.
42+
43+
Once an object is created, it can never be changed.
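
To make the link between an object's ID and its contents concrete, here is a minimal Python sketch of how an ID can be computed. It assumes a repository using SHA-1 (repositories created with SHA-256 use a different hash), and the function name `git_object_id` is made up for this illustration. Git hashes a short header, `<type> <size>` followed by a NUL byte, together with the contents.

```python
import hashlib


def git_object_id(obj_type: str, contents: bytes) -> str:
    """Return the hex ID of an object in a SHA-1 repository (illustrative helper)."""
    # The hashed data is a header "<type> <size>\0" followed by the raw contents.
    header = f"{obj_type} {len(contents)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + contents).hexdigest()


# The empty blob has a well-known ID.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID is derived from the contents, changing even one byte produces a different ID, which is why an existing object can never be modified in place.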
44+
Here are the 4 types of objects:
45+
46+
[[commit]]
47+
commits::
48+
A commit contains:
49+
+
50+
1. Its *parent commit ID(s)*. The first commit in a repository has 0 parents,
51+
regular commits have 1 parent, merge commits have 2+ parents
52+
2. A *commit message*
53+
3. All the *files* in the commit, stored as a *<<tree,tree>>*
54+
4. An *author* and the time the commit was authored
55+
5. A *committer* and the time the commit was committed
56+
+
57+
Here's how an example commit is stored:
58+
+
59+
----
60+
tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
61+
parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647
62+
author Maya <maya@example.com> 1759173425 -0400
63+
committer Maya <maya@example.com> 1759173425 -0400
64+
65+
Add README
66+
----
67+
+
68+
Like all other objects, commits can never be changed after they're created.
69+
For example, "amending" a commit with `git commit --amend` creates a new commit.
70+
The old commit will eventually be deleted by `git gc`.
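
The stored form shown above is simple enough to parse by hand. The following Python sketch splits it into header fields and the message; `parse_commit` is a hypothetical helper, and it deliberately ignores multi-line headers (such as signatures) that real commits may contain.

```python
def parse_commit(text: str) -> dict:
    """Split a commit's stored text into its header fields and message (simplified)."""
    # A blank line separates the headers from the commit message.
    headers, _, message = text.partition("\n\n")
    commit = {"parents": [], "message": message}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            # A commit may have zero, one, or several parent lines.
            commit["parents"].append(value)
        else:
            commit[key] = value
    return commit


EXAMPLE = (
    "tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a\n"
    "parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647\n"
    "author Maya <maya@example.com> 1759173425 -0400\n"
    "committer Maya <maya@example.com> 1759173425 -0400\n"
    "\n"
    "Add README\n"
)

commit = parse_commit(EXAMPLE)
print(commit["tree"])
print(commit["parents"])
print(commit["message"])
```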
71+
72+
[[tree]]
73+
trees::
74+
A tree is how Git represents a directory. It lists, for each item in
75+
the tree:
76+
+
77+
1. The *permissions*, for example `100644`
78+
2. The *type*: either <<blob,`blob`>> (a file), `tree` (a directory),
79+
or <<commit,`commit`>> (a Git submodule)
80+
3. The *object ID*
81+
4. The *filename*
82+
+
83+
For example, this is how a tree containing one directory (`src`) and one file
84+
(`README.md`) is stored:
85+
+
86+
----
87+
100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md
88+
040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src
89+
----
90+
+
91+
*NOTE:* The permissions are in the same format as UNIX permissions, but
92+
the only allowed permissions for files (blobs) are 644 and 755.
93+
94+
[[blob]]
95+
blobs::
96+
A blob is how Git represents a file. A blob object contains the
97+
file's contents.
98+
+
99+
Storing a new blob for every new version of a file can get big, so
100+
`git gc` periodically compresses objects for efficiency in `.git/objects/pack`.
101+
102+
[[tag-object]]
103+
tag objects::
104+
Tag objects (also known as "annotated tags") contain:
105+
+
106+
1. The *tagger* and tag date
107+
2. A *tag message*, similar to a commit message
108+
3. The *ID* of the object (often a commit) that they reference
109+
110+
[[references]]
111+
REFERENCES
112+
----------
113+
114+
References are a way to give a name to a commit.
115+
It's easier to remember "the changes I'm working on are on the `turtle`
116+
branch" than "the changes are in commit bb69721404348e".
117+
Git often uses "ref" as shorthand for "reference".
118+
119+
References that you create are stored in the `.git/refs` directory,
120+
and Git has a few special internal references like `HEAD` that are stored
121+
in the base `.git` directory.
122+
123+
References can either be:
124+
125+
1. References to an object ID, usually a <<commit,commit>> ID
126+
2. References to another reference. This is called a "symbolic reference".
127+
128+
Git handles references differently based on which subdirectory of
129+
`.git/refs` they're stored in.
130+
Here are the main types:
131+
132+
[[branch]]
133+
branches: `.git/refs/heads/<name>`::
134+
A branch is a name for a commit ID.
135+
That commit is the latest commit on the branch.
136+
Branches are stored in the `.git/refs/heads/` directory.
137+
+
138+
To get the history of commits on a branch, Git will start at the commit
139+
ID the branch references, and then look at the commit's parent(s),
140+
the parent's parent, etc.
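
The walk described above can be sketched in a few lines of Python. This is only an illustration of reachability: the `parents` mapping and the short commit names are invented for the example, and real Git commands such as `git log` apply their own ordering rules.

```python
from collections import deque


def history(start: str, parents: dict[str, list[str]]) -> list[str]:
    """List every commit reachable from `start` by following parent links."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        commit_id = queue.popleft()
        if commit_id in seen:
            continue  # A commit reachable through two paths is listed once.
        seen.add(commit_id)
        order.append(commit_id)
        queue.extend(parents.get(commit_id, []))
    return order


# Hypothetical history: "m" is a merge of "a" and "b", which share the parent "root".
PARENTS = {"m": ["a", "b"], "a": ["root"], "b": ["root"], "root": []}
print(history("m", PARENTS))  # ['m', 'a', 'b', 'root']
```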
141+
142+
[[tag]]
143+
tags: `.git/refs/tags/<name>`::
144+
A tag is a name for a commit ID, tag object ID, or other object ID.
145+
Tags are stored in the `.git/refs/tags/` directory.
146+
+
147+
Even though branches and tags are both "a name for a commit ID", Git
148+
treats them very differently.
149+
Branches are expected to be regularly updated as you work on the branch,
150+
but it's expected that a tag will never change after you create it.
151+
152+
[[HEAD]]
153+
HEAD: `.git/HEAD`::
154+
`HEAD` is where Git stores your current <<branch,branch>>.
155+
`HEAD` is normally a symbolic reference to your current branch, for
156+
example `ref: refs/heads/main` if your current branch is `main`.
157+
`HEAD` can also be a direct reference to a commit ID;
158+
that's called "detached HEAD state".
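
As a rough illustration of how a symbolic reference is followed, the Python sketch below resolves `HEAD` against an in-memory table of references. The `refs` dictionary stands in for the files under `.git`, and `resolve_ref` and its depth limit are assumptions made for this example, not Git's actual implementation.

```python
def resolve_ref(name: str, refs: dict[str, str], max_depth: int = 5) -> str:
    """Follow symbolic references until an object ID is reached."""
    for _ in range(max_depth):
        value = refs[name]
        if not value.startswith("ref: "):
            return value  # A direct reference: the value is an object ID.
        name = value[len("ref: "):]  # A symbolic reference: follow it.
    raise ValueError("too many levels of symbolic references")


# HEAD points at the branch `main`, which points at a commit.
REFS = {
    "HEAD": "ref: refs/heads/main",
    "refs/heads/main": "4ccb6d7b8869a86aae2e84c56523f8705b50c647",
}
print(resolve_ref("HEAD", REFS))

# In "detached HEAD state", HEAD holds the commit ID directly.
DETACHED = {"HEAD": "4ccb6d7b8869a86aae2e84c56523f8705b50c647"}
print(resolve_ref("HEAD", DETACHED))
```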
159+
160+
[[remote-tracking-branch]]
161+
remote-tracking branches: `.git/refs/remotes/<remote>/<branch>`::
162+
A remote-tracking branch is a name for a commit ID.
163+
It's how Git stores the last-known state of a branch in a remote
164+
repository. `git fetch` updates remote-tracking branches. When
165+
`git status` says "you're up to date with origin/main", it's looking at
166+
this.
167+
168+
[[other-refs]]
169+
Other references::
170+
Git tools may create references in any subdirectory of `.git/refs`.
171+
For example, linkgit:git-stash[1], linkgit:git-bisect[1],
172+
and linkgit:git-notes[1] all create their own references
173+
in `.git/refs/stash`, `.git/refs/bisect`, etc.
174+
Third-party Git tools may also create their own references.
175+
+
176+
Git may also create references in the base `.git` directory
177+
other than `HEAD`, like `ORIG_HEAD`.
178+
179+
*NOTE:* As an optimization, references may be stored as packed
180+
refs instead of in `.git/refs`. See linkgit:git-pack-refs[1].
181+
182+
[[index]]
183+
THE INDEX
184+
---------
185+
186+
The index, also known as the "staging area", contains the current staged
187+
version of every file in your Git repository. When you commit, the files
188+
in the index are used as the files in the next commit.
189+
190+
Unlike a tree, the index is a flat list of files.
191+
Each index entry has 4 fields:
192+
193+
1. The *permissions*
194+
2. The *<<blob,blob>> ID* of the file
195+
3. The *filename*
196+
4. The *number*. This is normally 0, but if there's a merge conflict
197+
there can be multiple versions (with numbers 1, 2, 3)
198+
of the same filename in the index.
199+
200+
It's extremely uncommon to look at the index directly: normally you'd
201+
run `git status` to see a list of changes between the index and <<HEAD,HEAD>>.
202+
But you can use `git ls-files --stage` to see the index.
203+
Here's the output of `git ls-files --stage` in a repository with 2 files:
204+
205+
----
206+
100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0 README.md
207+
100644 665c637a360874ce43bf74018768a96d2d4d219a 0 src/hello.py
208+
----
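
If you want to inspect these fields programmatically, the output above can be split into the four fields listed earlier. The Python sketch below is one way to do that; `IndexEntry` and `parse_ls_files_stage` are names invented for this example, and it does not handle unusual paths (for instance ones that Git quotes or that begin with whitespace).

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    mode: str       # the permissions, for example "100644"
    object_id: str  # the blob ID
    stage: int      # the number; 0 unless there is a merge conflict
    path: str       # the filename


def parse_ls_files_stage(output: str) -> list[IndexEntry]:
    """Parse text in the format printed by `git ls-files --stage` (simplified)."""
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split on whitespace at most 3 times so spaces inside the path survive.
        mode, object_id, stage, path = line.split(None, 3)
        entries.append(IndexEntry(mode, object_id, int(stage), path))
    return entries


OUTPUT = (
    "100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0\tREADME.md\n"
    "100644 665c637a360874ce43bf74018768a96d2d4d219a 0\tsrc/hello.py\n"
)
for entry in parse_ls_files_stage(OUTPUT):
    print(entry.path, entry.stage, entry.object_id)
```

Note that `src/hello.py` appears as a single entry with its full path, which is what "flat list" means here: the index has no separate entry for the `src` directory.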
209+
210+
[[reflogs]]
211+
REFLOGS
212+
-------
213+
214+
Git stores the history of branch, tag, and HEAD refs in a reflog
215+
(you should read "reflog" as "ref log"). Not every ref is logged by
216+
default, but any ref can be logged.
217+
218+
Each reflog entry has:
219+
220+
1. *Before/after commit IDs*
221+
2. *User* who made the change, for example `Maya <maya@example.com>`
222+
3. *Timestamp*
223+
4. *Log message*, for example `pull: Fast-forward`
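
To show how these four pieces fit together, here is a Python sketch that splits one reflog line into its fields. It assumes the one-line-per-entry text layout used by the files under `.git/logs/` (old ID, new ID, user, timestamp, and time zone, then a tab and the message). The helper name `parse_reflog_line` and the placeholder IDs are made up for this example.

```python
def parse_reflog_line(line: str) -> dict:
    """Split one reflog line into its fields (assumed `.git/logs/` layout)."""
    # The log message follows a tab; everything before it is metadata.
    meta, _, message = line.rstrip("\n").partition("\t")
    old_id, new_id, identity = meta.split(" ", 2)
    # The identity ends with a Unix timestamp and a time zone offset.
    user, timestamp, tz = identity.rsplit(" ", 2)
    return {
        "old_id": old_id,
        "new_id": new_id,
        "user": user,
        "timestamp": int(timestamp),
        "tz": tz,
        "message": message,
    }


# Placeholder IDs ("aaaa..." and "bbbb...") stand in for real commit IDs.
LINE = (
    "a" * 40 + " " + "b" * 40
    + " Maya <maya@example.com> 1759173425 -0400\tpull: Fast-forward\n"
)
entry = parse_reflog_line(LINE)
print(entry["user"])
print(entry["message"])
```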
224+
225+
Reflogs only log changes made in your local repository.
226+
They are not shared with remotes.
227+
228+
GIT
229+
---
230+
Part of the linkgit:git[1] suite

Documentation/meson.build

Lines changed: 1 addition & 0 deletions
@@ -194,6 +194,7 @@ manpages = {
194194
'gitcore-tutorial.adoc' : 7,
195195
'gitcredentials.adoc' : 7,
196196
'gitcvs-migration.adoc' : 7,
197+
'gitdatamodel.adoc' : 7,
197198
'gitdiffcore.adoc' : 7,
198199
'giteveryday.adoc' : 7,
199200
'gitfaq.adoc' : 7,
