Commit d6bd63a

Merge branch 'je/doc-data-model' into seen
Add a new manual that describes the data model.
Comments?

* je/doc-data-model:
  doc: add a explanation of Git's data model
2 parents 1eaf3f5 + c080aba commit d6bd63a

4 files changed: +252 -2 lines changed
Documentation/Makefile

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ MAN7_TXT += gitcli.adoc
 MAN7_TXT += gitcore-tutorial.adoc
 MAN7_TXT += gitcredentials.adoc
 MAN7_TXT += gitcvs-migration.adoc
+MAN7_TXT += gitdatamodel.adoc
 MAN7_TXT += gitdiffcore.adoc
 MAN7_TXT += giteveryday.adoc
 MAN7_TXT += gitfaq.adoc

Documentation/gitdatamodel.adoc

Lines changed: 248 additions & 0 deletions
@@ -0,0 +1,248 @@
gitdatamodel(7)
===============

NAME
----
gitdatamodel - Git's core data model

SYNOPSIS
--------
gitdatamodel

DESCRIPTION
-----------

It's not necessary to understand Git's data model to use Git, but it's
very helpful when reading Git's documentation so that you know what it
means when the documentation says "object", "reference" or "index".

Git's core operations use 4 kinds of data:

1. <<objects,Objects>>: commits, trees, blobs, and tag objects
2. <<references,References>>: branches, tags,
remote-tracking branches, etc
3. <<index,The index>>, also known as the staging area
4. <<reflogs,Reflogs>>

[[objects]]
OBJECTS
-------

Commits, trees, blobs, and tag objects are all stored in Git's object database.
Every object has:

1. an *ID* (aka "object name"), which is a cryptographic hash of its
type, size, and contents.
It's fast to look up a Git object using its ID.
The ID is usually represented in hexadecimal, like
`1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`.
2. a *type*. There are 4 types of objects:
<<commit,commits>>, <<tree,trees>>, <<blob,blobs>>,
and <<tag-object,tag objects>>.
3. *contents*. The structure of the contents depends on the type.

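As a rough illustration of how an object ID is derived from an object's type and contents, here is a minimal Python sketch. It assumes a repository that uses SHA-1 (repositories created with the SHA-256 object format produce different IDs), and the helper name `git_object_id` is made up for this example.

```python
import hashlib


def git_object_id(obj_type: str, contents: bytes) -> str:
    """Return the SHA-1 object ID for an object of the given type.

    Assumption: the repository uses the SHA-1 object format.
    """
    # The hashed data is a header "<type> <size>\0" followed by the contents.
    header = f"{obj_type} {len(contents)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + contents).hexdigest()


# The same type and contents always give the same ID.
print(git_object_id("blob", b"hello world\n"))
```

Because the ID depends only on the type and contents, any change to the contents produces a different ID, which fits with objects never changing once they are created.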
Once an object is created, it can never be changed.
Here are the 4 types of objects:

[[commit]]
commits::
A commit contains these required fields
(though there are other optional fields):
+
1. Its *parent commit ID(s)*. The first commit in a repository has 0 parents,
regular commits have 1 parent, and merge commits have 2 or more parents.
2. A *commit message*
3. All the *files* in the commit, stored as a *<<tree,tree>>*
4. An *author* and the time the commit was authored
5. A *committer* and the time the commit was committed
+
Here's how an example commit is stored:
+
----
tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647
author Maya <maya@example.com> 1759173425 -0400
committer Maya <maya@example.com> 1759173425 -0400

Add README
----
+
Like all other objects, commits can never be changed after they're created.
For example, "amending" a commit with `git commit --amend` creates a new commit.

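The commit layout shown above (header lines, a blank line, then the message) can be read with a few lines of Python. This is only a sketch: `parse_commit` is a hypothetical helper, and it deliberately ignores optional multi-line headers such as signatures.

```python
def parse_commit(raw: str) -> dict:
    """Split a raw commit into header fields and the commit message.

    Simplification: multi-line headers (for example signatures) are not handled.
    """
    header_text, _, message = raw.partition("\n\n")
    commit = {"parents": [], "message": message.strip()}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # 0, 1, or several parent lines
        else:
            commit[key] = value
    return commit


EXAMPLE_COMMIT = """\
tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647
author Maya <maya@example.com> 1759173425 -0400
committer Maya <maya@example.com> 1759173425 -0400

Add README
"""

commit = parse_commit(EXAMPLE_COMMIT)
print(commit["tree"], commit["parents"], commit["message"])
```

A root commit would yield an empty `parents` list and a merge commit would yield two or more entries.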
[[tree]]
trees::
A tree is how Git represents a directory. It lists, for each item in
the tree:
+
1. The *file mode*, for example `100644`
2. The *type*: either <<blob,`blob`>> (a file), `tree` (a directory),
or <<commit,`commit`>> (a Git submodule)
3. The *object ID*
4. The *filename*
+
For example, this is how a tree containing one directory (`src`) and one file
(`README.md`) is stored:
+
----
100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md
040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src
----
+
Git only supports these file modes:
+
- `100644`: regular file (with type `blob`)
- `100755`: executable file (with type `blob`)
- `120000`: symbolic link (with type `blob`)
- `040000`: directory (with type `tree`)
- `160000`: gitlink, for use with submodules (with type `commit`)

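To make the relationship between file modes and entry types concrete, the following Python sketch parses a textual tree listing like the one above and checks each mode against the table of supported modes. The names `MODE_TYPES` and `parse_tree_listing` are invented for this example, and the textual listing is assumed to use whitespace between the fields.

```python
# Supported file modes and the object type each one implies (from the list above).
MODE_TYPES = {
    "100644": "blob",    # regular file
    "100755": "blob",    # executable file
    "120000": "blob",    # symbolic link
    "040000": "tree",    # directory
    "160000": "commit",  # gitlink (submodule)
}


def parse_tree_listing(text: str) -> list:
    """Parse lines of the form '<mode> <type> <object ID> <filename>'."""
    entries = []
    for line in text.strip().splitlines():
        # Split only three times so filenames containing spaces stay intact.
        mode, obj_type, object_id, name = line.split(None, 3)
        if MODE_TYPES.get(mode) != obj_type:
            raise ValueError(f"mode {mode} does not match type {obj_type}")
        entries.append(
            {"mode": mode, "type": obj_type, "id": object_id, "name": name}
        )
    return entries


EXAMPLE_TREE = """\
100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md
040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src
"""

for entry in parse_tree_listing(EXAMPLE_TREE):
    print(entry["name"], "->", entry["type"])
```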
[[blob]]
blobs::
A blob is how Git represents a file. A blob object contains the
file's contents.
+

NOTE: Storing a new blob for every new version of a file can use a
lot of disk space. To handle this, Git periodically runs repository
maintenance with linkgit:git-gc[1]. Part of this maintenance is
compressing objects so that if a small part of a file was changed, only
the change is stored instead of the whole file.

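As background for how a blob can sit in the object database before any maintenance has packed it, here is a hedged Python sketch of the "loose object" layout: the header plus contents is zlib-compressed and written to a path derived from the object ID. The text above does not describe this layout, so treat it as an illustration under the assumption of a SHA-1 repository; the function names are hypothetical and the sketch writes to a temporary directory, not a real repository.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_blob(objects_dir: Path, contents: bytes) -> str:
    """Store contents as a loose blob under objects_dir and return its ID."""
    store = b"blob " + str(len(contents)).encode("ascii") + b"\x00" + contents
    object_id = hashlib.sha1(store).hexdigest()
    # Loose objects live at <first 2 hex digits>/<remaining 38 hex digits>.
    path = objects_dir / object_id[:2] / object_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return object_id


def read_loose_blob(objects_dir: Path, object_id: str) -> bytes:
    """Read a loose blob back and return only the file contents."""
    raw = (objects_dir / object_id[:2] / object_id[2:]).read_bytes()
    header, _, contents = zlib.decompress(raw).partition(b"\x00")
    if header != b"blob " + str(len(contents)).encode("ascii"):
        raise ValueError("not a well-formed blob")
    return contents


with tempfile.TemporaryDirectory() as tmp:
    blob_id = write_loose_blob(Path(tmp), b"hello world\n")
    print(blob_id, read_loose_blob(Path(tmp), blob_id))
```

Note that the blob holds only the file's contents. The filename is recorded in the tree that references the blob.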
[[tag-object]]
tag objects::
Tag objects (also known as "annotated tags") contain these required fields
(though there are other optional fields):
+
1. The *tagger* and tag date
2. A *tag message*, similar to a commit message
3. The *ID* and *type* of the object (often a commit) that they reference

Here's how an example tag object is stored:

----
object 750b4ead9c87ceb3ddb7a390e6c7074521797fb3
type commit
tag v1.0.0
tagger Maya <maya@example.com> 1759927359 -0400

Release version 1.0.0
----

[[references]]
REFERENCES
----------

References are a way to give a name to a commit.
It's easier to remember "the changes I'm working on are on the `turtle`
branch" than "the changes are in commit bb69721404348e".
Git often uses "ref" as shorthand for "reference".

References can either be:

1. References to an object ID, usually a <<commit,commit>> ID
2. References to another reference. This is called a "symbolic reference".

References are stored in a hierarchy, and Git handles references
differently based on where they are in the hierarchy.
Most references are under `refs/`. Here are the main types:

[[branch]]
branches: `refs/heads/<name>`::
A branch is a name for a commit ID.
That commit is the latest commit on the branch.
+
To get the history of commits on a branch, Git will start at the commit
ID the branch references, and then look at the commit's parent(s),
the parent's parent, etc.

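The history walk described above can be sketched in Python over a toy commit graph. Everything here is hypothetical: the short commit names, the `PARENTS` mapping, and the `history` helper. The traversal is a plain breadth-first walk, whereas Git itself applies its own ordering rules when listing history.

```python
from collections import deque

# Hypothetical commit graph: each commit ID maps to its parent commit IDs.
PARENTS = {
    "a": [],          # first commit, 0 parents
    "b": ["a"],       # regular commit, 1 parent
    "c": ["b"],
    "x": ["a"],       # commit made on a side branch
    "m": ["c", "x"],  # merge commit, 2 parents
}

# Hypothetical branch: a name for the latest commit on the branch.
BRANCHES = {"refs/heads/main": "m"}


def history(start: str, parents: dict) -> list:
    """Return every commit reachable from start by following parents."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        commit_id = queue.popleft()
        if commit_id in seen:
            continue  # reachable through more than one path
        seen.add(commit_id)
        order.append(commit_id)
        queue.extend(parents[commit_id])
    return order


print(history(BRANCHES["refs/heads/main"], PARENTS))
```

The branch itself stores only one commit ID. The rest of the history comes from the parent links inside the commits.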
[[tag]]
tags: `refs/tags/<name>`::
A tag is a name for a commit ID, tag object ID, or other object ID.
Tags that reference a tag object ID are called "annotated tags",
because the tag object contains a tag message.
Tags that reference a commit ID, blob ID, or tree ID are
called "lightweight tags".
+
Even though branches and tags are both "a name for a commit ID", Git
treats them very differently.
Branches are expected to change over time: when you make a commit, Git
will update your <<HEAD,current branch>> to reference the new changes.
It's expected that a tag will never change after you create it.

[[HEAD]]
HEAD: `HEAD`::
`HEAD` is where Git stores your current <<branch,branch>>.
`HEAD` can either be:
1. A symbolic reference to your current branch, for example
`ref: refs/heads/main` if your current branch is `main`.
2. A direct reference to a commit ID.
This is called "detached HEAD state".

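The two forms of `HEAD` can be illustrated with a small Python sketch that resolves a reference name to a commit ID. The dictionaries stand in for a repository's references and are purely hypothetical, as are the helper names `resolve_ref` and `current_branch`.

```python
SYMREF_PREFIX = "ref: "


def resolve_ref(name: str, refs: dict, max_depth: int = 5) -> str:
    """Follow symbolic references until an object ID is reached."""
    for _ in range(max_depth):
        value = refs[name]
        if not value.startswith(SYMREF_PREFIX):
            return value  # a direct reference to an object ID
        name = value[len(SYMREF_PREFIX):]  # a symbolic reference: follow it
    raise ValueError("too many levels of symbolic references")


def current_branch(refs: dict):
    """Return the current branch name, or None in detached HEAD state."""
    value = refs["HEAD"]
    prefix = SYMREF_PREFIX + "refs/heads/"
    return value[len(prefix):] if value.startswith(prefix) else None


# Hypothetical repository with `main` checked out.
ON_BRANCH = {
    "HEAD": "ref: refs/heads/main",
    "refs/heads/main": "4ccb6d7b8869a86aae2e84c56523f8705b50c647",
}
# Hypothetical repository in detached HEAD state.
DETACHED = {"HEAD": "4ccb6d7b8869a86aae2e84c56523f8705b50c647"}

print(current_branch(ON_BRANCH), resolve_ref("HEAD", ON_BRANCH))
print(current_branch(DETACHED), resolve_ref("HEAD", DETACHED))
```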
[[remote-tracking-branch]]
remote-tracking branches: `refs/remotes/<remote>/<branch>`::
A remote-tracking branch is a name for a commit ID.
It's how Git stores the last-known state of a branch in a remote
repository. `git fetch` updates remote-tracking branches. When
`git status` says "Your branch is up to date with 'origin/main'", it's
comparing your branch against this reference.

[[other-refs]]
Other references::
Git tools may create references anywhere under `refs/`.
For example, linkgit:git-stash[1], linkgit:git-bisect[1],
and linkgit:git-notes[1] all create their own references
in `refs/stash`, `refs/bisect`, etc.
Third-party Git tools may also create their own references.
+
Git may also create references other than `HEAD` at the base of the
hierarchy, like `ORIG_HEAD`.

[[index]]
THE INDEX
---------

The index, also known as the "staging area", contains the current staged
version of every file in your Git repository. When you commit, the files
in the index are used as the files in the next commit.

Unlike a tree, the index is a flat list of files.
Each index entry has 4 fields:

1. The *permissions*
2. The *<<blob,blob>> ID* of the file
3. The *filename*
4. The *stage number*. This is normally 0, but if there's a merge conflict
there can be multiple versions of the same filename in the index,
with stage numbers 1, 2, and 3.

It's extremely uncommon to look at the index directly: normally you'd
run `git status` to see a list of changes between the index and <<HEAD,HEAD>>.
But you can use `git ls-files --stage` to see the index.
Here's the output of `git ls-files --stage` in a repository with 2 files:

----
100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0 README.md
100644 665c637a360874ce43bf74018768a96d2d4d219a 0 src/hello.py
----

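To show the difference between the flat index and nested trees, this Python sketch parses `git ls-files --stage` style lines and groups the flat paths into a nested directory structure. The helper names are hypothetical, and the sketch only mirrors the shape of the data. It does not compute real tree objects.

```python
def parse_index_listing(text: str) -> list:
    """Parse lines of the form '<mode> <blob ID> <stage> <filename>'."""
    entries = []
    for line in text.strip().splitlines():
        # Split only three times so filenames containing spaces stay intact.
        mode, blob_id, stage, path = line.split(None, 3)
        entries.append(
            {"mode": mode, "id": blob_id, "stage": int(stage), "path": path}
        )
    return entries


def to_nested(entries: list) -> dict:
    """Group flat index paths into nested dicts, one dict per directory."""
    root = {}
    for entry in entries:
        *directories, filename = entry["path"].split("/")
        node = root
        for directory in directories:
            node = node.setdefault(directory, {})
        node[filename] = entry["id"]
    return root


EXAMPLE_INDEX = """\
100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0 README.md
100644 665c637a360874ce43bf74018768a96d2d4d219a 0 src/hello.py
"""

index_entries = parse_index_listing(EXAMPLE_INDEX)
print(to_nested(index_entries))
```

In the index, `src/hello.py` is a single entry with a full path. In the nested form, `src` becomes its own level, much like a tree that contains another tree.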
[[reflogs]]
REFLOGS
-------

Git stores the history of your branch, remote-tracking branch, and HEAD refs
in a reflog (you should read "reflog" as "ref log").

Each reflog entry has:

1. Before/after *commit IDs*
2. *User* who made the change, for example `Maya <maya@example.com>`
3. *Timestamp* when the change was made
4. *Log message*, for example `pull: Fast-forward`

Reflogs only log changes made in your local repository.
They are not shared with remotes.

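The four reflog fields listed above can be pulled out of a single log line with a regular expression. The sketch below assumes the line layout used by the files under `.git/logs/` (old ID, new ID, user, timestamp, timezone, then a tab and the message). The sample line is made up from IDs that appear earlier in this document, and `parse_reflog_line` is a hypothetical helper.

```python
import re

# Assumed layout: "<old> <new> <name> <email> <timestamp> <tz>\t<message>"
REFLOG_LINE = re.compile(
    r"(?P<old>[0-9a-f]+) (?P<new>[0-9a-f]+) "
    r"(?P<user>.*<[^>]*>) (?P<timestamp>\d+) (?P<tz>[+-]\d{4})"
    r"(?:\t(?P<message>.*))?"
)


def parse_reflog_line(line: str) -> dict:
    """Return the before/after IDs, user, timestamp, and message of one entry."""
    match = REFLOG_LINE.fullmatch(line)
    if match is None:
        raise ValueError("unrecognized reflog line")
    entry = match.groupdict()
    entry["timestamp"] = int(entry["timestamp"])  # seconds since the Unix epoch
    return entry


# Made-up sample entry for illustration only.
SAMPLE_LINE = (
    "4ccb6d7b8869a86aae2e84c56523f8705b50c647 "
    "750b4ead9c87ceb3ddb7a390e6c7074521797fb3 "
    "Maya <maya@example.com> 1759927359 -0400\tpull: Fast-forward"
)

reflog_entry = parse_reflog_line(SAMPLE_LINE)
print(reflog_entry["old"], "->", reflog_entry["new"], reflog_entry["message"])
```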
GIT
---
Part of the linkgit:git[1] suite

Documentation/glossary-content.adoc

Lines changed: 2 additions & 2 deletions
@@ -297,8 +297,8 @@ This commit is referred to as a "merge commit", or sometimes just a
 identified by its <<def_object_name,object name>>. The objects usually
 live in `$GIT_DIR/objects/`.

-[[def_object_identifier]]object identifier (oid)::
-Synonym for <<def_object_name,object name>>.
+[[def_object_identifier]]object identifier, object ID, oid::
+Synonyms for <<def_object_name,object name>>.

 [[def_object_name]]object name::
 The unique identifier of an <<def_object,object>>. The

Documentation/meson.build

Lines changed: 1 addition & 0 deletions
@@ -194,6 +194,7 @@ manpages = {
 'gitcore-tutorial.adoc' : 7,
 'gitcredentials.adoc' : 7,
 'gitcvs-migration.adoc' : 7,
+'gitdatamodel.adoc' : 7,
 'gitdiffcore.adoc' : 7,
 'giteveryday.adoc' : 7,
 'gitfaq.adoc' : 7,
