diff --git a/Documentation/Makefile b/Documentation/Makefile
index 6fb83d0c6ebf22..5f4acfacbdb6f0 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -52,6 +52,7 @@ MAN7_TXT += gitcli.adoc
 MAN7_TXT += gitcore-tutorial.adoc
 MAN7_TXT += gitcredentials.adoc
 MAN7_TXT += gitcvs-migration.adoc
+MAN7_TXT += gitdatamodel.adoc
 MAN7_TXT += gitdiffcore.adoc
 MAN7_TXT += giteveryday.adoc
 MAN7_TXT += gitfaq.adoc
diff --git a/Documentation/gitdatamodel.adoc b/Documentation/gitdatamodel.adoc
new file mode 100644
index 00000000000000..c3a25ea8d24bec
--- /dev/null
+++ b/Documentation/gitdatamodel.adoc
@@ -0,0 +1,248 @@
+gitdatamodel(7)
+===============
+
+NAME
+----
+gitdatamodel - Git's core data model
+
+SYNOPSIS
+--------
+gitdatamodel
+
+DESCRIPTION
+-----------
+
+It's not necessary to understand Git's data model to use Git, but it's
+very helpful when reading Git's documentation so that you know what it
+means when the documentation says "object", "reference" or "index".
+
+Git's core operations use 4 kinds of data:
+
+1. <>: commits, trees, blobs, and tag objects
+2. <>: branches, tags,
+   remote-tracking branches, etc.
+3. <>, also known as the staging area
+4. <>
+
+[[objects]]
+OBJECTS
+-------
+
+Commits, trees, blobs, and tag objects are all stored in Git's object database.
+Every object has:
+
+1. an *ID* (aka "object name"), which is a cryptographic hash of its
+   type and contents.
+   It's fast to look up a Git object using its ID.
+   The ID is usually represented in hexadecimal, like
+   `1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`.
+2. a *type*. There are 4 types of objects:
+   <>, <>, <>,
+   and <>.
+3. *contents*. The structure of the contents depends on the type.
+
+Once an object is created, it can never be changed.
+Here are the 4 types of objects:
+
+[[commit]]
+commits::
+  A commit contains these required fields
+  (though there are other optional fields):
++
+1. Its *parent commit ID(s)*. The first commit in a repository has 0 parents,
+   regular commits have 1 parent, and merge commits have 2 or more parents.
+2. A *commit message*
+3. All the *files* in the commit, stored as a *<>*
+4. An *author* and the time the commit was authored
+5. A *committer* and the time the commit was committed
++
+Here's how an example commit is stored:
++
+----
+tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
+parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647
+author Maya 1759173425 -0400
+committer Maya 1759173425 -0400
+
+Add README
+----
++
+Like all other objects, commits can never be changed after they're created.
+For example, "amending" a commit with `git commit --amend` creates a new commit.
+
+[[tree]]
+trees::
+  A tree is how Git represents a directory. It lists, for each item in
+  the tree:
++
+1. The *file mode*, for example `100644`
+2. The *type*: either <> (a file), `tree` (a directory),
+   or <> (a Git submodule)
+3. The *object ID*
+4. The *filename*
++
+For example, this is how a tree containing one directory (`src`) and one file
+(`README.md`) is stored:
++
+----
+100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md
+040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src
+----
++
+Git only supports these file modes:
++
+ - `100644`: regular file (with type `blob`)
+ - `100755`: executable file (with type `blob`)
+ - `120000`: symbolic link (with type `blob`)
+ - `040000`: directory (with type `tree`)
+ - `160000`: gitlink, for use with submodules (with type `commit`)
+
+[[blob]]
+blobs::
+  A blob is how Git represents a file. A blob object contains the
+  file's contents.
++
+
+NOTE: Storing a new blob for every new version of a file can use a
+lot of disk space. To handle this, Git periodically runs repository
+maintenance with linkgit:git-gc[1]. Part of this maintenance is
+compressing objects so that if a small part of a file was changed, only
+the change is stored instead of the whole file.
+
+[[tag-object]]
+tag objects::
+  Tag objects (also known as "annotated tags") contain these required fields
+  (though there are other optional fields):
++
+1. The *tagger* and tag date
+2. A *tag message*, similar to a commit message
+3. The *ID* and *type* of the object (often a commit) that they reference
++
+Here's how an example tag object is stored:
++
+----
+object 750b4ead9c87ceb3ddb7a390e6c7074521797fb3
+type commit
+tag v1.0.0
+tagger Maya 1759927359 -0400
+
+Release version 1.0.0
+----
+
+[[references]]
+REFERENCES
+----------
+
+References are a way to give a name to a commit.
+It's easier to remember "the changes I'm working on are on the `turtle`
+branch" than "the changes are in commit bb69721404348e".
+Git often uses "ref" as shorthand for "reference".
+
+References can either be:
+
+1. References to an object ID, usually a <> ID
+2. References to another reference. This is called a "symbolic reference".
+
+References are stored in a hierarchy, and Git handles references
+differently based on where they are in the hierarchy.
+Most references are under `refs/`. Here are the main types:
+
+[[branch]]
+branches: `refs/heads/`::
+  A branch is a name for a commit ID.
+  That commit is the latest commit on the branch.
++
+To get the history of commits on a branch, Git will start at the commit
+ID the branch references, and then look at the commit's parent(s),
+the parent's parent, etc.
+
+[[tag]]
+tags: `refs/tags/`::
+  A tag is a name for a commit ID, tag object ID, or other object ID.
+  Tags that reference a tag object ID are called "annotated tags",
+  because the tag object contains a tag message.
+  Tags that reference a commit ID, blob ID, or tree ID are
+  called "lightweight tags".
++
+Even though branches and tags are both "a name for a commit ID", Git
+treats them very differently.
+Branches are expected to change over time: when you make a commit, Git
+will update your <> to reference the new changes.
+It's expected that a tag will never change after you create it.
+
+[[HEAD]]
+HEAD: `HEAD`::
+  `HEAD` is where Git stores your current <>.
+  `HEAD` can either be:
+  1. A symbolic reference to your current branch, for example `ref:
+     refs/heads/main` if your current branch is `main`.
+  2. A direct reference to a commit ID.
+     This is called "detached HEAD state".
+
+[[remote-tracking-branch]]
+remote-tracking branches: `refs/remotes//`::
+  A remote-tracking branch is a name for a commit ID.
+  It's how Git stores the last-known state of a branch in a remote
+  repository. `git fetch` updates remote-tracking branches. When
+  `git status` says "Your branch is up to date with 'origin/main'", it's
+  comparing your branch against this reference.
+
+[[other-refs]]
+Other references::
+  Git tools may create references anywhere under `refs/`.
+  For example, linkgit:git-stash[1], linkgit:git-bisect[1],
+  and linkgit:git-notes[1] all create their own references
+  in `refs/stash`, `refs/bisect`, etc.
+  Third-party Git tools may also create their own references.
++
+Git may also create references other than `HEAD` at the base of the
+hierarchy, like `ORIG_HEAD`.
+
+[[index]]
+THE INDEX
+---------
+
+The index, also known as the "staging area", contains the current staged
+version of every file in your Git repository. When you commit, the files
+in the index are used as the files in the next commit.
+
+Unlike a tree, the index is a flat list of files.
+Each index entry has 4 fields:
+
+1. The *permissions*
+2. The *<> ID* of the file
+3. The *filename*
+4. The *stage number*. This is normally 0, but if there's a merge conflict
+   there can be multiple versions (with stage numbers 1, 2, and 3)
+   of the same filename in the index.
+
+It's extremely uncommon to look at the index directly: normally you'd
+run `git status` to see a list of changes between the index and <>.
+But you can use `git ls-files --stage` to see the index.
+Here's the output of `git ls-files --stage` in a repository with 2 files:
+
+----
+100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0 README.md
+100644 665c637a360874ce43bf74018768a96d2d4d219a 0 src/hello.py
+----
+
+[[reflogs]]
+REFLOGS
+-------
+
+Git stores the history of your branch, remote-tracking branch, and HEAD refs
+in a reflog (you should read "reflog" as "ref log").
+
+Each reflog entry has:
+
+1. Before/after *commit IDs*
+2. *User* who made the change, for example `Maya `
+3. *Timestamp* when the change was made
+4. *Log message*, for example `pull: Fast-forward`
+
+Reflogs only log changes made in your local repository.
+They are not shared with remotes.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Documentation/glossary-content.adoc b/Documentation/glossary-content.adoc
index e423e4765b71b0..20ba121314b9a4 100644
--- a/Documentation/glossary-content.adoc
+++ b/Documentation/glossary-content.adoc
@@ -297,8 +297,8 @@ This commit is referred to as a "merge commit", or sometimes just a
   identified by its <>. The objects usually live in
   `$GIT_DIR/objects/`.
 
-[[def_object_identifier]]object identifier (oid)::
-  Synonym for <>.
+[[def_object_identifier]]object identifier, object ID, oid::
+  Synonyms for <>.
 
 [[def_object_name]]object name::
   The unique identifier of an <>. The
diff --git a/Documentation/meson.build b/Documentation/meson.build
index e34965c5b0e236..ace0573e8272e0 100644
--- a/Documentation/meson.build
+++ b/Documentation/meson.build
@@ -192,6 +192,7 @@ manpages = {
   'gitcore-tutorial.adoc' : 7,
   'gitcredentials.adoc' : 7,
   'gitcvs-migration.adoc' : 7,
+  'gitdatamodel.adoc' : 7,
   'gitdiffcore.adoc' : 7,
   'giteveryday.adoc' : 7,
   'gitfaq.adoc' : 7,
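
Illustration for the OBJECTS section of gitdatamodel.adoc, which states that an object's ID is a cryptographic hash of its type and contents. The following is a minimal Python sketch of that idea, not Git's implementation. It assumes a repository using SHA-1 (consistent with the 40-hex-digit IDs in the examples) and that the hashed data is the contents prefixed by a `<type> <size>\0` header. The function name `git_object_id` is hypothetical and is not part of Git.

```python
import hashlib


def git_object_id(obj_type: str, contents: bytes) -> str:
    """Return the hex object ID for an object of the given type (SHA-1 repos)."""
    # The hashed data is a header "<type> <size in bytes>" plus a NUL byte,
    # followed by the raw contents of the object.
    header = f"{obj_type} {len(contents)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + contents).hexdigest()


if __name__ == "__main__":
    # ID of a blob holding an empty file.
    print(git_object_id("blob", b""))
    # The type is part of the hash, so an empty tree gets a different ID.
    print(git_object_id("tree", b""))
```

Because the type is hashed together with the contents, a blob and a tree with identical bytes still get different IDs, and changing a single byte of the contents produces a different ID. This is one way to see why an object can never be changed once it is created.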