Commit 3f28ba2

Merge branch 'zwpaper-10-git-internals-packfiles'
2 parents db878cd + d2a7338 commit 3f28ba2

2 files changed

+36
-37
lines changed

book/10-git-internals/sections/packfiles.asc

Lines changed: 35 additions & 36 deletions
@@ -1,7 +1,7 @@
1-
=== Packfiles
1+
=== Packfiles
22

3-
Let's go back to the objects database for your test Git repository.
4-
At this point, you have 11 objects – 4 blobs, 3 trees, 3 commits, and 1 tag:
3+
Let's return to the object database of the example Git repository.
4+
So far, you can see 11 objects: 4 blobs, 3 trees, 3 commits, and 1 tag:
55

66
[source,console]
77
----
@@ -19,9 +19,9 @@ $ find .git/objects -type f
1919
.git/objects/fd/f4fc3344e67ab068f836878b6c4951e3b15f3d # commit 1
2020
----
2121

22-
Git compresses the contents of these files with zlib, and you're not storing much, so all these files collectively take up only 925 bytes.
23-
You'll add some larger content to the repository to demonstrate an interesting feature of Git.
24-
To demonstrate, we'll add the `repo.rb` file from the Grit library – this is about a 22K source code file:
22+
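Git compresses the contents of these files with zlib, and we haven't stored much, so the files listed above take up only 925 bytes in total.
23+
Next, we'll guide you through adding some larger files to the repository to demonstrate an interesting feature of Git.
24+
To demonstrate, we'll add the `repo.rb` file we used earlier from the Grit library, a source code file of about 22K: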
2525

2626
[source,console]
2727
----
@@ -35,7 +35,7 @@ $ git commit -m 'added repo.rb'
3535
rewrite test.txt (100%)
3636
----
3737

38-
If you look at the resulting tree, you can see the SHA-1 value your repo.rb file got for the blob object:
38+
If you look at the resulting tree object, you can see the SHA-1 value of the blob object for the repo.rb file:
3939

4040
[source,console]
4141
----
@@ -45,15 +45,15 @@ $ git cat-file -p master^{tree}
4545
100644 blob e3f094f522629ae358806b17daf78246c27c007b test.txt
4646
----
4747

48-
You can then use `git cat-file` to see how big that object is:
48+
You can then use the `git cat-file` command to see how big that object is:
4949

5050
[source,console]
5151
----
5252
$ git cat-file -s 033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5
5353
22044
5454
----
5555

56-
Now, modify that file a little, and see what happens:
56+
Now, modify the file slightly and see what happens:
5757

5858
[source,console]
5959
----
@@ -63,7 +63,7 @@ $ git commit -am 'modified repo a bit'
6363
1 file changed, 1 insertion(+)
6464
----
6565

66-
Check the tree created by that commit, and you see something interesting:
66+
Look at the tree object generated by this commit, and you'll see something interesting:
6767

6868
[source,console]
6969
----
@@ -73,22 +73,22 @@ $ git cat-file -p master^{tree}
7373
100644 blob e3f094f522629ae358806b17daf78246c27c007b test.txt
7474
----
7575

76-
The blob is now a different blob, which means that although you added only a single line to the end of a 400-line file, Git stored that new content as a completely new object:
76+
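repo.rb now corresponds to a completely different blob, which means that although you only added one new line to the end of a 400-line file, Git still stores the new file content in a brand-new object: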
7777

7878
[source,console]
7979
----
8080
$ git cat-file -s b042a60ef7dff760008df33cee372b945b6e884e
8181
22054
8282
----
8383

84-
You have two nearly identical 22K objects on your disk.
85-
Wouldn't it be nice if Git could store one of them in full but then the second object only as the delta between it and the first?
84+
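You now have two nearly identical objects on your disk, each 22K in size.
85+
Wouldn't it be better if Git stored only one of them in full, plus the difference between the other object and that version?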
8686

87-
It turns out that it can.
88-
The initial format in which Git saves objects on disk is called a ``loose'' object format.
89-
However, occasionally Git packs up several of these objects into a single binary file called a ``packfile'' in order to save space and be more efficient.
90-
Git does this if you have too many loose objects around, if you run the `git gc` command manually, or if you push to a remote server.
91-
To see what happens, you can manually ask Git to pack up the objects by calling the `git gc` command:
87+
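In fact, Git can do that.
88+
The format Git initially uses to store objects on disk is called the ``loose'' object format.
89+
However, from time to time Git packs several of these objects into a single binary file called a ``packfile'' to save space and improve efficiency.
90+
Git does this when the repository has too many loose objects, when you run the `git gc` command manually, or when you push to a remote server.
91+
To see the packing process, you can run `git gc` manually to have Git pack the objects: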
9292

9393
[source,console]
9494
----
@@ -100,7 +100,7 @@ Writing objects: 100% (18/18), done.
100100
Total 18 (delta 3), reused 0 (delta 0)
101101
----
102102

103-
If you look in your objects directory, you'll find that most of your objects are gone, and a new pair of files has appeared:
103+
If you look in the objects directory now, you'll find that most of the objects are gone and a new pair of files has appeared:
104104

105105
[source,console]
106106
----
@@ -112,20 +112,19 @@ $ find .git/objects -type f
112112
.git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.pack
113113
----
114114

115-
The objects that remain are the blobs that aren't pointed to by any commit – in this case, the ``what is up, doc?''
116-
example and the ``test content'' example blobs you created earlier.
117-
Because you never added them to any commits, they're considered dangling and aren't packed up in your new packfile.
115+
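The objects that remain are blobs not referenced by any commit: in this case, the ``what is up, doc?'' and ``test content'' example blobs you created earlier.
116+
Because you never added them to any commit, Git considers them dangling and doesn't pack them into the newly created packfile.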
118117

119-
The other files are your new packfile and an index.
120-
The packfile is a single file containing the contents of all the objects that were removed from your filesystem.
121-
The index is a file that contains offsets into that packfile so you can quickly seek to a specific object.
122-
What is cool is that although the objects on disk before you ran the `gc` were collectively about 22K in size, the new packfile is only 7K.
123-
You've cut your disk usage by ⅔ by packing your objects.
118+
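The remaining files are the newly created packfile and an index.
119+
The packfile contains the contents of all the objects that were just removed from the filesystem.
120+
The index file contains offsets into the packfile, so we can quickly locate any given object through it.
121+
What's interesting is that the objects on disk before running `gc` were about 22K in size, while the newly generated packfile is only 7K.
122+
Packing the objects reduced disk usage by ⅔.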
124123

125-
How does Git do this?
126-
When Git packs objects, it looks for files that are named and sized similarly, and stores just the deltas from one version of the file to the next.
127-
You can look into the packfile and see what Git did to save space.
128-
The `git verify-pack` plumbing command allows you to see what was packed up:
124+
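How does Git do this?
125+
When Git packs objects, it looks for files with similar names and sizes and stores only the differences between versions of the file.
126+
You can look into the packfile to see how it saves space.
127+
The `git verify-pack` plumbing command lets you see what was packed: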
129128

130129
[source,console]
131130
----
@@ -156,9 +155,9 @@ chain length = 1: 3 objects
156155
.git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.pack: ok
157156
----
158157

159-
Here, the `033b4` blob, which if you remember was the first version of your repo.rb file, is referencing the `b042a` blob, which was the second version of the file.
160-
The third column in the output is the size of the object in the pack, so you can see that `b042a` takes up 22K of the file, but that `033b4` only takes up 9 bytes.
161-
What is also interesting is that the second version of the file is the one that is stored intact, whereas the original version is stored as a delta – this is because you're most likely to need faster access to the most recent version of the file.
158+
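Here, the `033b4` blob (the first version of the repo.rb file, if you remember) references the `b042a` blob, which is the second version of the file.
159+
The third column of the output shows the size of each object in the packfile; you can see that `b042a` takes up 22K, while `033b4` takes up only 9 bytes.
160+
Also interesting is that the second version is stored in full, while the original version is stored as a delta, because in most cases you need fast access to the latest version of the file.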
162161

163-
The really nice thing about this is that it can be repacked at any time.
164-
Git will occasionally repack your database automatically, always trying to save more space, but you can also manually repack at any time by running `git gc` by hand.
162+
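The best part is that you can repack at any time.
163+
Git often repacks the repository automatically to save space. Of course, you can also run the `git gc` command manually at any time to do so.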
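
Note on the packfiles.asc text: the delta idea it describes can be sketched in a few lines of Python. This is an illustration only and not Git's actual packfile delta encoding. `make_delta` and `apply_delta` are hypothetical helpers that handle only a shared prefix, and the byte strings are made-up stand-ins for the two versions of repo.rb. As in the `git verify-pack` output, the newer version is kept whole and the older one is expressed against it.

```python
def make_delta(base: bytes, target: bytes) -> list:
    """Describe `target` as copy/insert instructions against `base`.

    Simplified sketch: only the common prefix is copied. Git's real
    delta format can copy from arbitrary offsets in the base object.
    """
    n = 0
    limit = min(len(base), len(target))
    while n < limit and base[n] == target[n]:
        n += 1
    ops = []
    if n:
        ops.append(("copy", 0, n))          # reuse bytes base[0:n]
    if n < len(target):
        ops.append(("insert", target[n:]))  # literal bytes not found in base
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target object from `base` and the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


# Hypothetical stand-ins for the two versions of repo.rb:
# a 400-line file, then the same file with one line appended.
old_version = b"".join(b"line %d of a made-up source file\n" % i for i in range(400))
new_version = old_version + b"# testing\n"

# Keep the newer version intact and store the older one as a delta.
older_as_delta = make_delta(new_version, old_version)
assert apply_delta(new_version, older_as_delta) == old_version

print(len(new_version), "bytes stored in full")
print(older_as_delta)
```

Because the only change was a line appended at the end, the older version reduces to a single copy instruction, which is consistent with the first version occupying only a few bytes in the pack.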

status.json

Lines changed: 1 addition & 1 deletion
@@ -101,7 +101,7 @@
101101
"sections/environment.asc": 0,
102102
"sections/maintenance.asc": 0,
103103
"sections/objects.asc": 100,
104-
"sections/packfiles.asc": 0,
104+
"sections/packfiles.asc": 100,
105105
"sections/plumbing-porcelain.asc": 100,
106106
"sections/refs.asc": 100,
107107
"sections/refspec.asc": 100,
