Commit 60e779a

Merge branch 'jk/pack-corruption-post-mortem'

* jk/pack-corruption-post-mortem:
  howto: add article on recovering a corrupted object
2 parents c167b76 + 41dfbb2 commit 60e779a

2 files changed

+243
-0
lines changed

Documentation/Makefile

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ SP_ARTICLES += howto/setup-git-server-over-http
 SP_ARTICLES += howto/separating-topic-branches
 SP_ARTICLES += howto/revert-a-faulty-merge
 SP_ARTICLES += howto/recover-corrupted-blob-object
+SP_ARTICLES += howto/recover-corrupted-object-harder
 SP_ARTICLES += howto/rebuild-from-update-hook
 SP_ARTICLES += howto/rebase-from-internal-branch
 SP_ARTICLES += howto/maintain-git
Lines changed: 242 additions & 0 deletions
@@ -0,0 +1,242 @@
Date: Wed, 16 Oct 2013 04:34:01 -0400
From: Jeff King <[email protected]>
Subject: pack corruption post-mortem
Abstract: Recovering a corrupted object when no good copy is available.
Content-type: text/asciidoc

How to recover an object from scratch
=====================================

I was recently presented with a repository with a corrupted packfile,
and was asked if the data was recoverable. This post-mortem describes
the steps I took to investigate and fix the problem. I thought others
might find the process interesting, and it might help somebody in the
same situation.

********************************
Note: In this case, no good copy of the repository was available. For
the much easier case where you can get the corrupted object from
elsewhere, see link:recover-corrupted-blob-object.html[this howto].
********************************

I started with an fsck, which found a problem with exactly one object
(I've used $pack and $obj below to keep the output readable, and also
because I'll refer to them later):

-----------
$ git fsck
error: $pack SHA1 checksum mismatch
error: index CRC mismatch for object $obj from $pack at offset 51653873
error: inflate: data stream error (incorrect data check)
error: cannot unpack $obj from $pack at offset 51653873
-----------

The pack checksum failing means a byte is munged somewhere, and it is
presumably in the object mentioned (since both the index CRC and
zlib were failing).

Reading the zlib source code, I found that "incorrect data check" means
that the adler-32 checksum at the end of the zlib data did not match the
inflated data. So stepping the data through zlib would not help, as it
did not fail until the very end, when we realize the checksum does not
match. The problematic bytes could be anywhere in the object data.

The first thing I did was pull the broken data out of the packfile. I
needed to know how big the object was, which I found out with:

------------
$ git show-index <$idx | cut -d' ' -f1 | sort -n | grep -A1 51653873
51653873
51664736
------------

Show-index gives us the list of objects and their offsets. We throw away
everything but the offsets, and then sort them so that our interesting
offset (which we got from the fsck output above) is followed immediately
by the offset of the next object. Now we know that the object data is
10863 bytes long (51664736 - 51653873), and we can grab it with:

------------
dd if=$pack of=object bs=1 skip=51653873 count=10863
------------
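
The offset arithmetic above can be sketched in shell. This is only an
illustration: the `listing` variable holds made-up show-index-style
lines (the object ids and CRCs are placeholders, not values from the
real pack), and the grep pattern is anchored here so that the offset
cannot match as a substring of a larger number.

```shell
# Hypothetical show-index output: "<offset> <object id> (<crc>)".
# Only the two large offsets come from the article; the rest is made up.
listing='51664736 1111111111111111111111111111111111111111 (00000000)
12 2222222222222222222222222222222222222222 (00000000)
51653873 3333333333333333333333333333333333333333 (00000000)'

target=51653873

# Keep the offsets, sort numerically, and take the line after our offset.
next=$(printf '%s\n' "$listing" | cut -d' ' -f1 | sort -n |
	grep -A1 "^$target\$" | tail -n 1)

# The packed object occupies everything up to the next object's offset.
size=$((next - target))
echo "$size"
```

If the broken object were the last one in the pack, there would be no
following offset to subtract from; the object data would then end where
the 20-byte pack trailer begins.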

I inspected a hexdump of the data, looking for any obvious bogosity
(e.g., a 4K run of zeroes would be a good sign of filesystem
corruption). But everything looked pretty reasonable.

Note that the "object" file isn't fit for feeding straight to zlib; it
has the git packed object header, which is variable-length. We want to
strip that off so we can start playing with the zlib data directly. You
can either work your way through it manually (the format is described in
link:../technical/pack-format.html[Documentation/technical/pack-format.txt]),
or you can walk through it in a debugger. I did the latter, creating a
valid pack like:

------------
# pack magic and version
printf 'PACK\0\0\0\2' >tmp.pack
# pack has one object
printf '\0\0\0\1' >>tmp.pack
# now add our object data
cat object >>tmp.pack
# and then append the pack trailer
/path/to/git.git/test-sha1 -b <tmp.pack >trailer
cat trailer >>tmp.pack
------------

and then running "git index-pack tmp.pack" in the debugger (stop at
unpack_raw_entry). Doing this, I found that there were 3 bytes of header
(and the header itself had a sane type and size). So I stripped those
off with:

------------
dd if=object of=zlib bs=1 skip=3
------------
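
For readers who would rather decode the header by hand than in a
debugger, here is a rough shell sketch of the variable-length encoding
described in pack-format.txt: the first byte carries a continuation bit,
a 3-bit type and the low 4 bits of the size; each following byte carries
a continuation bit and 7 more size bits. The function name
`decode_header` and the sample bytes are made up for illustration (they
encode a blob of 40000 bytes, not the object from this article).

```shell
# Decode a packed object header given its bytes as arguments.
# Prints the object type number, the inflated size, and the header length.
decode_header () {
	type=0 size=0 shift_bits=4 n=0
	for byte in "$@"
	do
		n=$((n + 1))
		if [ "$n" -eq 1 ]
		then
			# bits 6-4: type, bits 3-0: lowest bits of the size
			type=$(( ($byte >> 4) & 7 ))
			size=$(( $byte & 15 ))
		else
			# 7 more size bits, least significant group first
			size=$(( size | (($byte & 127) << shift_bits) ))
			shift_bits=$((shift_bits + 7))
		fi
		# a clear high bit marks the last header byte
		if [ $(( $byte & 128 )) -eq 0 ]
		then
			break
		fi
	done
	echo "type=$type size=$size header_bytes=$n"
}

# Hypothetical header bytes: type 3 (blob), size 40000.
decode_header 0xb0 0xc4 0x13
```

Three header bytes hold 18 size bits, so a 3-byte header is what one
would expect for an object whose inflated size is between 2048 and
262143 bytes.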

I ran the result through zlib's inflate using a custom C program. And
while it did report the error, I did get the right number of output
bytes (i.e., it matched git's size header that we decoded above). But
feeding the result back to "git hash-object" didn't produce the same
sha1. So there were some wrong bytes, but I didn't know which. The file
happened to be C source code, so I hoped I could notice something
obviously wrong with it, but I didn't. I even got it to compile!

I also tried comparing it to other versions of the same path in the
repository, hoping that there would be some part of the diff that didn't
make sense. Unfortunately, this happened to be the only revision of this
particular file in the repository, so I had nothing to compare against.

So I took a different approach. Working under the guess that the
corruption was limited to a single byte, I wrote a program to munge each
byte individually, and try inflating the result. Since the object was
only 10K compressed, that worked out to about 2.8M attempts (10860
bytes, 256 candidate values each), which took a few minutes.

The program I used is here:

----------------------------------------------
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
#include <zlib.h>

static int try_zlib(unsigned char *buf, int len)
{
	/* make this absurdly large so we don't have to loop */
	static unsigned char out[1024*1024];
	z_stream z;
	int ret;

	memset(&z, 0, sizeof(z));
	inflateInit(&z);

	z.next_in = buf;
	z.avail_in = len;
	z.next_out = out;
	z.avail_out = sizeof(out);

	/* Z_OK and Z_STREAM_END are non-negative; zlib errors are negative */
	ret = inflate(&z, Z_NO_FLUSH);
	inflateEnd(&z);
	return ret >= 0;
}

/* eye candy */
static int counter = 0;
static void progress(int sig)
{
	fprintf(stderr, "\r%d", counter);
	alarm(1);
}

int main(void)
{
	/* oversized so we can read the whole buffer in */
	unsigned char buf[1024*1024];
	int len;
	int i, j;

	signal(SIGALRM, progress);
	alarm(1);

	len = read(0, buf, sizeof(buf));
	if (len < 0) {
		perror("read");
		return 1;
	}

	/* try every possible value at every position, one byte at a time */
	for (i = 0; i < len; i++) {
		unsigned char c = buf[i];
		for (j = 0; j <= 0xff; j++) {
			buf[i] = j;

			counter++;
			if (try_zlib(buf, len))
				printf("i=%d, j=%x\n", i, j);
		}
		/* restore the original byte before moving on */
		buf[i] = c;
	}

	alarm(0);
	fprintf(stderr, "\n");
	return 0;
}
----------------------------------------------

I compiled and ran with:

-------
gcc -Wall -Werror -O3 munge.c -o munge -lz
./munge <zlib
-------

There were a few false positives early on (if you write "no data" in the
zlib header, zlib thinks it's just fine :) ). But I got a hit about
halfway through:

-------
i=5642, j=c7
-------

I let it run to completion, and got a few more hits at the end (where it
was munging the adler-32 checksum to match our broken data). So there
was a good chance this middle hit was the source of the problem.

I confirmed by tweaking the byte in a hex editor, zlib inflating the
result (no errors!), and then piping the output into "git hash-object",
which reported the sha1 of the broken object. Success!

I fixed the packfile itself with:

-------
chmod +w $pack
printf '\xc7' | dd of=$pack bs=1 seek=51659518 conv=notrunc
chmod -w $pack
-------

The `\xc7` comes from the replacement byte our "munge" program found.
The offset 51659518 is derived by taking the original object offset
(51653873), adding the replacement offset found by "munge" (5642), and
then adding back in the 3 bytes of git header we stripped.
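
The offset arithmetic, and the effect of `conv=notrunc`, can be checked
with a small shell sketch. The numbers are the ones quoted above; the
variable names and the scratch file are only there for illustration, so
that the in-place overwrite can be tried without touching a real pack.

```shell
# Numbers from the article.
obj_offset=51653873   # start of the object within the pack
header_len=3          # git object header stripped before running munge
munge_index=5642      # index reported by munge, relative to the zlib data

patch_offset=$((obj_offset + header_len + munge_index))
echo "$patch_offset"

# Try the same dd invocation style on a throwaway file.
scratch=$(mktemp)
printf 'abcdef' >"$scratch"
printf 'X' | dd of="$scratch" bs=1 seek=2 conv=notrunc 2>/dev/null
patched=$(cat "$scratch")
rm -f "$scratch"
echo "$patched"
```

Without `conv=notrunc`, dd would truncate the output file right after
the byte it wrote, which is the last thing you want to happen to a
packfile.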

After that, "git fsck" ran clean.

As for the corruption itself, I was lucky that it was indeed a single
byte. In fact, it turned out to be a single bit. The byte 0xc7 was
corrupted to 0xc5. So presumably it was caused by faulty hardware, or a
cosmic ray.
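
The single-bit claim is easy to double-check: XOR the good and bad byte
values and count the set bits in the result. A minimal shell sketch (the
variable names are just for illustration):

```shell
good=0xc7
bad=0xc5

# XOR leaves a 1 in every bit position where the two bytes differ.
xor=$(( $good ^ $bad ))

# Count the set bits in the XOR result.
count=0
v=$xor
while [ "$v" -ne 0 ]
do
	count=$((count + (v & 1)))
	v=$((v >> 1))
done

printf 'xor=0x%02x differing_bits=%d\n' "$xor" "$count"
```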

And the aborted attempt to look at the inflated output to see what was
wrong? I could have looked forever and never found it. Here's the diff
between what the corrupted data inflates to, versus the real data:

--------------
-	cp = strtok (arg, "+");
+	cp = strtok (arg, ".");
--------------

It tweaked one byte and still ended up as valid, readable C that just
happened to do something totally different! One takeaway is that on a
less unlucky day, looking at the zlib output might have actually been
helpful, as most random changes would actually break the C code.

But more importantly, git's hashing and checksumming noticed a problem
that easily could have gone undetected in another system. The result
still compiled, but would have caused an interesting bug (that would
have been blamed on some random commit).
