
Commit 5143ac0

jhattongfs authored and gitster committed
Prevent git from rehashing 4GiB files
The index stores file sizes using a uint32_t. This causes any file whose size is a multiple of 2^32 to have a cached file size of zero. Zero is a special value used to mark racily clean entries, so git rehashes every such file each time git status or git commit is run.

This patch mitigates the problem by making all files whose size is a multiple of 2^32 appear to have a size of 1<<31 instead of zero. The value 1<<31 is chosen to keep it as far away from zero as possible, to help prevent things from getting mixed up with unpatched versions of git.

For example, suppose a file of size 2^32 is in the index written by a patched git. Patched git saves its size as 2^31 in the cache. An unpatched git reading that index will see that the file has changed in size and will rehash it, which is safe. For old git not to notice a change, the file would have to grow or shrink by exactly 2^31 while keeping its ctime, mtime, and other attributes unchanged.

This patch does not change the behavior for any file whose size is not an exact multiple of 2^32.

Signed-off-by: Jason D. Hatton <[email protected]>
Signed-off-by: brian m. carlson <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
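The wrap-around and the fix can be reproduced in a few lines of C. This is a minimal sketch, not code from git: the helper names `size_before_patch()` and `size_after_patch()` are hypothetical, and the sketch assumes a 64-bit `off_t` (as on typical 64-bit Linux builds). It shows the value the 32-bit index field ends up holding before and after this patch.

```c
#include <stdint.h>
#include <sys/types.h> /* off_t */

/* Hypothetical helper: the value an unpatched git keeps in the 32-bit field. */
static uint32_t size_before_patch(off_t st_size)
{
	/* Conversion to uint32_t keeps only st_size modulo 2^32. */
	return (uint32_t)st_size;
}

/* Hypothetical helper: the value kept once the patch's rule is applied. */
static uint32_t size_after_patch(off_t st_size)
{
	uint32_t low = (uint32_t)st_size;

	/*
	 * A non-empty file whose low 32 bits are all zero would otherwise be
	 * stored as 0, the racily-clean marker; store 1<<31 instead.
	 */
	if (low == 0 && st_size != 0)
		return UINT32_C(1) << 31;
	return low;
}
```

Under these rules the three sparse files used by the new test (2 GiB, 4 GiB and 8 GiB) are all recorded as 0x80000000: the 2 GiB file because that is already its low 32 bits, the 4 GiB and 8 GiB files because of the substitution. A size whose low 32 bits are nonzero, such as 0x100000001, keeps its truncated value (1 here), unchanged from the old behavior.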
1 parent 678eb55 commit 5143ac0


2 files changed, +34 -2 lines changed


statinfo.c

Lines changed: 18 additions & 2 deletions
@@ -2,6 +2,22 @@
 #include "environment.h"
 #include "statinfo.h"
 
+/*
+ * Munge st_size into an unsigned int.
+ */
+static unsigned int munge_st_size(off_t st_size) {
+	unsigned int sd_size = st_size;
+
+	/*
+	 * If the file is an exact multiple of 4 GiB, modify the value so it
+	 * doesn't get marked as racily clean (zero).
+	 */
+	if (!sd_size && st_size)
+		return 0x80000000;
+	else
+		return sd_size;
+}
+
 void fill_stat_data(struct stat_data *sd, struct stat *st)
 {
 	sd->sd_ctime.sec = (unsigned int)st->st_ctime;
@@ -12,7 +28,7 @@ void fill_stat_data(struct stat_data *sd, struct stat *st)
 	sd->sd_ino = st->st_ino;
 	sd->sd_uid = st->st_uid;
 	sd->sd_gid = st->st_gid;
-	sd->sd_size = st->st_size;
+	sd->sd_size = munge_st_size(st->st_size);
 }
 
 int match_stat_data(const struct stat_data *sd, struct stat *st)
@@ -51,7 +67,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
 		changed |= INODE_CHANGED;
 #endif
 
-	if (sd->sd_size != (unsigned int) st->st_size)
+	if (sd->sd_size != munge_st_size(st->st_size))
 		changed |= DATA_CHANGED;
 
 	return changed;
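The cross-version reasoning in the commit message can be checked the same way. The sketch below is illustrative only: `munge()`, `size_changed_unpatched()` and `size_changed_patched()` are hypothetical stand-ins that model just the size comparison that sets `DATA_CHANGED` in `match_stat_data()`, ignoring the timestamp, inode and ownership checks that accompany it. It again assumes a 64-bit `off_t`.

```c
#include <stdint.h>
#include <sys/types.h> /* off_t */

/* Same rule as munge_st_size() in the patch, written with uint32_t. */
static uint32_t munge(off_t st_size)
{
	uint32_t sd_size = (uint32_t)st_size;

	return (sd_size == 0 && st_size != 0) ? UINT32_C(0x80000000) : sd_size;
}

/* Size test before the patch: compare the cached value with the truncated size. */
static int size_changed_unpatched(uint32_t cached, off_t st_size)
{
	return cached != (uint32_t)st_size;
}

/* Size test after the patch: compare the cached value with the munged size. */
static int size_changed_patched(uint32_t cached, off_t st_size)
{
	return cached != munge(st_size);
}
```

With `cached = munge(0x100000000)` (that is, 0x80000000), the patched check reports no change for the same 4 GiB file, while the unpatched check reports a change and falls back to rehashing. Both checks report no change when the file becomes 0x80000000 or 0x180000000 bytes, a shrink or growth of exactly 2^31; in that case only the other stat fields can reveal the modification, which is the caveat the commit message describes for old git.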

t/t7508-status.sh

Lines changed: 16 additions & 0 deletions
@@ -1745,4 +1745,20 @@ test_expect_success 'slow status advice when core.untrackedCache true, and fsmon
 	)
 '
 
+test_expect_success EXPENSIVE 'status does not re-read unchanged 4 or 8 GiB file' '
+	(
+		mkdir large-file &&
+		cd large-file &&
+		# Files are 2 GiB, 4 GiB, and 8 GiB sparse files.
+		test-tool truncate file-a 0x080000000 &&
+		test-tool truncate file-b 0x100000000 &&
+		test-tool truncate file-c 0x200000000 &&
+		# This will be slow.
+		git add file-a file-b file-c &&
+		git commit -m "add large files" &&
+		git diff-index HEAD file-a file-b file-c >actual &&
+		test_must_be_empty actual
+	)
+'
+
 test_done
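The new test relies on `test-tool truncate` to create large sparse files cheaply. How that helper is implemented is not shown here; the following is a hypothetical POSIX sketch of the same idea, using `open(2)`, `ftruncate(2)` and `fstat(2)` to extend a file to a given length without writing any data. The function name `make_sparse_file()` is invented for illustration, and sizes of 4 GiB and up need a 64-bit `off_t` (on 32-bit glibc systems, build with `-D_FILE_OFFSET_BITS=64`).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* open */
#include <sys/stat.h>  /* fstat, struct stat */
#include <sys/types.h> /* off_t */
#include <unistd.h>    /* ftruncate, close */

/*
 * Hypothetical helper: create (or empty) `path`, then set its length to
 * `size` bytes with ftruncate(2). The new range reads back as zeros; on
 * file systems with sparse-file support no data blocks are allocated for it.
 * Returns the st_size reported by fstat(2), or -1 on error.
 */
static off_t make_sparse_file(const char *path, off_t size)
{
	struct stat st;
	off_t result = -1;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, size) == 0 && fstat(fd, &st) == 0)
		result = st.st_size;
	close(fd);
	return result;
}
```

Whether the zero-filled range is actually left unallocated depends on the file system. Hashing such a file still means reading every byte, which is why the test carries the EXPENSIVE prerequisite and notes that `git add` will be slow.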
