Skip to content

Commit 519e17f

Browse files
ttaylorrgitster
authored andcommitted
pack-bitmap: prepare to mark objects from multiple packs for reuse
Now that the pack-objects code is equipped to handle reusing objects from multiple packs, prepare the pack-bitmap code to mark objects from multiple packs as reuse candidates. In order to prepare the pack-bitmap code for this change, remove the same set of assumptions we unwound in previous commits from the helper function `reuse_partial_packfile_from_bitmap_1()`, in preparation for it to be called in a loop over the set of bitmapped packs in a following commit. Most importantly, we can no longer assume that the bit position corresponding to the first object in a given reuse pack candidate is at the beginning of the bitmap itself. For the single pack that this assumption is still true for (in MIDX bitmaps, this is the preferred pack, in single-pack bitmaps it is the pack the bitmap is tied to), we can still use our whole-words optimization. But for all subsequent packs, we can not make use of this optimization, since it assumes that all delta bases are being sent from the same pack, which would break if we are sending OFS_DELTAs down to the client. To understand why, consider two packs, P1 and P2 where: - P1 has object A which is a delta on base B - P2 has its own copy of B, in addition to other objects Suppose that the MIDX which covers P1 and P2 selected its copy of A from P1, but selected its copy of B from P2. Since A is a delta of B, but the base was selected from a different pack, sending the bytes corresponding to A as an OFS_DELTA verbatim from P1 would be incorrect, since we don't guarantee that B is in the same place relative to A in the generated pack as in P1. For now, we detect and reject these cross-pack deltas by searching for the (pack_id, offset) pair for the delta's base object (using the same pack_id as the pack containing the delta'd object) in the MIDX. If we find a match, that means that the MIDX did indeed pick the base object from the same pack, and we are OK to reuse the delta. If we don't find a match, however, that means that the base object was selected from a different pack in the MIDX, and we can let the slower path handle re-delta'ing our candidate object. In the future, there are a couple of other things we could do, namely: - Turn any cross-pack deltas (which are stored as OFS_DELTAs) into REF_DELTAs. We already do this today when reusing an OFS_DELTA without `--delta-base-offset` enabled, so it's not a huge stretch to do the same for cross-pack deltas even when `--delta-base-offset` is enabled. This would work, but would obviously result in larger-than-necessary packs, as we in theory *could* represent these cross-pack deltas by patching an existing OFS_DELTA. But it's not clear how much that would matter in practice. I suspect it would have a lot to do with how you pack your repository in the first place. - Finally, we could patch OFS_DELTAs across packs in a similar fashion as we do today for OFS_DELTAs within a single pack on either side of a gap. This would result in the smallest packs of the three options here, but implementing this would be more involved. At minimum, you'd have to keep the reusable chunks list for all reused packs, not just the one we're currently processing. And you'd have to ensure that any bases which are a part of cross-pack deltas appear before the delta. I think this is possible to do, but would require assembling the reusable chunks list potentially in a different order than they appear in the source packs. For now, let's pursue the simplest approach and reject any cross-pack deltas. Signed-off-by: Taylor Blau <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent dbd5c52 commit 519e17f

File tree

1 file changed

+106
-66
lines changed

1 file changed

+106
-66
lines changed

pack-bitmap.c

Lines changed: 106 additions & 66 deletions
Original file line numberDiff line numberDiff line change
@@ -1841,49 +1841,29 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
18411841
* -1 means "stop trying further objects"; 0 means we may or may not have
18421842
* reused, but you can keep feeding bits.
18431843
*/
1844-
static int try_partial_reuse(struct bitmapped_pack *pack,
1845-
size_t pos,
1844+
static int try_partial_reuse(struct bitmap_index *bitmap_git,
1845+
struct bitmapped_pack *pack,
1846+
size_t bitmap_pos,
1847+
uint32_t pack_pos,
18461848
struct bitmap *reuse,
18471849
struct pack_window **w_curs)
18481850
{
18491851
off_t offset, delta_obj_offset;
18501852
enum object_type type;
18511853
unsigned long size;
18521854

1853-
/*
1854-
* try_partial_reuse() is called either on (a) objects in the
1855-
* bitmapped pack (in the case of a single-pack bitmap) or (b)
1856-
* objects in the preferred pack of a multi-pack bitmap.
1857-
* Importantly, the latter can pretend as if only a single pack
1858-
* exists because:
1859-
*
1860-
* - The first pack->num_objects bits of a MIDX bitmap are
1861-
* reserved for the preferred pack, and
1862-
*
1863-
* - Ties due to duplicate objects are always resolved in
1864-
* favor of the preferred pack.
1865-
*
1866-
* Therefore we do not need to ever ask the MIDX for its copy of
1867-
* an object by OID, since it will always select it from the
1868-
* preferred pack. Likewise, the selected copy of the base
1869-
* object for any deltas will reside in the same pack.
1870-
*
1871-
* This means that we can reuse pos when looking up the bit in
1872-
* the reuse bitmap, too, since bits corresponding to the
1873-
* preferred pack precede all bits from other packs.
1874-
*/
1855+
if (pack_pos >= pack->p->num_objects)
1856+
return -1; /* not actually in the pack */
18751857

1876-
if (pos >= pack->p->num_objects)
1877-
return -1; /* not actually in the pack or MIDX preferred pack */
1878-
1879-
offset = delta_obj_offset = pack_pos_to_offset(pack->p, pos);
1858+
offset = delta_obj_offset = pack_pos_to_offset(pack->p, pack_pos);
18801859
type = unpack_object_header(pack->p, w_curs, &offset, &size);
18811860
if (type < 0)
18821861
return -1; /* broken packfile, punt */
18831862

18841863
if (type == OBJ_REF_DELTA || type == OBJ_OFS_DELTA) {
18851864
off_t base_offset;
18861865
uint32_t base_pos;
1866+
uint32_t base_bitmap_pos;
18871867

18881868
/*
18891869
* Find the position of the base object so we can look it up
@@ -1897,20 +1877,44 @@ static int try_partial_reuse(struct bitmapped_pack *pack,
18971877
delta_obj_offset);
18981878
if (!base_offset)
18991879
return 0;
1900-
if (offset_to_pack_pos(pack->p, base_offset, &base_pos) < 0)
1901-
return 0;
19021880

1903-
/*
1904-
* We assume delta dependencies always point backwards. This
1905-
* lets us do a single pass, and is basically always true
1906-
* due to the way OFS_DELTAs work. You would not typically
1907-
* find REF_DELTA in a bitmapped pack, since we only bitmap
1908-
* packs we write fresh, and OFS_DELTA is the default). But
1909-
* let's double check to make sure the pack wasn't written with
1910-
* odd parameters.
1911-
*/
1912-
if (base_pos >= pos)
1913-
return 0;
1881+
offset_to_pack_pos(pack->p, base_offset, &base_pos);
1882+
1883+
if (bitmap_is_midx(bitmap_git)) {
1884+
/*
1885+
* Cross-pack deltas are rejected for now, but could
1886+
* theoretically be supported in the future.
1887+
*
1888+
* We would need to ensure that we're sending both
1889+
* halves of the delta/base pair, regardless of whether
1890+
* or not the two cross a pack boundary. If they do,
1891+
* then we must convert the delta to an REF_DELTA to
1892+
* refer back to the base in the other pack.
1893+
* */
1894+
if (midx_pair_to_pack_pos(bitmap_git->midx,
1895+
pack->pack_int_id,
1896+
base_offset,
1897+
&base_bitmap_pos) < 0) {
1898+
return 0;
1899+
}
1900+
} else {
1901+
if (offset_to_pack_pos(pack->p, base_offset,
1902+
&base_pos) < 0)
1903+
return 0;
1904+
/*
1905+
* We assume delta dependencies always point backwards.
1906+
* This lets us do a single pass, and is basically
1907+
* always true due to the way OFS_DELTAs work. You would
1908+
* not typically find REF_DELTA in a bitmapped pack,
1909+
* since we only bitmap packs we write fresh, and
1910+
* OFS_DELTA is the default). But let's double check to
1911+
* make sure the pack wasn't written with odd
1912+
* parameters.
1913+
*/
1914+
if (base_pos >= pack_pos)
1915+
return 0;
1916+
base_bitmap_pos = pack->bitmap_pos + base_pos;
1917+
}
19141918

19151919
/*
19161920
* And finally, if we're not sending the base as part of our
@@ -1920,14 +1924,14 @@ static int try_partial_reuse(struct bitmapped_pack *pack,
19201924
* to REF_DELTA on the fly. Better to just let the normal
19211925
* object_entry code path handle it.
19221926
*/
1923-
if (!bitmap_get(reuse, pack->bitmap_pos + base_pos))
1927+
if (!bitmap_get(reuse, base_bitmap_pos))
19241928
return 0;
19251929
}
19261930

19271931
/*
19281932
* If we got here, then the object is OK to reuse. Mark it.
19291933
*/
1930-
bitmap_set(reuse, pack->bitmap_pos + pos);
1934+
bitmap_set(reuse, bitmap_pos);
19311935
return 0;
19321936
}
19331937

@@ -1937,36 +1941,72 @@ static void reuse_partial_packfile_from_bitmap_1(struct bitmap_index *bitmap_git
19371941
{
19381942
struct bitmap *result = bitmap_git->result;
19391943
struct pack_window *w_curs = NULL;
1940-
size_t i = 0;
1941-
1942-
while (i < result->word_alloc && result->words[i] == (eword_t)~0)
1943-
i++;
1944-
1945-
/*
1946-
* Don't mark objects not in the packfile or preferred pack. This bitmap
1947-
* marks objects eligible for reuse, but the pack-reuse code only
1948-
* understands how to reuse a single pack. Since the preferred pack is
1949-
* guaranteed to have all bases for its deltas (in a multi-pack bitmap),
1950-
* we use it instead of another pack. In single-pack bitmaps, the choice
1951-
* is made for us.
1952-
*/
1953-
if (i > pack->p->num_objects / BITS_IN_EWORD)
1954-
i = pack->p->num_objects / BITS_IN_EWORD;
1944+
size_t pos = pack->bitmap_pos / BITS_IN_EWORD;
19551945

1956-
memset(reuse->words, 0xFF, i * sizeof(eword_t));
1946+
if (!pack->bitmap_pos) {
1947+
/*
1948+
* If we're processing the first (in the case of a MIDX, the
1949+
* preferred pack) or the only (in the case of single-pack
1950+
* bitmaps) pack, then we can reuse whole words at a time.
1951+
*
1952+
* This is because we know that any deltas in this range *must*
1953+
* have their bases chosen from the same pack, since:
1954+
*
1955+
* - In the single pack case, there is no other pack to choose
1956+
* them from.
1957+
*
1958+
* - In the MIDX case, the first pack is the preferred pack, so
1959+
* all ties are broken in favor of that pack (i.e. the one
1960+
* we're currently processing). So any duplicate bases will be
1961+
* resolved in favor of the pack we're processing.
1962+
*/
1963+
while (pos < result->word_alloc &&
1964+
pos < pack->bitmap_nr / BITS_IN_EWORD &&
1965+
result->words[pos] == (eword_t)~0)
1966+
pos++;
1967+
memset(reuse->words, 0xFF, pos * sizeof(eword_t));
1968+
}
19571969

1958-
for (; i < result->word_alloc; ++i) {
1959-
eword_t word = result->words[i];
1960-
size_t pos = (i * BITS_IN_EWORD);
1970+
for (; pos < result->word_alloc; pos++) {
1971+
eword_t word = result->words[pos];
19611972
size_t offset;
19621973

1963-
for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
1964-
if ((word >> offset) == 0)
1974+
for (offset = 0; offset < BITS_IN_EWORD; offset++) {
1975+
size_t bit_pos;
1976+
uint32_t pack_pos;
1977+
1978+
if (word >> offset == 0)
19651979
break;
19661980

19671981
offset += ewah_bit_ctz64(word >> offset);
1968-
if (try_partial_reuse(pack, pos + offset,
1969-
reuse, &w_curs) < 0) {
1982+
1983+
bit_pos = pos * BITS_IN_EWORD + offset;
1984+
if (bit_pos < pack->bitmap_pos)
1985+
continue;
1986+
if (bit_pos >= pack->bitmap_pos + pack->bitmap_nr)
1987+
goto done;
1988+
1989+
if (bitmap_is_midx(bitmap_git)) {
1990+
uint32_t midx_pos;
1991+
off_t ofs;
1992+
1993+
midx_pos = pack_pos_to_midx(bitmap_git->midx, bit_pos);
1994+
ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
1995+
1996+
if (offset_to_pack_pos(pack->p, ofs, &pack_pos) < 0)
1997+
BUG("could not find object in pack %s "
1998+
"at offset %"PRIuMAX" in MIDX",
1999+
pack_basename(pack->p), (uintmax_t)ofs);
2000+
} else {
2001+
pack_pos = cast_size_t_to_uint32_t(st_sub(bit_pos, pack->bitmap_pos));
2002+
if (pack_pos >= pack->p->num_objects)
2003+
BUG("advanced beyond the end of pack %s (%"PRIuMAX" > %"PRIu32")",
2004+
pack_basename(pack->p), (uintmax_t)pack_pos,
2005+
pack->p->num_objects);
2006+
}
2007+
2008+
if (try_partial_reuse(bitmap_git, pack, bit_pos,
2009+
pack_pos, reuse, &w_curs) < 0) {
19702010
/*
19712011
* try_partial_reuse indicated we couldn't reuse
19722012
* any bits, so there is no point in trying more

0 commit comments

Comments
 (0)