
Commit b757353

ttaylorr authored and gitster committed
builtin/pack-objects.c: --cruft without expiration
Teach `pack-objects` how to generate a cruft pack when no objects are
dropped (i.e., `--cruft-expiration=never`). Later patches will teach
`pack-objects` how to generate a cruft pack that prunes objects.

When generating a cruft pack which does not prune objects, we want to
collect all unreachable objects into a single pack (noting and updating
their mtimes as we accumulate them). Ordinary use will pass the result
of a `git repack -A` as a kept pack, so when this patch says "kept
pack", readers should think "reachable objects".

Generating a non-expiring cruft pack works as follows:

  - Callers provide a list of every pack they know about, and indicate
    which packs are about to be removed.

  - All packs which are going to survive are marked as kept in-core;
    the packs which are going to be removed (we'll call these the
    redundant ones) are not.

    Any packs the caller did not mention (but are known to the
    `pack-objects` process) are also marked as kept in-core. Packs not
    mentioned by the caller are assumed to be unknown to them, i.e.,
    they entered the repository after the caller decided which packs
    should be kept and which should be discarded. We do not want to
    include objects from these "unknown" packs, because we don't know
    which of their objects are or aren't reachable.

  - Then, we enumerate all objects in the repository, and add them to
    our packing list if they do not appear in an in-core kept pack.

This results in a new cruft pack which contains all known objects that
aren't included in the kept packs. When the kept pack is the result of
`git repack -A`, the resulting pack contains all unreachable objects.

Signed-off-by: Taylor Blau <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
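The pack-list handling described above can be modeled in a few lines of standalone C. This is a simplified sketch, not the code from this commit: the hypothetical `classify_pack()` scans the lines a caller would write to stdin (a bare pack name for a surviving pack, a `-`-prefixed name for a redundant one) and reports what happens to a pack that `pack-objects` finds on disk. Like `read_cruft_objects()`, it checks the surviving list before the redundant one, but it uses linear scans where the patch sorts two `string_list`s and binary-searches them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome for a pack that pack-objects found on disk. */
enum pack_fate {
	PACK_KEPT_FRESH,   /* listed without '-': survives, kept in-core */
	PACK_REDUNDANT,    /* listed with '-': about to be removed, not kept */
	PACK_KEPT_UNKNOWN  /* not listed at all: kept in-core to be safe */
};

/*
 * Classify `pack_name` against the caller-supplied lines. Empty lines
 * are ignored, as in the patch. Surviving packs are checked first.
 */
static enum pack_fate classify_pack(const char **lines, size_t nr,
				    const char *pack_name)
{
	size_t i;

	assert(pack_name != NULL);
	for (i = 0; i < nr; i++)
		if (lines[i][0] && lines[i][0] != '-' &&
		    !strcmp(lines[i], pack_name))
			return PACK_KEPT_FRESH;
	for (i = 0; i < nr; i++)
		if (lines[i][0] == '-' && !strcmp(lines[i] + 1, pack_name))
			return PACK_REDUNDANT;
	return PACK_KEPT_UNKNOWN;
}

/* Only packs that are not kept in-core have their objects enumerated. */
static int pack_contributes_objects(enum pack_fate fate)
{
	return fate == PACK_REDUNDANT;
}
```

In this model only redundant packs feed objects into the cruft pack; in the patch, loose objects are enumerated separately by `add_unreachable_loose_objects()`.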
1 parent fa23090 commit b757353


5 files changed: +448 -5 lines changed

Documentation/git-pack-objects.txt

Lines changed: 30 additions & 0 deletions
@@ -13,6 +13,7 @@ SYNOPSIS
 	[--no-reuse-delta] [--delta-base-offset] [--non-empty]
 	[--local] [--incremental] [--window=<n>] [--depth=<n>]
 	[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
+	[--cruft] [--cruft-expiration=<time>]
 	[--stdout [--filter=<filter-spec>] | <base-name>]
 	[--shallow] [--keep-true-parents] [--[no-]sparse] < <object-list>
 
@@ -95,6 +96,35 @@ base-name::
 	Incompatible with `--revs`, or options that imply `--revs` (such as
 	`--all`), with the exception of `--unpacked`, which is compatible.
 
+--cruft::
+	Packs unreachable objects into a separate "cruft" pack, denoted
+	by the existence of a `.mtimes` file. Typically used by `git
+	repack --cruft`. Callers provide a list of pack names and
+	indicate which packs will remain in the repository, along with
+	which packs will be deleted (indicated by the `-` prefix). The
+	contents of the cruft pack are all objects not contained in the
+	surviving packs which have not exceeded the grace period (see
+	`--cruft-expiration` below), or which have exceeded the grace
+	period, but are reachable from another object which hasn't.
++
+When the input lists a pack containing all reachable objects (and lists
+all other packs as pending deletion), the corresponding cruft pack will
+contain all unreachable objects (with mtime newer than the
+`--cruft-expiration`) along with any unreachable objects whose mtime is
+older than the `--cruft-expiration`, but which are reachable from an
+unreachable object whose mtime is newer than the `--cruft-expiration`.
++
+Incompatible with `--unpack-unreachable`, `--keep-unreachable`,
+`--pack-loose-unreachable`, `--stdin-packs`, as well as any other
+options which imply `--revs`. Also incompatible with `--max-pack-size`;
+when this option is set, the maximum pack size is not inferred from
+`pack.packSizeLimit`.
+
+--cruft-expiration=<approxidate>::
+	If specified, objects are eliminated from the cruft pack if they
+	have an mtime older than `<approxidate>`. If unspecified (and
+	given `--cruft`), then no objects are eliminated.
+
 --window=<n>::
 --depth=<n>::
 	These two options affect how the objects contained in
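The retention rule documented for `--cruft` can be read as a reachability problem over unreachable objects. This commit only documents it: `read_cruft_objects()` still dies with "--cruft-expiration not yet implemented" when an expiration is set. The sketch below is a hypothetical model over a toy object graph (`struct toy_object`, `retain_from()` and `compute_retained()` are made up for illustration), assuming "older than" means strictly less than the cutoff: every object at or above the cutoff is retained along with everything it reaches, and a cutoff of 0 retains everything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_EDGES 4

/* Hypothetical unreachable object: its mtime and the objects it links to. */
struct toy_object {
	uint32_t mtime;
	size_t nr_edges;
	size_t edges[TOY_MAX_EDGES]; /* indices into the same array */
	int retain;                  /* output: 1 if kept in the cruft pack */
};

/* Depth-first walk marking `i` and everything it reaches as retained. */
static void retain_from(struct toy_object *objs, size_t nr, size_t i)
{
	size_t e;

	assert(i < nr);
	if (objs[i].retain)
		return; /* already visited */
	objs[i].retain = 1;
	for (e = 0; e < objs[i].nr_edges; e++)
		retain_from(objs, nr, objs[i].edges[e]);
}

/*
 * An expiration of 0 models "no --cruft-expiration": nothing is
 * eliminated. Otherwise, objects whose mtime is not older than the
 * cutoff are retained, and so is anything reachable from them.
 */
static void compute_retained(struct toy_object *objs, size_t nr,
			     uint32_t expiration)
{
	size_t i;

	for (i = 0; i < nr; i++)
		objs[i].retain = 0;
	for (i = 0; i < nr; i++)
		if (!expiration || objs[i].mtime >= expiration)
			retain_from(objs, nr, i);
}
```

For instance, given a fresh object (mtime 100) that points at an old one (mtime 10), plus a second old object that nothing points at, a cutoff of 50 retains the first two and drops the third.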

builtin/pack-objects.c

Lines changed: 197 additions & 4 deletions
@@ -36,6 +36,7 @@
 #include "trace2.h"
 #include "shallow.h"
 #include "promisor-remote.h"
+#include "pack-mtimes.h"
 
 /*
  * Objects we are going to pack are collected in the `to_pack` structure.
@@ -194,6 +195,8 @@ static int reuse_delta = 1, reuse_object = 1;
 static int keep_unreachable, unpack_unreachable, include_tag;
 static timestamp_t unpack_unreachable_expiration;
 static int pack_loose_unreachable;
+static int cruft;
+static timestamp_t cruft_expiration;
 static int local;
 static int have_non_local_packs;
 static int incremental;
@@ -1260,6 +1263,9 @@ static void write_pack_file(void)
 					&to_pack, written_list, nr_written);
 			}
 
+			if (cruft)
+				pack_idx_opts.flags |= WRITE_MTIMES;
+
 			stage_tmp_packfiles(&tmpname, pack_tmp_name,
 					    written_list, nr_written,
 					    &to_pack, &pack_idx_opts, hash,
@@ -3397,6 +3403,135 @@ static void read_packs_list_from_stdin(void)
 	string_list_clear(&exclude_packs, 0);
 }
 
+static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
+				   struct packed_git *pack, off_t offset,
+				   const char *name, uint32_t mtime)
+{
+	struct object_entry *entry;
+
+	display_progress(progress_state, ++nr_seen);
+
+	entry = packlist_find(&to_pack, oid);
+	if (entry) {
+		if (name) {
+			entry->hash = pack_name_hash(name);
+			entry->no_try_delta = no_try_delta(name);
+		}
+	} else {
+		if (!want_object_in_pack(oid, 0, &pack, &offset))
+			return;
+		if (!pack && type == OBJ_BLOB && !has_loose_object(oid)) {
+			/*
+			 * If a traversed tree has a missing blob then we want
+			 * to avoid adding that missing object to our pack.
+			 *
+			 * This only applies to missing blobs, not trees,
+			 * because the traversal needs to parse sub-trees but
+			 * not blobs.
+			 *
+			 * Note we only perform this check when we couldn't
+			 * already find the object in a pack, so we're really
+			 * limited to "ensure non-tip blobs which don't exist in
+			 * packs do exist via loose objects". Confused?
+			 */
+			return;
+		}
+
+		entry = create_object_entry(oid, type, pack_name_hash(name),
+					    0, name && no_try_delta(name),
+					    pack, offset);
+	}
+
+	if (mtime > oe_cruft_mtime(&to_pack, entry))
+		oe_set_cruft_mtime(&to_pack, entry, mtime);
+	return;
+}
+
+static void mark_pack_kept_in_core(struct string_list *packs, unsigned keep)
+{
+	struct string_list_item *item = NULL;
+	for_each_string_list_item(item, packs) {
+		struct packed_git *p = item->util;
+		if (!p)
+			die(_("could not find pack '%s'"), item->string);
+		p->pack_keep_in_core = keep;
+	}
+}
+
+static void add_unreachable_loose_objects(void);
+static void add_objects_in_unpacked_packs(void);
+
+static void enumerate_cruft_objects(void)
+{
+	if (progress)
+		progress_state = start_progress(_("Enumerating cruft objects"), 0);
+
+	add_objects_in_unpacked_packs();
+	add_unreachable_loose_objects();
+
+	stop_progress(&progress_state);
+}
+
+static void read_cruft_objects(void)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list discard_packs = STRING_LIST_INIT_DUP;
+	struct string_list fresh_packs = STRING_LIST_INIT_DUP;
+	struct packed_git *p;
+
+	ignore_packed_keep_in_core = 1;
+
+	while (strbuf_getline(&buf, stdin) != EOF) {
+		if (!buf.len)
+			continue;
+
+		if (*buf.buf == '-')
+			string_list_append(&discard_packs, buf.buf + 1);
+		else
+			string_list_append(&fresh_packs, buf.buf);
+		strbuf_reset(&buf);
+	}
+
+	string_list_sort(&discard_packs);
+	string_list_sort(&fresh_packs);
+
+	for (p = get_all_packs(the_repository); p; p = p->next) {
+		const char *pack_name = pack_basename(p);
+		struct string_list_item *item;
+
+		item = string_list_lookup(&fresh_packs, pack_name);
+		if (!item)
+			item = string_list_lookup(&discard_packs, pack_name);
+
+		if (item) {
+			item->util = p;
+		} else {
+			/*
+			 * This pack wasn't mentioned in either the "fresh" or
+			 * "discard" list, so the caller didn't know about it.
+			 *
+			 * Mark it as kept so that its objects are ignored by
+			 * add_unseen_recent_objects_to_traversal(). We'll
+			 * unmark it before starting the traversal so it doesn't
+			 * halt the traversal early.
+			 */
+			p->pack_keep_in_core = 1;
+		}
+	}
+
+	mark_pack_kept_in_core(&fresh_packs, 1);
+	mark_pack_kept_in_core(&discard_packs, 0);
+
+	if (cruft_expiration)
+		die("--cruft-expiration not yet implemented");
+	else
+		enumerate_cruft_objects();
+
+	strbuf_release(&buf);
+	string_list_clear(&discard_packs, 0);
+	string_list_clear(&fresh_packs, 0);
+}
+
 static void read_object_list_from_stdin(void)
 {
 	char line[GIT_MAX_HEXSZ + 1 + PATH_MAX + 2];
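When `add_cruft_object_entry()` sees an object that is already in `to_pack` (for example, one copy loose and one in a redundant pack), it reuses the existing entry and only ever raises the recorded mtime, so the newest copy decides how "fresh" the object is. A minimal model of that accumulation follows, using a hypothetical fixed-size table keyed by object ID (`struct toy_packlist`, `toy_find()`, `toy_add_cruft()`) in place of Git's `packlist_find()` and `create_object_entry()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16
#define TOY_OID_LEN 64 /* room for a hex SHA-256 object ID */

/* Hypothetical packing-list entry: object ID plus accumulated cruft mtime. */
struct toy_entry {
	char oid[TOY_OID_LEN + 1];
	uint32_t mtime;
};

struct toy_packlist {
	struct toy_entry entries[TOY_MAX_ENTRIES];
	size_t nr;
};

static struct toy_entry *toy_find(struct toy_packlist *pl, const char *oid)
{
	size_t i;

	for (i = 0; i < pl->nr; i++)
		if (!strcmp(pl->entries[i].oid, oid))
			return &pl->entries[i];
	return NULL;
}

/* Add `oid` if it is new; either way, keep the newest mtime seen so far. */
static void toy_add_cruft(struct toy_packlist *pl, const char *oid,
			  uint32_t mtime)
{
	struct toy_entry *e = toy_find(pl, oid);

	if (!e) {
		assert(pl->nr < TOY_MAX_ENTRIES);
		assert(strlen(oid) <= TOY_OID_LEN);
		e = &pl->entries[pl->nr++];
		strcpy(e->oid, oid);
		e->mtime = 0;
	}
	if (mtime > e->mtime)
		e->mtime = mtime;
}
```

Taking the maximum makes the result independent of the order in which packs and loose objects are enumerated.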
@@ -3529,7 +3664,24 @@ static int add_object_in_unpacked_pack(const struct object_id *oid,
 				       uint32_t pos,
 				       void *_data)
 {
-	add_object_entry(oid, OBJ_NONE, "", 0);
+	if (cruft) {
+		off_t offset;
+		time_t mtime;
+
+		if (pack->is_cruft) {
+			if (load_pack_mtimes(pack) < 0)
+				die(_("could not load cruft pack .mtimes"));
+			mtime = nth_packed_mtime(pack, pos);
+		} else {
+			mtime = pack->mtime;
+		}
+		offset = nth_packed_object_offset(pack, pos);
+
+		add_cruft_object_entry(oid, OBJ_NONE, pack, offset,
+				       NULL, mtime);
+	} else {
+		add_object_entry(oid, OBJ_NONE, "", 0);
+	}
 	return 0;
 }
 
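The mtime passed to `add_cruft_object_entry()` for a packed object depends on the pack it came from: an existing cruft pack supplies a per-object value from its `.mtimes` file (via `nth_packed_mtime()`), while any other pack lends its own mtime to every object it contains. A hypothetical model of that choice follows, with a made-up `struct toy_pack` standing in for the few `struct packed_git` fields involved:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of `struct packed_git` used here. */
struct toy_pack {
	int is_cruft;           /* pack has a .mtimes file */
	const uint32_t *mtimes; /* per-object mtimes in index order (cruft only) */
	size_t nr_objects;
	uint32_t pack_mtime;    /* modification time of the pack itself */
};

/* mtime to record for the object at index position `pos`. */
static uint32_t toy_packed_object_mtime(const struct toy_pack *p, size_t pos)
{
	assert(pos < p->nr_objects);
	if (p->is_cruft) {
		/* The patch dies if the .mtimes file cannot be loaded. */
		assert(p->mtimes != NULL);
		return p->mtimes[pos];
	}
	return p->pack_mtime;
}
```

Reading per-object values from an existing cruft pack preserves each object's own age when cruft packs are rewritten, instead of resetting every object to the cruft pack's file mtime.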
@@ -3553,7 +3705,19 @@ static int add_loose_object(const struct object_id *oid, const char *path,
 		return 0;
 	}
 
-	add_object_entry(oid, type, "", 0);
+	if (cruft) {
+		struct stat st;
+		if (stat(path, &st) < 0) {
+			if (errno == ENOENT)
+				return 0;
+			return error_errno("unable to stat %s", oid_to_hex(oid));
+		}
+
+		add_cruft_object_entry(oid, type, NULL, 0, NULL,
+				       st.st_mtime);
+	} else {
+		add_object_entry(oid, type, "", 0);
+	}
 	return 0;
 }
 
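For loose objects, the patch takes the mtime from `stat(2)` on the object file and treats `ENOENT` as "the object disappeared in the meantime" (it returns 0, skipping the object) rather than as an error. The same pattern in plain POSIX C follows, with a hypothetical helper name `loose_object_mtime()` and a plain `fprintf()` where Git uses `error_errno()`:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: returns 1 and stores the mtime if `path` exists,
 * 0 if it has vanished (ENOENT), or -1 on any other stat() failure.
 */
static int loose_object_mtime(const char *path, time_t *mtime)
{
	struct stat st;

	assert(path != NULL && mtime != NULL);
	if (stat(path, &st) < 0) {
		if (errno == ENOENT)
			return 0; /* e.g. removed concurrently: skip it */
		fprintf(stderr, "unable to stat %s: %s\n", path,
			strerror(errno));
		return -1;
	}
	*mtime = st.st_mtime;
	return 1;
}
```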
@@ -3870,6 +4034,20 @@ static int option_parse_unpack_unreachable(const struct option *opt,
 	return 0;
 }
 
+static int option_parse_cruft_expiration(const struct option *opt,
+					 const char *arg, int unset)
+{
+	if (unset) {
+		cruft = 0;
+		cruft_expiration = 0;
+	} else {
+		cruft = 1;
+		if (arg)
+			cruft_expiration = approxidate(arg);
+	}
+	return 0;
+}
+
 struct po_filter_data {
 	unsigned have_revs:1;
 	struct rev_info revs;
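`option_parse_cruft_expiration()` makes `--cruft-expiration` imply `--cruft`, and makes `--no-cruft-expiration` clear both settings. Because the option is registered with `PARSE_OPT_OPTARG`, a bare `--cruft-expiration` with no value enables `--cruft` but leaves any previously parsed expiration untouched. A standalone model of that logic follows; `struct toy_cruft_opts` is hypothetical, and `parse_time()` is a stand-in that only parses a decimal timestamp, unlike Git's `approxidate()`, which accepts human-readable dates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical option state mirroring the `cruft`/`cruft_expiration` globals. */
struct toy_cruft_opts {
	int cruft;
	uint64_t expiration; /* 0 means "never expire" */
};

/* Stand-in for approxidate(): parses a decimal timestamp only. */
static uint64_t parse_time(const char *arg)
{
	return strtoull(arg, NULL, 10);
}

/* Same branching as option_parse_cruft_expiration() in the patch. */
static void toy_parse_cruft_expiration(struct toy_cruft_opts *o,
				       const char *arg, int unset)
{
	assert(o != NULL);
	if (unset) {
		o->cruft = 0;
		o->expiration = 0;
	} else {
		o->cruft = 1;
		if (arg)
			o->expiration = parse_time(arg);
	}
}
```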
@@ -3959,6 +4137,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		OPT_CALLBACK_F(0, "unpack-unreachable", NULL, N_("time"),
 		  N_("unpack unreachable objects newer than <time>"),
 		  PARSE_OPT_OPTARG, option_parse_unpack_unreachable),
+		OPT_BOOL(0, "cruft", &cruft, N_("create a cruft pack")),
+		OPT_CALLBACK_F(0, "cruft-expiration", NULL, N_("time"),
+		  N_("expire cruft objects older than <time>"),
+		  PARSE_OPT_OPTARG, option_parse_cruft_expiration),
 		OPT_BOOL(0, "sparse", &sparse,
 			 N_("use the sparse reachability algorithm")),
 		OPT_BOOL(0, "thin", &thin,
@@ -4085,7 +4267,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 
 	if (!HAVE_THREADS && delta_search_threads != 1)
 		warning(_("no threads support, ignoring --threads"));
-	if (!pack_to_stdout && !pack_size_limit)
+	if (!pack_to_stdout && !pack_size_limit && !cruft)
 		pack_size_limit = pack_size_limit_cfg;
 	if (pack_to_stdout && pack_size_limit)
 		die(_("--max-pack-size cannot be used to build a pack for transfer"));
@@ -4112,6 +4294,15 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (stdin_packs && use_internal_rev_list)
 		die(_("cannot use internal rev list with --stdin-packs"));
 
+	if (cruft) {
+		if (use_internal_rev_list)
+			die(_("cannot use internal rev list with --cruft"));
+		if (stdin_packs)
+			die(_("cannot use --stdin-packs with --cruft"));
+		if (pack_size_limit)
+			die(_("cannot use --max-pack-size with --cruft"));
+	}
+
 	/*
 	 * "soft" reasons not to use bitmaps - for on-disk repack by default we want
 	 *
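Together with the earlier hunk that stops falling back to `pack.packSizeLimit` under `--cruft`, these checks form a small validation step. The sketch below models only the cruft-related part of `cmd_pack_objects()` (it omits, for example, the `--stdout`/`--max-pack-size` check); `struct toy_po_opts` and `toy_validate()` are hypothetical, and the function returns the message the real code would pass to `die()`, or `NULL` if the combination is accepted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle of the relevant cmd_pack_objects() settings. */
struct toy_po_opts {
	int cruft;
	int use_internal_rev_list;
	int stdin_packs;
	unsigned long pack_size_limit;     /* from --max-pack-size, 0 if unset */
	unsigned long pack_size_limit_cfg; /* from pack.packSizeLimit */
	int pack_to_stdout;
};

/* Returns the die() message for an invalid combination, or NULL. */
static const char *toy_validate(struct toy_po_opts *o)
{
	assert(o != NULL);
	/* The configured limit is only a fallback, and never for --cruft. */
	if (!o->pack_to_stdout && !o->pack_size_limit && !o->cruft)
		o->pack_size_limit = o->pack_size_limit_cfg;

	if (o->cruft) {
		if (o->use_internal_rev_list)
			return "cannot use internal rev list with --cruft";
		if (o->stdin_packs)
			return "cannot use --stdin-packs with --cruft";
		if (o->pack_size_limit)
			return "cannot use --max-pack-size with --cruft";
	}
	return NULL;
}
```

Because the configured limit is never copied in when `cruft` is set, only an explicit `--max-pack-size` on the command line triggers the last error.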
@@ -4168,14 +4359,16 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			    the_repository);
 	prepare_packing_data(the_repository, &to_pack);
 
-	if (progress)
+	if (progress && !cruft)
 		progress_state = start_progress(_("Enumerating objects"), 0);
 	if (stdin_packs) {
 		/* avoids adding objects in excluded packs */
 		ignore_packed_keep_in_core = 1;
 		read_packs_list_from_stdin();
 		if (rev_list_unpacked)
 			add_unreachable_loose_objects();
+	} else if (cruft) {
+		read_cruft_objects();
 	} else if (!use_internal_rev_list) {
 		read_object_list_from_stdin();
 	} else if (pfd.have_revs) {

object-file.c

Lines changed: 1 addition & 1 deletion
@@ -997,7 +997,7 @@ int has_loose_object_nonlocal(const struct object_id *oid)
 	return check_and_freshen_nonlocal(oid, 0);
 }
 
-static int has_loose_object(const struct object_id *oid)
+int has_loose_object(const struct object_id *oid)
 {
 	return check_and_freshen(oid, 0);
 }

object-store.h

Lines changed: 2 additions & 0 deletions
@@ -339,6 +339,8 @@ int repo_has_object_file_with_flags(struct repository *r,
  */
 int has_loose_object_nonlocal(const struct object_id *);
 
+int has_loose_object(const struct object_id *);
+
 /**
  * format_object_header() is a thin wrapper around s xsnprintf() that
  * writes the initial "<type> <obj-len>" part of the loose object
