-
Notifications
You must be signed in to change notification settings - Fork 157
CodeQL-inspired fixes #1891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CodeQL-inspired fixes #1891
Changes from 4 commits
767b1e7
c66eaee
5a3a888
8d712a0
80422a5
dff0a3e
7d92e08
a3f6018
077bcab
4dc3e23
7a54005
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2560,6 +2560,7 @@ int cmd_fetch(int argc, | |
if (server_options.nr) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Jeff King wrote (reply to this): On Thu, May 15, 2025 at 01:11:40PM +0000, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <[email protected]>
>
> As pointed out by CodeQL, it is a potentially dangerous practice to
> store local variables' addresses in non-local structs. Yet this is
> exactly what happens with the `acked_commits` attribute that is used in
> `cmd_fetch()`: The pointer to a local variable is assigned to it.
>
> Now, it is Git's convention that `cmd_*()` functions are essentially
> only returning just before exiting the process, therefore there is
> little danger that this attribute is used after the code flow returns
> from that function.
I was going to say: the real sin here is using a global variable in the
first place, without which gtransport would not survive outside of
cmd_fetch(). But the issue is even worse than that. The acked_commits
variable is inside a conditional block, so the address is stale for the
rest of cmd_fetch(), too!
It doesn't look like we ever examine it after that, but it's hard to
trace, since it's a global. ;)
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index cda6eaf1fd6e..c1a1434c7096 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -2560,6 +2560,7 @@ int cmd_fetch(int argc,
> if (server_options.nr)
> gtransport->server_options = &server_options;
> result = transport_fetch_refs(gtransport, NULL);
> + gtransport->smart_options->acked_commits = NULL;
>
> oidset_iter_init(&acked_commits, &iter);
> while ((oid = oidset_iter_next(&iter)))
Here you unset it within that conditional block, which is the right
spot. Looks good.
-Peff |
||
gtransport->server_options = &server_options; | ||
result = transport_fetch_refs(gtransport, NULL); | ||
gtransport->smart_options->acked_commits = NULL; | ||
|
||
oidset_iter_init(&acked_commits, &iter); | ||
while ((oid = oidset_iter_next(&iter))) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2509,7 +2509,17 @@ int write_commit_graph(struct object_directory *odb, | |
const struct commit_graph_opts *opts) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Jeff King wrote (reply to this): On Thu, May 15, 2025 at 01:11:41PM +0000, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <[email protected]>
>
> We do need a context to write the commit graph, but that context is only
> needed during the life time of `commit_graph_write()`, therefore it can
> easily be a stack variable.
Yay. I am in favor of using stack variables when possible as a general
rule.
> diff --git a/commit-graph.c b/commit-graph.c
> index 6394752b0b08..9f0115dac9b5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -2509,7 +2509,17 @@ int write_commit_graph(struct object_directory *odb,
> const struct commit_graph_opts *opts)
> {
> struct repository *r = the_repository;
> - struct write_commit_graph_context *ctx;
> + struct write_commit_graph_context ctx = {
> + .r = r,
> + .odb = odb,
> + .append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0,
> + .report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0,
> + .split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0,
> + .opts = opts,
> + .total_bloom_filter_data_size = 0,
> + .write_generation_data = (get_configured_generation_version(r) == 2),
> + .num_generation_data_overflows = 0,
> + };
> uint32_t i;
> int res = 0;
> int replace = 0;
> @@ -2531,17 +2541,6 @@ int write_commit_graph(struct object_directory *odb,
> return 0;
> }
>
> - CALLOC_ARRAY(ctx, 1);
> - ctx->r = r;
> - ctx->odb = odb;
> - ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0;
> - ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0;
> - ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
> - ctx->opts = opts;
> - ctx->total_bloom_filter_data_size = 0;
> - ctx->write_generation_data = (get_configured_generation_version(r) == 2);
> - ctx->num_generation_data_overflows = 0;
OK, this moves the initialization to the top of the function. So to
review this for correctness, we must make sure that we do not change the
values of any of those variables between the two spots (i.e., in the
diff context that is omitted).
Most of it looks fine. Our call to get_configured_generation_version()
now happens earlier, before the call to prepare_repo_settings(). I think
that is OK, because the former calls repo_config_get_int() directly. It
does seem like a potential maintenance problem if that call is ever
rolled into prepare_repo_settings().
So maybe OK, but the smaller change would be to just replace the calloc
with a memset(), and s/->/./ on the subsequent lines.
-Peff There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Junio C Hamano wrote (reply to this): Jeff King <[email protected]> writes:
> So maybe OK, but the smaller change would be to just replace the calloc
> with a memset(), and s/->/./ on the subsequent lines.
True, and it would be a bit easier to merge with other topics in
flight. The .oid member and parameter are both renamed IIUC. |
||
{ | ||
struct repository *r = the_repository; | ||
struct write_commit_graph_context *ctx; | ||
struct write_commit_graph_context ctx = { | ||
.r = r, | ||
.odb = odb, | ||
.append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0, | ||
.report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0, | ||
.split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0, | ||
.opts = opts, | ||
.total_bloom_filter_data_size = 0, | ||
.write_generation_data = (get_configured_generation_version(r) == 2), | ||
.num_generation_data_overflows = 0, | ||
}; | ||
uint32_t i; | ||
int res = 0; | ||
int replace = 0; | ||
|
@@ -2531,32 +2541,21 @@ int write_commit_graph(struct object_directory *odb, | |
return 0; | ||
} | ||
|
||
CALLOC_ARRAY(ctx, 1); | ||
ctx->r = r; | ||
ctx->odb = odb; | ||
ctx->append = flags & COMMIT_GRAPH_WRITE_APPEND ? 1 : 0; | ||
ctx->report_progress = flags & COMMIT_GRAPH_WRITE_PROGRESS ? 1 : 0; | ||
ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0; | ||
ctx->opts = opts; | ||
ctx->total_bloom_filter_data_size = 0; | ||
ctx->write_generation_data = (get_configured_generation_version(r) == 2); | ||
ctx->num_generation_data_overflows = 0; | ||
|
||
bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version; | ||
bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY", | ||
bloom_settings.bits_per_entry); | ||
bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES", | ||
bloom_settings.num_hashes); | ||
bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS", | ||
bloom_settings.max_changed_paths); | ||
ctx->bloom_settings = &bloom_settings; | ||
ctx.bloom_settings = &bloom_settings; | ||
|
||
init_topo_level_slab(&topo_levels); | ||
ctx->topo_levels = &topo_levels; | ||
ctx.topo_levels = &topo_levels; | ||
|
||
prepare_commit_graph(ctx->r); | ||
if (ctx->r->objects->commit_graph) { | ||
struct commit_graph *g = ctx->r->objects->commit_graph; | ||
prepare_commit_graph(ctx.r); | ||
if (ctx.r->objects->commit_graph) { | ||
struct commit_graph *g = ctx.r->objects->commit_graph; | ||
|
||
while (g) { | ||
g->topo_levels = &topo_levels; | ||
|
@@ -2565,15 +2564,15 @@ int write_commit_graph(struct object_directory *odb, | |
} | ||
|
||
if (flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS) | ||
ctx->changed_paths = 1; | ||
ctx.changed_paths = 1; | ||
if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) { | ||
struct commit_graph *g; | ||
|
||
g = ctx->r->objects->commit_graph; | ||
g = ctx.r->objects->commit_graph; | ||
|
||
/* We have changed-paths already. Keep them in the next graph */ | ||
if (g && g->bloom_filter_settings) { | ||
ctx->changed_paths = 1; | ||
ctx.changed_paths = 1; | ||
|
||
/* don't propagate the hash_version unless unspecified */ | ||
if (bloom_settings.hash_version == -1) | ||
|
@@ -2586,116 +2585,114 @@ int write_commit_graph(struct object_directory *odb, | |
|
||
bloom_settings.hash_version = bloom_settings.hash_version == 2 ? 2 : 1; | ||
|
||
if (ctx->split) { | ||
struct commit_graph *g = ctx->r->objects->commit_graph; | ||
if (ctx.split) { | ||
struct commit_graph *g = ctx.r->objects->commit_graph; | ||
|
||
while (g) { | ||
ctx->num_commit_graphs_before++; | ||
ctx.num_commit_graphs_before++; | ||
g = g->base_graph; | ||
} | ||
|
||
if (ctx->num_commit_graphs_before) { | ||
ALLOC_ARRAY(ctx->commit_graph_filenames_before, ctx->num_commit_graphs_before); | ||
i = ctx->num_commit_graphs_before; | ||
g = ctx->r->objects->commit_graph; | ||
if (ctx.num_commit_graphs_before) { | ||
ALLOC_ARRAY(ctx.commit_graph_filenames_before, ctx.num_commit_graphs_before); | ||
i = ctx.num_commit_graphs_before; | ||
g = ctx.r->objects->commit_graph; | ||
|
||
while (g) { | ||
ctx->commit_graph_filenames_before[--i] = xstrdup(g->filename); | ||
ctx.commit_graph_filenames_before[--i] = xstrdup(g->filename); | ||
g = g->base_graph; | ||
} | ||
} | ||
|
||
if (ctx->opts) | ||
replace = ctx->opts->split_flags & COMMIT_GRAPH_SPLIT_REPLACE; | ||
if (ctx.opts) | ||
replace = ctx.opts->split_flags & COMMIT_GRAPH_SPLIT_REPLACE; | ||
} | ||
|
||
ctx->approx_nr_objects = repo_approximate_object_count(the_repository); | ||
ctx.approx_nr_objects = repo_approximate_object_count(the_repository); | ||
|
||
if (ctx->append && ctx->r->objects->commit_graph) { | ||
struct commit_graph *g = ctx->r->objects->commit_graph; | ||
if (ctx.append && ctx.r->objects->commit_graph) { | ||
struct commit_graph *g = ctx.r->objects->commit_graph; | ||
for (i = 0; i < g->num_commits; i++) { | ||
struct object_id oid; | ||
oidread(&oid, g->chunk_oid_lookup + st_mult(g->hash_len, i), | ||
the_repository->hash_algo); | ||
oid_array_append(&ctx->oids, &oid); | ||
oid_array_append(&ctx.oids, &oid); | ||
} | ||
} | ||
|
||
if (pack_indexes) { | ||
ctx->order_by_pack = 1; | ||
if ((res = fill_oids_from_packs(ctx, pack_indexes))) | ||
ctx.order_by_pack = 1; | ||
if ((res = fill_oids_from_packs(&ctx, pack_indexes))) | ||
goto cleanup; | ||
} | ||
|
||
if (commits) { | ||
if ((res = fill_oids_from_commits(ctx, commits))) | ||
if ((res = fill_oids_from_commits(&ctx, commits))) | ||
goto cleanup; | ||
} | ||
|
||
if (!pack_indexes && !commits) { | ||
ctx->order_by_pack = 1; | ||
fill_oids_from_all_packs(ctx); | ||
ctx.order_by_pack = 1; | ||
fill_oids_from_all_packs(&ctx); | ||
} | ||
|
||
close_reachable(ctx); | ||
close_reachable(&ctx); | ||
|
||
copy_oids_to_commits(ctx); | ||
copy_oids_to_commits(&ctx); | ||
|
||
if (ctx->commits.nr >= GRAPH_EDGE_LAST_MASK) { | ||
if (ctx.commits.nr >= GRAPH_EDGE_LAST_MASK) { | ||
error(_("too many commits to write graph")); | ||
res = -1; | ||
goto cleanup; | ||
} | ||
|
||
if (!ctx->commits.nr && !replace) | ||
if (!ctx.commits.nr && !replace) | ||
goto cleanup; | ||
|
||
if (ctx->split) { | ||
split_graph_merge_strategy(ctx); | ||
if (ctx.split) { | ||
split_graph_merge_strategy(&ctx); | ||
|
||
if (!replace) | ||
merge_commit_graphs(ctx); | ||
merge_commit_graphs(&ctx); | ||
} else | ||
ctx->num_commit_graphs_after = 1; | ||
ctx.num_commit_graphs_after = 1; | ||
|
||
ctx->trust_generation_numbers = validate_mixed_generation_chain(ctx->r->objects->commit_graph); | ||
ctx.trust_generation_numbers = validate_mixed_generation_chain(ctx.r->objects->commit_graph); | ||
|
||
compute_topological_levels(ctx); | ||
if (ctx->write_generation_data) | ||
compute_generation_numbers(ctx); | ||
compute_topological_levels(&ctx); | ||
if (ctx.write_generation_data) | ||
compute_generation_numbers(&ctx); | ||
|
||
if (ctx->changed_paths) | ||
compute_bloom_filters(ctx); | ||
if (ctx.changed_paths) | ||
compute_bloom_filters(&ctx); | ||
|
||
res = write_commit_graph_file(ctx); | ||
res = write_commit_graph_file(&ctx); | ||
|
||
if (ctx->changed_paths) | ||
if (ctx.changed_paths) | ||
deinit_bloom_filters(); | ||
|
||
if (ctx->split) | ||
mark_commit_graphs(ctx); | ||
if (ctx.split) | ||
mark_commit_graphs(&ctx); | ||
|
||
expire_commit_graphs(ctx); | ||
expire_commit_graphs(&ctx); | ||
|
||
cleanup: | ||
free(ctx->graph_name); | ||
free(ctx->base_graph_name); | ||
free(ctx->commits.list); | ||
oid_array_clear(&ctx->oids); | ||
free(ctx.graph_name); | ||
free(ctx.base_graph_name); | ||
free(ctx.commits.list); | ||
oid_array_clear(&ctx.oids); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Jeff King wrote (reply to this): On Thu, May 15, 2025 at 01:11:47PM +0000, Johannes Schindelin via GitGitGadget wrote:
> The code is a bit too hard to reason about to fully assess whether the
> `fill_commit_graph_info()` function is called at all after
> `write_commit_graph()` returns (and hence the stack variable
> `topo_levels` goes out of context).
>
> Let's simply make sure that the stack address is no longer used at that
> stage, thereby making the code quite a bit easier to reason about.
Yep, I think this is a good practice in general. If the topo_levels
member is never used outside of writing, I wonder if it could live in a
separate data structure. But that is a much bigger refactor that I don't
think we need to tackle here.
> diff --git a/commit-graph.c b/commit-graph.c
> index 9f0115dac9b5..d052c1bf15c5 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -2683,6 +2683,15 @@ cleanup:
> oid_array_clear(&ctx.oids);
> clear_topo_level_slab(&topo_levels);
>
> + if (ctx.r->objects->commit_graph) {
> + struct commit_graph *g = ctx.r->objects->commit_graph;
> +
> + while (g) {
> + g->topo_levels = NULL;
> + g = g->base_graph;
> + }
> + }
This just clears the pointers to the local variable. Looks good.
-Peff |
||
clear_topo_level_slab(&topo_levels); | ||
|
||
for (i = 0; i < ctx->num_commit_graphs_before; i++) | ||
free(ctx->commit_graph_filenames_before[i]); | ||
free(ctx->commit_graph_filenames_before); | ||
for (i = 0; i < ctx.num_commit_graphs_before; i++) | ||
free(ctx.commit_graph_filenames_before[i]); | ||
free(ctx.commit_graph_filenames_before); | ||
|
||
for (i = 0; i < ctx->num_commit_graphs_after; i++) { | ||
free(ctx->commit_graph_filenames_after[i]); | ||
free(ctx->commit_graph_hash_after[i]); | ||
for (i = 0; i < ctx.num_commit_graphs_after; i++) { | ||
free(ctx.commit_graph_filenames_after[i]); | ||
free(ctx.commit_graph_hash_after[i]); | ||
} | ||
free(ctx->commit_graph_filenames_after); | ||
free(ctx->commit_graph_hash_after); | ||
|
||
free(ctx); | ||
free(ctx.commit_graph_filenames_after); | ||
free(ctx.commit_graph_hash_after); | ||
|
||
return res; | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1780,16 +1780,16 @@ static void send_shallow_info(struct upload_pack_data *data) | |
packet_delim(1); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Jeff King wrote (reply to this): On Thu, May 15, 2025 at 01:11:42PM +0000, Johannes Schindelin via GitGitGadget wrote:
> While 3145ea957d (upload-pack: introduce fetch server command,
> 2018-03-15) added support for the `fetch` command, from the server's
> point of view it is an upload, and hence the `enum` should really be
> called `upload_state` instead of `fetch_state`. Likewise, rename its
> values.
>
> This also helps unconfuse CodeQL which would otherwise be at sixes or
> sevens about having _two_ non-local definitions of the same `enum` with
> the same values.
It unconfuses me, too. Nice change.
-Peff |
||
} | ||
|
||
enum fetch_state { | ||
FETCH_PROCESS_ARGS = 0, | ||
FETCH_SEND_ACKS, | ||
FETCH_SEND_PACK, | ||
FETCH_DONE, | ||
enum upload_state { | ||
UPLOAD_PROCESS_ARGS = 0, | ||
UPLOAD_SEND_ACKS, | ||
UPLOAD_SEND_PACK, | ||
UPLOAD_DONE, | ||
}; | ||
|
||
int upload_pack_v2(struct repository *r, struct packet_reader *request) | ||
{ | ||
enum fetch_state state = FETCH_PROCESS_ARGS; | ||
enum upload_state state = UPLOAD_PROCESS_ARGS; | ||
struct upload_pack_data data; | ||
|
||
clear_object_flags(the_repository, ALL_FLAGS); | ||
|
@@ -1798,9 +1798,9 @@ int upload_pack_v2(struct repository *r, struct packet_reader *request) | |
data.use_sideband = LARGE_PACKET_MAX; | ||
get_upload_pack_config(r, &data); | ||
|
||
while (state != FETCH_DONE) { | ||
while (state != UPLOAD_DONE) { | ||
switch (state) { | ||
case FETCH_PROCESS_ARGS: | ||
case UPLOAD_PROCESS_ARGS: | ||
process_args(request, &data); | ||
|
||
if (!data.want_obj.nr && !data.wait_for_done) { | ||
|
@@ -1811,27 +1811,27 @@ int upload_pack_v2(struct repository *r, struct packet_reader *request) | |
* to just send 'have's without 'want's); guess | ||
* they didn't want anything. | ||
*/ | ||
state = FETCH_DONE; | ||
state = UPLOAD_DONE; | ||
} else if (data.seen_haves) { | ||
/* | ||
* Request had 'have' lines, so lets ACK them. | ||
*/ | ||
state = FETCH_SEND_ACKS; | ||
state = UPLOAD_SEND_ACKS; | ||
} else { | ||
/* | ||
* Request had 'want's but no 'have's so we can | ||
* immediately go to construct and send a pack. | ||
*/ | ||
state = FETCH_SEND_PACK; | ||
state = UPLOAD_SEND_PACK; | ||
} | ||
break; | ||
case FETCH_SEND_ACKS: | ||
case UPLOAD_SEND_ACKS: | ||
if (process_haves_and_send_acks(&data)) | ||
state = FETCH_SEND_PACK; | ||
state = UPLOAD_SEND_PACK; | ||
else | ||
state = FETCH_DONE; | ||
state = UPLOAD_DONE; | ||
break; | ||
case FETCH_SEND_PACK: | ||
case UPLOAD_SEND_PACK: | ||
send_wanted_ref_info(&data); | ||
send_shallow_info(&data); | ||
|
||
|
@@ -1841,9 +1841,9 @@ int upload_pack_v2(struct repository *r, struct packet_reader *request) | |
packet_writer_write(&data.writer, "packfile\n"); | ||
create_pack_file(&data, NULL); | ||
} | ||
state = FETCH_DONE; | ||
state = UPLOAD_DONE; | ||
break; | ||
case FETCH_DONE: | ||
case UPLOAD_DONE: | ||
continue; | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jeff King wrote (reply to this):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jeff King wrote (reply to this):