Skip to content

Conversation

dscho
Copy link
Member

@dscho dscho commented Jul 7, 2025

This PR adds a CodeQL workflow to this repository. CodeQL is touted as a "semantic code analysis engine", i.e. its intention isn't really a full industry-grade static code analyzer like Coverity (which we already run in our CI builds). This shows in the number of queries we have to suppress because they would result in an unwieldy number of false positives.

Or, one might argue, it shows how convoluted part of Git's logic is, which may not only confuse CodeQL but also human readers. The latter might not be a direct security issue, but as Tony Hoare said:

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

I tried to make the code simpler with gitgitgadget#1888 and with gitgitgadget#1890, but the discussion went astray from my original purpose, instead veering off into the direction of arguing in complicated lines of reasoning that the current code is actually correct. Nevertheless, I do include these patches here, structured via merge commits to make it easier to drop them should too many merge conflicts in rebases to upstream Git versions make that advisable.

However, for most of the alerts, the approach I took is exclude the queries wholesale, when they caused alerts. I could have taken the approach to suppress the alerts in a fine-grained way (via codeql[<query-name>]) so that true positives aren't suppressed in addition to false positives. However, I am loathe to take that approach because a voice inside me fully expects backlash on the Git mailing list along the lines of :"You're littering the code just to appease CodeQL!".

Theoretically, an alternative exists: To develop modified versions of those CodeQL queries, e.g. to ignore paths inside the .git/ directory in the cpp/toctou-race-condition query. However, CodeQL is a language that is not only (intentionally?) difficult to develop in by virtue of being declarative instead of imperative, its debugging facilities are pretty much non-existent.

Given all of the above, the obvious question begs itself: Why bother with CodeQL at all? The truthful answer is: It is mandated by the Secure Future Initiative. And who knows, maybe CodeQL will come in handy in the future? After all, it is a framework more than a solution, and should in principle be able to help with answering questions like: "Which call paths into libgit.a could result in die() being called?".

dscho and others added 25 commits July 6, 2025 19:49
When `clar_parse_args()` encounters an unhandled command-line option, it
wants to print out an error message that says so, and show the option
that it encountered. However, the format claims that it wants to show a
string, whereas using `argument[1]` provided only the first character.

This was side-stepped in upstream `clar` in 7eaea8b2da (clar: argument
parsing improvements, 2025-01-16) by not even showing the option
anymore.

Let's just fix this in Git until the next time Git synchronizes with an
upstream `clar` version again.

Signed-off-by: Johannes Schindelin <[email protected]>
When accessing an array element (or, in these instances, characters in
a string), the check whether the array index is within bounds should
always come before accessing said element.

Signed-off-by: Johannes Schindelin <[email protected]>
…alized

The large `switch` statement makes it a bit impractical to reason about
the code.

One of the code paths can technically lead to using `size` without being
initialized: if the `t` case is taken and the type name is set to the
empty string, we would actually leave `size` unintialized right until we
use it.

Practically, this cannot happen because the
`do_oid_object_info_extended()` function is expected to always populate
the `type_name` if asked for. However, it is quite unnecessary to leave
the code as unwieldy to reason about: Just initialize the variable to 0
and be done with it.

Signed-off-by: Johannes Schindelin <[email protected]>
In `fsck_commit()`, after counting the authors of a commit, we set the
`err` variable either when there was no author, or when there were more
than two authors recorded. Then we access the `err` variable to figure
out whether we should return early. But if there was exactly one author,
that variable is still uninitialized.

Let's just initialize the variable.

This issue was pointed out by CodeQL.

Signed-off-by: Johannes Schindelin <[email protected]>
The `revindex_size` value is uninitialized in case the function is
erroring out, but we want to assign its value. Let's just initialize it.

Signed-off-by: Johannes Schindelin <[email protected]>
The `mtimes_size` variable is uninitialzed when the function errors out,
yet its value is assigned to another variable. Let's just initialize it.

Signed-off-by: Johannes Schindelin <[email protected]>
On the off chance that `lookup_decoration()` cannot find anything, let
`leave_one_treesame_to_parent()` return gracefully instead of crashing.

Pointed out by CodeQL.

Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit_reference()` can return NULL
values.

Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `parse_object()` can return NULL values.

Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit()` can return NULL values.

Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit()` can return NULL values.

Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `parse_tree_indirect()` can return NULL values.

Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit()` can return NULL values.

Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `branch_get()` can return NULL values.

Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `branch_get()` can return NULL values.

Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit_reference()` can return NULL
values.

Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `branch_get()` can return NULL values.

Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL is GitHub's native offering of a static code analyzer, and hence
integrates with GitHub Actions better than any other static code
analyzer.

By default, it comes with a large range of "queries" that test for
common code patterns that should be avoided.

For now, we only target source code written in C, via the `language:
cpp` directive. Just in case that other languages should be targeted,
too, this GitHub workflow job is set up as a matrix job to make that
easier in the future.

For full documentation, see
https://docs.github.com/en/code-security/code-scanning/introduction-to-code-scanning/about-code-scanning-with-codeql

Co-authored-by: Pierre Tempel <[email protected]>
Co-authored-by: Arthur Baars <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
In some instances, CodeQL's web UI on github.com leaves questions
unanswered. For example, in some alerts it is really necessary to follow
the entire "taint flow" to understand why something might be an issue.

The alerts for the `cpp/uncontrolled-allocation-size` rule, for example,
are all false positives, and only when inspecting the exact flow does it
become obvious that one alert wants to point out that the size of a
binary patch hunk, which is specified in the patch, is then used to
determine how much memory to allocate, which may potentially run out of
memory (and is hence just Git doing what it is asked to, and does not
need to be changed).

To help with those issues, publish the `.sarif` file as part of every
workflow run; This allows downloading that file and inspecting it e.g.
with the SARIF viewer extension in VS Code (for details, see
https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer).

Signed-off-by: Johannes Schindelin <[email protected]>
…fter release

As pointed out by CodeQL, it is a potentially dangerous practice to
store local variables' addresses in non-local structs.

My original intention was to make sure to clear it out after it was
used, and before the function returns (which is when the address would
go stale).

However, I faced too much resistance in the Git project against such
patches, there seemed to always be the overwhelming sentiment that the
code isn't broken (even if it requires a complex and demanding analysis
to wrap one's head around _that_). Therefore, I will be pragmatic and
simply ask CodeQL to hold its peace about this issue forever.

Signed-off-by: Johannes Schindelin <[email protected]>
As pointed out by CodeQL, it could be NULL and we usually check for
that.

Signed-off-by: Johannes Schindelin <[email protected]>
On the off-chance that it's NULL...

Signed-off-by: Johannes Schindelin <[email protected]>
As pointed out by CodeQL, `lookup_commit()` can return NULL.

Signed-off-by: Johannes Schindelin <[email protected]>
The code is a bit too hard to reason about for CodeQL to figure out
whether the `fill_commit_graph_info()`  function is at all called after
`write_commit_graph()` returns (and hence whether `topo_levels` goes out
of context before it is used again).

The Git project insists that this is correct (and does not want to make
the code more obviously correct), so let's silence CodeQL's complaints
in this instance.

Signed-off-by: Johannes Schindelin <[email protected]>
A couple of CodeQL's queries are opinionated in a way that is obviously
not shared by Git's source code's state, and apparently intentionally so.

For example, the "For loop variable changed in body" query as well as
the "No trivial switch statements" one result in too many results that
are apparently intentional in Git's source code. Let's not worry about
those, then. Also, Git has plenty of instances where variables shadow
other variables.

Other valid yet not quite critical issues identified by CodeQL include
complex conditionals and nested switch statements spanning several
pages.

We probably want to address these issues at some stage, but they are not
as critical as other problems pointed out by CodeQL, so let's silence
those queries for now and take care of them at a later stage.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho dscho requested a review from mjcheetham July 7, 2025 09:37
@dscho dscho self-assigned this Jul 7, 2025
dscho added 3 commits July 7, 2025 11:54
The `localtime()` function is inherently thread-unsafe and should not be
used anymore. Let's use `localtime_r()` instead.

Signed-off-by: Johannes Schindelin <[email protected]>
Some fixes in `clar`, pointed out by CodeQL.

Signed-off-by: Johannes Schindelin <[email protected]>
A few places where CodeQL thinks that variables might be uninitialized.

Signed-off-by: Johannes Schindelin <[email protected]>
dscho added 4 commits July 7, 2025 11:54
These patches implement some defensive programming to address complaints
some static analyzers might have.

Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL pointed out a couple of issues, which are addressed in this patch
series.

Signed-off-by: Johannes Schindelin <[email protected]>
This patch series has been long in the making, ever since Johannes
Nicolai and myself spiked this in November/December 2020.

Signed-off-by: Johannes Schindelin <[email protected]>
This patch series has been long in the making, ever since Johannes
Nicolai and myself spiked this in November/December 2020.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho
Copy link
Member Author

dscho commented Jul 7, 2025

Scalar Functional Tests / Scalar Functional Tests (windows-2019, ignored) (pull_request) Cancelled after 5s

That's because windows-2019 was retired; That is fixed in 3710e34 over in #770.

@dscho dscho merged commit 290f9d0 into vfs-2.50.0 Jul 7, 2025
176 of 178 checks passed
@dscho dscho deleted the codeql branch July 7, 2025 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants