forked from git-for-windows/git
-
Notifications
You must be signed in to change notification settings - Fork 106
Run CodeQL as part of the CI #771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
When `clar_parse_args()` encounters an unhandled command-line option, it wants to print out an error message that says so, and show the option that it encountered. However, the format claims that it wants to show a string, whereas using `argument[1]` provided only the first character. This was side-stepped in upstream `clar` in 7eaea8b2da (clar: argument parsing improvements, 2025-01-16) by not even showing the option anymore. Let's just fix this in Git until the next time Git synchronizes with an upstream `clar` version again. Signed-off-by: Johannes Schindelin <[email protected]>
When accessing an array element (or, in these instances, characters in a string), the check whether the array index is within bounds should always come before accessing said element. Signed-off-by: Johannes Schindelin <[email protected]>
…alized The large `switch` statement makes it a bit impractical to reason about the code. One of the code paths can technically lead to using `size` without being initialized: if the `t` case is taken and the type name is set to the empty string, we would actually leave `size` unintialized right until we use it. Practically, this cannot happen because the `do_oid_object_info_extended()` function is expected to always populate the `type_name` if asked for. However, it is quite unnecessary to leave the code as unwieldy to reason about: Just initialize the variable to 0 and be done with it. Signed-off-by: Johannes Schindelin <[email protected]>
In `fsck_commit()`, after counting the authors of a commit, we set the `err` variable either when there was no author, or when there were more than two authors recorded. Then we access the `err` variable to figure out whether we should return early. But if there was exactly one author, that variable is still uninitialized. Let's just initialize the variable. This issue was pointed out by CodeQL. Signed-off-by: Johannes Schindelin <[email protected]>
The `revindex_size` value is uninitialized in case the function is erroring out, but we want to assign its value. Let's just initialize it. Signed-off-by: Johannes Schindelin <[email protected]>
The `mtimes_size` variable is uninitialzed when the function errors out, yet its value is assigned to another variable. Let's just initialize it. Signed-off-by: Johannes Schindelin <[email protected]>
On the off chance that `lookup_decoration()` cannot find anything, let `leave_one_treesame_to_parent()` return gracefully instead of crashing. Pointed out by CodeQL. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit_reference()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `parse_object()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `parse_tree_indirect()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `branch_get()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `branch_get()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit_reference()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `branch_get()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL is GitHub's native offering of a static code analyzer, and hence integrates with GitHub Actions better than any other static code analyzer. By default, it comes with a large range of "queries" that test for common code patterns that should be avoided. For now, we only target source code written in C, via the `language: cpp` directive. Just in case that other languages should be targeted, too, this GitHub workflow job is set up as a matrix job to make that easier in the future. For full documentation, see https://docs.github.com/en/code-security/code-scanning/introduction-to-code-scanning/about-code-scanning-with-codeql Co-authored-by: Pierre Tempel <[email protected]> Co-authored-by: Arthur Baars <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
In some instances, CodeQL's web UI on github.com leaves questions unanswered. For example, in some alerts it is really necessary to follow the entire "taint flow" to understand why something might be an issue. The alerts for the `cpp/uncontrolled-allocation-size` rule, for example, are all false positives, and only when inspecting the exact flow does it become obvious that one alert wants to point out that the size of a binary patch hunk, which is specified in the patch, is then used to determine how much memory to allocate, which may potentially run out of memory (and is hence just Git doing what it is asked to, and does not need to be changed). To help with those issues, publish the `.sarif` file as part of every workflow run; This allows downloading that file and inspecting it e.g. with the SARIF viewer extension in VS Code (for details, see https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer). Signed-off-by: Johannes Schindelin <[email protected]>
…fter release As pointed out by CodeQL, it is a potentially dangerous practice to store local variables' addresses in non-local structs. My original intention was to make sure to clear it out after it was used, and before the function returns (which is when the address would go stale). However, I faced too much resistance in the Git project against such patches, there seemed to always be the overwhelming sentiment that the code isn't broken (even if it requires a complex and demanding analysis to wrap one's head around _that_). Therefore, I will be pragmatic and simply ask CodeQL to hold its peace about this issue forever. Signed-off-by: Johannes Schindelin <[email protected]>
As pointed out by CodeQL, it could be NULL and we usually check for that. Signed-off-by: Johannes Schindelin <[email protected]>
On the off-chance that it's NULL... Signed-off-by: Johannes Schindelin <[email protected]>
As pointed out by CodeQL, `lookup_commit()` can return NULL. Signed-off-by: Johannes Schindelin <[email protected]>
The code is a bit too hard to reason about for CodeQL to figure out whether the `fill_commit_graph_info()` function is at all called after `write_commit_graph()` returns (and hence whether `topo_levels` goes out of context before it is used again). The Git project insists that this is correct (and does not want to make the code more obviously correct), so let's silence CodeQL's complaints in this instance. Signed-off-by: Johannes Schindelin <[email protected]>
A couple of CodeQL's queries are opinionated in a way that is obviously not shared by Git's source code's state, and apparently intentionally so. For example, the "For loop variable changed in body" query as well as the "No trivial switch statements" one result in too many results that are apparently intentional in Git's source code. Let's not worry about those, then. Also, Git has plenty of instances where variables shadow other variables. Other valid yet not quite critical issues identified by CodeQL include complex conditionals and nested switch statements spanning several pages. We probably want to address these issues at some stage, but they are not as critical as other problems pointed out by CodeQL, so let's silence those queries for now and take care of them at a later stage. Signed-off-by: Johannes Schindelin <[email protected]>
The `localtime()` function is inherently thread-unsafe and should not be used anymore. Let's use `localtime_r()` instead. Signed-off-by: Johannes Schindelin <[email protected]>
Some fixes in `clar`, pointed out by CodeQL. Signed-off-by: Johannes Schindelin <[email protected]>
A few places where CodeQL thinks that variables might be uninitialized. Signed-off-by: Johannes Schindelin <[email protected]>
These patches implement some defensive programming to address complaints some static analyzers might have. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL pointed out a couple of issues, which are addressed in this patch series. Signed-off-by: Johannes Schindelin <[email protected]>
This patch series has been long in the making, ever since Johannes Nicolai and myself spiked this in November/December 2020. Signed-off-by: Johannes Schindelin <[email protected]>
This patch series has been long in the making, ever since Johannes Nicolai and myself spiked this in November/December 2020. Signed-off-by: Johannes Schindelin <[email protected]>
That's because |
mjcheetham
approved these changes
Jul 7, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR adds a CodeQL workflow to this repository. CodeQL is touted as a "semantic code analysis engine", i.e. its intention isn't really a full industry-grade static code analyzer like Coverity (which we already run in our CI builds). This shows in the number of queries we have to suppress because they would result in an unwieldy number of false positives.
Or, one might argue, it shows how convoluted part of Git's logic is, which may not only confuse CodeQL but also human readers. The latter might not be a direct security issue, but as Tony Hoare said:
I tried to make the code simpler with gitgitgadget#1888 and with gitgitgadget#1890, but the discussion went astray from my original purpose, instead veering off into the direction of arguing in complicated lines of reasoning that the current code is actually correct. Nevertheless, I do include these patches here, structured via merge commits to make it easier to drop them should too many merge conflicts in rebases to upstream Git versions make that advisable.
However, for most of the alerts, the approach I took is exclude the queries wholesale, when they caused alerts. I could have taken the approach to suppress the alerts in a fine-grained way (via
codeql[<query-name>]
) so that true positives aren't suppressed in addition to false positives. However, I am loathe to take that approach because a voice inside me fully expects backlash on the Git mailing list along the lines of :"You're littering the code just to appease CodeQL!".Theoretically, an alternative exists: To develop modified versions of those CodeQL queries, e.g. to ignore paths inside the
.git/
directory in thecpp/toctou-race-condition
query. However, CodeQL is a language that is not only (intentionally?) difficult to develop in by virtue of being declarative instead of imperative, its debugging facilities are pretty much non-existent.Given all of the above, the obvious question begs itself: Why bother with CodeQL at all? The truthful answer is: It is mandated by the Secure Future Initiative. And who knows, maybe CodeQL will come in handy in the future? After all, it is a framework more than a solution, and should in principle be able to help with answering questions like: "Which call paths into
libgit.a
could result indie()
being called?".