-
Notifications
You must be signed in to change notification settings - Fork 101
Rebase to v2.51.0-rc1 #781
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: vfs-2.51.0-rc1
Are you sure you want to change the base?
Conversation
In `fsck_commit()`, after counting the authors of a commit, we set the `err` variable either when there was no author, or when there were more than two authors recorded. Then we access the `err` variable to figure out whether we should return early. But if there was exactly one author, that variable is still uninitialized. Let's just initialize the variable. This issue was pointed out by CodeQL. Signed-off-by: Johannes Schindelin <[email protected]>
When accessing an array element (or, in these instances, characters in a string), the check whether the array index is within bounds should always come before accessing said element. Signed-off-by: Johannes Schindelin <[email protected]>
The `revindex_size` value is uninitialized in case the function is erroring out, but we want to assign its value. Let's just initialize it. Signed-off-by: Johannes Schindelin <[email protected]>
The `localtime()` function is inherently thread-unsafe and should not be used anymore. Let's use `localtime_r()` instead. Signed-off-by: Johannes Schindelin <[email protected]>
The `mtimes_size` variable is uninitialzed when the function errors out, yet its value is assigned to another variable. Let's just initialize it. Signed-off-by: Johannes Schindelin <[email protected]>
On the off chance that `lookup_decoration()` cannot find anything, let `leave_one_treesame_to_parent()` return gracefully instead of crashing. Pointed out by CodeQL. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit_reference()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `parse_object()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `parse_tree_indirect()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `branch_get()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `branch_get()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `lookup_commit_reference()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL points out that `branch_get()` can return NULL values. Signed-off-by: Johannes Schindelin <[email protected]>
CodeQL is GitHub's native offering of a static code analyzer, and hence integrates with GitHub Actions better than any other static code analyzer. By default, it comes with a large range of "queries" that test for common code patterns that should be avoided. For now, we only target source code written in C, via the `language: cpp` directive. Just in case that other languages should be targeted, too, this GitHub workflow job is set up as a matrix job to make that easier in the future. For full documentation, see https://docs.github.com/en/code-security/code-scanning/introduction-to-code-scanning/about-code-scanning-with-codeql Co-authored-by: Pierre Tempel <[email protected]> Co-authored-by: Arthur Baars <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
…fter release As pointed out by CodeQL, it is a potentially dangerous practice to store local variables' addresses in non-local structs. My original intention was to make sure to clear it out after it was used, and before the function returns (which is when the address would go stale). However, I faced too much resistance in the Git project against such patches, there seemed to always be the overwhelming sentiment that the code isn't broken (even if it requires a complex and demanding analysis to wrap one's head around _that_). Therefore, I will be pragmatic and simply ask CodeQL to hold its peace about this issue forever. Signed-off-by: Johannes Schindelin <[email protected]>
In some instances, CodeQL's web UI on github.com leaves questions unanswered. For example, in some alerts it is really necessary to follow the entire "taint flow" to understand why something might be an issue. The alerts for the `cpp/uncontrolled-allocation-size` rule, for example, are all false positives, and only when inspecting the exact flow does it become obvious that one alert wants to point out that the size of a binary patch hunk, which is specified in the patch, is then used to determine how much memory to allocate, which may potentially run out of memory (and is hence just Git doing what it is asked to, and does not need to be changed). To help with those issues, publish the `.sarif` file as part of every workflow run; This allows downloading that file and inspecting it e.g. with the SARIF viewer extension in VS Code (for details, see https://marketplace.visualstudio.com/items?itemName=MS-SarifVSCode.sarif-viewer). Signed-off-by: Johannes Schindelin <[email protected]>
As pointed out by CodeQL, it could be NULL and we usually check for that. Signed-off-by: Johannes Schindelin <[email protected]>
A couple of CodeQL's queries are opinionated in a way that is obviously not shared by Git's source code's state, and apparently intentionally so. For example, the "For loop variable changed in body" query as well as the "No trivial switch statements" one result in too many results that are apparently intentional in Git's source code. Let's not worry about those, then. Also, Git has plenty of instances where variables shadow other variables. Other valid yet not quite critical issues identified by CodeQL include complex conditionals and nested switch statements spanning several pages. We probably want to address these issues at some stage, but they are not as critical as other problems pointed out by CodeQL, so let's silence those queries for now and take care of them at a later stage. Signed-off-by: Johannes Schindelin <[email protected]>
On the off-chance that it's NULL... Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
As pointed out by CodeQL, `lookup_commit()` can return NULL. Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
The code is a bit too hard to reason about for CodeQL to figure out whether the `fill_commit_graph_info()` function is at all called after `write_commit_graph()` returns (and hence whether `topo_levels` goes out of context before it is used again). The Git project insists that this is correct (and does not want to make the code more obviously correct), so let's silence CodeQL's complaints in this instance. Signed-off-by: Johannes Schindelin <[email protected]>
…ray past end Signed-off-by: Johannes Schindelin <[email protected]>
Signed-off-by: Johannes Schindelin <[email protected]>
…oes NUL-terminate correctly Signed-off-by: Johannes Schindelin <[email protected]>
Let's exclude GitWeb from being scanned; It is not distributed by us. Signed-off-by: Johannes Schindelin <[email protected]>
Git v2.48.0 has become even more stringent about leaks. Signed-off-by: Johannes Schindelin <[email protected]>
The --path-walk option in `git pack-objects` is implied by the pack.usePathWalk=true config value. This is intended to help the packfile generation within `git push` specifically. While this config does enable the path-walk feature, it does not lead to the expected levels of compression in the cases it was designed to handle. This is due to the default implication of the --reuse-delta option as well as auto-GC. In the performance tests used to evaluate the --path-walk option, such as those in p5313, the --no-reuse-delta option is used to ensure that deltas are recomputed according to the new object walk. However, it was assumed (I assumed this) that when the objects were loose from client-side operations that better deltas would be computed during this operation. This wasn't confirmed because the test process used data that was fetched from real repositories and thus existed in packed form only. I was able to confirm that this does not reproduce when the objects to push are loose. Careful use of making the pushed commit unreachable and loosening the objects via `git repack -Ad` helps to confirm my suspicions here. Independent of this change, I'm pushing for these pipeline agents to set `gc.auto=0` before creating their Git objects. In the current setup, the repo is adding objects and then incrementally repacking them and ending up with bad cross-path deltas. This approach can help scenarios where that makes sense, but will not cover all of our users without them choosing to opt-in to background maintenance (and even then, an incremental repack could cost them efficiency). In order to make sure we are getting the intended compression in `git push`, this change enforces the spawned `git pack-objects` process to use `--no-reuse-delta`. As far as I can tell, the main motivation for implying the --reuse-delta option by default is two-fold: 1. The code in send-pack.c that executes 'git pack-objects' is ignorant of whether the current process is a client pushing to a remote or a remote sending a fetch or clone to a client. 2. For servers, it is critical that they trust the previously computed deltas whenever possible, or they could overload their CPU resources. There's also the side that most servers use repacking logic that will replace any bad deltas that are sent by clients (or at least, that's the hope; we've seen that repacks can also pick bad deltas). This commit also adds a test case that demonstrates that `git -c pack.usePathWalk=true push` now avoids reusing deltas. To do this, the test case constructs a pack with a horrendously inefficient delta object, then verifies that the pack on the receiving side of the `push` fails to have such an inefficient delta. The test case would probably be a lot more readable if hex numbers were used instead of octal numbers, but alas, `printf "\x<hex>"` is not portable, only `printf "\<octal>"` is. For example, dash's built-in `printf` function simply prints `\x` verbatim while bash's built-in happily converts this construct to the corresponding byte. Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
The --path-walk option in 'git pack-objects' is implied by the pack.usePathWalk=true config value. This is intended to help the packfile generation within 'git push' specifically. While this config does enable the path-walk feature, it does not lead the expected levels of compression in the cases it was designed to handle. This is due to the default implication of the --reuse-delta option as well as auto-GC. In the performance tests used to evaluate the --path-walk option, such as those in p5313, the --no-reuse-delta option is used to ensure that deltas are recomputed according to the new object walk. However, it was assumed (I assumed this) that when the objects were loose from client-side operations that better deltas would be computed during this operation. This wasn't confirmed because the test process used data that was fetched from real repositories and thus existed in packed form only. I was able to confirm that this does not reproduce when the objects to push are loose. Careful use of making the pushed commit unreachable and loosening the objects via 'git repack -Ad' helps to confirm my suspicions here. Independent of this change, I'm pushing for these pipeline agents to set 'gc.auto=0' before creating their Git objects. In the current setup, the repo is adding objects and then incrementally repacking them and ending up with bad cross-path deltas. This approach can help scenarios where that makes sense, but will not cover all of our users without them choosing to opt-in to background maintenance (and even then, an incremental repack could cost them efficiency). In order to make sure we are getting the intended compression in 'git push', this change makes the --path-walk option imply --no-reuse-delta when the --reuse-delta option is not provided. As far as I can tell, the main motivation for implying the --reuse-delta option by default is two-fold: 1. The code in send-pack.c that executes 'git pack-objects' is ignorant of whether the current process is a client pushing to a remote or a remote sending a fetch or clone to a client. 2. For servers, it is critical that they trust the previously computed deltas whenever possible, or they could overload their CPU resources. There's also the side that most servers use repacking logic that will replace any bad deltas that are sent by clients (or at least, that's the hope; we've seen that repacks can also pick bad deltas). The --path-walk option at the moment is not compatible with reachability bitmaps, so is not planned to be used by Git servers. Thus, we can reasonably assume (for now) that the --path-walk option is assuming a client-side scenario, either a push or a repack. The repack option will be explicit about the --reuse-delta option or not. One thing to be careful about is background maintenance, which uses a list of objects instead of refs, so we condition this on the case where the --path-walk option will be effective by checking that the --revs option was provided. Alternative options considered included: * Adding _another_ config ('pack.reuseDelta=false') to opt-in to this choice. However, we already have pack.usePathWalk=true as an opt-in to "do the right thing to make my data small" as far as our internal users are concerned. * Modify the chain between builtin/push.c, transport.c, and builtin/send-pack.c to communicate that we are in "push" mode, not within a fetch or clone. However, this seemed like overkill. It may be beneficial in the future to pass through a mode like this, but it does not meet the bar for the immediate need. Reviewers, please see git-for-windows#5171 for the baseline implementation of this feature within Git for Windows and thus microsoft/git. This feature is still under review upstream.
Tests in t7900 assume the state of the `maintenance.strategy` config setting; set/unset by previous tests. Correct this by explictly unsetting and re-setting the config at the start of the tests. Signed-off-by: Matthew John Cheetham <[email protected]>
Introduce a new maintenance task, `cache-local-objects`, that operates on Scalar or VFS for Git repositories with a per-volume, shared object cache (specified by `gvfs.sharedCache`) to migrate packfiles and loose objects from the repository object directory to the shared cache. Older versions of `microsoft/git` incorrectly placed packfiles in the repository object directory instead of the shared cache; this task will help clean up existing clones impacted by that issue. Migration of packfiles involves the following steps for each pack: 1. Hardlink (or copy): a. the .pack file b. the .keep file c. the .rev file 2. Move (or copy + delete) the .idx file 3. Delete/unlink: a. the .pack file b. the .keep file c. the .rev file Moving the index file after the others ensures the pack is not read from the new cache directory until all associated files (rev, keep) exist in the cache directory also. Moving loose objects operates as a move, or copy + delete. Signed-off-by: Matthew John Cheetham <[email protected]>
Add the `cache-local-objects` maintenance task to the list of tasks run by the `scalar run` command. It's often easier for users to run the shorter `scalar run` command than the equivalent `git maintenance` command. Signed-off-by: Matthew John Cheetham <[email protected]>
Introduce a new maintenance task, `cache-local-objects`, that operates on Scalar or VFS for Git repositories with a per-volume, shared object cache (specified by `gvfs.sharedCache`) to migrate packfiles and loose objects from the repository object directory to the shared cache. Older versions of `microsoft/git` incorrectly placed packfiles in the repository object directory instead of the shared cache; this task will help clean up existing clones impacted by that issue. Fixes #716
Add the ability to block built-in commands based on if the `core.gvfs` setting has the `GVFS_USE_VIRTUAL_FILESYSTEM` bit set. This allows us to selectively block commands that use the GVFS protocol, but don't use VFS for Git (for example repos cloned via `scalar clone` against Azure DevOps). Signed-off-by: Matthew John Cheetham <[email protected]>
Loosen the blocking of the `repack` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `repack` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <[email protected]>
Loosen the blocking of the `fsck` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `fsck` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <[email protected]>
Loosen the blocking of the `prune` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `prune` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <[email protected]>
Replace the special casing of the `worktree` command being blocked on VFS-enabled repos with the new `BLOCK_ON_VFS_ENABLED` flag. Signed-off-by: Matthew John Cheetham <[email protected]>
Emit a warning message when the `gvfs.sharedCache` option is set that the `repack` command will not perform repacking on the shared cache. In the future we can teach `repack` to operate on the shared cache, at which point we can drop this commit. Signed-off-by: Matthew John Cheetham <[email protected]>
The microsoft/git fork includes pre- and post-command hooks, with the initial intention of using these for VFS for Git. In that environment, these are important hooks to avoid concurrent issues when the virtualization is incomplete. However, in the Office monorepo the post-command hook is used in a different way. A custom hook is used to update the sparse-checkout, if necessary. To avoid this hook from being incredibly slow on every Git command, this hook checks for the existence of a "sentinel file" that is written by a custom post-index-change hook and no-ops if that file does not exist. However, even this "no-op" is 200ms due to the use of two scripts (one simple script in .git/hooks/ does some environment checking and then calls a script from the working directory which actually contains the logic). Add a new config option, 'postCommand.strategy', that will allow for multiple possible strategies in the future. For now, the one we are adding is 'worktree-change' which states that we should write a sentinel file instead of running the 'post-index-change' hook and then skip the 'post-command' hook if the proper sentinel file doesn't exist. If it does exist, then delete it and run the hook. This behavior is _only_ triggered, however, if a part of the index changes that is within the sparse checkout; If only parts of the index change that are not even checked out on disk, the hook is still skipped. I originally planned to put this into the repo-settings, but this caused the repo settings to load in more cases than they did previously. When there is an invalid boolean config option, this causes failure in new places. This was caught by t3007. This behavior is tested in t0401-post-command-hook.sh. Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
This helps t0401 pass while under SANITIZE=leak. Signed-off-by: Derrick Stolee <[email protected]>
Currently when the `core.gvfs` setting is set, several commands are outright blocked from running. Some of these commands, namely `repack` are actually OK to run in a Scalar clone, even if it uses the GVFS protocol (for Azure DevOps). Introduce a different blocking mechanism to only block commands when the virtual filesystem is being used, rather than as a broad block on any `core.gvfs` setting.
The microsoft/git fork includes pre- and post-command hooks, with the initial intention of using these for VFS for Git. In that environment, these are important hooks to avoid concurrent issues when the virtualization is incomplete. However, in the Office monorepo the post-command hook is used in a different way. A custom hook is used to update the sparse-checkout, if necessary. To avoid this hook from being incredibly slow on every Git command, this hook checks for the existence of a "sentinel file" that is written by a custom post-index-change hook and no-ops if that file does not exist. However, even this "no-op" is 200ms due to the use of two scripts (one simple script in .git/hooks/ does some environment checking and then calls a script from the working directory which actually contains the logic). Add a new config option, 'postCommand.strategy', that will allow for multiple possible strategies in the future. For now, the one we are adding is 'post-index-change' which states that we should write a sentinel file instead of running the 'post-index-change' hook and then skip the 'post-command' hook if the proper sentinel file doesn't exist. (If it does exist, then delete it and run the hook.) --- This fork contains changes specific to monorepo scenarios. If you are an external contributor, then please detail your reason for submitting to this fork: * [ ] This is an early version of work already under review upstream. * [ ] This change only applies to interactions with Azure DevOps and the GVFS Protocol. * [ ] This change only applies to the virtualization hook and VFS for Git. * [x] This change only applies to custom bits in the microsoft/git fork.
This new test demonstrates some behavior where a valid packfile is being rejected by the Git client due to the order in which it is resolving REF_DELTAs. The thin packfile has a REF_DELTA chain A->B->C where C is not included in the packfile. However, the client repository contains both C and B already. Thus, 'git index-pack' is able to resolve A before resolving B. When resolving B, it then attempts to resolve any other REF_DELTAs that are pointing to B as a base. This "revisits" A and complains as if there is a cycle, but it did not actually detect a cycle. A fix will arrive in the next change. Signed-off-by: Derrick Stolee <[email protected]>
This is an early version of work already under review upstream: gitgitgadget#1906
Just like we just did in the backport from my upstream contribution, let's convert the `curl_easy_setopt()` calls in `gvfs-helper.c` that still passed `int` constants to pass `long` instead. Signed-off-by: Johannes Schindelin <[email protected]>
This topic branch has backports of cURL compile fixes in the `osx-gcc` job, plus a bonus `gvfs-helper` follow-up fix. Signed-off-by: Johannes Schindelin <[email protected]>
On Linux, the following command would cause the terminal to be stuck waiting: git fetch origin foobar The issue would be that the fetch would fail with the error fatal: couldn't find remote ref foobar but the underlying git-gvfs-helper process wouldn't die. The `subprocess_exit_handler()` method would close its stdin and stdout, but that wouldn't be enough to cause the process to end, even though the `packet_read_line_gently()` call that is run in `do_sub_cmd__server()` in a loop should return -1 and the process should the terminate peacefully. While it is unclear why this does not happen, there may be other conditions where the `gvfs-helper` process would not terminate. Let's ensure that the gvfs-helper-client process cleans up the gvfs-helper server processes that it spawned upon exit. Reported-by: Stuart Wilcox Humilde <[email protected]> Co-authored-by: Johannes Schindelin <[email protected]> Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Johannes Schindelin <[email protected]>
On Linux, the following command would cause the terminal to be stuck waiting: ``` git fetch origin foobar ``` The issue would be that the fetch would fail with the error ``` fatal: couldn't find remote ref foobar ``` but the underlying `git-gvfs-helper` process wouldn't die. The `subprocess_exit_handler()` method would close its stdin and stdout, but that wouldn't be enough to cause the process to end. This PR addresses that by skipping the `finish_command()` call of the `clean_on_exit_handler` and instead lets `cleanup_children()` send a SIGTERM to terminate those spawned child processes.
This patch series has been long in the making, ever since Johannes Nicolai and myself spiked this in November/December 2020. Signed-off-by: Johannes Schindelin <[email protected]>
This patch series has been long in the making, ever since Johannes Nicolai and myself spiked this in November/December 2020. Signed-off-by: Johannes Schindelin <[email protected]>
Range-diff relative to -rc0
Basically, some non-trivial mechanics required to accommodate for more upstream refactors that avoid |
We probably want to add a test case to verify that I transmogrified 9150f1d correctly... |
The usual shtick.