- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 393
Description
Summary 💡
Possible tasks:
-  Document that gix_command::Preparewithuse_shelldoes not quote (unless it will) (#1800)
-  Document which data in a Prepareinstance are subject to shell expansions if unquoted (#1800)
-  Clarify the semantics of args(#1813)
-  Maybe: When use_shellistrue, let caller opt into POSIX shell style quoting (#1801)
It might make sense to split this into multiple issues, in that the first three items could be done by themselves, though they become more important and possibly also different if fourth is done.
When using a shell, the how varies based on the why
By default, gix_command::Prepare rightly does not use a shell. But it supports using a shell if requested. This shell is generally sh and assumed to have POSIX shell semantics. There are at least three reasons one might want to use a shell:
- To run a command in the same way gitwould run it, wheregitis guaranteed to use a shell.
- To run a command that is expressed as a single string, with arguments split how shsplits them.
- To get a different environment that a shell is hoped to confer.
Reasons (1) and (2) are the most important
In the absence of (1), a shell must be used very carefully, and basically must never be used for running a command git documents not to use a shell or that can otherwise be assumed not to be run with a shell when git runs it. For example, if a command that should not run in a shell has a \ in its name due that being a path separator on Windows, and it is run in a shell anyway, then that will turn into an escape character.
When (2) is the only reason to use a shell, and neither (1) nor (3) apply, or they apply but can be "optimized away," then shell-like argument splitting can be done without using a shell. Prepare currently does this on Windows, if it is it is allowed but not required to use a shell, i.e. if use_shell and allow_manual_arg_splitting were both set to true.
The feature suggested here is about making (3) easier and slightly more likely to be done correctly when done at all. But I mention (1) and (2) because I think they are the main uses of use_shell, and I recommend only pursuing this feature idea for (3) if it will not interfere with, complicate, or confuse (1) and (2).
Reason (3) is hard to get right
I think it is very hard to do (3) effectively and achieve the predicted effect, as discussed in #1752 and #1758. The sh.exe in Git for Windows probably does provide an environment that is different from the caller's in some ways that could help find or use some tools. But:
- Path interpretation could sometimes differ unexpectedly, since sh.exe/bash.exesupport Cygwin-style paths. As one but not the only case of this, some Windows paths thatstd::process::Commandsupports (it supports nearly all Windows paths, including\\?\,\\.\, and even\??\paths) can be rejected or misinterpreted.
- It probably also differs in other ways that make things work less reliably. For example, if it and gitoxide are built for different architectures and one of them is 32-bit, then they may resolve the same paths to different locations when running some programs, due to the file system redirector, which I suspect may have been a factor in #1432 (which I do plan to revisit; I have not forgotten about #1585 (comment)).
- Running something with sh.exe, or even withbash.exe, is not equivalent to using the interactive environment in a Git Bash environment. This is (mostly) because some environment customization is done through startup scripts that mainly run when a shell is used interactively.
Outside of Windows, I would expect nearly all of the difference between an environment using and not using a shell as an intermediary would be due to startup scripts that are unlikely to run, and probably should not be run, when using the shell in an automated, non-interactive way. This is because, unlike a Unix-style shell on Windows, a Unix-style on a Unix-like system is not doing anything extra to take an environment that is presumed to be non-Unix-like and make it Unix-like.
But we may be able to make (3) easier to get right on the occasion it is needed
If I understand correctly, the way gix_command::Prepare currently works when it uses an external shell process is that it assumes the shell is sh-like, invokes it with a -c argument followed by the value of the command field and, if args is nonempty and $@ does not already appear in the command field, adds a space followed by "$@" to the command and passes the arguments as separate argument to the shell, so that they will be treated as positional parameters and accessible through $1, $2, $3, etc., and with $@ and $*, with their usual semantics.
This assumes the command to be run should be subject to shell expansions, including expansions that might turn it into a command followed by some of its arguments, and then explicit args should not be supplied literally as positional parameters that the command can access. This seems generally reasonable when the goal of using a shell is (1) or (2), provided that the caller has considered all the effects of using a shell and understands exactly what is being passed to it. The first two proposed tasks ("Document ...") would help authors of code that use gix_command, or that use other code that uses gix_command and documents that it does so, so know what to expect.
But this may not be the right semantics for (3), where no expansions are wanted at all. Suppose (3) applies and (1) and (2) do not. Suppose further that using a shell has been confirmed in the specific use case to produce the desired environment (so the solution is not simply to abandon (3) as a goal or technique). Then the value of command should be quoted.
This seems feasible
command can only be quoted safely if enough is known about the shell language and if its quoting rules are simple enough.
Fortunately, POSIX-compatible sh, and most if not all modern Bourne-style shells, satisfy this requirement. For any string, so long as the string actually can be included in a script or -c operand (i.e., barring an encoding mismatch), the string can be quoted by replacing any preexisting occurrence of ' with the literal sequence '\'', prepending a ', and appending a '.
This works for causing it to be passed as an argument, so long as it is properly separated from the command name and other arguments with whitespace. It also works for using it as the command name. For example, C:\Users\O'Shaughnessy\foo bar.exe can be quoted as 'C:\Users\O'\''Shaughnessy\foo bar.exe'.
I don't think this should be done automatically. One reason it should not is that the caller may be quoting it already. But it might be useful as a feature that can be opted into.
Because--and so long as--args are not passed to the shell as part of text the shell considers a shell script, they should not be quoted and no option to quote them should be presented. But the semantics when doing this should be made clear, especially if this feature I am proposing is to be implemented. Furthermore, if this feature is implemented, then the circumstances under which quoting is added should be clearly distinguished from those under which it is not.
The process starts with the arguments, right?
It may be that the current summary of args will be understood correctly:
gitoxide/gix-command/src/lib.rs
Lines 25 to 26 in 97f85f7
| /// The arguments to pass to the spawned process. | |
| pub args: Vec<OsString>, | 
However, I was momentarily tripped up by it, reading it to mean that the process has been spawned and then receives the arguments. This may be possible to fix while revising the docstring, or other revisions might make it clear automatically by describing exactly how the args are passed, or maybe this specific point about the wording being ambiguous was just an initial misreading on my part.
Motivation 🔦
The commented discussions in #1752 and #1758 reveal (3) as a goal. Part of the view I expressed there is that whether, and how, this should be done, is probably higher level than gitoxide.
But even though I think that is largely true, and even though I think gitoxide itself should be especially reluctant to automatically use a shell for any operations when neither (1) nor (2) decisively applies, there does seem to be a fragment of how to do (3) that belongs in gix_command: helping the caller cause the correct string to be used as command depending on whether quoting is needed.
Another motivation is that I noticed what looked like a possible use case for this in gitbutlerapp/gitbutler#6028. Note, however, that the only non-documentation change proposed in this feature request is opt-in quoting for command when command is interpreted as a single path. The quotes would be added and then, if args are present, a space and "$@" appended.
What this doesn't propose
This does not currently propose even opt-in quoting for arguments, because "$@" is already in place and seems sufficient to neutralize anything in them that would be treated specially by a shell if it appeared in text the shell considers a script or command.
This does not propose anything to add a -- argument before "$@", because whether it is needed or not is independent of whether a shell is used. More broadly, if present, it's just a literal "--", so it can be included explicitly in args at least as clearly and accurately as if specified in any other way.