fix(printf): redesign %q implementation for gnu compatibility #9640
naoNao89 wants to merge 1 commit into uutils:main
Merging this PR will improve performance by 3.7%
|
GNU testsuite comparison: |
|
Please ask GNU before opening/merging this. |
|
💀 |
|
|
Thanks for working on this!
I don't think it's the right approach. The naming of the cmdline option I think reusing I believe GNU Coreutils also fixed these two hand-in-hand, even though the NEWS entry doesn't specify the exact cmdline option of The story is more complicated than this since there are 4 different relevant formats of |
|
tks, found it |
|
Switched to the shared EscapedShellQuoter instead of a separate PrintfQuoter. Result: 12/21 tests pass. EscapedShellQuoter has bugs that affect both tools. |
|
Some notes from GNU ...

The difference between shell and shell-escape is that "shell-escape" uses POSIX $'\xxx' syntax to output non-printing characters, whereas "shell" just outputs ? for such characters. I.e. shell-escape is the most general in that its output should always be copy-and-pasteable back to the shell to specify any file name. "shell-escape" is the default mode in ls when outputting to a tty. Also this is what printf %q uses since it's the most general. I'll expand on this a bit in the gnu docs.

So, yes ideally there would not be a separate implementation of ls --quoting-style and printf %q.

As for conciseness of output, the GNU output can have redundant leading single quotes, though that is for subtle alignment reasons. BTW folks can be very sensitive about this. I got threatening personal voicemails when I changed this alignment output slightly 10 years ago now. For example: Notice how the file names are aligned, whereas if we used the more concise |
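The shell versus shell-escape distinction in the notes above can be illustrated with a small sketch. This is not GNU's or uutils' code: the `Style` enum and `render_nonprinting` function are hypothetical names, every control character is written in octal for simplicity, and all other quoting (spaces, quotes, alignment) is left out.

```rust
/// Hypothetical names for the two quoting styles discussed in the notes.
#[derive(Clone, Copy)]
enum Style {
    Shell,
    ShellEscape,
}

/// Render only the non-printing (ASCII control) characters of `name`.
/// Everything else is passed through untouched; real quoting of spaces,
/// quotes, and other metacharacters is deliberately left out of this sketch.
fn render_nonprinting(name: &str, style: Style) -> String {
    let mut out = String::new();
    for c in name.chars() {
        if c.is_ascii_control() {
            match style {
                // Lossy: the original character cannot be recovered from `?`.
                Style::Shell => out.push('?'),
                // Round-trippable: the shell decodes $'\ooo' back to the character.
                Style::ShellEscape => out.push_str(&format!("$'\\{:03o}'", c as u32)),
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

Calling `render_nonprinting("a\x01b", Style::Shell)` gives `a?b`, which cannot be pasted back to name the file, while `Style::ShellEscape` gives `a$'\001'b`, which can.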
What the f.ck... I'm so sorry to hear this! Thanks for the additional info! There are multiple correct behaviors, and for I don't think it's a problem if the two coreutilses (how to say this correctly? :)) use different formats, as long as they are both technically correct (the shell resolves them to the same original string), but surely following GNU Coreutils is one possible reasonable choice. |
Fixes CI failure from PR uutils#9640 where GNU test suite update (commit ba3442f) exposed fundamental design flaws in printf %q shell-quoting implementation.

Problem: Original code pre-scanned for control characters and wrapped ENTIRE strings in $'...' if ANY control char was present (e.g., "a\r" → $'a\r').

Solution: Implemented selective quoting that only wraps control characters themselves (e.g., "a\r" → a$'\r'), matching GNU coreutils behavior.

Key Changes:
- Removed has_control_chars() pre-scanning logic
- Never start in dollar mode - enter/exit dynamically
- Exit dollar mode when encountering regular chars (selective quoting)
- Keep consecutive control chars in single dollar quote
- Handle apostrophes by exiting dollar mode and using \' escape
- Updated test expectations to match selective quoting behavior

Examples:
- 'a\tb' → a$'\t'b (not $'a\tb')
- '\x01\x02\x03' → $'\001\002\003' (not $'\001'$'\002'$'\003')
- 'hello\x01world' → hello$'\001'world (not $'hello\001world')
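The enter/exit behaviour described in this commit message can be sketched as a small state machine. This is a minimal illustration, not the PR's actual code: `quote_selective` and `escape_control` are hypothetical names, and the sketch only handles control characters and apostrophes, leaving spaces and other shell metacharacters unquoted.

```rust
/// Hypothetical helper: escape one control byte for use inside $'...'.
fn escape_control(b: u8) -> String {
    match b {
        b'\t' => "\\t".to_string(),
        b'\n' => "\\n".to_string(),
        b'\r' => "\\r".to_string(),
        // Any other control byte is written as three octal digits.
        _ => format!("\\{:03o}", b),
    }
}

/// Simplified sketch of selective quoting: only runs of control
/// characters are wrapped in $'...'; regular characters stay outside.
fn quote_selective(input: &str) -> String {
    let mut out = String::new();
    let mut in_dollar = false; // never start in dollar mode

    for c in input.chars() {
        if c.is_ascii_control() {
            // Enter dollar mode once; consecutive control chars share it.
            if !in_dollar {
                out.push_str("$'");
                in_dollar = true;
            }
            out.push_str(&escape_control(c as u8));
        } else {
            // A regular character closes any open dollar quote.
            if in_dollar {
                out.push('\'');
                in_dollar = false;
            }
            if c == '\'' {
                out.push_str("\\'"); // apostrophe is escaped outside the quote
            } else {
                out.push(c);
            }
        }
    }
    if in_dollar {
        out.push('\''); // close a dollar quote left open at end of input
    }
    out
}
```

Under these assumptions the sketch reproduces the examples listed in the commit message, for instance `quote_selective("a\tb")` yields `a$'\t'b`.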
|
@naoNao89 is this still a draft? Thanks |
|
Ready
Implement printf %q format specifier to match bash behavior for shell-escaping strings.

Changes:
- Fix integer overflow panic in extreme field width parsing
- Update quoting logic for control characters with single quotes
- Adjust test expectations for GNU compatibility

Fixes uutils#9638
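The commit message does not show how the field-width overflow was fixed. One common way to avoid such a panic in Rust is checked arithmetic, sketched below. The `parse_width` function is a hypothetical name for illustration and is not taken from the PR.

```rust
/// Hypothetical sketch: parse a decimal field width without panicking.
/// Returns `None` for non-digit input or when the value would overflow
/// `usize`, instead of panicking on arithmetic overflow.
fn parse_width(digits: &str) -> Option<usize> {
    let mut width: usize = 0;
    for b in digits.bytes() {
        if !b.is_ascii_digit() {
            return None;
        }
        // checked_mul / checked_add return None on overflow,
        // and `?` propagates that None to the caller.
        width = width
            .checked_mul(10)?
            .checked_add((b - b'0') as usize)?;
    }
    Some(width)
}
```

With this shape the caller decides what an oversized width means (an error message, clamping, and so on) rather than the parser aborting the process.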
Fixes #9638
Redesigned printf %q implementation to match bash behavior. Previous approach incorrectly used SHELL_ESCAPE (designed for ls). Created dedicated PrintfQuoter with proper algorithm: empty→'', simple→unchanged, metacharacters→backslash, control→$'...'. Includes 18 tests and related apostrophe bug fix.
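The four cases named in the description can be sketched as a classification step followed by the backslash case. This is an illustrative sketch only: `Strategy`, `classify`, `is_shell_meta`, and `backslash_escape` are hypothetical names, and the metacharacter set is an assumption rather than the list used by PrintfQuoter.

```rust
/// Hypothetical labels for the four cases in the description.
#[derive(Debug, PartialEq)]
enum Strategy {
    Empty,       // ""      -> ''
    Unchanged,   // simple  -> printed as-is
    Backslash,   // metacharacters -> backslash-escaped
    DollarQuote, // control characters -> $'...'
}

/// Assumed metacharacter set; the real implementation may differ.
fn is_shell_meta(c: char) -> bool {
    " \"'\\$`!*?[](){}<>|&;~#".contains(c)
}

/// Decide which of the four cases applies to `s`.
fn classify(s: &str) -> Strategy {
    if s.is_empty() {
        Strategy::Empty
    } else if s.chars().any(|c| c.is_ascii_control()) {
        Strategy::DollarQuote
    } else if s.chars().any(is_shell_meta) {
        Strategy::Backslash
    } else {
        Strategy::Unchanged
    }
}

/// The backslash case: prefix each assumed metacharacter with `\`.
fn backslash_escape(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        if is_shell_meta(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}
```

For example, `classify("a b")` selects the backslash case and `backslash_escape("a b")` produces `a\ b`.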