-
Notifications
You must be signed in to change notification settings - Fork 3
better cli arg parsing #69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughRefactors CLI argument parsing to use index-based traversal with an internal helper to locate closing quotes; updates double-quote extraction and whitespace handling. Tests updated to expect embedded single quotes inside a double-quoted token and two new Python -c parsing tests added. Changes
Sequence Diagram(s)(Skipped — changes are internal parsing logic within a single component and do not meet diagram generation criteria.) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
inc/mkn/kul/src/cli/asArgs.ipp (1)
77-80: Dead code: this branch is now unreachable.The new handling at lines 58-65 resets
openQuotesD = falsebefore the switch is ever reached when inside double quotes. Therefore, the conditionopenQuotesD && !openQuotesSwill never be true here.Consider removing this dead branch or rethinking the parsing flow so that quote termination is handled in one place.
Suggested cleanup
case '"': - if (openQuotesD && !openQuotesS) { - openQuotesD = false; - args.push_back(arg); - arg.clear(); - } else { - openQuotesD = true; - } + openQuotesD = true; continue;tst/test/cli.cpp (1)
9-14: Test expectation doesn't match implementation behavior.The pipeline failures confirm that the implementation outputs
"words 'in' quotes"(preserving single quotes), but line 12 expects"words in quotes"(stripped).Standard shell behavior treats single quotes inside double quotes as literal characters. If that's the intended behavior, update the expectation:
Proposed fix to align with shell-like behavior
- EXPECT_EQ("words in quotes", v[1]); + EXPECT_EQ("words 'in' quotes", v[1]);Alternatively, if stripping nested quotes is the desired behavior, the implementation in
asArgs.ippneeds to be fixed to process the quoted content recursively.
🤖 Fix all issues with AI agents
In `@inc/mkn/kul/src/cli/asArgs.ipp`:
- Around line 58-65: The double-quote handling in asArgs.ipp (openQuotesD
branch) incorrectly uses sub_string_to_next_occurrence(cmd, i, "\"") which
ignores escaped quotes and bypasses the backSlashed state; replace this by
scanning the substring from i character-by-character honoring backSlashed (the
same logic used elsewhere) so that \" is treated as an escaped quote and not as
the terminator, build the arg while unescaping backslashes as needed, advance i
by the consumed length, and push the result into args; additionally, if the test
expects single quotes to be stripped inside double-quoted content, post-process
the collected double-quoted content (or apply the same recursive parsing) to
remove surrounding single-quote characters rather than preserving them so
content like "words 'in' quotes" becomes words in quotes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@inc/mkn/kul/src/cli/asArgs.ipp`:
- Around line 50-64: The quoted-substring handling in asArgs.ipp currently
overwrites any existing partial token and skips empty quoted args; change the
logic in the openQuotesD branch to append the quoted substring instead of
assigning (use arg += ... with sub_string_to_next_occurrence(cmd, i, "\"")),
ensure you push the arg into args even when it's empty (so "" becomes a token),
and advance the loop index by the substring length returned from
sub_string_to_next_occurrence (adjust i by that length) before clearing arg and
resetting openQuotesD; use the existing symbols openQuotesD, arg, args, cmd, and
sub_string_to_next_occurrence to locate and modify the code.
🧹 Nitpick comments (1)
inc/mkn/kul/src/cli/asArgs.ipp (1)
33-41: Avoid allocating a temporary substring in the helper.You can use
std::string::findwith a start index to avoid the extra allocation while keeping the same semantics.♻️ Suggested tweak
- std::string const rest = cmd.substr(s); - auto const pos = rest.find(needle); + auto const pos = cmd.find(needle, s); if (pos == std::string::npos) KEXCEPT(mkn::kul::Exception, "Error: CLI Arg parsing unclosed quotes!"); - return pos; + return pos - s;
| for (std::size_t i = 0; i < cmd.size(); ++i) { | ||
| auto const c = cmd[i]; | ||
|
|
||
| if (backSlashed) { | ||
| backSlashed = false; | ||
| arg += c; | ||
| continue; | ||
|
|
||
| } else if (openQuotesD) { | ||
| auto const pos = sub_string_to_next_occurrence(cmd, i, "\""); | ||
| arg = cmd.substr(i, pos); | ||
| if (arg.size() > 0) args.push_back(arg); | ||
| i += arg.size(); | ||
| arg.clear(); | ||
| openQuotesD = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quoted substring overwrites existing token and drops empty quoted args.
arg = cmd.substr(i, pos) discards any prefix, so inputs like foo"bar" yield "bar", and the arg.size() check drops empty "" args. This is a regression vs. the prior char-by-char behavior.
🐛 Proposed fix
- arg = cmd.substr(i, pos);
- if (arg.size() > 0) args.push_back(arg);
- i += arg.size();
+ arg += cmd.substr(i, pos);
+ args.push_back(arg);
+ i += pos;
arg.clear();
openQuotesD = false;
continue;🤖 Prompt for AI Agents
In `@inc/mkn/kul/src/cli/asArgs.ipp` around lines 50 - 64, The quoted-substring
handling in asArgs.ipp currently overwrites any existing partial token and skips
empty quoted args; change the logic in the openQuotesD branch to append the
quoted substring instead of assigning (use arg += ... with
sub_string_to_next_occurrence(cmd, i, "\"")), ensure you push the arg into args
even when it's empty (so "" becomes a token), and advance the loop index by the
substring length returned from sub_string_to_next_occurrence (adjust i by that
length) before clearing arg and resetting openQuotesD; use the existing symbols
openQuotesD, arg, args, cmd, and sub_string_to_next_occurrence to locate and
modify the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look at the test, is there some strings you can suggest that are considered valid for command line argument parsing?
really what I'm trying to fix is the following not working
process.run("python3", "-c", "\"import sys; print('lol ' +dir(sys))\"")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Read the test file to understand the current test structure
cat tst/test/cli.cpp
Length of output: 393
@PhilipDeegan Here are comprehensive test strings for CLI argument parsing that would validate your implementation:
// Test 1: Empty quoted strings
Input: `cmd "" arg`
Expected: ["cmd", "", "arg"]
// Test 2: Single quotes inside double quotes (current test)
Input: `"words 'in' quotes"`
Expected: ["words 'in' quotes"]
// Test 3: Double quotes inside single quotes
Input: `'words "in" quotes'`
Expected: ["words \"in\" quotes"]
// Test 4: Adjacent quoted and unquoted content
Input: `foo"bar"baz`
Expected: ["foobarbaz"]
// Test 5: Multiple spaces between args
Input: `cmd arg1 arg2`
Expected: ["cmd", "arg1", "arg2"]
// Test 6: Backslash escaping special chars
Input: `cmd arg\ with\ spaces end`
Expected: ["cmd", "arg with spaces", "end"]
// Test 7: Your Python use case
Input: `python3 -c "import sys; print('lol ' +dir(sys))"`
Expected: ["python3", "-c", "import sys; print('lol ' +dir(sys))"]
// Test 8: Mixed quotes and escapes
Input: `cmd "quoted arg" 'single quoted' escaped\ space`
Expected: ["cmd", "quoted arg", "single quoted", "escaped space"]
// Test 9: Empty string at start/end
Input: `"" cmd ""`
Expected: ["", "cmd", ""]
// Test 10: Quote at end of unquoted content
Input: `prefix"quoted part"`
Expected: ["prefixquoted part"]For your specific Python case:
process.run("python3", "-c", "\"import sys; print('lol ' +dir(sys))\"")When the third argument ("\"import sys; print('lol ' +dir(sys))\"") is parsed by asArgs, it should produce:
- The outer
"..."are just Python string delimiters (not parsed by asArgs) - The escaped
\"become literal"characters in the input string - So asArgs sees:
"import sys; print('lol ' +dir(sys))" - Which should parse to:
import sys; print('lol ' +dir(sys))
Note: The bug I flagged earlier (using arg = cmd.substr(...) instead of arg += cmd.substr(...)) will cause Test 4 and Test 10 to fail because any content before the opening quote gets discarded. This will also affect your Python case if there's any prefix before quotes.
Would you like me to help create a comprehensive test suite with all these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh really?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
@PhilipDeegan Yes! Let me demonstrate the bug with a concrete test case:
This will show that your specific Python case does work (because it starts with a quote), but the bug manifests in other valid CLI scenarios where content precedes the opening quote or when quotes are empty.
🏁 Script executed:
#!/bin/bash
# Create a test program to show the bug with the current implementation
cat > /tmp/test_cli_bug.cpp << 'EOF'
`#include` <string>
`#include` <vector>
`#include` <iostream>
namespace mkn { namespace kul {
class Exception : public std::exception {
std::string msg;
public:
Exception(std::string m) : msg(m) {}
const char* what() const noexcept override { return msg.c_str(); }
};
}}
`#define` KEXCEPT(type, msg) throw type(msg)
namespace {
std::size_t sub_string_to_next_occurrence(std::string const& cmd, std::size_t const s,
std::string const& needle) {
std::string const rest = cmd.substr(s);
auto const pos = rest.find(needle);
if (pos == std::string::npos)
KEXCEPT(mkn::kul::Exception, "Error: CLI Arg parsing unclosed quotes!");
return pos;
}
}
void asArgs(std::string const& cmd, std::vector<std::string>& args) {
std::string arg;
bool openQuotesS = false, openQuotesD = false, backSlashed = false;
for (std::size_t i = 0; i < cmd.size(); ++i) {
auto const c = cmd[i];
if (backSlashed) {
backSlashed = false;
arg += c;
continue;
} else if (openQuotesD) {
auto const pos = sub_string_to_next_occurrence(cmd, i, "\"");
arg = cmd.substr(i, pos); // BUG: overwrites instead of appending
if (arg.size() > 0) args.push_back(arg);
i += arg.size();
arg.clear();
openQuotesD = false;
continue;
}
switch (c) {
case ' ':
if (!openQuotesD && !openQuotesS) {
if (arg.size() > 0) args.push_back(arg);
arg.clear();
continue;
}
break;
case '"':
if (openQuotesD && !openQuotesS) {
openQuotesD = false;
args.push_back(arg);
arg.clear();
} else {
openQuotesD = true;
}
continue;
case '\'':
if (openQuotesS && !openQuotesD) {
openQuotesS = false;
args.push_back(arg);
arg.clear();
} else {
openQuotesS = true;
}
continue;
case '\\':
if (!openQuotesS && !openQuotesD) {
backSlashed = true;
continue;
}
break;
}
arg += c;
}
if (arg.size() > 0) args.push_back(arg);
}
int main() {
// Test 1: Your specific Python case (works because starts with quote)
{
std::vector<std::string> args;
asArgs("python3 -c \"import sys; print('lol ' +dir(sys))\"", args);
std::cout << "Test 1 - Your Python case:\n";
for (size_t i = 0; i < args.size(); ++i) {
std::cout << " [" << i << "] = \"" << args[i] << "\"\n";
}
std::cout << " ✓ This actually works!\n\n";
}
// Test 2: Adjacent unquoted + quoted content (BUG!)
{
std::vector<std::string> args;
asArgs("foo\"bar\"", args);
std::cout << "Test 2 - Adjacent content foo\"bar\":\n";
std::cout << " Expected: [\"foobar\"]\n";
std::cout << " Got: [\"" << (args.size() > 0 ? args[0] : "NOTHING") << "\"]\n";
std::cout << " ✗ FAILS - 'foo' prefix is lost!\n\n";
}
// Test 3: Prefix with quoted content
{
std::vector<std::string> args;
asArgs("prefix\"quoted part\"", args);
std::cout << "Test 3 - prefix\"quoted part\":\n";
std::cout << " Expected: [\"prefixquoted part\"]\n";
std::cout << " Got: [\"" << (args.size() > 0 ? args[0] : "NOTHING") << "\"]\n";
std::cout << " ✗ FAILS - 'prefix' is lost!\n\n";
}
// Test 4: Empty quoted string (also dropped!)
{
std::vector<std::string> args;
asArgs("cmd \"\" arg", args);
std::cout << "Test 4 - Empty quotes: cmd \"\" arg:\n";
std::cout << " Expected: [\"cmd\", \"\", \"arg\"] (3 elements)\n";
std::cout << " Got: " << args.size() << " elements\n";
for (size_t i = 0; i < args.size(); ++i) {
std::cout << " [" << i << "] = \"" << args[i] << "\"\n";
}
std::cout << " ✗ FAILS - empty string is skipped!\n\n";
}
return 0;
}
EOF
# Compile and run the test
g++ -std=c++11 -o /tmp/test_cli_bug /tmp/test_cli_bug.cpp && /tmp/test_cli_bug
Length of output: 43
🏁 Script executed:
#!/bin/bash
# Use Python to demonstrate the bug since C++ compiler isn't available
cat > /tmp/test_cli_bug.py << 'EOF'
#!/usr/bin/env python3
def sub_string_to_next_occurrence(cmd, s, needle):
rest = cmd[s:]
pos = rest.find(needle)
if pos == -1:
raise Exception("Error: CLI Arg parsing unclosed quotes!")
return pos
def asArgs_current(cmd):
"""Current implementation with the bug"""
args = []
arg = ""
openQuotesS = False
openQuotesD = False
backSlashed = False
i = 0
while i < len(cmd):
c = cmd[i]
if backSlashed:
backSlashed = False
arg += c
i += 1
continue
elif openQuotesD:
pos = sub_string_to_next_occurrence(cmd, i, '"')
arg = cmd[i:i+pos] # BUG: assignment overwrites!
if len(arg) > 0:
args.append(arg)
i += len(arg)
arg = ""
openQuotesD = False
i += 1
continue
if c == ' ':
if not openQuotesD and not openQuotesS:
if len(arg) > 0:
args.append(arg)
arg = ""
i += 1
continue
elif c == '"':
if openQuotesD and not openQuotesS:
openQuotesD = False
args.append(arg)
arg = ""
else:
openQuotesD = True
i += 1
continue
elif c == "'":
if openQuotesS and not openQuotesD:
openQuotesS = False
args.append(arg)
arg = ""
else:
openQuotesS = True
i += 1
continue
elif c == '\\':
if not openQuotesS and not openQuotesD:
backSlashed = True
i += 1
continue
arg += c
i += 1
if len(arg) > 0:
args.append(arg)
return args
print("="*60)
print("DEMONSTRATION OF THE BUG")
print("="*60)
# Test 1: Your Python case (works)
print("\n1. Your Python case:")
print(' Input: python3 -c "import sys; print(\'lol\' +dir(sys))"')
result = asArgs_current('python3 -c "import sys; print(\'lol \' +dir(sys))"')
for i, arg in enumerate(result):
print(f' [{i}] = "{arg}"')
print(" ✓ This works! (quote starts at beginning of token)")
# Test 2: Adjacent content (BUG!)
print("\n2. Adjacent unquoted + quoted:")
print(' Input: foo"bar"')
print(' Expected: ["foobar"]')
result = asArgs_current('foo"bar"')
print(f' Got: {result}')
print(' ✗ FAILS - "foo" prefix is lost!')
# Test 3: Prefix with space
print("\n3. Prefix with quoted content:")
print(' Input: prefix"quoted part"')
print(' Expected: ["prefixquoted part"]')
result = asArgs_current('prefix"quoted part"')
print(f' Got: {result}')
print(' ✗ FAILS - "prefix" is lost!')
# Test 4: Empty quoted string
print("\n4. Empty quoted string:")
print(' Input: cmd "" arg')
print(' Expected: ["cmd", "", "arg"]')
result = asArgs_current('cmd "" arg')
print(f' Got: {result}')
print(' ✗ FAILS - empty string is skipped!')
print("\n" + "="*60)
print("TRACE-THROUGH OF BUG (Test 2: foo\"bar\")")
print("="*60)
print()
print("Step 1: i=0, c='f', arg='', openQuotesD=False")
print(" → Falls to bottom, arg += 'f' → arg='f'")
print()
print("Step 2: i=1, c='o', arg='f'")
print(" → Falls to bottom, arg += 'o' → arg='fo'")
print()
print("Step 3: i=2, c='o', arg='fo'")
print(" → Falls to bottom, arg += 'o' → arg='foo'")
print()
print("Step 4: i=3, c='\"', arg='foo'")
print(" → Matches case '\"', sets openQuotesD=True")
print(" → arg still = 'foo'")
print()
print("Step 5: i=4, c='b', arg='foo', openQuotesD=True")
print(" → Enters 'elif openQuotesD' branch")
print(" → pos = sub_string_to_next_occurrence(cmd, 4, '\"') = 3")
print(" → arg = cmd[4:4+3] = cmd[4:7] = 'bar'")
print(" ⚠️ BUG HERE: arg='foo' is OVERWRITTEN to arg='bar'")
print(" → Pushes 'bar' (not 'foobar'!)")
print(" → Result: ['bar'] instead of ['foobar']")
print()
print("="*60)
EOF
python3 /tmp/test_cli_bug.py
Length of output: 1661
@PhilipDeegan There you go! The demonstration shows:
Your specific Python case works fine ✓
python3 -c "import sys; print('lol ' +dir(sys))"
→ ["python3", "-c", "import sys; print('lol ' +dir(sys))"]
But the bug exists in other valid CLI scenarios:
-
foo"bar"→ produces['bar']instead of['foobar']- The "foo" prefix gets overwritten
-
prefix"quoted part"→ produces['quoted part']instead of['prefixquoted part']- Same issue
-
cmd "" arg→ produces['cmd', 'arg']instead of['cmd', '', 'arg']- Empty quoted strings are skipped entirely
The trace-through shows exactly where it happens: when the parser encounters foo"bar", it accumulates "foo" into arg, then when it hits the opening quote and processes the quoted section, it does arg = cmd.substr(...) which overwrites the "foo" with just "bar".
Why it matters: Your use case happens to work because the quotes start at the beginning of each token (after spaces), so there's nothing in arg to overwrite. But a general CLI parser should handle all these cases correctly - shells do support things like echo foo"bar"baz which produces foobarbaz.
Would you like me to suggest the fix, or are these edge cases acceptable for your use case?
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.