Skip to content
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 32 additions & 6 deletions inc/mkn/kul/src/cli/asArgs.ipp
Original file line number Diff line number Diff line change
Expand Up @@ -30,22 +30,48 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
// IWYU pragma: private, include "mkn/kul/cli.hpp"

namespace {
//
std::size_t sub_string_to_next_occurrence(std::string const& cmd, std::size_t const s,
std::string const& needle) {
std::string const rest = cmd.substr(s);
auto const pos = rest.find(needle);
if (pos == std::string::npos)
KEXCEPT(mkn::kul::Exception, "Error: CLI Arg parsing unclosed quotes!");
return pos;
}

} // namespace

void mkn::kul::cli::asArgs(std::string const& cmd, std::vector<std::string>& args) {
std::string arg;
bool openQuotesS = false, openQuotesD = false, backSlashed = false;
for (char const c : cmd) {

for (std::size_t i = 0; i < cmd.size(); ++i) {
auto const c = cmd[i];

if (backSlashed) {
backSlashed = false;
arg += c;
continue;

} else if (openQuotesD) {
auto const pos = sub_string_to_next_occurrence(cmd, i, "\"");
arg = cmd.substr(i, pos);
if (arg.size() > 0) args.push_back(arg);
i += arg.size();
arg.clear();
openQuotesD = false;
Comment on lines 50 to 62
Copy link

@coderabbitai coderabbitai bot Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Quoted substring overwrites existing token and drops empty quoted args.

arg = cmd.substr(i, pos) discards any prefix, so inputs like foo"bar" yield "bar", and the arg.size() check drops empty "" args. This is a regression vs. the prior char-by-char behavior.

🐛 Proposed fix
-      arg = cmd.substr(i, pos);
-      if (arg.size() > 0) args.push_back(arg);
-      i += arg.size();
+      arg += cmd.substr(i, pos);
+      args.push_back(arg);
+      i += pos;
       arg.clear();
       openQuotesD = false;
       continue;
🤖 Prompt for AI Agents
In `@inc/mkn/kul/src/cli/asArgs.ipp` around lines 50 - 64, The quoted-substring
handling in asArgs.ipp currently overwrites any existing partial token and skips
empty quoted args; change the logic in the openQuotesD branch to append the
quoted substring instead of assigning (use arg += ... with
sub_string_to_next_occurrence(cmd, i, "\"")), ensure you push the arg into args
even when it's empty (so "" becomes a token), and advance the loop index by the
substring length returned from sub_string_to_next_occurrence (adjust i by that
length) before clearing arg and resetting openQuotesD; use the existing symbols
openQuotesD, arg, args, cmd, and sub_string_to_next_occurrence to locate and
modify the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look at the test, is there some strings you can suggest that are considered valid for command line argument parsing?

really what I'm trying to fix is the following not working

process.run("python3", "-c", "\"import sys; print('lol ' +dir(sys))\"")

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the test file to understand the current test structure
cat tst/test/cli.cpp

Length of output: 393


@PhilipDeegan Here are comprehensive test strings for CLI argument parsing that would validate your implementation:

// Test 1: Empty quoted strings
Input:  `cmd "" arg`
Expected: ["cmd", "", "arg"]

// Test 2: Single quotes inside double quotes (current test)
Input:  `"words 'in' quotes"`
Expected: ["words 'in' quotes"]

// Test 3: Double quotes inside single quotes
Input:  `'words "in" quotes'`
Expected: ["words \"in\" quotes"]

// Test 4: Adjacent quoted and unquoted content
Input:  `foo"bar"baz`
Expected: ["foobarbaz"]

// Test 5: Multiple spaces between args
Input:  `cmd    arg1   arg2`
Expected: ["cmd", "arg1", "arg2"]

// Test 6: Backslash escaping special chars
Input:  `cmd arg\ with\ spaces end`
Expected: ["cmd", "arg with spaces", "end"]

// Test 7: Your Python use case
Input:  `python3 -c "import sys; print('lol ' +dir(sys))"`
Expected: ["python3", "-c", "import sys; print('lol ' +dir(sys))"]

// Test 8: Mixed quotes and escapes
Input:  `cmd "quoted arg" 'single quoted' escaped\ space`
Expected: ["cmd", "quoted arg", "single quoted", "escaped space"]

// Test 9: Empty string at start/end
Input:  `"" cmd ""`
Expected: ["", "cmd", ""]

// Test 10: Quote at end of unquoted content
Input:  `prefix"quoted part"`
Expected: ["prefixquoted part"]

For your specific Python case:

process.run("python3", "-c", "\"import sys; print('lol ' +dir(sys))\"")

When the third argument ("\"import sys; print('lol ' +dir(sys))\"") is parsed by asArgs, it should produce:

  • The outer "..." are just Python string delimiters (not parsed by asArgs)
  • The escaped \" become literal " characters in the input string
  • So asArgs sees: "import sys; print('lol ' +dir(sys))"
  • Which should parse to: import sys; print('lol ' +dir(sys))

Note: The bug I flagged earlier (using arg = cmd.substr(...) instead of arg += cmd.substr(...)) will cause Test 4 and Test 10 to fail because any content before the opening quote gets discarded. This will also affect your Python case if there's any prefix before quotes.

Would you like me to help create a comprehensive test suite with all these cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh really?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

@PhilipDeegan Yes! Let me demonstrate the bug with a concrete test case:

This will show that your specific Python case does work (because it starts with a quote), but the bug manifests in other valid CLI scenarios where content precedes the opening quote or when quotes are empty.


🏁 Script executed:

#!/bin/bash
# Create a test program to show the bug with the current implementation

cat > /tmp/test_cli_bug.cpp << 'EOF'
`#include` <string>
`#include` <vector>
`#include` <iostream>

namespace mkn { namespace kul { 
  class Exception : public std::exception {
    std::string msg;
  public:
    Exception(std::string m) : msg(m) {}
    const char* what() const noexcept override { return msg.c_str(); }
  };
}}

`#define` KEXCEPT(type, msg) throw type(msg)

namespace {
std::size_t sub_string_to_next_occurrence(std::string const& cmd, std::size_t const s,
                                          std::string const& needle) {
  std::string const rest = cmd.substr(s);
  auto const pos = rest.find(needle);
  if (pos == std::string::npos)
    KEXCEPT(mkn::kul::Exception, "Error: CLI Arg parsing unclosed quotes!");
  return pos;
}
}

void asArgs(std::string const& cmd, std::vector<std::string>& args) {
  std::string arg;
  bool openQuotesS = false, openQuotesD = false, backSlashed = false;

  for (std::size_t i = 0; i < cmd.size(); ++i) {
    auto const c = cmd[i];

    if (backSlashed) {
      backSlashed = false;
      arg += c;
      continue;

    } else if (openQuotesD) {
      auto const pos = sub_string_to_next_occurrence(cmd, i, "\"");
      arg = cmd.substr(i, pos);  // BUG: overwrites instead of appending
      if (arg.size() > 0) args.push_back(arg);
      i += arg.size();
      arg.clear();
      openQuotesD = false;
      continue;
    }

    switch (c) {
      case ' ':
        if (!openQuotesD && !openQuotesS) {
          if (arg.size() > 0) args.push_back(arg);
          arg.clear();
          continue;
        }
        break;
      case '"':
        if (openQuotesD && !openQuotesS) {
          openQuotesD = false;
          args.push_back(arg);
          arg.clear();
        } else {
          openQuotesD = true;
        }
        continue;
      case '\'':
        if (openQuotesS && !openQuotesD) {
          openQuotesS = false;
          args.push_back(arg);
          arg.clear();
        } else {
          openQuotesS = true;
        }
        continue;
      case '\\':
        if (!openQuotesS && !openQuotesD) {
          backSlashed = true;
          continue;
        }
        break;
    }
    arg += c;
  }
  if (arg.size() > 0) args.push_back(arg);
}

int main() {
  // Test 1: Your specific Python case (works because starts with quote)
  {
    std::vector<std::string> args;
    asArgs("python3 -c \"import sys; print('lol ' +dir(sys))\"", args);
    std::cout << "Test 1 - Your Python case:\n";
    for (size_t i = 0; i < args.size(); ++i) {
      std::cout << "  [" << i << "] = \"" << args[i] << "\"\n";
    }
    std::cout << "  ✓ This actually works!\n\n";
  }
  
  // Test 2: Adjacent unquoted + quoted content (BUG!)
  {
    std::vector<std::string> args;
    asArgs("foo\"bar\"", args);
    std::cout << "Test 2 - Adjacent content foo\"bar\":\n";
    std::cout << "  Expected: [\"foobar\"]\n";
    std::cout << "  Got:      [\"" << (args.size() > 0 ? args[0] : "NOTHING") << "\"]\n";
    std::cout << "  ✗ FAILS - 'foo' prefix is lost!\n\n";
  }
  
  // Test 3: Prefix with quoted content
  {
    std::vector<std::string> args;
    asArgs("prefix\"quoted part\"", args);
    std::cout << "Test 3 - prefix\"quoted part\":\n";
    std::cout << "  Expected: [\"prefixquoted part\"]\n";
    std::cout << "  Got:      [\"" << (args.size() > 0 ? args[0] : "NOTHING") << "\"]\n";
    std::cout << "  ✗ FAILS - 'prefix' is lost!\n\n";
  }
  
  // Test 4: Empty quoted string (also dropped!)
  {
    std::vector<std::string> args;
    asArgs("cmd \"\" arg", args);
    std::cout << "Test 4 - Empty quotes: cmd \"\" arg:\n";
    std::cout << "  Expected: [\"cmd\", \"\", \"arg\"] (3 elements)\n";
    std::cout << "  Got:      " << args.size() << " elements\n";
    for (size_t i = 0; i < args.size(); ++i) {
      std::cout << "    [" << i << "] = \"" << args[i] << "\"\n";
    }
    std::cout << "  ✗ FAILS - empty string is skipped!\n\n";
  }
  
  return 0;
}
EOF

# Compile and run the test
g++ -std=c++11 -o /tmp/test_cli_bug /tmp/test_cli_bug.cpp && /tmp/test_cli_bug

Length of output: 43


🏁 Script executed:

#!/bin/bash
# Use Python to demonstrate the bug since C++ compiler isn't available

cat > /tmp/test_cli_bug.py << 'EOF'
#!/usr/bin/env python3

def sub_string_to_next_occurrence(cmd, s, needle):
    rest = cmd[s:]
    pos = rest.find(needle)
    if pos == -1:
        raise Exception("Error: CLI Arg parsing unclosed quotes!")
    return pos

def asArgs_current(cmd):
    """Current implementation with the bug"""
    args = []
    arg = ""
    openQuotesS = False
    openQuotesD = False
    backSlashed = False
    
    i = 0
    while i < len(cmd):
        c = cmd[i]
        
        if backSlashed:
            backSlashed = False
            arg += c
            i += 1
            continue
        
        elif openQuotesD:
            pos = sub_string_to_next_occurrence(cmd, i, '"')
            arg = cmd[i:i+pos]  # BUG: assignment overwrites!
            if len(arg) > 0:
                args.append(arg)
            i += len(arg)
            arg = ""
            openQuotesD = False
            i += 1
            continue
        
        if c == ' ':
            if not openQuotesD and not openQuotesS:
                if len(arg) > 0:
                    args.append(arg)
                arg = ""
                i += 1
                continue
        elif c == '"':
            if openQuotesD and not openQuotesS:
                openQuotesD = False
                args.append(arg)
                arg = ""
            else:
                openQuotesD = True
            i += 1
            continue
        elif c == "'":
            if openQuotesS and not openQuotesD:
                openQuotesS = False
                args.append(arg)
                arg = ""
            else:
                openQuotesS = True
            i += 1
            continue
        elif c == '\\':
            if not openQuotesS and not openQuotesD:
                backSlashed = True
                i += 1
                continue
        
        arg += c
        i += 1
    
    if len(arg) > 0:
        args.append(arg)
    
    return args

print("="*60)
print("DEMONSTRATION OF THE BUG")
print("="*60)

# Test 1: Your Python case (works)
print("\n1. Your Python case:")
print('   Input: python3 -c "import sys; print(\'lol\' +dir(sys))"')
result = asArgs_current('python3 -c "import sys; print(\'lol \' +dir(sys))"')
for i, arg in enumerate(result):
    print(f'   [{i}] = "{arg}"')
print("   ✓ This works! (quote starts at beginning of token)")

# Test 2: Adjacent content (BUG!)
print("\n2. Adjacent unquoted + quoted:")
print('   Input: foo"bar"')
print('   Expected: ["foobar"]')
result = asArgs_current('foo"bar"')
print(f'   Got:      {result}')
print('   ✗ FAILS - "foo" prefix is lost!')

# Test 3: Prefix with space
print("\n3. Prefix with quoted content:")
print('   Input: prefix"quoted part"')
print('   Expected: ["prefixquoted part"]')
result = asArgs_current('prefix"quoted part"')
print(f'   Got:      {result}')
print('   ✗ FAILS - "prefix" is lost!')

# Test 4: Empty quoted string
print("\n4. Empty quoted string:")
print('   Input: cmd "" arg')
print('   Expected: ["cmd", "", "arg"]')
result = asArgs_current('cmd "" arg')
print(f'   Got:      {result}')
print('   ✗ FAILS - empty string is skipped!')

print("\n" + "="*60)
print("TRACE-THROUGH OF BUG (Test 2: foo\"bar\")")
print("="*60)
print()
print("Step 1: i=0, c='f', arg='', openQuotesD=False")
print("  → Falls to bottom, arg += 'f' → arg='f'")
print()
print("Step 2: i=1, c='o', arg='f'")
print("  → Falls to bottom, arg += 'o' → arg='fo'")
print()
print("Step 3: i=2, c='o', arg='fo'")
print("  → Falls to bottom, arg += 'o' → arg='foo'")
print()
print("Step 4: i=3, c='\"', arg='foo'")
print("  → Matches case '\"', sets openQuotesD=True")
print("  → arg still = 'foo'")
print()
print("Step 5: i=4, c='b', arg='foo', openQuotesD=True")
print("  → Enters 'elif openQuotesD' branch")
print("  → pos = sub_string_to_next_occurrence(cmd, 4, '\"') = 3")
print("  → arg = cmd[4:4+3] = cmd[4:7] = 'bar'")
print("  ⚠️  BUG HERE: arg='foo' is OVERWRITTEN to arg='bar'")
print("  → Pushes 'bar' (not 'foobar'!)")
print("  → Result: ['bar'] instead of ['foobar']")
print()
print("="*60)
EOF

python3 /tmp/test_cli_bug.py

Length of output: 1661


@PhilipDeegan There you go! The demonstration shows:

Your specific Python case works fine

python3 -c "import sys; print('lol ' +dir(sys))"
→ ["python3", "-c", "import sys; print('lol ' +dir(sys))"]

But the bug exists in other valid CLI scenarios:

  1. foo"bar" → produces ['bar'] instead of ['foobar']

    • The "foo" prefix gets overwritten
  2. prefix"quoted part" → produces ['quoted part'] instead of ['prefixquoted part']

    • Same issue
  3. cmd "" arg → produces ['cmd', 'arg'] instead of ['cmd', '', 'arg']

    • Empty quoted strings are skipped entirely

The trace-through shows exactly where it happens: when the parser encounters foo"bar", it accumulates "foo" into arg, then when it hits the opening quote and processes the quoted section, it does arg = cmd.substr(...) which overwrites the "foo" with just "bar".

Why it matters: Your use case happens to work because the quotes start at the beginning of each token (after spaces), so there's nothing in arg to overwrite. But a general CLI parser should handle all these cases correctly - shells do support things like echo foo"bar"baz which produces foobarbaz.

Would you like me to suggest the fix, or are these edge cases acceptable for your use case?

continue;
}

switch (c) {
case ' ':
if (!openQuotesD && !openQuotesS) { // ||||| ||||| ||||| ||||| |||||
if (arg.size() > 0) args.push_back(arg); // || || || || || || ||
arg.clear(); // ||||| ||||| ||||| || |||||
continue; // || || || || || ||
} // ||||| || || || ||||| |||||
if (!openQuotesD && !openQuotesS) {
if (arg.size() > 0) args.push_back(arg);
arg.clear();
continue;
}
break;
case '"':
if (openQuotesD && !openQuotesS) {
Expand Down
15 changes: 12 additions & 3 deletions tst/test/cli.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,10 +6,19 @@

TEST(CLI_Test, ParseCommandLineArguments) {
std::vector<std::string> v;
mkn::kul::cli::asArgs("/path/to \"words in quotes\" words\\ not\\ in\\ quotes end", v);
EXPECT_EQ((size_t)4, v.size());
mkn::kul::cli::asArgs("/path/to \"words 'in' quotes\" words\\ not\\ in\\ quotes end", v);
ASSERT_EQ(4ull, v.size());
EXPECT_EQ("/path/to", v[0]);
EXPECT_EQ("words in quotes", v[1]);
EXPECT_EQ("words 'in' quotes", v[1]);
EXPECT_EQ("words not in quotes", v[2]);
EXPECT_EQ("end", v[3]);
}

TEST(CLI_Test, ParseCommandPythonExample) {
std::vector<std::string> v =
mkn::kul::cli::asArgs("python3 -c \"import sys; print('lol ' + dir(sys)\"");
ASSERT_EQ(3ull, v.size());
EXPECT_EQ("python3", v[0]);
EXPECT_EQ("-c", v[1]);
EXPECT_EQ("import sys; print('lol ' + dir(sys)", v[2]);
}