fix: YAML parser error from diff-style lines in nested lists #2061

isExample · 2025-10-10T16:39:46Z

User description

While using pr-agent, I repeatedly ran into YAML parsing errors when using the describe or improve tools on GitHub PRs.
Here’s one of the error examples I encountered:

I found that the issue was caused by AI-generated diffs inserting diff-style prefixes (-, +, -+, etc.) inside YAML lists. This confused the parser and caused every fallback in try_fix_yaml to return None.

So I added a regression test reproducing the failure and implemented the 5.5 fallback to handle it.
As a result, the previously failing cases now pass cleanly:

PR Type

Bug fix

Description

Fix YAML parser error from diff-style markers in nested lists
Add fallback to normalize diff markers before parsing
Include regression test for diff marker handling

Diagram Walkthrough

flowchart LR
  A["AI generates YAML with diff markers"] --> B["try_fix_yaml function"]
  B --> C["New fallback 5.5 normalizes diff markers"]
  C --> D["YAML parses successfully"]
  E["Regression test"] --> F["Validates fix works"]

File Walkthrough

Relevant files

Bug fix

utils.py `Add YAML diff marker normalization fallback` pr_agent/algo/utils.py Add fallback 5.5 to normalize diff-style markers in YAML Handle lines starting with `-`, `+`, `-+` patterns Clean diff markers while preserving YAML list structure Add logging for successful parsing after normalization	+34/-0

Tests

test_try_fix_yaml.py `Add regression test for diff marker handling` tests/unittest/test_try_fix_yaml.py Add regression test for diff markers in nested lists Test YAML with `-`, `+`, `-+` markers in list items Verify expected output structure after normalization	+33/-0

qodo-merge-for-open-source · 2025-10-10T16:40:16Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢	No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Consistent Naming Conventions Objective: All new variables, functions, and classes must follow the project's established naming standards Status: Passed
	No Dead or Commented-Out Code Objective: Keep the codebase clean by ensuring all submitted code is active and necessary Status: Passed
	When relevant, utilize early return Objective: In a code snippet containing multiple logic conditions (such as 'if-else'), prefer an early return on edge cases than deep nesting Status: Passed
⚪	Robust Error Handling Objective: Ensure potential errors and edge cases are anticipated and handled gracefully throughout the code Status: Broad except: The new fallback uses bare except blocks without specifying exception types or logging error details, which may mask parsing issues. Referred Code try: data = yaml.safe_load('\n'.join(response_text_lines_copy)) get_logger().info("Successfully parsed AI prediction after normalizing diff removal markers") return data except: pass
	Single Responsibility for Functions Objective: Each function should have a single, well-defined responsibility Status: Multi-purpose function: The added fallback increases the scope of try_fix_yaml with another transformation pass, suggesting the function is handling many concerns. Referred Code # 5.5 fallback - try to normalize diff-style removal markers ('-') within list items response_text_lines_copy = response_text_lines.copy() modified = False for i, line in enumerate(response_text_lines_copy): if line.startswith('+'): response_text_lines_copy[i] = ' ' + line[1:] modified = True for i, line in enumerate(response_text_lines_copy): if not line.startswith('-'): continue remainder = line[1:] if line.startswith('- '): second_char = remainder[1] if len(remainder) > 1 else '' if second_char and second_char not in (' ', '\t', '+', '-'): continue cleaned = remainder while cleaned and cleaned[0] in ('+', '-'): cleaned = cleaned[1:] if cleaned and cleaned[0] not in (' ', '\t'): ... (clipped 10 lines)
Update

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

isExample · 2025-10-10T16:41:29Z

/review

qodo-merge-for-open-source · 2025-10-10T16:41:41Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
✅ No TODO sections
🔒 No security concerns identified
⚡ Recommended focus areas for review Possible Issue The normalization compares cleaned to the original line but assigns cleaned back without the original leading '-' prefix, which may break list structure; verify yaml.safe_load still interprets list items correctly for lines beginning with '-' that are transformed. for i, line in enumerate(response_text_lines_copy): if not line.startswith('-'): continue remainder = line[1:] if line.startswith('- '): second_char = remainder[1] if len(remainder) > 1 else '' if second_char and second_char not in (' ', '\t', '+', '-'): continue cleaned = remainder while cleaned and cleaned[0] in ('+', '-'): cleaned = cleaned[1:] if cleaned and cleaned[0] not in (' ', '\t'): cleaned = ' ' + cleaned if cleaned != line: response_text_lines_copy[i] = cleaned modified = True Logging Clarity The log message mentions normalizing removal markers but the code also strips leading '+' earlier; consider clarifying the message or splitting logs for '+' and '-' normalization to aid debugging. if modified: try: data = yaml.safe_load('\n'.join(response_text_lines_copy)) get_logger().info("Successfully parsed AI prediction after normalizing diff removal markers") return data except:

qodo-merge-for-open-source · 2025-10-10T16:41:45Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	✅ ~~Prevent corruption of valid YAML lists~~ Suggestion Impact: The commit adjusted the logic for lines starting with '-' to better distinguish real YAML list items from diff deletions, adding comments and preserving real list items (continue) before cleaning diff markers. This aligns with the suggestion’s intent to avoid corrupting valid list entries. code diff: + # normalize lines starting with '-'. Distinguish real YAML list items from diff deletions. for i, line in enumerate(response_text_lines_copy): if not line.startswith('-'): continue + remainder = line[1:] if line.startswith('- '): second_char = remainder[1] if len(remainder) > 1 else '' if second_char and second_char not in (' ', '\t', '+', '-'): - continue + continue # real list item → keep as-is + + # treat it as a diff "removed" marker inside block content cleaned = remainder while cleaned and cleaned[0] in ('+', '-'): cleaned = cleaned[1:] Modify the YAML cleaning logic to prevent the corruption of valid list items. The current implementation incorrectly strips the leading `-` from lines like `-` `key: value`, which breaks the YAML structure. pr_agent/algo/utils.py [880-895] for i, line in enumerate(response_text_lines_copy): if not line.startswith('-'): continue remainder = line[1:] if line.startswith('- '): second_char = remainder[1] if len(remainder) > 1 else '' if second_char and second_char not in (' ', '\t', '+', '-'): continue + # If we are here, it's likely a valid list item, but it might have extra diff markers. + # Example: '- - key: value' or '- + key: value' + # However, if it's just '- key: value', we should not process it further in this block. + if not (remainder.lstrip().startswith('+') or remainder.lstrip().startswith('-')): + continue + cleaned = remainder while cleaned and cleaned[0] in ('+', '-'): cleaned = cleaned[1:] if cleaned and cleaned[0] not in (' ', '\t'): cleaned = ' ' + cleaned if cleaned != line: response_text_lines_copy[i] = cleaned modified = True `[Suggestion processed]` Suggestion importance[1-10]: 8 __ Why: The suggestion correctly identifies a bug in the new code where valid YAML list items like `- key: value` would be corrupted. The proposed fix is logical and effectively resolves this critical issue, preventing potential YAML parsing failures.	Medium
Learned best practice	Safely use optional logger Ensure logger retrieval is resilient in environments where logging may be stubbed by safely accessing `get_logger()` and method existence before use. pr_agent/algo/utils.py [899] -get_logger().info("Successfully parsed AI prediction after normalizing diff removal markers") +logger = get_logger() +if hasattr(logger, "info"): + logger.info("Successfully parsed AI prediction after normalizing diff removal markers") Apply / Chat Suggestion importance[1-10]: 6 __ Why: Relevant best practice - Guard access to nested settings and attributes with safe accessors and validation, providing defaults or clear errors when missing.	Low
Update

Author self-review: I have reviewed the PR code suggestions, and addressed the relevant ones.

isExample added 2 commits October 11, 2025 01:31

test: add regression test for diff markers in suggestion blocks

6561d04

fix: normalize nested diff markers before YAML parsing

1a4df62

qodo-merge-for-open-source bot added the Review effort 2/5 label Oct 10, 2025

docs: add inline comments explaining diff marker normalization logic

725a26b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

fix: YAML parser error from diff-style lines in nested lists #2061

fix: YAML parser error from diff-style lines in nested lists #2061

isExample commented Oct 10, 2025 •

edited by qodo-merge-for-open-source bot

Loading

Uh oh!

qodo-merge-for-open-source bot commented Oct 10, 2025 •

edited

Loading

Uh oh!

isExample commented Oct 10, 2025

Uh oh!

qodo-merge-for-open-source bot commented Oct 10, 2025

Uh oh!

qodo-merge-for-open-source bot commented Oct 10, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

fix: YAML parser error from diff-style lines in nested lists #2061

Are you sure you want to change the base?

fix: YAML parser error from diff-style lines in nested lists #2061

Conversation

isExample commented Oct 10, 2025 • edited by qodo-merge-for-open-source bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

qodo-merge-for-open-source bot commented Oct 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Compliance Guide 🔍

Uh oh!

isExample commented Oct 10, 2025

Uh oh!

qodo-merge-for-open-source bot commented Oct 10, 2025

PR Reviewer Guide 🔍

Uh oh!

qodo-merge-for-open-source bot commented Oct 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

isExample commented Oct 10, 2025 •

edited by qodo-merge-for-open-source bot

Loading

qodo-merge-for-open-source bot commented Oct 10, 2025 •

edited

Loading

qodo-merge-for-open-source bot commented Oct 10, 2025 •

edited

Loading