Skip to content

Conversation

@isExample
Copy link
Contributor

@isExample isExample commented Oct 10, 2025

User description

While using pr-agent, I repeatedly ran into YAML parsing errors when using the describe or improve tools on GitHub PRs.
Here’s one of the error examples I encountered:
image

I found that the issue was caused by AI-generated diffs inserting diff-style prefixes (-, +, -+, etc.) inside YAML lists. This confused the parser and caused every fallback in try_fix_yaml to return None.

So I added a regression test reproducing the failure and implemented the 5.5 fallback to handle it.
As a result, the previously failing cases now pass cleanly:
image


PR Type

Bug fix


Description

  • Fix YAML parser error from diff-style markers in nested lists

  • Add fallback to normalize diff markers before parsing

  • Include regression test for diff marker handling


Diagram Walkthrough

flowchart LR
  A["AI generates YAML with diff markers"] --> B["try_fix_yaml function"]
  B --> C["New fallback 5.5 normalizes diff markers"]
  C --> D["YAML parses successfully"]
  E["Regression test"] --> F["Validates fix works"]
Loading

File Walkthrough

Relevant files
Bug fix
utils.py
Add YAML diff marker normalization fallback                           

pr_agent/algo/utils.py

  • Add fallback 5.5 to normalize diff-style markers in YAML
  • Handle lines starting with -, +, -+ patterns
  • Clean diff markers while preserving YAML list structure
  • Add logging for successful parsing after normalization
+34/-0   
Tests
test_try_fix_yaml.py
Add regression test for diff marker handling                         

tests/unittest/test_try_fix_yaml.py

  • Add regression test for diff markers in nested lists
  • Test YAML with -, +, -+ markers in list items
  • Verify expected output structure after normalization
+33/-0   

@qodo-merge-for-open-source
Copy link
Contributor

qodo-merge-for-open-source bot commented Oct 10, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Consistent Naming Conventions

Objective: All new variables, functions, and classes must follow the project's established naming
standards

Status: Passed

No Dead or Commented-Out Code

Objective: Keep the codebase clean by ensuring all submitted code is active and necessary

Status: Passed

When relevant, utilize early return

Objective: In a code snippet containing multiple logic conditions (such as 'if-else'), prefer an
early return on edge cases than deep nesting

Status: Passed

Robust Error Handling

Objective: Ensure potential errors and edge cases are anticipated and handled gracefully throughout
the code

Status:
Broad except: The new fallback uses bare except blocks without specifying exception types or logging
error details, which may mask parsing issues.

Referred Code
try:
    data = yaml.safe_load('\n'.join(response_text_lines_copy))
    get_logger().info("Successfully parsed AI prediction after normalizing diff removal markers")
    return data
except:
    pass
Single Responsibility for Functions

Objective: Each function should have a single, well-defined responsibility

Status:
Multi-purpose function: The added fallback increases the scope of try_fix_yaml with another transformation pass,
suggesting the function is handling many concerns.

Referred Code
# 5.5 fallback - try to normalize diff-style removal markers ('-') within list items
response_text_lines_copy = response_text_lines.copy()
modified = False

for i, line in enumerate(response_text_lines_copy):
    if line.startswith('+'):
        response_text_lines_copy[i] = ' ' + line[1:]
        modified = True

for i, line in enumerate(response_text_lines_copy):
    if not line.startswith('-'):
        continue
    remainder = line[1:]
    if line.startswith('- '):
        second_char = remainder[1] if len(remainder) > 1 else ''
        if second_char and second_char not in (' ', '\t', '+', '-'):
            continue
    cleaned = remainder
    while cleaned and cleaned[0] in ('+', '-'):
        cleaned = cleaned[1:]
    if cleaned and cleaned[0] not in (' ', '\t'):


 ... (clipped 10 lines)
  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@isExample
Copy link
Contributor Author

/review

@qodo-merge-for-open-source
Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
✅ No TODO sections
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The normalization compares cleaned to the original line but assigns cleaned back without the original leading '-' prefix, which may break list structure; verify yaml.safe_load still interprets list items correctly for lines beginning with '-' that are transformed.

for i, line in enumerate(response_text_lines_copy):
    if not line.startswith('-'):
        continue
    remainder = line[1:]
    if line.startswith('- '):
        second_char = remainder[1] if len(remainder) > 1 else ''
        if second_char and second_char not in (' ', '\t', '+', '-'):
            continue
    cleaned = remainder
    while cleaned and cleaned[0] in ('+', '-'):
        cleaned = cleaned[1:]
    if cleaned and cleaned[0] not in (' ', '\t'):
        cleaned = ' ' + cleaned
    if cleaned != line:
        response_text_lines_copy[i] = cleaned
        modified = True
Logging Clarity

The log message mentions normalizing removal markers but the code also strips leading '+' earlier; consider clarifying the message or splitting logs for '+' and '-' normalization to aid debugging.

if modified:
    try:
        data = yaml.safe_load('\n'.join(response_text_lines_copy))
        get_logger().info("Successfully parsed AI prediction after normalizing diff removal markers")
        return data
    except:

@qodo-merge-for-open-source
Copy link
Contributor

qodo-merge-for-open-source bot commented Oct 10, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent corruption of valid YAML lists
Suggestion Impact:The commit adjusted the logic for lines starting with '-' to better distinguish real YAML list items from diff deletions, adding comments and preserving real list items (continue) before cleaning diff markers. This aligns with the suggestion’s intent to avoid corrupting valid list entries.

code diff:

+    # normalize lines starting with '-'. Distinguish real YAML list items from diff deletions.
     for i, line in enumerate(response_text_lines_copy):
         if not line.startswith('-'):
             continue
+
         remainder = line[1:]
         if line.startswith('- '):
             second_char = remainder[1] if len(remainder) > 1 else ''
             if second_char and second_char not in (' ', '\t', '+', '-'):
-                continue
+                continue # real list item → keep as-is
+
+        # treat it as a diff "removed" marker inside block content
         cleaned = remainder
         while cleaned and cleaned[0] in ('+', '-'):
             cleaned = cleaned[1:]

Modify the YAML cleaning logic to prevent the corruption of valid list items.
The current implementation incorrectly strips the leading - from lines like -
key: value, which breaks the YAML structure.

pr_agent/algo/utils.py [880-895]

 for i, line in enumerate(response_text_lines_copy):
     if not line.startswith('-'):
         continue
     remainder = line[1:]
     if line.startswith('- '):
         second_char = remainder[1] if len(remainder) > 1 else ''
         if second_char and second_char not in (' ', '\t', '+', '-'):
             continue
+        # If we are here, it's likely a valid list item, but it might have extra diff markers.
+        # Example: '- - key: value' or '- + key: value'
+        # However, if it's just '-  key: value', we should not process it further in this block.
+        if not (remainder.lstrip().startswith('+') or remainder.lstrip().startswith('-')):
+            continue
+
     cleaned = remainder
     while cleaned and cleaned[0] in ('+', '-'):
         cleaned = cleaned[1:]
     if cleaned and cleaned[0] not in (' ', '\t'):
         cleaned = ' ' + cleaned
     if cleaned != line:
         response_text_lines_copy[i] = cleaned
         modified = True

[Suggestion processed]

Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a bug in the new code where valid YAML list items like - key: value would be corrupted. The proposed fix is logical and effectively resolves this critical issue, preventing potential YAML parsing failures.

Medium
Learned
best practice
Safely use optional logger

Ensure logger retrieval is resilient in environments where logging may be
stubbed by safely accessing get_logger() and method existence before use.

pr_agent/algo/utils.py [899]

-get_logger().info("Successfully parsed AI prediction after normalizing diff removal markers")
+logger = get_logger()
+if hasattr(logger, "info"):
+    logger.info("Successfully parsed AI prediction after normalizing diff removal markers")
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - Guard access to nested settings and attributes with safe accessors and validation, providing defaults or clear errors when missing.

Low
  • Update
  • Author self-review: I have reviewed the PR code suggestions, and addressed the relevant ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant