[Bug] _parse_code in ProgramOfThought corrupts valid Python code #9214

@srmsoumya

Bug: _parse_code in ProgramOfThought corrupts valid Python code

Description

I am using CodeAct and ran into cases where the _parse_code method corrupts valid Python code generated by the LLM. Two separate problems in the parsing logic break otherwise correct code.

Issue 1: .replace("\\n", "\n") corrupts escape sequences in string literals

Location: dspy/predict/program_of_thought.py, line 139

The .replace("\\n", "\n") call converts every two-character sequence \n (a backslash followed by n) into a real newline character, including occurrences inside Python string literals, where it must remain an escape sequence.

Example:
When the LLM generates:

print(f"\nTotal Users: {total_users}")

After _parse_code, it becomes:

print(f"
Total Users: {total_users}")

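The result is invalid Python: a string literal that is not triple-quoted cannot contain a raw line break, so the code fails with a SyntaxError.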
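The corruption can be reproduced without DSPy. The following is a minimal sketch that assumes only that the same .replace("\\n", "\n") call is applied to the raw code text; the helper `compiles` and the variable names are illustrative and not part of DSPy.

```python
# Standalone reproduction: no DSPy import, only the string operation quoted above.
# The raw string keeps backslash-n as two characters, as the LLM would emit it.
generated = r'print(f"\nTotal Users: {total_users}")'


def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (hypothetical helper)."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


# The replacement turns the escape sequence into a real line break.
corrupted = generated.replace("\\n", "\n")

before_ok = compiles(generated)  # the original code parses
after_ok = compiles(corrupted)   # the string literal now spans two lines
print(before_ok, after_ok)       # True False
```

Only `compile` is used here, so the undefined name `total_users` does not matter: the check is purely syntactic.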


Issue 2: Regex substitutions corrupt code with = in strings or complex expressions

Location: dspy/predict/program_of_thought.py, lines 144-158

The regex patterns try to split multiple assignments that were emitted on a single line, but they ignore string boundaries and Python syntax:

code_block = re.sub(
    r"([a-zA-Z_]\w* *=.*?)(?=[a-zA-Z_]\w* *=)",
    r"\1\n",
    code_block,
)

Example:
When the LLM generates:

data = "users: Alice=25, Bob=30, Carol=28"
youngest = [name for name, age in users.items() if age == min_age]

After the regex substitutions:

data = "users: 
Alice=25, 
Bob=30, 
Carol=28"
youngest = [name for name, age in users.items() if 
age == min_age]

The regex inserts newlines inside string literals and breaks list comprehensions, causing syntax errors.
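The effect of the pattern quoted above can be checked in isolation. This sketch applies only that one substitution with the standard `re` module (the surrounding DSPy code may apply further substitutions); the variable names are illustrative.

```python
import re

# The pattern quoted above. The lookahead fires before any identifier that is
# followed by "=", which includes "Alice=25" inside a string and "age ==".
PATTERN = r"([a-zA-Z_]\w* *=.*?)(?=[a-zA-Z_]\w* *=)"

source = (
    'data = "users: Alice=25, Bob=30, Carol=28"\n'
    "youngest = [name for name, age in users.items() if age == min_age]"
)

rewritten = re.sub(PATTERN, r"\1\n", source)
print(rewritten)

# Two logical lines of code become six physical lines.
print(len(source.splitlines()), "->", len(rewritten.splitlines()))  # 2 -> 6
```

The lookahead `[a-zA-Z_]\w* *=` cannot tell an assignment from a comparison or from text inside quotes, because `==` also begins with `=` and the pattern has no notion of string delimiters.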

Steps to reproduce

import dspy
from dspy.predict import CodeAct

dspy.configure(lm=dspy.LM("anthropic/claude-sonnet-4-5"))

class DataSummarySignature(dspy.Signature):
    """Analyze data and print a formatted summary report."""
    data: str = dspy.InputField(desc="Data to analyze")
    summary: str = dspy.OutputField(desc="Summary report")

act = CodeAct(signature=DataSummarySignature, tools=[], max_iters=3)
result = act(data="users: Alice=25, Bob=30, Carol=28")

Suggested Fix

Simplify _parse_code to only extract code from markdown blocks without attempting to reformat it:

def _parse_code(self, code_data):
    code = code_data.get("generated_code", "").split("---", 1)[0].split("\n\n\n", 1)[0]
    code_match = re.search(r"```python[ \n](.*?)[ \n]```?", code, re.DOTALL)
    code_block = code_match.group(1) if code_match else code
    if not code_block:
        return code, "Error: Empty code after parsing."
    return code_block, None
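To illustrate that extraction alone leaves the problematic inputs intact, here is a standalone sketch of the same logic. `extract_code` is a hypothetical free function that takes the generated text directly instead of the `code_data` dict; it is not the DSPy method, and the fence marker is built from a variable only to keep the example easy to embed.

```python
import re

FENCE = "`" * 3  # three backticks, the markdown code-fence marker

# Same pattern as in the suggested fix, assembled from FENCE.
CODE_RE = re.compile(rf"{FENCE}python[ \n](.*?)[ \n]{FENCE}?", re.DOTALL)


def extract_code(generated_code: str):
    """Return (code, error) without reformatting the code (hypothetical helper)."""
    code = generated_code.split("---", 1)[0].split("\n\n\n", 1)[0]
    match = CODE_RE.search(code)
    block = match.group(1) if match else code
    if not block:
        return code, "Error: Empty code after parsing."
    return block, None


# An LLM response containing both problem cases: "=" inside a string literal
# and a backslash-n escape inside an f-string.
llm_output = (
    FENCE + "python\n"
    + 'data = "users: Alice=25, Bob=30, Carol=28"\n'
    + r'print(f"\nTotal Users: {len(data)}")' + "\n"
    + FENCE
)

block, error = extract_code(llm_output)
compile(block, "<generated>", "exec")  # raises SyntaxError if the code were corrupted
print(block)
```

Because nothing is rewritten after extraction, the escape sequence and the `=` characters inside the string reach the interpreter exactly as the LLM wrote them.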

I'm happy to submit a PR with these changes if this approach looks reasonable. Thank you for the excellent work on DSPy!

DSPy version

3.1.0b1

Labels

bug