Bug: _parse_code in ProgramOfThought corrupts valid Python code
Description
I am using CodeAct and found that the _parse_code method corrupts valid Python code generated by the LLM. Two separate problems in the parsing logic break otherwise correct code.
Issue 1: .replace("\\n", "\n") corrupts escape sequences in string literals
Location: dspy/predict/program_of_thought.py, line 139
The .replace("\\n", "\n") call replaces all occurrences of \n with actual newline characters, including those inside Python string literals where they should remain as escape sequences.
Example:
When the LLM generates:
print(f"\nTotal Users: {total_users}")
After _parse_code, it becomes:
print(f"
Total Users: {total_users}")
This causes a syntax error, because the string literal is now split across two lines.
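The effect can be reproduced outside DSPy with a minimal sketch that uses only the standard library. The names `llm_output`, `parsed`, and `compiles` are illustrative and not taken from DSPy. The sketch applies the same `.replace("\\n", "\n")` call to the example line and checks whether the result still compiles:

```python
# Illustrative reproduction; all names here are hypothetical, not DSPy's.

# Raw string: a backslash followed by "n", exactly as the LLM emits it.
llm_output = r'print(f"\nTotal Users: {total_users}")'

# The same replacement that _parse_code applies to the extracted code.
parsed = llm_output.replace("\\n", "\n")


def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        # compile() only parses the code; it does not execute it,
        # so the undefined name `total_users` is not a problem here.
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


print("before replace:", compiles(llm_output))
print("after replace: ", compiles(parsed))
```

The original line compiles, while the version with the literal newline inside the string does not.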
Issue 2: Regex substitutions corrupt code with = in strings or complex expressions
Location: dspy/predict/program_of_thought.py, lines 144-158
The regex patterns attempt to fix single-line multi-assignment code but don't respect string boundaries or Python syntax:
code_block = re.sub(
    r"([a-zA-Z_]\w* *=.*?)(?=[a-zA-Z_]\w* *=)",
    r"\1\n",
    code_block,
)
Example:
When the LLM generates:
data = "users: Alice=25, Bob=30, Carol=28"
youngest = [name for name, age in users.items() if age == min_age]
After the regex substitutions:
data = "users:
Alice=25,
Bob=30,
Carol=28"
youngest = [name for name, age in users.items() if
age == min_age]
The regex inserts newlines inside string literals and breaks list comprehensions, causing syntax errors.
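The substitution quoted above can also be run in isolation. This is a minimal sketch, assuming only that single pattern is applied (the real method applies further steps). The helper names `split_assignments` and `compiles` are hypothetical:

```python
import re

# The substitution pattern quoted above from _parse_code.
PATTERN = r"([a-zA-Z_]\w* *=.*?)(?=[a-zA-Z_]\w* *=)"


def split_assignments(code: str) -> str:
    """Apply the quoted substitution to `code` (hypothetical helper name)."""
    return re.sub(PATTERN, r"\1\n", code)


def compiles(source: str) -> bool:
    """Return True if `source` parses as Python (it is never executed)."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


string_line = 'data = "users: Alice=25, Bob=30, Carol=28"'
comprehension_line = "youngest = [name for name, age in users.items() if age == min_age]"

rewritten_string = split_assignments(string_line)
rewritten_comprehension = split_assignments(comprehension_line)

# repr() makes the inserted newlines visible as \n.
print(repr(rewritten_string))
print(repr(rewritten_comprehension))
print("string line still compiles:", compiles(rewritten_string))
```

Only the string-literal case is checked with `compile` here. The comprehension is printed just to show where the pattern puts the newline (before `age ==`, because `==` starts with `=`).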
Steps to reproduce
import dspy
from dspy.predict import CodeAct
# configure() sets the global LM; its return value is not needed.
dspy.configure(lm=dspy.LM("anthropic/claude-sonnet-4-5"))
class DataSummarySignature(dspy.Signature):
    """Analyze data and print a formatted summary report."""
    data: str = dspy.InputField(desc="Data to analyze")
    summary: str = dspy.OutputField(desc="Summary report")
act = CodeAct(signature=DataSummarySignature, tools=[], max_iters=3)
result = act(data="users: Alice=25, Bob=30, Carol=28")
Suggested Fix
Simplify _parse_code to only extract code from markdown blocks without attempting to reformat it:
def _parse_code(self, code_data):
    code = code_data.get("generated_code", "").split("---", 1)[0].split("\n\n\n", 1)[0]
    code_match = re.search(r"```python[ \n](.*?)[ \n]```?", code, re.DOTALL)
    code_block = code_match.group(1) if code_match else code
    if not code_block:
        return code, "Error: Empty code after parsing."
    return code_block, None
I'm happy to submit a PR with these changes if this approach looks reasonable. Thank you for the excellent work on DSPy!
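As a quick sanity check of the suggested fix, here is a standalone sketch of the same extraction logic written as a free function. The names `extract_code`, `FENCE`, and `CODE_PATTERN` are hypothetical, and the pattern is assembled from `FENCE` only so that the example itself contains no literal code fence. It is a sketch of the idea, not the final patch:

```python
import re

FENCE = "`" * 3  # three backticks, built programmatically

# Same pattern as in the suggested fix, assembled from FENCE.
CODE_PATTERN = FENCE + r"python[ \n](.*?)[ \n]" + FENCE + "?"


def extract_code(generated_code: str):
    """Extract the code from a markdown block without reformatting it."""
    code = generated_code.split("---", 1)[0].split("\n\n\n", 1)[0]
    code_match = re.search(CODE_PATTERN, code, re.DOTALL)
    code_block = code_match.group(1) if code_match else code
    if not code_block:
        return code, "Error: Empty code after parsing."
    return code_block, None


# Simulated LLM output containing both problematic lines from this report.
llm_output = "\n".join([
    FENCE + "python",
    r'print(f"\nTotal Users: {total_users}")',
    'data = "users: Alice=25, Bob=30, Carol=28"',
    FENCE,
])

code_block, error = extract_code(llm_output)

# compile() parses without executing; it raises SyntaxError on corrupted code.
compile(code_block, "<generated>", "exec")
print(code_block)
```

With this version the escape sequence and the `=` characters inside the string literal are left untouched, so the extracted code still compiles.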
DSPy version
3.1.0b1