-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
Initial Checks
- I'm using the latest version of Pydantic AI
- I've searched for my issue in the issue tracker before opening this issue
Description
The .* quantifier is greedy — with re.DOTALL, it matches from the first { all the way to the last } in the entire text, crossing past the closing ``` fence.
Example
An LLM returns: ```json
{"result": "pass"}
This conforms to the schema {"type": "object"}
Expected extraction: {"result": "pass"}
Actual extraction: {"result": "pass"}\n ``` \nThis conforms to the schema {"type": "object"} the greedy `.*` matched from the first `{` to the last `}` in the entire string, gobbling up the closing fence and
everything after it.
This produces malformed JSON that fails downstream validation, potentially triggering unnecessary retries or errors.
The bug only manifests when there's a } character after the closing ``` fence. That's row 1 — the only case where broken and expected differ:
Input: json\n{"a": 1}\n + Context: {"b": 2}
Current (broken): {"a": 1}\n``` \nContext: {"b": 2}
Expected: {"a": 1}
Input: json\n{"nested": {"k": "v"}}\n
Current (broken): {"nested": {"k": "v"}}
Expected:: {"nested": {"k": "v"}}
Input: ```json\n{\n "foo": "bar"\n} (no closing fence)
Current (broken): {\n "foo": "bar"\n}
Expected:: {\n "foo": "bar"\n}
Minimal, Reproducible Example
from pydantic_ai._utils import strip_markdown_fences
# LLM returns JSON in a markdown fence, followed by commentary containing
braces
text = '\n{"result": "pass"}\n\nThis matches schema {"type":
"object"}'
extracted = strip_markdown_fences(text)
print(repr(extracted))
# Actual: '{"result": "pass"}\n\nThis matches schema {"type":
"object"}'
# Expected: '{"result": "pass"}'
The greedy .* in the regex r'(?:\w+)?\n(\{.*\})' matches from the first {
to the **last** } in the entire string, crossing past the closing fence
and capturing the commentary text as part of the "JSON".Logfire Trace
N/A — this bug is in the strip_markdown_fences() regex utility function (_utils.py:517), not in a model request or agent run. It can be reproduced with a direct function call without any LLM interaction.
Python, Pydantic AI & LLM client version
- Python: 3.12+
- Pydantic AI: main branch (latest)
- LLM provider SDK: Any (bug is in utility function, not provider-specific)