fix: json loads issue parsing LLM responses #254
tylerhutcherson wants to merge 3 commits into main
Conversation
🛡️ Jit Security Scan Results: ✅ No security findings were detected in this PR
Security scan by Jit
The PR successfully addresses issue #236 by introducing a robust JSON parser for LLM responses that handles markdown code fences and prose wrapping. The implementation is clean and well-tested. I've identified one potential edge case issue in the parsing logic that could lead to incorrect behavior.
```python
# Fall back to scanning for embedded JSON objects/arrays inside prose.
for match in _JSON_START_RE.finditer(content):
    candidate = content[match.start() :].lstrip()
```
Potential Bug: Whitespace variants bypass deduplication
The deduplication logic using `seen.add(candidate)` may fail to skip actual duplicates when the same JSON content has different whitespace. For example:
```python
# These would be treated as different candidates even though they're the same JSON:
candidate1 = '{"key": "value"}'
candidate2 = '{ "key": "value" }'
```
This could cause:
- Unnecessary parsing attempts for the same JSON with different formatting
- Incorrect behavior if the first attempt fails due to trailing characters but a later whitespace variant would succeed
Suggested improvement:
Consider normalizing candidates before deduplication, or use a more robust approach:
```python
def _iter_json_candidates(content: str) -> Iterator[str]:
    """Yield likely JSON payloads embedded within an LLM response."""
    seen: set[str] = set()
    for match in _CODE_FENCE_RE.finditer(content):
        candidate = match.group(1).strip()
        # Normalize whitespace for deduplication
        normalized = " ".join(candidate.split())
        if candidate and normalized not in seen:
            seen.add(normalized)
            yield candidate
    # Fall back to scanning for embedded JSON objects/arrays inside prose.
    for match in _JSON_START_RE.finditer(content):
        candidate = content[match.start() :].lstrip()
        normalized = " ".join(candidate.split())
        if candidate and normalized not in seen:
            seen.add(normalized)
            yield candidate
```
Alternatively, since you're already using a try/except for parsing, you might consider removing the deduplication entirely and just relying on the first successfully parsed JSON, as the performance cost is likely minimal.
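That no-deduplication alternative could be sketched as follows. This is illustrative only, not the PR's code; `parse_first_json` is a hypothetical name, and the regexes are copied from the fragments quoted in this review.

```python
import json
import re
from typing import Any

# Regexes copied from the diff under review.
_CODE_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.IGNORECASE | re.DOTALL)
_JSON_START_RE = re.compile(r"[{\[]")


def parse_first_json(content: str) -> Any:
    """Return the first candidate that parses, skipping dedup entirely.

    Duplicate candidates either all fail or the first one succeeds, so a
    `seen` set buys little; the try/except already short-circuits.
    """
    decoder = json.JSONDecoder()
    candidates = [m.group(1).strip() for m in _CODE_FENCE_RE.finditer(content)]
    candidates += [
        content[m.start() :].lstrip() for m in _JSON_START_RE.finditer(content)
    ]
    for candidate in candidates:
        try:
            # raw_decode tolerates trailing text after a complete JSON value.
            value, _ = decoder.raw_decode(candidate)
            return value
        except json.JSONDecodeError:
            continue
    raise json.JSONDecodeError("no JSON candidate parsed", content, 0)
```

The try/except per candidate makes the `seen` bookkeeping redundant, at the cost of re-attempting identical strings in pathological inputs.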
Pull request overview
This PR addresses issue #236 by making JSON parsing of LLM responses resilient to markdown code fences and surrounding prose, preventing extraction/memory-strategy flows from failing when models don’t return a bare JSON object.
Changes:
- Add a shared `parse_llm_json` helper to normalize/parse fenced or prose-wrapped JSON before decoding.
- Apply the helper across entity extraction, topic extraction, and all memory strategy extraction paths.
- Add regression tests for wrapped JSON responses and update various tests/notebook cells for formatting consistency.
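The full body of `parse_llm_json` in `agent_memory_server/utils/llm_json.py` isn't shown in this conversation; the following is a rough sketch of how such a parser could work, reconstructed from the regexes and fragments quoted in the review comments (the function body and error message are assumptions, not the PR's actual code):

```python
import json
import re
from typing import Any, Iterator

# Regexes as quoted in the review comments below.
_CODE_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.IGNORECASE | re.DOTALL)
_JSON_START_RE = re.compile(r"[{\[]")


def _iter_json_candidates(content: str) -> Iterator[str]:
    """Yield likely JSON payloads embedded within an LLM response."""
    seen: set[str] = set()
    # Prefer fenced blocks (```json ... ```), then fall back to prose scanning.
    for match in _CODE_FENCE_RE.finditer(content):
        candidate = match.group(1).strip()
        if candidate and candidate not in seen:
            seen.add(candidate)
            yield candidate
    for match in _JSON_START_RE.finditer(content):
        candidate = content[match.start() :].lstrip()
        if candidate and candidate not in seen:
            seen.add(candidate)
            yield candidate


def parse_llm_json(content: str) -> Any:
    """Parse JSON from a raw, fenced, or prose-wrapped LLM response."""
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        pass
    decoder = json.JSONDecoder()
    for candidate in _iter_json_candidates(content):
        try:
            # raw_decode ignores trailing text after the first complete value.
            value, _ = decoder.raw_decode(candidate)
            return value
        except json.JSONDecodeError:
            continue
    raise json.JSONDecodeError("No JSON payload found in response", content, 0)
```

Callers replace `json.loads(response.content)` with `parse_llm_json(response.content)`, so a bare object, a fenced object, and an object surrounded by commentary all decode the same way.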
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| agent_memory_server/utils/llm_json.py | Adds shared parser for fenced/prose-wrapped JSON responses. |
| agent_memory_server/extraction.py | Uses shared parser for entity/topic extraction JSON decoding. |
| agent_memory_server/memory_strategies.py | Uses shared parser for strategy extraction JSON decoding. |
| tests/test_issue_236.py | Adds regression tests covering wrapped JSON and all affected strategies. |
| tests/test_working_memory.py | Assertion formatting adjustments (no functional change). |
| tests/test_tool_contextual_grounding.py | Assertion formatting adjustments (no functional change). |
| tests/test_thread_aware_grounding.py | Assertion formatting adjustments (no functional change). |
| tests/test_mcp.py | Assertion formatting adjustments (no functional change). |
| tests/test_long_term_memory.py | Assertion formatting adjustments (no functional change). |
| tests/test_issue_235.py | Assertion formatting adjustments (no functional change). |
| tests/test_full_integration.py | Assertion formatting adjustments (no functional change). |
| tests/test_contextual_grounding_integration.py | Assertion formatting adjustments (no functional change). |
| tests/test_context_percentage_calculation.py | Assertion formatting adjustments (no functional change). |
| tests/test_client_tool_calls.py | Assertion formatting adjustments (no functional change). |
| tests/test_api.py | Assertion formatting adjustments (no functional change). |
| tests/integration/test_task_error_message_clearable.py | Assertion formatting adjustments (no functional change). |
| tests/integration/test_deduplication_e2e.py | Assertion formatting adjustments (no functional change). |
| agent-memory-client/tests/test_tool_schemas.py | Assertion formatting adjustments (no functional change). |
| examples/agent_memory_server_interactive_guide.ipynb | Minor formatting/consistency tweaks to example code cells. |
````python
_CODE_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.IGNORECASE | re.DOTALL)
_JSON_START_RE = re.compile(r"[{\[]")
````
`parse_llm_json`'s fallback scan matches both `{` and `[` (`_JSON_START_RE = r"[{\[]"`), so it can successfully parse a top-level list/scalar from a prose-wrapped response. All current call sites assume a dict and call `.get(...)`, which will raise `AttributeError` for non-dict JSON. Consider restricting the scan to `{` (since callers request `json_object`) and/or validating the parsed value is a dict and raising a decode/validation error otherwise.
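A minimal guard along the lines this comment suggests could look like this. The wrapper name is hypothetical, and plain `json.loads` stands in for the PR's `parse_llm_json` so the sketch is self-contained:

```python
import json
from typing import Any


def parse_llm_json_object(content: str) -> dict[str, Any]:
    """Parse JSON and require a top-level object, per the review suggestion."""
    value = json.loads(content)  # stand-in for parse_llm_json(content)
    if not isinstance(value, dict):
        # Surface non-object JSON as a decode error so existing handlers catch it.
        raise json.JSONDecodeError(
            f"Expected JSON object, got {type(value).__name__}", content, 0
        )
    return value
```

Callers that do `.get("entities", [])` on the result then fail inside the existing `json.JSONDecodeError` handling instead of raising an unexpected `AttributeError`.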
```diff
 try:
-    entities = json.loads(response.content).get("entities", [])
+    entities = parse_llm_json(response.content).get("entities", [])
 except (json.JSONDecodeError, KeyError):
```
The exception handler includes KeyError, but .get("entities", []) won’t raise KeyError. The more likely failure mode here (especially with the new lenient parser) is AttributeError/TypeError if the parsed JSON isn’t a dict (e.g., model returns a top-level list). Catching those and treating them as a parse failure would avoid unexpected retries/crashes.
```diff
-except (json.JSONDecodeError, KeyError):
+except (json.JSONDecodeError, AttributeError, TypeError):
```
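To see why `KeyError` never fires here while `AttributeError` can, a quick illustration (not code from the PR):

```python
import json

# A dict's .get never raises KeyError; it returns the default instead.
parsed = json.loads('{"other": 1}')
assert parsed.get("entities", []) == []

# A non-dict result (e.g. a top-level list) raises AttributeError on .get.
parsed_list = json.loads("[1, 2, 3]")
try:
    parsed_list.get("entities", [])
    raised = None
except AttributeError as exc:
    raised = exc
assert isinstance(raised, AttributeError)
```

So the suggested handler above swaps the exception that cannot occur for the ones that can.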
| ) | ||
| try: | ||
| response_data = json.loads(response.content) | ||
| response_data = parse_llm_json(response.content) |
parse_llm_json returns Any, but this code assumes it’s a dict (response_data.get(...)) and only catches JSONDecodeError. If the model returns a valid non-object JSON value (list/string/etc.), this will raise AttributeError and won’t be logged as a decode problem. Consider validating response_data is a dict (or catching AttributeError/TypeError and re-raising as a decode/validation error) so retries and logs behave consistently.
```diff
 response_data = parse_llm_json(response.content)
+if not isinstance(response_data, dict):
+    logger.error(
+        "Expected JSON object with 'memories' key, got %s: %r",
+        type(response_data).__name__,
+        response.content,
+    )
+    # Treat non-object top-level JSON as a decode/validation error
+    raise json.JSONDecodeError(
+        "Expected JSON object at top level",
+        response.content,
+        0,
+    )
```
```python
for match in _JSON_START_RE.finditer(content):
    candidate = content[match.start() :].lstrip()
```
From what I'm seeing, it looks like this regex approach would produce a lot of results if your JSON contained a list of dictionaries (or if you had a dictionary that contained lists).

````pycon
>>> p = re.compile(r"[{\[]")
>>> s = '```json\n[{"a": 1}, {"a": 2}, {"a": 3}]\n```'
>>> for match in p.finditer(s):
...     print(s[match.start() :].lstrip())
...
[{"a": 1}, {"a": 2}, {"a": 3}]
```
{"a": 1}, {"a": 2}, {"a": 3}]
```
{"a": 2}, {"a": 3}]
```
{"a": 3}]
```
````

It probably won't matter since `parse_llm_json` should be able to parse an earlier match and not reach these awkward ones, which wouldn't be parseable anyway, but it leaves a small door open to strange behaviour.
I need to do more here.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
candidate = content[match.start() :].lstrip()
if candidate and candidate not in seen:
    seen.add(candidate)
    yield candidate
```
Uncaught AttributeError when parser extracts non-dict JSON
Medium Severity
The _JSON_START_RE fallback matches every { and [ in the content. If prose contains a small valid JSON fragment before the intended object (e.g., "I found [3] entities: {\"entities\": [...]}") then raw_decode succeeds on [3], returning a list. All callers immediately call .get(...) on the result, which raises AttributeError — not caught by the existing except (json.JSONDecodeError, KeyError) handlers. Previously json.loads would have raised JSONDecodeError on this input, which was properly handled. This regression converts a graceful fallback into an uncaught exception that triggers three tenacity retries followed by a RetryError.
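The failure mode described above can be reproduced directly, assuming the helper uses `raw_decode`-style scanning; the example string comes from the comment itself:

```python
import json

decoder = json.JSONDecoder()
content = 'I found [3] entities: {"entities": ["a", "b", "c"]}'
# The fallback scan hits the "[" of [3] before the intended object.
first_start = min(content.index("["), content.index("{"))
value, _ = decoder.raw_decode(content[first_start:])
assert value == [3]  # a list, not the expected dict

# Callers then do value.get("entities", []) and crash with AttributeError,
# which (json.JSONDecodeError, KeyError) handlers do not catch.
try:
    value.get("entities", [])
    error = None
except AttributeError as exc:
    error = exc
assert isinstance(error, AttributeError)
```

Restricting the fallback scan to `{`, or validating the parsed value is a dict, would route this input back into the existing graceful-failure path.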
Fixes #236
This PR fixes LLM JSON parsing in extraction and memory strategy flows when models return JSON
wrapped in markdown code fences or surrounded by commentary instead of returning a bare JSON
object.
It adds a small shared parser that normalizes fenced and prose-wrapped JSON responses before
json.loads, and applies that parser to all affected call sites in entity extraction, topic
extraction, and memory strategy extraction. This prevents silent loss of extracted entities/topics
and avoids retry failures in memory strategy processing for models that ignore
response_format={"type": "json_object"}.
The PR also adds regression tests covering wrapped JSON responses across entity/topic extraction and all memory strategy extraction paths.
Validation:
Note
Medium Risk
Touches core LLM extraction paths; while behavior should only broaden accepted response formats, parsing heuristics could misinterpret unexpected content and change what gets extracted/indexed.
Overview
Fixes intermittent extraction failures when models ignore
response_formatand return JSON inside markdown code fences or with surrounding commentary.Adds
parse_llm_json()to recover JSON payloads from raw/fenced/prose-wrapped strings and wires it into entity/topic extraction (extraction.py) and all memory strategy extractors (memory_strategies.py), plus regression tests for issue #236 covering wrapped responses across strategies.Written by Cursor Bugbot for commit ae40be8. This will update automatically on new commits. Configure here.