
fix: json loads issue parsing LLM responses #254

Open
tylerhutcherson wants to merge 3 commits into main from
fix/json-loads-llm-response-mkdown

Conversation

@tylerhutcherson
Contributor

@tylerhutcherson tylerhutcherson commented Mar 27, 2026

Fixes #236

This PR fixes LLM JSON parsing in extraction and memory strategy flows when models return JSON
wrapped in markdown code fences or surrounded by commentary instead of returning a bare JSON
object.

It adds a small shared parser that normalizes fenced and prose-wrapped JSON responses before
json.loads, and applies that parser to all affected call sites in entity extraction, topic
extraction, and memory strategy extraction. This prevents silent loss of extracted entities/topics
and avoids retry failures in memory strategy processing for models that ignore
response_format={"type": "json_object"}.
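
For context, a minimal sketch of such a normalizing parser. The two regex patterns mirror the ones shown later in this review (agent_memory_server/utils/llm_json.py); the function body is an illustrative reconstruction, not the shipped code:

```python
import json
import re

# Patterns mirroring the diff; the function body is an illustrative sketch.
_CODE_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.IGNORECASE | re.DOTALL)
_JSON_START_RE = re.compile(r"[{\[]")


def parse_llm_json(content: str):
    """Parse JSON from a raw, fenced, or prose-wrapped LLM response."""
    try:
        # Fast path: the model returned bare JSON.
        return json.loads(content)
    except json.JSONDecodeError:
        pass

    decoder = json.JSONDecoder()
    # Prefer fenced blocks, then fall back to any embedded {/[ start.
    candidates = [m.group(1) for m in _CODE_FENCE_RE.finditer(content)]
    candidates += [content[m.start():] for m in _JSON_START_RE.finditer(content)]
    for candidate in candidates:
        try:
            # raw_decode tolerates trailing prose after the JSON value.
            value, _ = decoder.raw_decode(candidate)
            return value
        except json.JSONDecodeError:
            continue
    raise json.JSONDecodeError("No JSON payload found in response", content, 0)
```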

The PR also adds regression tests covering:

  • Raw JSON responses
  • Markdown-fenced JSON responses
  • Prose-wrapped JSON responses
  • All affected memory strategies

Validation:

  • ruff check passed
  • Targeted regression tests passed
  • Full test suite passed with pytest --run-api-tests

Note

Medium Risk
Touches core LLM extraction paths; while behavior should only broaden accepted response formats, parsing heuristics could misinterpret unexpected content and change what gets extracted/indexed.

Overview
Fixes intermittent extraction failures when models ignore response_format and return JSON inside markdown code fences or with surrounding commentary.

Adds parse_llm_json() to recover JSON payloads from raw/fenced/prose-wrapped strings and wires it into entity/topic extraction (extraction.py) and all memory strategy extractors (memory_strategies.py), plus regression tests for issue #236 covering wrapped responses across strategies.

Written by Cursor Bugbot for commit ae40be8. This will update automatically on new commits.

@jit-ci

jit-ci bot commented Mar 27, 2026

🛡️ Jit Security Scan Results


✅ No security findings were detected in this PR


Security scan by Jit


@github-actions github-actions bot left a comment


The PR successfully addresses issue #236 by introducing a robust JSON parser for LLM responses that handles markdown code fences and prose wrapping. The implementation is clean and well-tested. I've identified one potential edge case issue in the parsing logic that could lead to incorrect behavior.


# Fall back to scanning for embedded JSON objects/arrays inside prose.
for match in _JSON_START_RE.finditer(content):
    candidate = content[match.start() :].lstrip()

Potential Bug: Hash collision with different whitespace

The deduplication logic using seen.add(candidate) may fail to skip actual duplicates when the same JSON content has different whitespace. For example:

# These would be treated as different candidates even though they're the same JSON:
candidate1 = '{"key": "value"}'  
candidate2 = '{  "key":   "value"  }'

This could cause:

  1. Unnecessary parsing attempts for the same JSON with different formatting
  2. Incorrect behavior if the first attempt fails due to trailing characters but a later whitespace variant would succeed

Suggested improvement:

Consider normalizing candidates before deduplication, or use a more robust approach:

def _iter_json_candidates(content: str) -> Iterator[str]:
    """Yield likely JSON payloads embedded within an LLM response."""
    seen: set[str] = set()

    for match in _CODE_FENCE_RE.finditer(content):
        candidate = match.group(1).strip()
        # Normalize whitespace for deduplication
        normalized = ' '.join(candidate.split())
        if candidate and normalized not in seen:
            seen.add(normalized)
            yield candidate

    # Fall back to scanning for embedded JSON objects/arrays inside prose.
    for match in _JSON_START_RE.finditer(content):
        candidate = content[match.start() :].lstrip()
        normalized = ' '.join(candidate.split())
        if candidate and normalized not in seen:
            seen.add(normalized)
            yield candidate

Alternatively, since you're already using a try/catch for parsing, you might consider removing the deduplication entirely and just rely on finding the first successfully parsed JSON, as the performance cost is likely minimal.

Contributor

Copilot AI left a comment


Pull request overview

This PR addresses issue #236 by making JSON parsing of LLM responses resilient to markdown code fences and surrounding prose, preventing extraction/memory-strategy flows from failing when models don’t return a bare JSON object.

Changes:

  • Add a shared parse_llm_json helper to normalize/parse fenced or prose-wrapped JSON before decoding.
  • Apply the helper across entity extraction, topic extraction, and all memory strategy extraction paths.
  • Add regression tests for wrapped JSON responses and update various tests/notebook cells for formatting consistency.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
  • agent_memory_server/utils/llm_json.py: Adds shared parser for fenced/prose-wrapped JSON responses.
  • agent_memory_server/extraction.py: Uses shared parser for entity/topic extraction JSON decoding.
  • agent_memory_server/memory_strategies.py: Uses shared parser for strategy extraction JSON decoding.
  • tests/test_issue_236.py: Adds regression tests covering wrapped JSON and all affected strategies.
  • tests/test_working_memory.py: Assertion formatting adjustments (no functional change).
  • tests/test_tool_contextual_grounding.py: Assertion formatting adjustments (no functional change).
  • tests/test_thread_aware_grounding.py: Assertion formatting adjustments (no functional change).
  • tests/test_mcp.py: Assertion formatting adjustments (no functional change).
  • tests/test_long_term_memory.py: Assertion formatting adjustments (no functional change).
  • tests/test_issue_235.py: Assertion formatting adjustments (no functional change).
  • tests/test_full_integration.py: Assertion formatting adjustments (no functional change).
  • tests/test_contextual_grounding_integration.py: Assertion formatting adjustments (no functional change).
  • tests/test_context_percentage_calculation.py: Assertion formatting adjustments (no functional change).
  • tests/test_client_tool_calls.py: Assertion formatting adjustments (no functional change).
  • tests/test_api.py: Assertion formatting adjustments (no functional change).
  • tests/integration/test_task_error_message_clearable.py: Assertion formatting adjustments (no functional change).
  • tests/integration/test_deduplication_e2e.py: Assertion formatting adjustments (no functional change).
  • agent-memory-client/tests/test_tool_schemas.py: Assertion formatting adjustments (no functional change).
  • examples/agent_memory_server_interactive_guide.ipynb: Minor formatting/consistency tweaks to example code cells.


Comment on lines +11 to +13
_CODE_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.IGNORECASE | re.DOTALL)
_JSON_START_RE = re.compile(r"[{\[]")


Copilot AI Mar 27, 2026


parse_llm_json's fallback scan matches both { and [ (_JSON_START_RE = r"[{\[]"), so it can successfully parse a top-level list/scalar from a prose-wrapped response. All current call sites assume a dict and call .get(...), which will raise AttributeError for non-dict JSON. Consider restricting the scan to { (since callers request json_object) and/or validating the parsed value is a dict and raising a decode/validation error otherwise.
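
The suggested validation could look like the following sketch. Note that parse_llm_json here is stubbed with bare json.loads for illustration, and parse_llm_json_object is a hypothetical wrapper name, not part of the diff:

```python
import json


def parse_llm_json(content: str):
    # Stand-in for the PR's lenient parser; bare json.loads is enough here.
    return json.loads(content)


def parse_llm_json_object(content: str) -> dict:
    """Reject non-dict top-level JSON so callers can safely call .get(...)."""
    value = parse_llm_json(content)
    if not isinstance(value, dict):
        # Surface the problem as a decode error, which call sites already catch.
        raise json.JSONDecodeError("Expected a JSON object at top level", content, 0)
    return value
```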

Copilot uses AI. Check for mistakes.
try:
-    entities = json.loads(response.content).get("entities", [])
+    entities = parse_llm_json(response.content).get("entities", [])
except (json.JSONDecodeError, KeyError):

Copilot AI Mar 27, 2026


The exception handler includes KeyError, but .get("entities", []) won’t raise KeyError. The more likely failure mode here (especially with the new lenient parser) is AttributeError/TypeError if the parsed JSON isn’t a dict (e.g., model returns a top-level list). Catching those and treating them as a parse failure would avoid unexpected retries/crashes.

Suggested change

-except (json.JSONDecodeError, KeyError):
+except (json.JSONDecodeError, AttributeError, TypeError):
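
The distinction is easy to verify (an illustrative check, not from the diff): .get with a default never raises KeyError on a dict, while a non-dict result has no .get at all:

```python
import json

# A dict's .get with a default never raises KeyError:
entities = json.loads('{"other": 1}').get("entities", [])
assert entities == []

# But if the lenient parser returns a top-level list, .get does not exist,
# so the call raises AttributeError, which the old handler never catches:
try:
    json.loads('[1, 2, 3]').get("entities", [])
    raised = False
except AttributeError:
    raised = True
assert raised
```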

)
try:
-    response_data = json.loads(response.content)
+    response_data = parse_llm_json(response.content)

Copilot AI Mar 27, 2026


parse_llm_json returns Any, but this code assumes it’s a dict (response_data.get(...)) and only catches JSONDecodeError. If the model returns a valid non-object JSON value (list/string/etc.), this will raise AttributeError and won’t be logged as a decode problem. Consider validating response_data is a dict (or catching AttributeError/TypeError and re-raising as a decode/validation error) so retries and logs behave consistently.

Suggested change

 response_data = parse_llm_json(response.content)
+if not isinstance(response_data, dict):
+    logger.error(
+        "Expected JSON object with 'memories' key, got %s: %r",
+        type(response_data).__name__,
+        response.content,
+    )
+    # Treat non-object top-level JSON as a decode/validation error
+    raise json.JSONDecodeError(
+        "Expected JSON object at top level",
+        response.content,
+        0,
+    )

Comment on lines +46 to +47
for match in _JSON_START_RE.finditer(content):
    candidate = content[match.start() :].lstrip()
Contributor


From what I'm seeing, it looks like this regex approach would produce a lot of results if your JSON contained a list of dictionaries (or if you had a dictionary that contained lists).

>>> p = re.compile(r"[{\[]")
>>> s = '```json\n[{"a": 1}, {"a": 2}, {"a": 3}]\n```'
>>> for match in p.finditer(s):
...     print(s[match.start() :].lstrip())

[{"a": 1}, {"a": 2}, {"a": 3}]
```
{"a": 1}, {"a": 2}, {"a": 3}]
```
{"a": 2}, {"a": 3}]
```
{"a": 3}]
```

It probably won't matter since parse_llm_json should be able to parse an earlier match and not reach these awkward ones, which wouldn't be parseable anyway, but it leaves a small door open to strange behaviour.
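
One way to avoid generating those unparseable tail candidates entirely (an illustrative alternative, not the PR's code) is to call raw_decode from each start offset, which both validates the candidate and stops at the end of the first complete JSON value:

```python
import json
import re

_JSON_START_RE = re.compile(r"[{\[]")


def first_embedded_json(content: str):
    """Return the first complete JSON value embedded in prose, else None."""
    decoder = json.JSONDecoder()
    for match in _JSON_START_RE.finditer(content):
        try:
            # raw_decode parses exactly one value and ignores what follows,
            # so the later, truncated start positions never matter.
            value, _end = decoder.raw_decode(content, match.start())
            return value
        except json.JSONDecodeError:
            continue
    return None
```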

Contributor Author


good catch

@tylerhutcherson
Contributor Author

I need to do more here.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


candidate = content[match.start() :].lstrip()
if candidate and candidate not in seen:
    seen.add(candidate)
    yield candidate

Uncaught AttributeError when parser extracts non-dict JSON

Medium Severity

The _JSON_START_RE fallback matches every { and [ in the content. If prose contains a small valid JSON fragment before the intended object (e.g., "I found [3] entities: {\"entities\": [...]}") then raw_decode succeeds on [3], returning a list. All callers immediately call .get(...) on the result, which raises AttributeError — not caught by the existing except (json.JSONDecodeError, KeyError) handlers. Previously json.loads would have raised JSONDecodeError on this input, which was properly handled. This regression converts a graceful fallback into an uncaught exception that triggers three tenacity retries followed by a RetryError.
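
The failure mode is easy to reproduce with a scan-style parser (an illustrative reconstruction of the fallback behavior, not the shipped code):

```python
import json
import re

_JSON_START_RE = re.compile(r"[{\[]")


def scan_first_json(content: str):
    """Mimic the fallback scan: return the first parseable JSON value."""
    decoder = json.JSONDecoder()
    for match in _JSON_START_RE.finditer(content):
        try:
            value, _ = decoder.raw_decode(content, match.start())
            return value
        except json.JSONDecodeError:
            continue
    raise json.JSONDecodeError("no JSON found", content, 0)


reply = 'I found [3] entities: {"entities": ["a", "b", "c"]}'
value = scan_first_json(reply)
assert value == [3]  # the stray [3] wins over the intended object

# A type guard at the call site keeps this a graceful fallback:
entities = value.get("entities", []) if isinstance(value, dict) else []
assert entities == []
```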

Additional Locations (2)


Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

json.loads() fails on LLM responses wrapped in markdown code fences

3 participants