Feat: Implement contextual grounding #46


Merged
merged 9 commits into from
Aug 13, 2025
359 changes: 359 additions & 0 deletions TASK_MEMORY.md

Large diffs are not rendered by default.

39 changes: 36 additions & 3 deletions agent_memory_server/extraction.py
@@ -225,12 +225,35 @@ async def handle_extraction(text: str) -> tuple[list[str], list[str]]:
2. SEMANTIC: User preferences and general knowledge outside of your training data.
Example: "Trek discontinued the Trek 520 steel touring bike in 2023"

CONTEXTUAL GROUNDING REQUIREMENTS:
When extracting memories, you must resolve all contextual references to their concrete referents:

1. PRONOUNS: Replace ALL pronouns (he/she/they/him/her/them/his/hers/theirs) with the actual person's name
- "He loves coffee" → "John loves coffee" (if "he" refers to John)
- "I told her about it" → "User told Sarah about it" (if "her" refers to Sarah)
- "Her experience is valuable" → "Sarah's experience is valuable" (if "her" refers to Sarah)
- "His work is excellent" → "John's work is excellent" (if "his" refers to John)
- NEVER leave pronouns unresolved - always replace with the specific person's name

2. TEMPORAL REFERENCES: Convert relative time expressions to absolute dates/times
- "yesterday" → "March 15, 2025" (if today is March 16, 2025)
- "last year" → "2024" (if current year is 2025)
- "three months ago" → "December 2024" (if current date is March 2025)

3. SPATIAL REFERENCES: Resolve place references to specific locations
- "there" → "San Francisco" (if referring to San Francisco)
- "that place" → "Chez Panisse restaurant" (if referring to that restaurant)
- "here" → "the office" (if referring to the office)

4. DEFINITE REFERENCES: Resolve definite articles to specific entities
- "the meeting" → "the quarterly planning meeting"
- "the document" → "the budget proposal document"
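The four grounding rules above are, in effect, a substitution pass over the extracted text. A minimal illustrative sketch of that pass (the `ground_references` helper and its referent map are hypothetical, not part of this change — in the PR, the grounding is performed by the LLM, not by string substitution):

```python
import re


def ground_references(text: str, referents: dict[str, str]) -> str:
    """Replace contextual references with concrete referents.

    `referents` maps a contextual token (pronoun, relative time,
    place reference) to its resolved form, e.g. {"he": "John"}.
    """

    def replace(match: re.Match) -> str:
        word = match.group(0)
        resolved = referents[word.lower()]
        # Preserve sentence-initial capitalization of the original token
        return resolved if word[0].islower() else resolved[0].upper() + resolved[1:]

    # \b boundaries keep us from rewriting "he" inside "the" or "mentioned"
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in referents) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(replace, text)


# Example: grounding a pronoun and a relative date in one pass
grounded = ground_references(
    "He loves coffee and mentioned it yesterday",
    {"he": "John", "yesterday": "on March 15, 2025"},
)
```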

For each memory, return a JSON object with the following fields:
- type: str --The memory type, either "episodic" or "semantic"
- text: str -- The actual information to store
- type: str -- The memory type, either "episodic" or "semantic"
- text: str -- The actual information to store (with all contextual references grounded)
- topics: list[str] -- The topics of the memory (top {top_k_topics})
- entities: list[str] -- The entities of the memory

Return a list of memories, for example:
{{
@@ -254,10 +277,20 @@ async def handle_extraction(text: str) -> tuple[list[str], list[str]]:
1. Only extract information that would be genuinely useful for future interactions.
2. Do not extract procedural knowledge - that is handled by the system's built-in tools and prompts.
3. You are a large language model - do not extract facts that you already know.
4. CRITICAL: ALWAYS ground ALL contextual references - never leave ANY pronouns, relative times, or vague place references unresolved.
5. MANDATORY: Replace every instance of "he/she/they/him/her/them/his/hers/theirs" with the actual person's name.
6. MANDATORY: Replace possessive pronouns like "her experience" with "Sarah's experience" (if "her" refers to Sarah).
7. If you cannot determine what a contextual reference refers to, either omit that memory or use generic terms like "someone" instead of ungrounded pronouns.

Message:
{message}

STEP-BY-STEP PROCESS:
1. First, identify all pronouns in the text: he, she, they, him, her, them, his, hers, theirs
2. Determine what person each pronoun refers to based on the context
3. Replace every single pronoun with the actual person's name
4. Extract the grounded memories with NO pronouns remaining

Extracted memories:
"""

161 changes: 151 additions & 10 deletions agent_memory_server/long_term_memory.py
@@ -98,6 +98,138 @@

logger = logging.getLogger(__name__)

# Debounce configuration for thread-aware extraction
EXTRACTION_DEBOUNCE_TTL = 300 # 5 minutes
EXTRACTION_DEBOUNCE_KEY_PREFIX = "extraction_debounce"


async def should_extract_session_thread(session_id: str, redis: Redis) -> bool:
"""
    Check if enough time has passed since the last thread-aware extraction for this session.

This implements a debounce mechanism to avoid constantly re-extracting memories
from the same conversation thread as new messages arrive.

Args:
session_id: The session ID to check
redis: Redis client

Returns:
True if extraction should proceed, False if debounced
"""

debounce_key = f"{EXTRACTION_DEBOUNCE_KEY_PREFIX}:{session_id}"

# Check if debounce key exists
exists = await redis.exists(debounce_key)
if not exists:
# Set debounce key with TTL to prevent extraction for the next period
await redis.setex(debounce_key, EXTRACTION_DEBOUNCE_TTL, "extracting")
logger.info(
f"Starting thread-aware extraction for session {session_id} (debounce set for {EXTRACTION_DEBOUNCE_TTL}s)"
)
return True

remaining_ttl = await redis.ttl(debounce_key)
logger.info(
f"Skipping thread-aware extraction for session {session_id} (debounced, {remaining_ttl}s remaining)"
)
return False
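One caveat worth noting: the `exists` check and the `setex` write above happen in two round trips, so two concurrent promotions for the same session could both observe a missing key and both proceed. Redis's `SET` with `NX` and `EX` collapses the check and the write into a single atomic command. A sketch of that variant (synchronous, with a tiny in-memory stub standing in for a real Redis client):

```python
class FakeRedis:
    """Minimal in-memory stand-in for the one call the sketch uses."""

    def __init__(self):
        self.store = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.store:
            return None  # redis-py returns None when NX fails
        self.store[key] = value
        return True


def should_extract(redis, session_id: str, ttl: int = 300) -> bool:
    """Atomic check-and-claim debounce via SET key NX EX ttl.

    Returns True exactly once per TTL window, even under concurrent
    callers, because NX makes the existence check and the write a
    single Redis command.
    """
    key = f"extraction_debounce:{session_id}"
    return redis.set(key, "extracting", nx=True, ex=ttl) is not None


r = FakeRedis()
first = should_extract(r, "s1")   # claims the debounce key
second = should_extract(r, "s1")  # key already held -> debounced
```

With the async `redis-py` client used in this file, the equivalent call would be `await redis.set(debounce_key, "extracting", nx=True, ex=EXTRACTION_DEBOUNCE_TTL)`.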


async def extract_memories_from_session_thread(
session_id: str,
namespace: str | None = None,
user_id: str | None = None,
llm_client: OpenAIClientWrapper | AnthropicClientWrapper | None = None,
) -> list[MemoryRecord]:
"""
Extract memories from the entire conversation thread in working memory.

This provides full conversational context for proper contextual grounding,
allowing pronouns and references to be resolved across the entire thread.

Args:
session_id: The session ID to extract memories from
namespace: Optional namespace for the memories
user_id: Optional user ID for the memories
llm_client: Optional LLM client for extraction

Returns:
List of extracted memory records with proper contextual grounding
"""
from agent_memory_server.working_memory import get_working_memory

# Get the complete working memory thread
working_memory = await get_working_memory(
session_id=session_id, namespace=namespace, user_id=user_id
)

if not working_memory or not working_memory.messages:
logger.info(f"No working memory messages found for session {session_id}")
return []

# Build full conversation context from all messages
conversation_messages = []
for msg in working_memory.messages:
# Include role and content for better context
role_prefix = (
f"[{msg.role.upper()}]: " if hasattr(msg, "role") and msg.role else ""
)
conversation_messages.append(f"{role_prefix}{msg.content}")

full_conversation = "\n".join(conversation_messages)

logger.info(
f"Extracting memories from {len(working_memory.messages)} messages in session {session_id}"
)
logger.debug(
f"Full conversation context length: {len(full_conversation)} characters"
)

# Use the enhanced extraction prompt with contextual grounding
from agent_memory_server.extraction import DISCRETE_EXTRACTION_PROMPT

client = llm_client or await get_model_client(settings.generation_model)

try:
response = await client.create_chat_completion(
model=settings.generation_model,
prompt=DISCRETE_EXTRACTION_PROMPT.format(
message=full_conversation, top_k_topics=settings.top_k_topics
),
response_format={"type": "json_object"},
)

extraction_result = json.loads(response.choices[0].message.content)
memories_data = extraction_result.get("memories", [])

logger.info(
f"Extracted {len(memories_data)} memories from session thread {session_id}"
)

# Convert to MemoryRecord objects
extracted_memories = []
for memory_data in memories_data:
memory = MemoryRecord(
id=str(ULID()),
text=memory_data["text"],
memory_type=memory_data.get("type", "semantic"),
topics=memory_data.get("topics", []),
entities=memory_data.get("entities", []),
session_id=session_id,
namespace=namespace,
user_id=user_id,
discrete_memory_extracted="t", # Mark as extracted
)
extracted_memories.append(memory)

return extracted_memories

except Exception as e:
logger.error(f"Error extracting memories from session thread {session_id}: {e}")
return []
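The parsing step above trusts the model to return `{"memories": [...]}` with a `text` field on every entry. A defensive version of that step, applying the same defaults as the conversion loop (`parse_memories` is an illustrative helper, not part of this PR):

```python
import json


def parse_memories(raw: str) -> list[dict]:
    """Parse the LLM's JSON response into memory dicts, applying the
    same defaults as the extraction loop: type -> "semantic",
    topics/entities -> []."""
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        return []  # mirror the outer function's behavior of returning [] on errors
    memories = []
    for item in result.get("memories", []):
        if "text" not in item:
            continue  # skip malformed entries missing the required field
        memories.append(
            {
                "text": item["text"],
                "memory_type": item.get("type", "semantic"),
                "topics": item.get("topics", []),
                "entities": item.get("entities", []),
            }
        )
    return memories


parsed = parse_memories(
    '{"memories": [{"text": "John loves coffee", "type": "episodic"}]}'
)
```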


async def extract_memory_structure(memory: MemoryRecord):
redis = await get_redis_conn()
@@ -1124,23 +1256,32 @@ async def promote_working_memory_to_long_term(
updated_memories = []
extracted_memories = []

# Find messages that haven't been extracted yet for discrete memory extraction
# Thread-aware discrete memory extraction with debouncing
unextracted_messages = [
message
for message in current_working_memory.messages
if message.discrete_memory_extracted == "f"
]

if settings.enable_discrete_memory_extraction and unextracted_messages:
logger.info(f"Extracting memories from {len(unextracted_messages)} messages")
extracted_memories = await extract_memories_from_messages(
messages=unextracted_messages,
session_id=session_id,
user_id=user_id,
namespace=namespace,
)
for message in unextracted_messages:
message.discrete_memory_extracted = "t"
# Check if we should run thread-aware extraction (debounced)
if await should_extract_session_thread(session_id, redis):
logger.info(
f"Running thread-aware extraction from {len(current_working_memory.messages)} total messages in session {session_id}"
)
extracted_memories = await extract_memories_from_session_thread(
session_id=session_id,
namespace=namespace,
user_id=user_id,
)

# Mark ALL messages in the session as extracted since we processed the full thread
for message in current_working_memory.messages:
message.discrete_memory_extracted = "t"

else:
logger.info(f"Skipping extraction for session {session_id} - debounced")
extracted_memories = []

for memory in current_working_memory.memories:
if memory.persisted_at is None:
21 changes: 21 additions & 0 deletions agent_memory_server/mcp.py
@@ -181,6 +181,27 @@ async def create_long_term_memories(

This tool saves memories contained in the payload for future retrieval.

CONTEXTUAL GROUNDING REQUIREMENTS:
When creating memories, you MUST resolve all contextual references to their concrete referents:

1. PRONOUNS: Replace ALL pronouns (he/she/they/him/her/them/his/hers/theirs) with actual person names
- "He prefers Python" → "John prefers Python" (if "he" refers to John)
- "Her expertise is valuable" → "Sarah's expertise is valuable" (if "her" refers to Sarah)

2. TEMPORAL REFERENCES: Convert relative time expressions to absolute dates/times
- "yesterday" → "2024-03-15" (if today is March 16, 2024)
- "last week" → "March 4-10, 2024" (if current week is March 11-17, 2024)

3. SPATIAL REFERENCES: Resolve place references to specific locations
- "there" → "San Francisco office" (if referring to SF office)
- "here" → "the main conference room" (if referring to specific room)

4. DEFINITE REFERENCES: Resolve definite articles to specific entities
- "the project" → "the customer portal redesign project"
- "the bug" → "the authentication timeout issue"

MANDATORY: Never create memories with unresolved pronouns, vague time references, or unclear spatial references. Always ground contextual references using the full conversation context.

MEMORY TYPES - SEMANTIC vs EPISODIC:

There are two main types of long-term memories you can create: