Commit a42a2a9

Merge pull request #46 from redis/feature/implement-contextual-grounding
Feat: Implement contextual grounding
2 parents a930acc + 754939b commit a42a2a9

11 files changed: +3634 −14 lines

TASK_MEMORY.md

Lines changed: 359 additions & 0 deletions (large diffs are not rendered by default)

agent_memory_server/extraction.py

Lines changed: 48 additions & 4 deletions
```diff
@@ -1,5 +1,6 @@
 import json
 import os
+from datetime import datetime
 from typing import TYPE_CHECKING, Any

 import ulid
@@ -218,19 +219,48 @@ async def handle_extraction(text: str) -> tuple[list[str], list[str]]:
 You are a long-memory manager. Your job is to analyze text and extract
 information that might be useful in future conversations with users.
+
+CURRENT CONTEXT:
+Current date and time: {current_datetime}
+
 Extract two types of memories:
 1. EPISODIC: Personal experiences specific to a user or agent.
    Example: "User prefers window seats" or "User had a bad experience in Paris"

 2. SEMANTIC: User preferences and general knowledge outside of your training data.
    Example: "Trek discontinued the Trek 520 steel touring bike in 2023"

+CONTEXTUAL GROUNDING REQUIREMENTS:
+When extracting memories, you must resolve all contextual references to their concrete referents:
+
+1. PRONOUNS: Replace ALL pronouns (he/she/they/him/her/them/his/hers/theirs) with the actual person's name
+   - "He loves coffee" → "John loves coffee" (if "he" refers to John)
+   - "I told her about it" → "User told Sarah about it" (if "her" refers to Sarah)
+   - "Her experience is valuable" → "Sarah's experience is valuable" (if "her" refers to Sarah)
+   - "His work is excellent" → "John's work is excellent" (if "his" refers to John)
+   - NEVER leave pronouns unresolved - always replace with the specific person's name
+
+2. TEMPORAL REFERENCES: Convert relative time expressions to absolute dates/times using the current datetime provided above
+   - "yesterday" → specific date (e.g., "March 15, 2025" if current date is March 16, 2025)
+   - "last year" → specific year (e.g., "2024" if current year is 2025)
+   - "three months ago" → specific month/year (e.g., "December 2024" if current date is March 2025)
+   - "next week" → specific date range (e.g., "December 22-28, 2024" if current date is December 15, 2024)
+   - "tomorrow" → specific date (e.g., "December 16, 2024" if current date is December 15, 2024)
+   - "last month" → specific month/year (e.g., "November 2024" if current date is December 2024)
+
+3. SPATIAL REFERENCES: Resolve place references to specific locations
+   - "there" → "San Francisco" (if referring to San Francisco)
+   - "that place" → "Chez Panisse restaurant" (if referring to that restaurant)
+   - "here" → "the office" (if referring to the office)
+
+4. DEFINITE REFERENCES: Resolve definite articles to specific entities
+   - "the meeting" → "the quarterly planning meeting"
+   - "the document" → "the budget proposal document"
+
 For each memory, return a JSON object with the following fields:
-- type: str --The memory type, either "episodic" or "semantic"
-- text: str -- The actual information to store
+- type: str -- The memory type, either "episodic" or "semantic"
+- text: str -- The actual information to store (with all contextual references grounded)
 - topics: list[str] -- The topics of the memory (top {top_k_topics})
 - entities: list[str] -- The entities of the memory
-

 Return a list of memories, for example:
 {{
@@ -254,10 +284,20 @@ async def handle_extraction(text: str) -> tuple[list[str], list[str]]:
 1. Only extract information that would be genuinely useful for future interactions.
 2. Do not extract procedural knowledge - that is handled by the system's built-in tools and prompts.
 3. You are a large language model - do not extract facts that you already know.
+4. CRITICAL: ALWAYS ground ALL contextual references - never leave ANY pronouns, relative times, or vague place references unresolved.
+5. MANDATORY: Replace every instance of "he/she/they/him/her/them/his/hers/theirs" with the actual person's name.
+6. MANDATORY: Replace possessive pronouns like "her experience" with "Sarah's experience" (if "her" refers to Sarah).
+7. If you cannot determine what a contextual reference refers to, either omit that memory or use generic terms like "someone" instead of ungrounded pronouns.

 Message:
 {message}

+STEP-BY-STEP PROCESS:
+1. First, identify all pronouns in the text: he, she, they, him, her, them, his, hers, theirs
+2. Determine what person each pronoun refers to based on the context
+3. Replace every single pronoun with the actual person's name
+4. Extract the grounded memories with NO pronouns remaining
+
 Extracted memories:
 """

@@ -319,7 +359,11 @@ async def extract_discrete_memories(
         response = await client.create_chat_completion(
             model=settings.generation_model,
             prompt=DISCRETE_EXTRACTION_PROMPT.format(
-                message=memory.text, top_k_topics=settings.top_k_topics
+                message=memory.text,
+                top_k_topics=settings.top_k_topics,
+                current_datetime=datetime.now().strftime(
+                    "%A, %B %d, %Y at %I:%M %p %Z"
+                ),
             ),
             response_format={"type": "json_object"},
         )
```
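One subtlety in the new `current_datetime` formatting: `datetime.now()` returns a naive datetime, so the `%Z` directive renders as an empty string rather than a timezone name. A minimal sketch illustrating the difference (the timezone-aware variant is a suggestion, not part of this change):

```python
from datetime import datetime, timezone

FMT = "%A, %B %d, %Y at %I:%M %p %Z"

# Naive datetime: %Z produces an empty string, e.g. "Saturday, March 16, 2024 at 02:30 PM "
naive = datetime.now().strftime(FMT)

# Aware datetime: %Z produces the zone name, e.g. "... 02:30 PM UTC"
aware = datetime.now(timezone.utc).strftime(FMT)

print(naive)
print(aware)
```

Passing an aware datetime would make the prompt's "Current date and time" unambiguous to the model.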

agent_memory_server/long_term_memory.py

Lines changed: 155 additions & 10 deletions
```diff
@@ -99,6 +99,142 @@

 logger = logging.getLogger(__name__)

+# Debounce configuration for thread-aware extraction
+EXTRACTION_DEBOUNCE_TTL = 300  # 5 minutes
+EXTRACTION_DEBOUNCE_KEY_PREFIX = "extraction_debounce"
+
+
+async def should_extract_session_thread(session_id: str, redis: Redis) -> bool:
+    """
+    Check if enough time has passed since last thread-aware extraction for this session.
+
+    This implements a debounce mechanism to avoid constantly re-extracting memories
+    from the same conversation thread as new messages arrive.
+
+    Args:
+        session_id: The session ID to check
+        redis: Redis client
+
+    Returns:
+        True if extraction should proceed, False if debounced
+    """
+    debounce_key = f"{EXTRACTION_DEBOUNCE_KEY_PREFIX}:{session_id}"
+
+    # Check if debounce key exists
+    exists = await redis.exists(debounce_key)
+    if not exists:
+        # Set debounce key with TTL to prevent extraction for the next period
+        await redis.setex(debounce_key, EXTRACTION_DEBOUNCE_TTL, "extracting")
+        logger.info(
+            f"Starting thread-aware extraction for session {session_id} (debounce set for {EXTRACTION_DEBOUNCE_TTL}s)"
+        )
+        return True
+
+    remaining_ttl = await redis.ttl(debounce_key)
+    logger.info(
+        f"Skipping thread-aware extraction for session {session_id} (debounced, {remaining_ttl}s remaining)"
+    )
+    return False
+
+
+async def extract_memories_from_session_thread(
+    session_id: str,
+    namespace: str | None = None,
+    user_id: str | None = None,
+    llm_client: OpenAIClientWrapper | AnthropicClientWrapper | None = None,
+) -> list[MemoryRecord]:
+    """
+    Extract memories from the entire conversation thread in working memory.
+
+    This provides full conversational context for proper contextual grounding,
+    allowing pronouns and references to be resolved across the entire thread.
+
+    Args:
+        session_id: The session ID to extract memories from
+        namespace: Optional namespace for the memories
+        user_id: Optional user ID for the memories
+        llm_client: Optional LLM client for extraction
+
+    Returns:
+        List of extracted memory records with proper contextual grounding
+    """
+    from agent_memory_server.working_memory import get_working_memory
+
+    # Get the complete working memory thread
+    working_memory = await get_working_memory(
+        session_id=session_id, namespace=namespace, user_id=user_id
+    )
+
+    if not working_memory or not working_memory.messages:
+        logger.info(f"No working memory messages found for session {session_id}")
+        return []
+
+    # Build full conversation context from all messages
+    conversation_messages = []
+    for msg in working_memory.messages:
+        # Include role and content for better context
+        role_prefix = (
+            f"[{msg.role.upper()}]: " if hasattr(msg, "role") and msg.role else ""
+        )
+        conversation_messages.append(f"{role_prefix}{msg.content}")
+
+    full_conversation = "\n".join(conversation_messages)
+
+    logger.info(
+        f"Extracting memories from {len(working_memory.messages)} messages in session {session_id}"
+    )
+    logger.debug(
+        f"Full conversation context length: {len(full_conversation)} characters"
+    )
+
+    # Use the enhanced extraction prompt with contextual grounding
+    from agent_memory_server.extraction import DISCRETE_EXTRACTION_PROMPT
+
+    client = llm_client or await get_model_client(settings.generation_model)
+
+    try:
+        response = await client.create_chat_completion(
+            model=settings.generation_model,
+            prompt=DISCRETE_EXTRACTION_PROMPT.format(
+                message=full_conversation,
+                top_k_topics=settings.top_k_topics,
+                current_datetime=datetime.now().strftime(
+                    "%A, %B %d, %Y at %I:%M %p %Z"
+                ),
+            ),
+            response_format={"type": "json_object"},
+        )
+
+        extraction_result = json.loads(response.choices[0].message.content)
+        memories_data = extraction_result.get("memories", [])
+
+        logger.info(
+            f"Extracted {len(memories_data)} memories from session thread {session_id}"
+        )
+
+        # Convert to MemoryRecord objects
+        extracted_memories = []
+        for memory_data in memories_data:
+            memory = MemoryRecord(
+                id=str(ULID()),
+                text=memory_data["text"],
+                memory_type=memory_data.get("type", "semantic"),
+                topics=memory_data.get("topics", []),
+                entities=memory_data.get("entities", []),
+                session_id=session_id,
+                namespace=namespace,
+                user_id=user_id,
+                discrete_memory_extracted="t",  # Mark as extracted
+            )
+            extracted_memories.append(memory)
+
+        return extracted_memories
+
+    except Exception as e:
+        logger.error(f"Error extracting memories from session thread {session_id}: {e}")
+        return []
+

 async def extract_memory_structure(memory: MemoryRecord):
     redis = await get_redis_conn()
@@ -1131,23 +1267,32 @@ async def promote_working_memory_to_long_term(
     updated_memories = []
     extracted_memories = []

-    # Find messages that haven't been extracted yet for discrete memory extraction
+    # Thread-aware discrete memory extraction with debouncing
     unextracted_messages = [
         message
         for message in current_working_memory.messages
         if message.discrete_memory_extracted == "f"
     ]

     if settings.enable_discrete_memory_extraction and unextracted_messages:
-        logger.info(f"Extracting memories from {len(unextracted_messages)} messages")
-        extracted_memories = await extract_memories_from_messages(
-            messages=unextracted_messages,
-            session_id=session_id,
-            user_id=user_id,
-            namespace=namespace,
-        )
-        for message in unextracted_messages:
-            message.discrete_memory_extracted = "t"
+        # Check if we should run thread-aware extraction (debounced)
+        if await should_extract_session_thread(session_id, redis):
+            logger.info(
+                f"Running thread-aware extraction from {len(current_working_memory.messages)} total messages in session {session_id}"
+            )
+            extracted_memories = await extract_memories_from_session_thread(
+                session_id=session_id,
+                namespace=namespace,
+                user_id=user_id,
+            )
+
+            # Mark ALL messages in the session as extracted since we processed the full thread
+            for message in current_working_memory.messages:
+                message.discrete_memory_extracted = "t"
+
+        else:
+            logger.info(f"Skipping extraction for session {session_id} - debounced")
+            extracted_memories = []

     for memory in current_working_memory.memories:
         if memory.persisted_at is None:
```
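One thing to note about `should_extract_session_thread`: it issues `EXISTS` and then `SETEX` as two separate round trips, so two workers promoting the same session concurrently could both pass the check. Redis's `SET` with `NX` and `EX` performs the same check-and-set atomically. A hedged sketch of that variant, tested against a tiny in-memory stand-in (the `FakeRedis` class is purely illustrative, not part of the PR):

```python
import asyncio
import time

EXTRACTION_DEBOUNCE_TTL = 300  # seconds, as in the diff above


class FakeRedis:
    """Tiny in-memory stand-in for redis.asyncio.Redis, supporting SET NX EX only."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    async def set(self, key, value, nx=False, ex=None):
        now = time.monotonic()
        alive = key in self._store and self._store[key][1] > now
        if nx and alive:
            return None  # redis-py returns None when NX blocks the write
        self._store[key] = (value, now + (ex if ex is not None else float("inf")))
        return True


async def should_extract_session_thread(session_id: str, redis) -> bool:
    # SET with NX and EX is a single atomic command, so concurrent workers
    # cannot both pass the check the way they can with EXISTS then SETEX.
    key = f"extraction_debounce:{session_id}"
    return bool(await redis.set(key, "extracting", nx=True, ex=EXTRACTION_DEBOUNCE_TTL))


async def demo():
    redis = FakeRedis()
    return (
        await should_extract_session_thread("s1", redis),  # first call: proceed
        await should_extract_session_thread("s1", redis),  # within TTL: debounced
    )

print(asyncio.run(demo()))  # (True, False)
```

In practice the race only causes a duplicate extraction pass, so the non-atomic version is a tolerable simplification.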

agent_memory_server/mcp.py

Lines changed: 21 additions & 0 deletions
```diff
@@ -181,6 +181,27 @@ async def create_long_term_memories(

     This tool saves memories contained in the payload for future retrieval.

+    CONTEXTUAL GROUNDING REQUIREMENTS:
+    When creating memories, you MUST resolve all contextual references to their concrete referents:
+
+    1. PRONOUNS: Replace ALL pronouns (he/she/they/him/her/them/his/hers/theirs) with actual person names
+       - "He prefers Python" → "John prefers Python" (if "he" refers to John)
+       - "Her expertise is valuable" → "Sarah's expertise is valuable" (if "her" refers to Sarah)
+
+    2. TEMPORAL REFERENCES: Convert relative time expressions to absolute dates/times
+       - "yesterday" → "2024-03-15" (if today is March 16, 2024)
+       - "last week" → "March 4-10, 2024" (if current week is March 11-17, 2024)
+
+    3. SPATIAL REFERENCES: Resolve place references to specific locations
+       - "there" → "San Francisco office" (if referring to SF office)
+       - "here" → "the main conference room" (if referring to specific room)
+
+    4. DEFINITE REFERENCES: Resolve definite articles to specific entities
+       - "the project" → "the customer portal redesign project"
+       - "the bug" → "the authentication timeout issue"
+
+    MANDATORY: Never create memories with unresolved pronouns, vague time references, or unclear spatial references. Always ground contextual references using the full conversation context.
+
     MEMORY TYPES - SEMANTIC vs EPISODIC:

     There are two main types of long-term memories you can create:
```
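The temporal examples in this docstring are plain date arithmetic relative to "today". A small illustrative sketch of the same mapping (the grounding itself is performed by the model; `ground_relative` is a hypothetical helper, not part of this PR):

```python
from datetime import date, timedelta

def ground_relative(expr: str, today: date) -> str:
    """Map a few relative time expressions to absolute dates, mirroring the docstring examples."""
    if expr == "yesterday":
        return (today - timedelta(days=1)).isoformat()
    if expr == "tomorrow":
        return (today + timedelta(days=1)).isoformat()
    if expr == "last week":
        # Monday of the previous week through the following Sunday
        monday = today - timedelta(days=today.weekday() + 7)
        return f"{monday.isoformat()} to {(monday + timedelta(days=6)).isoformat()}"
    raise ValueError(f"unhandled relative expression: {expr!r}")

print(ground_relative("yesterday", date(2024, 3, 16)))  # 2024-03-15
print(ground_relative("last week", date(2024, 3, 16)))  # 2024-03-04 to 2024-03-10
```

These match the docstring's examples for a current date of March 16, 2024.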
Lines changed: 51 additions & 0 deletions
```diff
@@ -0,0 +1,51 @@
+You are an expert evaluator of contextual grounding in text. Your task is to assess how well contextual references (pronouns, temporal expressions, spatial references, etc.) have been resolved to their concrete referents.
+
+INPUT CONTEXT MESSAGES:
+{context_messages}
+
+ORIGINAL TEXT WITH CONTEXTUAL REFERENCES:
+{original_text}
+
+GROUNDED TEXT (what the system produced):
+{grounded_text}
+
+EXPECTED GROUNDINGS:
+{expected_grounding}
+
+Please evaluate the grounding quality on these dimensions:
+
+1. PRONOUN_RESOLUTION (0-1): How well are pronouns (he/she/they/him/her/them) resolved to specific entities? If no pronouns are present, score as 1.0. If pronouns remain unchanged from the original text, this indicates no grounding was performed and should receive a low score (0.0-0.2).
+
+2. TEMPORAL_GROUNDING (0-1): How well are relative time expressions converted to absolute times? If no temporal expressions are present, score as 1.0. If temporal expressions remain unchanged when they should be grounded, this indicates incomplete grounding.
+
+3. SPATIAL_GROUNDING (0-1): How well are place references (there/here/that place) resolved to specific locations? If no spatial references are present, score as 1.0. If spatial references remain unchanged when they should be grounded, this indicates incomplete grounding.
+
+4. COMPLETENESS (0-1): Are all context-dependent references that exist in the text properly resolved? This should be high (0.8-1.0) if all relevant references were grounded, moderate (0.4-0.7) if some were missed, and low (0.0-0.3) if most/all were missed.
+
+5. ACCURACY (0-1): Are the groundings factually correct given the context?
+
+IMPORTANT SCORING PRINCIPLES:
+- Only penalize dimensions that are actually relevant to the text
+- If no pronouns exist, pronoun_resolution_score = 1.0 (not applicable = perfect)
+- If no temporal expressions exist, temporal_grounding_score = 1.0 (not applicable = perfect)
+- If no spatial references exist, spatial_grounding_score = 1.0 (not applicable = perfect)
+- The overall_score should reflect performance on relevant dimensions only
+
+CRITICAL: If the grounded text is identical to the original text, this means NO grounding was performed. In this case:
+- Set relevant dimension scores to 0.0 based on what should have been grounded
+- Set irrelevant dimension scores to 1.0 (not applicable)
+- COMPLETENESS should be 0.0 since nothing was resolved
+- OVERALL_SCORE should be very low (0.0-0.2) if grounding was expected
+
+Return your evaluation as JSON in this format:
+{{
+    "pronoun_resolution_score": 0.95,
+    "temporal_grounding_score": 0.90,
+    "spatial_grounding_score": 0.85,
+    "completeness_score": 0.92,
+    "accuracy_score": 0.88,
+    "overall_score": 0.90,
+    "explanation": "Brief explanation of the scoring rationale"
+}}
+
+Be strict in your evaluation - only give high scores when grounding is complete and accurate.
```
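A consumer of this judge presumably averages only the applicable dimensions, as the scoring principles require. A sketch under that assumption (field names follow the JSON schema above; `overall_from_judge` and the `relevant` set are illustrative, not part of the PR):

```python
import json

# The three dimensions that can be "not applicable" per the scoring principles
GROUNDING_DIMENSIONS = {
    "pronoun_resolution_score",
    "temporal_grounding_score",
    "spatial_grounding_score",
}

def overall_from_judge(raw: str, relevant: set[str]) -> float:
    """Average completeness, accuracy, and only the grounding dimensions flagged relevant."""
    scores = json.loads(raw)
    picked = [scores[d] for d in GROUNDING_DIMENSIONS & relevant]
    picked += [scores["completeness_score"], scores["accuracy_score"]]
    return sum(picked) / len(picked)

reply = json.dumps({
    "pronoun_resolution_score": 0.9,
    "temporal_grounding_score": 1.0,  # N/A: no temporal expressions in the text
    "spatial_grounding_score": 1.0,   # N/A: no spatial references in the text
    "completeness_score": 0.8,
    "accuracy_score": 1.0,
})
# Only pronouns were present, so the N/A 1.0 scores are excluded from the mean:
print(round(overall_from_judge(reply, {"pronoun_resolution_score"}), 2))  # 0.9
```

Excluding the not-applicable dimensions prevents their default 1.0 scores from inflating the overall number.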
Lines changed: 38 additions & 0 deletions
```diff
@@ -0,0 +1,38 @@
+You are an expert evaluator of memory extraction systems. Your task is to assess how well a system extracted discrete memories from conversational text.
+
+ORIGINAL CONVERSATION:
+{original_conversation}
+
+EXTRACTED MEMORIES:
+{extracted_memories}
+
+EXPECTED EXTRACTION CRITERIA:
+{expected_criteria}
+
+Please evaluate the memory extraction quality on these dimensions:
+
+1. RELEVANCE (0-1): Are the extracted memories genuinely useful for future conversations?
+2. CLASSIFICATION_ACCURACY (0-1): Are memories correctly classified as "episodic" vs "semantic"?
+3. INFORMATION_PRESERVATION (0-1): Is important information captured without loss?
+4. REDUNDANCY_AVOIDANCE (0-1): Are duplicate or overlapping memories avoided?
+5. COMPLETENESS (0-1): Are all extractable valuable memories identified?
+6. ACCURACY (0-1): Are the extracted memories factually correct?
+
+CLASSIFICATION GUIDELINES:
+- EPISODIC: Personal experiences, events, user preferences, specific interactions
+- SEMANTIC: General knowledge, facts, procedures, definitions not in training data
+
+Return your evaluation as JSON in this format:
+{{
+    "relevance_score": 0.95,
+    "classification_accuracy_score": 0.90,
+    "information_preservation_score": 0.85,
+    "redundancy_avoidance_score": 0.92,
+    "completeness_score": 0.88,
+    "accuracy_score": 0.94,
+    "overall_score": 0.90,
+    "explanation": "Brief explanation of the scoring rationale",
+    "suggested_improvements": "Specific suggestions for improvement"
+}}
+
+Be strict in your evaluation - only give high scores when extraction is comprehensive and accurate.
```
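Code consuming this evaluator's reply would need to validate the schema before trusting `overall_score`. A minimal hedged sketch (the field list mirrors the JSON above; the helper and the 0.8 threshold are illustrative, not part of the PR):

```python
import json

REQUIRED_FIELDS = {
    "relevance_score", "classification_accuracy_score",
    "information_preservation_score", "redundancy_avoidance_score",
    "completeness_score", "accuracy_score",
    "overall_score", "explanation", "suggested_improvements",
}

def extraction_eval_passes(raw: str, threshold: float = 0.8) -> bool:
    """Return True if the judge reply is well-formed and meets the score threshold."""
    data = json.loads(raw)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"judge reply missing fields: {sorted(missing)}")
    return data["overall_score"] >= threshold

reply = json.dumps({
    "relevance_score": 0.95, "classification_accuracy_score": 0.90,
    "information_preservation_score": 0.85, "redundancy_avoidance_score": 0.92,
    "completeness_score": 0.88, "accuracy_score": 0.94,
    "overall_score": 0.90, "explanation": "solid extraction",
    "suggested_improvements": "none",
})
print(extraction_eval_passes(reply))  # True
```

Raising on missing fields catches truncated or malformed judge output early instead of silently defaulting scores.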

0 commit comments

Comments
 (0)