Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
26c7932
docs: restructure memory documentation
abrookins Sep 25, 2025
22c1cde
docs: rename Memory Strategies to Memory Extraction Strategies and in…
abrookins Sep 25, 2025
ee979ff
Change optimize_query default from True to False across all interfaces
abrookins Sep 25, 2025
45e3e61
Skip flaky test_judge_comprehensive_grounding_evaluation
abrookins Sep 25, 2025
8f68ce4
Fix context percentage calculation returning null when model info pro…
abrookins Sep 25, 2025
f48bc33
Bump version to 0.12.2
abrookins Sep 26, 2025
a1bb191
Update memory documentation to clarify working memory persistence
abrookins Sep 26, 2025
af4bfa8
Add transparent working memory reconstruction from long-term storage
abrookins Sep 26, 2025
562d897
feat: Add recent_messages_limit parameter and fix critical extraction…
abrookins Sep 26, 2025
6d2a944
Add UpdateWorkingMemory schema for PUT requests to remove session_id …
abrookins Sep 26, 2025
f683dac
Fix flaky test that expects exact promotion count
abrookins Sep 26, 2025
163023c
Skip flaky test_multi_entity_conversation
abrookins Sep 26, 2025
db0db08
Fix datetime.UTC import for Python 3.10/3.11 compatibility
abrookins Sep 26, 2025
a081cfc
Bump client version to 0.12.2
abrookins Sep 26, 2025
5bb4915
Remove test logic from production code
abrookins Sep 26, 2025
67f1ee3
Fix count_memories method and test fallback behavior
abrookins Sep 29, 2025
2abefe1
Fix client test mock for synchronous json() method
abrookins Sep 29, 2025
215bbe8
Simplify count_memories to use proper vector search interface
abrookins Sep 29, 2025
2cca4af
Restructure memory documentation and add missing eager creation tool
abrookins Sep 29, 2025
f41aaca
Remove 'message' memory type from tool creation/editing schemas
abrookins Sep 29, 2025
ea5f617
Fix example agents to work with new working memory API behavior
abrookins Sep 29, 2025
cfbc478
Fix test to allow search_memory to include message type in enum
abrookins Sep 29, 2025
f18af75
Change Docker release to manual workflow dispatch
abrookins Sep 29, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
263 changes: 263 additions & 0 deletions docs/long-term-memory.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,263 @@
# Long-term Memory

Long-term memory is **persistent**, **cross-session** storage designed for knowledge that should be retained and searchable across all interactions. It's the "knowledge base" where important facts, preferences, and experiences are stored.

## Overview

Long-term memory provides persistent storage that survives server restarts and session expiration. It's optimized for semantic search, deduplication, and rich metadata to enable intelligent retrieval of relevant information.

| Feature | Details |
|---------|---------|
| **Scope** | Cross-session, persistent |
| **Lifespan** | Permanent until manually deleted |
| **Storage** | Redis with vector indexing |
| **Search** | Semantic vector search |
| **Capacity** | Unlimited (with compaction) |
| **Use Case** | Knowledge base, user preferences |
| **Indexing** | Vector embeddings + metadata |
| **Deduplication** | Hash-based and semantic |

## Characteristics

- **Cross-Session**: Accessible from any session
- **Persistent**: Survives server restarts and session expiration
- **Vector Indexed**: Semantic search with OpenAI embeddings
- **Deduplication**: Automatic hash-based and semantic deduplication
- **Rich Metadata**: Topics, entities, timestamps, memory types
- **Compaction**: Automatic cleanup and merging of duplicates

## Memory Types

Long-term memory supports three types of memories:

### 1. Semantic Memory
Facts, preferences, general knowledge

```json
{
"text": "User prefers dark mode interfaces",
"memory_type": "semantic",
"topics": ["preferences", "ui"],
"entities": ["dark mode"]
}
```

### 2. Episodic Memory
Events with temporal context

```json
{
"text": "User visited Paris in March 2024",
"memory_type": "episodic",
"event_date": "2024-03-15T10:00:00Z",
"topics": ["travel"],
"entities": ["Paris"]
}
```

### 3. Message Memory
Conversation records (auto-generated)

```json
{
"text": "user: What's the weather like?",
"memory_type": "message",
"session_id": "chat_123"
}
```

## When to Use Long-Term Memory

### 1. User Preferences and Profile

```python
# Store lasting user preferences
memories = [
MemoryRecord(
text="User prefers metric units for temperature",
id="pref_metric_temp",
memory_type="semantic",
topics=["preferences", "units"],
user_id="user_123"
)
]
```

### 2. Important Facts and Knowledge

```python
# Store domain knowledge
memories = [
MemoryRecord(
text="Customer's subscription expires on 2024-06-15",
id="sub_expiry_customer_456",
memory_type="episodic",
event_date=datetime(2024, 6, 15),
entities=["customer_456", "subscription"],
user_id="user_123"
)
]
```

### 3. Cross-Session Context

```python
# Store context that spans conversations
memories = [
MemoryRecord(
text="User is working on a Python machine learning project",
id="context_ml_project",
memory_type="semantic",
topics=["programming", "machine-learning", "python"],
namespace="work_context"
)
]
```

## API Endpoints

```http
# Create long-term memories
POST /v1/long-term-memory/

# Search long-term memories
POST /v1/long-term-memory/search
```

## Search Capabilities

Long-term memory provides powerful search features:

### Semantic Vector Search
```json
{
"text": "python programming help",
"limit": 10,
"distance_threshold": 0.8
}
```

### Advanced Filtering
```json
{
"text": "user preferences",
"filters": {
"user_id": {"eq": "user_123"},
"memory_type": {"eq": "semantic"},
"topics": {"any": ["preferences", "settings"]},
"created_at": {"gte": "2024-01-01T00:00:00Z"}
}
}
```

### Hybrid Search
```json
{
"text": "travel plans",
"filters": {
"namespace": {"eq": "personal"},
"event_date": {"gte": "2024-03-01T00:00:00Z"}
},
"include_working_memory": true,
"include_long_term_memory": true
}
```

## Deduplication and Compaction

Long-term memory automatically manages duplicates through:

### Hash-Based Deduplication
- Identical text content is automatically deduplicated
- Preserves the most recent version with complete metadata

### Semantic Deduplication
- Uses vector similarity to identify semantically similar memories
- LLM-powered merging of related memories
- Configurable similarity thresholds

### Automatic Compaction
```python
# Server automatically:
# - Identifies hash-based duplicates
# - Finds semantically similar memories
# - Merges related memories using LLM
# - Removes obsolete duplicates
```

## Memory Prompt Integration

The memory system integrates with AI prompts through the `/v1/memory/prompt` endpoint:

```python
# Get memory-enriched prompt
response = await memory_prompt({
"query": "Help me plan dinner",
"session": {
"session_id": "current_chat",
"model_name": "gpt-4o",
"context_window_max": 4000
},
"long_term_search": {
"text": "food preferences dietary restrictions",
"filters": {"user_id": {"eq": "user_123"}},
"limit": 5
}
})

# Returns ready-to-use messages with:
# - Conversation context from working memory
# - Relevant memories from long-term storage
# - User's query as final message
```

## Memory Extraction

By default, the system automatically extracts structured memories from conversations as they flow from working memory to long-term storage. This extraction process can be customized using different **memory strategies**.

!!! info "Memory Strategies"
The system supports multiple extraction strategies (discrete facts, summaries, preferences, custom prompts) that determine how conversations are processed into memories. See [Memory Strategies](memory-strategies.md) for complete documentation and examples.

## Best Practices

### Long-Term Memory Usage
- Store user preferences and lasting facts
- Include rich metadata (topics, entities, timestamps)
- Use meaningful IDs for easier retrieval
- Leverage semantic search for discovery

### Memory Design
- Use semantic memory for timeless facts
- Use episodic memory for time-bound events
- Include relevant topics and entities for better search
- Design memory text for LLM consumption

### Search Strategy
- Start with semantic search for discovery
- Add filters for precision
- Use unified search for comprehensive results
- Consider both working and long-term contexts

## Configuration

Long-term memory behavior can be configured through environment variables:

```bash
# Long-term memory settings
LONG_TERM_MEMORY=true # Enable long-term memory features
ENABLE_DISCRETE_MEMORY_EXTRACTION=true # Extract memories from messages
GENERATION_MODEL=gpt-4o-mini # Model for summarization/extraction

# Vector search settings
EMBEDDING_MODEL=text-embedding-3-small # OpenAI embedding model
DISTANCE_THRESHOLD=0.8 # Similarity threshold for search
```

For complete configuration options, see the [Configuration Guide](configuration.md).

## Related Documentation

- [Working Memory](working-memory.md) - Session-scoped, ephemeral memory storage
- [Memory Integration Patterns](memory-integration-patterns.md) - How to integrate memory with your applications
- [Memory Strategies](memory-strategies.md) - Different approaches to memory extraction and storage
- [Vector Store Backends](vector-store-backends.md) - Configuring different vector storage backends
3 changes: 2 additions & 1 deletion docs/memory-strategies.md
Original file line number Diff line number Diff line change
Expand Up @@ -412,7 +412,8 @@ pytest tests/test_prompt_security.py -v

## Related Documentation

- **[Memory Types](memory-types.md)** - Understanding working vs long-term memory
- **[Working Memory](working-memory.md)** - Session-scoped, ephemeral memory storage
- **[Long-term Memory](long-term-memory.md)** - Persistent, cross-session memory storage
- **[Security Guide](security-custom-prompts.md)** - Comprehensive security for custom strategies
- **[Memory Lifecycle](memory-lifecycle.md)** - How memories are managed over time
- **[API Reference](api.md)** - REST API for memory management
Expand Down
Loading