# SQLite Database Integration

MemU supports SQLite as a lightweight, file-based database backend for memory storage. This is ideal for:

- **Local development** and testing
- **Single-user applications** with persistent storage
- **Portable deployments** where you need a simple database solution
- **Offline-capable applications** that can't rely on external databases

## Quick Start

### Basic Configuration

```python
from memu.app import MemoryService

# Using the default SQLite file (memu.db in the current directory)
service = MemoryService(
    llm_profiles={"default": {"api_key": "your-api-key"}},
    database_config={
        "metadata_store": {
            "provider": "sqlite",
        },
    },
)

# Or specify a custom database path
service = MemoryService(
    llm_profiles={"default": {"api_key": "your-api-key"}},
    database_config={
        "metadata_store": {
            "provider": "sqlite",
            "dsn": "sqlite:///path/to/your/memory.db",
        },
    },
)
```

### In-Memory SQLite (No Persistence)

For testing or temporary storage, you can use an in-memory SQLite database:

```python
service = MemoryService(
    llm_profiles={"default": {"api_key": "your-api-key"}},
    database_config={
        "metadata_store": {
            "provider": "sqlite",
            "dsn": "sqlite:///:memory:",
        },
    },
)
```

## Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `provider` | `str` | `"inmemory"` | Set to `"sqlite"` to use the SQLite backend |
| `dsn` | `str` | `"sqlite:///memu.db"` | SQLite connection string |

### DSN Format

SQLite DSNs follow these patterns:

- **File-based**: `sqlite:///path/to/database.db`
- **In-memory**: `sqlite:///:memory:`
- **Relative path**: `sqlite:///./data/memu.db`
- **Absolute path**: `sqlite:////home/user/data/memu.db` (note the four slashes)

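The slash-counting rule is easier to see in code. In a SQLAlchemy-style DSN, everything after `sqlite:///` is the path, so an absolute path keeps its own leading slash. The helper below is purely illustrative (MemU's actual DSN parsing lives in its database layer and may differ):

```python
def sqlite_dsn_to_path(dsn: str) -> str:
    """Illustrative only: extract the filesystem path from a SQLite DSN."""
    prefix = "sqlite:///"
    if not dsn.startswith(prefix):
        raise ValueError(f"not a SQLite DSN: {dsn!r}")
    # Whatever follows the three slashes is the path;
    # ":memory:" selects an in-memory database instead of a file.
    return dsn[len(prefix):]

print(sqlite_dsn_to_path("sqlite:///memu.db"))             # memu.db (relative)
print(sqlite_dsn_to_path("sqlite:////home/user/memu.db"))  # /home/user/memu.db (absolute)
```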
## Vector Search

SQLite has no native vector support comparable to PostgreSQL's pgvector, so MemU uses **brute-force cosine similarity** for vector search when using SQLite:

```python
service = MemoryService(
    llm_profiles={"default": {"api_key": "your-api-key"}},
    database_config={
        "metadata_store": {
            "provider": "sqlite",
            "dsn": "sqlite:///memu.db",
        },
        "vector_index": {
            "provider": "bruteforce",  # the default for SQLite
        },
    },
)
```

**Note**: Brute-force search loads all embeddings into memory and computes a similarity score for each one. This works well for moderate dataset sizes (up to ~100k items) but may be slow for larger datasets.

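To make the note above concrete, here is a minimal pure-Python sketch of what a brute-force index does (not MemU's actual implementation): score every stored embedding against the query vector, then keep the best `top_k`:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def brute_force_search(query, embeddings, top_k=2):
    # O(n) over the whole store: every embedding is scored, then sorted.
    scored = sorted(
        ((cosine(query, emb), item_id) for item_id, emb in embeddings.items()),
        reverse=True,
    )
    return [item_id for _, item_id in scored[:top_k]]

store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(brute_force_search([1.0, 0.1], store, top_k=2))  # ['a', 'c']
```

The linear scan is exactly why the ~100k-item guidance above exists: cost grows with the number of stored embeddings, unlike pgvector's indexed search.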
## Database Schema

MemU creates the following tables automatically when using SQLite:

- `sqlite_resources` - Multimodal resource records (images, documents, etc.)
- `sqlite_memory_items` - Extracted memory items with embeddings
- `sqlite_memory_categories` - Memory categories with summaries
- `sqlite_category_items` - Relationships between items and categories

Embeddings are stored as JSON-serialized text, since SQLite has no native vector type.

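A minimal sketch of that storage scheme using only the standard library (the table and column names here are illustrative, not MemU's actual schema):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, embedding TEXT)")

# Store the vector as JSON text, since SQLite has no vector column type.
vec = [0.12, -0.5, 0.33]
conn.execute("INSERT INTO items VALUES (?, ?)", ("item-1", json.dumps(vec)))

# Read it back and deserialize before computing any similarity.
(raw,) = conn.execute(
    "SELECT embedding FROM items WHERE id = ?", ("item-1",)
).fetchone()
print(json.loads(raw) == vec)  # True
```

The serialize/deserialize round trip is part of why brute-force search must load embeddings into memory: SQLite cannot compare vectors inside a query.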
## Data Import/Export

### Export Data

You can export your SQLite database for backup or migration:

```python
import shutil

# Simply copy the database file (make sure nothing is writing to it)
shutil.copy("memu.db", "memu_backup.db")
```

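If other connections may be writing while you export, a plain file copy can capture an inconsistent snapshot. The standard library's `sqlite3` backup API avoids this (the file names below are illustrative):

```python
import sqlite3

# Source database (would be memu.db in a real deployment)
src = sqlite3.connect("example.db")
src.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
src.commit()

# backup() takes a consistent snapshot even if another
# connection writes to the source while it runs.
dst = sqlite3.connect("example_backup.db")
with dst:
    src.backup(dst)

# The backup now contains the same schema and rows as the source.
tables = dst.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)  # [('t',)]
```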
### Import from SQLite to PostgreSQL

To migrate data from SQLite to PostgreSQL:

```python
from memu.database.sqlite import build_sqlite_database
from memu.database.postgres import build_postgres_database
from memu.app.settings import DatabaseConfig
from pydantic import BaseModel

class UserScope(BaseModel):
    user_id: str

# Load from SQLite
sqlite_config = DatabaseConfig(
    metadata_store={"provider": "sqlite", "dsn": "sqlite:///memu.db"}
)
sqlite_db = build_sqlite_database(config=sqlite_config, user_model=UserScope)
sqlite_db.load_existing()

# Connect to PostgreSQL
postgres_config = DatabaseConfig(
    metadata_store={"provider": "postgres", "dsn": "postgresql://..."}
)
postgres_db = build_postgres_database(config=postgres_config, user_model=UserScope)

# Migrate resources
for res_id, resource in sqlite_db.resources.items():
    postgres_db.resource_repo.create_resource(
        url=resource.url,
        modality=resource.modality,
        local_path=resource.local_path,
        caption=resource.caption,
        embedding=resource.embedding,
        user_data={"user_id": getattr(resource, "user_id", None)},
    )

# Repeat for categories, items, and relations...
```

## Performance Considerations

| Aspect | SQLite | PostgreSQL |
|--------|--------|------------|
| Setup | Zero configuration | Requires server setup |
| Concurrency | Single writer, multiple readers | Full concurrent access |
| Vector search | Brute-force (in-memory) | Native pgvector (indexed) |
| Scale | Up to ~100k items | Millions of items |
| Deployment | Single file, portable | External service |

## Example: Full Workflow

```python
import asyncio
from memu.app import MemoryService

async def main():
    # Initialize with SQLite
    service = MemoryService(
        llm_profiles={"default": {"api_key": "your-api-key"}},
        database_config={
            "metadata_store": {
                "provider": "sqlite",
                "dsn": "sqlite:///my_memories.db",
            },
        },
    )

    # Memorize a conversation
    result = await service.memorize(
        resource_url="conversation.json",
        modality="conversation",
        user={"user_id": "alice"},
    )
    print(f"Created {len(result['categories'])} categories")

    # Retrieve relevant memories
    memories = await service.retrieve(
        queries=[
            {"role": "user", "content": {"text": "What are my preferences?"}}
        ],
        where={"user_id": "alice"},
    )

    for item in memories.get("items", []):
        print(f"- {item['summary']}")

asyncio.run(main())
```

## Troubleshooting

### Database Locked Error

SQLite allows only one writer at a time. If you see "database is locked" errors:

1. Ensure you're not running multiple processes that write to the same database file
2. Consider PostgreSQL if you need concurrent write access
3. Use connection pooling with appropriate timeouts

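If you open your own connections to the same file outside MemU, two settings mitigate lock contention: a busy timeout makes a writer wait for the lock instead of failing immediately, and WAL journal mode lets readers proceed while a writer holds it. A sketch with the standard library (the file name is illustrative):

```python
import sqlite3

conn = sqlite3.connect("example.db", timeout=30.0)  # driver-level wait, in seconds
conn.execute("PRAGMA busy_timeout = 30000")         # engine-level wait, in milliseconds
conn.execute("PRAGMA journal_mode = WAL")           # readers no longer block on the writer

print(conn.execute("PRAGMA journal_mode").fetchone())  # ('wal',)
```

WAL mode persists in the database file, so it only needs to be set once per database.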
### Permission Denied

Make sure the directory containing the SQLite file is writable by the process running MemU:

```bash
chmod 755 /path/to/data/directory
```

### Slow Vector Search

If vector search is slow with large datasets:

1. Consider migrating to PostgreSQL with pgvector
2. Use more selective `where` filters to reduce the search space
3. Reduce the `top_k` parameter in your retrieve configuration