Commit 4b899d7

feat(database): add SQLite storage backend (#201)
Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
1 parent bf44de7 commit 4b899d7

16 files changed

Lines changed: 1850 additions & 2 deletions

docs/sqlite.md

Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
# SQLite Database Integration

MemU supports SQLite as a lightweight, file-based database backend for memory storage. This is ideal for:

- **Local development** and testing
- **Single-user applications** with persistent storage
- **Portable deployments** where you need a simple database solution
- **Offline-capable applications** that can't rely on external databases

## Quick Start

### Basic Configuration

```python
from memu.app import MemoryService

# Using default SQLite file (memu.db in current directory)
service = MemoryService(
    llm_profiles={"default": {"api_key": "your-api-key"}},
    database_config={
        "metadata_store": {
            "provider": "sqlite",
        },
    },
)

# Or specify a custom database path
service = MemoryService(
    llm_profiles={"default": {"api_key": "your-api-key"}},
    database_config={
        "metadata_store": {
            "provider": "sqlite",
            "dsn": "sqlite:///path/to/your/memory.db",
        },
    },
)
```

### In-Memory SQLite (No Persistence)

For testing or temporary storage, you can use an in-memory SQLite database. All data is lost when the process exits:

```python
from memu.app import MemoryService

service = MemoryService(
    llm_profiles={"default": {"api_key": "your-api-key"}},
    database_config={
        "metadata_store": {
            "provider": "sqlite",
            "dsn": "sqlite:///:memory:",
        },
    },
)
```

## Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `provider` | `str` | `"inmemory"` | Set to `"sqlite"` to use the SQLite backend |
| `dsn` | `str` | `"sqlite:///memu.db"` | SQLite connection string; the default applies when `provider` is `"sqlite"` and no DSN is given |

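The `dsn` default can be illustrated with a small sketch. It only mirrors the documented behavior (an unset DSN resolves to `sqlite:///memu.db`); the helper name `resolve_sqlite_dsn` is hypothetical and is not part of MemU's API.

```python
from typing import Optional

# Default documented in the configuration table.
DEFAULT_SQLITE_DSN = "sqlite:///memu.db"


def resolve_sqlite_dsn(dsn: Optional[str]) -> str:
    """Return the DSN to use, falling back to the documented default.

    Hypothetical helper for illustration: None and "" both count as unset.
    """
    return dsn if dsn else DEFAULT_SQLITE_DSN
```

With this sketch, `resolve_sqlite_dsn(None)` yields the default file-based DSN, while an explicit value such as `sqlite:///:memory:` is passed through unchanged.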
### DSN Format

A SQLite DSN takes one of the following forms:

- **File-based**: `sqlite:///path/to/database.db` (three slashes, resolved relative to the current working directory)
- **In-memory**: `sqlite:///:memory:`
- **Relative path**: `sqlite:///./data/memu.db`
- **Absolute path**: `sqlite:////home/user/data/memu.db` (note the 4 slashes)

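The slash counting in the DSN forms can be made concrete with a short sketch. It assumes the convention listed above (everything after `sqlite:///` is the path, so a fourth slash marks an absolute path); `sqlite_path_from_dsn` is a hypothetical helper, not a MemU function.

```python
from typing import Optional

SQLITE_PREFIX = "sqlite:///"


def sqlite_path_from_dsn(dsn: str) -> Optional[str]:
    """Return the file path encoded in a SQLite DSN, or None for in-memory.

    Hypothetical helper for illustration only.
    """
    if not dsn.startswith(SQLITE_PREFIX):
        raise ValueError(f"not a SQLite DSN: {dsn!r}")
    # Whatever follows the three-slash prefix is the path itself.
    path = dsn[len(SQLITE_PREFIX):]
    if path in ("", ":memory:"):
        return None
    return path
```

Under these assumptions, `sqlite:///memu.db` maps to the relative path `memu.db`, and `sqlite:////home/user/data/memu.db` maps to `/home/user/data/memu.db`.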
## Vector Search

SQLite doesn't have native vector support like PostgreSQL's pgvector. MemU uses **brute-force cosine similarity** for vector search when using SQLite:

```python
from memu.app import MemoryService

service = MemoryService(
    llm_profiles={"default": {"api_key": "your-api-key"}},
    database_config={
        "metadata_store": {
            "provider": "sqlite",
            "dsn": "sqlite:///memu.db",
        },
        "vector_index": {
            "provider": "bruteforce",  # This is the default for SQLite
        },
    },
)
```

**Note**: Brute-force search loads all embeddings into memory and computes the similarity between the query and each one. This works well for moderate dataset sizes (up to ~100k items) but may be slow for larger datasets.

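To show what brute-force cosine similarity means in practice, here is a minimal pure-Python sketch. It is not MemU's implementation; the function names and the `dict` of embeddings are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every stored embedding against the query and keep the best top_k."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in embeddings.items()]
    # Every item is scored, so the cost grows linearly with the number of items.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Because every stored vector is compared with the query, the running time grows linearly with the number of items, which is why this approach suits moderate dataset sizes.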
## Database Schema

MemU creates the following tables automatically in the SQLite database:

- `sqlite_resources` - Multimodal resource records (images, documents, etc.)
- `sqlite_memory_items` - Extracted memory items with embeddings
- `sqlite_memory_categories` - Memory categories with summaries
- `sqlite_category_items` - Relationships between items and categories

Embeddings are stored as JSON-serialized text in SQLite since there's no native vector type.

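The JSON-as-text storage of embeddings can be sketched with the standard `sqlite3` module. The table `demo_items` and its columns are hypothetical and much simpler than MemU's real schema; the sketch only shows the round trip from a list of floats to a `TEXT` column and back.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical demo table; MemU's actual tables are not reproduced here.
conn.execute("CREATE TABLE demo_items (id TEXT PRIMARY KEY, embedding TEXT)")

embedding = [0.12, -0.5, 0.33]
conn.execute(
    "INSERT INTO demo_items (id, embedding) VALUES (?, ?)",
    ("item-1", json.dumps(embedding)),  # serialize the vector to JSON text
)

row = conn.execute(
    "SELECT embedding FROM demo_items WHERE id = ?", ("item-1",)
).fetchone()
restored = json.loads(row[0])  # parse the JSON text back into a list of floats
conn.close()
```

The trade-off of this layout is that the database cannot index or compare the vectors itself, so similarity has to be computed in application code after loading them.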
## Data Import/Export

### Export Data

You can export your SQLite database for backup or migration:

```python
import shutil

# Copy the database file while no process is writing to it
shutil.copy("memu.db", "memu_backup.db")
```

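If the database may be in use while you back it up, a plain file copy can capture a half-written state. As an alternative sketch, the standard library's `sqlite3.Connection.backup` copies a consistent snapshot. The helper name `backup_sqlite` and the demo table are hypothetical; nothing here is MemU-specific.

```python
import os
import sqlite3
import tempfile


def backup_sqlite(src_path: str, dest_path: str) -> None:
    """Copy a SQLite database using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # copies all pages of the source into dest
    finally:
        dest.close()
        src.close()


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "memu.db")
    dest_path = os.path.join(tmp, "memu_backup.db")

    conn = sqlite3.connect(src_path)
    conn.execute("CREATE TABLE demo (value INTEGER)")
    conn.execute("INSERT INTO demo VALUES (42)")
    conn.commit()
    conn.close()

    backup_sqlite(src_path, dest_path)

    check = sqlite3.connect(dest_path)
    rows = check.execute("SELECT value FROM demo").fetchall()
    check.close()
```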
### Import from SQLite to PostgreSQL

To migrate data from SQLite to PostgreSQL:

```python
import json
from memu.database.sqlite import build_sqlite_database
from memu.database.postgres import build_postgres_database
from memu.app.settings import DatabaseConfig
from pydantic import BaseModel

class UserScope(BaseModel):
    user_id: str

# Load from SQLite
sqlite_config = DatabaseConfig(
    metadata_store={"provider": "sqlite", "dsn": "sqlite:///memu.db"}
)
sqlite_db = build_sqlite_database(config=sqlite_config, user_model=UserScope)
sqlite_db.load_existing()

# Connect to PostgreSQL
postgres_config = DatabaseConfig(
    metadata_store={"provider": "postgres", "dsn": "postgresql://..."}
)
postgres_db = build_postgres_database(config=postgres_config, user_model=UserScope)

# Migrate resources
for res_id, resource in sqlite_db.resources.items():
    postgres_db.resource_repo.create_resource(
        url=resource.url,
        modality=resource.modality,
        local_path=resource.local_path,
        caption=resource.caption,
        embedding=resource.embedding,
        user_data={"user_id": getattr(resource, "user_id", None)},
    )

# Similar for categories, items, and relations...
```

## Performance Considerations

| Aspect | SQLite | PostgreSQL |
|--------|--------|------------|
| Setup | Zero configuration | Requires server setup |
| Concurrency | Single writer, multiple readers | Full concurrent access |
| Vector Search | Brute-force (in-memory) | Native pgvector (indexed) |
| Scale | Up to ~100k items | Millions of items |
| Deployment | Single file, portable | External service |

## Example: Full Workflow

```python
import asyncio
from memu.app import MemoryService

async def main():
    # Initialize with SQLite
    service = MemoryService(
        llm_profiles={"default": {"api_key": "your-api-key"}},
        database_config={
            "metadata_store": {
                "provider": "sqlite",
                "dsn": "sqlite:///my_memories.db",
            },
        },
    )

    # Memorize a conversation
    result = await service.memorize(
        resource_url="conversation.json",
        modality="conversation",
        user={"user_id": "alice"},
    )
    print(f"Created {len(result['categories'])} categories")

    # Retrieve relevant memories
    memories = await service.retrieve(
        queries=[
            {"role": "user", "content": {"text": "What are my preferences?"}}
        ],
        where={"user_id": "alice"},
    )

    for item in memories.get("items", []):
        print(f"- {item['summary']}")

asyncio.run(main())
```

## Troubleshooting

### Database Locked Error

SQLite only allows one writer at a time. If you see "database is locked" errors:

1. Ensure you're not running multiple processes writing to the same database
2. Consider using PostgreSQL for concurrent access needs
3. Use appropriate connection timeouts so that a writer waits for the lock instead of failing immediately

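As a rough illustration of the timeout advice, the sketch below uses Python's standard `sqlite3` module directly. Whether MemU exposes a timeout setting is not stated here, so treat this as a generic SQLite example; `connect_with_busy_timeout` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile


def connect_with_busy_timeout(path: str, timeout_seconds: float = 30.0) -> sqlite3.Connection:
    """Open a connection that waits for a lock instead of failing at once."""
    # `timeout` is how long sqlite3 keeps retrying on a locked database
    # before raising "database is locked".
    return sqlite3.connect(path, timeout=timeout_seconds)


with tempfile.TemporaryDirectory() as tmp:
    conn = connect_with_busy_timeout(os.path.join(tmp, "demo.db"), timeout_seconds=10.0)
    # SQLite reports the busy timeout in milliseconds.
    timeout_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]

    conn.execute("CREATE TABLE demo (value INTEGER)")
    # Using the connection as a context manager commits on success and rolls
    # back on error, which keeps the write lock held only briefly.
    with conn:
        conn.execute("INSERT INTO demo VALUES (1)")
    count = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
    conn.close()
```

Keeping write transactions short matters as much as the timeout itself: a waiting writer can only proceed once the current writer commits.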
### Permission Denied

Make sure the directory containing the SQLite file is writable by the user running MemU. SQLite creates its journal files next to the database, so write access to the database file alone is not enough:

```bash
chmod 755 /path/to/data/directory
```

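Before starting the service, the directory permissions can be checked from Python. This is a minimal sketch using only the standard library; `sqlite_dir_is_writable` is a hypothetical helper, and `os.access` reports permissions for the current user only.

```python
import os
import tempfile
from pathlib import Path


def sqlite_dir_is_writable(db_path: str) -> bool:
    """Return True if the directory that would hold db_path exists and is writable."""
    directory = Path(db_path).resolve().parent
    # Creating files in a directory needs both write and search (execute) permission.
    return directory.is_dir() and os.access(directory, os.W_OK | os.X_OK)


with tempfile.TemporaryDirectory() as tmp:
    writable = sqlite_dir_is_writable(os.path.join(tmp, "memu.db"))
    missing = sqlite_dir_is_writable(os.path.join(tmp, "no-such-dir", "memu.db"))
```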
### Slow Vector Search

If vector search is slow with large datasets:

1. Consider migrating to PostgreSQL with pgvector
2. Use more selective `where` filters to reduce the search space
3. Reduce `top_k` parameters in your retrieve configuration

src/memu/app/settings.py

Lines changed: 2 additions & 2 deletions
@@ -248,9 +248,9 @@ def default(self) -> LLMConfig:


 class MetadataStoreConfig(BaseModel):
-    provider: Annotated[Literal["inmemory", "postgres"], Normalize] = "inmemory"
+    provider: Annotated[Literal["inmemory", "postgres", "sqlite"], Normalize] = "inmemory"
     ddl_mode: Annotated[Literal["create", "validate"], Normalize] = "create"
-    dsn: str | None = Field(default=None, description="Postgres connection string when provider=postgres.")
+    dsn: str | None = Field(default=None, description="Database connection string (required for postgres/sqlite).")


 class VectorIndexConfig(BaseModel):

src/memu/database/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -24,4 +24,5 @@
     "inmemory",
     "postgres",
     "schema",
+    "sqlite",
 ]

src/memu/database/factory.py

Lines changed: 10 additions & 0 deletions
@@ -19,6 +19,11 @@ def build_database(
 ) -> Database:
     """
     Initialize a database backend for the configured provider.
+
+    Supported providers:
+    - "inmemory": In-memory storage (default, no persistence)
+    - "postgres": PostgreSQL with optional pgvector support
+    - "sqlite": SQLite file-based storage (lightweight, portable)
     """
     provider = config.metadata_store.provider
     if provider == "inmemory":
@@ -28,6 +33,11 @@ def build_database(
         from memu.database.postgres import build_postgres_database

         return build_postgres_database(config=config, user_model=user_model)
+    elif provider == "sqlite":
+        # Lazy import to avoid loading SQLite dependencies when not needed
+        from memu.database.sqlite import build_sqlite_database
+
+        return build_sqlite_database(config=config, user_model=user_model)
     else:
         msg = f"Unsupported metadata_store provider: {provider}"
         raise ValueError(msg)
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
"""SQLite database backend for MemU."""

from __future__ import annotations

from pydantic import BaseModel

from memu.app.settings import DatabaseConfig
from memu.database.sqlite.sqlite import SQLiteStore


def build_sqlite_database(
    *,
    config: DatabaseConfig,
    user_model: type[BaseModel],
) -> SQLiteStore:
    """Build a SQLite database store instance.

    Args:
        config: Database configuration containing metadata_store settings.
        user_model: Pydantic model for user scope fields.

    Returns:
        Configured SQLiteStore instance.
    """
    dsn = config.metadata_store.dsn
    if not dsn:
        # Default to a local file if no DSN provided
        dsn = "sqlite:///memu.db"

    return SQLiteStore(
        dsn=dsn,
        scope_model=user_model,
    )


__all__ = ["SQLiteStore", "build_sqlite_database"]
