## BENCHMARK_RESULTS.md (new file: 176 additions, 0 deletions)
# elizaOS Database API Benchmark — OLD vs NEW Comparison

## Setup

- Backend: PGLite (in-process WASM PostgreSQL, fresh temp dir per run)
- `performance.now()` timing (sub-millisecond resolution)
- Same benchmark script runs on both APIs via runtime detection
- Batch inserts chunked at 1,000 rows to stay within PGLite WASM limits
- 3 measured iterations, 1 warm-up, **median** reported
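
The chunking strategy above can be sketched as follows; `chunk` and `batchInsert` are illustrative helpers, not the benchmark's actual code:

```typescript
// Hypothetical sketch of the chunked-insert strategy described above.
// The chunk size of 1,000 matches the PGLite WASM limit mentioned;
// `insertRows` stands in for whatever multi-row INSERT the adapter issues.
function chunk<T>(rows: T[], chunkSize = 1000): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < rows.length; i += chunkSize) {
    out.push(rows.slice(i, i + chunkSize));
  }
  return out;
}

async function batchInsert<T>(
  rows: T[],
  insertRows: (batch: T[]) => Promise<void>,
): Promise<void> {
  for (const batch of chunk(rows)) {
    await insertRows(batch); // one multi-row INSERT per chunk
  }
}
```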

---

## N=10,000 — WRITE Benchmarks (Old vs New, same machine, same N)

```text
WRITE OPERATIONS (N=10,000) | OLD (singular API) | NEW (batch-first API)
─────────────────────────────┼───────────────────────┼────────────────────────
| loop batch spd | loop batch spd
createAgents | 6964ms 7021ms 1.0x | 2642ms 490ms 5.4x
createEntities | 4231ms 710ms 6.0x | 3627ms 217ms 16.7x
createMemories | 8384ms 8365ms 1.0x | 4912ms 443ms 11.1x
updateAgents | 3899ms 3956ms 1.0x | 2618ms 220ms 11.9x
upsertAgents | [NOT AVAILABLE] | 488ms 485ms 1.0x
```

### Head-to-head: batch path only

```text
Operation OLD batch NEW batch Speedup Change
─────────────────────────────────────────────────────────────────
createAgents 7,021ms 490ms 14.3x -93.0%
createEntities 710ms 217ms 3.3x* -69.4%
createMemories 8,365ms 443ms 18.9x -94.7%
updateAgents 3,956ms 220ms 18.0x -94.4%
upsertAgents N/A 485ms — NEW
─────────────────────────────────────────────────────────────────
* Both old and new use the same multi-row INSERT code path for
createEntities. The 3.3x gap is likely PGLite WASM runtime
variance between separate benchmark processes, not a code change.
```

### Why the difference?

| Operation | OLD behavior | NEW behavior |
|---|---|---|
| **createAgents** | No batch method — `createAgent()` loops N times | Multi-row `INSERT VALUES (...),(...),(...)` |
| **createEntities** | Had `createEntities(array)` — already batched | Same batch INSERT — 3.3x gap is likely PGLite WASM variance between runs (code paths are nearly identical) |
| **createMemories** | No batch method — `createMemory()` loops N times | Multi-row `INSERT VALUES` |
| **updateAgents** | No batch method — `updateAgent(id)` loops N times | Single `UPDATE ... SET col = CASE WHEN id=X THEN Y ... END` |
| **upsertAgents** | Not available | `INSERT ... ON CONFLICT DO UPDATE` |
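
The CASE-based batch UPDATE from the table above can be sketched like this. The table/column names and string building are illustrative only; the real adapter presumably uses parameterized queries rather than inlined literals:

```typescript
// Illustrative sketch (not the adapter's actual code) of collapsing N
// per-row UPDATEs into one CASE-based statement, as described above.
interface AgentNameUpdate {
  id: string;
  name: string;
}

function buildBatchUpdateSql(updates: AgentNameUpdate[]): string {
  // Real code would use parameter placeholders; escaping is omitted
  // here to keep the shape of the statement visible.
  const cases = updates
    .map((u) => `WHEN '${u.id}' THEN '${u.name}'`)
    .join(" ");
  const ids = updates.map((u) => `'${u.id}'`).join(", ");
  return `UPDATE agents SET name = CASE id ${cases} END WHERE id IN (${ids})`;
}
```

One round trip with one statement replaces N round trips, which is where the ~18x gain for `updateAgents` comes from.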

---

## N=100,000 — WRITE Benchmarks (New API only)

Old code was unable to complete N=100K for updateAgents (estimated >1 hour for 400K individual UPDATE queries).

```text
WRITE OPERATIONS (N=100,000) — NEW batch-first API
═══════════════════════════════════════════════════
createAgents loop: 26,783ms batch: 4,723ms 5.7x
createEntities loop: 35,566ms batch: 1,943ms 18.3x
createMemories loop: 48,162ms batch: 4,632ms 10.4x
updateAgents loop: 25,682ms batch: 2,282ms 11.3x
upsertAgents get+create: 4,753ms upsert: 4,837ms 1.0x
```

---

## READ / QUERY Benchmarks (10K rows seeded)

Both old and new code produce near-identical read performance, confirming
the canonical schema system generates equivalent indexes.

```text
Query OLD (10K) NEW (10K)
──────────────────────────────────────────────────
getMemories 3.8ms 3.7ms
countMemories 0.4ms 0.5ms
getMemoriesByRoomIds 46.8ms 47.8ms
getParticipantsForRoom 0.3ms 0.4ms
getRoomsByWorld 0.3ms 0.3ms
getEntitiesByIds (10) 0.6ms 0.6ms
getRoomsByIds (10) 0.4ms 0.6ms
getEntitiesForRoom 0.6ms 0.7ms
getAgents (full scan) 0.6ms 1.8ms
```

## NEW Composite Index Benchmarks (10K rows seeded)

These indexes exist only in the new canonical schema. The old code
doesn't define them — any performance shown is due to PGLite's planner
finding alternative paths (sequential scan on small data).

```text
Query OLD (10K) NEW (10K) Index
────────────────────────────────────────────────────────────────────
getComponents (entity+type) 0.3ms 0.4ms idx_components_entity_type
getComponent (exact) 0.3ms 0.3ms idx_components_entity_type
getTasksByName (agent+name) 0.3ms 0.3ms idx_tasks_agent_name
getLogs (room+type) 0.5ms 0.5ms idx_logs_room_type_created
getLogs (entity+type) 0.4ms 0.3ms idx_logs_entity_type
getRelationships (entity) 0.2ms 0.3ms idx_relationships_users
getMemories (agent+type) 28.6ms 27.8ms idx_memories_agent_type
```

### Index analysis

At 10K rows, PGLite's planner can satisfy most queries with sequential scans
fast enough that indexes don't show dramatic differences. The real value of
these composite indexes appears at larger scales:

```text
Query 10K rows 100K rows (NEW)
─────────────────────────────────────────────────────────
getMemories (agent+type) 27.8ms 125.9ms
getAgents (full scan) 1.8ms 88.7ms *
getMemoriesByRoomIds 47.8ms 564.8ms
```

**`getMemories (agent+type)`** — sublinear: 100K is only ~4.5x slower than
10K, not 10x. The composite index avoids a full table scan.

**`getAgents (full scan)`** — queries `SELECT id, name, bio FROM agents`
(same 3 columns in old and new). Returns only 1 agent in both runs. The 49x
slowdown at 100K is **dead tuple bloat**: the write benchmarks INSERT/DELETE
~100K agents per iteration across multiple benchmarks (createAgents,
updateAgents, upsertAgents × warmup+measured iterations). PGLite's WASM
PostgreSQL doesn't auto-VACUUM during the benchmark, so the seq scan reads
through millions of dead MVCC rows to find the single live row. Not a real
query regression.

**`getMemoriesByRoomIds`** — linear I/O growth: returns all memories across
10 rooms (10K total rows at N=10K, 100K at N=100K). 564.8/47.8 = 11.8x for
10x more returned data.

---

## Running the Benchmark

```bash
# Quick validation (N=5, 1 iteration, no warm-up)
bun run plugins/plugin-sql/typescript/__tests__/benchmark.ts --dry-run

# Default (N=100, 5 iterations, 2 warm-up)
bun run plugins/plugin-sql/typescript/__tests__/benchmark.ts

# Custom size and iterations
bun run plugins/plugin-sql/typescript/__tests__/benchmark.ts --n=10000 --iters=3
```

The script auto-detects which API version is available (`OLD (singular)` vs
`NEW (batch-first)`), so it runs unchanged on both old and new code.

## Conclusion

At **10K rows** (apples-to-apples comparison, same machine, same PGLite):

- **14.3x faster agent creation** (7.0s → 0.5s) — multi-row INSERT vs 10K individual INSERTs
- **18.9x faster memory creation** (8.4s → 0.4s) — multi-row INSERT vs 10K individual INSERTs
- **18.0x faster agent updates** (4.0s → 0.2s) — single CASE-based UPDATE vs 10K individual UPDATEs
- **Entity creation** already batched in old API — observed 3.3x gap is likely PGLite WASM runtime variance (code paths are nearly identical)
- **New upsert capability** — eliminates race conditions in concurrent agent registration

At **100K rows** (new API only — old code too slow to complete):
- Batch creates process 100K agents in 4.7s, 100K memories in 4.6s
- Batch update handles 100K agents in 2.3s with a single SQL statement

**Zero loops remain** in any CRUD method. Creates use multi-row INSERT. Updates
use CASE expressions. Deletes use `WHERE id IN (...)`. Upserts use
`ON CONFLICT DO UPDATE`.

Read performance is identical between old and new — the index structure is
equivalent. The new composite indexes (`idx_memories_agent_type`,
`idx_components_entity_type`, `idx_logs_room_type_created`, etc.) provide
sublinear scaling for filtered queries at large row counts.
## DATABASE_API_CHANGELOG.md (new file: 169 additions, 0 deletions)
# Database API Changelog

## Batch-First Database API Cleanup

### Summary

Comprehensive refactoring of the `IDatabaseAdapter` interface and all adapter implementations
to establish a consistent, batch-first CRUD API with proper naming conventions and return types.

### WHY This Was Done

The original adapter interface grew organically, resulting in:
- Inconsistent naming (`addRoomParticipants` vs `createRooms`)
- Mixed return types (`boolean` vs `UUID[]` for create operations)
- No batch support for many operations (single-item methods on the adapter)
- ORM types leaking into core (plugins importing Drizzle directly)

The cleanup establishes clear rules that make the API predictable for contributors.

---

### Phase 1: Interface Standardization

#### 1A. Batch-First CRUD Methods
- Added batch versions of all single-item CRUD methods
- All `create*` methods now return `Promise<UUID[]>` (the IDs that were created)
- All `update*` and `delete*` methods now return `Promise<void>` (throw on failure)
- Single-item methods remain on `AgentRuntime` as convenience wrappers

**WHY UUID[] return:** Callers often need the created IDs for subsequent operations
(e.g., create entity, then add it as participant to a room). Returning `boolean` forced
callers to pass IDs through or re-query, which was wasteful.
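
As a hypothetical usage sketch, the `UUID[]` return lets a caller chain a create into a follow-up operation without re-querying. The method shapes below are abbreviated; exact signatures may differ:

```typescript
// Hypothetical sketch: created IDs come straight back from createEntities,
// so they can feed createRoomParticipants with no intermediate SELECT.
type UUID = string;

interface AdapterLike {
  createEntities(entities: { names: string[] }[]): Promise<UUID[]>;
  createRoomParticipants(rows: { roomId: UUID; entityId: UUID }[]): Promise<UUID[]>;
}

async function addEntitiesToRoom(
  db: AdapterLike,
  roomId: UUID,
  names: string[],
): Promise<UUID[]> {
  const entityIds = await db.createEntities(names.map((n) => ({ names: [n] })));
  await db.createRoomParticipants(entityIds.map((entityId) => ({ roomId, entityId })));
  return entityIds;
}
```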

**WHY void for update/delete:** These operations either succeed or fail. There's no
meaningful partial success. If 3 of 5 updates fail, the caller needs to know which
ones failed (via the thrown error), not just that "some failed" (via `false`).

#### 1B. Naming Convention
- Renamed `addRoomParticipants` → `createRoomParticipants`
- Renamed `setParticipantUserState` → `updateParticipantUserState`

**WHY:** CRUD naming convention: `create` = INSERT, `get` = SELECT, `update` = UPDATE,
`delete` = DELETE. Using `add` and `set` broke this pattern and made the API harder to
predict.

#### 1C. Changed Return Types

| Method | Before | After | WHY |
|--------|--------|-------|-----|
| `createAgents` | `boolean` | `UUID[]` | Need created agent IDs |
| `createEntities` | `boolean` | `UUID[]` | Need created entity IDs |
| `createComponents` | `boolean` | `UUID[]` | Need created component IDs |
| `createRelationships` | `boolean` | `UUID[]` | Need created relationship IDs |
| `createRoomParticipants` | `boolean` | `UUID[]` | Need participant record IDs |
| `updateMemories` | `boolean[]` | `void` | Throw on failure instead |
| `ensureEmbeddingDimension` | (implicit) | `Promise<void>` | Explicit async |

---

### Phase 2: Upsert Methods & SQL Optimizations

#### 2A. Upsert Methods
Added atomic upsert methods to eliminate get-check-create race conditions:
- `upsertAgents(agents)` → `Promise<void>`
- `upsertEntities(entities)` → `Promise<void>`
- `upsertRooms(rooms)` → `Promise<void>`
- `upsertWorlds(worlds)` → `Promise<void>`

**WHY void return:** Upserts are idempotent. The caller already has the IDs (they're
the conflict key). Returning `UUID[]` suggests new IDs were generated.

**WHY on the adapter:** PostgreSQL (`ON CONFLICT DO UPDATE`), MySQL (`ON DUPLICATE KEY
UPDATE`), and PGLite all support atomic upserts in a single statement. Moving this to
the adapter avoids the runtime's get-then-create pattern which has a race window.
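
A minimal sketch of the PostgreSQL/PGLite-dialect upsert statement described here (the `agents` table and the string-building helper are illustrative, not the adapter's actual code):

```typescript
// Sketch of an atomic upsert: one INSERT ... ON CONFLICT DO UPDATE
// statement instead of the racy get-then-create pattern.
function buildUpsertSql(columns: string[], conflictKey: string): string {
  const cols = columns.join(", ");
  const placeholders = columns.map((_, i) => `$${i + 1}`).join(", ");
  // Every non-key column falls back to the incoming (EXCLUDED) value.
  const updates = columns
    .filter((c) => c !== conflictKey)
    .map((c) => `${c} = EXCLUDED.${c}`)
    .join(", ");
  return (
    `INSERT INTO agents (${cols}) VALUES (${placeholders}) ` +
    `ON CONFLICT (${conflictKey}) DO UPDATE SET ${updates}`
  );
}
```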

#### 2B. Query Pagination
Added `limit`/`offset` parameters to query methods:
- `getTasks(params)` - added `limit`, `offset`
- `getRelationships(params)` - added `limit`, `offset`
- `getRoomsByWorld(worldId)` - added `limit`, `offset`

**WHY:** Without limits, a query for "all tasks in room X" could return thousands of
records, causing memory exhaustion and UI freezes.
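
A hypothetical consumer of the new `limit`/`offset` parameters, pulling a large result set page by page rather than all at once (parameter and type names are illustrative):

```typescript
// Sketch: stream query results in fixed-size pages using limit/offset,
// so no single call materializes thousands of rows at once.
interface Task { id: string }
type GetTasks = (p: { roomId: string; limit: number; offset: number }) => Promise<Task[]>;

async function* pageTasks(getTasks: GetTasks, roomId: string, pageSize = 100) {
  for (let offset = 0; ; offset += pageSize) {
    const page = await getTasks({ roomId, limit: pageSize, offset });
    if (page.length === 0) return;
    yield page;
    if (page.length < pageSize) return; // short page means we hit the end
  }
}
```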

#### 2C. MySQL Optimizations
- Aligned MySQL adapter with PostgreSQL optimizations
- Verified proper index coverage for all query patterns
- Used `ON DUPLICATE KEY UPDATE` for upserts

#### 2D. Index Audit
Verified all query patterns have proper index coverage across PostgreSQL and MySQL schemas.

---

### Phase 3: Interface Segregation & Plugin Support

#### 3A. IMessagingAdapter Extraction
Extracted messaging-specific operations into a separate `IMessagingAdapter` interface:
- `createMessageServer`, `getMessageServers`, `getMessageServerById`
- `createChannel`, `getChannels`, `getChannelById`
- `createMessage`, `getMessages`, `getMessageById`

**WHY:** Not all adapters support messaging tables. In-memory and local adapters don't
need message servers, channels, or messages. Putting these on `IDatabaseAdapter` would
force every adapter to implement stubs.

Added `runtime.getMessagingAdapter()` which returns `IMessagingAdapter | null` via
duck-typing (checks if the adapter has messaging methods).
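
The duck-typing check might look roughly like this; the interface is abbreviated to two methods for illustration, and the real `IMessagingAdapter` has more:

```typescript
// Sketch of the duck-typing approach described above: treat the adapter
// as a messaging adapter only if the messaging methods are present.
interface MessagingLike {
  createMessage(msg: unknown): Promise<unknown>;
  getMessages(params: unknown): Promise<unknown[]>;
}

function getMessagingAdapter(adapter: object): MessagingLike | null {
  const a = adapter as Partial<MessagingLike>;
  return typeof a.createMessage === "function" &&
    typeof a.getMessages === "function"
    ? (a as MessagingLike)
    : null;
}
```

Callers then branch on `null` instead of every adapter shipping stub implementations.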

#### 3B. Plugin Schema Registration
Added `registerPluginSchema` and `getPluginStore` to `IDatabaseAdapter` (optional):

- `PluginSchema` - adapter-agnostic table definition format
- `IPluginStore` - generic CRUD interface for plugin data
- `SqlPluginStore` - SQL implementation with dialect detection (PG + MySQL)

**WHY:** Plugins like goals and todos need custom tables. Without this, they must
cast `runtime.db` to Drizzle types, creating tight coupling to SQL adapters and
preventing plugins from working with in-memory backends.
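
As a loosely sketched illustration — the field names here are guesses, not the actual `PluginSchema` definition — a plugin's adapter-agnostic table registration could take a shape like:

```typescript
// Hypothetical shape for an adapter-agnostic plugin table definition.
// The real PluginSchema format may differ; the point is that the plugin
// describes its table without importing any Drizzle/SQL types.
interface PluginColumn {
  name: string;
  type: "text" | "integer" | "boolean" | "timestamp" | "json";
}

interface PluginTable {
  name: string;
  columns: PluginColumn[];
}

// A todos-style plugin might declare its table like this, then perform
// CRUD through the generic store the adapter hands back:
const todosTable: PluginTable = {
  name: "todos",
  columns: [
    { name: "id", type: "text" },
    { name: "title", type: "text" },
    { name: "done", type: "boolean" },
  ],
};
```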

---

### Adapter Updates

All five adapter implementations were updated to match the new interface:

| Adapter | Package | Status |
|---------|---------|--------|
| PostgreSQL | `plugin-sql` | ✅ Updated |
| PGLite | `plugin-sql` | ✅ Updated |
| MySQL | `plugin-sql` | ✅ Updated |
| In-Memory | `plugin-inmemorydb` | ✅ Updated |
| Local Storage | `plugin-localdb` | ✅ Updated |

### Removed Package

| Package | Reason |
|---------|--------|
| `plugin-mysql` | Redundant. `plugin-sql` already handles MySQL via `MYSQL_URL` detection. The standalone package had diverged from the shared interface. |

---

### Files Changed

**Core types** (`packages/typescript/src/types/`):
- `database.ts` - Updated `IDatabaseAdapter` with batch-first methods, upserts, pagination
- `messaging.ts` - Added `IMessagingAdapter`, `MessageServer`, `MessagingChannel`, `MessagingMessage`
- `plugin-store.ts` - Added `PluginSchema`, `IPluginStore`, filter types
- `runtime.ts` - Added `getMessagingAdapter()` to `IAgentRuntime`
- `index.ts` - Re-exports for new type files

**Runtime** (`packages/typescript/src/`):
- `runtime.ts` - Implemented `getMessagingAdapter()`, updated all adapter calls

**SQL adapter** (`plugins/plugin-sql/typescript/`):
- `base.ts` - PG/PGLite adapter updated with new return types, messaging types, plugin store
- `mysql/base.ts` - MySQL adapter updated identically
- `stores/plugin.store.ts` - New: `SqlPluginStore` with PG+MySQL dialect detection
- `stores/*.store.ts` - Updated return types for all store functions
- `mysql/stores/*.store.ts` - Updated return types for all MySQL store functions

**In-memory adapter** (`plugins/plugin-inmemorydb/typescript/`):
- `adapter.ts` - Updated method names and return types

**Local adapter** (`plugins/plugin-localdb/typescript/`):
- `adapter.ts` - Updated method names and return types

**Tests** (`packages/typescript/src/__tests__/`):
- Updated mock adapters in all test files to use new method names and return types