Benchmark Subqueries #3613

# Subquery Performance Benchmarks

Create system-level benchmarks for subqueries to assess performance, scalability, and memory usage.

## Context

Subqueries are implemented using in-memory Materializer processes that maintain indexes of subquery results and trigger move-in/move-out events when data changes. We need benchmarks that run the full system (sync service + Postgres) and measure real-world performance characteristics.

## Memory Locations to Monitor

### Consumer Process State (`Consumer.State`)

| Field | What It Stores | Memory Impact |
|-------|----------------|---------------|
| `buffer` | Transactions queued while buffering (reverse order) | O(buffered_txns × txn_size) - can grow unbounded during initial snapshot or move-in waits |
| `txn_offset_mapping` | List of `{shape_offset, txn_boundary}` tuples for flush alignment | O(unflushed_txns × 32 bytes) |
| `writer` | Storage writer state (ETS refs, file handles, buffers) | Varies by storage backend |
| `shape` | Full Shape struct including `shape_dependencies` | O(query_complexity + nested_shapes) |
| `transaction_builder` | Partial transaction fragments being assembled | O(fragment_size) - transient |

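Here is a rough worked example of the `txn_offset_mapping` bound. It uses the 32-byte per-entry figure from the table and a hypothetical backlog of 100,000 unflushed transactions:

$$
100{,}000 \times 32\ \text{bytes} = 3.2 \times 10^{6}\ \text{bytes} \approx 3.2\ \text{MB}
$$

`buffer` has no fixed per-entry size, because it scales with `txn_size`. The same backlog held in `buffer` is therefore likely to dominate this figure.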
### MoveIns State (`Consumer.MoveIns`)

| Field | What It Stores | Memory Impact |
|-------|----------------|---------------|
| `waiting_move_ins` | Map of `name → {pg_snapshot, {ref_key, MapSet[moved_values]}}` | O(concurrent_move_ins × values_per_move_in) |
| `filtering_move_ins` | List of `{pg_snapshot, MapSet[keys]}` for completed but filtering move-ins | O(filtering_move_ins × keys_per_move_in) - can be large if many rows moved in |
| `touch_tracker` | Map of `key → xid` tracking which keys were touched | O(touched_keys) - grows with change volume, GC'd periodically |
| `in_flight_values` | Precalculated map of all moved-in values to skip in WHERE evaluation | O(total_in_flight_values) |
| `moved_out_tags` | Map of `move_in_name → MapSet[tags]` for move-outs during move-in | O(concurrent_move_ins × moved_out_tags) |
| `move_in_buffering_snapshot` | Union snapshot `{xmin, xmax, xip_list}` | ~100 bytes (but `xip_list` can grow) |
| `maximum_resolved_snapshot` | Snapshot for visibility boundary | ~100 bytes |
| `minimum_unresolved_snapshot` | Snapshot for visibility boundary | ~100 bytes |

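One way to see how `in_flight_values` relates to `waiting_move_ins` is to derive it by flattening. Every value that some pending move-in is fetching is unioned into a single lookup structure, so a change can be checked with one membership test instead of a scan over all move-ins. The sketch below is in Python for illustration only. It assumes `ref_key` is an opaque key identifying the subquery reference and that the union is grouped by it. The function names are hypothetical and do not come from the sync service's code.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

# Hypothetical Python mirror of the Elixir structures described above.
Snapshot = Tuple[int, int, Tuple[int, ...]]  # (xmin, xmax, xip_list)
MoveIn = Tuple[Snapshot, Tuple[Hashable, FrozenSet[Hashable]]]  # (pg_snapshot, (ref_key, moved_values))


def compute_in_flight_values(
    waiting_move_ins: Dict[str, MoveIn],
) -> Dict[Hashable, Set[Hashable]]:
    """Union the moved values of every pending move-in, grouped by ref_key."""
    in_flight: Dict[Hashable, Set[Hashable]] = {}
    for _pg_snapshot, (ref_key, moved_values) in waiting_move_ins.values():
        in_flight.setdefault(ref_key, set()).update(moved_values)
    return in_flight


def total_in_flight_values(in_flight: Dict[Hashable, Set[Hashable]]) -> int:
    """Count distinct in-flight values across all ref_keys."""
    return sum(len(values) for values in in_flight.values())


def is_in_flight(
    in_flight: Dict[Hashable, Set[Hashable]], ref_key: Hashable, value: Hashable
) -> bool:
    """Hypothetical check: True if a change with this value could skip WHERE evaluation."""
    return value in in_flight.get(ref_key, set())
```

The map holds distinct values, so overlapping move-ins share entries. Two pending move-ins fetching `{1, 2}` and `{2, 3}` for the same `ref_key` yield three in-flight values, not four. Keep this in mind when sizing the structure from benchmark results.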
### InitialSnapshot State (`Consumer.InitialSnapshot`)

| Field | What It Stores | Memory Impact |
|-------|----------------|---------------|
| `pg_snapshot` | Tuple `{xmin, xmax, xip_list}` | O(in_progress_txns) - `xip_list` can be large under high concurrency |
| `awaiting_snapshot_start` | List of `GenServer.from()` references | O(waiting_clients) |

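`xip_list` dominates the size of `pg_snapshot` because it holds one entry per transaction that was in progress when the snapshot was taken. The sketch below follows the visibility rules PostgreSQL documents for `pg_snapshot`:

- Transactions below `xmin` had finished.
- Transactions at or above `xmax` had not finished.
- Between the two, only the IDs listed in `xip_list` were still running.

It is a Python illustration with two simplifications: it assumes the transaction being checked committed, and it ignores wraparound by treating IDs as 64-bit counters, as `pg_current_snapshot()` reports them.

```python
from typing import NamedTuple, Tuple


class PgSnapshot(NamedTuple):
    """Mirror of PostgreSQL's pg_snapshot: (xmin, xmax, xip_list)."""

    xmin: int
    xmax: int
    xip_list: Tuple[int, ...]


def is_visible(xid: int, snapshot: PgSnapshot) -> bool:
    """Would the committed transaction `xid` be visible to `snapshot`?"""
    if xid < snapshot.xmin:
        return True  # completed before the snapshot (xmin is the oldest still-active xid)
    if xid >= snapshot.xmax:
        return False  # had not completed when the snapshot was taken
    return xid not in snapshot.xip_list  # in range: visible unless still in progress
```

Under high write concurrency `xip_list` grows longer. This increases the snapshot's memory footprint and, with a tuple, the cost of the membership test. A set-based representation would make the check O(1) at the cost of extra memory per snapshot.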
### Materializer Process State (`Consumer.Materializer`)

| Field | What It Stores | Memory Impact |
|-------|----------------|---------------|
| `index` | All rows in subquery result: `key → value` | O(rows × row_size) - **primary memory consumer** |
| `tag_indices` | Reverse index: `tag_hash → MapSet[keys]` | O(unique_tags × keys_per_tag) |
| `value_counts` | Reference counting: `value → count` | O(unique_values × 16 bytes) |
| `subscribers` | MapSet of subscribed PIDs | O(subscriber_count) |

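The interplay between `index` and `value_counts` can be illustrated with a reference-counting sketch. It assumes, without confirmation from this issue, that a value triggers a move-in when its count rises from 0 to 1 and a move-out when its count falls back to 0. Under that assumption, two rows sharing a value produce one move-in, and the move-out fires only when the last of them leaves. The sketch omits `tag_indices` and `subscribers`. Class and method names are hypothetical, and the code is a Python illustration, not the Elixir Materializer.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Tuple

Event = Tuple[str, Hashable]  # ("move_in" | "move_out", value)


@dataclass
class MaterializerSketch:
    """Hypothetical model of the Materializer's `index` and `value_counts`."""

    index: Dict[Hashable, Hashable] = field(default_factory=dict)  # key -> value
    value_counts: Dict[Hashable, int] = field(default_factory=dict)  # value -> refcount

    def _increment(self, value: Hashable, events: List[Event]) -> None:
        count = self.value_counts.get(value, 0) + 1
        self.value_counts[value] = count
        if count == 1:  # first row with this value: it enters the result set
            events.append(("move_in", value))

    def _decrement(self, value: Hashable, events: List[Event]) -> None:
        count = self.value_counts[value] - 1
        if count == 0:  # last row with this value is gone: it leaves the result set
            del self.value_counts[value]
            events.append(("move_out", value))
        else:
            self.value_counts[value] = count

    def upsert(self, key: Hashable, value: Hashable) -> List[Event]:
        """Insert or update a row and return the resulting move events."""
        events: List[Event] = []
        if key in self.index:
            old_value = self.index[key]
            if old_value == value:
                return events  # no change in value, no events
            self._decrement(old_value, events)
        self.index[key] = value
        self._increment(value, events)
        return events

    def delete(self, key: Hashable) -> List[Event]:
        """Remove a row (if present) and return the resulting move events."""
        events: List[Event] = []
        if key in self.index:
            self._decrement(self.index.pop(key), events)
        return events
```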
---

## Proposed Benchmarks

### 1. Concurrent Shape Creation

**Measure:**
- Time to create all shapes
- Memory usage

**Vary:**
- Number of shapes
- Number of subqueries per shape
- Size of subquery result set
- Subquery nesting depth
- DB latency (higher latency increases buffer size)
- Composite vs single-column key

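These dimensions multiply quickly, so decide up front between a full factorial sweep and a one-at-a-time sweep around a baseline. The sketch below enumerates both in Python. The dimension names and every value in `SHAPE_CREATION_GRID` are hypothetical placeholders, since the issue does not specify ranges.

```python
import itertools
from typing import Any, Dict, Iterator, List

# Hypothetical placeholder values; the issue lists the dimensions but not their ranges.
SHAPE_CREATION_GRID: Dict[str, List[Any]] = {
    "num_shapes": [10, 100, 1000],
    "subqueries_per_shape": [1, 2, 4],
    "subquery_result_rows": [100, 10_000],
    "nesting_depth": [1, 2, 3],
    "db_latency_ms": [0, 50],
    "composite_key": [False, True],
}


def full_factorial(grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every combination of dimension values."""
    names = list(grid)
    for combo in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))


def one_at_a_time(grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield the baseline (first value of each dimension), then vary one dimension at a time."""
    baseline = {name: values[0] for name, values in grid.items()}
    yield dict(baseline)
    for name, values in grid.items():
        for value in values[1:]:
            config = dict(baseline)
            config[name] = value
            yield config
```

With these placeholder values the full sweep has 3 × 3 × 2 × 3 × 2 × 2 = 216 configurations, and the one-at-a-time sweep has 10. The one-at-a-time sweep misses interactions, such as nesting depth combined with DB latency, but is cheap enough to run often.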
### 2. Replication Throughput

**Measure:**
- Latency from insert to client receipt
- Memory usage

**Vary:**
- Number of shapes
- Number of transactions
- Number of rows per transaction
- Number of subqueries per shape
- Size of subquery result set
- Composite vs single-column key

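One approach to measuring insert-to-receipt latency:

- Tag each inserted row with an ID.
- Record when the insert was issued and when a client saw the row.
- Report nearest-rank percentiles rather than a mean, which can hide tail latency.

The Python sketch below assumes both timestamps are integer milliseconds from the same monotonic clock, for example with the load generator and the client in one process. The function names are hypothetical.

```python
import math
from typing import Dict, Hashable, List


def latencies_ms(inserted_at: Dict[Hashable, int], received_at: Dict[Hashable, int]) -> List[int]:
    """Sorted per-row latency for rows the client has received."""
    return sorted(received_at[row] - inserted_at[row] for row in received_at if row in inserted_at)


def missing_rows(inserted_at: Dict[Hashable, int], received_at: Dict[Hashable, int]) -> List[Hashable]:
    """Rows that were inserted but never reached the client."""
    return [row for row in inserted_at if row not in received_at]


def percentile(sorted_values: List[int], p: float) -> int:
    """Nearest-rank percentile (0 < p <= 100) of an ascending list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]
```

Report `missing_rows` alongside the percentiles. A run can look fast only because its slow rows never arrived before the benchmark stopped, and the missing count exposes this.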
### 3. Move-In/Move-Out

**Measure:**
- Latency from insert/delete to client receipt
- Memory usage (peak and steady-state)
- Connection pool utilisation
- GC pause times
- Message queue lengths

**Vary:**
- Move-in vs move-out
- Number of subqueries per shape
- Number of shapes
- Batch size (rows affected per move-in/out)
- DB latency (higher latency increases buffer size)
- Subquery nesting depth
- Composite vs single-column key

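Benchmark 3 asks for both peak and steady-state memory. One simple, hedged definition:

- Peak is the maximum sample over the run.
- Steady state is the median of the samples taken after a settling point chosen by the harness.

The Python sketch below assumes memory is sampled periodically as `(seconds_since_start, bytes)` pairs. How samples are collected is left open; one source is the total reported by Erlang's `erlang:memory/0`. `summarise_memory` and `settle_after_s` are hypothetical names.

```python
from statistics import median
from typing import Sequence, Tuple

Sample = Tuple[float, int]  # (seconds since benchmark start, bytes in use)


def summarise_memory(samples: Sequence[Sample], settle_after_s: float) -> Tuple[int, float]:
    """Return (peak_bytes, steady_state_bytes) for one benchmark run."""
    if not samples:
        raise ValueError("no samples")
    peak = max(used for _, used in samples)
    settled = [used for t, used in samples if t >= settle_after_s]
    if not settled:
        raise ValueError("no samples after the settling point")
    return peak, median(settled)
```

Compare the two numbers across the move-in and move-out runs. A high peak with a normal steady state points to transient buffering. A steady state that keeps climbing between runs points to state retained after the move completes.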
