
Benchmark Subqueries #3613

@robacourt


Subquery Performance Benchmarks

Create system-level benchmarks for subqueries to assess performance, scalability, and memory usage.

Context

Subqueries are implemented using in-memory Materializer processes that maintain indexes of subquery results and trigger move-in/move-out events when data changes. We need benchmarks that run the full system (sync service + Postgres) and measure real-world performance characteristics.

Memory Locations to Monitor

Consumer Process State (Consumer.State)

| Field | What It Stores | Memory Impact |
| --- | --- | --- |
| buffer | Transactions queued while buffering (in reverse order) | O(buffered_txns × txn_size); can grow without bound while the initial snapshot is in progress or while waiting on move-ins |
| txn_offset_mapping | List of {shape_offset, txn_boundary} tuples used for flush alignment | O(unflushed_txns × 32 bytes) |
| writer | Storage writer state (ETS refs, file handles, buffers) | Varies by storage backend |
| shape | Full Shape struct, including shape_dependencies | O(query_complexity + nested_shapes) |
| transaction_builder | Partial transaction fragments being assembled | O(fragment_size); transient |
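Of these fields, only `buffer` and `txn_offset_mapping` scale directly with traffic, so a back-of-envelope estimate is a useful sanity check against measured memory. The sketch below simply multiplies out the complexities listed above. `ConsumerLoad` and `estimate_consumer_bytes` are hypothetical names, the average transaction size is an input a benchmark would have to measure (2 KiB is an arbitrary example, not a measurement), and BEAM term overhead is ignored.

```python
from dataclasses import dataclass

# Bytes per txn_offset_mapping entry, as given in the table above.
OFFSET_MAPPING_ENTRY_BYTES = 32


@dataclass
class ConsumerLoad:
    """Hypothetical workload parameters for one Consumer process."""
    buffered_txns: int   # transactions currently held in `buffer`
    avg_txn_bytes: int   # assumed average in-memory size of one transaction
    unflushed_txns: int  # entries currently in `txn_offset_mapping`


def estimate_consumer_bytes(load: ConsumerLoad) -> dict:
    """Rough estimate for the two traffic-dependent fields only.

    `writer`, `shape` and `transaction_builder` are left out because their
    size depends on the storage backend and query, not on write volume.
    """
    buffer_bytes = load.buffered_txns * load.avg_txn_bytes
    mapping_bytes = load.unflushed_txns * OFFSET_MAPPING_ENTRY_BYTES
    return {
        "buffer": buffer_bytes,
        "txn_offset_mapping": mapping_bytes,
        "total": buffer_bytes + mapping_bytes,
    }


# Example: 10,000 buffered transactions of ~2 KiB each, 500 unflushed.
est = estimate_consumer_bytes(ConsumerLoad(10_000, 2_048, 500))
print(est)  # {'buffer': 20480000, 'txn_offset_mapping': 16000, 'total': 20496000}
```

Multiplying by the number of shapes gives a crude lower bound for the whole node when every consumer is buffering at once, which is the worst case the DB-latency dimension below is meant to provoke.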

MoveIns State (Consumer.MoveIns)

| Field | What It Stores | Memory Impact |
| --- | --- | --- |
| waiting_move_ins | Map of name → {pg_snapshot, {ref_key, MapSet[moved_values]}} | O(concurrent_move_ins × values_per_move_in) |
| filtering_move_ins | List of {pg_snapshot, MapSet[keys]} for move-ins that have completed but are still being filtered | O(filtering_move_ins × keys_per_move_in); can be large if many rows moved in |
| touch_tracker | Map of key → xid recording which keys were touched | O(touched_keys); grows with change volume, GC'd periodically |
| in_flight_values | Precalculated map of all moved-in values, used to skip WHERE evaluation | O(total_in_flight_values) |
| moved_out_tags | Map of move_in_name → MapSet[tags] for move-outs that occur during a move-in | O(concurrent_move_ins × moved_out_tags) |
| move_in_buffering_snapshot | Union snapshot {xmin, xmax, xip_list} | ~100 bytes (but xip_list can grow) |
| maximum_resolved_snapshot | Snapshot marking a visibility boundary | ~100 bytes |
| minimum_unresolved_snapshot | Snapshot marking a visibility boundary | ~100 bytes |

InitialSnapshot State (Consumer.InitialSnapshot)

| Field | What It Stores | Memory Impact |
| --- | --- | --- |
| pg_snapshot | Tuple {xmin, xmax, xip_list} | O(in_progress_txns); xip_list can be large under high concurrency |
| awaiting_snapshot_start | List of GenServer.from() references | O(waiting_clients) |
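The snapshots above (here and in MoveIns) use Postgres's {xmin, xmax, xip_list} form. The sketch below shows the standard Postgres rule for deciding whether a transaction is visible in such a snapshot, the same rule `pg_visible_in_snapshot` applies, to make clear why xip_list size matters under high concurrency. How Electric consults these boundaries internally is not shown in this issue; `PgSnapshot` and `visible_in_snapshot` are illustrative names. The sketch assumes 64-bit xids (as returned by `pg_current_snapshot()`), so wraparound is ignored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PgSnapshot:
    """Postgres snapshot in {xmin, xmax, xip_list} form (64-bit xids)."""
    xmin: int  # lowest xid still in progress when the snapshot was taken
    xmax: int  # one past the highest completed xid
    # In-progress xids; stored as a frozenset here so membership is O(1).
    # A plain list would make every check O(len(xip_list)).
    xip: frozenset = field(default_factory=frozenset)


def visible_in_snapshot(xid: int, snap: PgSnapshot) -> bool:
    """Postgres visibility rule for a transaction id against a snapshot."""
    if xid < snap.xmin:
        return True   # finished before the snapshot was taken
    if xid >= snap.xmax:
        return False  # had not finished (or started) at snapshot time
    return xid not in snap.xip  # in range: visible unless still in progress


snap = PgSnapshot(xmin=100, xmax=110, xip=frozenset({103, 107}))
visible = [x for x in range(98, 112) if visible_in_snapshot(x, snap)]
print(visible)  # [98, 99, 100, 101, 102, 104, 105, 106, 108, 109]
```

Note that this answers "had the transaction completed when the snapshot was taken?"; it does not distinguish committed from aborted transactions.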

Materializer Process State (Consumer.Materializer)

| Field | What It Stores | Memory Impact |
| --- | --- | --- |
| index | All rows in the subquery result: key → value | O(rows × row_size); primary memory consumer |
| tag_indices | Reverse index: tag_hash → MapSet[keys] | O(unique_tags × keys_per_tag) |
| value_counts | Reference counting: value → count | O(unique_values × 16 bytes) |
| subscribers | MapSet of subscribed PIDs | O(subscriber_count) |
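To see how `index` and `value_counts` interact, here is a minimal model assuming that a move-in is emitted when a value's count goes from 0 to 1 and a move-out when it drops from 1 to 0. That mapping from reference counts to events is an assumption for illustration, not a description of the actual Materializer code; `MaterializerSketch` is a hypothetical name and tag_indices is omitted.

```python
from collections import Counter


class MaterializerSketch:
    """Hypothetical subquery index with reference-counted values.

    `index` maps row key -> the value the subquery selects for that row;
    `value_counts` counts how many rows currently carry each value.
    """

    def __init__(self):
        self.index = {}
        self.value_counts = Counter()
        self.events = []  # (kind, value) tuples a subscriber would receive

    def insert(self, key, value):
        # Assumes `key` is new; use update() for existing rows.
        self.index[key] = value
        self.value_counts[value] += 1
        if self.value_counts[value] == 1:
            self.events.append(("move_in", value))

    def delete(self, key):
        value = self.index.pop(key)
        self.value_counts[value] -= 1
        if self.value_counts[value] == 0:
            del self.value_counts[value]  # keeps the map O(unique_values)
            self.events.append(("move_out", value))

    def update(self, key, new_value):
        if self.index[key] == new_value:
            return  # value unchanged: no count change, no event
        self.delete(key)
        self.insert(key, new_value)


m = MaterializerSketch()
m.insert("r1", 42)  # 42: 0 -> 1, move-in
m.insert("r2", 42)  # 42: 1 -> 2, no event
m.delete("r1")      # 42: 2 -> 1, no event
m.update("r2", 7)   # 42: 1 -> 0 (move-out), then 7: 0 -> 1 (move-in)
print(m.events)     # [('move_in', 42), ('move_out', 42), ('move_in', 7)]
```

Under this model, the "Batch size" and "Size of subquery result set" dimensions below stress different things: the former drives the number of events, while the latter drives the size of `index`.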

Proposed Benchmarks

1. Concurrent Shape Creation

Measure:

  • Time to create all shapes
  • Memory usage

Vary:

  • Number of shapes
  • Number of subqueries per shape
  • Size of subquery result set
  • Subquery nesting depth
  • DB latency (higher latency increases buffer size)
  • Composite vs single-column key
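Crossing every dimension above multiplies quickly, so it helps to generate the run list explicitly. The sketch below builds both the full grid and a cheaper one-factor-at-a-time sweep from a baseline. The value ranges in `DIMENSIONS` are placeholders, since the issue names the dimensions but not their ranges.

```python
from itertools import product

# Placeholder ranges: the dimensions come from the list above, the values do not.
DIMENSIONS = {
    "num_shapes": [10, 100, 1_000],
    "subqueries_per_shape": [1, 2, 4],
    "subquery_result_rows": [100, 10_000],
    "nesting_depth": [1, 2, 3],
    "db_latency_ms": [0, 50],
    "key_type": ["single", "composite"],
}


def full_grid(dimensions):
    """Every combination of every dimension (cartesian product)."""
    names = list(dimensions)
    for combo in product(*(dimensions[n] for n in names)):
        yield dict(zip(names, combo))


def one_factor_at_a_time(dimensions):
    """Baseline (first value of each dimension), then vary one dimension at a time."""
    baseline = {name: values[0] for name, values in dimensions.items()}
    yield baseline
    for name, values in dimensions.items():
        for value in values[1:]:
            yield {**baseline, name: value}


grid = list(full_grid(DIMENSIONS))
sweep = list(one_factor_at_a_time(DIMENSIONS))
print(len(grid), len(sweep))  # 216 10
```

The sweep isolates each dimension's effect cheaply. The full grid is needed only where dimensions are expected to interact, for example DB latency combined with shape count, since both feed buffer growth.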

2. Replication Throughput

Measure:

  • Latency from insert to client receipt
  • Memory usage

Vary:

  • Number of shapes
  • Number of transactions
  • Number of rows per transaction
  • Number of subqueries per shape
  • Size of subquery result set
  • Composite vs single-column key
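For "latency from insert to client receipt", the harness needs to match each inserted row with the moment a client sees it. The sketch below assumes both timestamps are recorded by the benchmark client on the same monotonic clock (e.g. Python's `time.monotonic()`), which avoids clock skew between hosts. It reports nearest-rank percentiles and lists rows that never arrived rather than silently dropping them. `latency_summary` is a hypothetical helper.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def latency_summary(inserted_at, received_at):
    """inserted_at / received_at: row id -> timestamp in seconds.

    Missing rows are reported separately: a row that never arrives is a
    correctness failure, not merely a slow one.
    """
    latencies = sorted(
        received_at[row] - t for row, t in inserted_at.items() if row in received_at
    )
    if not latencies:
        raise ValueError("no inserted rows were received")
    return {
        "count": len(latencies),
        "missing": [row for row in inserted_at if row not in received_at],
        "p50": percentile(latencies, 50),
        "p99": percentile(latencies, 99),
        "max": latencies[-1],
    }


# 100 rows inserted at t=0; rows 0..98 arrive after 1..99 ms, row 99 never does.
inserted = {i: 0.0 for i in range(100)}
received = {i: (i + 1) / 1000 for i in range(99)}
summary = latency_summary(inserted, received)
print(summary["p50"], summary["p99"], summary["missing"])  # 0.05 0.099 [99]
```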

3. Move-In/Move-Out

Measure:

  • Latency from insert/delete to client receipt
  • Memory usage (peak and steady-state)
  • Connection pool utilisation
  • GC pause times
  • Message queue lengths

Vary:

  • Move-in vs move-out
  • Number of subqueries per shape
  • Number of shapes
  • Batch size (rows affected per move-in/out)
  • DB latency (higher latency increases buffer size)
  • Subquery nesting depth
  • Composite vs single-column key
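"Peak and steady-state" memory needs an agreed definition before runs can be compared. The sketch below assumes periodic samples of total memory (for example from `:erlang.memory(:total)` on the sync service node, with `:erlang.process_info(pid, :memory)` and `:message_queue_len` for individual processes). It takes the peak as the largest sample and steady state as the median of the last `steady_window` samples. That steady-state definition is an assumption chosen to ignore brief spikes, and `summarise_memory` is a hypothetical helper.

```python
from statistics import median


def summarise_memory(samples, steady_window):
    """samples: list of (timestamp_s, bytes) pairs in time order."""
    if not samples or steady_window <= 0:
        raise ValueError("need at least one sample and a positive window")
    values = [b for _, b in samples]
    peak_index = max(range(len(values)), key=values.__getitem__)
    steady = median(values[-steady_window:])  # assumed steady-state definition
    return {
        "peak_bytes": values[peak_index],
        "peak_at": samples[peak_index][0],
        "steady_state_bytes": steady,
        "peak_over_steady": values[peak_index] / steady,
    }


# A move-in that briefly triples memory before settling.
samples = [(0, 100), (1, 400), (2, 900), (3, 300), (4, 310), (5, 305)]
result = summarise_memory(samples, steady_window=3)
print(result["peak_bytes"], result["peak_at"], result["steady_state_bytes"])  # 900 2 305
```

Tracking `peak_over_steady` per batch size separates transient buffering cost during a move-in from memory that is actually retained afterwards.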
