strfry Query Performance: Deep Historical Queries
Observation
Queries with `until` filters become progressively slower the further back in time they reach. On a relay containing 200k+ events of a specific kind/author, query times increase from <1s for recent data to 30+s for data only a few hours old.
Environment
- Remote relay: strfry (wss://nip85.brainstorm.world)
- Hardware: DigitalOcean droplet, 8GB RAM, SSD storage
- Client: nostr-tools SimplePool.querySync()
- Query: `{ kinds: [30382], authors: ["48ec018359cac3c933f0f7a14550e36a4f683dcf55520c916dd8c61e7724f5de"], until: <timestamp>, limit: 500 }`
- Known event count: ~200,000 events matching this filter
Observed Behavior
Query Duration by Depth
Paginating backward from the present (`until` = now, then decremented to the oldest timestamp of each page):
| Timestamp Range | Duration | Events/page | Notes |
|---|---|---|---|
| Recent (now) | 0.8-1s | 490-500 | Fast, likely in cache |
| -1hr | 7-8s | 450-500 | Moderate slowdown |
| -2hr | 12-14s | 400-480 | Significant slowdown |
| -3hr | 17s | 400-470 | Approaching timeout limits |
| -4.5hr | 24-25s | 400-476 | Near timeout limit (30s) |
| -5hr+ | 26-27s | 410-490 | Consistently near timeout |
| -6hr+ | 28-29s | 450-485 | 96% of timeout limit |
| Older | Timeout | 0 | Eventually returns empty |
Actual Results
- With 4.4s timeout: Retrieved 28,000 events before timing out
- With 8.8s timeout: Retrieved 40,000 events before timing out
- With 30s timeout: Retrieved 92,000+ events (ongoing), queries now at 28-29s
Query duration climbs steadily the further back in time we page. At 92k events retrieved (roughly 6 hours of history), queries are taking 28-29 seconds, 96% of the 30-second timeout limit. That is only ~46% of the known ~200k matching events; at this rate, queries will exceed the timeout well before the remaining ~108k events are retrieved.
The hardware (8GB RAM, SSD) should be adequate for this workload, suggesting the bottleneck may be LMDB-specific behavior rather than raw storage speed.
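For context, a minimal sketch of the pagination/timing loop behind these numbers (nostr-tools SimplePool and the relay/filter are as described above; the Promise.race client-side timeout, the `TIMEOUT_MS` constant, and other variable names are illustrative, not the exact harness):

```js
import { SimplePool } from 'nostr-tools'

const relays = ['wss://nip85.brainstorm.world']
const pool = new SimplePool()
const TIMEOUT_MS = 30_000 // per-query timeout (4.4s and 8.8s in earlier runs)

const baseFilter = {
  kinds: [30382],
  authors: ['48ec018359cac3c933f0f7a14550e36a4f683dcf55520c916dd8c61e7724f5de'],
  limit: 500,
}

let until = Math.floor(Date.now() / 1000)
let total = 0

while (true) {
  const started = Date.now()
  // Race the relay query against a client-side timeout.
  const events = await Promise.race([
    pool.querySync(relays, { ...baseFilter, until }),
    new Promise(resolve => setTimeout(() => resolve(null), TIMEOUT_MS)),
  ])
  const took = (Date.now() - started) / 1000

  if (!events || events.length === 0) break // timed out or exhausted

  total += events.length
  console.log(`until=${until} page=${events.length} total=${total} took=${took.toFixed(1)}s`)

  // Move the cursor to the oldest timestamp seen on this page.
  until = Math.min(...events.map(e => e.created_at))
}
```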
Hypothesis
This could be due to:
- LMDB page cache behavior (recent data cached, older data requires disk I/O; a cold/warm re-run check is sketched after this list)
- B+ tree traversal characteristics for deep historical queries
- Index structure for compound filters (author+kind+timestamp)
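One quick way to probe the page-cache bullet (we have not run this yet): issue the same deep-`until` query twice back to back and compare durations; if the repeat is much faster, the first pass likely pulled those pages into cache. A sketch, reusing `pool` and `relays` from the loop above; the 6-hour offset is illustrative:

```js
// Repeat one deep query and compare cold vs. warm timings.
const deepFilter = {
  kinds: [30382],
  authors: ['48ec018359cac3c933f0f7a14550e36a4f683dcf55520c916dd8c61e7724f5de'],
  until: Math.floor(Date.now() / 1000) - 6 * 3600, // ~6 hours back
  limit: 500,
}

for (const label of ['cold', 'warm']) {
  const t0 = Date.now()
  const events = await pool.querySync(relays, deepFilter)
  console.log(`${label}: ${events.length} events in ${(Date.now() - t0) / 1000}s`)
}
```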
Reproduction Steps
```js
// Using nostr-tools (SimplePool) or a similar client
import { SimplePool } from 'nostr-tools'

const pool = new SimplePool()
const relays = ['wss://nip85.brainstorm.world']

// Start with recent data
const filter = {
  kinds: [30382],
  authors: ["48ec018359cac3c933f0f7a14550e36a4f683dcf55520c916dd8c61e7724f5de"],
  until: Math.floor(Date.now() / 1000),
  limit: 500
}

// Get the first page; completes in ~1s
let events = await pool.querySync(relays, filter)

// Paginate backward: set until to the oldest timestamp from the previous page
let oldest = Math.min(...events.map(e => e.created_at))
filter.until = oldest
events = await pool.querySync(relays, filter)
// Still completes in ~1s

// Continue paginating...
// After fetching ~50k events over a few hours of history,
// query duration has increased to 15-20s+
```

Questions
- Is this behavior expected for deep historical queries with LMDB?
- Are there strfry configuration options that could help (cache size, etc.)?
- Is there a better query pattern for backfilling large historical datasets (e.g. the windowed since/until sketch below)?
- Would a local strfry sync + negentropy copy improve query performance significantly?
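To make the query-pattern question concrete, this is the kind of alternative we have in mind: fixed-width `since`/`until` windows instead of a single deep `until` cursor, so each request touches a bounded slice of the index. The window width, loop, and variable names are illustrative, and it is unclear whether this actually helps with strfry/LMDB, hence the question. `pool` and `relays` are assumed from the setup above:

```js
// Illustrative alternative: walk fixed-width since/until windows (1 hour here)
// instead of paginating with a single deep until cursor.
const HOUR = 3600
const stopAt = Math.floor(Date.now() / 1000) - 7 * 24 * HOUR // e.g. one week back
const backfill = []

let hi = Math.floor(Date.now() / 1000)
while (hi > stopAt) {
  const lo = hi - HOUR
  const events = await pool.querySync(relays, {
    kinds: [30382],
    authors: ['48ec018359cac3c933f0f7a14550e36a4f683dcf55520c916dd8c61e7724f5de'],
    since: lo,
    until: hi,
    limit: 500,
  })
  backfill.push(...events)
  // A full page (500) means the window was truncated; a real implementation
  // would split the window or paginate within it before moving on.
  hi = lo
}
```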