# EL Indexer Scale Analysis
- **Scale**: 40-50M transactions/day + 40-50M token transfers/day
- **Retention**: 6 months (~7.2-9B rows per table)

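The header's row counts follow from straightforward arithmetic, taking 6 months as roughly 180 days. The per-second figures are daily averages and ignore peak load:

```math
\begin{aligned}
\text{rows per table} &\approx (40 \text{ to } 50) \times 10^{6}\,\tfrac{\text{rows}}{\text{day}} \times 180\,\text{days} = (7.2 \text{ to } 9.0) \times 10^{9} \\
\text{insert rate per table} &\approx \frac{(40 \text{ to } 50) \times 10^{6}\,\text{rows}}{86\,400\,\text{s}} \approx 463 \text{ to } 579\,\tfrac{\text{rows}}{\text{s}} \\
\text{both tables combined} &\approx 926 \text{ to } 1157\,\tfrac{\text{rows}}{\text{s}}
\end{aligned}
```
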
## Critical Issues

### 1. **Inefficient Cleanup Strategy** ⚠️ CRITICAL
**Problem**: `DeleteElDataBeforeBlockUid` removes every row below the cutoff in a single statement. That can mean hundreds of millions of rows in one transaction:
```sql
DELETE FROM el_transactions WHERE block_uid < $1 -- every row older than the cutoff, in one transaction
```
- Keeps a single transaction open for hours, holding row locks and stopping VACUUM from removing dead rows in the meantime
- Generates massive WAL (potentially hundreds of GB)
- Causes replication lag
- Risks hitting `statement_timeout` and rolling back all of the work

**Solution**: Batched deletes with commits between batches
- Delete in chunks (10k-100k rows per batch)
- Commit between batches so each transaction stays short and other operations can proceed
- Select each batch by `ctid`, since PostgreSQL's `DELETE` has no `LIMIT` clause

**Impact**: No transaction lasts longer than one batch, so cleanup becomes non-blocking (estimated total time: hours → minutes)

### 2. **Single-Row Transaction Inserts** ⚠️ HIGH
**Problem**: Transactions are inserted one at a time:
```go
db.InsertElTransactions([]*dbtypes.ElTransaction{result.transaction}, dbTx)
```

**Solution**: Buffer each block's transactions and insert them in batches. The snippet assumes `InsertElTransactions` returns an `error`:
```go
// Buffer the block's transactions; flush every 1000 and once more at the end.
pendingTransactions = append(pendingTransactions, result.transaction)
if len(pendingTransactions) >= 1000 {
	if err := db.InsertElTransactions(pendingTransactions, dbTx); err != nil {
		return err
	}
	pendingTransactions = pendingTransactions[:0] // reuse the backing array
}
// After the block's last transaction, flush any remainder with the same call.
```

**Impact**: Estimated 10-100x faster inserts (not yet implemented)

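The source does not say whether `InsertElTransactions` already builds a multi-row statement. If it does not, each batch can be sent as one `INSERT ... VALUES` with numbered placeholders. The hypothetical helpers below show the placeholder numbering and the 65,535 bind-parameter cap that limits how many rows fit in one statement. The column names in any call are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// maxParams is PostgreSQL's bind-parameter limit per statement
// (the wire protocol stores the parameter count in 16 bits).
const maxParams = 65535

// rowsPerStatement returns how many rows of numCols columns fit in one
// statement, capped at want.
func rowsPerStatement(numCols, want int) int {
	limit := maxParams / numCols
	if want < limit {
		return want
	}
	return limit
}

// buildInsert renders "INSERT INTO t (c1, c2) VALUES ($1, $2), ($3, $4)"
// for numRows rows, numbering placeholders row by row.
func buildInsert(table string, cols []string, numRows int) string {
	var b strings.Builder
	fmt.Fprintf(&b, "INSERT INTO %s (%s) VALUES ", table, strings.Join(cols, ", "))
	n := 1
	for r := 0; r < numRows; r++ {
		if r > 0 {
			b.WriteString(", ")
		}
		b.WriteByte('(')
		for c := range cols {
			if c > 0 {
				b.WriteString(", ")
			}
			fmt.Fprintf(&b, "$%d", n)
			n++
		}
		b.WriteByte(')')
	}
	return b.String()
}
```

The matching argument slice is filled row by row in the same column order. For instance, a 25-column row allows at most 2,621 rows per statement.
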
### 3. **Missing Composite Indexes** ⚠️ HIGH
**Problem**: Queries filter by multiple columns, but the indexes are single-column:
- `WHERE from_id = X ORDER BY block_uid DESC` uses the `from_id` index, then sorts every match
- `WHERE token_id = X AND block_uid > Y` scans the `token_id` index, then filters on `block_uid`

**Solution**: Add composite indexes. Build them `CONCURRENTLY` so writes continue during the build; this must run outside a transaction block:
```sql
CREATE INDEX CONCURRENTLY IF NOT EXISTS el_transactions_from_block_idx ON el_transactions (from_id, block_uid DESC);
CREATE INDEX CONCURRENTLY IF NOT EXISTS el_token_transfers_token_block_idx ON el_token_transfers (token_id, block_uid DESC);
```

**Impact**: Estimated 10-100x faster filtered queries. The index returns matching rows already in `block_uid` order, so `LIMIT` can stop early instead of sorting.

### 4. **Inefficient Pagination Queries** ⚠️ MEDIUM
**Problem**: UNION ALL with count pattern:
```sql
SELECT count(*) AS id, ... FROM cte
UNION ALL SELECT * FROM cte ORDER BY ... LIMIT ...
```
- Counts the entire result set (slow when it spans millions of rows)
- Reads the CTE result twice (once to count, once to build the page)

**Solution**: Use a window function or a separate count query:
```sql
-- Option 1: Window function (available since PostgreSQL 8.4)
-- Note: the total is computed over all matching rows before LIMIT applies
SELECT *, COUNT(*) OVER() AS total FROM cte ORDER BY ... LIMIT ...

-- Option 2: Separate count query (when an approximate or capped total is acceptable)
-- pg_stat_user_tables.n_live_tup only estimates the whole table, not a filtered count
```

**Impact**: Estimated 2-10x faster pagination. The window function removes the second pass but still counts every matching row.

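One way to implement the "separate count" option is a capped count. This is not what the source implements. The query stops after `limit + 1` rows, so very active accounts cost a bounded amount of work. The sketch assumes `database/sql` and `uint64` account IDs. `cappedCountSQL`, `totalLabel` and `cappedTotal` are hypothetical names.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// cappedCountSQL counts at most $2 matching rows; the inner LIMIT lets
// the scan stop early instead of visiting every row for the account.
const cappedCountSQL = `
SELECT count(*) FROM (
	SELECT 1 FROM el_transactions WHERE from_id = $1 LIMIT $2
) AS capped`

// totalLabel shows the exact count up to limit, and "limit+" when the
// capped query found more rows than that.
func totalLabel(count, limit int64) string {
	if count > limit {
		return fmt.Sprintf("%d+", limit)
	}
	return fmt.Sprintf("%d", count)
}

// cappedTotal counts up to limit+1 rows for one account and labels the result.
func cappedTotal(ctx context.Context, db *sql.DB, accountID uint64, limit int64) (string, error) {
	var count int64
	err := db.QueryRowContext(ctx, cappedCountSQL, accountID, limit+1).Scan(&count)
	if err != nil {
		return "", err
	}
	return totalLabel(count, limit), nil
}
```

A page could then show "10000+" for very active accounts while small accounts still get exact totals.
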
### 5. **Index Maintenance** ⚠️ MEDIUM
**Problem**: With billions of rows:
- Indexes become huge (hundreds of GB)
- VACUUM takes hours
- Plain `REINDEX` blocks writes

**Solution**:
- Use `CREATE INDEX CONCURRENTLY` (and `REINDEX CONCURRENTLY`, PostgreSQL 12+) so index builds do not block writes
- Run regular `VACUUM ANALYZE` on the large tables (or per partition, if partitioning is adopted later)
- Consider `pg_partman` for automatic partition maintenance if the tables are partitioned
- Track index size with `pg_relation_size()` and usage with `pg_stat_user_indexes`; measure bloat with the `pgstattuple` extension's `pgstatindex()`

### 6. **Account Update Batching** ⚠️ MEDIUM
**Problem**: `UpdateElAccountsLastNonce` loops with individual UPDATEs:
```go
for _, account := range accounts {
	dbTx.Exec("UPDATE el_accounts SET ... WHERE id = $3", ...)
}
```

**Solution**: Use a single batched UPDATE joined against a VALUES list:
```sql
-- Casting the first row fixes the VALUES column types; bigint is an assumption, match the el_accounts schema
UPDATE el_accounts AS a SET
	last_nonce = v.nonce,
	last_block_uid = v.block_uid
FROM (VALUES ($1::bigint, $2::bigint, $3::bigint), ($4, $5, $6), ...) AS v(id, nonce, block_uid)
WHERE a.id = v.id
```

**Impact**: Estimated 10-50x faster account updates

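Here is a sketch of building that statement in Go. `accountNonce` and `buildNonceUpdate` are hypothetical, and the `::bigint` casts assume the column types. Only the first row is cast, because PostgreSQL resolves each VALUES column to a single type across all rows.

```go
package main

import (
	"fmt"
	"strings"
)

// accountNonce is a hypothetical row shape for one account update.
type accountNonce struct {
	ID, Nonce, BlockUID uint64
}

// buildNonceUpdate renders one UPDATE ... FROM (VALUES ...) statement and
// its flat argument list, three parameters per account.
func buildNonceUpdate(rows []accountNonce) (string, []any) {
	var b strings.Builder
	args := make([]any, 0, 3*len(rows))
	b.WriteString("UPDATE el_accounts AS a SET last_nonce = v.nonce, last_block_uid = v.block_uid FROM (VALUES ")
	for i, r := range rows {
		if i > 0 {
			b.WriteString(", ")
		}
		p := 3*i + 1 // first placeholder number for this row
		if i == 0 {
			fmt.Fprintf(&b, "($%d::bigint, $%d::bigint, $%d::bigint)", p, p+1, p+2)
		} else {
			fmt.Fprintf(&b, "($%d, $%d, $%d)", p, p+1, p+2)
		}
		args = append(args, r.ID, r.Nonce, r.BlockUID)
	}
	b.WriteString(") AS v(id, nonce, block_uid) WHERE a.id = v.id")
	return b.String(), args
}
```

If an `id` appears twice in the list, `UPDATE ... FROM` applies only one of the matching rows, and which one is unpredictable. Duplicates should therefore be collapsed to the latest nonce first. With three parameters per row, the 65,535 bind-parameter limit caps one statement at 21,845 accounts.
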
## Implemented Improvements

### ✅ **Batched Cleanup** - `DeleteElDataBeforeBlockUid()`
- Now uses batched deletes internally (50k rows per batch)
- Commits between batches to avoid long-running transactions
- No single long transaction holds up other operations

### ✅ **Composite Indexes Added**
- `el_transactions_from_block_idx` - (from_id, block_uid DESC)
- `el_transactions_to_block_idx` - (to_id, block_uid DESC)
- `el_token_transfers_token_block_idx` - (token_id, block_uid DESC)
- `el_token_transfers_from_block_idx` - (from_id, block_uid DESC)
- `el_token_transfers_to_block_idx` - (to_id, block_uid DESC)

### ✅ **Optimized Pagination Queries**
- Replaced the UNION ALL pattern with window functions (`COUNT(*) OVER()`)
- One pass over the result set instead of two
- All pagination queries updated:
  - `GetElTransactionsByAccountID()`
  - `GetElTransactionsByAccountIDCombined()`
  - `GetElTokenTransfersByTokenID()`
  - `GetElTokenTransfersByAccountID()`
  - `GetElTokenTransfersByAccountIDCombined()`

### ✅ **Batch Account Updates** - `UpdateElAccountsLastNonce()`
- Now uses a VALUES clause to issue one batched UPDATE
- Estimated 10-50x faster than individual UPDATEs

## Recommended Actions (Priority Order)

### Immediate (Before Production)
1. ✅ **Fix cleanup strategy** - `DeleteElDataBeforeBlockUid()` now uses batching
2. ✅ **Add composite indexes** - Migration script ready
3. ✅ **Optimize pagination queries** - Window functions implemented

### Short-term (First Month)
4. ⚠️ **Batch transaction inserts** - Collect per block, insert in batches (needs indexer changes)
5. ✅ **Batch account updates** - `UpdateElAccountsLastNonce()` now issues one VALUES-based UPDATE per batch

### Long-term (Ongoing)
6. ✅ **Monitoring** - Track query performance, index bloat, VACUUM times
7. ✅ **Connection pooling** - Use pgbouncer for read replicas
8. ⚠️ **Consider partitioning** - If performance degrades further (not implemented per request)

## PostgreSQL Configuration Tuning

For this scale, tune PostgreSQL along these lines. The memory values assume a dedicated server with about 128 GB of RAM (32 GB = 25%) and SSD storage:

```ini
# postgresql.conf
shared_buffers = 32GB # 25% of RAM
effective_cache_size = 96GB # 75% of RAM
maintenance_work_mem = 4GB # For VACUUM/CREATE INDEX/REINDEX
work_mem = 256MB # Per sort/hash node; one query can use several, across many connections
max_worker_processes = 16 # Caps max_parallel_workers (default 8; restart required)
max_parallel_workers_per_gather = 4
max_parallel_workers = 16
wal_buffers = 64MB
checkpoint_completion_target = 0.9
random_page_cost = 1.1 # For SSD
effective_io_concurrency = 200 # For SSD

# Partitioning (on by default; only relevant if tables are partitioned)
enable_partition_pruning = on
```

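The RAM-based values above come from simple proportions, and `work_mem` is a per-node limit rather than a global one. Here is a small sketch of that arithmetic for a different server size. The helper names and the connection and node counts are hypothetical inputs, not measurements.

```go
package main

import "fmt"

// memoryPlan holds the RAM-proportional settings used in the config above.
type memoryPlan struct {
	SharedBuffersGB, EffectiveCacheGB int
}

// planMemory applies the 25% / 75% rules of thumb from the config comments.
func planMemory(ramGB int) memoryPlan {
	return memoryPlan{SharedBuffersGB: ramGB / 4, EffectiveCacheGB: ramGB * 3 / 4}
}

// worstCaseWorkMemGB estimates peak work_mem use if every connection runs
// nodesPerQuery sort/hash nodes at the same time.
func worstCaseWorkMemGB(connections, nodesPerQuery, workMemMB int) float64 {
	return float64(connections*nodesPerQuery*workMemMB) / 1024
}

// describe renders the two RAM-derived settings for a given server size.
func describe(ramGB int) string {
	p := planMemory(ramGB)
	return fmt.Sprintf("shared_buffers = %dGB, effective_cache_size = %dGB", p.SharedBuffersGB, p.EffectiveCacheGB)
}
```

For example, with 100 connections (PostgreSQL's default `max_connections`) each running two sort or hash nodes, `work_mem = 256MB` could claim about 50 GB in the worst case.
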
## Query Performance Estimates

| Operation | Before | After Improvements | Improvement |
|-----------|--------|-------------------|-------------|
| INSERT (1M rows) | ~5-10 min | ~5-10 min | 1x (batching not implemented) |
| SELECT by account_id | ~5-30 sec | ~100-500ms | 50-100x (composite indexes) |
| Pagination queries | ~2-10 sec | ~200ms-1s | 10-50x (window functions) |
| DELETE old data | ~hours | ~minutes | 10-100x (batched) |
| Account batch update | ~10-50 sec | ~1-5 sec | 10-50x (VALUES clause) |
| VACUUM | ~days | ~days | 1x (no partitioning) |

## Migration Steps

1. **Apply composite indexes**:
   ```sql
   -- Run the updated migration: db/schema/pgsql/20260104000000_el-explorer.sql
   -- This adds the composite indexes
   ```
   Migration tools typically do not re-run a file that a database has already applied. On such databases, create the indexes separately, for example with `CREATE INDEX CONCURRENTLY`.

2. **Cleanup function** - Already improved to use batching by default
   - `DeleteElDataBeforeBlockUid()` now uses batched deletes internally (50k rows per batch)
   - Note: the `dbTx` parameter is ignored because batching has to commit its own transactions. As a result, the cleanup is no longer atomic with the caller's transaction.

3. **Account update function** - Already improved to use a batched VALUES clause
   - `UpdateElAccountsLastNonce()` now uses an efficient batch update

4. **Pagination queries** - Already updated with window functions; no code changes needed