Commit b4ea857
feat(migrator): streaming batch approach for large schema migrations
- Add `batch_size` config to control subjects fetched per batch (default: 100)
- Add `workers` config for parallel sync within batches (default: 10)
- Stream schemas: fetch batch → sync batch → report progress → repeat
- Sample first batch to detect schema references
- Only sort subjects by schema ID when references are detected
- Force sequential processing when references exist to preserve ordering
- Real-time progress logging with synced/skipped counts
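
The fetch → sync → report loop described above can be sketched roughly as follows. This is a minimal illustration, not the migrator's actual code: `streamSync`, `Stats`, and the `fetch` and `syncOne` callbacks are hypothetical names, and only the defaults (100 subjects per batch, 10 workers) come from this commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// Stats holds the running counters reported after every batch.
type Stats struct{ Synced, Skipped int }

// streamSync is a hypothetical sketch of the streaming approach.
// fetch returns at most limit subjects starting at offset (empty when done);
// syncOne copies one subject and returns false if it was skipped.
func streamSync(fetch func(offset, limit int) []string, syncOne func(subject string) bool, batchSize, workers int) Stats {
	if batchSize <= 0 {
		batchSize = 100 // default batch_size from the commit message
	}
	if workers <= 0 {
		workers = 10 // default workers from the commit message
	}
	var stats Stats
	for offset := 0; ; offset += batchSize {
		// Only the current batch is held in memory.
		batch := fetch(offset, batchSize)
		if len(batch) == 0 {
			return stats
		}
		var (
			mu  sync.Mutex
			wg  sync.WaitGroup
			sem = make(chan struct{}, workers) // bounds concurrency within the batch
		)
		for _, subject := range batch {
			wg.Add(1)
			sem <- struct{}{}
			go func(s string) {
				defer wg.Done()
				defer func() { <-sem }()
				ok := syncOne(s)
				mu.Lock()
				defer mu.Unlock()
				if ok {
					stats.Synced++
				} else {
					stats.Skipped++
				}
			}(subject)
		}
		wg.Wait() // finish the whole batch before fetching the next one
		fmt.Printf("progress: synced=%d skipped=%d\n", stats.Synced, stats.Skipped)
	}
}
```

Waiting for each batch to finish before fetching the next keeps memory bounded by `batch_size`, at the cost of some idle workers near the end of every batch.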
Ordering guarantees:
- If references detected: subjects sorted by schema ID, sequential processing
- If no references: parallel processing safe with fixed IDs (translate_ids=false)
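
As an illustration of the ordering rule above, the sketch below samples the first batch for references, then either sorts by schema ID and forces a single worker, or leaves the order and worker count untouched. `Subject`, `hasReferences`, and `planSync` are hypothetical names, and the sketch assumes a referenced schema was registered before the schema that references it and therefore has a lower ID.

```go
package main

import "sort"

// Subject is a hypothetical, simplified view of one registered schema.
type Subject struct {
	Name       string
	SchemaID   int
	References []string // names of subjects this schema depends on
}

// hasReferences reports whether any subject in the sample declares references.
func hasReferences(sample []Subject) bool {
	for _, s := range sample {
		if len(s.References) > 0 {
			return true
		}
	}
	return false
}

// planSync returns the processing order and the effective worker count.
// Only the first sampleSize subjects are inspected for references.
func planSync(subjects []Subject, sampleSize, workers int) ([]Subject, int) {
	if sampleSize < 0 {
		sampleSize = 0
	}
	if sampleSize > len(subjects) {
		sampleSize = len(subjects)
	}
	if !hasReferences(subjects[:sampleSize]) {
		// No references seen: keep the order and sync in parallel.
		return subjects, workers
	}
	// References seen: sort a copy by schema ID and process sequentially.
	ordered := append([]Subject(nil), subjects...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].SchemaID < ordered[j].SchemaID
	})
	return ordered, 1
}
```

Because only a sample is inspected, references that first appear in a later batch would not be detected by this sketch; how the real migrator handles that case is not stated in the commit message.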
Memory optimization:
- Only hold one batch in memory at a time
- Avoid 2x API calls in common case (no references)
- Tracking maps grow with synced schemas (unavoidable for deduplication)

1 parent f789836 commit b4ea857
3 files changed: +428 -27 lines changed
- docs/modules/components/pages/outputs
- internal/impl/redpanda/migrator

Lines changed: 22 additions & 0 deletions
| Original file line numbers | Diff line numbers | Diff line change |
|---|---|---|
| 53-58 | 53-60 | 2 lines added (56-57) |
| 139-144 | 141-148 | 2 lines added (144-145) |
| 1425-1430 | 1429-1452 | 18 lines added (1432-1449) |