Commit 9561094

feat(migrate): optimize doc enumeration and simplify CLI
Document Enumeration Optimization:
- Use FT.AGGREGATE WITHCURSOR for efficient key enumeration
- Fall back to SCAN only when the index has hash_indexing_failures
- Pre-enumerate keys before drop for reliable re-indexing

CLI Simplification:
- Remove redundant --allow-downtime flag from apply/batch-apply
- Plan review is now the safety mechanism

Batch Migration:
- Add BatchMigrationExecutor and BatchMigrationPlanner
- Support for multi-index migration with failure policies
- Resumable batch operations with state persistence

Bug Fixes:
- Fix mypy type errors in planner, wizard, validation, and CLI

Documentation:
- Update concepts and how-to guides for new workflow
- Remove --allow-downtime references from all docs
1 parent 61c6e80 commit 9561094

File tree

13 files changed (+2920, -168 lines)

docs/concepts/index-migrations.md

Lines changed: 24 additions & 8 deletions
@@ -56,7 +56,7 @@ The process:
 4. Wait for Redis to re-index the existing documents
 5. Validate the result
 
-**Tradeoff**: The index is unavailable during the rebuild. The migrator requires explicit acknowledgment of this downtime before proceeding.
+**Tradeoff**: The index is unavailable during the rebuild. Review the migration plan carefully before applying.
 
 ## Index only vs document dependent changes
 
@@ -130,7 +130,16 @@ Adding a vector field means all existing documents need vectors for that field.
 
 ## Downtime considerations
 
-With `drop_recreate`, your index is unavailable between the drop and when re-indexing completes. Plan for:
+With `drop_recreate`, your index is unavailable between the drop and when re-indexing completes.
+
+**CRITICAL**: Downtime requires both reads AND writes to be paused:
+
+| Requirement | Reason |
+|-------------|--------|
+| **Pause reads** | Index is unavailable during migration |
+| **Pause writes** | Redis updates indexes synchronously. Writes during migration may conflict with vector re-encoding or be missed |
+
+Plan for:
 
 - Search unavailability during the migration window
 - Partial results while indexing is in progress
@@ -151,8 +160,9 @@ The migration workflow has distinct phases. Here is what each mode affects:
 |-------|-----------|------------|-------|
 | **Plan generation** | `MigrationPlanner.create_plan()` | `AsyncMigrationPlanner.create_plan()` | Reads index metadata from Redis |
 | **Schema snapshot** | Sync Redis calls | Async Redis calls | Single `FT.INFO` command |
+| **Enumeration** | FT.AGGREGATE (or SCAN fallback) | FT.AGGREGATE (or SCAN fallback) | Before drop, only if quantization needed |
 | **Drop index** | `index.delete()` | `await index.delete()` | Single `FT.DROPINDEX` command |
-| **Quantization** | Sequential SCAN + HSET | Pipelined SCAN + batched HSET | See below |
+| **Quantization** | Sequential HGET + HSET | Pipelined HGET + batched HSET | Uses pre-enumerated keys |
 | **Create index** | `index.create()` | `await index.create()` | Single `FT.CREATE` command |
 | **Readiness polling** | `time.sleep()` loop | `asyncio.sleep()` loop | Polls `FT.INFO` until indexed |
 | **Validation** | Sync Redis calls | Async Redis calls | Schema and doc count checks |
@@ -177,13 +187,13 @@ Async execution (`--async` flag) provides benefits in specific scenarios:
 
 Converting float32 to float16 requires reading every vector, converting it, and writing it back. The async executor:
 
-- Uses `SCAN` with `COUNT 500` to iterate keys without blocking Redis (per [Redis SCAN docs](https://redis.io/docs/latest/commands/scan/), SCAN is O(1) per call)
+- Enumerates documents using `FT.AGGREGATE WITHCURSOR` for index-specific enumeration (falls back to `SCAN` only if indexing failures exist)
 - Pipelines `HSET` operations in batches (100-1000 operations per pipeline is optimal for Redis)
 - Yields to the event loop between batches so other tasks can proceed
 
 **Large keyspaces (40M+ keys)**
 
-When your Redis instance has many keys, `SCAN` iteration can take minutes. Async mode yields between batches.
+When your Redis instance has many keys and the index has indexing failures (requiring SCAN fallback), async mode yields between batches.
 
 **Async application integration**
 
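The float32 → float16 re-encoding mentioned in this hunk is a small transformation in itself. As a rough sketch using only the standard library's half-float support (`quantize_blob` is an illustrative name, not redisvl's actual helper; the real executor presumably uses NumPy on raw HASH field bytes):

```python
import struct

def quantize_blob(blob: bytes) -> bytes:
    """Re-encode a raw float32 vector blob as float16 (struct format 'e').

    Lossy: float16 keeps roughly 3 significant decimal digits, which is
    why quantization cannot be reverted.
    """
    count = len(blob) // 4  # float32 is 4 bytes per component
    floats = struct.unpack(f"<{count}f", blob)
    return struct.pack(f"<{count}e", *floats)  # float16 is 2 bytes each

# A 4-dim float32 vector occupies 16 bytes; its float16 copy occupies 8.
src = struct.pack("<4f", 0.25, -1.5, 3.0, 0.0)
half = quantize_blob(src)
assert len(half) == len(src) // 2
```

Halving every vector's storage is where the migration's space savings come from, and also why the CLI requires acknowledging data loss.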
@@ -205,12 +215,18 @@ asyncio.run(migrate())
 
 ### Why async helps with quantization
 
-The key difference is in the vector re-encoding loop:
+The migrator uses an optimized enumeration strategy:
+
+1. **Index-based enumeration**: Uses `FT.AGGREGATE WITHCURSOR` to enumerate only indexed documents (not the entire keyspace)
+2. **Fallback for safety**: If the index has indexing failures (`hash_indexing_failures > 0`), falls back to `SCAN` to ensure completeness
+3. **Enumerate before drop**: Captures the document list while the index still exists, then drops and quantizes
+
+This optimization provides 10-1000x speedup for sparse indexes (where only a small fraction of prefix-matching keys are indexed).
 
 **Sync quantization:**
 ```
+enumerate keys (FT.AGGREGATE or SCAN) -> store list
 for each batch of 500 keys:
-    SCAN (blocks) -> get keys
     for each key:
         HGET field (blocks)
         convert array
@@ -220,8 +236,8 @@ for each batch of 500 keys:
 
 **Async quantization:**
 ```
+enumerate keys (FT.AGGREGATE or SCAN) -> store list
 for each batch of 500 keys:
-    await SCAN -> get keys (yields)
     for each key:
         await HGET field (yields)
         convert array
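The three-step enumeration strategy this hunk describes can be sketched in Python. This is an illustrative sketch, not redisvl's actual code: it issues raw `FT.INFO` / `FT.AGGREGATE` / `FT.CURSOR READ` commands, the helper names are made up, and it assumes a redis-py-style client (with `decode_responses=True`) exposing `execute_command` and `scan_iter`:

```python
def reply_to_dict(flat):
    """Replies like FT.INFO come back as a flat [key, value, ...] list."""
    return dict(zip(flat[::2], flat[1::2]))

def keys_from_rows(rows):
    """Pull @__key out of FT.AGGREGATE result rows; rows[0] is the match count."""
    return [reply_to_dict(row)["__key"] for row in rows[1:]]

def enumerate_keys(client, index, prefix):
    """Yield document keys for `index`, preferring FT.AGGREGATE WITHCURSOR.

    Falls back to a keyspace SCAN when the index reports
    hash_indexing_failures, since failed docs are missing from the index.
    """
    info = reply_to_dict(client.execute_command("FT.INFO", index))
    if int(info.get("hash_indexing_failures", 0)) > 0:
        # Completeness over speed: walk every key matching the prefix.
        yield from client.scan_iter(match=prefix + "*", count=500)
        return

    # Index-based enumeration: only documents the index actually contains.
    reply = client.execute_command(
        "FT.AGGREGATE", index, "*",
        "LOAD", "1", "@__key",
        "WITHCURSOR", "COUNT", "500",
    )
    while True:
        rows, cursor = reply  # WITHCURSOR replies are [results, cursor_id]
        yield from keys_from_rows(rows)
        if cursor == 0:
            break
        reply = client.execute_command("FT.CURSOR", "READ", index, cursor)
```

Because the aggregate query touches only indexed documents, a sparse index (few indexed docs among many prefix-matching keys) is enumerated without scanning the whole keyspace, which is where the claimed speedup comes from.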

docs/user_guide/cli.ipynb

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@
 "| `rvl migrate` | `helper` or `list` | show migration guidance and list indexes available for migration|\n",
 "| `rvl migrate` | `wizard` | interactively build a migration plan and schema patch|\n",
 "| `rvl migrate` | `plan` | generate `migration_plan.yaml` from a patch or target schema|\n",
-"| `rvl migrate` | `apply --allow-downtime` | execute a reviewed `drop_recreate` migration|\n",
+"| `rvl migrate` | `apply` | execute a reviewed `drop_recreate` migration|\n",
 "| `rvl migrate` | `validate` | validate a completed migration and emit report artifacts|"
 ]
},

docs/user_guide/how_to_guides/migrate-indexes.md

Lines changed: 48 additions & 17 deletions
@@ -21,7 +21,7 @@ rvl migrate list --url redis://localhost:6379
 rvl migrate wizard --index myindex --url redis://localhost:6379
 
 # 3. Apply the migration
-rvl migrate apply --plan migration_plan.yaml --allow-downtime --url redis://localhost:6379
+rvl migrate apply --plan migration_plan.yaml --url redis://localhost:6379
 
 # 4. Verify the result
 rvl migrate validate --plan migration_plan.yaml --url redis://localhost:6379
@@ -266,14 +266,45 @@ merged_target_schema:
 - `diff_classification.blocked_reasons` - Must be empty
 - `merged_target_schema` - The final schema after migration
 
+## Understanding Downtime Requirements
+
+**CRITICAL**: During a `drop_recreate` migration, your application must:
+
+| Requirement | Description |
+|-------------|-------------|
+| **Pause reads** | Index is unavailable during migration |
+| **Pause writes** | Writes during migration may be missed or cause conflicts |
+
+### Why Both Reads AND Writes Must Be Paused
+
+- **Reads**: The index definition is dropped and recreated. Any queries during this window will fail.
+- **Writes**: Redis updates indexes synchronously on every write. If your app writes documents while the index is dropped, those writes are not indexed. Additionally, if you're quantizing vectors (float32 → float16), concurrent writes may conflict with the migration's re-encoding process.
+
+### What "Downtime" Means
+
+| Downtime Type | Reads | Writes | Safe? |
+|---------------|-------|--------|-------|
+| Full quiesce (recommended) | Stopped | Stopped | **YES** |
+| Read-only pause | Stopped | Continuing | **NO** |
+| Active | Active | Active | **NO** |
+
+### Recovery from Interrupted Migration
+
+| Interruption Point | Documents | Index | Recovery |
+|--------------------|-----------|-------|----------|
+| After drop, before quantize | Unchanged | **None** | Re-run apply |
+| After quantization, before create | Quantized | **None** | Manual FT.CREATE or re-run apply |
+| After create | Correct | Rebuilding | Wait for index ready |
+
+The underlying documents are **never deleted** by `drop_recreate` mode.
+
 ## Step 4: Apply the Migration
 
-The `apply` command requires `--allow-downtime` since the index will be temporarily unavailable.
+The `apply` command executes the migration. The index will be temporarily unavailable during the drop-recreate process.
 
 ```bash
 rvl migrate apply \
   --plan migration_plan.yaml \
-  --allow-downtime \
   --url redis://localhost:6379 \
   --report-out migration_report.yaml \
   --benchmark-out benchmark_report.yaml
@@ -296,14 +327,13 @@ For large migrations (especially those involving vector quantization), use the `
 ```bash
 rvl migrate apply \
   --plan migration_plan.yaml \
-  --allow-downtime \
   --async \
   --url redis://localhost:6379
 ```
 
 **What becomes async:**
 
-- Keyspace SCAN during quantization (yields between batches of 500 keys)
+- Document enumeration during quantization (uses `FT.AGGREGATE WITHCURSOR` for index-specific enumeration, falling back to SCAN only if indexing failures exist)
 - Vector read/write operations (pipelined HGET/HSET)
 - Index readiness polling (uses `asyncio.sleep()` instead of blocking)
 - Validation checks
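The pipelined, yielding behavior listed above can be sketched roughly as follows. This is an illustrative sketch assuming a `redis.asyncio`-style client exposing `pipeline()`; the function and parameter names are not redisvl's:

```python
import asyncio

def batches(seq, size=500):
    """Yield fixed-size chunks of a pre-enumerated key list."""
    for i in range(0, len(seq), size):
        yield seq[i : i + size]

async def quantize_all(client, keys, field, convert):
    """Pipeline HGET/HSET per batch, yielding to the event loop between batches."""
    for chunk in batches(list(keys)):
        # One round trip reads up to 500 vectors.
        pipe = client.pipeline(transaction=False)
        for key in chunk:
            pipe.hget(key, field)
        blobs = await pipe.execute()

        # One round trip writes the re-encoded vectors back.
        pipe = client.pipeline(transaction=False)
        for key, blob in zip(chunk, blobs):
            if blob is not None:
                pipe.hset(key, field, convert(blob))
        await pipe.execute()

        # Explicitly yield so other application tasks can run between batches.
        await asyncio.sleep(0)
```

Batching keeps round trips low while the `asyncio.sleep(0)` between batches is what lets a co-hosted application keep making progress during a long migration.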
@@ -388,7 +418,6 @@ rvl migrate validate \
 - `--url` : Redis connection URL
 - `--index` : Index name to migrate
 - `--plan` / `--plan-out` : Path to migration plan
-- `--allow-downtime` : Acknowledge index unavailability (required for apply)
 - `--async` : Use async executor for large migrations (apply only)
 - `--report-out` : Path for validation report
 - `--benchmark-out` : Path for performance metrics
@@ -504,7 +533,6 @@ rvl migrate batch-plan \
 # 3. Apply the batch plan
 rvl migrate batch-apply \
   --plan batch_plan.yaml \
-  --allow-downtime \
   --accept-data-loss \
   --url redis://localhost:6379
 
@@ -587,26 +615,31 @@ created_at: "2026-03-20T10:00:00Z"
 # Apply with fail-fast (default: stop on first error)
 rvl migrate batch-apply \
   --plan batch_plan.yaml \
-  --allow-downtime \
   --accept-data-loss \
   --url redis://localhost:6379
 
-# Apply with continue-on-error (process all possible indexes)
+# Apply with continue-on-error (set at batch-plan time)
+# Note: failure_policy is set during batch-plan, not batch-apply
+rvl migrate batch-plan \
+  --pattern "*_idx" \
+  --schema-patch quantize_patch.yaml \
+  --failure-policy continue_on_error \
+  --output batch_plan.yaml \
+  --url redis://localhost:6379
+
 rvl migrate batch-apply \
   --plan batch_plan.yaml \
-  --allow-downtime \
   --accept-data-loss \
-  --failure-policy continue_on_error \
   --url redis://localhost:6379
 ```
 
-**Flags:**
-- `--allow-downtime` : Required (each index is temporarily unavailable during migration)
+**Flags for batch-apply:**
 - `--accept-data-loss` : Required when quantizing vectors (float32 → float16 is lossy)
-- `--failure-policy` : `fail_fast` (default) or `continue_on_error`
 - `--state` : Path to checkpoint file (default: `batch_state.yaml`)
 - `--report-dir` : Directory for per-index reports (default: `./reports/`)
 
+**Note:** `--failure-policy` is set during `batch-plan`, not `batch-apply`. The policy is stored in the batch plan file.
+
 ### Resume After Failure
 
 Batch migration automatically checkpoints progress. If interrupted:
@@ -615,14 +648,12 @@ Batch migration automatically checkpoints progress. If interrupted:
 # Resume from where it left off
 rvl migrate batch-resume \
   --state batch_state.yaml \
-  --allow-downtime \
   --url redis://localhost:6379
 
 # Retry previously failed indexes
 rvl migrate batch-resume \
   --state batch_state.yaml \
   --retry-failed \
-  --allow-downtime \
   --url redis://localhost:6379
 ```

@@ -686,7 +717,7 @@ from redisvl.migration import BatchMigrationPlanner, BatchMigrationExecutor
 
 # Create batch plan
 planner = BatchMigrationPlanner()
-batch_plan = planner.create_plan(
+batch_plan = planner.create_batch_plan(
     redis_url="redis://localhost:6379",
     pattern="*_idx",
     schema_patch_path="quantize_patch.yaml",

redisvl/cli/migrate.py

Lines changed: 7 additions & 30 deletions
@@ -19,7 +19,6 @@
     load_yaml,
     write_benchmark_report,
     write_migration_report,
-    write_yaml,
 )
 from redisvl.migration.wizard import MigrationWizard
 from redisvl.utils.log import get_logger
@@ -105,7 +104,7 @@ def helper(self):
     rvl migrate list                        List all indexes
     rvl migrate wizard --index <name>       Guided migration builder
     rvl migrate plan --index <name> --schema-patch <patch.yaml>
-    rvl migrate apply --plan <plan.yaml> --allow-downtime
+    rvl migrate apply --plan <plan.yaml>
     rvl migrate validate --plan <plan.yaml>"""
         )

@@ -211,16 +210,11 @@ def wizard(self):
 def apply(self):
     parser = argparse.ArgumentParser(
         usage=(
-            "rvl migrate apply --plan <migration_plan.yaml> --allow-downtime "
+            "rvl migrate apply --plan <migration_plan.yaml> "
             "[--async] [--report-out <migration_report.yaml>]"
         )
     )
     parser.add_argument("--plan", help="Path to migration_plan.yaml", required=True)
-    parser.add_argument(
-        "--allow-downtime",
-        help="Explicitly acknowledge downtime for drop_recreate",
-        action="store_true",
-    )
     parser.add_argument(
         "--async",
         dest="use_async",
@@ -245,11 +239,6 @@ def apply(self):
     parser = add_redis_connection_options(parser)
     args = parser.parse_args(sys.argv[3:])
 
-    if not args.allow_downtime:
-        raise ValueError(
-            "apply requires --allow-downtime for drop_recreate migrations"
-        )
-
     redis_url = create_redis_url(args)
     plan = load_migration_plan(args.plan)

@@ -271,7 +260,7 @@ def _apply_sync(self, plan, redis_url: str, query_check_file: Optional[str]):
 
     print(f"\nApplying migration to '{plan.source.index_name}'...")
 
-    def progress_callback(step: str, detail: str) -> None:
+    def progress_callback(step: str, detail: Optional[str]) -> None:
         step_labels = {
             "drop": "[1/5] Drop index",
             "quantize": "[2/5] Quantize vectors",
@@ -301,7 +290,7 @@ async def _apply_async(self, plan, redis_url: str, query_check_file: Optional[st
 
     print(f"\nApplying migration to '{plan.source.index_name}' (async mode)...")
 
-    def progress_callback(step: str, detail: str) -> None:
+    def progress_callback(step: str, detail: Optional[str]) -> None:
         step_labels = {
             "drop": "[1/5] Drop index",
             "quantize": "[2/5] Quantize vectors",
@@ -427,9 +416,7 @@ def _print_plan_summary(self, plan_out: str, plan) -> None:
 
     print("\nNext steps:")
     print(f"  Review the plan: cat {plan_out}")
-    print(
-        f"  Apply the migration: rvl migrate apply --plan {plan_out} --allow-downtime"
-    )
+    print(f"  Apply the migration: rvl migrate apply --plan {plan_out}")
     print(f"  Validate the result: rvl migrate validate --plan {plan_out}")
     print(
         f"\nTo add more changes: rvl migrate wizard --index {plan.source.index_name} --patch schema_patch.yaml"
@@ -518,16 +505,11 @@ def batch_apply(self):
     """Execute a batch migration plan with checkpointing."""
     parser = argparse.ArgumentParser(
         usage=(
-            "rvl migrate batch-apply --plan <batch_plan.yaml> --allow-downtime "
+            "rvl migrate batch-apply --plan <batch_plan.yaml> "
             "[--state <batch_state.yaml>] [--report-dir <./reports>]"
         )
     )
     parser.add_argument("--plan", help="Path to batch_plan.yaml", required=True)
-    parser.add_argument(
-        "--allow-downtime",
-        help="Explicitly acknowledge downtime for drop_recreate",
-        action="store_true",
-    )
     parser.add_argument(
         "--accept-data-loss",
         help="Acknowledge that quantization is lossy and cannot be reverted",
@@ -546,11 +528,6 @@ def batch_apply(self):
     parser = add_redis_connection_options(parser)
     args = parser.parse_args(sys.argv[3:])
 
-    if not args.allow_downtime:
-        raise ValueError(
-            "batch-apply requires --allow-downtime for drop_recreate migrations"
-        )
-
     # Load batch plan
     from redisvl.migration.models import BatchPlan
 
@@ -701,7 +678,7 @@ def _print_batch_plan_summary(self, plan_out: str, batch_plan) -> None:
     f"""
 Next steps:
   Review the plan: cat {plan_out}
-  Apply the migration: rvl migrate batch-apply --plan {plan_out} --allow-downtime"""
+  Apply the migration: rvl migrate batch-apply --plan {plan_out}"""
     )
 
     if batch_plan.requires_quantization:
