redpanda/migrator: optimize schema registry sync with parallel processing by mmatczuk · Pull Request #3951 · redpanda-data/connect

mmatczuk · 2026-02-02T15:41:52Z

Improves schema registry migration performance by adding configurable parallel processing. Large schema registries with thousands of schemas now migrate significantly faster.

Changes

Parallel subject processing - Added max_parallel_http_requests config option (default: 10) to control worker pool size for concurrent schema migration
Memory optimization - Converted schema loading from batch to streaming iterator to handle large registries without loading all schemas into memory
Correct dependency order - Implemented DFS traversal to ensure schema references are migrated in proper dependency order, matching migrator v1 behavior
Deduplication - Each schema version is now fetched exactly once during sync loop
Load balancing - Subject order is shuffled to distribute work evenly across workers
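The load-balancing item can be illustrated with a small sketch. The PR states only that subject order is shuffled. The helper name `shuffledSubjects` is hypothetical, and so is the motivating scenario: if the source registry happens to list related subjects next to each other (for example, several subjects sharing one large reference tree), handing them out in order can leave one worker with most of the heavy work.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffledSubjects (hypothetical helper) returns a shuffled copy of
// subjects and leaves the caller's slice untouched. A random order makes
// it less likely that one worker gets a long run of expensive subjects
// while the other workers sit idle.
func shuffledSubjects(subjects []string) []string {
	out := make([]string, len(subjects))
	copy(out, subjects)
	rand.Shuffle(len(out), func(i, j int) {
		out[i], out[j] = out[j], out[i]
	})
	return out
}

func main() {
	subjects := []string{"orders-key", "orders-value", "payments-value", "users-value"}
	fmt.Println(shuffledSubjects(subjects))
}
```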

Configuration

New optional field in schema registry config:

output:
  redpanda_migrator:
    schema_registry:
      max_parallel_http_requests: 10  # Number of concurrent workers
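How the option turns into a worker count is not shown in the PR. The sketch below assumes that an omitted field falls back to the documented default of 10 and that values below 1 are rejected. The struct `schemaRegistryConfig` and the function `workerCount` are hypothetical stand-ins, not the connector's real config types.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultMaxParallelHTTPRequests mirrors the default stated in the PR.
const defaultMaxParallelHTTPRequests = 10

// schemaRegistryConfig is a hypothetical stand-in for the parsed
// schema_registry block.
type schemaRegistryConfig struct {
	// MaxParallelHTTPRequests is nil when the field is omitted.
	MaxParallelHTTPRequests *int
}

// workerCount resolves the effective pool size: an omitted value uses the
// default, and anything below 1 is rejected (an assumed rule).
func workerCount(cfg schemaRegistryConfig) (int, error) {
	if cfg.MaxParallelHTTPRequests == nil {
		return defaultMaxParallelHTTPRequests, nil
	}
	if n := *cfg.MaxParallelHTTPRequests; n >= 1 {
		return n, nil
	}
	return 0, errors.New("max_parallel_http_requests must be at least 1")
}

func main() {
	n, err := workerCount(schemaRegistryConfig{})
	fmt.Println(n, err) // 10 <nil>
}
```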


…ma migration

Add a configurable worker pool to process subjects in parallel. Each worker uses DFS traversal to complete an entire subject tree before moving on to the next subject. Shuffle subject order to distribute load more evenly across workers.

rockwotj

LGTM

mmatczuk added 6 commits February 2, 2026 16:34

redpanda/migrator: stream schemas instead of loading all into memory

e26341f

Convert listSubjectSchemas() to return iterator for memory-efficient processing of large schema registries.
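A streaming iterator in Go can take the push-style shape `func(yield func(K, V) bool)`. The sketch assumes that shape; the PR does not show the real signature of `listSubjectSchemas()`, so the names `streamSchemas`, `subjectSchema` and `subjectVersions` are hypothetical, and `fetch` stands in for an HTTP call to the source registry. Only one fetched schema body is held at a time, and the consumer can stop early by returning `false`.

```go
package main

import "fmt"

// subjectSchema is a hypothetical record for one fetched subject version.
type subjectSchema struct {
	Subject string
	Version int
	Schema  string
}

// subjectVersions lists the versions to stream for one subject.
type subjectVersions struct {
	Subject  string
	Versions []int
}

// schemaSeq is a push-style iterator: it calls yield once per schema and
// stops as soon as yield returns false. It has the same shape as
// iter.Seq2[subjectSchema, error], so Go 1.23+ can range over it directly.
type schemaSeq func(yield func(subjectSchema, error) bool)

// streamSchemas fetches schema bodies lazily, one per yield, instead of
// collecting every schema into a slice first.
func streamSchemas(list []subjectVersions, fetch func(subject string, version int) (string, error)) schemaSeq {
	return func(yield func(subjectSchema, error) bool) {
		for _, sv := range list {
			for _, v := range sv.Versions {
				body, err := fetch(sv.Subject, v)
				if !yield(subjectSchema{Subject: sv.Subject, Version: v, Schema: body}, err) {
					return
				}
			}
		}
	}
}

func main() {
	fetch := func(string, int) (string, error) { return `"string"`, nil }
	seq := streamSchemas([]subjectVersions{{Subject: "orders-value", Versions: []int{1, 2}}}, fetch)
	seq(func(s subjectSchema, err error) bool {
		if err != nil {
			fmt.Println("fetch failed:", err)
			return false // stop streaming on the first error
		}
		fmt.Println(s.Subject, s.Version, s.Schema)
		return true
	})
}
```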

redpanda/migrator(schema registry): fetch subject version schema once in SyncLoop()

0799d2d

Add filter function to listSubjectSchemas() to fetch each schema version once. Change knownSubjects to set type since IDs are not tracked.
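A membership-only set keyed by subject and version is one way to build such a filter. In this sketch the names `subjectVersion`, `knownVersions`, `needsFetch` and `markSynced` are hypothetical. Marking a version only after a successful migration is an assumed design choice that lets a failed fetch be retried on the next sync pass.

```go
package main

import "fmt"

// subjectVersion identifies one schema version. It is comparable, so it
// can key a map used as a set.
type subjectVersion struct {
	Subject string
	Version int
}

// knownVersions is a set of versions already synced. It tracks membership
// only (no schema IDs) and is not safe for concurrent use.
type knownVersions map[subjectVersion]struct{}

// needsFetch is the filter: it reports whether a listed version has not
// been synced yet and therefore still has to be fetched.
func (k knownVersions) needsFetch(sv subjectVersion) bool {
	_, ok := k[sv]
	return !ok
}

// markSynced is called only after a version was migrated successfully.
func (k knownVersions) markSynced(sv subjectVersion) {
	k[sv] = struct{}{}
}

func main() {
	known := knownVersions{}
	listed := []subjectVersion{{"orders-value", 1}, {"orders-value", 2}}
	fetches := 0
	for pass := 0; pass < 2; pass++ { // two iterations of a sync loop
		for _, sv := range listed {
			if !known.needsFetch(sv) {
				continue
			}
			fetches++ // stand-in for fetching and migrating the schema
			known.markSynced(sv)
		}
	}
	fmt.Println(fetches) // 2
}
```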

chore(redpanda/migrator): extract checkSchemaIDConflict() method

81d126f

Fix concurrent access to knownSchemas map and improve code clarity.

chore(redpanda/migrator): pass version explicitly to listSubjectSchemas()

7d0ffbe

redpanda/migrator: implement DFS traversal for schema dependencies

2cdfa49

Process schema references depth-first to ensure versions are migrated in correct dependency order, matching migrator v1 behavior.
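Depth-first dependency ordering means emitting a schema version only after all of its references have been emitted, which is a post-order DFS. The sketch assumes references form a graph keyed by subject and version; `ref` and `migrationOrder` are hypothetical names. The shared `state` map keeps a version that is referenced from several places from being emitted twice. Cycle detection is included defensively, not because the PR mentions it.

```go
package main

import "fmt"

// ref is a hypothetical node: one subject version that may reference others.
type ref struct {
	Subject string
	Version int
}

const (
	unvisited = iota // zero value for versions not seen yet
	visiting         // on the current DFS path
	done             // already appended to the output
)

// migrationOrder appends root and its transitive references to out in DFS
// post-order, so every reference precedes the schema that uses it. state is
// shared across calls so versions handled for earlier roots are skipped.
func migrationOrder(root ref, refs map[ref][]ref, state map[ref]int, out []ref) ([]ref, error) {
	switch state[root] {
	case done:
		return out, nil
	case visiting:
		return out, fmt.Errorf("reference cycle at %s v%d", root.Subject, root.Version)
	}
	state[root] = visiting
	for _, dep := range refs[root] {
		var err error
		if out, err = migrationOrder(dep, refs, state, out); err != nil {
			return out, err
		}
	}
	state[root] = done
	return append(out, root), nil
}

func main() {
	address := ref{"address", 1}
	customer := ref{"customer", 1}
	order := ref{"order", 1}
	refs := map[ref][]ref{order: {customer, address}, customer: {address}}
	got, err := migrationOrder(order, refs, map[ref]int{}, nil)
	fmt.Println(got, err) // [{address 1} {customer 1} {order 1}] <nil>
}
```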

mmatczuk mentioned this pull request Feb 2, 2026

feat(migrator): streaming batch approach for large schema migrations #3949

Closed

mmatczuk requested review from Jeffail and vuldin February 3, 2026 10:18

redpanda/migrator: add progress logs to schema migration worker

5cd248d

mmatczuk force-pushed the mmt/migrator_max_parallel_http_requests branch from 70586b6 to 5cd248d February 3, 2026 10:49

redpanda/migrator: add lint rule for max_parallel_http_requests

8f11fcb

rockwotj approved these changes Feb 3, 2026


mmatczuk merged commit c4f27fc into main Feb 4, 2026
5 checks passed

mmatczuk deleted the mmt/migrator_max_parallel_http_requests branch February 4, 2026 09:41

mmatczuk merged 8 commits into main from mmt/migrator_max_parallel_http_requests
