
redpanda/migrator: optimize schema registry sync with parallel processing#3951

Merged

mmatczuk merged 8 commits into main from mmt/migrator_max_parallel_http_requests on Feb 4, 2026


@mmatczuk (Collaborator) commented Feb 2, 2026

Improves schema registry migration performance by adding configurable parallel processing. Large schema registries with thousands of schemas now migrate significantly faster, because independent subjects are migrated concurrently by a bounded pool of workers.

Changes

  1. Parallel subject processing - Added the max_parallel_http_requests config option (default: 10), which sets the size of the worker pool used for concurrent schema migration

  2. Memory optimization - Converted schema loading from a batch operation to a streaming iterator, so large registries are processed without loading every schema into memory at once

  3. Correct dependency order - Implemented a depth-first (DFS) traversal of schema references so that referenced schemas are migrated before the schemas that depend on them, matching migrator v1 behavior

  4. Deduplication - Each schema version is now fetched exactly once during the sync loop

  5. Load balancing - Subject order is shuffled to distribute work more evenly across workers

Configuration

New optional field in the schema_registry block of the redpanda_migrator output:

output:
  redpanda_migrator:
    schema_registry:
      max_parallel_http_requests: 10  # Number of concurrent workers

Convert listSubjectSchemas() to return an iterator for memory-efficient
processing of large schema registries.
… in SyncLoop()

Add a filter function to listSubjectSchemas() so each schema version is
fetched only once. Change knownSubjects to a set type, since IDs are not tracked.
Fix concurrent access to the knownSchemas map and improve code clarity.
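The deduplication and concurrent-map fixes can be pictured as a mutex-guarded set whose check-and-insert is a single atomic step. This is a minimal Go sketch under assumptions: `schemaKey`, `knownSet`, and `claim` are hypothetical names, and the real knownSchemas structure may be keyed or locked differently. `claim` returns true only for the first caller asking about a given subject/version, so it can serve as the "fetch or skip" filter.

```go
package main

import (
	"fmt"
	"sync"
)

// schemaKey identifies one schema version. The key shape is a hypothetical
// choice for this sketch.
type schemaKey struct {
	Subject string
	Version int
}

// knownSet is a mutex-guarded set, so concurrent workers can consult it
// without a data race on the underlying map.
type knownSet struct {
	mu   sync.Mutex
	seen map[schemaKey]struct{}
}

func newKnownSet() *knownSet {
	return &knownSet{seen: make(map[schemaKey]struct{})}
}

// claim is usable as a fetch filter: it returns true only for the first
// caller that asks about a key, and false for every later caller.
func (k *knownSet) claim(key schemaKey) bool {
	k.mu.Lock()
	defer k.mu.Unlock()
	if _, ok := k.seen[key]; ok {
		return false
	}
	k.seen[key] = struct{}{}
	return true
}

func main() {
	known := newKnownSet()
	var (
		wg      sync.WaitGroup
		countMu sync.Mutex
		fetches int
	)
	// Eight goroutines race to fetch the same version; only one should win.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if known.claim(schemaKey{Subject: "orders-value", Version: 1}) {
				countMu.Lock()
				fetches++
				countMu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("fetches:", fetches) // fetches: 1
}
```

Doing the lookup and the insert under one lock avoids the check-then-act race where two workers both see "not known" and both issue the same HTTP request.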
Process schema references depth-first to ensure versions are migrated
in the correct dependency order, matching migrator v1 behavior.
…ma migration

Add a configurable worker pool to process subjects in parallel. Each worker
uses DFS traversal to complete an entire subject tree before moving to the next.
Shuffle subject order for improved load distribution across workers.
@mmatczuk force-pushed the mmt/migrator_max_parallel_http_requests branch from 70586b6 to 5cd248d on February 3, 2026 10:49
@rockwotj (Contributor) left a comment


LGTM

@mmatczuk merged commit c4f27fc into main on Feb 4, 2026
5 checks passed
@mmatczuk deleted the mmt/migrator_max_parallel_http_requests branch on February 4, 2026 09:41
2 participants