Conversation

@jfrancoa jfrancoa commented Jan 5, 2026

Improve the batch logic to use fixed-size batching by default (dynamic batching is still supported via an argument when creating data), which avoids overloading the cluster. The ingestion logic was also reworked so that large datasets can be ingested with lower memory consumption: data is generated as it is consumed (generator → queue → consumers).
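The generator → queue → consumers flow described above can be sketched with the standard library. This is an illustrative sketch, not weaviate-cli's actual code: the `ingest` function, worker count, and object shape are all assumptions. A bounded queue is what keeps memory flat, since the producer blocks instead of materializing the whole dataset.

```python
import threading
import queue

SENTINEL = object()  # signals consumers that production is done


def generate_objects(n):
    # Illustrative data generator; yields one object at a time so the
    # full dataset never has to sit in memory.
    for i in range(n):
        yield {"id": i, "title": f"object-{i}"}


def producer(gen, q, num_consumers):
    for obj in gen:
        q.put(obj)        # blocks when the queue is full, bounding memory use
    for _ in range(num_consumers):
        q.put(SENTINEL)   # one sentinel per consumer to shut them all down


def consumer(q, ingested):
    while True:
        obj = q.get()
        if obj is SENTINEL:
            break
        ingested.append(obj)  # stand-in for the real batch-insert call


def ingest(n_objects=1000, num_consumers=4, queue_size=100):
    q = queue.Queue(maxsize=queue_size)
    ingested = []
    workers = [
        threading.Thread(target=consumer, args=(q, ingested))
        for _ in range(num_consumers)
    ]
    for w in workers:
        w.start()
    producer(generate_objects(n_objects), q, num_consumers)
    for w in workers:
        w.join()
    return ingested
```

The PR's dynamic mode reportedly uses multiprocessing instead of threads, but the coordination pattern is the same.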

@jfrancoa jfrancoa requested a review from Copilot January 5, 2026 13:01

@orca-security-eu orca-security-eu bot left a comment

Orca Security Scan Summary

  • Infrastructure as Code: Passed (high 0, medium 0, low 0, info 0)
  • SAST: Passed (high 0, medium 0, low 0, info 0)
  • Secrets: Passed (high 0, medium 0, low 0, info 0)
  • Vulnerabilities: Passed (high 0, medium 0, low 0, info 0)


Copilot AI left a comment

Pull request overview

This PR refactors the batch ingestion logic to improve performance and memory efficiency when inserting data into Weaviate. The implementation introduces a producer-consumer pattern with two distinct modes: fixed-size batching (default) and dynamic batching (opt-in).

Key Changes:

  • Introduces fixed-size batch ingestion as the default, with configurable batch size and concurrent requests to prevent cluster overload
  • Adds dynamic batch mode as an opt-in feature for high-throughput scenarios using multiprocessing
  • Implements a memory-efficient producer-consumer pattern with queue-based coordination between data generation and ingestion
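Fixed-size batching over a lazily generated stream can be sketched as follows. The `fixed_size_batches` helper is hypothetical (it is not necessarily how weaviate-cli slices its data), but it shows why the fixed-size default bounds load: only one batch of objects exists at a time, and the batch size caps each request sent to the cluster.

```python
from itertools import islice


def fixed_size_batches(gen, batch_size):
    """Lazily slice an iterable into fixed-size batches.

    Only one batch is materialized at a time, so memory stays bounded
    regardless of the total dataset size.
    """
    it = iter(gen)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded batch would then be handed to one of the concurrent ingestion workers, so `batch_size` and the number of concurrent requests together bound the in-flight load on the server.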

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 14 comments.

Show a summary per file

  • weaviate_cli/managers/data_manager.py: Core refactoring implementing the new producer-consumer ingestion pattern, with an _ErrorTracker class, a new __producer_consumer_ingest method supporting both fixed-size and dynamic batch modes, and simplified update logic
  • weaviate_cli/managers/config_manager.py: Adds support for slow connection environments via a SLOW_CONNECTION environment variable that doubles client timeouts
  • weaviate_cli/defaults.py: Introduces a MAX_WORKERS constant and new CreateDataDefaults fields for batch configuration (batch_size, dynamic_batch)
  • weaviate_cli/commands/create.py: Adds CLI options for --dynamic_batch, --batch_size, and --concurrent_requests with validation logic
  • .github/workflows/release.yaml: Updates GitHub Actions artifact upload/download actions to newer versions
  • .github/workflows/main.yaml: Updates GitHub Actions upload-artifact to v5
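The SLOW_CONNECTION behavior described for config_manager.py could look roughly like this. The default timeout values, accepted truthy strings, and the `resolve_timeouts` helper are all assumptions for illustration; only the "environment variable doubles the client timeouts" behavior comes from the review summary.

```python
import os

# Hypothetical defaults in seconds; the real values live in weaviate-cli.
DEFAULT_INIT_TIMEOUT = 2
DEFAULT_QUERY_TIMEOUT = 60
DEFAULT_INSERT_TIMEOUT = 120


def resolve_timeouts():
    """Double all client timeouts when SLOW_CONNECTION is set truthily."""
    slow = os.environ.get("SLOW_CONNECTION", "").lower() in ("1", "true", "yes")
    factor = 2 if slow else 1
    return {
        "init": DEFAULT_INIT_TIMEOUT * factor,
        "query": DEFAULT_QUERY_TIMEOUT * factor,
        "insert": DEFAULT_INSERT_TIMEOUT * factor,
    }
```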


@jfrancoa jfrancoa force-pushed the improve-batch-logic branch from 56493b2 to dab185f Compare January 5, 2026 15:44
jfrancoa and others added 8 commits January 5, 2026 17:10
Instead of generating the data first and then ingesting
it, we attempt to use fixed_size batching and leave each
worker in charge of generating its corresponding batch,
making it more memory efficient.
Use a combination of streaming via multiprocessing.
Keeps the fixed_size implementation the same to avoid
overloading the server.
This commit re-attempts the connection in case the link
is slow and the gRPC checks take longer than the timeout.
Each re-attempt uses a larger timeout.

Also fixes a logic issue when querying data for
multitenant collections with auto-tenant creation.
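A minimal sketch of the retry-with-a-larger-timeout idea from this commit. The function name, the timeout schedule, and the `connect` callable (a stand-in for the client's gRPC readiness check) are hypothetical; the commit message only states that failed attempts are retried with a growing timeout.

```python
def connect_with_backoff(connect, timeouts=(5, 10, 20)):
    """Try to connect with progressively larger timeouts.

    `connect` is a callable taking a timeout in seconds and raising
    TimeoutError when the link is too slow; illustrative stand-in for
    the client's gRPC health check.
    """
    last_err = None
    for timeout in timeouts:
        try:
            return connect(timeout)
        except TimeoutError as err:
            last_err = err  # slow link: retry with the next, larger timeout
    raise last_err
```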
@jfrancoa jfrancoa force-pushed the improve-batch-logic branch from dab185f to 89f007e Compare January 5, 2026 16:10
