Skip to content

Conversation

masseyke
Copy link
Member

@masseyke masseyke commented Oct 7, 2025

This enforces the max size on the sampling config, and includes a new counter for documents rejected due to having exceeded the max size.
Note: I'm changing the serialization of SampleStats without including a new TransportVersion. This is safe because this object is not reachable in any released version, and is unavailable on serverless.

@masseyke masseyke added >non-issue :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v9.3.0 labels Oct 7, 2025
@masseyke masseyke requested a review from Copilot October 8, 2025 20:57
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements maximum size enforcement for document sampling in Elasticsearch. It adds a new rejection counter for documents that exceed the configured size limit and prevents samples from growing beyond the specified maximum size.

  • Added size-based filtering logic to prevent documents from being added when they would exceed the configured maximum size
  • Introduced a new counter samplesRejectedForSize to track documents rejected due to size constraints
  • Implemented efficient size calculation for sample collections with caching to optimize performance

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
SamplingService.java Core implementation of size enforcement logic and new rejection counter
SamplingServiceTests.java Added test coverage for maximum size enforcement behavior
SamplingServiceSampleStatsTests.java Updated test infrastructure to handle new rejection counter
GetSampleStatsActionResponseTests.java Updated test data generation for new stats field
GetSampleStatsActionNodeResponseTests.java Updated test data generation for new stats field

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@masseyke masseyke requested a review from Copilot October 8, 2025 20:58
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@masseyke masseyke marked this pull request as ready for review October 8, 2025 21:55
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Oct 8, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Contributor

@seanzatzdev seanzatzdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall, just left a few questions I'd like to discuss

@masseyke masseyke requested a review from seanzatzdev October 9, 2025 14:12
@masseyke masseyke merged commit 5a4010c into elastic:main Oct 9, 2025
34 checks passed
@masseyke masseyke deleted the random-sampling-max-size branch October 9, 2025 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >non-issue Team:Data Management Meta label for data/management team v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants