Skip to content

Conversation

@lukewhiting
Copy link
Contributor

@lukewhiting lukewhiting commented Aug 15, 2025

This pull request introduces endpoint parameter restrictions for streams, specifically targeting the logs stream, and enforces allowed parameters for bulk and index operations based on the API designs of Streams.

New YAML rest suite added to verify these restrictions.

@lukewhiting
Copy link
Contributor Author

Originally we were going to create a new REST action for logs streams that did it's checks and then delegated back down to the relevant existing REST action.

I'm not 100% happy with either approach TBH... I'm loath to have streams leak out across the codebase but I also don't want the maintenance headache of unnecessary duplication as the REST framework isn't set up for that kind of delegation and relies on a lot on interface methods to provide configuration that end up duplicated on the new endpoint.

On balance, I think just doing it this way is the lesser of two evils for and there is precedent for other features doing it this way however that opinion might change if we added any additional streams logic to the indexing endpoints.

Comments and differing opinions most welcome 😃

@lukewhiting lukewhiting force-pushed the es-11331-streams-params-restriction branch 2 times, most recently from e598e8e to b8a8020 Compare August 19, 2025 12:57
@lukewhiting lukewhiting force-pushed the es-11331-streams-params-restriction branch from 717dde4 to 960097e Compare August 20, 2025 12:41
@lukewhiting lukewhiting changed the title DRAFT: ES-11331 streams params restriction ES-11331 streams params restriction Aug 20, 2025
@lukewhiting lukewhiting added >enhancement :Data Management/Data streams Data streams and their lifecycles labels Aug 20, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @lukewhiting, I've created a changelog YAML for you.

@lukewhiting lukewhiting marked this pull request as ready for review August 20, 2025 12:48
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Aug 20, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces endpoint parameter restrictions for streams, specifically targeting the logs stream, to ensure only allowed parameters are used in bulk and index operations according to streams API design specifications.

  • Adds validation to restrict parameters when writing to streams, allowing only index, op_type, error_trace, and timeout
  • Implements parameter validation in both bulk and single document operations for streams
  • Adds comprehensive test coverage via YAML REST tests to verify these restrictions

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
RestIndexAction.java Added constructor parameters and stream parameter validation logic
RestBulkAction.java Defined allowed stream parameters and added validation tracking
TransportAbstractBulkAction.java Refactored stream validation into dedicated method
BulkRequest.java Added field to track restricted parameter usage with transport version support
IncrementalBulkService.java Updated to propagate restricted parameter usage flag
ActionModule.java Updated handler registrations to include required dependencies
TransportVersions.java Added new transport version for stream parameter restrictions
30_param_restrictions.yml Added comprehensive test cases for parameter restriction validation
20_substream_restrictions.yml Minor test optimization to avoid refresh parameter
132967.yaml Added changelog entry

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

);
}

if (e == null && bulkRequest.streamsRestrictedParamsUsed() && req.index().equals(streamType.getStreamName())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things about this:
(1) It would be useful if it also printed out which parameters were passed, because I had to debug in to see why my request had failed
(2) I think it is accidentally a little too restrictive. If I run this command:

POST logs/_doc
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}

then I get rejected. Through debugging, I discovered that that is implicitly adding pretty as a parameter. I think we ought to allow that. If I use the form of that API that passes in an ID, that also gets blocked by this. Maybe we want to prevent users from providing their own IDs, but you can still pas an ID in the bulk API, so that seems inconsistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rethinking my last comment, I think there are 3 things that need to be done:
(1) List the request params in the error message
(2) Add pretty and _id to the whitelisted params
(3) Decide if we want to prevent users from setting the ID, and if so do it at a lower level so that it's consistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @masseyke :-) I have refactored this to pass the list of param names used so we get better error messages. This is at the expense of additional traffic over the wire on each bulk request line.

I also white listed pretty and id as allowed but will follow up with product when they are back on Monday if they want ID or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Product have agreed that ID should not be blocked for now so I think we're good to go on this PR from that perspective

…rror message

Also whitelists ID and pretty as allowed params
…ion' into es-11331-streams-params-restriction

# Conflicts:
#	server/src/main/java/org/elasticsearch/rest/action/document/RestBulkAction.java
@masseyke
Copy link
Member

masseyke commented Aug 26, 2025

Just to make sure we're explicit, here's where it currently stands:

Param Allowed
error_trace
filter_path
id
include_source_on_error boolean
index
list_executed_pipelines
op_type
pipeline
pretty
refresh
require_alias
require_data_stream
routing
_source
_source_excludes
_source_includes
timeout
wait_for_active_shards

Some of those seem kind of useful -- list_executed_pipelines for example.

EDIT: Updated 2025-08-24@13:47 by LW to include addition of filter_path and refresh

Copy link
Member

@masseyke masseyke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The Observability team is currently setting the refresh and filter_path params when they are loading to a stream, so we might need to whitelist a few more params at some point.

@lukewhiting lukewhiting enabled auto-merge (squash) September 1, 2025 07:27
@lukewhiting lukewhiting merged commit 4042137 into elastic:main Sep 1, 2025
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Data Management/Data streams Data streams and their lifecycles >enhancement Team:Data Management Meta label for data/management team v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants