-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Restrict Indexing To Child Streams When Streams Is Enabled #132011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restrict Indexing To Child Streams When Streams Is Enabled #132011
Conversation
|
Hi @lukewhiting, I've created a changelog YAML for you. |
|
Pinging @elastic/es-data-management (Team:Data Management) |
|
Requested review from @jbaiera as this touches the failure store |
…serving ingest time taken
… to not depend on same instance assertions This prevents issues when wrapping responses during ingest
… is enabled in the cluster
5936fd2 to
1a861cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guess don't have a lot of context on a more higher-level to approve, but code lgtm!
left a few small suggestions and Qs for learning
...ams/src/yamlRestTest/resources/rest-api-spec/test/streams/logs/20_substream_restrictions.yml
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/bulk/BulkRequestModifier.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/bulk/BulkResponse.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/bulk/TransportAbstractBulkAction.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/bulk/TransportAbstractBulkAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/bulk/TransportAbstractBulkAction.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Szymon Bialkowski <[email protected]>
Co-authored-by: Szymon Bialkowski <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements restrictions on direct indexing to child streams when streams mode is enabled, specifically preventing writes to indices matching the logs.* pattern while allowing operations through the parent logs stream.
- Adds validation logic to prevent direct writes to child streams via bulk operations and single document indexing
- Introduces pipeline-level validation to prevent rerouting documents to child streams through ingest processors
- Allows delete operations on child streams while blocking create/update operations
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
StreamType.java |
New enum defining stream types with validation methods for enabled streams and child stream matching |
TransportAbstractBulkAction.java |
Adds pre-pipeline validation to reject direct writes to child streams |
IngestService.java |
Implements pipeline validation to prevent rerouting documents to child streams |
BulkRequestModifier.java |
Refactors listener wrapping methods to support flexible ingest time calculation |
BulkResponse.java |
Adds equals and hashCode methods for proper response comparison |
TransportBulkActionIngestTests.java |
Updates test to use concrete BulkResponse instead of mock for equality testing |
20_substream_restrictions.yml |
Comprehensive integration tests covering various restriction scenarios |
StreamsYamlTestSuiteIT.java |
Adds required modules for integration testing |
build.gradle |
Includes additional REST API endpoints and test dependencies |
server/src/main/java/org/elasticsearch/ingest/IngestService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/bulk/TransportAbstractBulkAction.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/ingest/IngestService.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
but someone more experienced should have the final approve 🚀
|
Hi @lukewhiting, I've created a changelog YAML for you. |
|
|
||
| for (StreamType streamType : StreamType.getEnabledStreamTypesForProject(project)) { | ||
| if (streamType.matchesStreamPrefix(newIndex) | ||
| && ingestDocument.getIndexHistory().contains(streamType.getStreamName()) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would let me through if I rerouted from say logs to logs.abc.def. That is, it allows me to write to any descendent stream, not just a direct child stream. I assume that's OK, right? The real goal is to prevent things from outside of the stream writing to child streams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct but yeah, I don't think it's in scope to enforce the hierarchy in ES. At least not at this stage.
server/src/main/java/org/elasticsearch/action/bulk/BulkRequestModifier.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of comments, but LGTM.
…port-changes' into es-11941-streams-logs-bulk-transport-changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. There's a couple easy to miss things that might need to be addressed for failure store, and I left a few small questions and suggestions but otherwise it's looking good. Marking as approved for once the important things are addressed.
| BulkRequestModifier bulkRequestModifier = new BulkRequestModifier(bulkRequest); | ||
|
|
||
| for (StreamType streamType : StreamType.getEnabledStreamTypesForProject(projectMetadata)) { | ||
| for (int i = 0; i < bulkRequest.requests.size(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're iterating using the locally available bulk request, but you're dereferencing the bulk request from the request modifier to get the documents. This assumes that the request modifier will never make any changes to its internal state. I think we can avoid that kind of snag if we iterate over the request items and check each stream type per document instead of iterating over the request items multiple times for each eventual stream type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked around where we use bulk request modifier in other places and it's actually an iterable itself. In the IngestService we just iterate over it like a regular iterable, and maintain a slot counter separately. Might read more clearly if we do that here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have switched this to use the bulk modifier as an iterator and moved the stream types to be the inner iterator so best of both worlds :-)
server/src/main/java/org/elasticsearch/action/bulk/TransportAbstractBulkAction.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/bulk/TransportAbstractBulkAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/bulk/BulkRequestModifier.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @Override | ||
| public boolean equals(Object o) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we adding this for completeness sake or are we using it somewhere? Checking bulk response equality seems like something that could be unintentionally expensive for a large or complicated request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh this was needed to fix a unit test which relied on a sameInstance assertion which became invalid after we started wrapping everything at the higher level. However it's now no longer required with the short circuit wrapping logic added here: #132011 (comment) so have reverted the change as it goes back to being the same instance if wrapped with no modifications.
| "Pipelines can't re-route documents to child streams, but pipeline [%s] tried to reroute " | ||
| + "this document from index [%s] to index [%s]. Reroute history: %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the term "reroute" - That reads to me to mean using the reroute processor, but I think you can get to here via any method of changing the index name. Also maybe we should elaborate on what we mean by child stream.
A rough suggestion like
| "Pipelines can't re-route documents to child streams, but pipeline [%s] tried to reroute " | |
| + "this document from index [%s] to index [%s]. Reroute history: %s", | |
| "pipeline [%s] can't change the target index (from [%s] to [%s] child stream [%s]) for document [%s]. History: [%s]", |
Or along those lines? e.g. (from [my-index-name] to [logs] child stream [logs.nginx.prod])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the message although omitted the document as this error is rendered in line with the document ID / slot.
server/src/main/java/org/elasticsearch/ingest/IngestService.java
Outdated
Show resolved
Hide resolved
…-streams-logs-bulk-transport-changes
This PR prevents indexing into child streams when streams mode is enabled.
In this case, we define a child stream as any index matching
logs.*and restrictions apply both to direct indexing via put to index or bulk along with indirect indexing attempts via pipelines using reroute, script or other processors that change the target index or routing.Deletes are still permitted from these child streams but updates such as
_query_by_updatewill be prevented.Example
Input
logsredirects tologs.abc.defvia reroute processor on default pipelinebad-indexredirects tologs.abcvia a script processor changingctx._indexPUT {{host}}/_bulk Content-Type: application/json { "create":{"_index": "logs" } } { "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg1.jpg HTTP/1.0\" 200 24736" } { "create":{"_index": "logs.abc" } } { "@timestamp": "2099-05-06T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" } { "create":{"_index": "bad-index" } } { "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736" }Output
{ "errors": true, "took": 200, "ingest_took": 0, "items": [ { "create": { "_index": "logs.abc.def", "_id": "wmsjUZgBpF-FKxj59Ma4", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "_seq_no": 0, "_primary_term": 1, "status": 201 } }, { "create": { "_index": "logs.abc", "_id": "auto-generated", "status": 400, "failure_store": "not_enabled", "error": { "type": "illegal_argument_exception", "reason": "Direct writes to child streams are prohibited. Index directly into the [logs] stream instead" } } }, { "create": { "_index": "logs.abc", "_id": "auto-generated", "status": 400, "error": { "type": "illegal_argument_exception", "reason": "Pipelines can't re-route documents to child streams, but pipeline [pipeline1] tried to reroute this document from index [bad-index] to index [logs.abc]. Reroute history: bad-index" } } } ] }Fixes ES-11941