Skip to content

Conversation

lukewhiting
Copy link
Contributor

@lukewhiting lukewhiting commented Oct 11, 2024

Fixes ES#111893 - Data stream restoration/promotion without a matching template

This PR adds warnings headers and log messages for the following conditions:

  • Promoting a CCR replicated data stream wihout a matching index template for the data stream
  • Restoring a snapshot of a data stream without a matching index template for the data stream

In both these senarios, the action will succede however future rollovers of the data stream will fail due to the missing templates. By adding a warning header and log, we make the user explicitly aware of this pending failure.

Documentation for both these actions has also been updated to warn of this scenario

@lukewhiting lukewhiting added >bug >docs General docs changes :Data Management/Data streams Data streams and their lifecycles auto-backport Automatically create backport pull requests when merged low-risk An open issue or test failure that is a low risk to future releases v8.16.0 v9.0.0 labels Oct 11, 2024
@lukewhiting lukewhiting requested a review from a team October 11, 2024 10:16
Copy link
Contributor

Documentation preview:

@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team Team:Docs Meta label for docs team labels Oct 11, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@lukewhiting
Copy link
Contributor Author

es-docs people, Do you need pinging for small changes like this or should I omit the >docs label for minor changes?

@shainaraskas
Copy link
Contributor

shainaraskas commented Oct 11, 2024

@lukewhiting you don't have to wait for our reviews for small things like these, but keeping the label is a good idea. we don't have the capacity to review every PR, but we try to keep an eye out and provide feedback when possible.

this PR incidentally LGTM

@gmarouli gmarouli self-requested a review October 14, 2024 07:19
Copy link
Contributor

@gmarouli gmarouli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @lukewhiting , I learnt a lot reviewing this code. It's touching some parts I am unfamiliar with. I added some comments and some questions that will help me understand the code better as well.

Comment on lines 517 to 530
var templatePatterns = clusterService.state()
.metadata()
.templatesV2()
.values()
.stream()
.filter(cit -> cit.getDataStreamTemplate() != null)
.flatMap(cit -> cit.indexPatterns().stream())
.collect(Collectors.toSet());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, this code checks only the current cluster state for a matching template while the template could be restored as part of the snapshot. Should we consider checking the metadataBuilder or the globalMetadata, that are set here:

if (request.includeGlobalState()) {
globalMetadata = repository.getSnapshotGlobalMetadata(snapshotId);
metadataBuilder = Metadata.builder(globalMetadata);

This should be verified, I haven't tested it, but I think it might be a valid scenario.

Copy link
Contributor Author

@lukewhiting lukewhiting Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right... There might be a few failed rollover attempts while the restore is in progress (if the DS gets restored before the templates) but I believe it would rollover successfully in the end so it's a valid scenario and should not warn.

I'll get that logic in and add some tests for these senario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic and tests added however the test is affected by #107515 so I have had to mute it :-/ I did some manual runs of the test with org/elasticsearch/test/ESIntegTestCase.java:1305 commented out to prevent the cluster state size mismatch failing the test and it all passed.

Copy link
Contributor

@gmarouli gmarouli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I did some minor comments, but apart from these, it's good :) Thank you @lukewhiting 🚀

@lukewhiting lukewhiting removed auto-backport Automatically create backport pull requests when merged v8.16.0 labels Oct 15, 2024
to data stream promotion endpoint
Add a new assertion to synchronously execute a request and check the
response contains a specific warning header
This checks if the snapshot contains a matching template for the DS
@lukewhiting lukewhiting force-pushed the 111893-ds-restore-template-warning branch from 2ab424c to 5bcfb5e Compare October 15, 2024 12:50
@lukewhiting lukewhiting merged commit 37f03dc into elastic:main Oct 15, 2024
15 checks passed
@lukewhiting lukewhiting deleted the 111893-ds-restore-template-warning branch October 15, 2024 14:00
kderusso pushed a commit to kderusso/elasticsearch that referenced this pull request Oct 15, 2024
* Add data stream template validation

to snapshot restore

* Add data stream template validation

to data stream promotion endpoint

* Add new assertion for response headers

Add a new assertion to synchronously execute a request and check the
response contains a specific warning header

* Test for warning header on snapshot restore

When missing templates

* Test for promotion warnings

* Add documentation for the potential error states

* PR changes

* Spotless reformatting

* Add logic to look in snapshot global metadata

This checks if the snapshot contains a matching template for the DS

* Comment on test cleanup to explain it was copied

* Removed cluster service field
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024
* Add data stream template validation

to snapshot restore

* Add data stream template validation

to data stream promotion endpoint

* Add new assertion for response headers

Add a new assertion to synchronously execute a request and check the
response contains a specific warning header

* Test for warning header on snapshot restore

When missing templates

* Test for promotion warnings

* Add documentation for the potential error states

* PR changes

* Spotless reformatting

* Add logic to look in snapshot global metadata

This checks if the snapshot contains a matching template for the DS

* Comment on test cleanup to explain it was copied

* Removed cluster service field
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Data Management/Data streams Data streams and their lifecycles >docs General docs changes low-risk An open issue or test failure that is a low risk to future releases Team:Data Management Meta label for data/management team Team:Docs Meta label for docs team v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Data stream restoration/promotion without a matching template

4 participants