
Commit e2b2287

authored
[SO Migrations] Check cluster routing allocation when creating new indices (#225965)
## Summary

Resolves #222539

Adds new states that check whether `cluster.routing.allocation.enable` has a value that allows creating a new index, before attempting to create it.

### Checklist

Check that the PR satisfies the following conditions. Reviewers should verify this PR satisfies this list as well.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

### Identify risks

Does this PR introduce any risks? For example, consider risks like hard-to-test bugs, performance regressions, or potential data loss.

Describe the risk, its severity, and mitigation for each identified risk. Invite stakeholders and evaluate how to proceed before merging.

- [ ] [See some risk examples](https://github.com/elastic/kibana/blob/main/RISK_MATRIX.mdx)
- [ ] ...
1 parent 11bb923 commit e2b2287

8 files changed: +273 additions, -79 deletions

src/core/packages/saved-objects/migration-server-internal/src/README.md

Lines changed: 96 additions & 46 deletions
@@ -4,119 +4,122 @@
 - [INIT](#init)
 - [Next action](#next-action)
 - [New control state](#new-control-state)
-- [CREATE\_NEW\_TARGET](#create_new_target)
+- [CREATE\_INDEX\_CHECK\_CLUSTER\_ROUTING\_ALLOCATION](#create_index_check_cluster_routing_allocation)
 - [Next action](#next-action-1)
 - [New control state](#new-control-state-1)
-- [LEGACY\_CHECK\_CLUSTER\_ROUTING\_ALLOCATION](#legacy_check_cluster_routing_allocation)
+- [CREATE\_NEW\_TARGET](#create_new_target)
 - [Next action](#next-action-2)
 - [New control state](#new-control-state-2)
-- [LEGACY\_SET\_WRITE\_BLOCK](#legacy_set_write_block)
+- [LEGACY\_CHECK\_CLUSTER\_ROUTING\_ALLOCATION](#legacy_check_cluster_routing_allocation)
 - [Next action](#next-action-3)
 - [New control state](#new-control-state-3)
-- [LEGACY\_CREATE\_REINDEX\_TARGET](#legacy_create_reindex_target)
+- [LEGACY\_SET\_WRITE\_BLOCK](#legacy_set_write_block)
 - [Next action](#next-action-4)
 - [New control state](#new-control-state-4)
-- [LEGACY\_REINDEX](#legacy_reindex)
+- [LEGACY\_CREATE\_REINDEX\_TARGET](#legacy_create_reindex_target)
 - [Next action](#next-action-5)
 - [New control state](#new-control-state-5)
-- [LEGACY\_REINDEX\_WAIT\_FOR\_TASK](#legacy_reindex_wait_for_task)
+- [LEGACY\_REINDEX](#legacy_reindex)
 - [Next action](#next-action-6)
 - [New control state](#new-control-state-6)
-- [LEGACY\_DELETE](#legacy_delete)
+- [LEGACY\_REINDEX\_WAIT\_FOR\_TASK](#legacy_reindex_wait_for_task)
 - [Next action](#next-action-7)
 - [New control state](#new-control-state-7)
-- [WAIT\_FOR\_MIGRATION\_COMPLETION](#wait_for_migration_completion)
+- [LEGACY\_DELETE](#legacy_delete)
 - [Next action](#next-action-8)
 - [New control state](#new-control-state-8)
-- [WAIT\_FOR\_YELLOW\_SOURCE](#wait_for_yellow_source)
+- [WAIT\_FOR\_MIGRATION\_COMPLETION](#wait_for_migration_completion)
 - [Next action](#next-action-9)
 - [New control state](#new-control-state-9)
-- [UPDATE\_SOURCE\_MAPPINGS\_PROPERTIES](#update_source_mappings_properties)
+- [WAIT\_FOR\_YELLOW\_SOURCE](#wait_for_yellow_source)
 - [Next action](#next-action-10)
 - [New control state](#new-control-state-10)
-- [CLEANUP\_UNKNOWN\_AND\_EXCLUDED](#cleanup_unknown_and_excluded)
+- [UPDATE\_SOURCE\_MAPPINGS\_PROPERTIES](#update_source_mappings_properties)
 - [Next action](#next-action-11)
 - [New control state](#new-control-state-11)
-- [CLEANUP\_UNKNOWN\_AND\_EXCLUDED\_WAIT\_FOR\_TASK](#cleanup_unknown_and_excluded_wait_for_task)
+- [CLEANUP\_UNKNOWN\_AND\_EXCLUDED](#cleanup_unknown_and_excluded)
 - [Next action](#next-action-12)
 - [New control state](#new-control-state-12)
-- [PREPARE\_COMPATIBLE\_MIGRATION](#prepare_compatible_migration)
+- [CLEANUP\_UNKNOWN\_AND\_EXCLUDED\_WAIT\_FOR\_TASK](#cleanup_unknown_and_excluded_wait_for_task)
 - [Next action](#next-action-13)
 - [New control state](#new-control-state-13)
-- [REFRESH\_SOURCE](#refresh_source)
+- [PREPARE\_COMPATIBLE\_MIGRATION](#prepare_compatible_migration)
 - [Next action](#next-action-14)
 - [New control state](#new-control-state-14)
-- [CHECK\_CLUSTER\_ROUTING\_ALLOCATION](#check_cluster_routing_allocation)
+- [REFRESH\_SOURCE](#refresh_source)
 - [Next action](#next-action-15)
 - [New control state](#new-control-state-15)
-- [CHECK\_UNKNOWN\_DOCUMENTS](#check_unknown_documents)
+- [REINDEX\_CHECK\_CLUSTER\_ROUTING\_ALLOCATION](#reindex_check_cluster_routing_allocation)
 - [Next action](#next-action-16)
-- [SET\_SOURCE\_WRITE\_BLOCK](#set_source_write_block)
-- [Next action](#next-action-17)
 - [New control state](#new-control-state-16)
-- [CREATE\_REINDEX\_TEMP](#create_reindex_temp)
+- [CHECK\_UNKNOWN\_DOCUMENTS](#check_unknown_documents)
+- [Next action](#next-action-17)
+- [SET\_SOURCE\_WRITE\_BLOCK](#set_source_write_block)
 - [Next action](#next-action-18)
 - [New control state](#new-control-state-17)
-- [REINDEX\_SOURCE\_TO\_TEMP\_OPEN\_PIT](#reindex_source_to_temp_open_pit)
-- [Next action](#next-action-19)
-- [New control state](#new-control-state-18)
-- [REINDEX\_SOURCE\_TO\_TEMP\_READ](#reindex_source_to_temp_read)
+- [RELOCATE\_CHECK\_CLUSTER\_ROUTING\_ALLOCATION](#relocate_check_cluster_routing_allocation)
 - [Next action](#next-action-20)
 - [New control state](#new-control-state-19)
-- [REINDEX\_SOURCE\_TO\_TEMP\_TRANSFORM](#reindex_source_to_temp_transform)
+- [REINDEX\_SOURCE\_TO\_TEMP\_OPEN\_PIT](#reindex_source_to_temp_open_pit)
 - [Next action](#next-action-21)
 - [New control state](#new-control-state-20)
-- [REINDEX\_SOURCE\_TO\_TEMP\_INDEX\_BULK](#reindex_source_to_temp_index_bulk)
+- [REINDEX\_SOURCE\_TO\_TEMP\_READ](#reindex_source_to_temp_read)
 - [Next action](#next-action-22)
 - [New control state](#new-control-state-21)
-- [REINDEX\_SOURCE\_TO\_TEMP\_CLOSE\_PIT](#reindex_source_to_temp_close_pit)
+- [REINDEX\_SOURCE\_TO\_TEMP\_TRANSFORM](#reindex_source_to_temp_transform)
 - [Next action](#next-action-23)
 - [New control state](#new-control-state-22)
-- [SET\_TEMP\_WRITE\_BLOCK](#set_temp_write_block)
+- [REINDEX\_SOURCE\_TO\_TEMP\_INDEX\_BULK](#reindex_source_to_temp_index_bulk)
 - [Next action](#next-action-24)
 - [New control state](#new-control-state-23)
-- [CLONE\_TEMP\_TO\_TARGET](#clone_temp_to_target)
+- [REINDEX\_SOURCE\_TO\_TEMP\_CLOSE\_PIT](#reindex_source_to_temp_close_pit)
 - [Next action](#next-action-25)
 - [New control state](#new-control-state-24)
-- [REFRESH\_TARGET](#refresh_target)
+- [SET\_TEMP\_WRITE\_BLOCK](#set_temp_write_block)
 - [Next action](#next-action-26)
 - [New control state](#new-control-state-25)
-- [OUTDATED\_DOCUMENTS\_SEARCH\_OPEN\_PIT](#outdated_documents_search_open_pit)
+- [CLONE\_TEMP\_TO\_TARGET](#clone_temp_to_target)
 - [Next action](#next-action-27)
 - [New control state](#new-control-state-26)
-- [OUTDATED\_DOCUMENTS\_SEARCH\_READ](#outdated_documents_search_read)
+- [REFRESH\_TARGET](#refresh_target)
 - [Next action](#next-action-28)
 - [New control state](#new-control-state-27)
-- [OUTDATED\_DOCUMENTS\_TRANSFORM](#outdated_documents_transform)
+- [OUTDATED\_DOCUMENTS\_SEARCH\_OPEN\_PIT](#outdated_documents_search_open_pit)
 - [Next action](#next-action-29)
 - [New control state](#new-control-state-28)
-- [TRANSFORMED\_DOCUMENTS\_BULK\_INDEX](#transformed_documents_bulk_index)
+- [OUTDATED\_DOCUMENTS\_SEARCH\_READ](#outdated_documents_search_read)
 - [Next action](#next-action-30)
 - [New control state](#new-control-state-29)
-- [OUTDATED\_DOCUMENTS\_SEARCH\_CLOSE\_PIT](#outdated_documents_search_close_pit)
+- [OUTDATED\_DOCUMENTS\_TRANSFORM](#outdated_documents_transform)
 - [Next action](#next-action-31)
 - [New control state](#new-control-state-30)
-- [OUTDATED\_DOCUMENTS\_REFRESH](#outdated_documents_refresh)
+- [TRANSFORMED\_DOCUMENTS\_BULK\_INDEX](#transformed_documents_bulk_index)
 - [Next action](#next-action-32)
 - [New control state](#new-control-state-31)
-- [CHECK\_TARGET\_MAPPINGS](#check_target_mappings)
+- [OUTDATED\_DOCUMENTS\_SEARCH\_CLOSE\_PIT](#outdated_documents_search_close_pit)
 - [Next action](#next-action-33)
 - [New control state](#new-control-state-32)
-- [UPDATE\_TARGET\_MAPPINGS\_PROPERTIES](#update_target_mappings_properties)
+- [OUTDATED\_DOCUMENTS\_REFRESH](#outdated_documents_refresh)
 - [Next action](#next-action-34)
 - [New control state](#new-control-state-33)
-- [UPDATE\_TARGET\_MAPPINGS\_PROPERTIES\_WAIT\_FOR\_TASK](#update_target_mappings_properties_wait_for_task)
+- [CHECK\_TARGET\_MAPPINGS](#check_target_mappings)
 - [Next action](#next-action-35)
 - [New control state](#new-control-state-34)
-- [CHECK\_VERSION\_INDEX\_READY\_ACTIONS](#check_version_index_ready_actions)
+- [UPDATE\_TARGET\_MAPPINGS\_PROPERTIES](#update_target_mappings_properties)
 - [Next action](#next-action-36)
 - [New control state](#new-control-state-35)
-- [MARK\_VERSION\_INDEX\_READY](#mark_version_index_ready)
+- [UPDATE\_TARGET\_MAPPINGS\_PROPERTIES\_WAIT\_FOR\_TASK](#update_target_mappings_properties_wait_for_task)
 - [Next action](#next-action-37)
 - [New control state](#new-control-state-36)
-- [MARK\_VERSION\_INDEX\_READY\_CONFLICT](#mark_version_index_ready_conflict)
+- [CHECK\_VERSION\_INDEX\_READY\_ACTIONS](#check_version_index_ready_actions)
 - [Next action](#next-action-38)
 - [New control state](#new-control-state-37)
+- [MARK\_VERSION\_INDEX\_READY](#mark_version_index_ready)
+- [Next action](#next-action-39)
+- [New control state](#new-control-state-38)
+- [MARK\_VERSION\_INDEX\_READY\_CONFLICT](#mark_version_index_ready_conflict)
+- [Next action](#next-action-40)
+- [New control state](#new-control-state-39)
 - [FATAL](#fatal)
 - [DONE](#done)
 - [Manual QA Test Plan](#manual-qa-test-plan)
@@ -225,11 +228,36 @@ and the migration source index is the index the `.kibana` alias points to.

 [LEGACY_SET_WRITE_BLOCK](#legacy_set_write_block)

-6. If there are no `.kibana` indices, this is a fresh deployment. Initialize a
-new saved objects index
+6. If there are no `.kibana` indices, this is a fresh deployment. Check cluster routing allocation and
+initialize a new saved objects index.
+
+[CREATE_INDEX_CHECK_CLUSTER_ROUTING_ALLOCATION](#create_index_check_cluster_routing_allocation)
+
+7. If there are migrators for new indices (e.g. `.kibana_alerting_cases`), check cluster routing allocation
+and reindex (this is dead code and should be removed).
+
+## CREATE_INDEX_CHECK_CLUSTER_ROUTING_ALLOCATION
+
+### Next action
+
+`checkClusterRoutingAllocationEnabled`
+
+Check in the cluster settings that replica allocation is enabled (`cluster.routing.allocation.enable`). Migrations fail when replica allocation is disabled, because the bulk index operation waits for all active shards. Migrations wait for all active shards to ensure that saved objects are replicated, protecting against data loss.
+
+The Elasticsearch documentation mentions switching off replica allocation when restoring a cluster, and this setting can easily be overlooked after a restore. If replica allocation is set incorrectly, migrations fail early instead of adding a write block to the old index and only running into a failure later.
+
+If replica allocation is set to `all`, the migration continues and creates the new index.
+
+### New control state
+
+1. If `cluster.routing.allocation.enable` has a compatible value.

 [CREATE_NEW_TARGET](#create_new_target)

+2. If it has a value that will not allow creating new *saved object* indices.
+
+[CREATE_INDEX_CHECK_CLUSTER_ROUTING_ALLOCATION](#create_index_check_cluster_routing_allocation) (retry)
+
 ## CREATE_NEW_TARGET

 ### Next action
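The compatibility check performed by `checkClusterRoutingAllocationEnabled` can be illustrated with a small pure function over the cluster settings. This is a minimal sketch, not Kibana's implementation. It assumes the settings were fetched with flat (dotted) keys, for example via `GET _cluster/settings?flat_settings=true`, that a transient value overrides a persistent one (Elasticsearch's documented precedence), and that an unset value means the Elasticsearch default of `all`. Following the README's note that the check only considers persistent and transient settings, static settings are ignored. The names `ClusterSettingsResponse`, `effectiveRoutingAllocation` and `isRoutingAllocationCompatible` are hypothetical.

```typescript
// Hypothetical shape of a flat-settings response: dotted keys mapped to values.
type FlatSettings = Record<string, string | undefined>;

interface ClusterSettingsResponse {
  persistent?: FlatSettings;
  transient?: FlatSettings;
}

const ROUTING_ALLOCATION_ENABLE = 'cluster.routing.allocation.enable';

// Effective value from the dynamic settings only: a transient value wins over
// a persistent one; static settings from elasticsearch.yml are not considered.
function effectiveRoutingAllocation(
  settings: ClusterSettingsResponse
): string | undefined {
  return (
    settings.transient?.[ROUTING_ALLOCATION_ENABLE] ??
    settings.persistent?.[ROUTING_ALLOCATION_ENABLE]
  );
}

// Compatible only when the setting is unset (Elasticsearch then defaults to
// "all") or explicitly "all". With "primaries", "new_primaries" or "none",
// replicas of a new index cannot be allocated, so a write that waits for all
// active shards may never succeed.
function isRoutingAllocationCompatible(
  settings: ClusterSettingsResponse
): boolean {
  const value = effectiveRoutingAllocation(settings);
  return value === undefined || value === 'all';
}

// Example: a restore left allocation restricted to primaries.
const afterRestore: ClusterSettingsResponse = {
  persistent: { [ROUTING_ALLOCATION_ENABLE]: 'primaries' },
};
console.log(isRoutingAllocationCompatible(afterRestore)); // false
```

Treating every value other than `all` as incompatible keeps the check conservative: it does not try to reason about replica counts, it only refuses to proceed until allocation is fully enabled.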
@@ -428,7 +456,7 @@ The latter usually happens when a new plugin is enabled that brings some incompa

 3. If the mappings are not updated due to incompatible changes and the migration is still in progress.

-[CHECK_CLUSTER_ROUTING_ALLOCATION](#check_cluster_routing_allocation)
+[REINDEX_CHECK_CLUSTER_ROUTING_ALLOCATION](#reindex_check_cluster_routing_allocation)

 4. If the mappings are not updated due to incompatible changes and the migration is already completed.

@@ -523,7 +551,7 @@ We are performing a *compatible migration*, and we discarded some unknown and ex

 [FATAL](#fatal)

-## CHECK_CLUSTER_ROUTING_ALLOCATION
+## REINDEX_CHECK_CLUSTER_ROUTING_ALLOCATION

 ### Next action

@@ -549,7 +577,7 @@ The check only considers persistent and transient settings and does not take sta

 2. If it has a value that will not allow creating new *saved object* indices.

-[CHECK_CLUSTER_ROUTING_ALLOCATION](#check_cluster_routing_allocation) (retry)
+[REINDEX_CHECK_CLUSTER_ROUTING_ALLOCATION](#reindex_check_cluster_routing_allocation) (retry)

 ## CHECK_UNKNOWN_DOCUMENTS

@@ -579,6 +607,28 @@ Set a write block on the source index to prevent any older Kibana instances from

 [CREATE_REINDEX_TEMP](#create_reindex_temp)

+## RELOCATE_CHECK_CLUSTER_ROUTING_ALLOCATION
+
+### Next action
+
+`checkClusterRoutingAllocationEnabled`
+
+Check in the cluster settings that replica allocation is enabled (`cluster.routing.allocation.enable`). Migrations fail when replica allocation is disabled, because the bulk index operation waits for all active shards. Migrations wait for all active shards to ensure that saved objects are replicated, protecting against data loss.
+
+The Elasticsearch documentation mentions switching off replica allocation when restoring a cluster, and this setting can easily be overlooked after a restore. If replica allocation is set incorrectly, migrations fail early instead of adding a write block to the old index and only running into a failure later.
+
+If replica allocation is set to `all`, the migration continues and creates the temporary reindex index.
+
+### New control state
+
+1. If `cluster.routing.allocation.enable` has a compatible value.
+
+[CREATE_REINDEX_TEMP](#create_reindex_temp)
+
+2. If it has a value that will not allow creating new *saved object* indices.
+
+[RELOCATE_CHECK_CLUSTER_ROUTING_ALLOCATION](#relocate_check_cluster_routing_allocation) (retry)
+
 ## CREATE_REINDEX_TEMP

 ### Next action
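Both new check states run the same `checkClusterRoutingAllocationEnabled` action and differ only in where they go on success. The sketch below covers `CREATE_INDEX_CHECK_CLUSTER_ROUTING_ALLOCATION` and `RELOCATE_CHECK_CLUSTER_ROUTING_ALLOCATION`, whose success targets (`CREATE_NEW_TARGET` and `CREATE_REINDEX_TEMP`) appear in this diff. It is a simplified illustration, not Kibana's model code: the state shape, the names `MigrationState`, `nextAfterRoutingCheck` and `backoffMs`, and the capped exponential backoff (2 s, 4 s, 8 s, ... up to 64 s) are all assumptions.

```typescript
// Hypothetical, simplified model: real migration state carries far more data.
type RoutingCheckState =
  | 'CREATE_INDEX_CHECK_CLUSTER_ROUTING_ALLOCATION'
  | 'RELOCATE_CHECK_CLUSTER_ROUTING_ALLOCATION';

type ControlState = RoutingCheckState | 'CREATE_NEW_TARGET' | 'CREATE_REINDEX_TEMP';

interface MigrationState<S extends ControlState = ControlState> {
  controlState: S;
  retryCount: number;
  retryDelay: number; // milliseconds to wait before running the next action
}

// Success targets, per the "New control state" lists in the README diff.
const ON_COMPATIBLE: Record<RoutingCheckState, ControlState> = {
  CREATE_INDEX_CHECK_CLUSTER_ROUTING_ALLOCATION: 'CREATE_NEW_TARGET',
  RELOCATE_CHECK_CLUSTER_ROUTING_ALLOCATION: 'CREATE_REINDEX_TEMP',
};

// Assumed backoff policy: 2s, 4s, 8s, ... capped at 64s.
function backoffMs(retryCount: number): number {
  return 1000 * Math.min(Math.pow(2, retryCount), 64);
}

function nextAfterRoutingCheck(
  state: MigrationState<RoutingCheckState>,
  compatible: boolean
): MigrationState {
  if (compatible) {
    // Move on to the state's success target and reset retry bookkeeping.
    return { controlState: ON_COMPATIBLE[state.controlState], retryCount: 0, retryDelay: 0 };
  }
  // Incompatible value: stay in the same check state and retry after a delay,
  // so a corrected setting is picked up on a later attempt.
  const retryCount = state.retryCount + 1;
  return { ...state, retryCount, retryDelay: backoffMs(retryCount) };
}

const first = nextAfterRoutingCheck(
  { controlState: 'CREATE_INDEX_CHECK_CLUSTER_ROUTING_ALLOCATION', retryCount: 0, retryDelay: 0 },
  false
);
console.log(first.controlState, first.retryDelay); // CREATE_INDEX_CHECK_CLUSTER_ROUTING_ALLOCATION 2000
```

Keeping the success targets in a lookup table makes the shared shape of the check states explicit; the actual Kibana model may organize these transitions differently.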
