
Commit 142e2bd

Add javadocs describing features functionality (elastic#120359) (elastic#120379)
1 parent 1c35a3b commit 142e2bd

File tree

1 file changed

+151
-0
lines changed

@@ -0,0 +1,151 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the "Elastic License
* 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
* Public License v 1"; you may not use this file except in compliance with, at
* your election, the "Elastic License 2.0", the "GNU Affero General Public
* License v3.0 only", or the "Server Side Public License, v 1".
*/

/**
* The features infrastructure in Elasticsearch is responsible for two things:
* <ol>
* <li>
* Determining when all nodes in a cluster have been upgraded to support some new functionality.
* This is used to enable new behavior only when all nodes in the cluster support it.
* </li>
* <li>
* Ensuring nodes only join a cluster if they support all features already present on that cluster.
* This ensures that once a cluster supports a feature, it never drops that support.
* Consequently, once a feature is defined, it can never be removed (but see Assumed features below).
* </li>
* </ol>
*
* <h2>Functionality</h2>
* This functionality starts with {@link org.elasticsearch.features.NodeFeature}. This is a single id representing
* new functionality or a change in existing functionality - exactly what a feature represents is up to the developer. Features are expected
* to be {@code public static final} variables on a relevant class. Each area of code then exposes its features
* through an implementation of {@link org.elasticsearch.features.FeatureSpecification#getFeatures}, registered as an SPI implementation.
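As a sketch, declaring and exposing a feature might look like the following. This uses simplified stand-ins for the real {@code NodeFeature} and {@code FeatureSpecification} types, and the feature id and class name are hypothetical:

```java
import java.util.Set;

// Illustrative stand-ins mirroring the shape of the real classes.
record NodeFeature(String id) {}

interface FeatureSpecification {
    Set<NodeFeature> getFeatures();
}

class ExampleFeatures implements FeatureSpecification {
    // By convention, features are public static final variables on a relevant class.
    public static final NodeFeature EXAMPLE_FEATURE = new NodeFeature("example.new_behavior");

    // In the real codebase this class would also be registered as an SPI
    // implementation via a META-INF/services entry.
    @Override
    public Set<NodeFeature> getFeatures() {
        return Set.of(EXAMPLE_FEATURE);
    }
}
```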
* <p>
* All the features exposed by a node are included in the {@link org.elasticsearch.cluster.coordination.JoinTask.NodeJoinTask} information
* processed by {@link org.elasticsearch.cluster.coordination.NodeJoinExecutor} when a node attempts to join a cluster. This checks
* that the joining node has all the features already present on the cluster, and then records the set of features against that node
* in cluster state (in the {@link org.elasticsearch.cluster.ClusterFeatures} object).
* The calculated effective cluster features are not persisted, only the per-node feature sets.
* <p>
* Informally, the features supported by a particular node are 'node features'; when all nodes in a cluster support a particular
* feature, it is then a 'cluster feature'.
* <p>
* Code can then check a node feature to determine whether all nodes in the cluster support that particular feature.
* This is done using {@link org.elasticsearch.features.FeatureService#clusterHasFeature}. This is a fast operation - the first
* time the method is called on a particular cluster state, the cluster features are calculated from all the
* node feature information and cached in the {@link org.elasticsearch.cluster.ClusterFeatures} object.
* Henceforth, all cluster feature checks are fast hash-set lookups, at least until the nodes or master change.
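The cluster-feature calculation described above can be sketched as an intersection over the per-node feature sets. This is a simplified stand-in for what {@code ClusterFeatures} caches, with illustrative names throughout:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ClusterFeaturesSketch {
    // A feature is a cluster feature only if every node in the cluster publishes it,
    // so the effective set is the intersection of all per-node feature sets.
    static Set<String> effectiveClusterFeatures(List<Set<String>> perNodeFeatures) {
        if (perNodeFeatures.isEmpty()) {
            return Set.of();
        }
        Set<String> result = new HashSet<>(perNodeFeatures.get(0));
        for (Set<String> nodeFeatures : perNodeFeatures) {
            result.retainAll(nodeFeatures); // set intersection
        }
        return Set.copyOf(result);
    }
}
```

In the real implementation this result is computed once per cluster state and cached, so subsequent checks are plain set lookups.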
*
* <h2>Features test infrastructure</h2>
* Features can be specified as conditions in YAML tests, as well as checks and conditions in code-defined rolling upgrade tests
* (see the Elasticsearch development documentation for more information).
* These checks are performed by the {@code TestFeatureService} interface, and its standard implementation {@code ESRestTestFeatureService}.
*
* <h3>Test features</h3>
* Sometimes you want to define a feature for nodes, but the only checks you need to do are part of a test. In this case,
* the feature doesn't need to be included in the production feature set; it only needs to be present for automated tests.
* So alongside {@link org.elasticsearch.features.FeatureSpecification#getFeatures}, there is
* {@link org.elasticsearch.features.FeatureSpecification#getTestFeatures}. This can be used to expose node features,
* but only for automated tests; it is ignored in production use. This is determined by the {@link org.elasticsearch.features.FeatureData}
* class, which uses a system property (set by the test infrastructure) to decide whether to include test features
* when gathering all the registered {@code FeatureSpecification} instances.
* <p>
* Test features can be removed at will (with appropriate backports),
* as there are no long-term upgrade guarantees required for clusters in automated tests.
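The production/test split can be sketched as follows. This is a simplified stand-in for the gathering logic in {@code FeatureData}; the method and class names here are illustrative, not the real API:

```java
import java.util.HashSet;
import java.util.Set;

class FeatureGatheringSketch {
    // Gathers the features from one specification. Test features are included only
    // when the test infrastructure has flagged this as a test run (in the real
    // code, via a system property).
    static Set<String> gather(Set<String> features, Set<String> testFeatures, boolean includeTestFeatures) {
        Set<String> all = new HashSet<>(features);
        if (includeTestFeatures) {
            all.addAll(testFeatures);
        }
        return Set.copyOf(all);
    }
}
```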
*
* <h3>Synthetic version features</h3>
* Cluster functionality checks performed on code built from the {@code main} branch can only use features to check functionality,
* but we also have branch releases with a longer release cadence. Sometimes tests need to be conditional on older versions
* (where there isn't a feature already defined in the right place), determined at some point after the release has been finalized.
* This is where synthetic version features come in. These can be used in tests where it is sensible to use
* a release version number (eg 8.12.3). The presence of these features is determined solely by the minimum
* node version present in the test cluster; no actual cluster features are defined or checked.
* This is done by {@code ESRestTestFeatureService}, matching on features of the form {@code gte_v8.12.3}.
* For more information on their use, see the Elasticsearch developer documentation.
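A synthetic version check of this kind could be sketched as follows. This is an assumption about the general shape of the matching, not the actual {@code ESRestTestFeatureService} code; all names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyntheticVersionSketch {
    private static final Pattern GTE_VERSION = Pattern.compile("gte_v(\\d+)\\.(\\d+)\\.(\\d+)");

    // True if featureId is a synthetic version feature (gte_vX.Y.Z) that is
    // satisfied by the minimum node version present in the test cluster,
    // given as {major, minor, patch}.
    static boolean clusterHasSyntheticFeature(String featureId, int[] minNodeVersion) {
        Matcher m = GTE_VERSION.matcher(featureId);
        if (m.matches() == false) {
            return false; // not a synthetic version feature
        }
        int[] wanted = { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)) };
        for (int i = 0; i < 3; i++) {
            if (minNodeVersion[i] != wanted[i]) {
                return minNodeVersion[i] > wanted[i]; // first differing component decides
            }
        }
        return true; // equal versions satisfy "greater than or equal"
    }
}
```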
*
* <h2>Assumed features</h2>
* Once a feature is defined on a cluster, it can never be removed - this is to ensure that functionality that is available
* on a cluster never stops being available. However, this can lead to the list of features in cluster state growing ever larger.
* It is possible to remove defined cluster features, but only on a compatibility boundary (normally a new major release).
* To see how this can be so, it may be helpful to start with the compatibility guarantees we provide:
* <ul>
* <li>
* The first version of a new major (eg v9.0) can only form a cluster with the highest minor
* of the previous major (eg v8.18).
* </li>
* <li>
* This means that any cluster feature that was added <em>before</em> 8.18.0 was cut will <em>always</em> be present
* on any cluster that has at least one v9 node in it (as we don't support mixed-version clusters of more than two versions).
* </li>
* <li>
* This means that the code checks for those features can be completely removed from the code in v9,
* and the new behavior used all the time.
* </li>
* <li>
* This means that the node features themselves are not required, as they are never checked in the v9 codebase.
* </li>
* </ul>
* So a freshly started v9 cluster does not need any knowledge of features added before 8.18, as the cluster
* will always have the new functionality.
* <p>
* So then how do we do a rolling upgrade from 8.18 to 9.0, if features have been removed? Normally, that would prevent a 9.0
* node from joining an 8.18 cluster, as it does not publish all the required features. However, we can make use
* of the major version difference to allow the rolling upgrade to proceed.
* <p>
* This is where the {@link org.elasticsearch.features.NodeFeature#assumedAfterNextCompatibilityBoundary()} field comes in. On 8.18,
* we can mark all the features that will be removed in 9.0 as assumed. This means that when the features infrastructure sees a
* 9.x node, it will deem that node to have all the assumed features, even if the 9.0 node doesn't actually have those features
* in its published set. It will allow 9.0 nodes to join the cluster missing assumed features,
* and it will say the cluster supports a particular assumed feature even if it is missing from the 9.0 nodes in the cluster.
* <p>
* Essentially, 8.18 nodes (or any other version that can form a cluster with 8.x or 9.x nodes) can mediate
* between the 8.x and 9.x feature sets, using {@code assumedAfterNextCompatibilityBoundary}
* to mark features that have been removed from 9.x, and know that 9.x nodes still meet the requirements for those features.
* These assumed features need to be defined before 8.18 and 9.0 are released.
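The join-time check this describes can be sketched roughly as below, using simplified stand-in types (the real logic lives in {@code NodeJoinExecutor}; all names here are illustrative):

```java
import java.util.Set;

// Stand-in for NodeFeature, including the assumed flag.
record Feature(String id, boolean assumedAfterNextCompatibilityBoundary) {}

class JoinCheckSketch {
    // A joining node must publish every feature already on the cluster, except
    // that a node from the next major version may omit features marked as assumed.
    static boolean canJoin(Set<Feature> clusterFeatures, Set<String> joiningNodeFeatures, boolean joiningNodeIsNextMajor) {
        return clusterFeatures.stream()
            .filter(f -> joiningNodeFeatures.contains(f.id()) == false)
            .allMatch(f -> joiningNodeIsNextMajor && f.assumedAfterNextCompatibilityBoundary());
    }
}
```

For example, a next-major node missing only assumed features may join, while the same node missing a non-assumed feature may not.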
* <p>
* To go into more detail about what happens during a rolling upgrade:
* <ol>
* <li>Start with a homogeneous 8.18 cluster, with an 8.18 cluster feature set (including assumed features)</li>
* <li>
* The first 9.0 node joins the cluster. Even though it is missing the features marked as assumed in 8.18,
* the 8.18 master lets the 9.0 node join because all the missing features are marked as assumed,
* and it is of the next major version.
* </li>
* <li>
* At this point, any feature checks that happen on 8.18 nodes for assumed features pass, despite the 9.0 node
* not publishing those features, as the 9.0 node is assumed to meet the requirements for those features.
* 9.0 nodes do not have those checks at all, and the corresponding code running on 9.0 uses the new behaviour without checking.
* </li>
* <li>More 8.18 nodes get swapped for 9.0 nodes</li>
* <li>
* At some point, the master will change from an 8.18 node to a 9.0 node. The 9.0 node does not have the assumed
* features at all, so the new cluster feature set as calculated by the 9.0 master will only contain the features
* that 9.0 knows about (the calculated feature set is not persisted anywhere).
* The cluster has effectively dropped all the 8.18 features assumed in 9.0, whilst maintaining all behaviour.
* The upgrade carries on.
* </li>
* <li>
* If an 8.18 node were to quit and re-join the cluster still as 8.18 at this point
* (and there are other 8.18 nodes not yet upgraded), it will be able to join the cluster despite the master being 9.0.
* The 8.18 node publishes all the assumed features that 9.0 does not have - but that doesn't matter, because nodes can join
* with more features than are present in the cluster as a whole. The additional features are not added
* to the cluster feature set because not all the nodes in the cluster have those features
* (as there is at least one 9.0 node in the cluster - itself).
* </li>
* <li>
* At some point, the last 8.18 node leaves the cluster, and the cluster is a homogeneous 9.0 cluster
* with only the cluster features known about by 9.0.
* </li>
* </ol>
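The feature checks in steps 2-3 above can be sketched as follows, with simplified stand-in types (all names are illustrative): an assumed feature counts as present on a next-major node even though that node does not publish it.

```java
import java.util.List;
import java.util.Set;

// Stand-in view of a node: its published features and whether it is from the next major version.
record NodeView(Set<String> publishedFeatures, boolean isNextMajor) {}

class FeatureCheckSketch {
    // A cluster-feature check for an assumed feature passes even when next-major
    // nodes do not publish it: those nodes are assumed to meet its requirements.
    static boolean clusterHasFeature(String featureId, boolean assumed, List<NodeView> nodes) {
        return nodes.stream()
            .allMatch(n -> n.publishedFeatures().contains(featureId) || (assumed && n.isNextMajor()));
    }
}
```

So during the mixed 8.18/9.0 phase, checks for assumed features keep passing on 8.18 nodes; once the 9.0 master recalculates the cluster feature set, the assumed features simply disappear from it, with no change in behaviour.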
*
* For any dynamic releases that occur from main, the cadence is much quicker - once a feature is present in a cluster,
* you then only need one completed release to mark a feature as assumed, and a subsequent release to remove it from the codebase
* and elide the corresponding check.
*/
package org.elasticsearch.features;
