Commit ae6266d

Rolling HDFS upgrade (#571)
* Add upgrade mode with serialized deployments
* Use deployedProductVersion to decide upgrade mode (but do not automatically advance it)
* Upgrade docs
* Remove dummy log message
* Move upgrade readiness check into utils module
* Fix test build issue
* Regenerate CRDs
* Docs
* s/terminal/shell/g
* Update rust/operator-binary/src/hdfs_controller.rs
  Co-authored-by: Nick <[email protected]>
* Update docs/modules/hdfs/pages/usage-guide/upgrading.adoc
  Co-authored-by: Nick <[email protected]>
* Update docs/modules/hdfs/pages/usage-guide/upgrading.adoc
  Co-authored-by: Nick <[email protected]>
* Update docs/modules/hdfs/pages/usage-guide/upgrading.adoc
  Co-authored-by: Nick <[email protected]>
* Update docs/modules/hdfs/pages/usage-guide/upgrading.adoc
  Co-authored-by: Nick <[email protected]>
* Update docs/modules/hdfs/pages/usage-guide/upgrading.adoc
  Co-authored-by: Nick <[email protected]>
* Move upgrade_args to a separate variable
* Upgrade mode -> compatibility mode
* Move rollout tracker into operator-rs
* Update docs/modules/hdfs/pages/usage-guide/upgrading.adoc
  Co-authored-by: Nick <[email protected]>
* Add note on downgrades
* Perform downgrades in order
* Add note about status subresource
* Update CRDs
* s/upgrading_product_version/upgrade_target_product_version/g
* Switch to main operator-rs
* Update rust/crd/src/lib.rs
  Co-authored-by: Nick <[email protected]>
* Add guardrail against trying to crossgrade in the middle of another upgrade

---------

Co-authored-by: Nick <[email protected]>
1 parent 4b61d28 commit ae6266d

File tree: 7 files changed, +296 −32 lines

Cargo.toml

Lines changed: 1 addition & 1 deletion

@@ -28,6 +28,6 @@ tokio = { version = "1.39", features = ["full"] }
 tracing = "0.1"
 tracing-futures = { version = "0.2", features = ["futures-03"] }

-#[patch."https://github.com/stackabletech/operator-rs.git"]
+[patch."https://github.com/stackabletech/operator-rs.git"]
 #stackable-operator = { path = "../operator-rs/crates/stackable-operator" }
 #stackable-operator = { git = "https://github.com/stackabletech//operator-rs.git", branch = "main" }

deploy/helm/hdfs-operator/crds/crds.yaml

Lines changed: 11 additions & 0 deletions

@@ -1807,6 +1807,17 @@ spec:
             - type
             type: object
           type: array
+        deployedProductVersion:
+          description: |-
+            The product version that the HDFS cluster is currently running.
+
+            During upgrades, this field contains the *old* version.
+          nullable: true
+          type: string
+        upgradeTargetProductVersion:
+          description: The product version that is currently being upgraded to, otherwise null.
+          nullable: true
+          type: string
         type: object
       required:
       - spec
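
For reference, once these status fields are populated you can inspect them with `kubectl`; a small sketch assuming the `simple-hdfs` cluster name used elsewhere in this change, with example output for a cluster mid-upgrade:

----
$ kubectl get hdfs simple-hdfs -o jsonpath='{.status.deployedProductVersion}{"\n"}'
3.3.6
$ kubectl get hdfs simple-hdfs -o jsonpath='{.status.upgradeTargetProductVersion}{"\n"}'
3.4.0
----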
docs/modules/hdfs/pages/usage-guide/upgrading.adoc (new file)

Lines changed: 107 additions & 0 deletions

@@ -0,0 +1,107 @@
= Upgrading HDFS

IMPORTANT: HDFS upgrades are experimental, and details may change at any time.

HDFS currently requires a manual process to upgrade. This guide will take you through an example case, upgrading an example cluster (from our xref:getting_started/index.adoc[Getting Started] guide) from HDFS 3.3.6 to 3.4.0.

== Preparing for the worst

Upgrades can fail, and it is important to prepare for when that happens. Apache HDFS supports https://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html#Downgrade_and_Rollback[two ways to revert an upgrade]:

Rollback:: Reverts all user data to the pre-upgrade state. Requires taking the cluster offline.
Downgrade:: Downgrades the HDFS software but preserves all changes made by users. Can be performed as a rolling change, keeping the cluster online.

The Stackable Operator for HDFS supports downgrades but not rollbacks.

To downgrade, revert the `.spec.image.productVersion` field, and then proceed to xref:#finalize[finalizing] once the cluster is downgraded:

[source,shell]
----
$ kubectl patch hdfs/simple-hdfs --patch '{"spec": {"image": {"productVersion": "3.3.6"}}}' --type=merge
hdfscluster.hdfs.stackable.tech/simple-hdfs patched
----

== Preparing HDFS

HDFS must be configured to initiate the upgrade process. To do this, put the cluster into upgrade mode by running the following commands in an HDFS superuser environment
(either a client configured with a superuser account, or from inside a NameNode pod).
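For example, one way to get a superuser shell is to exec into a NameNode pod. This is a sketch: the pod and container names below assume the `simple-hdfs` cluster from the Getting Started guide and its default rolegroup naming, so adjust them to your setup.

[source,shell]
----
$ kubectl exec -it simple-hdfs-namenode-default-0 -c namenode -- bash
----

Then run the preparation commands: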

// This could be automated by the operator, but dfsadmin does not have good machine-readable output.
// It *can* be queried over JMX, but we're not so lucky for finalization.

[source,shell]
----
$ hdfs dfsadmin -rollingUpgrade prepare
PREPARE rolling upgrade ...
Preparing for upgrade. Data is being saved for rollback.
Run "dfsadmin -rollingUpgrade query" to check the status
for proceeding with rolling upgrade
Block Pool ID: BP-841432641-10.244.0.29-1722612757853
Start Time: Fri Aug 02 15:49:12 GMT 2024 (=1722613752341)
Finalize Time: <NOT FINALIZED>

$ # Then run query until HDFS is ready to proceed
$ hdfs dfsadmin -rollingUpgrade query
QUERY rolling upgrade ...
Preparing for upgrade. Data is being saved for rollback.
Run "dfsadmin -rollingUpgrade query" to check the status
for proceeding with rolling upgrade
Block Pool ID: BP-841432641-10.244.0.29-1722612757853
Start Time: Fri Aug 02 15:49:12 GMT 2024 (=1722613752341)
Finalize Time: <NOT FINALIZED>

$ # It is safe to proceed when the output indicates so, like this:
$ hdfs dfsadmin -rollingUpgrade query
QUERY rolling upgrade ...
Proceed with rolling upgrade:
Block Pool ID: BP-841432641-10.244.0.29-1722612757853
Start Time: Fri Aug 02 15:49:12 GMT 2024 (=1722613752341)
Finalize Time: <NOT FINALIZED>
----

== Starting the upgrade

Once HDFS is ready to upgrade, the HdfsCluster can be updated with the new product version:

[source,shell]
----
$ kubectl patch hdfs/simple-hdfs --patch '{"spec": {"image": {"productVersion": "3.4.0"}}}' --type=merge
hdfscluster.hdfs.stackable.tech/simple-hdfs patched
----

Then wait until all pods have restarted, are in the Ready state, and are running the new HDFS version.

NOTE: This will automatically enable the NameNodes' compatibility mode, allowing them to start despite the fsImage version mismatch.

NOTE: Services will be upgraded in order: JournalNodes, then NameNodes, then DataNodes.
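One way to watch the restart progress is shown below; this is a sketch, and the label selector is an assumption based on common Kubernetes labeling conventions, so verify it against your pods' actual labels:

[source,shell]
----
$ kubectl get pods -l app.kubernetes.io/instance=simple-hdfs --watch
----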

[#finalize]
== Finalizing the upgrade

Once all HDFS pods are running the new version, the HDFS upgrade can be finalized (from the HDFS superuser environment as described in the preparation step):

[source,shell]
----
$ hdfs dfsadmin -rollingUpgrade finalize
FINALIZE rolling upgrade ...
Rolling upgrade is finalized.
Block Pool ID: BP-841432641-10.244.0.29-1722612757853
Start Time: Fri Aug 02 15:49:12 GMT 2024 (=1722613752341)
Finalize Time: Fri Aug 02 15:58:39 GMT 2024 (=1722614319854)
----

// We can't safely automate this, because finalize is asynchronous and doesn't tell us whether all NameNodes have even received the request to finalize.

WARNING: Please ensure that all NameNodes are running and available before proceeding. NameNodes that have not finalized yet will crash on launch when taken out of compatibility mode.
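One way to check this is to list the NameNode pods; a sketch, assuming Stackable's usual role labels (verify against your cluster):

[source,shell]
----
$ kubectl get pods -l app.kubernetes.io/instance=simple-hdfs,app.kubernetes.io/component=namenode
----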

Finally, mark the cluster as upgraded:

[source,shell]
----
$ kubectl patch hdfs/simple-hdfs --subresource=status --patch '{"status": {"deployedProductVersion": "3.4.0"}}' --type=merge
hdfscluster.hdfs.stackable.tech/simple-hdfs patched
----

NOTE: `deployedProductVersion` is located in the _status_ subresource, which most graphical editors will not modify; with `kubectl`, the `--subresource=status` flag is required.

The NameNodes will then be restarted a final time, taking them out of compatibility mode.

docs/modules/hdfs/partials/nav.adoc

Lines changed: 1 addition & 0 deletions

@@ -10,6 +10,7 @@
 ** xref:hdfs:usage-guide/logging-log-aggregation.adoc[]
 ** xref:hdfs:usage-guide/monitoring.adoc[]
 ** xref:hdfs:usage-guide/configuration-environment-overrides.adoc[]
+** xref:hdfs:usage-guide/upgrading.adoc[]
 ** xref:hdfs:usage-guide/operations/index.adoc[]
 *** xref:hdfs:usage-guide/operations/cluster-operations.adoc[]
 *** xref:hdfs:usage-guide/operations/pod-placement.adoc[]

rust/crd/src/lib.rs

Lines changed: 71 additions & 4 deletions

@@ -41,7 +41,7 @@ use stackable_operator::{
     status::condition::{ClusterCondition, HasStatusCondition},
     time::Duration,
 };
-use strum::{Display, EnumIter, EnumString};
+use strum::{Display, EnumIter, EnumString, IntoStaticStr};

 use crate::{
     affinity::get_affinity,

@@ -312,27 +312,29 @@ impl AnyNodeConfig {

 #[derive(
     Clone,
+    Copy,
     Debug,
     Deserialize,
     Display,
     EnumIter,
     EnumString,
+    IntoStaticStr,
     Eq,
     Hash,
     JsonSchema,
     PartialEq,
     Serialize,
 )]
 pub enum HdfsRole {
+    #[serde(rename = "journalnode")]
+    #[strum(serialize = "journalnode")]
+    JournalNode,
     #[serde(rename = "namenode")]
     #[strum(serialize = "namenode")]
     NameNode,
     #[serde(rename = "datanode")]
     #[strum(serialize = "datanode")]
     DataNode,
-    #[serde(rename = "journalnode")]
-    #[strum(serialize = "journalnode")]
-    JournalNode,
 }

 impl HdfsRole {

@@ -802,6 +804,43 @@ impl HdfsCluster {
         Ok(result)
     }

+    pub fn upgrade_state(&self) -> Result<Option<UpgradeState>, UpgradeStateError> {
+        use upgrade_state_error::*;
+        let Some(status) = self.status.as_ref() else {
+            return Ok(None);
+        };
+        let requested_version = self.spec.image.product_version();
+        let Some(deployed_version) = status.deployed_product_version.as_deref() else {
+            // If no deployed version, fresh install -> no upgrade
+            return Ok(None);
+        };
+        let current_upgrade_target_version = status.upgrade_target_product_version.as_deref();
+
+        if requested_version != deployed_version {
+            // If we're requesting a different version than what is deployed, assume that we're upgrading.
+            // Could also be a downgrade to an older version, but we don't support downgrades after upgrade finalization.
+            match current_upgrade_target_version {
+                Some(upgrading_version) if requested_version != upgrading_version => {
+                    // If we're in an upgrade, do not allow switching to a third version
+                    InvalidCrossgradeSnafu {
+                        requested_version,
+                        deployed_version,
+                        upgrading_version,
+                    }
+                    .fail()
+                }
+                _ => Ok(Some(UpgradeState::Upgrading)),
+            }
+        } else if current_upgrade_target_version.is_some_and(|x| requested_version != x) {
+            // If we're requesting the old version mid-upgrade, assume that we're downgrading.
+            // We only support downgrading to the exact previous version.
+            Ok(Some(UpgradeState::Downgrading))
+        } else {
+            // All three versions match, upgrade was completed without clearing `upgrading_product_version`.
+            Ok(None)
+        }
+    }
+
     pub fn authentication_config(&self) -> Option<&AuthenticationConfig> {
         self.spec.cluster_config.authentication.as_ref()
     }

@@ -955,6 +994,26 @@ impl HdfsPodRef {
     }
 }

+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum UpgradeState {
+    /// The cluster is currently being upgraded to a new version.
+    Upgrading,
+
+    /// The cluster is currently being downgraded to the previous version.
+    Downgrading,
+}
+
+#[derive(Debug, Snafu)]
+#[snafu(module)]
+pub enum UpgradeStateError {
+    #[snafu(display("requested version {requested_version:?} while still upgrading from {deployed_version:?} to {upgrading_version:?}, please finish the upgrade or downgrade first"))]
+    InvalidCrossgrade {
+        requested_version: String,
+        deployed_version: String,
+        upgrading_version: String,
+    },
+}
+
 #[derive(
     Clone,
     Debug,

@@ -1322,6 +1381,14 @@ impl Configuration for JournalNodeConfigFragment {
 pub struct HdfsClusterStatus {
     #[serde(default)]
     pub conditions: Vec<ClusterCondition>,
+
+    /// The product version that the HDFS cluster is currently running.
+    ///
+    /// During upgrades, this field contains the *old* version.
+    pub deployed_product_version: Option<String>,
+
+    /// The product version that is currently being upgraded to, otherwise null.
+    pub upgrade_target_product_version: Option<String>,
 }

 impl HasStatusCondition for HdfsCluster {
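
To make the version comparison above easier to follow, here is a minimal standalone sketch of the same decision table, using plain strings instead of the real CRD types. It is illustrative only (simplified names, a plain String error instead of the Snafu type); the shipped implementation is `HdfsCluster::upgrade_state` in the diff above.

[source,rust]
----
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UpgradeState {
    Upgrading,
    Downgrading,
}

/// Decide the upgrade state from the requested (spec) version, the deployed
/// version, and the recorded upgrade target, mirroring the logic above.
fn upgrade_state(
    requested: &str,
    deployed: Option<&str>,
    upgrade_target: Option<&str>,
) -> Result<Option<UpgradeState>, String> {
    // No deployed version recorded yet: fresh install, nothing to upgrade.
    let Some(deployed) = deployed else {
        return Ok(None);
    };
    if requested != deployed {
        match upgrade_target {
            // Requesting a third version mid-upgrade trips the crossgrade guardrail.
            Some(target) if requested != target => Err(format!(
                "requested {requested:?} while still upgrading from {deployed:?} to {target:?}"
            )),
            _ => Ok(Some(UpgradeState::Upgrading)),
        }
    } else if upgrade_target.is_some_and(|target| requested != target) {
        // Requesting the deployed (old) version mid-upgrade: a downgrade.
        Ok(Some(UpgradeState::Downgrading))
    } else {
        // Versions agree: no upgrade in progress.
        Ok(None)
    }
}

fn main() {
    // Requesting a new version starts an upgrade.
    assert_eq!(upgrade_state("3.4.0", Some("3.3.6"), None), Ok(Some(UpgradeState::Upgrading)));
    // Reverting to the deployed version mid-upgrade is a downgrade.
    assert_eq!(upgrade_state("3.3.6", Some("3.3.6"), Some("3.4.0")), Ok(Some(UpgradeState::Downgrading)));
    // Switching to a third version mid-upgrade is rejected.
    assert!(upgrade_state("3.4.1", Some("3.3.6"), Some("3.4.0")).is_err());
    // Steady state: nothing to do.
    assert_eq!(upgrade_state("3.3.6", Some("3.3.6"), None), Ok(None));
}
----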

rust/operator-binary/src/container.rs

Lines changed: 14 additions & 5 deletions

@@ -12,6 +12,7 @@
 use crate::DATANODE_ROOT_DATA_DIR_PREFIX;
 use crate::JVM_SECURITY_PROPERTIES_FILE;
 use crate::LOG4J_PROPERTIES;
+use stackable_hdfs_crd::UpgradeState;
 use stackable_operator::utils::COMMON_BASH_TRAP_FUNCTIONS;
 use std::{collections::BTreeMap, str::FromStr};

@@ -212,7 +213,7 @@
         labels: &Labels,
     ) -> Result<(), Error> {
         // HDFS main container
-        let main_container_config = Self::from(role.clone());
+        let main_container_config = Self::from(*role);
         pb.add_volumes(main_container_config.volumes(merged_config, object_name, labels)?);
         pb.add_container(main_container_config.main_container(
             hdfs,

@@ -548,6 +549,14 @@
             args.push_str(&Self::export_kerberos_real_env_var_command());
         }

+        let upgrade_args = if hdfs.upgrade_state().ok() == Some(Some(UpgradeState::Upgrading))
+            && *role == HdfsRole::NameNode
+        {
+            "-rollingUpgrade started"
+        } else {
+            ""
+        };
+
         match self {
             ContainerConfig::Hdfs { role, .. } => {
                 args.push_str(&self.copy_log4j_properties_cmd(

@@ -566,7 +575,7 @@ if [[ -d {LISTENER_VOLUME_DIR} ]]; then
     export $(basename $i | tr a-z- A-Z_)_PORT="$(cat $i)"
 done
 fi
-{hadoop_home}/bin/hdfs {role} &
+{hadoop_home}/bin/hdfs {role} {upgrade_args} &
 wait_for_termination $!
 {create_vector_shutdown_file_command}
 "#,

@@ -1317,7 +1326,7 @@ impl From<HdfsRole> for ContainerConfig {
     fn from(role: HdfsRole) -> Self {
         match role {
             HdfsRole::NameNode => Self::Hdfs {
-                role: role.clone(),
+                role,
                 container_name: role.to_string(),
                 volume_mounts: ContainerVolumeDirs::from(role),
                 ipc_port_name: SERVICE_PORT_NAME_RPC,

@@ -1327,7 +1336,7 @@
                 metrics_port: DEFAULT_NAME_NODE_METRICS_PORT,
             },
             HdfsRole::DataNode => Self::Hdfs {
-                role: role.clone(),
+                role,
                 container_name: role.to_string(),
                 volume_mounts: ContainerVolumeDirs::from(role),
                 ipc_port_name: SERVICE_PORT_NAME_IPC,

@@ -1337,7 +1346,7 @@
                 metrics_port: DEFAULT_DATA_NODE_METRICS_PORT,
             },
             HdfsRole::JournalNode => Self::Hdfs {
-                role: role.clone(),
+                role,
                 container_name: role.to_string(),
                 volume_mounts: ContainerVolumeDirs::from(role),
                 ipc_port_name: SERVICE_PORT_NAME_RPC,
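
The net effect of `upgrade_args` on the NameNode launch command looks roughly like this (illustrative only; `{hadoop_home}` is the template variable from the code above):

----
# While an upgrade is in progress and the role is namenode:
{hadoop_home}/bin/hdfs namenode -rollingUpgrade started &
# Otherwise upgrade_args is empty and the command is unchanged:
{hadoop_home}/bin/hdfs namenode &
----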
