Commit 17efbbc

adwk67 and xeniape authored
feat: Add flag for db init routine (#669)
* add flag for db init routine incl. test asserts as part of cluster ops
* changelog
* added a doc
* reworked crd change following decision
* straightened up crd docs
* Update rust/operator-binary/src/crd/mod.rs

Co-authored-by: Xenia <[email protected]>

* improved trace message
* clarified doc comment

---------

Co-authored-by: Xenia <[email protected]>
1 parent faedcbd commit 17efbbc

9 files changed: +143 -34 lines changed
CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -2,13 +2,18 @@

 ## [Unreleased]

+### Added
+
+- Add a flag to determine if database initialization steps should be executed ([#669]).
+
 ### Fixed

 - Don't panic on invalid authorization config. Previously, a missing OPA ConfigMap would crash the operator ([#667]).
 - Fix OPA authorization for Airflow 3. Airflow 3 needs to be configured via env variables, the operator now does this correctly ([#668]).

 [#667]: https://github.com/stackabletech/airflow-operator/pull/667
 [#668]: https://github.com/stackabletech/airflow-operator/pull/668
+[#669]: https://github.com/stackabletech/airflow-operator/pull/669

 ## [25.7.0] - 2025-07-23

deploy/helm/airflow-operator/crds/crds.yaml

Lines changed: 10 additions & 0 deletions
@@ -591,6 +591,16 @@ spec:
       - repo
     type: object
   type: array
+databaseInitialization:
+  default:
+    enabled: true
+  description: Settings related to the database initialization routines (which are always executed by default).
+  properties:
+    enabled:
+      default: true
+      description: 'Whether to execute the database initialization routines (a combination of database initialization, upgrade and migration depending on the Airflow version). Defaults to true to be backwards-compatible. WARNING: setting this to false is *unsupported* as subsequent updates to the Airflow cluster may result in broken behaviour due to inconsistent metadata! Do not change the default unless you know what you are doing!'
+      type: boolean
+  type: object
 exposeConfig:
   default: false
   description: for internal use only - not for production use.
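
The schema declares a default at two levels: the whole `databaseInitialization` object defaults to `enabled: true`, and the `enabled` property itself defaults to `true`. The following is a minimal, std-only sketch of how those two defaults could resolve for the three possible manifest shapes. The `resolve` helper and the `Option<Option<bool>>` encoding are assumptions made for illustration; the operator itself relies on serde attributes rather than code like this.

```rust
/// Simplified model of the new CRD section; the operator's real type
/// derives serde traits, which are left out here to stay std-only.
#[derive(Debug, PartialEq)]
struct DatabaseInitializationConfig {
    enabled: bool,
}

impl Default for DatabaseInitializationConfig {
    fn default() -> Self {
        // Mirrors the object-level `default: {enabled: true}` in the schema.
        Self { enabled: true }
    }
}

/// Hypothetical helper (not part of the operator).
/// Outer `None`: `databaseInitialization` is absent from the manifest.
/// Inner `None`: the object is present but `enabled` is not set.
fn resolve(manifest: Option<Option<bool>>) -> DatabaseInitializationConfig {
    match manifest {
        None => DatabaseInitializationConfig::default(),
        Some(enabled) => DatabaseInitializationConfig {
            // Mirrors the field-level `default: true` in the schema.
            enabled: enabled.unwrap_or(true),
        },
    }
}

fn main() {
    println!("{:?}", resolve(None));
    println!("{:?}", resolve(Some(None)));
    println!("{:?}", resolve(Some(Some(false))));
}
```

Under this model only an explicit `enabled: false` turns the routines off, which matches the backwards-compatibility intent stated in the field description.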
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+= Database initialization
+:description: Configure Airflow Database start-up.
+
+By default, Airflow will run database initialization routines (checking and/or creating the metadata schema and creating an admin user) on start-up.
+These are idempotent and can be run every time as the overhead is minimal.
+However, if these steps should be skipped, a running Airflow cluster can be patched with a resource like this to deactivate the initialization:
+
+[source,yaml]
+----
+---
+apiVersion: airflow.stackable.tech/v1alpha1
+kind: AirflowCluster
+metadata:
+  name: airflow
+spec:
+  clusterConfig:
+    databaseInitialization:
+      enabled: false # <1>
+----
+<1> Turn off the initialization routine by setting `databaseInitialization.enabled` to `false`
+
+NOTE: The field `databaseInitialization.enabled` is `true` by default to be backwards-compatible.
+A fresh Airflow cluster cannot be created with this field set to `false` as this results in missing metadata in the Airflow database.
+
+WARNING: Setting `databaseInitialization.enabled` to `false` is an unsupported operation as subsequent updates to a running Airflow cluster can result in broken behaviour due to inconsistent metadata.
+Only set `databaseInitialization.enabled` to `false` if you know what you are doing!

docs/modules/airflow/partials/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 ** xref:airflow:getting_started/first_steps.adoc[]
 * xref:airflow:required-external-components.adoc[]
 * xref:airflow:usage-guide/index.adoc[]
+** xref:airflow:usage-guide/db-init.adoc[]
 ** xref:airflow:usage-guide/mounting-dags.adoc[]
 ** xref:airflow:usage-guide/applying-custom-resources.adoc[]
 ** xref:airflow:usage-guide/listenerclass.adoc[]

rust/operator-binary/src/airflow_controller.rs

Lines changed: 5 additions & 2 deletions
@@ -951,8 +951,11 @@ fn build_server_rolegroup_statefulset(
         .context(GracefulShutdownSnafu)?;

     let mut airflow_container_args = Vec::new();
-    airflow_container_args
-        .extend(airflow_role.get_commands(authentication_config, resolved_product_image));
+    airflow_container_args.extend(airflow_role.get_commands(
+        airflow,
+        authentication_config,
+        resolved_product_image,
+    ));

     airflow_container
         .image_from_product_image(resolved_product_image)

rust/operator-binary/src/crd/mod.rs

Lines changed: 78 additions & 32 deletions
@@ -251,6 +251,10 @@ pub mod versioned {
         #[serde(default)]
         pub load_examples: bool,

+        /// Settings related to the database initialization routines (which are always executed by default).
+        #[serde(default)]
+        pub database_initialization: DatabaseInitializationConfig,
+
         /// Name of the Vector aggregator [discovery ConfigMap](DOCS_BASE_URL_PLACEHOLDER/concepts/service_discovery).
         /// It must contain the key `ADDRESS` with the address of the Vector aggregator.
         /// Follow the [logging tutorial](DOCS_BASE_URL_PLACEHOLDER/tutorials/logging-vector-aggregator)
@@ -268,7 +272,6 @@ pub mod versioned {
         #[schemars(schema_with = "raw_object_list_schema")]
         pub volume_mounts: Vec<VolumeMount>,
     }
-
     // TODO: move generic version to op-rs?
     #[derive(Clone, Debug, Deserialize, JsonSchema, PartialEq, Serialize)]
     #[serde(rename_all = "camelCase")]
@@ -282,6 +285,28 @@ pub mod versioned {
     }
 }

+#[derive(Clone, Debug, Deserialize, JsonSchema, PartialEq, Serialize)]
+#[serde(rename_all = "camelCase")]
+pub struct DatabaseInitializationConfig {
+    /// Whether to execute the database initialization routines (a combination of database initialization, upgrade and migration depending on the Airflow version). Defaults to true to be backwards-compatible.
+    /// WARNING: setting this to false is *unsupported* as subsequent updates to the Airflow cluster may result in broken behaviour due to inconsistent metadata!
+    /// Do not change the default unless you know what you are doing!
+    #[serde(default = "default_db_init")]
+    pub enabled: bool,
+}
+
+impl Default for DatabaseInitializationConfig {
+    fn default() -> Self {
+        Self {
+            enabled: default_db_init(),
+        }
+    }
+}
+
+pub fn default_db_init() -> bool {
+    true
+}
+
 impl Default for v1alpha1::WebserverRoleConfig {
     fn default() -> Self {
         v1alpha1::WebserverRoleConfig {
@@ -547,6 +572,7 @@ impl AirflowRole {
     /// if authentication is enabled.
     pub fn get_commands(
         &self,
+        airflow: &v1alpha1::AirflowCluster,
         auth_config: &AirflowClientAuthenticationDetailsResolved,
         resolved_product_image: &ResolvedProductImage,
     ) -> Vec<String> {
@@ -576,21 +602,30 @@ impl AirflowRole {
                         "airflow api-server &".to_string(),
                     ]);
                 }
-                AirflowRole::Scheduler => command.extend(vec![
-                    "airflow db migrate".to_string(),
-                    "airflow users create \
-                        --username \"$ADMIN_USERNAME\" \
-                        --firstname \"$ADMIN_FIRSTNAME\" \
-                        --lastname \"$ADMIN_LASTNAME\" \
-                        --email \"$ADMIN_EMAIL\" \
-                        --password \"$ADMIN_PASSWORD\" \
-                        --role \"Admin\""
-                        .to_string(),
-                    "prepare_signal_handlers".to_string(),
-                    container_debug_command(),
-                    "airflow dag-processor &".to_string(),
-                    "airflow scheduler &".to_string(),
-                ]),
+                AirflowRole::Scheduler => {
+                    if airflow.spec.cluster_config.database_initialization.enabled {
+                        tracing::info!("Database initialization has been enabled.");
+                        command.extend(vec![
+                            "airflow db migrate".to_string(),
+                            "airflow users create \
+                                --username \"$ADMIN_USERNAME\" \
+                                --firstname \"$ADMIN_FIRSTNAME\" \
+                                --lastname \"$ADMIN_LASTNAME\" \
+                                --email \"$ADMIN_EMAIL\" \
+                                --password \"$ADMIN_PASSWORD\" \
+                                --role \"Admin\""
+                                .to_string(),
+                        ]);
+                    } else {
+                        tracing::info!("Database initialization routines have been skipped!")
+                    }
+                    command.extend(vec![
+                        "prepare_signal_handlers".to_string(),
+                        container_debug_command(),
+                        "airflow dag-processor &".to_string(),
+                        "airflow scheduler &".to_string(),
+                    ]);
+                }
                 AirflowRole::Worker => command.extend(vec![
                     "prepare_signal_handlers".to_string(),
                     container_debug_command(),
@@ -608,22 +643,31 @@ impl AirflowRole {
                         "airflow webserver &".to_string(),
                     ]);
                 }
-                AirflowRole::Scheduler => command.extend(vec![
-                    // Database initialization is limited to the scheduler, see https://github.com/stackabletech/airflow-operator/issues/259
-                    "airflow db init".to_string(),
-                    "airflow db upgrade".to_string(),
-                    "airflow users create \
-                        --username \"$ADMIN_USERNAME\" \
-                        --firstname \"$ADMIN_FIRSTNAME\" \
-                        --lastname \"$ADMIN_LASTNAME\" \
-                        --email \"$ADMIN_EMAIL\" \
-                        --password \"$ADMIN_PASSWORD\" \
-                        --role \"Admin\""
-                        .to_string(),
-                    "prepare_signal_handlers".to_string(),
-                    container_debug_command(),
-                    "airflow scheduler &".to_string(),
-                ]),
+                AirflowRole::Scheduler => {
+                    if airflow.spec.cluster_config.database_initialization.enabled {
+                        tracing::info!("Database initialization has been enabled.");
+                        command.extend(vec![
+                            // Database initialization is limited to the scheduler, see https://github.com/stackabletech/airflow-operator/issues/259
+                            "airflow db init".to_string(),
+                            "airflow db upgrade".to_string(),
+                            "airflow users create \
+                                --username \"$ADMIN_USERNAME\" \
+                                --firstname \"$ADMIN_FIRSTNAME\" \
+                                --lastname \"$ADMIN_LASTNAME\" \
+                                --email \"$ADMIN_EMAIL\" \
+                                --password \"$ADMIN_PASSWORD\" \
+                                --role \"Admin\""
+                                .to_string(),
+                        ]);
+                    } else {
+                        tracing::info!("Database initialization routines have been skipped!")
+                    }
+                    command.extend(vec![
+                        "prepare_signal_handlers".to_string(),
+                        container_debug_command(),
+                        "airflow scheduler &".to_string(),
+                    ]);
+                }
                 AirflowRole::Worker => command.extend(vec![
                     "prepare_signal_handlers".to_string(),
                     container_debug_command(),
@@ -981,5 +1025,7 @@ mod tests {
         assert_eq!("KubernetesExecutor", cluster.spec.executor.to_string());
         assert!(cluster.spec.cluster_config.load_examples);
         assert!(cluster.spec.cluster_config.expose_config);
+        // defaults to true
+        assert!(cluster.spec.cluster_config.database_initialization.enabled);
     }
 }
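
The change to the scheduler arm of `get_commands` boils down to splitting the command list in two: the initialization steps are appended only when the flag is set, and the remaining start-up commands are always appended. A minimal standalone sketch of that pattern for the Airflow 3 branch is shown below. The free function `scheduler_commands` and its `bool` parameter are hypothetical simplifications; the real method takes the `AirflowCluster` resource and also emits `container_debug_command()`, which is omitted here because it lives in an external crate.

```rust
/// Hypothetical, simplified stand-in for the scheduler arm of
/// `AirflowRole::get_commands` (Airflow 3 branch). Only the command
/// strings are taken from the diff; the signature is an assumption.
fn scheduler_commands(db_init_enabled: bool) -> Vec<String> {
    let mut command = Vec::new();

    if db_init_enabled {
        // Initialization steps: schema migration plus admin user creation.
        command.extend(vec![
            "airflow db migrate".to_string(),
            "airflow users create \
                --username \"$ADMIN_USERNAME\" \
                --firstname \"$ADMIN_FIRSTNAME\" \
                --lastname \"$ADMIN_LASTNAME\" \
                --email \"$ADMIN_EMAIL\" \
                --password \"$ADMIN_PASSWORD\" \
                --role \"Admin\""
                .to_string(),
        ]);
    }

    // Start-up commands that run regardless of the flag.
    // (The real code also inserts `container_debug_command()` here.)
    command.extend(vec![
        "prepare_signal_handlers".to_string(),
        "airflow dag-processor &".to_string(),
        "airflow scheduler &".to_string(),
    ]);

    command
}

fn main() {
    for enabled in [true, false] {
        println!("enabled = {enabled}:");
        for line in scheduler_commands(enabled) {
            println!("  {line}");
        }
    }
}
```

Keeping the unconditional commands in a second `extend` call means the scheduler start-up itself is identical in both cases; only the database steps differ.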
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+---
+# For this assert we expect the database operation to be logged
+apiVersion: kuttl.dev/v1beta1
+kind: TestAssert
+timeout: 30
+commands:
+  - script: |
+      kubectl -n $NAMESPACE logs airflow-scheduler-default-0 | grep "Database migrating done!"

tests/templates/kuttl/cluster-operation/30-restart-airflow.yaml.j2

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,8 @@ spec:
     vectorAggregatorConfigMapName: vector-aggregator-discovery
 {% endif %}
     credentialsSecret: test-airflow-credentials
+    databaseInitialization:
+      enabled: false
   webservers:
     roleConfig:
       listenerClass: external-unstable
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+---
+# For this step we expect the database operation to NOT be logged
+apiVersion: kuttl.dev/v1beta1
+kind: TestAssert
+timeout: 30
+commands:
+  - script: |
+      kubectl -n $NAMESPACE logs airflow-scheduler-default-0 | grep -q "Database migrating done!" && exit 1 || exit 0
