Commit 8b1cc2a

test: Add test for Apache Iceberg integration (#785)
* Clean up smoke test
* Clean up smoke test, part 2
* Add working test :)
* Move files
* Add and test HDFS functionality
* Kerberize HDFS and HMS
* Add Kerberos test
* Use nightly image
* Linter
* Update Iceberg docs
* Changelog
* Small bumps
* Update docs/modules/nifi/pages/usage_guide/writing-to-iceberg-tables.adoc

Co-authored-by: Nick <[email protected]>
1 parent b8dba77 commit 8b1cc2a

58 files changed: +3562 −114 lines

CHANGELOG.md

Lines changed: 2 additions & 0 deletions

@@ -11,6 +11,7 @@ All notable changes to this project will be documented in this file.
 - Use `--file-log-max-files` (or `FILE_LOG_MAX_FILES`) to limit the number of log files kept.
 - Use `--file-log-rotation-period` (or `FILE_LOG_ROTATION_PERIOD`) to configure the frequency of rotation.
 - Use `--console-log-format` (or `CONSOLE_LOG_FORMAT`) to set the format to `plain` (default) or `json`.
+- Add test for Apache Iceberg integration ([#785]).

 ### Changed

@@ -39,6 +40,7 @@ All notable changes to this project will be documented in this file.
 [#774]: https://github.com/stackabletech/nifi-operator/pull/774
 [#776]: https://github.com/stackabletech/nifi-operator/pull/776
 [#782]: https://github.com/stackabletech/nifi-operator/pull/782
+[#785]: https://github.com/stackabletech/nifi-operator/pull/785
 [#787]: https://github.com/stackabletech/nifi-operator/pull/787
 [#789]: https://github.com/stackabletech/nifi-operator/pull/789
docs/modules/nifi/pages/usage_guide/writing-to-iceberg-tables.adoc

Lines changed: 17 additions & 3 deletions

@@ -2,13 +2,27 @@
 :description: Write to Apache Iceberg tables in NiFi using the PutIceberg processor. Supports integration with S3 and Hive Metastore for scalable data handling.
 :iceberg: https://iceberg.apache.org/

-WARNING: In NiFi `2.0.0` Iceberg support https://issues.apache.org/jira/browse/NIFI-13938[has been removed].
-
 {iceberg}[Apache Iceberg] is a high-performance format for huge analytic tables.
 Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables at the same time.

 NiFi supports a `PutIceberg` processor to add rows to an existing Iceberg table https://issues.apache.org/jira/browse/NIFI-10442[starting from version 1.19.0].
-As of NiFi version `1.23.1` only `PutIceberg` is supported, so you need to create and compact your tables with other tools such as Trino or Spark (both included in the Stackable Data Platform).
+As of NiFi version `2.4.0` only `PutIceberg` is supported, so you need to create and compact your tables with other tools such as Trino or Spark (both included in the Stackable Data Platform).
+
+== NiFi 2
+
+In NiFi `2.0.0`, Iceberg support https://issues.apache.org/jira/browse/NIFI-13938[was removed] from upstream NiFi.
+
+We forked the `nifi-iceberg-bundle` and made it available at https://github.com/stackabletech/nifi-iceberg-bundle.
+Starting with SDP 25.7, the necessary bundle is included in NiFi by default; you don't need to explicitly add Iceberg support to the Stackable NiFi.
+
+Please read https://github.com/stackabletech/nifi-iceberg-bundle[its documentation] for how to ingest data into Iceberg tables.
+You don't need any special configuration on the `NiFiCluster` if you are using S3 and no Kerberos.
+
+HDFS and Kerberos are also supported; see the https://github.com/stackabletech/nifi-operator/tree/main/tests/templates/kuttl/iceberg[Iceberg integration test] for a working setup.
+
+== NiFi 1
+
+Starting with `1.19.0`, NiFi supports writing to Iceberg tables.

 The following example shows a NiFi setup using the Iceberg integration.

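The docs above state that no special configuration is needed on the `NiFiCluster` for the S3-without-Kerberos case. For orientation, a minimal sketch of such a cluster definition might look as follows; the resource names, the referenced AuthenticationClass and ZooKeeper znode ConfigMap, and the product version are illustrative assumptions, not taken from this commit:

```yaml
# Hypothetical minimal NiFiCluster; all names and the version are placeholders.
apiVersion: nifi.stackable.tech/v1alpha1
kind: NifiCluster
metadata:
  name: nifi
spec:
  image:
    productVersion: "2.4.0"
  clusterConfig:
    authentication:
      - authenticationClass: nifi-users        # hypothetical AuthenticationClass
    sensitiveProperties:
      keySecret: nifi-sensitive-property-key   # hypothetical Secret name
    zookeeperConfigMapName: nifi-znode         # hypothetical ZNode ConfigMap
  nodes:
    roleGroups:
      default:
        replicas: 1
```

With the bundle shipped by default since SDP 25.7, the Iceberg processors should then be available in NiFi without further operator-side configuration, per the docs above.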
tests/release.yaml

Lines changed: 8 additions & 0 deletions

@@ -12,7 +12,15 @@ releases:
     operatorVersion: 0.0.0-dev
   listener:
     operatorVersion: 0.0.0-dev
+  opa:
+    operatorVersion: 0.0.0-dev
   zookeeper:
     operatorVersion: 0.0.0-dev
+  hdfs:
+    operatorVersion: 0.0.0-dev
+  hive:
+    operatorVersion: 0.0.0-dev
+  trino:
+    operatorVersion: 0.0.0-dev
   nifi:
     operatorVersion: 0.0.0-dev
Lines changed: 9 additions & 0 deletions

+{% if test_scenario['values']['openshift'] == 'true' %}
+# see https://github.com/stackabletech/issues/issues/566
+---
+apiVersion: kuttl.dev/v1beta1
+kind: TestStep
+commands:
+  - script: kubectl patch namespace $NAMESPACE -p '{"metadata":{"labels":{"pod-security.kubernetes.io/enforce":"privileged"}}}'
+    timeout: 120
+{% endif %}
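The `kubectl patch` above applies a strategic merge patch, so the label is merged into any labels the namespace already carries. As a sketch (the namespace name is a placeholder for `$NAMESPACE`), the patched object would carry:

```yaml
# Sketch of the namespace after the merge patch; name is a placeholder.
apiVersion: v1
kind: Namespace
metadata:
  name: example-test-ns
  labels:
    pod-security.kubernetes.io/enforce: privileged
```

This relaxes Pod Security admission on OpenShift so the test pods (which need root, see below) are admitted.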
Lines changed: 29 additions & 0 deletions

+---
+kind: Role
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: test-role
+rules:
+{% if test_scenario['values']['openshift'] == "true" %}
+  - apiGroups: ["security.openshift.io"]
+    resources: ["securitycontextconstraints"]
+    resourceNames: ["privileged"]
+    verbs: ["use"]
+{% endif %}
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: test-sa
+---
+kind: RoleBinding
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: test-rb
+subjects:
+  - kind: ServiceAccount
+    name: test-sa
+roleRef:
+  kind: Role
+  name: test-role
+  apiGroup: rbac.authorization.k8s.io
Lines changed: 5 additions & 0 deletions

+---
+apiVersion: kuttl.dev/v1beta1
+kind: TestStep
+commands:
+  - script: envsubst '$NAMESPACE' < 01_s3-connection.yaml | kubectl apply -n $NAMESPACE -f -
Lines changed: 36 additions & 0 deletions

+---
+apiVersion: s3.stackable.tech/v1alpha1
+kind: S3Connection
+metadata:
+  name: minio
+spec:
+  host: "minio.${NAMESPACE}.svc.cluster.local"
+  port: 9000
+  accessStyle: Path
+  credentials:
+    secretClass: s3-credentials-class
+  tls:
+    verification:
+      server:
+        caCert:
+          secretClass: tls
+---
+apiVersion: secrets.stackable.tech/v1alpha1
+kind: SecretClass
+metadata:
+  name: s3-credentials-class
+spec:
+  backend:
+    k8sSearch:
+      searchNamespace:
+        pod: {}
+---
+apiVersion: v1
+kind: Secret
+metadata:
+  name: minio-credentials
+  labels:
+    secrets.stackable.tech/class: s3-credentials-class
+stringData:
+  accessKey: admin
+  secretKey: adminadmin
Lines changed: 14 additions & 0 deletions

+---
+apiVersion: kuttl.dev/v1beta1
+kind: TestAssert
+timeout: 300
+{% if test_scenario['values']['iceberg-use-kerberos'] == 'true' %}
+---
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: krb5-kdc
+status:
+  readyReplicas: 1
+  replicas: 1
+{% endif %}
Lines changed: 146 additions & 0 deletions

+{% if test_scenario['values']['iceberg-use-kerberos'] == 'true' %}
+apiVersion: apps/v1
+kind: StatefulSet
+metadata:
+  name: krb5-kdc
+spec:
+  selector:
+    matchLabels:
+      app: krb5-kdc
+  template:
+    metadata:
+      labels:
+        app: krb5-kdc
+    spec:
+      serviceAccountName: test-sa
+      initContainers:
+        - name: init
+          image: oci.stackable.tech/sdp/krb5:{{ test_scenario['values']['krb5'] }}-stackable0.0.0-dev
+          args:
+            - sh
+            - -euo
+            - pipefail
+            - -c
+            - |
+              test -e /var/kerberos/krb5kdc/principal || kdb5_util create -s -P asdf
+              kadmin.local get_principal -terse root/admin || kadmin.local add_principal -pw asdf root/admin
+              # stackable-secret-operator principal must match the keytab specified in the SecretClass
+              kadmin.local get_principal -terse stackable-secret-operator || kadmin.local add_principal -e aes256-cts-hmac-sha384-192:normal -pw asdf stackable-secret-operator
+          env:
+            - name: KRB5_CONFIG
+              value: /stackable/config/krb5.conf
+          volumeMounts:
+            - mountPath: /stackable/config
+              name: config
+            - mountPath: /var/kerberos/krb5kdc
+              name: data
+      containers:
+        - name: kdc
+          image: oci.stackable.tech/sdp/krb5:{{ test_scenario['values']['krb5'] }}-stackable0.0.0-dev
+          args:
+            - krb5kdc
+            - -n
+          env:
+            - name: KRB5_CONFIG
+              value: /stackable/config/krb5.conf
+          volumeMounts:
+            - mountPath: /stackable/config
+              name: config
+            - mountPath: /var/kerberos/krb5kdc
+              name: data
+          # Root permissions required on OpenShift to access internal ports
+          {% if test_scenario['values']['openshift'] == "true" %}
+          securityContext:
+            runAsUser: 0
+          {% endif %}
+        - name: kadmind
+          image: oci.stackable.tech/sdp/krb5:{{ test_scenario['values']['krb5'] }}-stackable0.0.0-dev
+          args:
+            - kadmind
+            - -nofork
+          env:
+            - name: KRB5_CONFIG
+              value: /stackable/config/krb5.conf
+          volumeMounts:
+            - mountPath: /stackable/config
+              name: config
+            - mountPath: /var/kerberos/krb5kdc
+              name: data
+          # Root permissions required on OpenShift to access internal ports
+          {% if test_scenario['values']['openshift'] == "true" %}
+          securityContext:
+            runAsUser: 0
+          {% endif %}
+        - name: client
+          image: oci.stackable.tech/sdp/krb5:{{ test_scenario['values']['krb5'] }}-stackable0.0.0-dev
+          tty: true
+          stdin: true
+          env:
+            - name: KRB5_CONFIG
+              value: /stackable/config/krb5.conf
+          volumeMounts:
+            - mountPath: /stackable/config
+              name: config
+      volumes:
+        - name: config
+          configMap:
+            name: krb5-kdc
+  volumeClaimTemplates:
+    - metadata:
+        name: data
+      spec:
+        accessModes:
+          - ReadWriteOnce
+        resources:
+          requests:
+            storage: 1Gi
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: krb5-kdc
+spec:
+  selector:
+    app: krb5-kdc
+  ports:
+    - name: kadmin
+      port: 749
+    - name: kdc
+      port: 88
+    - name: kdc-udp
+      port: 88
+      protocol: UDP
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: krb5-kdc
+data:
+  krb5.conf: |
+    [logging]
+    default = STDERR
+    kdc = STDERR
+    admin_server = STDERR
+    # default = FILE:/var/log/krb5libs.log
+    # kdc = FILE:/var/log/krb5kdc.log
+    # admin_server = FILE:/var/log/kadmind.log
+    [libdefaults]
+    dns_lookup_realm = false
+    ticket_lifetime = 24h
+    renew_lifetime = 7d
+    forwardable = true
+    rdns = false
+    default_realm = {{ test_scenario['values']['kerberos-realm'] }}
+    spake_preauth_groups = edwards25519
+    [realms]
+    {{ test_scenario['values']['kerberos-realm'] }} = {
+      acl_file = /stackable/config/kadm5.acl
+      disable_encrypted_timestamp = false
+    }
+    [domain_realm]
+    .cluster.local = {{ test_scenario['values']['kerberos-realm'] }}
+    cluster.local = {{ test_scenario['values']['kerberos-realm'] }}
+  kadm5.acl: |
+    root/admin *e
+    stackable-secret-operator *e
+{% endif %}
Lines changed: 8 additions & 0 deletions

+---
+{% if test_scenario['values']['iceberg-use-kerberos'] == 'true' %}
+apiVersion: kuttl.dev/v1beta1
+kind: TestStep
+commands:
+  # We need to replace $NAMESPACE (by KUTTL)
+  - script: envsubst '$NAMESPACE' < 03_kerberos-secretclass.yaml | kubectl apply -n $NAMESPACE -f -
+{% endif %}

0 commit comments

Comments
 (0)