brew install terraform
brew install helm

Prepare helm charts:
helm repo add nessie-helm https://charts.projectnessie.org
helm repo add trino https://trinodb.github.io/charts/
helm repo add superset https://apache.github.io/superset
helm repo add dagster https://dagster-io.github.io/helm
helm repo update

In the config/dagster/app folder you will find a simple dagster DAG that copies the data from one table into a second table.
For dagster we need to build a user code docker image and make it available to the dagster server.
We already did this for you: the docker image is available at ghcr.io/bigdatarepublic/bdr-open-data-platform-code:1
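The image reference above consists of a repository and a tag, and helm values for dagster user code deployments typically take the two as separate fields. A minimal sketch of splitting the reference with POSIX parameter expansion; the variable names are illustrative, and it assumes the reference carries a tag and no digest:

```shell
# Image reference taken from the text above.
IMAGE="ghcr.io/bigdatarepublic/bdr-open-data-platform-code:1"

# Everything before the last ':' is the repository, everything after it is the tag.
# Assumes the reference includes a tag and no digest (@sha256:...).
IMAGE_REPOSITORY="${IMAGE%:*}"
IMAGE_TAG="${IMAGE##*:}"

echo "repository: ${IMAGE_REPOSITORY}"
echo "tag:        ${IMAGE_TAG}"
```

The same two expansions work for an image you build and push yourself, as long as its registry host does not include a port without a tag following it.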
You can also change the pipeline and create your own docker image:
cd ../../config/dagster
docker build --platform linux/amd64 -t dagster_code_amd64:1 .

cd platform
terraform init
# create a workspace for local, cyso or scaleway (shown here: local)
terraform workspace new local
terraform workspace show
terraform apply

You just deployed a data platform to your kubernetes cluster. Do a little dance ♪┏(・o・)┛♪┗ ( ・o・) ┓♪.
Let's get it to work now. First we will add some data via trino:
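# run the trino cli inside the trino pod
# pick the first matching pod in case several trino pods are running
TRINO_POD=$(kubectl get pods | grep trino | awk '{print $1}' | head -n 1)
kubectl exec -it "$TRINO_POD" -- trino

In the trino cli you can run SQL statements to add data: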
CREATE SCHEMA iceberg.test_schema;
CREATE TABLE iceberg.test_schema.employees_test
(
    name varchar,
    salary decimal(10,2)
)
WITH (
    format = 'PARQUET'
);
INSERT INTO iceberg.test_schema.employees_test (name, salary) VALUES ('Steven Rogers', 55000);

Let's query the data with superset:
# for local example use http://localhost:8088
# for cloud example get public ip
SUPERSET_IP=$(kubectl get svc | grep "superset.*LoadBalancer" | awk '{print $4}')
echo http://${SUPERSET_IP}:8088 # admin:admin
# add trino as database with url: trino://default@trino:8080/iceberg/test_schema
# run query in sql lab: SELECT * FROM test_schema.employees_test;

Let's run a pipeline with dagster:
# for local example use port forwarding and http://localhost:30089
kubectl port-forward svc/dagster-dagster-webserver 30089:30089
# for cloud example get public ip
DAGSTER_IP=$(kubectl get svc | grep "dagster.*LoadBalancer" | awk '{print $4}')
echo http://${DAGSTER_IP}:30089
# try to materialize the asset
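
The grep/awk one-liners above print whatever sits in the fourth column, which is `<pending>` until the cloud provider has assigned an address. A minimal sketch of a stricter lookup, assuming the default column layout of `kubectl get svc`; the function name `external_ip` and the sample table (names and addresses) are made up for illustration:

```shell
# Print the EXTERNAL-IP column of the first LoadBalancer service whose name
# matches the given pattern; reads `kubectl get svc` output on stdin.
# Returns non-zero if nothing matches or no address has been assigned yet.
external_ip() {
  awk -v pat="$1" '
    $1 ~ pat && $2 == "LoadBalancer" {
      if ($4 == "<pending>" || $4 == "<none>") exit 1
      print $4; found = 1; exit
    }
    END { if (!found) exit 1 }
  '
}

# Usage against a live cluster:
#   SUPERSET_IP=$(kubectl get svc | external_ip superset) && echo "http://${SUPERSET_IP}:8088"

# Illustrative sample output (made-up names and addresses) to try the function offline.
SAMPLE='NAME       TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)          AGE
superset   LoadBalancer   10.0.0.12      203.0.113.10   8088:31234/TCP   5m
dagster    LoadBalancer   10.0.0.13      <pending>      30089:30089/TCP  5m'

printf '%s\n' "$SAMPLE" | external_ip superset   # prints 203.0.113.10
```

Because the function fails instead of printing `<pending>`, it can be chained with `&&` so that a URL is only echoed once the service actually has a public address.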