BigData Republic Open Data Platform

Introduction

The BigData Republic Open Data Platform is a repository of Terraform templates that help to deploy a modern, open-source data platform on European cloud providers. The platform follows open data stack principles to promote interoperability and avoid vendor lock‑in.

The Open Data Stack is a collection of open-source tools and open standards that together support the full data engineering lifecycle. It enables scalable, flexible, and cost‑effective data platforms without proprietary constraints.

Goal

This project helps organizations experiment with running a data platform on European cloud infrastructure and achieve a first working deployment within a single day.

It is an opinionated starting point. Many production concerns are intentionally out of scope for now, including:

  • Network security hardening
  • Authentication & authorization (IAM / SSO)
  • Streaming / real‑time ingestion
  • Backup, disaster recovery, and lifecycle policies
  • Cost governance & observability

Integrations

Currently supported deployment targets include the cloud providers under infra/<provider>, such as Scaleway and Cyso Cloud.

Local Kubernetes deployment is also supported for development and experimentation.

Solution Overview

The solution consists of two layers:

  1. Infrastructure provisioning (cloud + local) via Terraform
  2. Data platform deployment via Terraform + Helm on the provisioned Kubernetes cluster
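
As a rough illustration of how the second layer can target the cluster created by the first, the sketch below configures the Helm provider from a kubeconfig file. The variable name `kubeconfig_path` is hypothetical and not taken from this repository, and the provider block syntax shown is the one used by the 2.x line of the Helm provider.

```hcl
terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0" # 2.x block syntax; 3.x expects `kubernetes = { ... }` instead
    }
  }
}

# Hypothetical input: path to the kubeconfig produced by the infrastructure layer.
variable "kubeconfig_path" {
  description = "Path to the kubeconfig of the provisioned Kubernetes cluster"
  type        = string
}

# Helm releases in this layer are installed into the cluster that the
# kubeconfig points at, whether it is a cloud cluster or a local one.
provider "helm" {
  kubernetes {
    config_path = var.kubeconfig_path
  }
}
```

Keeping the cluster connection behind a single input is one way to let the same platform layer run unchanged against any provider.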

Deployment

Infrastructure Deployment

The infrastructure layer provisions the core building blocks:

  • Object storage
  • Kubernetes cluster

Each cloud integration lives under infra/<provider> and implements the provider‑specific provisioning logic. Depending on the provider, resources are created either through direct Terraform providers (e.g. Scaleway) or via OpenStack APIs plus kubectl (e.g. Cyso Cloud). A local option is also available for testing.
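
For a provider with a direct Terraform provider, the two building blocks might look roughly like the sketch below. It uses resource types from the Scaleway Terraform provider; all names, the Kubernetes version, the node type, and the pool size are placeholders chosen for illustration, not values from this repository. Required attributes can differ between provider versions, so treat it as a starting point and not as the actual contents of infra/scaleway.

```hcl
terraform {
  required_providers {
    scaleway = {
      source = "scaleway/scaleway"
    }
  }
}

# Credentials, project, and region are assumed to come from the
# standard SCW_* environment variables.
provider "scaleway" {}

# Object storage bucket for the platform's datasets (placeholder name).
resource "scaleway_object_bucket" "lake" {
  name = "example-open-data-lake"
}

# Private network that the Kubernetes cluster attaches to.
resource "scaleway_vpc_private_network" "platform" {
  name = "example-platform-network"
}

# Managed Kubernetes control plane.
resource "scaleway_k8s_cluster" "platform" {
  name                        = "example-platform"
  version                     = "1.31.2" # placeholder; pick a version the provider currently offers
  cni                         = "cilium"
  private_network_id          = scaleway_vpc_private_network.platform.id
  delete_additional_resources = false
}

# Worker nodes for the cluster.
resource "scaleway_k8s_pool" "default" {
  cluster_id = scaleway_k8s_cluster.platform.id
  name       = "default"
  node_type  = "DEV1-M" # placeholder instance type
  size       = 2
}
```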

Refer to each provider's documentation for setup instructions.

Data Platform Deployment

The proof-of-concept data platform assembles a lean but capable open-source stack on the provisioned Kubernetes cluster.

Trino is configured to persist datasets as Apache Iceberg tables in object storage via the Nessie catalog.
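
The wiring described above could be expressed as a Helm release along the following lines. This is a minimal sketch, not the repository's actual configuration: the release and namespace names, the catalog name `lakehouse`, the Nessie service URL, the bucket, and the S3 endpoint and region are all assumptions. The `catalogs` values key is assumed to match recent versions of the Trino Helm chart, and object storage credentials are deliberately left out.

```hcl
# Assumes a Helm provider is already configured for the target cluster.
resource "helm_release" "trino" {
  name             = "trino"
  namespace        = "data-platform" # placeholder namespace
  create_namespace = true
  repository       = "https://trinodb.github.io/charts"
  chart            = "trino"

  values = [
    yamlencode({
      catalogs = {
        # Each key becomes a Trino catalog; the value is its properties file.
        lakehouse = <<-EOT
          connector.name=iceberg
          iceberg.catalog.type=nessie
          iceberg.nessie-catalog.uri=http://nessie:19120/api/v2
          iceberg.nessie-catalog.default-warehouse-dir=s3://example-open-data-lake/warehouse
          fs.native-s3.enabled=true
          s3.endpoint=https://s3.example.invalid
          s3.region=example-region
          s3.path-style-access=true
        EOT
      }
    })
  ]
}
```

With a catalog like this in place, tables created through Trino in the `lakehouse` catalog are written as Iceberg data and metadata files under the warehouse directory, while Nessie tracks the table metadata pointers.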

Deployment steps are provider‑agnostic; see the platform deployment guide.

License

Released under the MIT License.