docs: improve project documentation and README structure
- Streamline main README with clearer navigation and architecture overview
- Add comprehensive workloads README with component descriptions
- Enhance ingest workload documentation with usage examples and configuration
- Add new Makefile targets for install and run-help commands
- Improve project structure visibility and quick start instructions
- **[📋 Examples](docs/examples.md)** - Comprehensive YAML examples and use cases

---
## 🚀 Overview
Pipeline Forge is a complete solution for orchestrating data pipelines in Kubernetes environments. It combines a powerful Kubernetes operator with specialized workloads to provide a declarative, event-driven approach to data pipeline management.
### 🎯 The Problem
Modern data teams face challenges with:

- **Complex Orchestration**: Managing dependencies between data ingestion and transformation
- **Event-Driven Requirements**: Responding to file drops, database changes, and streaming events
- **Infrastructure Complexity**: Deploying and scaling data processing workloads
- **Observability Gaps**: Tracking pipeline health and data lineage
- **Team Coordination**: Coordinating between data engineering and platform teams
- **Resilience and Lifecycle Management**: Ensuring each pipeline step is connected in a clear lifecycle: if one step fails, downstream steps don't run, preventing cascading errors
### 💡 The Solution
Pipeline Forge provides a Kubernetes-native platform with:

- **Declarative Orchestration**: Data pipelines defined and managed through Custom Resource Definitions (CRDs)
- **Flexible Ingestion**: Support for both event-driven (e.g., GCS file drops, Pub/Sub messages, BigQuery updates) and scheduled (CronJob-based) pipeline execution
- **Clear Separation**: Distinct data ingestion and transformation phases
- **Built-in Observability**: Comprehensive status tracking and monitoring
- **GitOps-Ready Configuration**: Declarative manifests that fit modern deployment practices
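A declarative pipeline definition could look roughly like the sketch below. The resource kind, API group, and field names are hypothetical placeholders, not Pipeline Forge's actual schema; see [docs/examples.md](docs/examples.md) for real YAML examples:

```yaml
# Hypothetical sketch only - kind, apiVersion, and fields are placeholders,
# not the actual CRD schema (see docs/examples.md).
apiVersion: pipelineforge.example.com/v1alpha1
kind: Pipeline
metadata:
  name: orders-daily
spec:
  ingest:
    schedule: "0 2 * * *"            # scheduled ingestion, run as a managed CronJob
    image: my-registry/ingest:latest
  transform:
    image: my-registry/transform:latest   # runs as a Job after ingestion succeeds
```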
## ✨ Key Features
- **Unified Pipeline Lifecycle**: Connect ingestion with staging models in a single application lifecycle - if ingestion fails, the entire staging fails, preventing orphaned transformations
- **Native Kubernetes Resources**: Each step runs on 100% native K8s resources (Transform → Job, Ingest → CronJob/Job/Trigger)
- **Event-Driven Orchestration**: React to file drops, Pub/Sub messages, and BigQuery updates with intelligent retry policies
- **Built-in Observability**: Comprehensive status tracking with detailed execution history and failure analysis
- **Flexible Ingestion**: Reference existing CronJobs or create new ones as needed, with full type safety, all managed by the operator
- **Custom Image Support**: Use your own image for each step, or use pre-built Docker images from the Pipeline Forge repository
### ⚡ Event-Driven Orchestration
- **GCS Triggers**: Monitor bucket changes and trigger pipelines
- **Pub/Sub Triggers**: React to real-time messages with optional filtering
- **BigQuery Triggers**: Watch for table updates and data freshness
- **Retry & Cooldown**: Configurable retry policies with intelligent intervals
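An event-driven trigger with retry and cooldown settings might be expressed along these lines. This is a sketch with hypothetical field names; consult [docs/examples.md](docs/examples.md) for the real trigger schema:

```yaml
# Hypothetical sketch - trigger field names are illustrative placeholders.
spec:
  ingest:
    trigger:
      gcs:
        bucket: landing-zone        # fire when new objects land in this bucket
      retryPolicy:
        maxRetries: 3               # retry a failed run up to 3 times
        cooldownSeconds: 300        # wait 5 minutes between attempts
```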
### ☸️ Kubernetes-Native
- **CRD-Based**: Native Kubernetes resources for pipeline definition
- **RBAC Integration**: Fine-grained access control for teams
- **Resource Management**: CPU, memory, and storage allocation
- **Independent Scaling**: Each step scales independently as native K8s resources
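Because pipelines are plain Kubernetes resources, team access can be scoped with standard RBAC. The API group and resource name below are hypothetical placeholders for whatever the operator's CRD actually registers:

```yaml
# Standard Kubernetes RBAC; the apiGroup and resource name are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pipeline-editor
  namespace: data-team
rules:
  - apiGroups: ["pipelineforge.example.com"]   # placeholder CRD group
    resources: ["pipelines"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
```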
### 📊 Built-in Observability
- **Rich Status Tracking**: Comprehensive pipeline health monitoring with detailed execution history
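Since status lives on the resource itself, pipeline health can be inspected with standard tooling such as `kubectl get` and `kubectl describe`. A hypothetical sketch of what a status subresource might report (all field names illustrative, not the actual schema):

```yaml
# Hypothetical sketch - status fields are illustrative placeholders.
status:
  phase: Succeeded
  lastRun: "2024-05-01T02:00:00Z"
  conditions:
    - type: IngestComplete
      status: "True"
    - type: TransformComplete
      status: "True"
```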
---

## Workloads
This directory contains the data processing components for Pipeline Forge. All workloads are packaged as Docker images for easy deployment and scaling.
## Available Workloads
- **[Ingest](./ingest/README.md)** - Data ingestion from databases to BigQuery (Python)
- **[Transform](./transform/README.md)** - dbt-based data transformation