This is a full-stack, real-time distributed data processing platform designed to ingest, process, store, and visualize high-volume event streams at production scale.

Quantum-Fiend/Stream_Forge

⚡ StreamForge

Real-Time Distributed Data Processing Platform

Scala • Java • Kotlin • Flutter • Docker

Mini Spark + Flink + Firebase — Built for Production Scale

Features • Architecture • Quick Start • Dashboard • API • Documentation


🎯 What is StreamForge?

StreamForge is a production-ready, fault-tolerant distributed data processing platform that handles high-volume event streams with:

🚀 Real-Time Processing

  • 10K+ events/second throughput per node
  • Sub-20ms average processing latency
  • Windowed aggregations
  • Stateful computations

🛡️ Fault Tolerance

  • Exactly-once semantics
  • Automatic checkpointing
  • Fast recovery from checkpoint (<2s)
  • Zero data loss guarantee
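
The checkpoint-and-replay idea behind these guarantees can be sketched in a few lines of Java. This is an illustration only, not StreamForge's implementation: the class `CheckpointedSum` and its methods are hypothetical, the input log is an in-memory list, and the checkpoint is kept in memory rather than in RocksDB. It shows why saving operator state together with the input offset means each event is reflected in the state exactly once, even when a crash forces a replay.

```java
import java.util.List;

/** Hypothetical illustration: a running sum whose state and input offset are checkpointed together. */
final class CheckpointedSum {
    private long sum = 0;        // operator state
    private int nextOffset = 0;  // index of the next event to read from the log

    // Last checkpoint: state and offset are saved as a pair.
    private long savedSum = 0;
    private int savedOffset = 0;

    /** Processes up to maxEvents events from a replayable log. */
    void process(List<Long> log, int maxEvents) {
        int end = Math.min(log.size(), nextOffset + maxEvents);
        while (nextOffset < end) {
            sum += log.get(nextOffset);
            nextOffset++;
        }
    }

    /** Saves the current state and offset together. */
    void checkpoint() {
        savedSum = sum;
        savedOffset = nextOffset;
    }

    /** Simulates crash recovery: discards progress made since the last checkpoint. */
    void recover() {
        sum = savedSum;
        nextOffset = savedOffset;
    }

    long sum() {
        return sum;
    }

    /** Runs a crash-and-recover scenario and returns the final sum. */
    static long demo() {
        List<Long> log = List.of(1L, 2L, 3L, 4L, 5L);
        CheckpointedSum op = new CheckpointedSum();
        op.process(log, 2);   // consumes 1 and 2
        op.checkpoint();      // saves (sum = 3, offset = 2)
        op.process(log, 2);   // consumes 3 and 4, then the node "crashes"
        op.recover();         // rolls back to (sum = 3, offset = 2)
        op.process(log, 10);  // replays 3 and 4, then continues with 5
        return op.sum();      // 15, the same as a run without a crash
    }
}
```

The sketch covers operator state only. Output written to an external sink between the checkpoint and the crash would be emitted again on replay unless the sink is idempotent or transactional.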

✨ Features

📊 Custom DSL

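// Sum the "value" field of click events over a 5-minute sliding window.
stream("events")
  // `type` is a reserved word in Scala, so the field name needs backticks
  .filter(_.`type` == "click")
  .window(sliding(5.min))
  .aggregate(sum("value"))
  .sink("dashboard")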

⏱️ Windowing

  • Sliding Windows
  • Tumbling Windows
  • Session Windows
  • Count Windows
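
For the two time-based window types at the top of this list, window assignment comes down to simple arithmetic on the event timestamp. The following is a minimal sketch under stated assumptions: `WindowAssigner` and its method names are hypothetical and not part of the StreamForge API, timestamps are epoch milliseconds, and windows are aligned to multiples of the window size (tumbling) or slide interval (sliding). Session and count windows depend on gaps between events and on arrival order, so they are not covered here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers that map an event timestamp (ms) to window start times. */
final class WindowAssigner {
    private WindowAssigner() {}

    /** Tumbling: each timestamp belongs to exactly one window of length sizeMs. */
    static long tumblingStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    /**
     * Sliding: windows of length sizeMs start every slideMs.
     * Returns the start of every window [start, start + sizeMs) that contains
     * the timestamp, newest window first.
     */
    static List<Long> slidingStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestampMs - Math.floorMod(timestampMs, slideMs);
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }
}
```

With a 5-minute window sliding every minute, each event lands in five overlapping windows, which is why sliding aggregations cost more state than tumbling ones of the same size.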

🔄 Operators

  • Map / Filter
  • Aggregate
  • Join
  • Stateful

📈 Dashboard

  • Real-time charts
  • Cluster health
  • Job management
  • Live metrics

🏗 Architecture

                                    ┌─────────────────────────────────────────────────────────┐
                                    │                   DATA SOURCES                          │
                                    │            IoT • Logs • Web Events • APIs               │
                                    └─────────────────────────┬───────────────────────────────┘
                                                              │
                                                              ▼
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                                         │
│    ┌───────────────────────────────────────────────────────────────────────────────────────────────┐    │
│    │                        🔵 KOTLIN API LAYER (Port 8080)                                        │    │
│    │                                                                                               │    │
│    │   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                    │    │
│    │   │  REST API   │    │   gRPC      │    │  WebSocket  │    │  Scheduler  │                    │    │
│    │   │   (Ktor)    │    │  Service    │    │   Metrics   │    │  (Quartz)   │                    │    │
│    │   └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘                    │    │
│    │                              Coroutines • Backpressure • Flow Control                         │    │
│    └───────────────────────────────────────────────────────────────────────────────────────────────┘    │
│                                                              │                                          │
│                                                              ▼                                          │
│    ┌───────────────────────────────────────────────────────────────────────────────────────────────┐    │
│    │                        🔴 SCALA PROCESSING ENGINE                                             │    │
│    │                                                                                               │    │
│    │   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                    │    │
│    │   │ Custom DSL  │───▶│ Stream Graph│───▶│  Operators  │───▶│  Windowing  │                    │    │
│    │   │   Parser    │    │   Builder   │    │ Map/Filter  │    │  Aggregate  │                    │    │
│    │   └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘                    │    │
│    │                              Akka Streams • Reactive • Backpressure                           │    │
│    └───────────────────────────────────────────────────────────────────────────────────────────────┘    │
│                                                              │                                          │
│                                                              ▼                                          │
│    ┌───────────────────────────────────────────────────────────────────────────────────────────────┐    │
│    │                        🟢 JAVA INFRASTRUCTURE LAYER                                           │    │
│    │                                                                                               │    │
│    │   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                    │    │
│    │   │ Checkpoint  │    │   State     │    │  Cluster    │    │  Recovery   │                    │    │
│    │   │  Manager    │    │   Store     │    │ Coordinator │    │  Manager    │                    │    │
│    │   │ (RocksDB)   │    │ (RocksDB)   │    │ (ZooKeeper) │    │  (Health)   │                    │    │
│    │   └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘                    │    │
│    │                          Exactly-Once • Fault Tolerance • High Availability                   │    │
│    └───────────────────────────────────────────────────────────────────────────────────────────────┘    │
│                                                                                                         │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────┘
                                                              │
                                                              ▼
                                    ┌─────────────────────────────────────────────────────────┐
                                    │                   🟡 FLUTTER DASHBOARD                  │
                                    │                      (Port 3000)                        │
                                    │                                                         │
                                    │   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
                                    │   │   Cluster   │  │     Job     │  │     Job     │     │
                                    │   │   Health    │  │   Metrics   │  │  Management │     │
                                    │   └─────────────┘  └─────────────┘  └─────────────┘     │
                                    │                                                         │
                                    │          Real-Time Charts • WebSocket • Material 3      │
                                    └─────────────────────────────────────────────────────────┘

🎨 Dashboard Preview

📊 Cluster Health

┌────────────────────────────┐
│  🟢 Active Nodes: 3        │
│  📊 Running Jobs: 5        │
│  ⚡ Events/sec: 1,250      │
├────────────────────────────┤
│  CPU Usage                 │
│  ████████████░░░░░ 65%     │
├────────────────────────────┤
│  Memory Usage              │
│  ██████████████░░░ 72%     │
├────────────────────────────┤
│  Throughput (events/s)     │
│  📈 ▁▂▄▆█▇▅▆▇█▆▇█▅▆        │
└────────────────────────────┘

📈 Job Metrics

┌────────────────────────────┐
│  Click Stream Analysis     │
│  ● RUNNING                 │
├────────────────────────────┤
│  Events: 125,000           │
│  Rate: 850/s               │
│  Latency: 12ms             │
├────────────────────────────┤
│  Throughput                │
│  📈 ▂▃▅▆▇█▇▆▇█▇▅▆█         │
├────────────────────────────┤
│  Latency (ms)              │
│  📉 ▅▄▃▂▁▂▃▂▁▂▃            │
└────────────────────────────┘

⚙️ Job Management

┌────────────────────────────┐
│  [+ New Job]               │
├────────────────────────────┤
│  ● Click Analysis          │
│    STREAMING │ 2h ago      │
│    [Pause] [Details]       │
├────────────────────────────┤
│  ● User Activity           │
│    STREAMING │ 1h ago      │
│    [Pause] [Details]       │
├────────────────────────────┤
│  ○ Transaction Monitor     │
│    PAUSED │ 30m ago        │
│    [Resume] [Cancel]       │
└────────────────────────────┘

🚀 Quick Start

Prerequisites

  • JDK 17+ — For Scala, Java, Kotlin
  • Flutter 3.0+ — For dashboard
  • Docker — For containerized deployment

Run with Docker (Recommended)

# Clone the repository
git clone https://github.com/Quantum-Fiend/Stream_Forge.git
cd Stream_Forge

# Start all services
docker-compose up -d

# Access
# API:       http://localhost:8080
# Dashboard: http://localhost:3000

Build from Source

# Build backend
./gradlew build

# Run API server
./gradlew :api-kotlin:run

# Run dashboard (separate terminal)
cd dashboard-flutter
flutter run -d web-server --web-port 3000

📡 API

Submit a Job

curl -X POST http://localhost:8080/api/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "id": "click-analysis",
    "name": "Click Stream Analysis",
    "type": "STREAMING",
    "config": {
      "source": "events",
      "sink": "dashboard",
      "parallelism": 4,
      "checkpointInterval": 60000
    }
  }'

Ingest Events

curl -X POST http://localhost:8080/api/ingest/event \
  -H "Content-Type: application/json" \
  -d '{
    "id": "evt-001",
    "type": "click",
    "timestamp": "2025-12-22T20:00:00Z",
    "source": "web-app",
    "payload": {"page": "/products", "userId": "user-123"}
  }'

Monitor Status

curl http://localhost:8080/api/jobs/click-analysis
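
The same three calls can be issued from JVM code with the JDK's built-in `java.net.http` client (JDK 11+). The sketch below only builds the requests. `StreamForgeRequests` is a hypothetical helper, not a class shipped with the project; the base URL and endpoint paths are taken from the curl examples above, and nothing is assumed about the response format.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical client-side helper; endpoint paths mirror the curl examples. */
final class StreamForgeRequests {
    static final String BASE_URL = "http://localhost:8080";

    private StreamForgeRequests() {}

    /** POST /api/jobs with a JSON job definition. */
    static HttpRequest submitJob(String jobJson) {
        return postJson("/api/jobs", jobJson);
    }

    /** POST /api/ingest/event with a single JSON event. */
    static HttpRequest ingestEvent(String eventJson) {
        return postJson("/api/ingest/event", eventJson);
    }

    /** GET /api/jobs/{jobId} to read a job's status. */
    static HttpRequest jobStatus(String jobId) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/api/jobs/" + jobId))
                .GET()
                .build();
    }

    private static HttpRequest postJson(String path, String json) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + path))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

To send a request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` while the API server is running on port 8080.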

🧩 Language Stack


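| Language | Layer             | Responsibilities               |
|----------|-------------------|--------------------------------|
| Scala    | Processing Engine | DSL • Windowing • Operators    |
| Java     | Infrastructure    | Checkpoints • State • Recovery |
| Kotlin   | API Layer         | REST • WebSocket • Scheduler   |
| Flutter  | Dashboard         | Charts • Monitoring • Control  |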

📊 Performance

| Metric      | Value                             |
|-------------|-----------------------------------|
| Throughput  | 10,000+ events/second per node    |
| Latency     | < 20 ms average processing        |
| Recovery    | < 2 seconds from checkpoint       |
| Scalability | Horizontal scaling with ZooKeeper |
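
As a rough worked example of what these figures imply for sizing, the sketch below does the arithmetic. It rests on assumptions that the table does not confirm: throughput scales linearly with node count, and `checkpointInterval` in the job config is given in milliseconds. `CapacityEstimate` is a hypothetical helper, not part of the project.

```java
/** Hypothetical back-of-the-envelope sizing based on the per-node throughput figure. */
final class CapacityEstimate {
    /** Per-node throughput from the performance table. */
    static final long EVENTS_PER_SECOND_PER_NODE = 10_000;

    private CapacityEstimate() {}

    /** Nodes needed for a target rate, assuming linear scaling (rounds up). */
    static int nodesNeeded(long targetEventsPerSecond) {
        return (int) ((targetEventsPerSecond + EVENTS_PER_SECOND_PER_NODE - 1)
                / EVENTS_PER_SECOND_PER_NODE);
    }

    /** Upper bound on events to replay after a failure just before the next checkpoint. */
    static long eventsBetweenCheckpoints(long eventsPerSecond, long checkpointIntervalMs) {
        return eventsPerSecond * checkpointIntervalMs / 1000;
    }
}
```

For instance, a target of 45,000 events/second needs 5 nodes under these assumptions, and a node running at 10,000 events/second with a 60,000 ms checkpoint interval may have up to 600,000 events to replay after a failure.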

📁 Project Structure

streamforge/
├── 📄 README.md, LICENSE, CONTRIBUTING.md, CHANGELOG.md
├── 🐳 docker-compose.yml, Dockerfile.api, Dockerfile.engine
│
├── 🔴 engine-scala/           # Scala Processing Engine
│   └── dsl/, window/, operators/, StreamProcessor, BatchProcessor
│
├── 🟢 infrastructure-java/    # Java Infrastructure
│   └── checkpoint/, state/, cluster/, recovery/
│
├── 🔵 api-kotlin/             # Kotlin API Layer
│   └── Application, JobAPI, IngestionAPI, MetricsAPI, Scheduler
│
├── 🟡 dashboard-flutter/      # Flutter Dashboard
│   └── pages/, services/, Dockerfile, nginx.conf
│
├── 📚 docs/                   # Documentation
│   └── ARCHITECTURE.md, API.md, DEPLOYMENT.md
│
└── 📝 examples/               # Examples
    └── example_pipeline.scala, sample_job.json

📚 Documentation

| Document      | Description                           |
|---------------|---------------------------------------|
| Architecture  | System design, data flow, components  |
| API Reference | REST endpoints, WebSocket, models     |
| Deployment    | Docker, Kubernetes, production setup  |

🌟 Why StreamForge?

  • Production Ready: Fault-tolerant with exactly-once semantics
  • Modern Stack: Multi-language architecture optimized for each layer
  • Beautiful Dashboard: Real-time visualization with Flutter
  • Developer Friendly: Declarative DSL, comprehensive APIs
  • Big-Tech Patterns: Implements designs from Spark, Flink, Samza

📜 License

This project is licensed under the MIT License — see LICENSE for details.


Built with 💜 by Tushar 💫
