Ayush-Vish/CloudSync

☁️ CloudSync: A Distributed Storage Cloud


CloudSync is a distributed cloud storage solution that enables users to lend their unused storage space, creating a decentralized, scalable, and secure storage network.

It is designed as a distributed file system for large-scale, data-intensive applications — offering fault tolerance, high aggregate performance, and metadata efficiency.


📖 Project Overview

CloudSync is composed of two primary components:

🧠 Master Node (Go)

The control plane of the entire system.

  • Responsible for file system metadata (Namespace, Chunk mappings).
  • Handles agent health monitoring, load balancing, and access control.
  • Crucial: It acts as a coordinator only. It does not process file data, preventing network bottlenecks.
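The coordinator-only role can be sketched in a few lines of Go. This is an illustrative example, not CloudSync's actual API: the type and function names (`AgentInfo`, `InitiateUpload`) are assumptions. The point is that the Master only selects a target agent and mints a token; the file bytes never pass through it.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// AgentInfo is a hypothetical view of what the Master tracks per agent.
type AgentInfo struct {
	ID        string
	Addr      string
	FreeBytes int64
}

// UploadTarget is what the browser gets back: where to send the bytes,
// plus a one-time token the chosen agent can verify.
type UploadTarget struct {
	AgentAddr string
	Token     string
}

// InitiateUpload picks the agent with the most free space that can hold
// the chunk and mints an upload token. The payload itself never touches
// the Master.
func InitiateUpload(agents []AgentInfo, size int64) (*UploadTarget, error) {
	var best *AgentInfo
	for i := range agents {
		a := &agents[i]
		if a.FreeBytes < size {
			continue
		}
		if best == nil || a.FreeBytes > best.FreeBytes {
			best = a
		}
	}
	if best == nil {
		return nil, fmt.Errorf("no agent with %d free bytes", size)
	}
	buf := make([]byte, 16)
	rand.Read(buf)
	return &UploadTarget{AgentAddr: best.Addr, Token: hex.EncodeToString(buf)}, nil
}
```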

💾 Agent Node (Go Client)

The storage engine running on user machines.

  • Hybrid Server: Runs both a gRPC Server (for internal pipeline) and an HTTP Server (for direct browser uploads).
  • Manages physical storage of chunks on local disk.
  • Performs data pipelining to replicate data to other agents.

🏗️ System Architecture

The architecture utilizes a Hybrid Protocol Approach to maximize performance and browser compatibility.

  1. Control Plane (gRPC/HTTP): Clients and agents talk to the Master, which keeps metadata in RAM and logs mutations to disk.
  2. Data Plane (HTTP): Browsers upload directly to Agents (Signed URLs), bypassing the Master.
  3. Replication Plane (gRPC): Agents pipeline data to other Agents using high-performance streams.
```mermaid
graph TD
    subgraph "Master Node"
        A[Control Plane]
        B{In-Memory Metadata}
        C[(Operation Log)]
        D[(PostgreSQL DB)]
    end

    subgraph "User"
        E[Browser / Client]
    end

    subgraph "Storage Cluster"
        F["Agent 1 <br> (HTTP + gRPC)"]
        G["Agent 2 <br> (gRPC)"]
        H["Agent N <br> (gRPC)"]
    end

    %% Flow
    E -- 1. Request Upload (HTTP) --> A
    A -- 2. Return Target Agent IP + Token --> E

    E -- 3. Direct Upload (HTTP/REST) --> F
    F -- 4. Pipeline Replication (gRPC) --> G
    G -- 4. Pipeline Replication (gRPC) --> H

    %% Internal
    A -- Manages Identity --> D
    B -- Persists to --> C
    F -.-> |Heartbeat| A
    G -.-> |Heartbeat| A
    H -.-> |Heartbeat| A
```

Key Features

  • ⚡ Zero-Bottleneck Transfers: Clients upload directly to Storage Agents via HTTP. The Master Node never touches the data payload, so upload throughput scales near-linearly with the number of agents.

  • 🚀 Hybrid Protocol Stack:

    • HTTP for universal browser compatibility.
    • gRPC for high-speed, low-latency internal communication between nodes.
  • 🧠 High-Performance Metadata: Master node stores all file system structure in RAM (GFS-style) for millisecond-latency lookups.

  • 🛡️ Fault Tolerance: Data is pipelined to multiple agents immediately. If one agent fails, the Master detects it via heartbeats and triggers auto-replication.

  • 💰 Incentive System: Agents are tracked in PostgreSQL, laying the foundation for a marketplace where users earn credits for sharing storage.


🧰 Technology Stack

| Layer | Technology |
| --- | --- |
| Master Node | Go (Gin + gRPC) |
| Agent Node | Go (Native HTTP + gRPC Server) |
| Metadata Store | In-memory store + Operation Log + Checkpointing |
| Identity DB | PostgreSQL |
| Communication | Hybrid (HTTP/1.1 for Clients, gRPC for Cluster) |
| Infrastructure | Docker, Docker Compose |

🗺️ Project Roadmap

Week 1–3: Core Design & Authentication

  • ✅ Research GFS and distributed architectures
  • ✅ Design modular in-memory metadata system
  • ✅ Implement Agent Registration (JWT Auth & MAC Address check)
  • ✅ Implement Heartbeat mechanism

Week 4: Agent MVP

  • ✅ Build CLI with interactive setup (survey)
  • ✅ Implement basic gRPC connectivity
  • ✅ Implement Self-Update/Install service

Week 5: In-Memory Metadata Engine

  • ☐ Implement metadata.Manager (In-Memory Store)
  • ☐ Implement OperationLog (Append-only persistence)
  • ☐ Implement Checkpoint system for crash recovery
  • ☐ Wire Metadata Engine into Master

Week 6: Hybrid Upload Workflow

  • ☐ Master: Implement InitiateUpload (Agent Selection logic)
  • ☐ Agent: Implement HTTP Server for direct browser uploads
  • ☐ Agent: Implement gRPC Stream for Agent-to-Agent pipelining
  • ☐ Client: Create test harness for Direct HTTP Uploads

Week 7: Fault Tolerance

  • ☐ Detect dead agents via Heartbeat timeouts
  • ☐ Implement "ReplicateChunk" RPC (Master orders Agent A to copy to Agent B)
  • ☐ Add checksum verification (SHA-256) on storage
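The checksum item maps directly onto Go's standard library: compute a SHA-256 digest when a chunk is stored, persist it alongside the chunk, and recompute on every read. A minimal sketch (function names are illustrative):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// ChunkChecksum returns the hex SHA-256 digest an agent would store
// next to each chunk on disk.
func ChunkChecksum(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// Verify recomputes the digest on read and compares it to the recorded
// one; a mismatch means the chunk is corrupt and must be re-replicated.
func Verify(data []byte, want string) bool {
	return ChunkChecksum(data) == want
}
```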

Week 8: Simulation & Polish

  • ☐ Docker Compose for 100-node simulation
  • ☐ Load testing (Network saturation tests)
  • ☐ Final Documentation

🚀 Getting Started

Prerequisites

  • Go 1.24+
  • Docker & Docker Compose
  • PostgreSQL 14+

1. Start the Master Node & Database

```shell
make run-master
```

This spins up the Coordinator and the Identity Database.

2. Register a New Agent

In a separate terminal:

```shell
make run-agent ARGS="register"
```

Follow the interactive prompt to set your storage path and quota.

3. Start the Agent

```shell
make run-agent ARGS="start"
```

The Agent will start two servers:

  1. gRPC (Port 50052): For communicating with Master and other Agents.
  2. HTTP (Port 8080): For accepting file uploads from Browsers.

🧩 Summary

| Component | Role | Protocol |
| --- | --- | --- |
| Master | The Brain: decisions, metadata, health. | gRPC (Internal), HTTP (API) |
| Agent | The Muscle: storage, replication. | gRPC (Pipeline), HTTP (Uploads) |
| Browser | The User: uploads/downloads. | HTTP (REST) |
