CloudSync is a distributed cloud storage solution that enables users to lend their unused storage space, creating a decentralized, scalable, and secure storage network.
It is designed as a distributed file system for large-scale, data-intensive applications — offering fault tolerance, high aggregate performance, and metadata efficiency.
CloudSync is composed of two primary components:
**Master Node**: the control plane of the entire system.
- Responsible for file system metadata (Namespace, Chunk mappings).
- Handles agent health monitoring, load balancing, and access control.
- Crucial: it acts as a coordinator only and never touches file data, which prevents it from becoming a network bottleneck.
**Storage Agent**: the storage engine running on user machines.
- Hybrid Server: Runs both a gRPC Server (for internal pipeline) and an HTTP Server (for direct browser uploads).
- Manages physical storage of chunks on local disk.
- Performs data pipelining to replicate data to other agents.
The architecture utilizes a Hybrid Protocol Approach to maximize performance and browser compatibility.
- Control Plane (gRPC/HTTP): The Master keeps metadata in RAM and logs to disk.
- Data Plane (HTTP): Browsers upload directly to Agents (Signed URLs), bypassing the Master.
- Replication Plane (gRPC): Agents pipeline data to other Agents using high-performance streams.
```mermaid
graph TD
    subgraph "Master Node"
        A[Control Plane]
        B{In-Memory Metadata}
        C[(Operation Log)]
        D[(PostgreSQL DB)]
    end
    subgraph "User"
        E[Browser / Client]
    end
    subgraph "Storage Cluster"
        F["Agent 1 <br> (HTTP + gRPC)"]
        G["Agent 2 <br> (gRPC)"]
        H["Agent N <br> (gRPC)"]
    end

    %% Flow
    E -- "1. Request Upload (HTTP)" --> A
    A -- "2. Return Target Agent IP + Token" --> E
    E -- "3. Direct Upload (HTTP/REST)" --> F
    F -- "4. Pipeline Replication (gRPC)" --> G
    G -- "4. Pipeline Replication (gRPC)" --> H

    %% Internal
    A -- "Manages Identity" --> D
    B -- "Persists to" --> C
    F -.->|Heartbeat| A
    G -.->|Heartbeat| A
    H -.->|Heartbeat| A
```
- ⚡ **Zero-Bottleneck Transfers:** Clients upload directly to Storage Agents via HTTP. The Master Node never touches the data payload, so the data path scales horizontally with the number of Agents.
- 🚀 **Hybrid Protocol Stack:**
  - HTTP for universal browser compatibility.
  - gRPC for high-speed, low-latency internal communication between nodes.
- 🧠 **High-Performance Metadata:** The Master stores the entire file system structure in RAM (GFS-style) for millisecond-latency lookups.
- 🛡️ **Fault Tolerance:** Data is pipelined to multiple agents immediately. If one agent fails, the Master detects it via heartbeats and triggers auto-replication.
- 💰 **Incentive System:** Agents are tracked in PostgreSQL, laying the foundation for a marketplace where users earn credits for sharing storage.
| Layer | Technology |
|---|---|
| Master Node | Go (Gin + gRPC) |
| Agent Node | Go (Native HTTP + gRPC Server) |
| Metadata Store | In-memory store + Operation Log + Checkpointing |
| Identity DB | PostgreSQL |
| Communication | Hybrid (HTTP/1.1 for Clients, gRPC for Cluster) |
| Infrastructure | Docker, Docker Compose |
- ✅ Research GFS and distributed architectures
- ✅ Design modular in-memory metadata system
- ✅ Implement Agent Registration (JWT Auth & MAC Address check)
- ✅ Implement Heartbeat mechanism
- ✅ Build CLI with interactive setup (`survey`)
- ✅ Implement basic gRPC connectivity
- ✅ Implement Self-Update/Install service
- ☐ Implement `metadata.Manager` (In-Memory Store)
- ☐ Implement `OperationLog` (Append-only persistence)
- ☐ Implement `Checkpoint` system for crash recovery
- ☐ Wire Metadata Engine into Master
- ☐ Master: Implement `InitiateUpload` (Agent Selection logic)
- ☐ Agent: Implement HTTP Server for direct browser uploads
- ☐ Agent: Implement gRPC Stream for Agent-to-Agent pipelining
- ☐ Client: Create test harness for Direct HTTP Uploads
- ☐ Detect dead agents via Heartbeat timeouts
- ☐ Implement "ReplicateChunk" RPC (Master orders Agent A to copy to Agent B)
- ☐ Add checksum verification (SHA-256) on storage
- ☐ Docker Compose for 100-node simulation
- ☐ Load testing (Network saturation tests)
- ☐ Final Documentation
- Go 1.24+
- Docker & Docker Compose
- PostgreSQL 14+
```bash
make run-master
```
This spins up the Coordinator and the Identity Database.
In a separate terminal:
```bash
make run-agent ARGS="register"
```
Follow the interactive prompt to set your storage path and quota.
```bash
make run-agent ARGS="start"
```
The Agent will start two servers:
- gRPC (Port 50052): For communicating with Master and other Agents.
- HTTP (Port 8080): For accepting file uploads from Browsers.
| Component | Role | Protocol |
|---|---|---|
| Master | The Brain. Decisions, Metadata, Health. | gRPC (Internal), HTTP (API) |
| Agent | The Muscle. Storage, Replication. | gRPC (Pipeline), HTTP (Uploads) |
| Browser | The User. Uploads/Downloads. | HTTP (REST) |
