Google Drive is a cloud-based file storage and synchronization service that allows users to store, access, and share files from various devices. This chapter discusses designing a scalable system with the following features:
- File Upload and Download
- File Sync Across Devices
- File Sharing
- File Revision History
- Notifications for Edits, Deletes, and Shares
Functional requirements:
- Upload and download files.
- Sync files across multiple devices.
- Maintain file revisions.
- Enable file sharing with permissions.
- Send notifications on file edits, deletions, and shares.
Non-functional requirements:
- Reliability: Data loss is unacceptable.
- Fast Sync Speed: Avoid user impatience with delayed syncing.
- Bandwidth Efficiency: Minimize unnecessary data usage.
- Scalability: Handle 10 million daily active users (DAU).
- High Availability: Operate seamlessly during server failures or network issues.
Back-of-the-envelope estimation:
- Users get 10 GB of free space.
- Maximum file size: 10 GB.
- Average file upload size: 500 KB.
- Upload frequency: 2 files per day per user.
- Total storage required: 500 PB, which corresponds to about 50 million registered users at 10 GB each.
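The estimates can be checked with a few lines of arithmetic. This is a minimal sketch that assumes 50 million registered users (the count implied by the 500 PB total), decimal storage units, and a peak load of about twice the average; none of these three assumptions is stated in the list above.

```python
# Back-of-the-envelope estimates using the figures listed above.
# Assumption: 50 million registered users (the count implied by the 500 PB total).
# Assumption: decimal units (1 PB = 10**6 GB, 1 TB = 10**9 KB).
REGISTERED_USERS = 50_000_000
DAU = 10_000_000
FREE_SPACE_GB = 10
UPLOADS_PER_USER_PER_DAY = 2
AVG_UPLOAD_KB = 500
SECONDS_PER_DAY = 24 * 60 * 60

total_storage_pb = REGISTERED_USERS * FREE_SPACE_GB / 10**6
upload_qps = DAU * UPLOADS_PER_USER_PER_DAY / SECONDS_PER_DAY
peak_upload_qps = 2 * upload_qps  # assumption: peak traffic is about twice the average
daily_upload_tb = DAU * UPLOADS_PER_USER_PER_DAY * AVG_UPLOAD_KB / 10**9

print(f"Total storage: {total_storage_pb:.0f} PB")
print(f"Upload QPS: {upload_qps:.0f} (peak ~{peak_upload_qps:.0f})")
print(f"New data per day: {daily_upload_tb:.0f} TB")
```

With these inputs the average upload rate comes out to roughly 231 requests per second (about 463 at the assumed peak), and about 10 TB of new data arrives per day.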
A basic setup includes:
- Web Server: Handles uploads and downloads.
- Metadata Database: Keeps track of metadata such as user data, login info, and file info.
- Storage Directory: Holds files organized by namespaces.
- A web server and a directory called drive/ are set up; drive/ serves as the root directory for uploaded files.
- Under the drive/ directory, there is a list of directories called namespaces.
- Each namespace contains all the uploaded files for that user.
- Each file or folder can be uniquely identified by joining the namespace and the relative path.
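A minimal sketch of how a namespace and a relative path could be joined into a unique storage path, assuming POSIX-style paths under drive/. The helper name `resolve`, the sample namespace `user_42`, and the check that rejects paths escaping the namespace are illustrative additions, not part of the design above.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath("/drive")  # root directory for uploaded files


def resolve(namespace: str, relative_path: str) -> PurePosixPath:
    """Hypothetical helper: join a user's namespace and a relative path.

    The (namespace, relative path) pair uniquely identifies a file or folder.
    Paths that could escape the namespace (absolute paths or "..") are rejected.
    """
    if not namespace or "/" in namespace or namespace in (".", ".."):
        raise ValueError(f"invalid namespace: {namespace!r}")
    rel = PurePosixPath(relative_path)
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"invalid relative path: {relative_path!r}")
    return ROOT / namespace / rel


print(resolve("user_42", "photos/cat.png"))  # /drive/user_42/photos/cat.png
```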
This design serves as a starting point but is inadequate for scaling.
- Upload a file to Google Drive: Two types of uploads are supported.
  - Simple upload: Used when the file size is small.
  - Resumable upload: Used when the file is large or the network connection may be interrupted.
    - Endpoint: https://api.example.com/files/upload?uploadType=resumable
    - Send the initial request to retrieve the resumable URL.
    - Upload the data and monitor the upload state.
    - If the upload is interrupted, resume it.
- Download a file from Google Drive:
  - Endpoint: https://api.example.com/files/download
- Get file revisions:
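The resumable upload steps listed above can be simulated in memory. In this sketch the server, the `upload_id` query parameter, the chunk size, and the retry limit are all hypothetical; the point is only the protocol shape: obtain a resumable URL, send chunks, and after an interruption ask the server how many bytes it already holds and continue from there.

```python
import uuid


class UploadServer:
    """In-memory stand-in for the resumable upload endpoint (hypothetical)."""

    def __init__(self):
        self.sessions = {}  # resumable URL -> bytes received so far

    def start_session(self) -> str:
        # Step 1: the initial request returns a resumable URL.
        # The upload_id query parameter is a made-up detail for this sketch.
        url = ("https://api.example.com/files/upload?uploadType=resumable"
               f"&upload_id={uuid.uuid4().hex}")
        self.sessions[url] = bytearray()
        return url

    def received(self, url: str) -> int:
        # Upload state: how many bytes the server has already stored.
        return len(self.sessions[url])

    def put_chunk(self, url: str, offset: int, chunk: bytes) -> None:
        if offset != len(self.sessions[url]):
            raise ValueError("chunk does not start where the stored data ends")
        self.sessions[url].extend(chunk)


class FlakyUploadServer(UploadServer):
    """Fails the given put_chunk calls (1-based) to simulate network interruptions."""

    def __init__(self, failing_calls):
        super().__init__()
        self.failing_calls = set(failing_calls)
        self.calls = 0

    def put_chunk(self, url, offset, chunk):
        self.calls += 1
        if self.calls in self.failing_calls:
            raise ConnectionError("simulated network interruption")
        super().put_chunk(url, offset, chunk)


def upload_resumable(server, url: str, data: bytes, chunk_size: int,
                     max_retries: int = 5) -> None:
    retries = 0
    while True:
        # Steps 2 and 3: check how far the server got, then continue from there.
        offset = server.received(url)
        if offset >= len(data):
            return
        try:
            server.put_chunk(url, offset, data[offset:offset + chunk_size])
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise


server = FlakyUploadServer(failing_calls={2, 3})
url = server.start_session()
payload = b"hello resumable world"
upload_resumable(server, url, payload, chunk_size=5)
print(bytes(server.sessions[url]) == payload)  # True
```

Because the client always re-reads the server's offset before sending, a lost chunk is simply re-sent and nothing already stored is uploaded twice.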
- Sharding: Split storage across servers based on user_id.
- Amazon S3: Use S3 for scalable and redundant file storage with cross-region replication.
- Load Balancer: Distribute traffic across multiple web servers.
- Metadata Database Replication: Ensure availability through database sharding and replication.
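Sharding by user_id needs a stable mapping from a user to a shard. This sketch uses hash-then-modulo with SHA-256; the helper name `shard_for` and the shard count are hypothetical, and the notes above do not say which scheme is used.

```python
import hashlib


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user to a storage shard (hypothetical helper).

    Uses SHA-256 rather than Python's built-in hash(), because hash() of a str
    is randomized per process and would route the same user differently
    after a restart.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


counts = [0] * 4
for i in range(1000):
    counts[shard_for(f"user_{i}", 4)] += 1
print(counts)  # number of users assigned to each of the 4 shards
```

Plain modulo sharding moves most users whenever the shard count changes; consistent hashing is the usual alternative when shards are added or removed often.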
For a large storage system like Google Drive, sync conflicts happen from time to time. When two users modify the same file or folder at the same time, a conflict happens.
- For example, suppose user 1 and user 2 try to update the same file at the same time, and user 1’s update is processed by the system first.
- User 1’s update goes through, but user 2 gets a sync conflict.
- The system presents both copies of the same file: user 2’s local copy and the latest version from the server.
- User 2 has the option to merge both files or override one version with the other.
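The notes do not say how the server detects that an update is stale. One common approach, assumed here, is optimistic concurrency: each update carries the version it was based on, and the server rejects it if another update was processed first. All class and field names below are hypothetical.

```python
from dataclasses import dataclass


class SyncConflict(Exception):
    """Raised when an update is based on a version that is no longer current."""

    def __init__(self, server_version: int, server_content: bytes):
        super().__init__(f"sync conflict: server is at version {server_version}")
        self.server_version = server_version
        self.server_content = server_content


@dataclass
class FileRecord:
    content: bytes
    version: int = 1


class MetadataServer:
    def __init__(self):
        self.files = {}  # path -> FileRecord

    def update(self, path: str, new_content: bytes, base_version: int) -> int:
        record = self.files[path]
        if base_version != record.version:
            # Another client's update was processed first: reject this one and
            # hand back the latest server copy so the user can merge or override.
            raise SyncConflict(record.version, record.content)
        record.content = new_content
        record.version += 1
        return record.version


server = MetadataServer()
server.files["/notes.txt"] = FileRecord(b"draft")

# Both users start editing from version 1; user 1's update is processed first.
print(server.update("/notes.txt", b"user 1 edit", base_version=1))  # 2
server_copy = None
try:
    server.update("/notes.txt", b"user 2 edit", base_version=1)
except SyncConflict as conflict:
    # Present both copies to user 2: the local edit and the latest server version.
    server_copy = conflict.server_content
print(server_copy)  # b'user 1 edit'
```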
- User Interaction: Users access the application via a browser or mobile app.
- Block Servers:
  - Files are split into blocks of at most 4 MB, and each block is assigned a unique hash value.
  - Blocks are stored independently in cloud storage (e.g., Amazon S3).
  - A file is reconstructed by joining its blocks in a specific order.
- Cloud Storage: Blocks are stored in cloud storage for scalability and redundancy.
- Cold Storage: Inactive files are moved to cold storage to reduce costs.
- Load Balancer: Distributes requests evenly among API servers to ensure efficient operation.
- API Servers:
  - Handle user authentication, profile management, and file metadata updates.
  - Manage all workflows other than uploading.
- Metadata Database and Cache:
  - Stores metadata for users, files, blocks, and versions.
  - Frequently accessed metadata is cached for faster retrieval.
- Notification Service:
  - A publisher/subscriber system that notifies clients about file changes (add, edit, delete).
  - Ensures clients can pull the latest updates.
- Offline Backup Queue: Temporarily stores file change information for offline clients to sync when they are back online.
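The block server behaviour described above (split into blocks of at most 4 MB, hash each block, rebuild by joining blocks in order) can be sketched as follows. SHA-256 is an assumption, since the notes only say "unique hash values", and the demo uses a tiny block size so the output is easy to read.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # maximum block size (4 MB, taken here as 4 MiB)


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a file into blocks and return (hash, block) pairs in file order."""
    blocks = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        blocks.append((hashlib.sha256(block).hexdigest(), block))
    return blocks


def reconstruct(block_hashes, store) -> bytes:
    """Rebuild a file by joining blocks from `store` (hash -> bytes) in order."""
    return b"".join(store[h] for h in block_hashes)


data = b"aaaa" * 2 + b"bb"                      # tiny stand-in for a real file
blocks = split_into_blocks(data, block_size=4)  # small block size for the demo
store = dict(blocks)                            # e.g. cloud storage keyed by block hash
order = [h for h, _ in blocks]                  # the ordered list kept as metadata
print(len(blocks), len(store))                  # 3 2
assert reconstruct(order, store) == data
```

Keying storage by block hash means identical blocks are stored once, which is also what makes the de-duplication and delta sync optimizations possible.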
A highly simplified version of the metadata schema is described below; it includes only the most important tables and fields.
- User Table: Stores user profiles and preferences.
- File Table: Maintains file metadata (e.g., size, name, path).
- Block Table: Tracks file blocks for reconstructing files.
- File Version Table: Stores file revision history.
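The four tables can be sketched as a relational schema using Python's built-in sqlite3 module. Only `name`, `path`, and `size` come from the text; every other column name, the `status` default, and the choice to attach blocks to a file version are assumptions for illustration.

```python
import sqlite3

# Hypothetical column names; only name, path, and size come from the text above.
SCHEMA = """
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    user_name   TEXT NOT NULL,
    preferences TEXT                  -- e.g. serialized settings
);
CREATE TABLE files (
    file_id  INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    name     TEXT NOT NULL,
    path     TEXT NOT NULL,
    size     INTEGER NOT NULL,
    status   TEXT NOT NULL DEFAULT 'pending',
    UNIQUE (user_id, path)
);
CREATE TABLE file_versions (
    version_id  INTEGER PRIMARY KEY,
    file_id     INTEGER NOT NULL REFERENCES files(file_id),
    version_num INTEGER NOT NULL,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (file_id, version_num)
);
CREATE TABLE blocks (
    block_id    INTEGER PRIMARY KEY,
    version_id  INTEGER NOT NULL REFERENCES file_versions(version_id),
    block_order INTEGER NOT NULL,     -- position of the block within the file
    block_hash  TEXT NOT NULL,        -- key of the block in cloud storage
    UNIQUE (version_id, block_order)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

conn.execute("INSERT INTO users (user_id, user_name) VALUES (1, 'alice')")
conn.execute("INSERT INTO files (file_id, user_id, name, path, size) "
             "VALUES (1, 1, 'report.pdf', '/docs/report.pdf', 6)")
conn.execute("INSERT INTO file_versions (version_id, file_id, version_num) VALUES (1, 1, 1)")
# Insert blocks out of order; block_order restores the file layout.
conn.executemany(
    "INSERT INTO blocks (version_id, block_order, block_hash) VALUES (1, ?, ?)",
    [(1, "hash_b"), (0, "hash_a")],
)

hashes = [row[0] for row in conn.execute(
    "SELECT block_hash FROM blocks WHERE version_id = ? ORDER BY block_order", (1,))]
print(hashes)  # ['hash_a', 'hash_b']
```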
- File Upload:
  - The client uploads the file to the block servers, which split it into blocks and compress and encrypt each block.
  - The block servers upload the processed blocks to cloud storage (S3).
- Metadata Upload:
  - The client sends the file metadata to the API server.
  - The metadata is stored in the database with status "pending".
- Completion:
  - When the upload finishes, S3 triggers a callback that updates the file status to "uploaded".
  - The notification service informs the relevant users.
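The completion step of the upload flow can be sketched as a small callback handler. The in-memory classes and the handler name are hypothetical stand-ins for the metadata database and the notification service. The handler only acts on files still marked "pending", so a callback delivered twice (which distributed systems can do) does not send duplicate notifications.

```python
class NotificationService:
    """Stand-in that records notifications instead of pushing them."""

    def __init__(self):
        self.sent = []

    def notify(self, user_ids, event):
        for user_id in user_ids:
            self.sent.append((user_id, event))


class MetadataDB:
    def __init__(self):
        self.status = {}  # file_id -> "pending" | "uploaded"

    def add_file(self, file_id):
        # Metadata upload: the record starts out as "pending".
        self.status[file_id] = "pending"


def on_upload_complete(db, notifier, file_id, interested_users):
    """Hypothetical handler for the cloud-storage completion callback."""
    if db.status.get(file_id) != "pending":
        return False  # unknown file or duplicate callback: nothing to do
    db.status[file_id] = "uploaded"
    notifier.notify(interested_users, {"type": "file_uploaded", "file_id": file_id})
    return True


db, notifier = MetadataDB(), NotificationService()
db.add_file("f1")
on_upload_complete(db, notifier, "f1", ["user_2", "user_3"])
on_upload_complete(db, notifier, "f1", ["user_2", "user_3"])  # duplicate is ignored
print(db.status["f1"], len(notifier.sent))  # uploaded 2
```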
- Delta Sync: Transfer only modified blocks instead of the entire file.
- Compression: Blocks are compressed with an algorithm chosen according to the file type (for example, gzip or bzip2 for text files).
- Conflict Resolution:
  - The first processed version wins.
  - Conflicting versions are saved separately for the user to resolve.
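Delta sync and compression combine naturally: the client compares block hashes of the old and new file and sends only changed blocks, compressed. This sketch assumes fixed-size blocks, SHA-256 hashes, and zlib (the DEFLATE compressor that gzip also uses); the function names are hypothetical, and the new file length is assumed to travel with the delta as metadata.

```python
import hashlib
import zlib


def block_hashes(data: bytes, block_size: int):
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int):
    """Client side: return {block index: compressed block} for blocks that changed."""
    old_hashes = block_hashes(old, block_size)
    delta = {}
    for i, h in enumerate(block_hashes(new, block_size)):
        if i >= len(old_hashes) or old_hashes[i] != h:
            block = new[i * block_size:(i + 1) * block_size]
            delta[i] = zlib.compress(block)
    return delta


def apply_delta(old: bytes, delta, block_size: int, new_length: int) -> bytes:
    """Server side: rebuild the new file from unchanged old blocks plus the delta."""
    num_blocks = -(-new_length // block_size)  # ceiling division
    parts = []
    for i in range(num_blocks):
        if i in delta:
            parts.append(zlib.decompress(delta[i]))
        else:
            parts.append(old[i * block_size:(i + 1) * block_size])
    return b"".join(parts)


old = b"AAAABBBBCCCC"
new = b"AAAAXXXXCCCC"
delta = changed_blocks(old, new, block_size=4)
print(sorted(delta))  # [1]: only the middle block is sent
assert apply_delta(old, delta, 4, len(new)) == new
```

With fixed-size blocks, inserting bytes near the start of a file shifts every later block, so they all look changed; content-defined chunking is the usual remedy, at the cost of variable block sizes.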
Download flow is triggered when a file is added or edited elsewhere. There are two ways a client can know:
- If client A is online while a file is changed by another client, notification service will inform client A.
- If client A is offline while a file is changed by another client, data will be saved to the cache. When the offline client is online again, it pulls the latest changes.
Once a client knows a file is changed, it first requests metadata via API servers, then downloads blocks to construct the file.
- Trigger: Notification service informs the client of file updates.
- Metadata Fetch: Client retrieves updated metadata via API.
- Block Download: Client downloads updated blocks from block servers and reconstructs the file.
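The download flow, including the offline case, can be sketched as follows: a reconnecting client drains its queued change events, fetches each changed file's ordered block list, downloads only the blocks it lacks, and rebuilds the file. The queue class and the `fetch_block_list` / `fetch_block` callables are hypothetical placeholders for the offline backup queue, the API servers, and the block servers.

```python
from collections import defaultdict, deque


class OfflineBackupQueue:
    """Holds change events for clients that are offline (in-memory stand-in)."""

    def __init__(self):
        self.pending = defaultdict(deque)

    def push(self, client_id, event):
        self.pending[client_id].append(event)

    def drain(self, client_id):
        return list(self.pending.pop(client_id, deque()))


def sync_changes(client_id, backup_queue, fetch_block_list, fetch_block, local_blocks):
    """Pull missed changes, then rebuild each changed file from its blocks.

    fetch_block_list(file_id) -> ordered block hashes (metadata via API servers)
    fetch_block(hash) -> block bytes (download from block servers)
    Both callables are hypothetical placeholders.
    """
    files = {}
    events = backup_queue.drain(client_id)
    # Several events may name the same file; process each file once, in order.
    for file_id in dict.fromkeys(event["file_id"] for event in events):
        hashes = fetch_block_list(file_id)
        for h in hashes:
            if h not in local_blocks:  # download only blocks the client lacks
                local_blocks[h] = fetch_block(h)
        files[file_id] = b"".join(local_blocks[h] for h in hashes)
    return files


remote_blocks = {"h1": b"hello ", "h2": b"world"}
downloaded = []


def fetch_block(h):
    downloaded.append(h)
    return remote_blocks[h]


backup_queue = OfflineBackupQueue()
backup_queue.push("client_a", {"type": "edit", "file_id": "f1"})
backup_queue.push("client_a", {"type": "edit", "file_id": "f1"})
local = {"h1": b"hello "}
files = sync_changes("client_a", backup_queue, lambda file_id: ["h1", "h2"],
                     fetch_block, local)
print(files, downloaded)  # {'f1': b'hello world'} ['h2']
```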
- Purpose: Keeps clients updated about file changes.
- Mechanism: Implements long polling for asynchronous notifications.
- Example: When a file is added, edited, or deleted, notifications are pushed to all relevant clients.
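Long polling means the server holds a client's request open until a change arrives or a timeout expires, after which the client immediately polls again. This sketch models one client's event channel with `queue.Queue`; the event format is hypothetical. A production server would hold many connections with asynchronous I/O rather than one blocked thread per client.

```python
import queue
import threading


def long_poll(events: queue.Queue, timeout: float):
    """Hold a client's request until a change event arrives or `timeout` expires.

    Returns the event, or None on timeout, in which case the client simply
    opens a new long-poll request.
    """
    try:
        return events.get(timeout=timeout)
    except queue.Empty:
        return None


events = queue.Queue()  # per-client event channel (hypothetical)
# Simulate another device editing a file 50 ms after this client starts waiting.
threading.Timer(0.05, events.put, args=({"type": "edit", "file_id": "f1"},)).start()

first = long_poll(events, timeout=2.0)   # returns the edit event
second = long_poll(events, timeout=0.1)  # None: no further changes
print(first, second)
```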
- De-duplication: Remove duplicate blocks at the account level using hash-based comparisons.
- Versioning Strategy:
  - Limit the number of saved revisions.
  - Prioritize recent versions for frequently edited files.
- Cold Storage: Move rarely accessed files to cheaper storage solutions (e.g., Amazon S3 Glacier).
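Two of these optimizations fit in a few lines: account-level de-duplication keyed by block hash, and a cap on saved revisions that drops the oldest first. The class names, SHA-256, and the cap of 3 are assumptions for illustration; reclaiming blocks that no remaining revision references is not shown.

```python
import hashlib
from collections import deque


class AccountBlockStore:
    """Stores each distinct block once per account (hash-based de-duplication)."""

    def __init__(self):
        self.blocks = {}  # (account_id, block hash) -> block bytes

    def put(self, account_id: str, block: bytes):
        block_hash = hashlib.sha256(block).hexdigest()
        key = (account_id, block_hash)
        is_new = key not in self.blocks
        if is_new:
            self.blocks[key] = block
        return block_hash, is_new


class RevisionHistory:
    """Keeps at most `max_revisions` versions of a file; the oldest is dropped first."""

    def __init__(self, max_revisions: int):
        self.versions = deque(maxlen=max_revisions)

    def add(self, block_hashes):
        self.versions.append(tuple(block_hashes))


store = AccountBlockStore()
print(store.put("alice", b"same bytes")[1])  # True: first copy is stored
print(store.put("alice", b"same bytes")[1])  # False: duplicate within the account
print(store.put("bob", b"same bytes")[1])    # True: de-duplication is per account

history = RevisionHistory(max_revisions=3)
for v in range(5):
    history.add([f"hash_v{v}"])
print(list(history.versions))  # only the 3 most recent revisions remain
```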
- Load Balancer Failure: Secondary load balancer becomes active.
- Block Server Failure: Pending tasks are reassigned to other servers.
- Metadata Database Failure:
  - If the master goes down, promote a slave node to master.
  - If a slave goes down, redirect traffic to the remaining replicas.
- Cloud Storage Failure: Use cross-region replication to fetch unavailable files.
- Notification Service Failure: Clients reconnect to alternative servers.
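Promoting a slave (replica) to master needs a rule for choosing which one. A common choice, assumed here rather than stated in the notes, is the healthy replica that has applied the most of the master's write log; the remaining healthy replicas keep serving reads. The `Replica` fields and names are hypothetical. With asynchronous replication, writes the chosen replica had not yet applied can still be lost.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    healthy: bool
    log_position: int  # hypothetical: how much of the master's write log has been applied


def fail_over(replicas):
    """Pick a new master when the current master is down.

    Assumption: promote the healthy replica that is furthest along in the
    replication log, to lose as few recent writes as possible; the other
    healthy replicas keep serving reads.
    """
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy replica available to promote")
    new_master = max(healthy, key=lambda r: r.log_position)
    read_replicas = [r for r in healthy if r is not new_master]
    return new_master, read_replicas


replicas = [
    Replica("replica-a", healthy=True, log_position=1040),
    Replica("replica-b", healthy=True, log_position=1052),
    Replica("replica-c", healthy=False, log_position=1060),
]
master, readers = fail_over(replicas)
print(master.name, [r.name for r in readers])  # replica-b ['replica-a']
```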







