YOUTUBE SYSTEM DESIGN

DECIDING REQUIREMENTS
- Functional Requirements
- Non-Functional Requirements
CAPACITY ESTIMATION
API DESIGN
- Upload Content
- Stream Content
HIGH LEVEL DESIGN
DEEP DIVE INSIGHTS

DECIDING REQUIREMENTS:

Functional Requirements

There will be two types of users:

Viewers: Users who consume the content.
Content Creators: Users who create the content.

Functional Requirements for Viewers

Requirement	Description
Streaming	User can stream any TV show or movie.
Device Compatibility	The application should be supported on all devices.

Functional Requirements for Content Creators

Requirement	Description
Upload Content	Content Creators should be able to upload movies/TV shows. After successful upload, they should get a notification.

Non-Functional Requirements

For Viewers

Requirement	Description
Low Latency	When a user watches a movie or TV show, the experience should be smooth. Any buffering or lagging will spoil the experience.
Scalability	The platform should be able to handle a large number of users streaming many movies/TV shows simultaneously.
User Experience	Users should get the best possible quality while streaming. For example, if a user's internet is strong, they shouldn't receive poor quality (240p) streaming.
Availability	The system should be highly available with 99.9999% uptime.

For Content Creators

Requirement	Description
Scalability	The platform should be able to handle many content creators uploading numerous movies/TV shows at the same time.
Security	The platform should ensure the security of the content uploaded by creators to prevent unauthorized access or piracy.
Storage Reliability	The platform should provide reliable storage to ensure the uploaded content is safely stored and remains available without disappearing.

CAPACITY ESTIMATION:

DAU-MAU

How many users will be using your software?

Daily Active Users (DAU): 100 million
Monthly Active Users (MAU): 2.5 billion

Throughput

1. Write Requests

Write Operation:

Uploading videos is the only way to write (add data) to the system.

Assumption:

Most users are viewers rather than creators. We assume that 1 out of 250 users uploads a video daily.

Calculation:

Total Daily Active Users (DAU): 100 million
Fraction of Users Uploading Videos: (1/250)
Write Requests per Day:

(1/250) × 100,000,000 = 0.4 million write requests per day

2. Read Requests

Read Operation:

Watching videos is the primary way to read data from the system.

Assumption:

An average user watches 10 videos per day.

Calculation:

Total Daily Active Users (DAU): 100 million
Videos Watched per User: 10
Read Requests per Day:

100,000,000 × 10 = 1 billion read requests per day

Summary

Operation	Calculation	Result
Writes	(1/250) × 100 million	0.4 million requests/day
Reads	100 million × 10	1 billion requests/day
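The throughput arithmetic can be double-checked with a few lines of Python. This is a minimal sketch: the per-second averages and the read-to-write ratio are an added derivation that assumes traffic is spread evenly over the day, which real traffic is not.

```python
DAU = 100_000_000              # daily active users (from the estimate above)
ONE_UPLOADER_IN = 250          # assumed: 1 out of 250 users uploads per day
VIDEOS_WATCHED_PER_USER = 10   # assumed: average views per user per day
SECONDS_PER_DAY = 24 * 60 * 60

writes_per_day = DAU // ONE_UPLOADER_IN          # uploads per day
reads_per_day = DAU * VIDEOS_WATCHED_PER_USER    # views per day

# Average rates, assuming load is spread evenly across the day.
writes_per_second = writes_per_day / SECONDS_PER_DAY
reads_per_second = reads_per_day / SECONDS_PER_DAY
read_write_ratio = reads_per_day / writes_per_day

print(f"writes/day: {writes_per_day:,} (~{writes_per_second:.1f}/s)")
print(f"reads/day:  {reads_per_day:,} (~{reads_per_second:,.0f}/s)")
print(f"read:write ratio: {read_write_ratio:,.0f}:1")
```

The ratio of about 2,500 reads per write shows the system is heavily read-dominated.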

Storage

Video Data

Assumptions

Average Size of a Video: 600 MB
Daily Uploads (from throughput estimation): 0.4 million requests per day

Storage Calculations

Daily Storage Requirement:
600 MB × 0.4 million requests/day = 240 TB/day
10-Year Storage Requirement:
240 TB/day × 365 days × 10 years = 876 PB

Summary

Metric	Calculation	Result
Daily Storage	600 MB × 0.4 million requests/day	240 TB
10-Year Storage	240 TB/day × 365 days × 10 years	876 PB
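The same storage figures can be reproduced in code. This sketch uses decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), which is an assumption about how the units above are meant.

```python
MB = 10**6
TB = 10**12
PB = 10**15

avg_video_size_bytes = 600 * MB   # assumed average video size
uploads_per_day = 400_000         # from the throughput estimate

daily_storage_bytes = avg_video_size_bytes * uploads_per_day
ten_year_storage_bytes = daily_storage_bytes * 365 * 10

print(f"daily storage:   {daily_storage_bytes / TB:.0f} TB")
print(f"10-year storage: {ten_year_storage_bytes / PB:.0f} PB")
```

Note that this counts only the originally uploaded file; storing each video in several formats and resolutions, as described later, would multiply the total.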

Memory

Overview

By memory, we refer to the cache memory size required for faster data access.

Why Cache Memory?

Accessing data directly from the database takes time. To speed up data retrieval, cache memory is used.

Cache Memory Requirement Calculation

Daily Storage Requirement: 240 TB/day
Cache Requirement (1% of Daily Storage):
0.01 × 240 TB = 2.4 TB

Scalability

The memory size should scale as the system grows to accommodate increasing storage and data access demands.

Network and Bandwidth Estimation

Overview

Network/Bandwidth estimation helps us determine the amount of data flowing in and out of the system per second.

Data Flow Estimations

Ingress (Data Flow Into the System)

Data Stored per Day: 240 TB/day
Calculation:
240 TB ÷ (24 × 60 × 60 s) ≈ 2.8 GB/s
Result: Incoming Data Flow ≈ 2.8 GB/s

Egress (Data Flow Out of the System)

Total Read Requests per Day: 1 billion
Average Video Size: 600 MB
Daily Outgoing Data:
1 billion × 600 MB = 600 PB/day
Calculation:
600 PB ÷ (24 × 60 × 60 s) ≈ 6.9 TB/s, or roughly 7 TB/s
Result: Outgoing Data Flow ≈ 7 TB/s

Summary

Type	Calculation	Result
Ingress (Data Flow In)	240 TB ÷ (24 × 60 × 60)	≈ 2.8 GB/s
Egress (Data Flow Out)	600 PB ÷ (24 × 60 × 60)	≈ 7 TB/s
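The bandwidth figures can be checked the same way. This sketch assumes decimal units and an even spread of traffic over the day. The conversion to bits per second is an addition, included because network links are usually quoted in bits rather than bytes.

```python
GB = 10**9
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60

ingress_bytes_per_day = 240 * TB   # daily uploads
egress_bytes_per_day = 600 * PB    # 1 billion views x 600 MB each

ingress_gb_per_s = ingress_bytes_per_day / SECONDS_PER_DAY / GB
egress_tb_per_s = egress_bytes_per_day / SECONDS_PER_DAY / TB

print(f"ingress: {ingress_gb_per_s:.2f} GB/s ({ingress_gb_per_s * 8:.1f} Gbit/s)")
print(f"egress:  {egress_tb_per_s:.2f} TB/s ({egress_tb_per_s * 8:.1f} Tbit/s)")
```

The egress estimate treats every view as a full 600 MB download, so it is best read as an upper bound.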

API DESIGN:

API DESIGN: Upload Content

Let's understand how a client (content creator) uploads content to YouTube and what APIs are involved.

Since videos can be very large (could be 10 minutes or even 2 hours), it's not feasible to upload a video in a single request. Multiple requests are sent to upload small chunks of the video.

Additionally, there is video metadata associated with the video, such as the video title, creator ID, format, etc.

When we press the upload button, initially a request is sent to add the metadata associated with the video to the server. Here are the details of the initial request.

Initial Request (Add Video Metadata)

HTTP Method

This tells the server what action to perform. Since we want to create something new on the server (metadata for the new video), we use the POST action.

Endpoint

This tells the server where to perform that action. Since we are creating video metadata, we use the /v1/videos endpoint of the server.

Note

Since we also need to upload the actual video after the metadata, we include uploadType=resumable in our endpoint. This indicates that we are uploading a large media file chunk by chunk. It is useful because if the connection drops while uploading this large file, we should be able to resume the upload from that point. In summary, uploadType=resumable tells the server to provide a "resumable" URL back that we can use to upload as well as resume the upload, if needed.

HTTP Body

We tell the server to create metadata for the video, but we haven't provided the details of the metadata yet. This information is sent in the request body.

{
  "title": "Your video title",
  "format": "The format of your video"
  // ...
}

In response to this request, YouTube's server provides us with a resumable URL. We can use this URL to upload the video, and if the connection drops, we can resume the upload from that point. This resumable URL is called the session URL.
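As a rough illustration, the initial request could be built as follows. This is a sketch only: the host `api.example.com` and the helper name `build_metadata_request` are hypothetical, and the request is constructed but never sent. How the server hands back the session URL (for example, in a response header) is not specified here and is left out.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # hypothetical host, for illustration only


def build_metadata_request(title: str, video_format: str) -> urllib.request.Request:
    """Build (but do not send) the POST that creates the video metadata."""
    body = json.dumps({"title": title, "format": video_format}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/v1/videos?uploadType=resumable",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_metadata_request("Your video title", "mp4")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```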

When we upload the actual video, we use this same session URL. Here are the details of the actual upload request:

Final Request (Upload Video)

HTTP Method

This tells the server what action to perform. We use the PUT action to upload the actual video data to the given session URL.

Endpoint

This tells the server where to perform that action. We use the session URI provided by the server in the response to the first request. The uploadId in the URI helps the server identify which video's session it is, even when resuming the upload.

HTTP Body

The body of the PUT request contains the binary data of the video file.
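The text does not say how each PUT identifies which part of the file it carries. One common convention, assumed here, is the standard HTTP `Content-Range` header. The sketch below (the function name `chunk_ranges` is hypothetical) computes the byte range for each chunk and shows how an upload could resume from the last confirmed offset.

```python
from typing import Iterator, Tuple


def chunk_ranges(
    total_size: int, chunk_size: int, resume_from: int = 0
) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, Content-Range value) for each chunk still to upload."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = resume_from
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1  # inclusive last byte
        yield start, end, f"bytes {start}-{end}/{total_size}"
        start = end + 1


# A 25-byte "file" uploaded in 10-byte chunks.
full_upload = list(chunk_ranges(total_size=25, chunk_size=10))
for _, _, header in full_upload:
    print(header)

# Connection dropped after the server confirmed the first 10 bytes:
# resume from offset 10 instead of starting over.
remaining = list(chunk_ranges(total_size=25, chunk_size=10, resume_from=10))
print(remaining)
```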

API DESIGN: Stream Content

Let's understand how a client streams content on YouTube and what APIs are involved.

Since videos can be very large (could be 10 minutes or even 2 hours), it's not feasible to get the whole video in a single request. Multiple requests are sent to get small chunks of the video. These chunks are stored at different locations on the server.

To stream the entire video, the client needs to know these locations. All these locations are saved in a file called a manifest file. So when the client first requests to watch a video, the server sends the manifest file. Once the client gets the manifest file, it uses it to fetch the different chunks from the server (this is streaming).

Flow Overview

Here's how the flow looks:

Watch Request
When we open a video, the first request that goes to the server is a watch request, telling the server that the client wants to watch the video.
Server Response (Manifest File)
The server responds with the manifest file.
Fetching Chunks
The client uses the manifest file to fetch different chunks from the server to stream the video.

Here are the details of the watch request.

Initial Watch Request

HTTP Method

This tells the server what action to perform. Since we want to watch (get) a video, we use the GET action.

Endpoint

This tells the server where to perform that action. We use the /v1/watch endpoint to tell the server we want to watch a video. The server sends the manifest file back.

Now the client has the manifest file which contains the locations of the video chunks. The client uses these locations to start streaming the video chunk by chunk. This is how the overall flow looks with the streaming request.

Streaming Request

The locations in the manifest file are actually CDN locations where the chunks are stored. Simply put, a CDN (Content Delivery Network) is a geographically distributed network of servers that caches large assets (like videos and other media) close to users so they load faster.

HTTP Method

This tells the server what action to perform. Since we are getting video chunks, we use the GET action.

Endpoint

This tells the server where to perform that action. Since we are getting the video chunks from their respective locations, we use the locations (provided in the manifest) as the endpoints.
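To make the idea of "locations as endpoints" concrete, here is a small sketch that extracts chunk URLs from a manifest. It assumes the manifest is an HLS media playlist (HLS is introduced in the next section); the sample playlist, the CDN host, and the function name `chunk_urls` are all made up for illustration.

```python
from typing import List
from urllib.parse import urljoin

# Hypothetical media playlist: each #EXTINF line is followed by a chunk location.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
chunk_000.ts
#EXTINF:6.0,
chunk_001.ts
#EXT-X-ENDLIST
"""


def chunk_urls(playlist_text: str, playlist_url: str) -> List[str]:
    """Return absolute URLs for every chunk listed in the playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; the rest are locations.
        if line and not line.startswith("#"):
            urls.append(urljoin(playlist_url, line))
    return urls


urls = chunk_urls(SAMPLE_PLAYLIST, "https://cdn.example.com/abc123/720p/index.m3u8")
for url in urls:
    print("GET", url)
```

Each printed line corresponds to one GET request the client would send to the CDN.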

HLS Protocol

We are using something called the HLS Protocol for getting video chunks from the CDN. HLS (HTTP Live Streaming) is a very popular protocol used for streaming.

Benefits of HLS

Adaptive Streaming: The quality of the video can adjust based on the user's internet speed.
- Fast internet: High-quality video chunks are streamed.
- Slow internet: Video quality drops to prevent buffering, ensuring uninterrupted playback.

HIGH LEVEL DESIGN:

HIGH LEVEL DESIGN: Upload Content

As we saw in the API design, the video upload process involves two main steps. The first request uploads the video metadata, and the server responds with a session URL.

High-Level Process

Let’s understand this with a high-level diagram. Refer to the steps in the diagram.

When the client clicks the upload button, the first request goes to the API Gateway.
The API Gateway handles incoming requests and routes them to the Content Upload Service via the Load Balancer.
The Content Upload Service adds video metadata (title, format, etc.) to the Videos DB.
The client receives a successful confirmation along with a session URL.

This session URL is then used to upload the actual video data in chunks. Let's see how that happens next.

HIGH LEVEL DESIGN: Upload Content (Continued)

Once we have the session URL, we start uploading the video using that URL. Here's how the flow works (refer to the steps in the diagram):

1. The client uses the session URL as an endpoint and sends a PUT request. This request contains the actual video content.
2. The API Gateway handles incoming requests and routes them to the Content Upload Service via the Load Balancer.
3. The Content Upload Service stores the video content in Object Storage.
4. When the entire video is uploaded to Object Storage, the Content Upload Service adds an event to the Message Queue. This event contains the video ID.
5. The Content Processor, also known as the Workflow Engine, pulls this event from the Message Queue for further processing. It's called a Workflow Engine because it runs a series of steps one after the other.

Workflow Operation

Step 1: The Content Processor uses the event to retrieve the video from Object Storage.
Step 2: Then, it breaks the large video file into smaller chunks.
Step 3: Each chunk is converted into different formats and different qualities/resolutions. For example, a 1-minute video is broken into 10 chunks (6 seconds each). Each chunk is then converted into various formats (e.g., MP4, MOV) and resolutions (e.g., 4K, 720P, 240P).

This process ensures the video can be accessed in different formats and qualities. We will discuss this workflow in detail in the next part.

6. The Content Processor uploads these smaller chunks (in different formats and qualities) to the CDN. Uploading to the CDN allows clients to access the videos faster.
7. Each chunk uploaded to the CDN has a location URL. The Content Processor saves these locations in the Videos DB.
8. Since the video processing is complete, the Content Processor adds an event to the Message Queue to signal that the video processing is done.
9. The Notification Service pulls this event from the Message Queue for further processing.
10. Finally, the Notification Service sends a notification to the client saying the video upload process is complete.
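The event-driven hand-off between these services can be sketched in a single process. This is a toy model under loud assumptions: Python's in-memory `queue.Queue` stands in for the Message Queue, plain dictionaries stand in for Object Storage, the CDN, and the Videos DB, and format/quality conversion is skipped. All names and the CDN host are hypothetical.

```python
import queue

upload_events = queue.Queue()   # stand-in for the Message Queue (upload done)
done_events = queue.Queue()     # stand-in for the Message Queue (processing done)
object_storage = {}             # video_id -> raw bytes
cdn = {}                        # chunk URL -> chunk bytes
videos_db = {}                  # video_id -> metadata, incl. chunk locations
notifications = []              # messages "sent" to the client


def content_upload_service(video_id, data):
    """Store the video, then publish an event carrying the video ID."""
    object_storage[video_id] = data
    upload_events.put({"video_id": video_id})


def content_processor(chunk_size=4):
    """Pull one event, chunk the video, upload chunks, record their locations."""
    video_id = upload_events.get()["video_id"]
    data = object_storage[video_id]
    locations = []
    for offset in range(0, len(data), chunk_size):
        url = f"https://cdn.example.com/{video_id}/chunk_{offset // chunk_size}"
        cdn[url] = data[offset:offset + chunk_size]
        locations.append(url)
    videos_db.setdefault(video_id, {})["chunk_locations"] = locations
    done_events.put({"video_id": video_id})


def notification_service():
    """Pull the completion event and notify the client."""
    video_id = done_events.get()["video_id"]
    notifications.append(f"Video {video_id} is ready")


content_upload_service("abc123", b"0123456789")
content_processor()
notification_service()
print(videos_db["abc123"]["chunk_locations"])
print(notifications)
```

Because each service only reads from and writes to a queue, any of them can be scaled or restarted independently, which is the main reason for putting the Message Queue between them.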

HIGH LEVEL DESIGN: Stream Content

As we discussed during the API design, video streaming involves three main steps:

The client sends a request to the server indicating that it wants to play a specific video.
The server responds with a manifest file. This file contains the locations of the small video chunks.
The client streams these small video chunks from the CDN using the HLS protocol.

When the client sends a request to the server indicating that it wants to play a specific video, the server returns the manifest file. Additionally, it returns other necessary video metadata such as the title, creator ID, description, etc.

The manifest file from the server looks something like this. It lists the locations of video chunks in different formats and qualities for that video. The client then uses this file to request the video chunks.
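As an illustration of what such a manifest might contain, the sketch below generates a small HLS master playlist, where each `#EXT-X-STREAM-INF` entry points at one quality level. The bandwidth values, resolutions, and CDN URLs are illustrative assumptions, not values from this design.

```python
from typing import List, Tuple

# (bandwidth in bits/s, resolution, playlist URL): illustrative values only.
RENDITIONS: List[Tuple[int, str, str]] = [
    (400_000, "426x240", "https://cdn.example.com/abc123/240p/index.m3u8"),
    (2_800_000, "1280x720", "https://cdn.example.com/abc123/720p/index.m3u8"),
    (16_000_000, "3840x2160", "https://cdn.example.com/abc123/2160p/index.m3u8"),
]


def build_master_playlist(renditions: List[Tuple[int, str, str]]) -> str:
    """Build an HLS master playlist listing one variant stream per quality."""
    lines = ["#EXTM3U"]
    for bandwidth, resolution, url in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(url)
    return "\n".join(lines) + "\n"


manifest = build_master_playlist(RENDITIONS)
print(manifest)
```

Each URL in the master playlist would itself point to a per-quality playlist listing the individual chunk locations.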

When the client requests video chunks from the CDN, the CDN delivers these chunks to the client via the HLS protocol.

Different devices support different video formats:

When a client requests MP4 format, it receives MP4 chunks.
When a client requests MOV format, it receives MOV chunks.

Adaptive Quality Based on Network Strength

Weak Connection: The client requests lower-quality chunks (e.g., 240P).
Strong Connection: The client requests higher-quality chunks (e.g., 4K).

A common scenario occurs when the internet connection fluctuates between strong and weak:

When the connection is strong, the client requests high-quality chunks.
When the connection weakens, the client requests lower-quality chunks.

This adaptive behavior ensures a smooth viewing experience. This process is known as adaptive streaming, and the HLS protocol supports it very well. This is why HLS is widely used for video streaming.
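The client-side decision can be sketched as a simple rule: pick the highest quality whose bitrate fits within the measured throughput. This is a simplified assumption about how a player might choose; real players also consider buffer level and other signals. The bitrate ladder, the `safety_factor`, and the function name are hypothetical.

```python
from typing import List, Tuple

# (required bandwidth in bits/s, quality label): illustrative ladder.
LADDER: List[Tuple[int, str]] = [
    (400_000, "240P"),
    (2_800_000, "720P"),
    (16_000_000, "4K"),
]


def pick_quality(
    measured_bps: float, ladder: List[Tuple[int, str]], safety_factor: float = 0.8
) -> str:
    """Return the best quality that fits within a fraction of measured throughput."""
    budget = measured_bps * safety_factor  # keep headroom for fluctuations
    affordable = [entry for entry in ladder if entry[0] <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest quality rather than stalling.
        return min(ladder, key=lambda entry: entry[0])[1]
    return max(affordable, key=lambda entry: entry[0])[1]


# Connection fluctuating from strong to weak:
for throughput in (25_000_000, 4_000_000, 300_000):
    print(f"{throughput:>10} bit/s -> {pick_quality(throughput, LADDER)}")
```

Running this decision before every chunk request is what lets quality follow the connection up and down during playback.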

HIGH LEVEL DESIGN: Content Processor Workflow Engine

Let's dive deeper into the Content Processor (Workflow Engine). As mentioned, this is a workflow engine that runs a series of steps. Here are the steps:

Content Chunker Service
Format Converter Service
Quality Converter Service
CDN Uploader Service

Content Chunker Service

As we saw earlier, in the 4th step, an event is added to the Message Queue with the video ID. The Content Chunker Service fetches the video using this ID and breaks it into smaller chunks. Chunking is necessary because small chunks can be processed in parallel, and it also makes streaming more efficient.

Here are the 4 precise steps:

Retrieve the event from the Message Queue.
Use the video ID from the event to get the video from Object Storage and break it into smaller chunks.
Upload the chunks to Object Storage.
For each uploaded chunk, create an event with the chunk ID and add it to the Message Queue.
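The chunking step itself can be sketched as computing time boundaries for fixed-length chunks. This is a simplification under stated assumptions: the 6-second chunk length comes from the earlier example, the function name is hypothetical, and a real chunker would also need to cut at points where the video can be decoded independently.

```python
import math
from typing import List, Tuple


def chunk_boundaries(duration_seconds: int, chunk_seconds: int = 6) -> List[Tuple[int, int]]:
    """Return (start, end) times in seconds for each chunk of the video."""
    if duration_seconds <= 0 or chunk_seconds <= 0:
        raise ValueError("duration and chunk length must be positive")
    count = math.ceil(duration_seconds / chunk_seconds)
    return [
        (i * chunk_seconds, min((i + 1) * chunk_seconds, duration_seconds))
        for i in range(count)
    ]


chunks = chunk_boundaries(60)        # a 1-minute video
print(len(chunks), chunks[0], chunks[-1])

uneven = chunk_boundaries(65)        # last chunk is shorter than 6 seconds
print(len(uneven), uneven[-1])
```

Each (start, end) pair would become one chunk in Object Storage and one event on the Message Queue.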

Format Converter Service

After chunking, the Format Converter Service converts these chunks into different formats. This ensures compatibility with various devices and platforms.

Steps Involved

Retrieve the event from the Message Queue.
Use the chunk ID from the event to get that chunk from Object Storage.
Convert the chunk into different formats (e.g., MP4, MOV).
Upload the converted chunks to Object Storage.
For each uploaded chunk, create an event with the chunk ID and add it to the Message Queue.

Quality Converter Service

After format conversion, the Quality Converter Service converts these chunks into different quality levels. This allows a smooth viewing experience that matches the user's internet speed.

Example:

If the connection is weak, they see 240P.
If the connection is strong, they see 4K.

Steps Involved

Retrieve the event from the Message Queue.
Use the chunk ID from the event to get the chunk from Object Storage.
Convert the chunk into different qualities/resolutions (e.g., 4K, 720P).
Upload the converted chunks to Object Storage.
For each uploaded chunk, create an event with the chunk ID and add it to the Message Queue.
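It helps to see how quickly the number of stored objects grows once every chunk exists in every format and quality. The sketch below enumerates the resulting object keys; the key layout, the format and quality lists, and the function name are assumptions made for illustration.

```python
from itertools import product
from typing import List

FORMATS = ["mp4", "mov"]                 # example formats from the text
QUALITIES = ["240p", "720p", "2160p"]    # 2160p corresponds to 4K


def variant_keys(video_id: str, chunk_count: int) -> List[str]:
    """List one Object Storage key per (chunk, format, quality) combination."""
    return [
        f"{video_id}/{fmt}/{quality}/chunk_{index:03d}.{fmt}"
        for index, fmt, quality in product(range(chunk_count), FORMATS, QUALITIES)
    ]


keys = variant_keys("abc123", chunk_count=10)   # 1-minute video, 6-second chunks
print(len(keys))
print(keys[0])
print(keys[-1])
```

A 1-minute video with 10 chunks already yields 10 × 2 × 3 = 60 objects, which is why these conversions are fanned out through the Message Queue and processed in parallel.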

The steps above, when visualized, look like this:

CDN Uploader Service

Finally, the CDN Uploader Service comes into play.

Note: The numbers 6, 7, 8, and 9 correspond to the high-level design diagram covered in the previous part.

The CDN Uploader Service retrieves these resulting chunks (in different formats and qualities) from Object Storage and uploads them to the CDN. This is necessary because accessing videos from CDN is much faster.

Each chunk uploaded to the CDN has a location. The CDN Uploader Service also saves these locations in the Videos DB.

Ultimately, since the video processing is complete, the CDN Uploader Service adds an event to the Message Queue to signal that the video processing is done.

This is what the overall flow looks like:

DEEP DIVE INSIGHTS:

DEEP DIVE INSIGHTS: Database Selection

In order to decide the DB type, here are some general guidelines that you can follow. However, it’s not always black and white — a lot depends on the project needs.

Guideline	Recommendation
When you need fast data access	Prefer NoSQL
When the scale is too large	NoSQL performs better
When the data fits into a fixed structure	Prefer SQL
When the data doesn’t fit into a fixed structure	Choose NoSQL
If you have complex queries to execute on your data	Use SQL
If you have simpler queries	NoSQL works well
If your data changes frequently or will evolve over time	NoSQL supports flexible structure

Database Decision Table

Database	Deciding Factors	Decision
VideosDB	- High Scale: Millions of YouTube videos are uploaded and watched every day. Handling reads and writes at this scale makes NoSQL preferable. - Fast Access: Low latency is one of the non-functional requirements, and NoSQL is better suited to high-scale, low-latency systems. - Simple Query Pattern: When a user clicks on a video, the server returns the video metadata along with the manifest file. Reading video metadata by video ID is handled well by NoSQL.	NoSQL

DEEP DIVE INSIGHTS: Data Modeling

VideosDB Schema

Attribute	Details
Database Type	NoSQL
Common Queries	Reading video metadata by `videoId`. This occurs when the user clicks on a video to watch the content, and the server returns the video metadata along with the manifest file.
Indexing	`videoId`

Note:

Because this common query fetches video metadata by videoId, we create an index on the videoId field. The index acts as a shortcut for quickly finding the data by videoId.
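A minimal in-memory sketch of this access pattern is shown below. The record fields are drawn from the metadata mentioned earlier (title, creator ID, format, chunk locations), but the exact schema, the class names, and the dictionary standing in for the database index are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VideoRecord:
    video_id: str
    title: str
    creator_id: str
    format: str
    chunk_locations: List[str] = field(default_factory=list)


class VideosDB:
    """Toy key-value store: the dict plays the role of the videoId index."""

    def __init__(self) -> None:
        self._by_video_id: Dict[str, VideoRecord] = {}

    def put(self, record: VideoRecord) -> None:
        self._by_video_id[record.video_id] = record

    def get(self, video_id: str) -> Optional[VideoRecord]:
        # Direct lookup by key, with no scan over all records.
        return self._by_video_id.get(video_id)


db = VideosDB()
db.put(VideoRecord("abc123", "Your video title", "creator42", "mp4"))
print(db.get("abc123"))
print(db.get("missing"))
```

The only query the design needs is a lookup by key, which is why a simple key-based store fits this table.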

DEEP DIVE INSIGHTS: HLS Encoding

Workflow Engine Steps: Our Content Processor consists of four main steps:

Content Chunking
Format Conversion
Quality Conversion
CDN Upload

We also learned about HLS and how it enables adaptive streaming, which allows streaming different formats for different devices and different qualities based on internet speed.

Encoding for HLS

Another thing to know about HLS: whenever we are streaming videos with HLS, the video chunks need to be encoded in a certain way.

What is Encoding?

Encoding means compressing the raw video into a compact stream of 0s and 1s using a codec. There are different codecs, and each produces a different version (a different representation in 0s and 1s) of the same video.

For HLS to work, the video needs to be encoded using specific standards like H.264 or H.265.

Current Workflow Problem

But have we ever included encoding in our process? Our current workflow doesn't have an encoding step. So, how should we tackle this problem?

Solution

We can solve this by introducing an encoding step in the workflow engine. The updated workflow looks like this:

Content Chunking
Encoding: Encode each chunk in H.264/H.265 standard.
Format Conversion: Convert each encoded chunk into different formats (e.g., MP4, MOV).
Quality Conversion: Convert each format into different qualities (e.g., 4K, 720P).
CDN Upload: Upload the final chunks to the CDN.

The introduction of the encoding step ensures that our workflow aligns with what HLS needs, enabling adaptive streaming.
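One possible concrete form of the encoding step is an ffmpeg invocation that produces H.264-encoded HLS segments. The sketch below only builds the command line and does not run it. It assumes ffmpeg is the encoder, which the text does not state, and it collapses chunking, encoding, and quality conversion into a single command, unlike the separate services described above. File names and the function name are hypothetical.

```python
from typing import List


def hls_encode_command(
    source: str, playlist: str, height: int, segment_seconds: int = 6
) -> List[str]:
    """Build an ffmpeg command that encodes to H.264 and writes HLS segments."""
    return [
        "ffmpeg",
        "-i", source,                        # input video
        "-vf", f"scale=-2:{height}",         # resize to target height, keep aspect ratio
        "-c:v", "libx264",                   # H.264 video encoder
        "-c:a", "aac",                       # AAC audio encoder
        "-f", "hls",                         # HLS output: playlist plus segments
        "-hls_time", str(segment_seconds),   # target segment length in seconds
        "-hls_playlist_type", "vod",         # complete playlist for on-demand video
        playlist,
    ]


command = hls_encode_command("input.mp4", "720p/index.m3u8", height=720)
print(" ".join(command))
```

Running one such command per target height would yield one playlist and one set of chunks per quality level.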
