Design a URL Shortener
Definition of URL Shortener:
- The user submits an original URL (e.g. https://github.com/jjteoh-thewebdev/url-shortener/tree/main), and the program outputs a short URL (e.g. https://short.url/xyz).
- When a user visits the short URL, the program redirects the user to the long URL.
Purpose of a short URL:
- A cleaner, more professional look when attached to emails, posted on social media, or printed on physical materials.
- Tracking & Analytics
Functional requirements:
- Generate a short URL from a long URL (the original URL)
- The short URL should redirect to the long URL
- Set an expiry date
- Set password protection - for simplicity, unlimited attempts are allowed for incorrect passwords
- Simple click count tracker - no need to track the visitor's user agent, IP, device, event time, etc. Every time the short URL is requested, we increment the count.
Non-functional requirements:
- Availability - should be up and running at all times
- Reliability - should redirect to the long URL as fast as possible (within seconds)
- Scalability
- Fault Tolerance
Out of scope:
- User management - no sign-up/log-in required
- API Security - no API key required
- Complex tracking and analytics (e.g. visitor's user agent, IP, device, event time, etc.)
- Lockdown after several failed password attempts
(Most of it from Alex Xu's System Design Interview Volume 1)
- Short URL length - as short as possible
- Whitelisted characters - numbers (0-9), letters (a-z, A-Z)
- Update or delete - no (since user management is out of scope)
- Traffic volume - 100 million URLs are generated per day
  - Write operations: 100 million URLs per day
  - Write operations per second: 100 million / 24 / 3600 = ~1,160
  - Read operations per second: with a 10:1 read:write ratio, 1,160 * 10 = 11,600
- Storage requirement - software lifespan of 10 years: 100 million * 365 * 10 = 365 billion records
  - Assume an average URL length of 100 bytes
  - Storage for 10 years: 365 billion * 100 bytes = 36.5 TB (the 365 billion records already cover all 10 years, so the 10-year factor must not be applied a second time)
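The arithmetic above can be double-checked with a few lines of TypeScript. This is a sketch whose constants are exactly the assumptions stated in the estimates (100 million writes per day, a 10:1 read:write ratio, a 10-year lifespan, ~100 bytes per record); the constant names are illustrative.

```typescript
// Inputs taken from the estimates above (the assumptions of this design).
const URLS_PER_DAY = 100_000_000;
const READ_WRITE_RATIO = 10; // reads per write
const LIFESPAN_YEARS = 10;
const AVG_RECORD_BYTES = 100;
const SECONDS_PER_DAY = 24 * 3600;

const writesPerSecond = URLS_PER_DAY / SECONDS_PER_DAY;
const readsPerSecond = writesPerSecond * READ_WRITE_RATIO;

// Total records over the whole lifespan: 365 billion.
const totalRecords = URLS_PER_DAY * 365 * LIFESPAN_YEARS;

// totalRecords already spans all 10 years, so multiply by bytes per record only.
const totalTerabytes = (totalRecords * AVG_RECORD_BYTES) / 1e12; // decimal TB

console.log(Math.round(writesPerSecond)); // 1157
console.log(Math.round(readsPerSecond)); // 11574
console.log(totalRecords); // 365000000000
console.log(totalTerabytes); // 36.5
```

The text rounds 1,157 writes/s up to 1,160, which is why it quotes 11,600 rather than 11,574 reads/s; the difference does not change any design decision.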
We will need the following API endpoints:
- POST /api/v1/urls/shorten - url shortening
- GET /{short-url} - url redirection
- POST /{short-url} - url redirection for password-protected route
POST /api/v1/urls/shorten
Description
Accept a long URL and generate a short URL.
Request Body
{
"long_url": string,
"expiry": date, // optional
"password": string, // optional
"custom_url": string // optional
}
Response
success case
// HTTP 200 OK
{
"error": null,
"data": {
"id": "66",
"short_url": "0000014",
"long_url": "https://very-long-url",
"visitor_count": "0",
"has_password": false,
"expired_at": null,
"created_at": "2025-03-28T09:29:24.721Z"
}
}
failed case
{
"error": string,
"data": null
}
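The request and response bodies above map naturally onto TypeScript types. The sketch below assumes `expiry` is sent as an ISO-8601 string and applies a few validation rules (http/https URLs only, a future expiry date, `custom_url` limited to the whitelisted characters) that are illustrative assumptions rather than part of the original spec; `validateShortenRequest`, `parseHttpUrl` and `errorResponse` are hypothetical helper names.

```typescript
// Wire shapes mirroring the JSON above.
interface ShortenRequest {
  long_url: string;
  expiry?: string; // assumed to be an ISO-8601 date string
  password?: string;
  custom_url?: string;
}

interface ShortUrlData {
  id: string; // a string in the sample response (e.g. "66")
  short_url: string;
  long_url: string;
  visitor_count: string;
  has_password: boolean;
  expired_at: string | null;
  created_at: string;
}

interface ApiResponse<T> {
  error: string | null;
  data: T | null;
}

// Whitelisted characters only: 0-9, a-z, A-Z (no length limit assumed here).
const CUSTOM_URL_PATTERN = /^[0-9a-zA-Z]+$/;

// Returns the parsed URL if it is a valid http(s) URL, otherwise null.
function parseHttpUrl(raw: string): URL | null {
  try {
    const url = new URL(raw);
    return url.protocol === "http:" || url.protocol === "https:" ? url : null;
  } catch {
    return null;
  }
}

// Returns an error message, or null when the request passes the illustrative rules.
function validateShortenRequest(body: ShortenRequest, now: Date = new Date()): string | null {
  if (parseHttpUrl(body.long_url) === null) {
    return "long_url must be a valid http(s) URL.";
  }
  if (body.expiry !== undefined) {
    const expiry = new Date(body.expiry);
    if (Number.isNaN(expiry.getTime()) || expiry.getTime() <= now.getTime()) {
      return "expiry must be a valid future date.";
    }
  }
  if (body.custom_url !== undefined && !CUSTOM_URL_PATTERN.test(body.custom_url)) {
    return "custom_url may only contain 0-9, a-z and A-Z.";
  }
  return null;
}

// Builds the documented failure envelope: { "error": string, "data": null }.
function errorResponse(message: string): ApiResponse<ShortUrlData> {
  return { error: message, data: null };
}
```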
GET /{short-url}
Description
Redirect to the long URL.
Path Params
short-url: string
Response
success case
HTTP 302 Found (moved temporarily), redirecting to the long URL
not found case
// HTTP 404 NOT FOUND
{
"error": "url not found.",
"data": null
}
password-protected case
An HTML page with a password prompt is returned.
expired short url
// HTTP 400 BAD REQUEST
{
"error": "url expired.",
"data": null
}
POST /{short-url}
Description
Redirect to the long URL (for a password-protected short URL).
Request Body
{
"password": string
}
Response Body
success case
HTTP 302 Found (moved temporarily), redirecting to the long URL
not found case
// HTTP 404 NOT FOUND
{
"error": "url not found.",
"data": null
}
invalid password
// HTTP 401 Unauthorized
{
"error": "Invalid Credentials",
"data": null
}
expired short url
// HTTP 400 BAD REQUEST
{
"error": "url expired.",
"data": null
}
Both 301 and 302 are used for redirection. However, 301 indicates that the requested URL has been permanently moved to the long URL. Thus, the browser caches the response, and subsequent requests for the same URL go straight to the long URL without reaching our URL shortener. On the other hand, 302 indicates that the URL has been temporarily moved to the long URL, so the browser sends every request to the URL shortener. Since we want to capture the visit count, 302 is more appropriate for our use case.
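The redirect rules for `GET /{short-url}` and `POST /{short-url}` can be condensed into one decision function. This is a sketch under several assumptions: `UrlRecord` is a hypothetical in-memory shape of a `urls` row, the expiry check runs before the password prompt, the prompt is returned with status 200 (the spec above does not state one), and `verifyPassword` is injected so the function does not depend on any particular hashing scheme.

```typescript
// Hypothetical in-memory shape of a row in the `urls` table.
interface UrlRecord {
  shortUrl: string;
  longUrl: string;
  passwordHash: string | null;
  expiredAt: Date | null;
}

type RedirectResult =
  | { status: 302; location: string } // redirect via the Location header
  | { status: 200; html: string } // password prompt (status code assumed)
  | { status: 400 | 401 | 404; error: string }; // JSON error body

// `password` is undefined for GET /{short-url} and set for POST /{short-url}.
function resolveRedirect(
  record: UrlRecord | null,
  now: Date,
  password: string | undefined,
  verifyPassword: (plain: string, hash: string) => boolean,
): RedirectResult {
  if (record === null) {
    return { status: 404, error: "url not found." };
  }
  if (record.expiredAt !== null && record.expiredAt.getTime() <= now.getTime()) {
    return { status: 400, error: "url expired." };
  }
  if (record.passwordHash !== null) {
    if (password === undefined) {
      return {
        status: 200,
        html: '<form method="POST"><input type="password" name="password"><button>Open</button></form>',
      };
    }
    if (!verifyPassword(password, record.passwordHash)) {
      return { status: 401, error: "Invalid Credentials" };
    }
  }
  // 302 rather than 301, so browsers keep coming back and every visit can be counted.
  return { status: 302, location: record.longUrl };
}
```

In this sketch, the caller would increment `visitorCount` only when the result is a 302, since that is the visit the 302 choice keeps observable.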
Components:
- nginx: acts as reverse proxy, ingress control
- frontend: app built with Next.js
- management backend: API server (Express.js) for handling creation of URLs in the Postgres DB
- redirector backend: backend server (Fastify) for handling redirection, optimized for speed
- postgres: stores URLs
- redis: caches recently visited URLs
- Kubernetes: load balancing, failover, self-healing pods, pod autoscaler
- Observability (may not implement): monitor apps, collect metrics and logs
Possible characters = [0-9, a-z, A-Z]: 10 + 26 + 26 = 62 possible characters.
From the previous section, we estimated that 365 billion URLs will be generated over the 10-year software lifespan.
Given 62 possible characters and 365 billion records, we need the smallest length n that satisfies:
62^n >= 365 billion
if n = 6
62^6 = ~56.8 billion (not enough)
if n = 7
62^7 = ~3.5 trillion (enough)
Thus, a 7-character short URL is enough for 10 years of data.
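The search for the shortest length can also be done mechanically: keep multiplying the capacity by 62 until it reaches the expected record count. `minCodeLength` is a hypothetical helper; plain `number` arithmetic is exact here because 62^7 (~3.5e12) is far below 2^53.

```typescript
// Smallest length n such that alphabetSize^n >= records.
function minCodeLength(alphabetSize: number, records: number): number {
  if (alphabetSize < 2) {
    throw new RangeError("alphabetSize must be at least 2");
  }
  let length = 0;
  let capacity = 1; // number of distinct codes of the current length
  while (capacity < records) {
    capacity *= alphabetSize;
    length += 1;
  }
  return length;
}

const RECORDS_10_YEARS = 365_000_000_000; // from the storage estimate
console.log(62 ** 6); // 56800235584 (~56.8 billion, too few)
console.log(62 ** 7); // 3521614606208 (~3.5 trillion, enough)
console.log(minCodeLength(62, RECORDS_10_YEARS)); // 7
```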
Base62 conversion is a technique to convert a number into a different numeral system. In our case, we convert numbers from base10 to base62.
Conversion mapping:
base10: 0 1 2 ... 9 10 11 ... 36 37 ... 61
base62: 0 1 2 ... 9 A B ... a b ... z
For example,
given: 1234567
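Step | Division     | Quotient | Remainder | Base62 digit
1    | 1234567 / 62 | 19912    | 23        | N
2    | 19912 / 62   | 321      | 10        | A
3    | 321 / 62     | 5        | 11        | B
4    | 5 / 62       | 0        | 5         | 5
Reading the base62 digits in reverse order (last step first), we get:
base62 = 5BAN
Check: 5 * 62^3 + 11 * 62^2 + 10 * 62 + 23 = 1,191,640 + 42,284 + 620 + 23 = 1,234,567.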
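A minimal TypeScript sketch of the conversion, assuming the alphabet order from the mapping table (0-9, then A-Z, then a-z). The optional left-padding to 7 characters is an inference from the sample API response, where id 66 maps to short_url 0000014 (66 = 1 * 62 + 4); `encodeBase62` and `decodeBase62` are hypothetical names.

```typescript
// Alphabet order from the mapping table: 0-9 -> 0..9, A-Z -> 10..35, a-z -> 36..61.
const BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// Repeated division by 62. Each remainder is the next digit from the right,
// so prepending it is the same as reading the remainders in reverse.
function encodeBase62(value: number, minLength = 0): string {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("value must be a non-negative safe integer");
  }
  let digits = "";
  let n = value;
  do {
    digits = BASE62_ALPHABET.charAt(n % 62) + digits;
    n = Math.floor(n / 62);
  } while (n > 0);
  // Left-pad with "0" (the digit for zero) up to the requested length.
  while (digits.length < minLength) {
    digits = "0" + digits;
  }
  return digits;
}

// Inverse (Horner's rule): n = n * 62 + digit for each character, left to right.
function decodeBase62(code: string): number {
  let n = 0;
  for (const ch of code) {
    const digit = BASE62_ALPHABET.indexOf(ch);
    if (digit === -1) {
      throw new RangeError(`invalid base62 character: ${ch}`);
    }
    n = n * 62 + digit;
  }
  return n;
}

console.log(encodeBase62(1234567)); // 5BAN
console.log(encodeBase62(66, 7)); // 0000014
console.log(decodeBase62("5BAN")); // 1234567
```

Leading zeros do not change the decoded value, so padded and unpadded codes decode to the same id.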
For the input (base10), we leverage the auto-increment feature in Postgres to generate a numeric id. Then, we apply a unique index to the shortUrl column to avoid collisions (for example, a custom URL that matches an existing short URL).
urls
------
PK | id | bigint
| shortUrl | string, unique
| longUrl | string
| passwordHash | string
| visitorCount | bigint
| expiredAt | datetime
| createdAt | datetime
Keynote:
- In Postgres, bigint can store values up to ~9.2 quintillion, while integer can only store values up to ~2.1 billion.
  - bigint for id, as we want to cater for 365 billion records over 10 years.
  - bigint for visitorCount, as in the worst case a single URL could receive every read: 11,600 reads per second * 60 sec * 60 min * 24 hr * 365 days * 10 yr = ~3.66 trillion.
  - passwordHash is a one-way hash of the password; never store the password as plain text in the data store.
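One way to produce `passwordHash` is Node's built-in `crypto` module, which provides scrypt (a deliberately slow key-derivation function) and a constant-time comparison. This is a sketch, not necessarily what the project uses: the `<salt hex>:<key hex>` storage format and the 16-byte salt / 64-byte key sizes are assumptions.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

const SALT_BYTES = 16;
const KEY_BYTES = 64;

// Produces a value for the passwordHash column in an assumed "<salt hex>:<key hex>" format.
function hashPassword(password: string): string {
  const salt = randomBytes(SALT_BYTES); // fresh random salt per password
  const key = scryptSync(password, salt, KEY_BYTES);
  return `${salt.toString("hex")}:${key.toString("hex")}`;
}

// Re-derives the key with the stored salt and compares the keys in constant time.
function verifyPassword(password: string, stored: string): boolean {
  const [saltHex, keyHex] = stored.split(":");
  if (!saltHex || !keyHex) return false;
  const expected = Buffer.from(keyHex, "hex");
  if (expected.length !== KEY_BYTES) return false; // timingSafeEqual needs equal lengths
  const actual = scryptSync(password, Buffer.from(saltHex, "hex"), KEY_BYTES);
  return timingSafeEqual(actual, expected);
}
```

Only the salt and the derived key are stored; the plaintext password never reaches the database.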
- kubernetes
- Redis string
  - key: short URL, value: long URL, or an error code when the lookup fails
- Cache-aside:
  - for URLs that do not exist or have expired (negative cache): set the TTL to 60 seconds
  - for normal URLs: set the TTL to 1 hour or until the custom expiry date, whichever is shorter
  - for password-protected URLs: we choose not to cache
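The cache-aside policy above can be sketched with an in-memory `Map` standing in for Redis string keys with a TTL (with a real Redis client this would be a GET, then a SET with an expiry on a miss). The sentinel strings used as "error codes", `InMemoryCache`, `lookup` and the `loadFromDb` callback are hypothetical names chosen for illustration.

```typescript
// Hypothetical sentinel values cached in place of a long URL.
const NOT_FOUND = "__NOT_FOUND__";
const EXPIRED = "__EXPIRED__";
const PASSWORD_REQUIRED = "__PASSWORD_REQUIRED__"; // returned, never cached

const NEGATIVE_TTL_SECONDS = 60;
const DEFAULT_TTL_SECONDS = 3600;

interface UrlRow {
  longUrl: string;
  passwordHash: string | null;
  expiredAt: Date | null;
}

// In-memory stand-in for Redis string keys with a TTL.
class InMemoryCache {
  private entries = new Map<string, { value: string; expiresAtMs: number }>();

  get(key: string, nowMs: number): string | null {
    const entry = this.entries.get(key);
    if (entry === undefined) return null;
    if (entry.expiresAtMs <= nowMs) {
      this.entries.delete(key); // lazily evict expired entries
      return null;
    }
    return entry.value;
  }

  set(key: string, value: string, ttlSeconds: number, nowMs: number): void {
    this.entries.set(key, { value, expiresAtMs: nowMs + ttlSeconds * 1000 });
  }
}

function isExpired(row: UrlRow, now: Date): boolean {
  return row.expiredAt !== null && row.expiredAt.getTime() <= now.getTime();
}

// Cache-aside read: try the cache, fall back to the database, then populate the cache.
function lookup(
  shortUrl: string,
  now: Date,
  cache: InMemoryCache,
  loadFromDb: (shortUrl: string) => UrlRow | null,
): string {
  const nowMs = now.getTime();
  const cached = cache.get(shortUrl, nowMs);
  if (cached !== null) return cached;

  const row = loadFromDb(shortUrl);
  if (row === null) {
    cache.set(shortUrl, NOT_FOUND, NEGATIVE_TTL_SECONDS, nowMs); // negative cache
    return NOT_FOUND;
  }
  if (isExpired(row, now)) {
    cache.set(shortUrl, EXPIRED, NEGATIVE_TTL_SECONDS, nowMs); // negative cache
    return EXPIRED;
  }
  if (row.passwordHash !== null) return PASSWORD_REQUIRED; // not cached

  // TTL: 1 hour, or until the custom expiry date, whichever is shorter.
  const ttl =
    row.expiredAt === null
      ? DEFAULT_TTL_SECONDS
      : Math.min(DEFAULT_TTL_SECONDS, Math.floor((row.expiredAt.getTime() - nowMs) / 1000));
  if (ttl > 0) cache.set(shortUrl, row.longUrl, ttl, nowMs);
  return row.longUrl;
}
```

Capping the TTL at the URL's own expiry keeps the cache from serving a redirect after the short URL has expired.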
