- http://www.cnblogs.com/duanxz/
- System Design interview
- Basic
- DNS: paid third-party service (not hosted on your own servers)
- resolves the domain name in a URL to an IP address; HTTP requests are then sent to your web server
- web browser: the server generates HTML, the browser renders it
- mobile app: the server returns JSON
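A minimal sketch of the HTML-vs-JSON split above, assuming a hypothetical user-profile endpoint. The function name `render_response`, the `client` values and the user fields are made up for illustration and are not from the notes.

```python
import json
from html import escape
from typing import Tuple


def render_response(user: dict, client: str) -> Tuple[str, str]:
    """Return (content_type, body) for a hypothetical profile endpoint."""
    if client == "mobile":
        # Mobile apps typically consume a JSON API response.
        return "application/json", json.dumps(user, sort_keys=True)
    # Web browsers get HTML that they render; escape user-supplied text.
    body = "<html><body><h1>{}</h1></body></html>".format(escape(user["name"]))
    return "text/html", body
```

The same business data is served in both cases; only the presentation format differs per client.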
- Relational: RDBMS or SQL
- popular (MySQL, Oracle, PostgreSQL)
- supports up to 10 million
- tables, easy to understand
- Non-relational: NoSQL (CouchDB, Neo4j, Cassandra, HBase, Amazon DynamoDB)
- 4 categories:
- K-V stores
- Graph stores
- Column stores
- Document stores
- joins are generally not supported
- choose NoSQL when you need:
- super low latency
- unstructured data
- massive amounts of data (> 5TB)
- vertical scaling (scale up)
- horizontal scaling (scale out): more desirable for large-scale applications (vertical scaling has a hard hardware limit and no failover)
- Load balancer
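As a rough illustration of what a load balancer does, here is a round-robin sketch. The class name `RoundRobinBalancer` and the private IP addresses in the usage note are assumptions for the example; real load balancers also do health checks and often use other strategies.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in rotation (hypothetical, in-memory only)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._cycle = itertools.cycle(self.servers)

    def add_server(self, server):
        # Scale out: new servers join the rotation.
        self.servers.append(server)
        self._cycle = itertools.cycle(self.servers)

    def remove_server(self, server):
        # Failed or retired servers leave the rotation.
        self.servers.remove(server)
        self._cycle = itertools.cycle(self.servers)

    def next_server(self):
        if not self.servers:
            raise RuntimeError("no servers available")
        return next(self._cycle)
```

For example, `RoundRobinBalancer(["10.0.0.1", "10.0.0.2"])` alternates between the two addresses on successive `next_server()` calls, and keeps serving from the remaining one if the other is removed.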
- Database replication:
- master DB: handles writes
- slave DBs: handle reads (several replicas)
- writes go only to the master, reads are spread across slaves: improves performance
- analytics can run on slaves
- 1 slave offline: reads can be temporarily directed to the master
- 1 master offline: a slave is promoted to master (can be problematic: the promoted slave may be missing recent writes)
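The routing and failover rules above can be sketched as follows. This is a simplified model, assuming nodes are just names; `ReplicatedDB` and its methods are hypothetical, and real promotion also has to reconcile data the slave has not yet received.

```python
import random


class ReplicatedDB:
    """Routes writes to the master and reads to slaves (toy model)."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def node_for_write(self):
        if self.master is None:
            raise RuntimeError("no master available")
        return self.master

    def node_for_read(self):
        # Prefer a slave; fall back to the master if none is online.
        if self.slaves:
            return random.choice(self.slaves)
        return self.node_for_write()

    def slave_down(self, name):
        self.slaves.remove(name)

    def master_down(self):
        # Promote the first remaining slave, if there is one.
        self.master = self.slaves.pop(0) if self.slaves else None
```

With `ReplicatedDB("m", ["s1"])`, calling `master_down()` makes `s1` the new master, and reads fall back to it as well because no slaves remain.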
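- Cache: improves load/response time
- considerations:
- lifetime / expiration (too long: stale data; too short: frequent re-fetching)
- consistency (keeping the cache and the data store in sync)
- mitigating failure (multiple cache servers, to avoid a single point of failure)
- evicting data (LRU)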
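Two of the cache considerations above, expiration and LRU eviction, can be combined in one small sketch. The class `LRUCache` and its `clock` parameter (injectable so expiry can be tested without sleeping) are assumptions for illustration, not a specific cache server's API.

```python
import time
from collections import OrderedDict


class LRUCache:
    """In-memory cache with a per-entry TTL and LRU eviction."""

    def __init__(self, capacity, ttl_seconds, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = OrderedDict()  # key -> (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Entry outlived its lifetime: treat it as a miss.
            del self._items[key]
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
```

On a miss (`get` returns `None`) the caller would read from the database and `put` the result back, which is where the lifetime trade-off shows up: a long TTL risks stale reads, a short one sends more traffic to the database.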
- CDN:
- geographically dispersed servers
- static content (images, videos, CSS, JavaScript)
- dynamic content (check [5] [6])
- workflow:
- User A requests a file; DNS routes the request to the closest CDN server
- if the CDN doesn't have it, it fetches the file from the origin (web server or Amazon S3)
- the origin returns the file to the CDN (with a TTL)
- the CDN caches the file and returns it to User A
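The CDN request flow above can be modelled in a few lines. This is a toy sketch: `Origin`, `CDNEdge`, the file path and the 60-second TTL are all made-up names and values, and DNS routing to the nearest edge is left out.

```python
import time


class Origin:
    """Stand-in for the origin (web server or object storage)."""

    def __init__(self, files, ttl_seconds):
        self.files = files
        self.ttl = ttl_seconds
        self.hits = 0

    def fetch(self, path):
        self.hits += 1
        # The origin returns the content together with a TTL.
        return self.files[path], self.ttl


class CDNEdge:
    """One CDN server that caches files until their TTL expires."""

    def __init__(self, origin, clock=time.monotonic):
        self.origin = origin
        self.clock = clock
        self._cache = {}  # path -> (content, expires_at)

    def get(self, path):
        cached = self._cache.get(path)
        if cached is not None and self.clock() < cached[1]:
            return cached[0]  # cache hit: origin is not contacted
        content, ttl = self.origin.fetch(path)  # miss or expired
        self._cache[path] = (content, self.clock() + ttl)
        return content
```

Repeated requests for the same path within the TTL reach the origin only once; after the TTL expires the edge fetches the file again.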
- stateful web tier (needs sticky sessions in the load balancer; makes adding/removing servers much harder)
- stateless web tier (move session state out of the web tier into a shared data store)
- prefer NoSQL for the session store (easy to scale)
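A minimal sketch of the stateless approach, assuming an in-memory dictionary as a stand-in for the shared data store. `SessionStore` and `WebServer` are hypothetical names; in practice the store would be a separate NoSQL or cache service.

```python
import uuid


class SessionStore:
    """Stand-in for a shared data store that lives outside the web tier."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, state):
        self._data[session_id] = dict(state)

    def load(self, session_id):
        return self._data.get(session_id)


class WebServer:
    """Keeps no session state of its own, so any server can serve any user."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        session_id = uuid.uuid4().hex
        self.store.save(session_id, {"user": user})
        return session_id

    def whoami(self, session_id):
        state = self.store.load(session_id)
        return state["user"] if state else None
```

A session created through one server can be read through another that shares the same store, so the load balancer needs no sticky sessions and servers can be added or removed freely.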
- geoDNS: resolves a domain name to an IP address based on the user's location
- case study (Netflix) on solving data synchronization [8]
- Message queue: producer-consumer (e.g. 1 queue for photo jobs, 1 queue for PDF jobs)
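The producer-consumer pattern with one queue per job type can be sketched with the standard library. This is an in-process illustration only: `run_pipeline`, the file names and the "resized"/"rendered" handlers are assumptions, and a real system would use a separate message broker.

```python
import queue
import threading


def worker(jobs, results, handler):
    """Consumer: process jobs until the None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        results.append(handler(job))


def run_pipeline(photos, pdfs):
    photo_q, pdf_q = queue.Queue(), queue.Queue()
    photo_done, pdf_done = [], []
    # One consumer per queue, so each job type scales independently.
    workers = [
        threading.Thread(target=worker,
                         args=(photo_q, photo_done, lambda j: "resized " + j)),
        threading.Thread(target=worker,
                         args=(pdf_q, pdf_done, lambda j: "rendered " + j)),
    ]
    for w in workers:
        w.start()
    # Producer: publish jobs and return immediately.
    for p in photos:
        photo_q.put(p)
    for d in pdfs:
        pdf_q.put(d)
    photo_q.put(None)
    pdf_q.put(None)
    for w in workers:
        w.join()
    return photo_done, pdf_done
```

Because producer and consumers only share the queues, either side can be scaled or can fail without blocking the other.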
- logging, metrics and automation
- sharding [11,12]
- problems: hard to join across databases
- most important factor: sharding key
- resharding data (mitigated with consistent hashing)
- moving things to NoSQL
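To illustrate why consistent hashing helps with resharding, here is a small hash-ring sketch. The class `ConsistentHashRing`, the shard names and the choice of SHA-256 with 100 virtual nodes per shard are assumptions for the example, not something the notes prescribe.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Maps sharding keys to shards via a ring of virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._hashes = []  # sorted positions on the ring
        self._owner = {}   # ring position -> shard name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            h = _hash("{}#{}".format(node, i))
            self._owner[h] = node
            bisect.insort(self._hashes, h)

    def remove_node(self, node):
        self._hashes = [h for h in self._hashes if self._owner[h] != node]
        self._owner = {h: n for h, n in self._owner.items() if n != node}

    def get_node(self, key):
        if not self._hashes:
            raise RuntimeError("ring is empty")
        # First ring position clockwise from the key, wrapping around.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._hashes)
        return self._owner[self._hashes[idx]]
```

When a shard is added, the only keys that change owner are the ones that move to the new shard; with plain `hash(key) % n` sharding, changing `n` would remap most keys.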