Commit 371365e: Add system design (7 files changed, +243 -4 lines)

Interview/SystemDesign/caching.md

## Caching
Load balancing helps you scale horizontally across an ever-increasing number of servers, but caching enables you to make vastly better use of the resources you already have and makes otherwise unattainable product requirements feasible. Caches take advantage of the locality of reference principle: recently requested data is likely to be requested again. They are used in almost every layer of computing: hardware, operating systems, web browsers, web applications, and more. A cache is like short-term memory: it has a limited amount of space, but it is typically faster than the original data source and contains the most recently accessed items. Caches can exist at all levels in an architecture, but are often found at the level nearest to the front end, where they are implemented to return data quickly without taxing downstream levels.
### **Application server cache**
Placing a cache directly on a request-layer node enables the local storage of response data. Each time a request is made to the service, the node will quickly return locally cached data if it exists; if it is not in the cache, the node will query the data from disk. The cache on a request-layer node can live both in memory (which is very fast) and on the node’s local disk (still faster than going to network storage).

What happens when you expand this to many nodes? If the request layer is expanded to multiple nodes, it’s still quite possible to have each node host its own cache. However, if your load balancer randomly distributes requests across the nodes, the same request will go to different nodes, thus increasing cache misses. Two choices for overcoming this hurdle are global caches and distributed caches.
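
A minimal sketch of this tiered lookup, with toy dict-backed classes standing in for the disk and network stores (all names here are illustrative, not from the source):

```python
class Store:
    """Toy key-value store standing in for local disk or network storage."""
    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value


class NodeCache:
    """Request-layer cache: in-process memory first, then local disk,
    then the authoritative backing store over the network."""
    def __init__(self, disk, backing):
        self.memory = {}        # fastest tier: process-local memory
        self.disk = disk        # slower tier: the node's local disk
        self.backing = backing  # slowest: network storage / database

    def get(self, key):
        if key in self.memory:             # in-memory hit
            return self.memory[key]
        value = self.disk.get(key)         # local-disk hit
        if value is None:
            value = self.backing.get(key)  # miss: query the data source
            self.disk.put(key, value)
        self.memory[key] = value           # promote into the faster tier
        return value
```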
### **Content Distribution Network (CDN)**
CDNs are a kind of cache that comes into play for sites serving large amounts of static media. In a typical CDN setup, a request will first ask the CDN for a piece of static media; the CDN will serve that content if it has it locally available. If it isn’t available, the CDN will query the back-end servers for the file, cache it locally, and serve it to the requesting user.

If the system we are building isn’t yet large enough to have its own CDN, we can ease a future transition by serving the static media off a separate subdomain (e.g., static.yourservice.com) using a lightweight HTTP server like Nginx, and cutting the DNS over from our servers to a CDN later.
### **Cache Invalidation**
While caching is fantastic, it requires some maintenance to keep the cache coherent with the source of truth (e.g., the database). If data is modified in the database, it should be invalidated in the cache; otherwise, this can cause inconsistent application behavior.

Solving this problem is known as cache invalidation; there are three main schemes that are used:

***Write-through cache***: Under this scheme, data is written into the cache and the corresponding database at the same time. The cached data allows for fast retrieval and, since the same data gets written in the permanent storage, we will have complete data consistency between the cache and the storage. This scheme also ensures that nothing will get lost in case of a crash, power failure, or other system disruption.

Although write-through minimizes the risk of data loss, every write operation must be done twice before returning success to the client, so this scheme has the disadvantage of higher latency for write operations.

***Write-around cache***: This technique is similar to write-through cache, but data is written directly to permanent storage, bypassing the cache. This can reduce the cache being flooded with write operations that will not subsequently be re-read, but has the disadvantage that a read request for recently written data will create a “cache miss”; the data must then be read from slower back-end storage, incurring higher latency.

***Write-back cache***: Under this scheme, data is written to the cache alone, and completion is immediately confirmed to the client. The write to permanent storage is done after specified intervals or under certain conditions. This results in low latency and high throughput for write-intensive applications; however, this speed comes with the risk of data loss in case of a crash or other adverse event, because the only copy of the written data is in the cache.
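
The three schemes differ only in when the write reaches permanent storage relative to the cache. A hedged sketch, with plain dicts standing in for the cache and the database (class and method names are invented for illustration):

```python
import queue


class WriteStrategies:
    """Illustrative cache-write policies; a real system would use a cache
    server and a database instead of these in-process dicts."""
    def __init__(self):
        self.cache, self.db = {}, {}
        self.dirty = queue.Queue()      # entries awaiting write-back

    def write_through(self, key, value):
        self.cache[key] = value         # write cache and DB together, so the
        self.db[key] = value            # caller pays both latencies up front

    def write_around(self, key, value):
        self.db[key] = value            # bypass the cache entirely
        self.cache.pop(key, None)       # drop any stale cached copy

    def write_back(self, key, value):
        self.cache[key] = value         # acknowledge immediately
        self.dirty.put((key, value))    # DB write deferred; lost on a crash

    def flush(self):
        """Run periodically, or on certain conditions, to drain dirty entries."""
        while not self.dirty.empty():
            key, value = self.dirty.get()
            self.db[key] = value
```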
### **Cache eviction policies**
The following are some of the most common cache eviction policies:

1. First In First Out (FIFO): The cache evicts the blocks in the order they were added, without regard to how often or how many times they were accessed before.
2. Last In First Out (LIFO): The cache evicts the block that was added most recently, without regard to how often or how many times it was accessed before.
3. Least Recently Used (LRU): Discards the least recently used items first (see the sketch after this list).
4. Most Recently Used (MRU): Discards, in contrast to LRU, the most recently used items first.
5. Least Frequently Used (LFU): Counts how often an item is needed; those that are used least often are discarded first.
6. Random Replacement (RR): Randomly selects a candidate item and discards it to make space when necessary.
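
As a concrete example, LRU is commonly implemented on top of an ordered map; a minimal sketch using Python's `OrderedDict`:

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # ordered from least to most recent

    def get(self, key):
        if key not in self.items:
            return None              # cache miss
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
```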
## Reference
Grokking the System Design Interview by Educative.io
## Data Partitioning
## Reference
Grokking the System Design Interview by Educative.io
## Key Characteristics of Distributed Systems
- Scalability
- Reliability
- Availability
- Efficiency
- Serviceability or Manageability
### ***Scalability***
Scalability is the capability of a system, process, or network to grow and manage increased demand. Any distributed system that can continuously evolve to support a growing amount of work is considered scalable.

A system may have to scale for many reasons, such as increased data volume or an increased amount of work (e.g., number of transactions). A scalable system should achieve this scaling without performance loss.

Generally, the performance of a system, although designed (or claimed) to be scalable, declines with system size due to management or environment costs. For instance, network speed may become slower because machines tend to be far apart from one another. More generally, some tasks may not be distributable, either because of their inherent atomic nature or because of some flaw in the system design. At some point, such tasks would limit the speed-up obtained by distribution. A scalable architecture avoids this situation and attempts to balance the load evenly across all the participating nodes.

**Horizontal vs. Vertical Scaling**: Horizontal scaling means that you scale by adding more servers to your pool of resources, whereas vertical scaling means that you scale by adding more power (CPU, RAM, storage, etc.) to an existing server.

With horizontal scaling it is often easier to scale dynamically by adding more machines to the existing pool; vertical scaling is usually limited to the capacity of a single server, and scaling beyond that capacity often involves downtime and comes with an upper limit.
### ***Reliability***
By definition, reliability is the probability that a system will not fail in a given period. In simple terms, a distributed system is considered reliable if it keeps delivering its services even when one or several of its software or hardware components fail. Reliability represents one of the main characteristics of any distributed system, since in such systems any failing machine can always be replaced by another healthy one, ensuring the completion of the requested task.

A reliable distributed system achieves this through redundancy of both the software components and the data. Obviously, redundancy has a cost, and a reliable system has to pay that cost to achieve such resilience by eliminating every single point of failure.
### ***Availability***
By definition, availability is the time a system remains operational to perform its required function in a specific period. It is a simple measure of the percentage of time that a system, service, or a machine remains operational under normal conditions. An aircraft that can be flown for many hours a month without much downtime can be said to have a high availability. Availability takes into account maintainability, repair time, spares availability, and other logistics considerations. If an aircraft is down for maintenance, it is considered not available during that time.
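
As a rough quantitative illustration (standard arithmetic, not from the source), availability targets are often quoted in "nines", and each extra nine shrinks the permitted downtime budget tenfold; 99.9%, for instance, allows roughly 8.76 hours of downtime per year:

```python
# Downtime allowed per year for common availability targets.
SECONDS_PER_YEAR = 365 * 24 * 3600

for target in (0.99, 0.999, 0.9999, 0.99999):
    downtime_hours = SECONDS_PER_YEAR * (1 - target) / 3600
    print(f"{target:.3%} available -> {downtime_hours:.2f} hours of downtime/year")
```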
Reliability is availability over time, considering the full range of possible real-world conditions that can occur. An aircraft that can make it through any possible weather safely is more reliable than one that has vulnerabilities to possible conditions.

**Reliability vs. Availability**

If a system is reliable, it is available. However, if it is available, it is not necessarily reliable. In other words, high reliability contributes to high availability, but it is possible to achieve high availability even with an unreliable product, by minimizing repair time and ensuring that spares are always available when they are needed.
### ***Efficiency***
Two standard measures of a distributed system's efficiency are the response time (or latency), which denotes the delay to obtain the first item, and the throughput (or bandwidth), which denotes the number of items delivered in a given time unit (e.g., a second). The two measures correspond to the following unit costs:

- Number of messages globally sent by the nodes of the system, regardless of the message size.
- Size of messages, representing the volume of data exchanges.

The complexity of operations supported by distributed data structures (e.g., searching for a specific key in a distributed index) can be characterized as a function of one of these cost units.
### ***Serviceability or Manageability***
Another important consideration while designing a distributed system is how easy it is to operate and maintain. Serviceability or manageability is the simplicity and speed with which a system can be repaired or maintained; if the time to fix a failed system increases, then availability will decrease. Things to consider for manageability are the ease of diagnosing and understanding problems when they occur, ease of making updates or modifications, and how simple the system is to operate.
## Reference
Grokking the System Design Interview by Educative.io
## Load Balancing
A load balancer (LB) is another critical component of any distributed system. It helps spread traffic across a cluster of servers to improve the responsiveness and availability of applications, websites, or databases. The LB also keeps track of the status of all the resources while distributing requests: if a server is not available to take new requests, is not responding, or has an elevated error rate, the LB will stop sending traffic to it.

Typically, a load balancer sits between the client and the server, accepting incoming network and application traffic and distributing it across multiple backend servers using various algorithms. By balancing application requests across multiple servers, a load balancer reduces individual server load and prevents any one application server from becoming a single point of failure, thus improving overall application availability and responsiveness.

![Load Balancer](https://miro.medium.com/max/1586/1*tEaZGz-p1-E2ytNjl5RPJg.jpeg)

To utilize full scalability and redundancy, we can try to balance the load at each layer of the system. We can add LBs at three places:

- Between the user and the web server
- Between web servers and an internal platform layer, like application servers or cache servers
- Between the internal platform layer and the database

![Load Balancing](../Resources/load_balancer.png)
### **Benefits of Load Balancing**
- Users experience faster, uninterrupted service. Users won’t have to wait for a single struggling server to finish its previous tasks. Instead, their requests are immediately passed on to a more readily available resource.

- Service providers experience less downtime and higher throughput. Even a full server failure won’t affect the end-user experience, as the load balancer will simply route around it to a healthy server.

- Load balancing makes it easier for system administrators to handle incoming requests while decreasing wait time for users.

- Smart load balancers provide benefits like predictive analytics that determine traffic bottlenecks before they happen. As a result, the smart load balancer gives an organization actionable insights. These are key to automation and can help drive business decisions.

- System administrators experience fewer failed or stressed components. Instead of a single device performing a lot of work, load balancing has several devices perform a little bit of work.
### **Load Balancing Algorithms**
***How does the load balancer choose the backend server?***

Load balancers consider two factors before forwarding a request to a backend server. They will first ensure that the server they choose is actually responding appropriately to requests, and then use a pre-configured algorithm to select one from the set of healthy servers. We will discuss these algorithms shortly.

***Health Checks*** — Load balancers should only forward traffic to “healthy” backend servers. To monitor the health of a backend server, “health checks” regularly attempt to connect to backend servers to ensure that they are listening. If a server fails a health check, it is automatically removed from the pool, and traffic will not be forwarded to it until it responds to the health checks again.
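
A health check can be as simple as confirming that a server accepts TCP connections; real load balancers typically also probe an application-level endpoint and re-add a server only after several consecutive successes. A minimal sketch (the addresses and port are illustrative assumptions):

```python
import socket

def is_healthy(host, port, timeout=2.0):
    """TCP-level health check: can we open a connection at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# The balancer forwards traffic only to servers that pass the check.
servers = [("10.0.0.1", 8080), ("10.0.0.2", 8080), ("10.0.0.3", 8080)]
healthy_pool = [s for s in servers if is_healthy(*s)]
```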
36+
37+
Algorithms:
38+
39+
- Least Connection Method — This method directs traffic to the server with the fewest active connections. This approach is quite useful when there are a large number of persistent client connections which are unevenly distributed between the servers.
40+
41+
- Least Response Time Method — This algorithm directs traffic to the server with the fewest active connections and the lowest average response time.
42+
43+
- Least Bandwidth Method - This method selects the server that is currently serving the least amount of traffic measured in megabits per second (Mbps).
44+
45+
- Round Robin Method — This method cycles through a list of servers and sends each new request to the next server. When it reaches the end of the list, it starts over at the beginning. It is most useful when the servers are of equal specifica tion and there are not many persistent connections.
46+
47+
- Weighted Round Robin Method — The weighted round-robin scheduling is designed to better handle servers with different processing capacities. Each server is assigned a weight (an integer value that indicates the processing capacity). Servers with higher weights receive new connections before those with less weights and servers with higher weights get more connections than those with less weights.
48+
49+
- IP Hash — Under this method, a hash of the IP address of the client is calculated to redirect the request to a server.
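
Two of these algorithms are simple enough to sketch in a few lines (the server list is an illustrative assumption):

```python
import hashlib
from itertools import cycle

servers = ["app1:8080", "app2:8080", "app3:8080"]

# Round Robin: hand each new request to the next server in the cycle.
round_robin = cycle(servers)

def pick_round_robin():
    return next(round_robin)

# IP Hash: hash the client's IP so the same client keeps landing on
# the same server (useful for session stickiness).
def pick_ip_hash(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```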
50+
51+
### **Redundant Load Balancers**
52+
53+
The load balancer can be a single point of failure; to overcome this, a second load balancer can be connected to the first to form a cluster. Each LB monitors the health of the other and, since both of them are equally capable of serving traffic and failure detection, in the event the main load balancer fails, the second load balancer takes over.
### Advanced Readings

[What is load balancing](https://avinetworks.com/what-is-load-balancing/)

[Introduction to architecting systems for scale](https://lethain.com/introduction-to-architecting-systems-for-scale/)
## Reference
Grokking the System Design Interview by Educative.io
