| 1 | +# Preface |
| 2 | + |
| 3 | +## Preface |
| 4 | +Driving forces in database developments and distributed systems: |
| 5 | +- Companies are handling massive amounts of traffic and data, so they must engineer tools to handle it efficiently |
| 6 | +- To respond quickly to market insights, businesses must be able to test and iterate on hypotheses cheaply; thus, development cycles are short and data models must be flexible enough for this dynamic environment |
| 7 | +- Preference for (F)OSS is growing, as is its dependability |
| 8 | +- CPU clock speeds are barely increasing, but multi-core processors are standard and networks are getting faster |
| 9 | +- Cloud infrastructure allows for distributed systems that span many machines and geographic regions |
| 10 | +- Extended downtime is increasingly unacceptable |
| 11 | + |
| 12 | +Data-Intensive applications: apps where data is the primary challenge, e.g. data quantity, complexity, and changes |
| 13 | + |
| 14 | +Compute-Intensive applications: apps where CPU cycles are the primary challenge |
| 15 | + |
| 16 | +Goals: |
| 17 | +- Understand successful data systems by examining their algorithms, principles, and trade-offs |
| 18 | +- Be able to architect applications by combining the most appropriate tools after examining their trade-offs |
| 19 | + |
| 20 | +# Part 1: Foundations of Data Systems |
| 21 | +- Chapter 1: Examines reliability, scalability, and maintainability and methods to achieve these goals |
| 22 | +- Chapter 2: Compares data models and query languages and discusses situations where each is most appropriate |
| 23 | +- Chapter 3: Examines how storage engines are optimized for different workloads by reviewing their internals and how databases lay out data on disk |
| 24 | +- Chapter 4: Compares serialization formats and examines how they fare in scaling environments |
| 25 | + |
| 26 | +## 1. Reliable, Scalable, and Maintainable Applications |
| 27 | + |
| 28 | +Common Application Needs: |
| 29 | +- databases: store data for finding/reading later |
| 30 | +- caches: remember the result of an operation, typically an expensive one, so that later reads return quickly without repeating the work or querying the database again |
| 31 | +- search indexes: allow searching by keyword or filtering in other ways |
| 32 | +- stream processing: deals with data in motion, sending data to another process, handled asynchronously |
| 33 | +- batch processing: collects and processes data in batches |
| 34 | + |
| 35 | +### Thinking About Data Systems |
| 36 | +Data Systems is an umbrella term used for optimized tools for data storage and processing. |
| 37 | +Different tools, each performing a single task efficiently, are stitched together using application code because of the growing requirements from modern applications. |
| 38 | +For example, if you have "an application-managed caching layer (using Memcached or similar), or a full-text search server (such as Elasticsearch or Solr) separate from your main database, it is normally the application code’s responsibility to keep those caches and indexes in sync with the main database." |
| 39 | + |
| 40 | +The above data system stitches together different tools using application code. |
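As a rough illustration of application code keeping a cache in sync with the main database, here is a minimal cache-aside sketch. The `UserService` class, its method names, and the in-memory dictionaries standing in for the database and for Memcached are all assumptions made for this example; they are not taken from DDIA.

```python
class UserService:
    """Hypothetical application code that owns both the database and the cache."""

    def __init__(self):
        self.db = {}       # stand-in for the main database
        self.cache = {}    # stand-in for Memcached or similar
        self.db_reads = 0  # counts how often the database is actually read

    def get_user(self, user_id):
        # Read path: try the cache first, fall back to the database on a miss.
        if user_id in self.cache:
            return self.cache[user_id]
        self.db_reads += 1
        value = self.db.get(user_id)
        if value is not None:
            self.cache[user_id] = value  # populate the cache for later reads
        return value

    def update_user(self, user_id, value):
        # Write path: update the database, then invalidate the cached copy
        # so the next read cannot return stale data.
        self.db[user_id] = value
        self.cache.pop(user_id, None)


svc = UserService()
svc.update_user(1, "alice")
print(svc.get_user(1), svc.get_user(1), svc.db_reads)
```

Invalidating on write, rather than updating the cached value in place, is one common choice; a real deployment would also have to deal with concurrent writers and cache expiry, which this sketch ignores.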
| 41 | + |
| 42 | +**Questions when designing a data system:** |
| 43 | +- How do you ensure that the data remains correct and complete, even when things go wrong internally? |
| 44 | +- How do you provide consistently good performance to clients, even when parts of your system are degraded? |
| 45 | +- How do you scale to handle an increase in load? |
| 46 | +- What does a good API for the service look like? |
| 47 | + |
| 48 | +**Concerns in Most Software Systems** |
| 49 | +- **Reliability:** |
| 50 | +The system should continue to work correctly (performing the correct function at the desired level of performance) even in the face of adversity (hardware or software faults, and even human error). |
| 51 | +- **Scalability:** |
| 52 | +As the system grows (in data volume, traffic volume, or complexity), there should be reasonable ways of dealing with that growth. |
| 53 | +- **Maintainability:** |
| 54 | +Over time, many different people will work on the system (engineering and operations, both maintaining current behavior and adapting the system to new use cases), and they should all be able to work on it productively. |
| 55 | +### Reliability |
| 56 | +**Software Reliability Principles** |
| 57 | +- the application performs the function the user expects |
| 58 | +- it can tolerate user mistakes and unexpected usage |
| 59 | +- it performs well enough under the expected load and data volume |
| 60 | +- it prevents abuse and unauthorized use |
| 61 | + |
| 62 | +Systems should be resilient or fault tolerant, i.e. they should anticipate faults and continue to function when those faults occur |
| 63 | + |
| 64 | +Fault vs. failure: |
| 65 | +- fault: one component of the system deviating from its spec |
| 66 | +- failure: the system as a whole stops providing its service or function to users |
| 67 | + |
| 68 | + |
| 69 | +#### Hardware Faults |
| 70 | +Typical Faults: Hard disks crash, RAM becomes faulty, the power grid has a blackout, someone unplugs the wrong network cable |
| 71 | + |
| 72 | +Redundancy is typically the first line of defense. In cloud deployments, where work is spread across many machine instances, the system can additionally be designed in software to tolerate the loss of individual machines. |
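To make the fault/failure distinction concrete, the following sketch shows redundancy absorbing a fault in one component so that it does not become a failure of the whole read operation. The names `make_replica`, `read_with_failover`, and `ReplicaDown` are hypothetical and exist only for this illustration.

```python
class ReplicaDown(Exception):
    """Raised by a replica that has suffered a fault."""


def make_replica(value, healthy=True):
    # Build a hypothetical replica: a function that returns its copy of the
    # data, or raises ReplicaDown if the replica is faulty.
    def read():
        if not healthy:
            raise ReplicaDown("replica unavailable")
        return value
    return read


def read_with_failover(replicas):
    # Try each redundant replica in turn. A fault in one replica is tolerated;
    # only when every replica is down does the operation fail.
    faults = 0
    for read in replicas:
        try:
            return read(), faults
        except ReplicaDown:
            faults += 1
    raise RuntimeError("all replicas failed")


replicas = [make_replica("row-42", healthy=False), make_replica("row-42")]
print(read_with_failover(replicas))
```

Here the first replica's fault is counted but hidden from the caller. The `RuntimeError` path corresponds to a failure in the sense defined above.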
| 73 | +#### Software Errors |
| 74 | +**Systematic Errors:** Harder to anticipate than hardware faults and, because they are correlated across nodes, they tend to cause many more system failures. E.g. "A software bug that causes every instance of an application server to crash when given a particular bad input. For example, consider the leap second on June 30, 2012, that caused many applications to hang simultaneously due to a bug in the Linux kernel." |
| 75 | +**Mitigations:** "carefully thinking about assumptions and interactions in the system; thorough testing; process isolation; allowing processes to crash and restart; measuring, monitoring, and analyzing system behavior in production." |
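One of the mitigations listed above, "allowing processes to crash and restart", could be sketched roughly as follows. This is an in-process simplification that assumes a hypothetical `supervise` helper; real systems usually rely on an external supervisor and separate OS processes.

```python
def supervise(worker, max_restarts=3):
    # Run `worker`; if it crashes (raises), restart it, up to `max_restarts`
    # times. Returns the worker's result and the number of restarts needed.
    restarts = 0
    while True:
        try:
            return worker(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise  # give up: the fault has become a failure
            restarts += 1


def make_flaky_worker(crashes_before_success):
    # Hypothetical worker that crashes a fixed number of times, then succeeds.
    state = {"calls": 0}

    def worker():
        state["calls"] += 1
        if state["calls"] <= crashes_before_success:
            raise RuntimeError("simulated crash")
        return "done"

    return worker


print(supervise(make_flaky_worker(2)))
```

The restart limit is a design choice: restarting forever can mask a systematic bug, whereas a bounded number of restarts surfaces it to monitoring.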
| 76 | + |
| 77 | +#### Human Errors |
| 78 | +Tips for building reliable systems that tolerate human error, from DDIA: |
| 79 | +- Design abstractions, APIs, and admin interfaces so that they make it easy to do “the right thing” and discourage “the wrong thing.” |
| 80 | +- Provide fully featured non-production sandbox environments where people can explore and experiment safely, using real data, without affecting real users |
| 81 | +- Test thoroughly at all levels, from unit tests to whole-system integration tests and manual tests |
| 82 | +- Make it fast to roll back configuration changes, roll out new code gradually, and provide tools to recompute data |
| 83 | + |
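The advice to roll out new code gradually can be illustrated with a percentage-based rollout check. This is only a sketch: the `in_rollout` function and the idea of bucketing users by a hash of their ID are assumptions for the example, not something the notes or DDIA prescribe.

```python
import hashlib


def in_rollout(user_id: str, percent: int) -> bool:
    # Map each user to a stable bucket in [0, 100) by hashing the user ID,
    # then enable the new code only for buckets below the rollout percentage.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


users = [f"user-{i}" for i in range(1000)]
enabled = sum(in_rollout(u, 10) for u in users)
print(f"{enabled} of {len(users)} users see the new code at 10%")
```

Because the bucket depends only on the user ID, raising the percentage only adds users, and lowering it back toward zero acts as a fast rollback for the affected users.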
| 84 | +#### How Important Is Reliability? |
| 85 | +### Scalability |
| 86 | +#### Describing Load |
| 87 | +#### Describing Performance |
| 88 | +#### Approaches for Coping with Load |
| 89 | +### Maintainability |
| 90 | +#### Operability: Making Life Easy for Operations |
| 91 | +#### Simplicity: Managing Complexity |
| 92 | +#### Evolvability: Making Change Easy |
| 93 | +### Summary |
| 94 | + |
| 95 | +## 2. Data Models and Query Languages |
| 96 | + |
| 97 | + |
| 98 | + |
| 99 | + |