---
icon: lucide/brain
---

# Motivation

## The AWS S3 Durability Model

AWS S3 is renowned for providing 11 nines (99.999999999%) of durability. This guarantee is achieved through a robust
architecture that maintains **at least 3 copies of your data across different Availability Zones (AZs)** within a
region. Each AZ represents one or more physically separate data centers with independent power, cooling, and
networking.

```mermaid
graph TB
    User[User]

    subgraph Region["AWS Region"]
        Endpoint[Regional Endpoint]

        subgraph AZ1["Availability Zone 1"]
            DC1A[Data Center 1A]
            DC1B[Data Center 1B]
        end

        subgraph AZ2["Availability Zone 2"]
            DC2A[Data Center 2A]
            DC2B[Data Center 2B]
        end

        subgraph AZ3["Availability Zone 3"]
            DC3A[Data Center 3A]
            DC3B[Data Center 3B]
        end
    end

    User -->|Request| Endpoint
    Endpoint -->|Write Data| AZ1
    Endpoint -->|Write Data| AZ2
    Endpoint -->|Write Data| AZ3
```

This architecture ensures that if an entire data center experiences a catastrophic failure, your data remains safe and
accessible. For even greater protection, AWS also offers **cross-region replication**, allowing data to be replicated
across geographically distant regions.

## The Budget Provider Model

In contrast, budget-friendly S3-compatible providers like Backblaze B2 typically achieve durability through **erasure
coding within a single data center** rather than replicating complete copies across multiple physical locations.

```mermaid
graph TB
    User[User]

    subgraph DC["Single Data Center"]
        Endpoint[Storage Endpoint]

        subgraph Vault["Backblaze Vault (20 Storage Pods)"]
            Pod1[Pod 1<br/>Shard 1]
            Pod2[Pod 2<br/>Shard 2]
            Pod3[Pod 3<br/>Shard 3]
            PodDots[...]
            Pod17[Pod 17<br/>Shard 17]
            Pod18[Pod 18<br/>Parity 1]
            Pod19[Pod 19<br/>Parity 2]
            Pod20[Pod 20<br/>Parity 3]
        end
    end

    User -->|Request| Endpoint
    Endpoint -->|17 Data Shards| Pod1
    Endpoint -->|+| Pod2
    Endpoint -->|+| Pod3
    Endpoint -->|+| PodDots
    Endpoint -->|+| Pod17
    Endpoint -->|3 Parity Shards| Pod18
    Endpoint -->|+| Pod19
    Endpoint -->|+| Pod20
```

For example, Backblaze's architecture uses **Reed-Solomon erasure coding** (17 data shards + 3 parity shards) to
achieve 11 nines of durability[^3]. This means your file is split into 17 pieces, with 3 additional parity pieces
calculated from the original data. The file can be reconstructed from any 17 of the 20 shards, allowing the system to
tolerate up to 3 simultaneous drive or pod failures.
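
The durability that "any 17 of 20 shards" buys can be estimated with a short binomial-tail calculation: data is lost
only if 4 or more shards fail at once. The per-shard failure probability below is an invented illustrative number, not
a published Backblaze figure, and the sketch assumes shard failures are independent:

```python
from math import comb

def loss_probability(n: int, parity: int, p: float) -> float:
    """Probability that more than `parity` of `n` shards fail at once,
    assuming each shard fails independently with probability `p`."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(parity + 1, n + 1))

# Illustrative assumption: a 1% chance that a given shard is lost
# before it can be repaired (NOT a real provider statistic).
p_loss = loss_probability(n=20, parity=3, p=0.01)
print(f"P(unrecoverable loss) ~ {p_loss:.2e}")
```

Fast repair shrinks the effective `p` dramatically, which is how erasure-coded systems reach many nines despite using
ordinary drives.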

While this provides excellent protection against individual hardware failures, all shards exist within a **single
physical location**. If the entire data center experiences a catastrophic event, all 20 shards could be lost
simultaneously.

## The Cost vs. Durability Trade-off

While AWS S3 provides exceptional durability, it comes at a premium price. Many S3-compatible storage providers have
emerged offering significantly cheaper alternatives:

- [**Backblaze B2**](https://www.backblaze.com/cloud-storage)
- [**Cloudflare R2**](https://www.cloudflare.com/developer-platform/products/r2/)
- [**Hetzner Object Storage**](https://www.hetzner.com/storage/object-storage/)
- [**OVH Object Storage**](https://www.ovhcloud.com/en-ie/public-cloud/object-storage/)
- [**MinIO**](https://www.min.io/) (allows self-hosting)
- And many others

These providers are often **considerably more affordable** than AWS S3. However, these savings come with a trade-off:
**reduced protection against data center-level failures**.

### Single-Location Storage

As shown in the Backblaze example above, budget-friendly S3-compatible providers typically use **erasure coding or
RAID within a single data center** rather than maintaining complete copies across multiple physical locations. While
this provides excellent protection against individual hardware failures, all data remains in one geographic location.

### What It Takes to Lose Data

A **catastrophic failure** means damage severe enough that the stored object data cannot be reconstructed. The
difference in disaster resilience becomes clear when comparing what must fail for permanent data loss to occur:

- **Single Data Center**: if that one data center suffers a catastrophic failure, your data is gone
- **Multi-AZ Architecture**: catastrophic failures across **at least 3 different data centers** (affecting all 3 AZs)
  are required before data loss occurs

```mermaid
graph TB
    subgraph SingleDC["Single Data Center Model"]
        DC1["❌ Data Center<br/>(Catastrophic Failure)"]
        style DC1 fill:#ff6b6b,stroke:#c92a2a,stroke-width:4px,color:#fff
        Note1["Data cannot be reconstructed<br/>from anywhere else"]
        style Note1 fill:#fff,stroke:#c92a2a,stroke-width:2px
        DC1 -.-> Note1
    end

    subgraph MultiAZ["Multi-AZ Model"]
        subgraph AZ1M["Availability Zone 1"]
            DC1A["❌ DC 1A<br/>(Catastrophic)"]
            DC1B["DC 1B"]
            style DC1A fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px,color:#fff
            style DC1B fill:#51cf66,stroke:#2f9e44,stroke-width:1px,color:#000
        end
        style AZ1M fill:#ffe0e0,stroke:#c92a2a,stroke-width:2px

        subgraph AZ2M["Availability Zone 2"]
            DC2A["❌ DC 2A<br/>(Catastrophic)"]
            DC2B["DC 2B"]
            style DC2A fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px,color:#fff
            style DC2B fill:#51cf66,stroke:#2f9e44,stroke-width:1px,color:#000
        end
        style AZ2M fill:#ffe0e0,stroke:#c92a2a,stroke-width:2px

        subgraph AZ3M["Availability Zone 3"]
            DC3A["❌ DC 3A<br/>(Catastrophic)"]
            DC3B["DC 3B"]
            style DC3A fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px,color:#fff
            style DC3B fill:#51cf66,stroke:#2f9e44,stroke-width:1px,color:#000
        end
        style AZ3M fill:#ffe0e0,stroke:#c92a2a,stroke-width:2px

        Note2["Catastrophic failures in at least<br/>3 different DCs (one per AZ)<br/>= Data cannot be reconstructed"]
        style Note2 fill:#fff,stroke:#c92a2a,stroke-width:2px
        DC1A -.-> Note2
        DC2A -.-> Note2
        DC3A -.-> Note2
    end
```

With **single-location storage**, a catastrophic failure of one data center means total data loss: there is nowhere
else to reconstruct from. With **multi-AZ architecture**, your data remains safe even if an entire AZ is destroyed;
only the highly unlikely scenario of simultaneous catastrophic failures across at least 3 different data centers in
geographically separated locations would make it unrecoverable.
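
The gap between the two models can be made concrete with a rough back-of-the-envelope calculation: if one facility is
catastrophically lost with probability `p` in a given year, then `N` independently failing facilities are all lost with
probability roughly `p` to the power `N`. The 0.1% figure below is an illustrative assumption, not a provider
statistic, and real sites are never perfectly independent, so treat this as an upper bound on the benefit:

```python
# Rough sketch: annual loss probability with N independent full copies.
p_site_loss = 0.001  # assumed chance of losing one facility in a year (illustrative)

for copies in (1, 2, 3):
    p_all_lost = p_site_loss**copies
    print(f"{copies} site(s): P(all copies lost) = {p_all_lost:.0e}")
```

Even with pessimistic per-site numbers, each additional independent location multiplies the nines rather than adding
to them.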

!!! danger "Risk of Data Loss"
    If the data center hosting your data experiences a catastrophic failure (fire, flood, power loss, etc.), you could
    face **permanent data loss**. Unlike AWS S3's multi-AZ architecture, there are no additional copies in separate
    physical locations to fall back on.

    This is not a theoretical risk: in March 2021, a fire at an OVH data center in Strasbourg destroyed servers and
    resulted in permanent data loss for customers who did not have off-site backups[^1] [^2].

## Limitations of Native Replication

Some S3-compatible providers do offer native replication features. For example, **Backblaze B2** provides bucket
replication[^4]. However, these solutions have significant limitations:

### Async-Only Replication

Native replication is typically **asynchronous**, meaning there is a delay between when data is written to the primary
location and when it appears in replicas, which may be up to several hours[^4]. During this window, you are vulnerable
to data loss if the primary fails.

### Single-Cloud Restriction

Native replication features are **confined to the same cloud provider**. For example, Backblaze can only replicate to
other Backblaze buckets[^4]. You cannot replicate from Backblaze to MinIO, or from Hetzner to OVH.

### No Cross-Cloud Disaster Recovery

If you want to protect against a provider-level failure (e.g., the provider goes out of business, a widespread service
outage, or compliance issues), native replication cannot help you, because all copies remain with the same vendor.

## The Need for Manual Replication

To achieve AWS-like durability with budget storage providers, you need to **manually implement replication as a backup
strategy**. This increases your effective durability by maintaining copies across multiple independent storage
locations or providers.

### Option 1: Dual Writes in Your Application

You can implement replication directly in your application code:

```python
import boto3

# Each client points at a different S3-compatible provider
# (the endpoint URLs are placeholders).
s3_primary = boto3.client("s3", endpoint_url="https://s3.primary-provider.example")
s3_backup = boto3.client("s3", endpoint_url="https://s3.backup-provider.example")

def upload_file(file: bytes, key: str) -> None:
    s3_primary.put_object(Bucket="primary", Key=key, Body=file)
    s3_backup.put_object(Bucket="backup", Key=key, Body=file)
```

**Drawbacks:**

- Requires modifying application code
- Must be implemented consistently across all applications
- Increases application complexity
- Difficult to change replication strategies
- Error handling becomes complicated
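
To illustrate the last drawback, here is a minimal sketch of what careful dual-writing already demands. The storage
interface is abstracted into a hypothetical `put`/`delete` pair (real code would use boto3 clients), and the tiny
in-memory stores exist only to make the example self-contained:

```python
def dual_write(primary, backup, key: str, body: bytes) -> None:
    """Write to both backends; roll back the primary if the backup write fails,
    so the two stores never silently diverge. Illustrative sketch only."""
    primary.put(key, body)
    try:
        backup.put(key, body)
    except Exception:
        # Best-effort rollback; this can itself fail, which is exactly
        # why dual-write error handling gets complicated.
        primary.delete(key)
        raise

class MemoryStore:
    """Tiny in-memory stand-in for an S3 client, for demonstration only."""
    def __init__(self, fail: bool = False):
        self.objects, self.fail = {}, fail
    def put(self, key, body):
        if self.fail:
            raise IOError("backend unavailable")
        self.objects[key] = body
    def delete(self, key):
        self.objects.pop(key, None)

primary, backup = MemoryStore(), MemoryStore(fail=True)
try:
    dual_write(primary, backup, "report.csv", b"data")
except IOError:
    pass
print(primary.objects)  # rolled back to {} because the backup write failed
```

Retries, partial-rollback failures, and concurrent writers all add further cases that every application would have to
handle identically.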

### Option 2: Use a Transparent Proxy (ReplicaT4)

ReplicaT4 acts as a proxy layer between your application and storage backends:

```python
# No application code changes needed: only the endpoint URL differs.
s3_client = boto3.client("s3", endpoint_url="http://replicat4:3000")
s3_client.put_object(Bucket="my-bucket", Key=key, Body=file)
# ReplicaT4 automatically replicates to all configured backends
```

**Benefits:**

- **Zero application code changes**: your apps continue using standard S3 APIs
- **Centralized replication logic**: change strategies without touching application code
- **Consistent replication** across all applications automatically
- **Flexible consistency models**: choose between async (fast) and sync (consistent) replication
- **Mix and match providers**: combine different storage backends seamlessly

## Why ReplicaT4?

ReplicaT4 solves these challenges by providing:

- **Provider-agnostic replication**: works with any S3-compatible storage
- **Cross-cloud capability**: replicate across different providers (Backblaze → MinIO → Hetzner)
- **Flexible consistency models**: choose async for speed or sync for strong consistency
- **Application transparency**: no code changes required
- **Unified control**: manage all replication from a single configuration

Whether you're using budget providers to reduce costs or implementing a defense-in-depth strategy against vendor
lock-in, ReplicaT4 enables you to achieve the durability you need without sacrificing flexibility or breaking the bank.


[^1]: [Reddit Discussion: Did OVH customers lose data that shouldn't have been lost?](https://www.reddit.com/r/webhosting/comments/m8e5so/eli5_did_ovh_customers_lose_data_that_shouldnt/)
[^2]: [Techzine: OVH shares overview of data lost in fire](https://www.techzine.eu/news/infrastructure/57005/ovh-share-overview-of-data-lost-in-fire/)
[^3]: [Backblaze Vaults: Zettabyte-Scale Cloud Storage Architecture](https://www.backblaze.com/blog/vault-cloud-storage-architecture/)
[^4]: [Backblaze B2 Cloud Replication](https://www.backblaze.com/docs/cloud-storage-cloud-replication)