Closes DAR #535 - Adds Clustered reference/internals/durability/ #6365
Merged
6 commits
- ce76763 Closes DAR #535 - Adds Clustered reference/internals/durability/ (jstirnaman)
- 21568d3 Merge branch 'master' into fix-dar-535 (jstirnaman)
- 3aa8d7e fix(v3): DAR-535 resolve duplication (jstirnaman)
- 9ca2652 fix(v3): remove top-level TOC link, hide recovery in Clustered (jstirnaman)
- 88792ab Merge branch 'master' into fix-dar-535 (jstirnaman)
- fe72ad5 fix(v3): Apply code review suggestions (jstirnaman)
17 changes: 17 additions & 0 deletions
content/influxdb3/clustered/reference/internals/durability.md
```markdown
---
title: InfluxDB Clustered data durability
description: >
  Data written to {{% product-name %}} progresses through multiple stages to ensure durability, optimized performance and storage, and efficient querying. Configuration options at each stage affect system behavior, balancing reliability and resource usage.
weight: 102
menu:
  influxdb3_clustered:
    name: Data durability
    parent: InfluxDB internals
influxdb3/clustered/tags: [backups, internals]
related:
  - https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html, AWS S3 Data Durability
  - /influxdb3/clustered/reference/internals/storage-engine/
source: /shared/v3-distributed-internals-reference/durability.md
---

<!--// SOURCE - content/shared/v3-distributed-internals-reference/durability.md -->
```
92 changes: 92 additions & 0 deletions
content/shared/v3-distributed-internals-reference/durability.md
```markdown
## How data flows through {{% product-name %}}

When data is written to {{% product-name %}}, it progresses through multiple stages to ensure durability, optimized performance and storage, and efficient querying. Configuration options at each stage affect system behavior, balancing reliability and resource usage.

{{< svg "/static/svgs/v3-storage-architecture.svg" >}}

<span class="caption">Figure: Write request, response, and ingest flow for {{% product-name %}}</span>

- [Data ingest](#data-ingest)
- [Data storage](#data-storage)
- [Data deletion](#data-deletion)
- [Backups](#backups)
{{% hide-in "clustered" %}}- [Recovery](#recovery){{% /hide-in %}}

## Data ingest

1. [Write validation and memory buffer](#write-validation-and-memory-buffer)
2. [Write-ahead log (WAL) persistence](#write-ahead-log-wal-persistence)

### Write validation and memory buffer

The [Router](/influxdb3/version/reference/internals/storage-engine/#router) validates incoming data to prevent malformed or unsupported data from entering the system.
{{% product-name %}} writes accepted data to multiple write-ahead log (WAL) files on [Ingester](/influxdb3/version/reference/internals/storage-engine/#ingester) pods' local storage (two by default, for redundancy) before acknowledging the write request.
The Ingester holds the data in memory to ensure leading-edge data is available for querying.

### Write-ahead log (WAL) persistence

Ingesters persist the contents of
the WAL to Parquet files in object storage and update the [Catalog](/influxdb3/version/reference/internals/storage-engine/#catalog) to
reference the newly created Parquet files.
{{% product-name %}} retains WALs until the data is persisted.

If an Ingester node is gracefully shut down (for example, during a new software deployment), it flushes the contents of the WAL to Parquet files in object storage before shutting down.
```
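The ingest path above can be modeled in a few lines. This is a minimal Python sketch, not InfluxDB's implementation: `Ingester`, `WalFile`, the in-memory "object store", and the validation rule (a string `measurement` plus a `value`) are all hypothetical. It shows the ordering the section describes: validate, write every WAL replica, acknowledge, and truncate the WAL only after the data is persisted and referenced in the catalog.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class WalFile:
    """Hypothetical stand-in for one WAL file on an Ingester's local storage."""
    entries: List[dict] = field(default_factory=list)


@dataclass
class Ingester:
    wals: List[WalFile]  # one per replica; the page describes two by default
    buffer: List[dict] = field(default_factory=list)  # queryable leading-edge data
    object_store: Dict[str, Tuple[dict, ...]] = field(default_factory=dict)
    catalog: List[str] = field(default_factory=list)  # references to persisted files

    def write(self, point: dict) -> bool:
        """Validate, write to every WAL replica, buffer, then acknowledge."""
        if not isinstance(point.get("measurement"), str) or "value" not in point:
            return False  # rejected, like the Router refusing malformed data
        for wal in self.wals:
            wal.entries.append(point)  # every replica is written before the ack
        self.buffer.append(point)
        return True  # acknowledgement

    def persist(self) -> Optional[str]:
        """Write buffered (WAL-backed) data to an immutable file, then truncate WALs."""
        if not self.buffer:
            return None
        path = f"file-{len(self.object_store)}.parquet"
        self.object_store[path] = tuple(self.buffer)  # immutable snapshot
        self.catalog.append(path)  # the catalog now references the new file
        for wal in self.wals:
            wal.entries.clear()  # safe only after the data is persisted
        self.buffer.clear()
        return path

    def shutdown(self) -> None:
        """Graceful shutdown: flush WAL contents to object storage first."""
        self.persist()
```

For example, `Ingester(wals=[WalFile(), WalFile()])` models the two-replica default, and calling `shutdown()` mirrors the graceful-shutdown flush.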
```markdown
## Data storage

In {{< product-name >}}, all measurements are stored in
[Apache Parquet](https://parquet.apache.org/) files that represent a
point-in-time snapshot of the data. The Parquet files are immutable and are
never replaced or modified. Parquet files are stored in object storage and
referenced in the [Catalog](/influxdb3/version/reference/internals/storage-engine/#catalog), which InfluxDB uses to find the appropriate Parquet files for a particular set of data.

{{% hide-in "clustered" %}}
Parquet data files in object storage are redundantly stored on multiple devices
across a minimum of three availability zones in a cloud region.
{{% /hide-in %}}
```
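A toy model of this storage layout, assuming each catalog entry records a table name and the minimum and maximum timestamps in its file (an illustrative assumption; the page only says the Catalog is used to find the appropriate files). `ParquetFileRef` and `Catalog` are hypothetical names. The frozen dataclass reflects that files are immutable: new data produces new entries rather than edits to existing ones.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ParquetFileRef:
    """Hypothetical catalog entry for one immutable Parquet file."""
    path: str
    table: str
    min_time: int  # earliest timestamp in the file
    max_time: int  # latest timestamp in the file


class Catalog:
    def __init__(self) -> None:
        self._files: Dict[str, List[ParquetFileRef]] = {}

    def add(self, ref: ParquetFileRef) -> None:
        # Entries are only appended; an existing file is never rewritten in place.
        self._files.setdefault(ref.table, []).append(ref)

    def files_for(self, table: str, start: int, end: int) -> List[str]:
        """Return paths of files whose time range overlaps [start, end]."""
        return [
            f.path
            for f in self._files.get(table, [])
            if f.min_time <= end and f.max_time >= start
        ]
```

A query then reads only the files the catalog returns, instead of scanning all of object storage.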
```markdown
## Data deletion

When data is deleted or expires (reaches the database's [retention period](/influxdb3/version/reference/internals/data-retention/#database-retention-period)), InfluxDB performs the following steps:

1. Marks the associated Parquet files as deleted in the catalog.
2. Filters out data marked for deletion from all queries.
{{% hide-in "clustered" %}}3. Retains Parquet files marked for deletion in object storage for approximately 30 days after the youngest data in the file ages out of retention.{{% /hide-in %}}
```
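The two deletion steps can be sketched as a catalog flag plus a query-time filter. This minimal Python sketch models only the retention case, at whole-file granularity; `FileEntry`, `apply_retention`, and `visible_files` are hypothetical. Marking a file never touches its contents, which matches the immutability described under Data storage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileEntry:
    """Hypothetical catalog row for one Parquet file."""
    path: str
    max_time: int  # youngest timestamp in the file, in seconds
    deleted_at: Optional[int] = None  # set when the file is marked deleted


def apply_retention(entries: List[FileEntry], retention_s: int, now: int) -> None:
    """Step 1: mark files whose youngest data has aged out of retention."""
    for e in entries:
        if e.deleted_at is None and e.max_time < now - retention_s:
            e.deleted_at = now  # only a catalog flag; the file is untouched


def visible_files(entries: List[FileEntry]) -> List[str]:
    """Step 2: queries skip anything marked for deletion."""
    return [e.path for e in entries if e.deleted_at is None]
```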
```markdown
## Backups

{{< product-name >}} implements the following data backup strategies:

- **Backup of WAL file**: The WAL file is written on locally attached storage.
If an ingester process fails, a new ingester reads the WAL file on
startup and continues normal operation. WAL files are maintained until their
contents have been written to Parquet files in object storage.
For added protection, ingesters can be configured for write replication, where
each measurement is written to two different WAL files before acknowledging
the write.

- **Backup of Parquet files**: Parquet files are stored in object storage.{{% hide-in "clustered" %}} They
are redundantly stored on multiple devices across a minimum of three
availability zones in a cloud region. Parquet files associated with each
database are kept in object storage for the duration of the database retention period
plus an additional time period (approximately 30 days).{{% /hide-in %}}

- **Backup of catalog**: InfluxData keeps a transaction log of all recent updates
to the [InfluxDB catalog](/influxdb3/version/reference/internals/storage-engine/#catalog) and generates a daily backup of
the catalog. {{% hide-in "clustered" %}}Backups are preserved for at least 30 days in object storage across a minimum of three availability zones.{{% /hide-in %}}
```
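The WAL backup strategy relies on two properties: every replica is durable before the write is acknowledged, and a replacement ingester can rebuild its state by reading the WAL. The sketch below assumes a JSON Lines record format and plain local files, both hypothetical; the actual WAL format is not described on this page, and `append_to_wals` and `replay_wal` are illustrative names.

```python
import json
import os
import tempfile
from typing import Dict, List


def append_to_wals(paths: List[str], record: Dict) -> None:
    """Append one record to every WAL replica, fsyncing each before returning."""
    line = json.dumps(record) + "\n"
    for path in paths:
        with open(path, "a", encoding="utf-8") as f:
            f.write(line)
            f.flush()
            os.fsync(f.fileno())  # durable on local disk before the write is acked


def replay_wal(path: str) -> List[Dict]:
    """Rebuild an ingester's in-memory buffer from a WAL file on startup."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    wal_dir = tempfile.mkdtemp()
    replicas = [os.path.join(wal_dir, "wal-0.jsonl"), os.path.join(wal_dir, "wal-1.jsonl")]
    append_to_wals(replicas, {"measurement": "cpu", "value": 0.5})
    # Simulate a failed ingester: a replacement replays either surviving replica.
    print(replay_wal(replicas[1]))
```

Because each replica holds the same records, losing one copy of the local storage still leaves a complete WAL to replay.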
```markdown
{{% hide-in "clustered" %}}
## Recovery

InfluxData can perform the following recovery operations:

- **Recovery after ingester failure**: If an ingester fails, a new ingester
starts and reads the WAL file to recover recently ingested data.

- **Recovery of Parquet files**: {{< product-name >}} relies on the data
durability provided by object storage to recover Parquet files.

- **Recovery of the catalog**: InfluxData can restore the [Catalog](/influxdb3/version/reference/internals/storage-engine/#catalog) to
the most recent daily backup and then reapply any transactions
that occurred between that backup and the interruption.
{{% /hide-in %}}
```
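Catalog recovery, as described above, is a snapshot-plus-log replay. This minimal Python sketch assumes each transaction carries a monotonically increasing sequence number and that the backup records the last sequence it includes; `Txn`, `Snapshot`, and `restore_catalog` are hypothetical and do not reflect InfluxData's actual catalog schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Txn:
    seq: int  # monotonically increasing transaction number (assumed)
    op: str  # "add" or "delete"
    path: str  # Parquet file the transaction refers to


@dataclass
class Snapshot:
    last_seq: int  # last transaction included in the daily backup
    files: Dict[str, bool]  # path -> True if live, False if marked deleted


def apply(files: Dict[str, bool], txn: Txn) -> None:
    if txn.op == "add":
        files[txn.path] = True
    elif txn.op == "delete":
        files[txn.path] = False
    else:
        raise ValueError(f"unknown op: {txn.op}")


def restore_catalog(backup: Snapshot, log: List[Txn]) -> Dict[str, bool]:
    """Start from the most recent backup, then replay newer transactions in order."""
    files = dict(backup.files)  # copy so the backup itself is never mutated
    for txn in sorted(log, key=lambda t: t.seq):
        if txn.seq > backup.last_seq:  # skip changes the backup already contains
            apply(files, txn)
    return files
```

Filtering on `last_seq` means the whole recent log can be replayed without reapplying changes already captured in the backup.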