Commit ce76763

Closes DAR #535

- Adds Clustered reference/internals/durability/
- Migrates Cloud Dedicated durability page to shared for Dedicated and Clustered.
- Adds diagram (also used in storage-engine) to illustrate data flow.
- Fixes typo in Serverless
1 parent e8ccbe2 commit ce76763

4 files changed

+120
-70
lines changed

Lines changed: 4 additions & 69 deletions
Original file line number | Diff line number | Diff line change
@@ -1,7 +1,8 @@
11
---
22
title: InfluxDB Cloud Dedicated data durability
33
description: >
4-
InfluxDB Cloud Dedicated replicates all time series data in the storage tier across
4+
Data written to {{% product-name %}} progresses through multiple stages to ensure durability, optimized performance and storage, and efficient querying. Configuration options at each stage affect system behavior, balancing reliability and resource usage.
5+
{{% product-name %}} replicates all time series data in the storage tier across
56
multiple availability zones within a cloud region and automatically creates backups
67
that can be used to restore data in the event of a node failure or data corruption.
78
weight: 102
@@ -13,73 +14,7 @@ influxdb3/cloud-dedicated/tags: [backups, internals]
1314
related:
1415
- https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html, AWS S3 Data Durability
1516
- /influxdb3/cloud-dedicated/reference/internals/storage-engine/
17+
source: /shared/v3-distributed-internals-reference/durability.md
1618
---
1719

18-
{{< product-name >}} writes data to multiple Write-Ahead-Log (WAL) files on local
19-
storage and retains WALs until the data is persisted to Parquet files in object storage.
20-
Parquet data files in object storage are redundantly stored on multiple devices
21-
across a minimum of three availability zones in a cloud region.
22-
23-
## Data storage
24-
25-
In {{< product-name >}}, all measurements are stored in
26-
[Apache Parquet](https://parquet.apache.org/) files that represent a
27-
point-in-time snapshot of the data. The Parquet files are immutable and are
28-
never replaced nor modified. Parquet files are stored in object storage and
29-
referenced in the [Catalog](/influxdb3/cloud-dedicated/reference/internals/storage-engine/#catalog), which InfluxDB uses to find the appropriate Parquet files for a particular set of data.
30-
31-
### Data deletion
32-
33-
When data is deleted or expires (reaches the database's [retention period](/influxdb3/cloud-dedicated/reference/internals/data-retention/#database-retention-period)), InfluxDB performs the following steps:
34-
35-
1. Marks the associated Parquet files as deleted in the catalog.
36-
2. Filters out data marked for deletion from all queries.
37-
3. Retains Parquet files marked for deletion in object storage for approximately 30 days after the youngest data in the file ages out of retention.
38-
39-
## Data ingest
40-
41-
When data is written to {{< product-name >}}, InfluxDB first writes the data to a
42-
Write-Ahead-Log (WAL) on locally attached storage on the [Ingester](/influxdb3/cloud-dedicated/reference/internals/storage-engine/#ingester) node before
43-
acknowledging the write request. After acknowledging the write request, the
44-
Ingester holds the data in memory temporarily and then writes the contents of
45-
the WAL to Parquet files in object storage and updates the [Catalog](/influxdb3/cloud-dedicated/reference/internals/storage-engine/#catalog) to
46-
reference the newly created Parquet files. If an Ingester node is gracefully shut
47-
down (for example, during a new software deployment), it flushes the contents of
48-
the WAL to the Parquet files before shutting down.
49-
50-
## Backups
51-
52-
{{< product-name >}} implements the following data backup strategies:
53-
54-
- **Backup of WAL file**: The WAL file is written on locally attached storage.
55-
If an ingester process fails, the new ingester simply reads the WAL file on
56-
startup and continues normal operation. WAL files are maintained until their
57-
contents have been written to the Parquet files in object storage.
58-
For added protection, ingesters can be configured for write replication, where
59-
each measurement is written to two different WAL files before acknowledging
60-
the write.
61-
62-
- **Backup of Parquet files**: Parquet files are stored in object storage where
63-
they are redundantly stored on multiple devices across a minimum of three
64-
availability zones in a cloud region. Parquet files associated with each
65-
database are kept in object storage for the duration of database retention period
66-
plus an additional time period (approximately 30 days).
67-
68-
- **Backup of catalog**: InfluxData keeps a transaction log of all recent updates
69-
to the [InfluxDB catalog](/influxdb3/cloud-dedicated/reference/internals/storage-engine/#catalog) and generates a daily backup of
70-
the catalog. Backups are preserved for at least 30 days in object storage across a minimum
71-
of three availability zones.
72-
73-
## Recovery
74-
75-
InfluxData can perform the following recovery operations:
76-
77-
- **Recovery after ingester failure**: If an ingester fails, a new ingester is
78-
started up and reads from the WAL file for the recently ingested data.
79-
80-
- **Recovery of Parquet files**: {{< product-name >}} uses the provided object
81-
storage data durability to recover Parquet files.
82-
83-
- **Recovery of the catalog**: InfluxData can restore the [Catalog](/influxdb3/cloud-dedicated/reference/internals/storage-engine/#catalog) to
84-
the most recent daily backup and then reapply any transactions
85-
that occurred since the interruption.
20+
<!--// SOURCE - content/shared/v3-distributed-internals-reference/durability.md -->

content/influxdb3/cloud-serverless/reference/internals/durability.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ point-in-time snapshot of the data. The Parquet files are immutable and are
2727
never replaced nor modified. Parquet files are stored in object storage.
2828

2929
<span id="influxdb-catalog"></span>
30-
The _InfluxDB catalog_ is a relational, PostreSQL-compatible database that
30+
The _InfluxDB catalog_ is a relational, PostgreSQL-compatible database that
3131
contains references to all Parquet files in object storage and is used as an
3232
index to find the appropriate Parquet files for a particular set of data.
3333

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
1+
---
2+
title: InfluxDB Clustered data durability
3+
description: >
4+
Data written to {{% product-name %}} progresses through multiple stages to ensure durability, optimized performance and storage, and efficient querying. Configuration options at each stage affect system behavior, balancing reliability and resource usage.
5+
weight: 102
6+
menu:
7+
influxdb3_clustered:
8+
name: Data durability
9+
parent: InfluxDB internals
10+
influxdb3/clustered/tags: [backups, internals]
11+
related:
12+
- https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html, AWS S3 Data Durability
13+
- /influxdb3/clustered/reference/internals/storage-engine/
14+
source: /shared/v3-distributed-internals-reference/durability.md
15+
---
16+
17+
<!--// SOURCE - content/shared/v3-distributed-internals-reference/durability.md -->
Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
1+
## How data flows through {{% product-name %}}
2+
3+
When data is written to {{% product-name %}}, it progresses through multiple stages to ensure durability, optimized performance and storage, and efficient querying. Configuration options at each stage affect system behavior, balancing reliability and resource usage.
4+
5+
{{< svg "/static/svgs/v3-storage-architecture.svg" >}}
6+
7+
<span class="caption">Figure: Write request, response, and ingest flow for {{% product-name %}}</span>
8+
9+
- [How data flows through {{% product-name %}}](#how-data-flows-through--product-name-)
10+
- [Data ingest](#data-ingest)
11+
1. [Write validation](#write-validation)
12+
2. [Write-ahead log (WAL) persistence](#write-ahead-log-wal-persistence)
13+
- [Data storage](#data-storage)
14+
- [Data deletion](#data-deletion)
15+
- [Backups](#backups)
16+
- [Recovery](#recovery)
17+
18+
## Data ingest
19+
20+
1. [Write validation](#write-validation)
21+
2. [Write-ahead log (WAL) persistence](#write-ahead-log-wal-persistence)
22+
23+
### Write validation
24+
25+
The [Router](/influxdb3/version/reference/internals/storage-engine/#router) validates incoming data to prevent malformed or unsupported data from entering the system.
26+
{{% product-name %}} writes accepted data to multiple write-ahead-log (WAL) files on local
27+
storage on the [Ingester](/influxdb3/version/reference/internals/storage-engine/#ingester) node before acknowledging the write request.
28+
The Ingester holds the data in memory to ensure leading edge data is available for querying.
29+
30+
### Write-ahead log (WAL) persistence
31+
32+
InfluxDB writes yet-to-be-persisted data to multiple write-ahead log (WAL) files on local
33+
storage on the [Ingester](/influxdb3/version/reference/internals/storage-engine/#ingester) node before acknowledging the write request.
34+
{{% hide-in "clustered" %}}
35+
Parquet data files in object storage are redundantly stored on multiple devices
36+
across a minimum of three availability zones in a cloud region.
37+
{{% /hide-in %}}
38+
39+
The Ingester then writes the contents of
40+
the WAL to Parquet files in object storage and updates the [Catalog](/influxdb3/version/reference/internals/storage-engine/#catalog) to
41+
reference the newly created Parquet files.
42+
43+
If an Ingester node is gracefully shut down (for example, during a new software deployment), it flushes the contents of the WAL to the Parquet files before shutting down.
44+
{{% product-name %}} retains WALs until the data is persisted to Parquet files in object storage.
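
The ordering described above (append to the WAL, acknowledge, buffer in memory, persist to object storage, update the Catalog, and only then discard the WAL) can be sketched as follows. This is a minimal illustration, not InfluxDB's implementation: `ToyIngester`, its methods, and the dictionary standing in for object storage are all hypothetical names, and JSON lines stand in for the real WAL and Parquet formats.

```python
import json
import os
import tempfile


class ToyIngester:
    """Hypothetical stand-in for an Ingester; illustrates ordering only."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.buffer = []        # in-memory data, queryable before persistence
        self.object_store = {}  # stand-in for object storage
        self.catalog = []       # stand-in for the Catalog
        # On startup, replay any WAL left behind by a previous process.
        if os.path.exists(wal_path):
            with open(wal_path, encoding="utf-8") as wal:
                self.buffer = [json.loads(line) for line in wal if line.strip()]

    def write(self, record):
        # 1. Append to the WAL and force it to disk before acknowledging.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps(record) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Hold the record in memory so recent data can be queried.
        self.buffer.append(record)
        return "ack"

    def persist(self, file_name):
        # 3. Write buffered data to object storage, then reference it in the
        #    catalog. Only after both steps is the WAL discarded.
        if not self.buffer:
            return
        self.object_store[file_name] = list(self.buffer)
        self.catalog.append(file_name)
        self.buffer.clear()
        open(self.wal_path, "w", encoding="utf-8").close()  # truncate the WAL


with tempfile.TemporaryDirectory() as tmp:
    wal_file = os.path.join(tmp, "ingester.wal")
    first = ToyIngester(wal_file)
    first.write({"measurement": "cpu", "value": 0.5})
    # Simulate a crash: a replacement ingester replays the WAL on startup.
    second = ToyIngester(wal_file)
    print(second.buffer)
```

In this sketch, a graceful shutdown corresponds to calling `persist()` before the process exits, which leaves an empty WAL behind.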
45+
46+
## Data storage
47+
48+
In {{< product-name >}}, all measurements are stored in
49+
[Apache Parquet](https://parquet.apache.org/) files that represent a
50+
point-in-time snapshot of the data. The Parquet files are immutable and are
51+
never replaced nor modified. Parquet files are stored in object storage and
52+
referenced in the [Catalog](/influxdb3/version/reference/internals/storage-engine/#catalog), which InfluxDB uses to find the appropriate Parquet files for a particular set of data.
53+
54+
## Data deletion
55+
56+
When data is deleted or expires (reaches the database's [retention period](/influxdb3/version/reference/internals/data-retention/#database-retention-period)), InfluxDB performs the following steps:
57+
58+
1. Marks the associated Parquet files as deleted in the catalog.
59+
2. Filters out data marked for deletion from all queries.
60+
{{% hide-in "clustered" %}}3. Retains Parquet files marked for deletion in object storage for approximately 30 days after the youngest data in the file ages out of retention.{{% /hide-in %}}
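
The deletion steps above can be sketched as a soft delete followed by a grace period. This is an illustrative model under stated assumptions, not the actual catalog schema: `CatalogEntry`, `mark_expired`, `queryable`, and `removable` are hypothetical names, and the 30-day constant reflects the "approximately 30 days" stated in step 3, which applies only where that step applies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumption: "approximately 30 days" modeled as exactly 30 days.
GRACE_PERIOD = timedelta(days=30)


@dataclass
class CatalogEntry:
    path: str
    max_time: datetime                     # timestamp of the youngest data in the file
    deleted_at: Optional[datetime] = None  # set when the file is marked deleted


def mark_expired(catalog: List[CatalogEntry], now: datetime, retention: timedelta) -> None:
    """Step 1: mark files whose youngest data has aged out of retention."""
    for entry in catalog:
        if entry.deleted_at is None and entry.max_time + retention <= now:
            entry.deleted_at = now


def queryable(catalog: List[CatalogEntry]) -> List[CatalogEntry]:
    """Step 2: queries only see files that are not marked deleted."""
    return [e for e in catalog if e.deleted_at is None]


def removable(catalog: List[CatalogEntry], now: datetime, retention: timedelta) -> List[CatalogEntry]:
    """Step 3: marked files remain in object storage until the grace period ends."""
    return [
        e for e in catalog
        if e.deleted_at is not None and e.max_time + retention + GRACE_PERIOD <= now
    ]
```

Separating the soft delete from physical removal is what allows marked files to remain recoverable from object storage during the grace period.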
61+
62+
## Backups
63+
64+
{{< product-name >}} implements the following data backup strategies:
65+
66+
- **Backup of WAL file**: The WAL file is written on locally attached storage.
67+
If an ingester process fails, the new ingester simply reads the WAL file on
68+
startup and continues normal operation. WAL files are maintained until their
69+
contents have been written to the Parquet files in object storage.
70+
For added protection, ingesters can be configured for write replication, where
71+
each measurement is written to two different WAL files before acknowledging
72+
the write.
73+
74+
- **Backup of Parquet files**: Parquet files are stored in object storage {{% hide-in "clustered" %}}where
75+
they are redundantly stored on multiple devices across a minimum of three
76+
availability zones in a cloud region. Parquet files associated with each
77+
database are kept in object storage for the duration of the database retention period
78+
plus an additional time period (approximately 30 days).{{% /hide-in %}}
79+
80+
- **Backup of catalog**: InfluxData keeps a transaction log of all recent updates
81+
to the [InfluxDB catalog](/influxdb3/version/reference/internals/storage-engine/#catalog) and generates a daily backup of
82+
the catalog. {{% hide-in "clustered" %}}Backups are preserved for at least 30 days in object storage across a minimum of three availability zones.{{% /hide-in %}}
83+
84+
{{% hide-in "clustered" %}}
85+
## Recovery
86+
87+
InfluxData can perform the following recovery operations:
88+
89+
- **Recovery after ingester failure**: If an ingester fails, a new ingester is
90+
started up and reads from the WAL file for the recently ingested data.
91+
92+
- **Recovery of Parquet files**: {{< product-name >}} uses the provided object
93+
storage data durability to recover Parquet files.
94+
95+
- **Recovery of the catalog**: InfluxData can restore the [Catalog](/influxdb3/version/reference/internals/storage-engine/#catalog) to
96+
the most recent daily backup and then reapply any transactions
97+
that occurred since the interruption.
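
Catalog recovery as described above amounts to restoring a snapshot and replaying a log. The following sketch assumes a hypothetical layout in which each transaction carries a sequence number and the backup records the last sequence number it includes; the real catalog's format and operations are not described in this document.

```python
import copy


def restore_catalog(backup, transaction_log):
    """Rebuild catalog state from a daily backup plus later transactions.

    Assumed (hypothetical) shapes:
      backup: {"last_seq": int, "entries": {path: {"deleted": bool}}}
      transaction_log: [{"seq": int, "op": "add" | "mark_deleted", "path": str}]
    """
    # Start from a copy so the backup itself is never modified.
    catalog = copy.deepcopy(backup["entries"])
    for txn in sorted(transaction_log, key=lambda t: t["seq"]):
        if txn["seq"] <= backup["last_seq"]:
            continue  # already reflected in the backup
        if txn["op"] == "add":
            catalog[txn["path"]] = {"deleted": False}
        elif txn["op"] == "mark_deleted":
            catalog[txn["path"]]["deleted"] = True
    return catalog
```

Skipping transactions at or below the backup's sequence number keeps the replay idempotent with respect to changes the backup already contains.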
98+
{{% /hide-in %}}
