Commit ce0fcbb

Merge pull request #5740 from influxdata/jstirnaman/issue5716
Cloud Dedicated: update backup policy
2 parents 3718389 + 0b4c237 commit ce0fcbb

2 files changed: +25 -29 lines changed

content/influxdb/cloud-dedicated/reference/internals/data-retention.md

Lines changed: 5 additions & 5 deletions
@@ -22,8 +22,8 @@ are filtered out of query results, even though the data may still exist.
 ## Database retention period
 
 A **database retention period** is the duration of time that a database retains data.
-Retention periods are designed to automatically delete expired data and optimize
-storage without any user intervention.
+Retention periods automatically delete expired data and optimize
+storage without the need for user intervention.
 
 Retention periods can be as short as an hour or infinite.
 [Points](/influxdb/cloud-dedicated/reference/glossary/#point) in a database with
@@ -40,6 +40,6 @@ to view your databases' retention periods.
 ## When does data actually get deleted?
 
 InfluxDB routinely deletes [Parquet](https://parquet.apache.org/) files containing only expired data.
-InfluxDB retains expired Parquet files for approximately 100 days for disaster recovery.
-After the disaster recovery period, expired Parquet files are permanently deleted
-and can't be recovered.
+Expired Parquet files are retained for approximately 30 days for disaster recovery purposes.
+After this period, the files are permanently deleted and cannot be recovered.
+For more information see [data durability](/influxdb/cloud-dedicated/reference/internals/durability/).

content/influxdb/cloud-dedicated/reference/internals/durability.md

Lines changed: 20 additions & 24 deletions
@@ -12,6 +12,7 @@ menu:
 influxdb/cloud-dedicated/tags: [backups, internals]
 related:
 - https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html, AWS S3 Data Durabililty
+- /influxdb/cloud-dedicated/reference/internals/storage-engine/
 ---
 
 {{< product-name >}} writes data to multiple Write-Ahead-Log (WAL) files on local
@@ -24,38 +25,33 @@ across a minimum of three availability zones in a cloud region.
 In {{< product-name >}}, all measurements are stored in
 [Apache Parquet](https://parquet.apache.org/) files that represent a
 point-in-time snapshot of the data. The Parquet files are immutable and are
-never replaced nor modified. Parquet files are stored in object storage.
-
-<span id="influxdb-catalog"></span>
-The _InfluxDB catalog_ is a relational, PostreSQL-compatible database that
-contains references to all Parquet files in object storage and is used as an
-index to find the appropriate Parquet files for a particular set of data.
+never replaced nor modified. Parquet files are stored in object storage and
+referenced in the [Catalog](/influxdb/cloud-dedicated/reference/internals/storage-engine/#catalog), which InfluxDB uses to find the appropriate Parquet files for a particular set of data.
 
 ### Data deletion
 
-When data is deleted or when the retention period is reached for data within
-a database, the associated Parquet files are marked as deleted _in the catalog_,
-but the actual Parquet files are _not removed from object storage_.
-All queries filter out data that has been marked as deleted.
-Parquet files remain in object storage for approximately 100 days after the
-youngest data in the Parquet file ages out of retention.
+When data is deleted or expires (reaches the database's [retention period](/influxdb/cloud-dedicated/reference/internals/data-retention/#database-retention-period)), InfluxDB performs the following steps:
+
+1. Marks the associated Parquet files as deleted in the catalog.
+2. Filters out data marked for deletion from all queries.
+3. Retains Parquet files marked for deletion in object storage for approximately 30 days after the youngest data in the file ages out of retention.
 
 ## Data ingest
 
-When data is written to {{< product-name >}}, the data is first written to a
-Write-Ahead-Log (WAL) on locally-attached storage on the ingester node before
-the write request is acknowledged. After acknowledging the write request, the
-ingester holds the data in memory temporarily and then writes the contents of
-the WAL to Parquet files in object storage and updates the InfluxDB catalog to
-reference the newly created Parquet files. If an ingester is gracefully shut
+When data is written to {{< product-name >}}, InfluxDB first writes the data to a
+Write-Ahead-Log (WAL) on locally attached storage on the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester) node before
+acknowledging the write request. After acknowledging the write request, the
+Ingester holds the data in memory temporarily and then writes the contents of
+the WAL to Parquet files in object storage and updates the [Catalog](/influxdb/cloud-dedicated/reference/internals/storage-engine/#catalog) to
+reference the newly created Parquet files. If an Ingester node is gracefully shut
 down (for example, during a new software deployment), it flushes the contents of
 the WAL to the Parquet files before shutting down.
 
 ## Backups
 
 {{< product-name >}} implements the following data backup strategies:
 
-- **Backup of WAL file**: The WAL file is written on locally-attached storage.
+- **Backup of WAL file**: The WAL file is written on locally attached storage.
 If an ingester process fails, the new ingester simply reads the WAL file on
 startup and continues normal operation. WAL files are maintained until their
 contents have been written to the Parquet files in object storage.
@@ -67,11 +63,11 @@ the WAL to the Parquet files before shutting down.
 they are redundantly stored on multiple devices across a minimum of three
 availability zones in a cloud region. Parquet files associated with each
 database are kept in object storage for the duration of database retention period
-plus an additional time period (approximately 100 days).
+plus an additional time period (approximately 30 days).
 
 - **Backup of catalog**: InfluxData keeps a transaction log of all recent updates
-to the [InfluxDB catalog](#influxdb-catalog) and generates a daily backup of
-the catalog. Backups are preserved for at least 100 days in object storage across a minimum
+to the [InfluxDB catalog](/influxdb/cloud-dedicated/reference/internals/storage-engine/#catalog) and generates a daily backup of
+the catalog. Backups are preserved for at least 30 days in object storage across a minimum
 of three availability zones.
 
 ## Recovery
@@ -84,6 +80,6 @@ InfluxData can perform the following recovery operations:
 - **Recovery of Parquet files**: {{< product-name >}} uses the provided object
 storage data durability to recover Parquet files.
 
-- **Recovery of the catalog**: InfluxData can restore the InfluxDB catalog to
-the most recent daily backup of the catalog and then reapply any transactions
+- **Recovery of the catalog**: InfluxData can restore the [Catalog](/influxdb/cloud-dedicated/reference/internals/storage-engine/#catalog) to
+the most recent daily backup and then reapply any transactions
 that occurred since the interruption.
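
Reviewer note: the updated policy implies a simple timeline for when a Parquet file can be permanently removed from object storage: the time of the youngest data in the file, plus the database retention period, plus the disaster recovery window. The Python sketch below only illustrates that arithmetic. The function name and the fixed 30-day constant are assumptions for illustration (the docs say "approximately 30 days"), and nothing here reflects how InfluxDB actually schedules deletion internally.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: the docs state "approximately 30 days"; the exact value is not specified.
DISASTER_RECOVERY_WINDOW = timedelta(days=30)


def earliest_permanent_deletion(
    youngest_point_time: datetime,
    retention_period: Optional[timedelta],
) -> Optional[datetime]:
    """Estimate when a Parquet file may be permanently deleted.

    `retention_period=None` models an infinite retention period,
    in which case the data never expires and None is returned.
    """
    if retention_period is None:
        return None
    # The file holds only expired data once its youngest point ages out of retention.
    expires_at = youngest_point_time + retention_period
    # Expired files are then kept for the disaster recovery window.
    return expires_at + DISASTER_RECOVERY_WINDOW


if __name__ == "__main__":
    youngest = datetime(2025, 1, 1, tzinfo=timezone.utc)
    estimate = earliest_permanent_deletion(youngest, timedelta(days=7))
    print(estimate.isoformat())  # 2025-02-07T00:00:00+00:00
```

With a 7-day retention period, data written on January 1 expires on January 8 and the file holding it becomes eligible for permanent deletion around February 7. Under the previous policy of roughly 100 days, the same file would have been retained until about April 18.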
