| 1 | +## How data flows through {{% product-name %}} |
| 2 | + |
| 3 | +When data is written to {{% product-name %}}, it progresses through multiple stages to ensure durability, optimized performance and storage, and efficient querying. Configuration options at each stage affect system behavior, balancing reliability and resource usage. |
| 4 | + |
| 5 | +{{< svg "/static/svgs/v3-storage-architecture.svg" >}} |
| 6 | + |
| 7 | +<span class="caption">Figure: Write request, response, and ingest flow for {{% product-name %}}</span> |
| 8 | + |
| 9 | +- [How data flows through {{% product-name %}}](#how-data-flows-through--product-name-) |
| 10 | +- [Data ingest](#data-ingest) |
| 11 | + 1. [Write validation](#write-validation) |
| 12 | + 2. [Write-ahead log (WAL) persistence](#write-ahead-log-wal-persistence) |
| 13 | +- [Data storage](#data-storage) |
| 14 | +- [Data deletion](#data-deletion) |
| 15 | +- [Backups](#backups) |
| 16 | +- [Recovery](#recovery) |
| 17 | + |
| 18 | +## Data ingest |
| 19 | + |
| 20 | +1. [Write validation](#write-validation) |
| 21 | +2. [Write-ahead log (WAL) persistence](#write-ahead-log-wal-persistence) |
| 22 | + |
| 23 | +### Write validation |
| 24 | + |
| 25 | +The [Router](/influxdb3/version/reference/internals/storage-engine/#router) validates incoming data to prevent malformed or unsupported data from entering the system. |
| 26 | +{{% product-name %}} writes accepted data to multiple write-ahead log (WAL) files on local |
| 27 | +storage on the [Ingester](/influxdb3/version/reference/internals/storage-engine/#ingester) node before acknowledging the write request. |
| 28 | +The Ingester also holds the data in memory so that the most recently written (leading-edge) data is available for querying. |
| 29 | + |
| 30 | +### Write-ahead log (WAL) persistence |
| 31 | + |
| 32 | +InfluxDB writes yet-to-be-persisted data to multiple write-ahead log (WAL) files on local |
| 33 | +storage on the [Ingester](/influxdb3/version/reference/internals/storage-engine/#ingester) node before acknowledging the write request. |
| 34 | +{{% hide-in "clustered" %}} |
| 35 | +Parquet data files in object storage are redundantly stored on multiple devices |
| 36 | +across a minimum of three availability zones in a cloud region. |
| 37 | +{{% /hide-in %}} |
| 38 | + |
| 39 | +The Ingester then writes the contents of |
| 40 | +the WAL to Parquet files in object storage and updates the [Catalog](/influxdb3/version/reference/internals/storage-engine/#catalog) to |
| 41 | +reference the newly created Parquet files. |
| 42 | + |
| 43 | +If an Ingester node is gracefully shut down (for example, during a new software deployment), it flushes the contents of the WAL to the Parquet files before shutting down. |
| 44 | +{{% product-name %}} retains WALs until the data is persisted to Parquet files in object storage. |
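
The ordering described above (validate, append to the WAL, acknowledge, then persist and update the catalog) can be sketched as a toy in-memory model. This is an illustrative sketch only, not InfluxDB code: `ToyWritePath`, its fields, the file-naming scheme, and the "non-empty line" validation rule are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyWritePath:
    """Hypothetical model of the ingest path; every name here is an assumption."""
    wal: list = field(default_factory=list)           # stands in for WAL files on local storage
    buffer: list = field(default_factory=list)        # in-memory copy available to queries
    object_store: dict = field(default_factory=dict)  # stands in for Parquet files in object storage
    catalog: list = field(default_factory=list)       # names of persisted files

    def write(self, line: str) -> bool:
        """Validate, append to the WAL, buffer in memory, then acknowledge."""
        if not line.strip():      # placeholder validation rule (assumption)
            return False          # rejected data never reaches the WAL
        self.wal.append(line)     # record the write before acknowledging it
        self.buffer.append(line)  # keep recent data queryable
        return True               # acknowledgement

    def persist(self) -> Optional[str]:
        """Write WAL contents to a new file, update the catalog, then drop the WAL."""
        if not self.wal:
            return None
        name = f"file-{len(self.catalog)}.parquet"  # hypothetical naming scheme
        self.object_store[name] = tuple(self.wal)   # immutable snapshot of the WAL contents
        self.catalog.append(name)                   # catalog now references the new file
        self.wal.clear()                            # WAL is retained only until data is persisted
        return name
```

In this sketch, a graceful shutdown corresponds to calling `persist()` one last time before the process exits.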
| 45 | + |
| 46 | +## Data storage |
| 47 | + |
| 48 | +In {{< product-name >}}, all measurements are stored in |
| 49 | +[Apache Parquet](https://parquet.apache.org/) files that represent a |
| 50 | +point-in-time snapshot of the data. The Parquet files are immutable and are |
| 51 | +never replaced or modified. Parquet files are stored in object storage and |
| 52 | +referenced in the [Catalog](/influxdb3/version/reference/internals/storage-engine/#catalog), which InfluxDB uses to find the appropriate Parquet files for a particular set of data. |
| 53 | + |
| 54 | +## Data deletion |
| 55 | + |
| 56 | +When data is deleted or expires (reaches the database's [retention period](/influxdb3/version/reference/internals/data-retention/#database-retention-period)), InfluxDB performs the following steps: |
| 57 | + |
| 58 | +1. Marks the associated Parquet files as deleted in the catalog. |
| 59 | +2. Filters out data marked for deletion from all queries. |
| 60 | +{{% hide-in "clustered" %}}3. Retains Parquet files marked for deletion in object storage for approximately 30 days after the youngest data in the file ages out of retention.{{% /hide-in %}} |
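
The first two steps above amount to a soft delete: the file stays in object storage, but a flag in the catalog hides it from queries. A minimal sketch, assuming a hypothetical `CatalogEntry` record and file-level granularity (the real catalog schema and filtering logic are not described here):

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one Parquet file."""
    path: str
    deleted: bool = False


def mark_deleted(catalog, paths):
    """Step 1: flag the matching files as deleted without removing them."""
    doomed = set(paths)
    for entry in catalog:
        if entry.path in doomed:
            entry.deleted = True


def queryable_files(catalog):
    """Step 2: queries only see files that are not marked as deleted."""
    return [entry.path for entry in catalog if not entry.deleted]
```

Because the entry is only flagged, the underlying file can remain in object storage for a grace period before it is physically removed.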
| 61 | + |
| 62 | +## Backups |
| 63 | + |
| 64 | +{{< product-name >}} implements the following data backup strategies: |
| 65 | + |
| 66 | +- **Backup of WAL file**: The WAL file is written on locally attached storage. |
| 67 | + If an ingester process fails, the new ingester simply reads the WAL file on |
| 68 | + startup and continues normal operation. WAL files are maintained until their |
| 69 | + contents have been written to the Parquet files in object storage. |
| 70 | + For added protection, ingesters can be configured for write replication, where |
| 71 | + each measurement is written to two different WAL files before acknowledging |
| 72 | + the write. |
| 73 | + |
| 74 | +- **Backup of Parquet files**: Parquet files are stored in object storage {{% hide-in "clustered" %}}where |
| 75 | + they are redundantly stored on multiple devices across a minimum of three |
| 76 | + availability zones in a cloud region. Parquet files associated with each |
| 77 | + database are kept in object storage for the duration of the database retention period |
| 78 | + plus an additional time period (approximately 30 days).{{% /hide-in %}} |
| 79 | + |
| 80 | +- **Backup of catalog**: InfluxData keeps a transaction log of all recent updates |
| 81 | + to the [InfluxDB catalog](/influxdb3/version/reference/internals/storage-engine/#catalog) and generates a daily backup of |
| 82 | + the catalog. {{% hide-in "clustered" %}}Backups are preserved for at least 30 days in object storage across a minimum of three availability zones.{{% /hide-in %}} |
| 83 | + |
| 84 | +{{% hide-in "clustered" %}} |
| 85 | +## Recovery |
| 86 | + |
| 87 | +InfluxData can perform the following recovery operations: |
| 88 | + |
| 89 | +- **Recovery after ingester failure**: If an ingester fails, a new ingester |
| 90 | + starts and reads the recently ingested data from the WAL file. |
| 91 | + |
| 92 | +- **Recovery of Parquet files**: {{< product-name >}} relies on the durability |
| 93 | + provided by object storage to recover Parquet files. |
| 94 | + |
| 95 | +- **Recovery of the catalog**: InfluxData can restore the [Catalog](/influxdb3/version/reference/internals/storage-engine/#catalog) to |
| 96 | + the most recent daily backup and then reapply any transactions |
| 97 | + that occurred since the interruption. |
| 98 | +{{% /hide-in %}} |