|
| 1 | +--- |
| 2 | +title: "Apache Fluss (Incubating) 0.9 Release Announcement" |
| 3 | +sidebar_label: "Apache Fluss 0.9 Release Announcement" |
| 4 | +authors: [giannis, jark] |
| 5 | +date: 2026-02-26 |
| 6 | +tags: [releases] |
| 7 | +--- |
| 8 | + |
| 9 | + |
| 10 | + |
| 11 | +🌊 We are excited to announce the official release of **Apache Fluss (Incubating)** v0.9! |
| 12 | + |
| 13 | +This release marks an important milestone for the project. |
| 14 | +0.9 significantly expands Fluss’s capabilities as a **streaming storage solution for real-time data analytics & AI**, with a strong focus on schema flexibility, richer data models, extended compute storage functionality, improved operational stability, and a more developer-friendly client experience. |
| 15 | + |
| 16 | +Whether you’re building **unified stream & lakehouse architectures**, **real-time analytics**, **feature/context stores**, or **stateful streaming pipelines**, |
| 17 | +Fluss 0.9 introduces powerful new primitives that make these systems easier, safer, and more efficient to operate at scale. |
| 18 | + |
| 19 | +<!-- truncate --> |
| 20 | +## Support for Complex Data Types |
| 21 | +Apache Fluss 0.9 strengthens and extends support for **complex data types** with a focus on deep nesting, safe schema evolution, and new ML-oriented use cases. |
| 22 | +Supported types now include `ARRAY`, `MAP`, `ROW`, and `deep nesting types`. |
| 23 | +For example, schemas such as the following: |
| 24 | +> `ARRAY< MAP<STRING, ROW<values ARRAY<FLOAT>, ts TIMESTAMP_LTZ(3)>>>` |
| 25 | +
|
| 26 | +are now fully supported in production. These nested structures are handled as **schema-aware** rows, not opaque payloads, ensuring correctness. |
| 27 | + |
| 28 | +Additionally, adding new columns does not affect existing jobs after clients upgrade to 0.9. Moreover, with support for the Lance format, Fluss can be used for the ingestion of multi-modal data and vector storage. Users can now store embeddings directly in tables using `ARRAY<FLOAT>` or `ARRAY<DOUBLE>`: |
| 29 | + |
| 30 | +```sql |
| 31 | +CREATE TABLE documents ( |
| 32 | + doc_id BIGINT PRIMARY KEY, |
| 33 | + embedding ARRAY<FLOAT> |
| 34 | +); |
| 35 | +``` |
| 36 | + |
| 37 | +This enables new user stories where Fluss acts as the source of truth for vector embeddings, which can then be incrementally consumed by vector engines and maintain ANN indexes. Another helpful feature for such use cases included in this release is the Compacted Log Format (see below). |
| 38 | + |
| 39 | +https://github.com/apache/fluss/issues/816 |
| 40 | + |
| 41 | +### Schema Evolution with Zero-Copy Semantics |
| 42 | +Schema evolution is critical for long-running streaming systems, and Fluss 0.9 delivers a major step forward in this area. |
| 43 | + |
| 44 | +The release adds support for altering table schemas by appending new columns, fully integrated with Flink SQL and Fluss DDL. For example: |
| 45 | +```sql |
| 46 | +ALTER TABLE orders |
| 47 | +ADD COLUMN discount DOUBLE; |
| 48 | +``` |
| 49 | + |
| 50 | +For more information, see [Flink DDL support](https://fluss.apache.org/docs/next/engine-flink/ddl/#alter-table). |
| 51 | + |
| 52 | +#### Zero-copy schema evolution |
| 53 | +Zero-copy schema evolution means that existing data files are not rewritten when a schema changes. Instead, only metadata is updated. |
| 54 | + |
| 55 | +Existing records simply do not contain the new column, and readers interpret missing fields as NULL or default values. New records immediately include the new column without impacting historical data. |
| 56 | + |
| 57 | +This approach avoids downtime, eliminates expensive backfills, and ensures predictable performance during schema changes. It is especially important for streaming pipelines that are expected to run continuously over long periods of time. |
| 58 | + |
| 59 | +## Auto-Increment Columns for Dictionary Tables |
| 60 | +This release introduces AUTO_INCREMENT columns in Fluss, enabling Dictionary Tables, a simple pattern for mapping long identifiers (such as strings or UUIDs) to compact numeric IDs in real-time systems. |
| 61 | + |
| 62 | +AUTO_INCREMENT columns automatically assign a unique numeric ID when a row is inserted and no value is provided. The assigned ID is stable and never changes. In distributed setups, IDs may not appear strictly sequential due to parallelism and bucketing, but they are guaranteed to be unique and monotonically increasing per allocation range. |
| 63 | + |
| 64 | +A Dictionary Table is a regular Fluss table that uses an AUTO_INCREMENT column to map long business identifiers, such as strings or UUIDs, to compact integer IDs. In simple terms, a Dictionary Table gives every unique value a short number and always returns the same number for the same value. |
| 65 | + |
| 66 | +```sql |
| 67 | +CREATE TABLE uid_mapping ( |
| 68 | + uid STRING, |
| 69 | + uid_int64 BIGINT, |
| 70 | + PRIMARY KEY (`uid`) NOT ENFORCED |
| 71 | +) WITH ( |
| 72 | + 'table.auto-increment.fields' = 'uid_int64' |
| 73 | +); |
| 74 | + |
| 75 | +INSERT INTO uid_mapping VALUES ('user1'); |
| 76 | +INSERT INTO uid_mapping VALUES ('user2'); |
| 77 | +INSERT INTO uid_mapping VALUES ('user3'); |
| 78 | +INSERT INTO uid_mapping VALUES ('user4'); |
| 79 | +INSERT INTO uid_mapping VALUES ('user5'); |
| 80 | + |
| 81 | +SELECT * FROM uid_mapping; |
| 82 | +``` |
| 83 | + |
| 84 | +| uid | number | |
| 85 | +| :---- | :----- | |
| 86 | +| user1 | 1 | |
| 87 | +| user2 | 2 | |
| 88 | +| user3 | 3 | |
| 89 | +| user4 | 4 | |
| 90 | +| user5 | 5 | |
| 91 | + |
| 92 | +Dictionary Tables are commonly used to answer operational questions such as: |
| 93 | +* How many unique users, devices, or sessions have we seen so far? |
| 94 | +* Is this the first time we are seeing this identifier? |
| 95 | +* Have we already processed this event?; |
| 96 | +* What is the current set of active entities/sessions? |
| 97 | + |
| 98 | +They also help keep identity-related state manageable over long periods and provide a shared, consistent ID mapping that can be reused across systems. |
| 99 | + |
| 100 | +## Aggregation Merge Engine |
| 101 | +Apache Fluss now supports storage-level aggregations via a new Aggregation Merge Engine, enabling real-time aggregation to be pushed down from the compute layer into the Fluss storage layer. |
| 102 | + |
| 103 | +Traditionally, real-time aggregations are maintained in Flink state, which can lead to: |
| 104 | +* Large and growing state size |
| 105 | +* Slower checkpoints and recovery |
| 106 | +* Limited scalability for high-cardinality aggregations |
| 107 | + |
| 108 | +With the Aggregation Merge Engine, aggregation state is externalized to Fluss, allowing Flink jobs to remain nearly stateless while Fluss efficiently maintains aggregated results. |
| 109 | +```sql |
| 110 | +CREATE TABLE campaign_uv ( |
| 111 | + campaign_id STRING, |
| 112 | + uv_bitmap BYTES, |
| 113 | + total_events BIGINT, |
| 114 | + last_event_time TIMESTAMP, |
| 115 | + PRIMARY KEY (campaign_id) NOT ENFORCED |
| 116 | +) WITH ( |
| 117 | + 'table.merge-engine' = 'aggregation', |
| 118 | + 'fields.uv_bitmap.agg' = 'rbm64', |
| 119 | + 'fields.total_events.agg' = 'sum', |
| 120 | + 'fields.last_event_time.agg' = 'last_value_ignore_nulls' |
| 121 | +); |
| 122 | +``` |
| 123 | + |
| 124 | +Maintain continuously updated metrics (for example, order counts or model aggregated features) directly in Fluss tables, while Flink focuses only on event ingestion and lightweight processing. |
| 125 | + |
| 126 | +The aggregation merge engine is another step towards Fluss’s compute storage separation, which you can find more about [here](https://www.ververica.com/blog/introducing-the-era-of-zero-state-streaming-joins?hs_preview=cdhHvcIE-199898654106). |
| 127 | + |
| 128 | +Design reference: |
| 129 | +https://cwiki.apache.org/confluence/display/FLUSS/FIP-21%3A+Aggregation+Merge+Engine |
| 130 | + |
| 131 | +## Support for Compacted LogFormat |
| 132 | +By default, Fluss uses Apache Arrow–based columnar storage, which is ideal for analytical workloads with selective column access. However, some workloads do not benefit from columnar layouts, especially when all columns are read together. |
| 133 | +Fluss 0.9 introduces support for a Compacted LogFormat to address these cases. |
| 134 | +This format is designed for tables such as aggregated result tables and large vector or embedding tables, where full-row reads are the dominant access pattern. In these scenarios, columnar storage provides limited benefit and can introduce unnecessary overhead. |
| 135 | + |
| 136 | +The Compacted LogFormat stores rows in a tightly packed, compact representation on disk, resulting in: |
| 137 | +* Reduced disk footprint for wide rows |
| 138 | +* More efficient full-table and wide-row scans |
| 139 | +* Better storage efficiency for derived and materialized tables |
| 140 | + |
| 141 | +Arrow remains the default and preferred choice for column-pruned analytical workloads, while the Compacted LogFormat provides a more efficient option for full-row, compacted datasets. |
| 142 | +[Link to the docs] |
| 143 | + |
| 144 | +## KV Snapshot Lease |
| 145 | +Apache Fluss now supports KV Snapshot Lease, improving the reliability of snapshot-based reads for streaming and batch workloads. |
| 146 | + |
| 147 | +Fluss tables periodically generate KV snapshots that are used by readers (for example, Flink jobs) as a consistent starting point before continuing with incremental changelog consumption. Previously, snapshot cleanup was driven solely by retention policies and was unaware of whether a snapshot was actively being read. As a result, snapshots could be deleted while a job was still reading them, leading to job failures and making clean restarts impossible in some cases. |
| 148 | + |
| 149 | +With KV Snapshot Leases, snapshot lifecycle management becomes consumer-aware. Readers explicitly acquire a lease when they start reading a snapshot, which prevents that snapshot from being deleted while it is in use. Leases are periodically renewed during long-running reads, and snapshots are only eligible for cleanup once all associated leases have been released or have expired. This ensures snapshots remain available for the full duration of a read, while still allowing automatic cleanup if a reader crashes or stops renewing its lease. |
| 150 | + |
| 151 | +This feature makes snapshot-based reads safe and predictable, enabling reliable snapshot + changelog consumption, large table bootstrapping, and long-running snapshot scans without requiring overly conservative snapshot retention settings. |
| 152 | + |
| 153 | +More information: |
| 154 | +https://cwiki.apache.org/confluence/display/FLUSS/FIP-22+Support+Kv+Snapshot+Lease |
| 155 | + |
| 156 | +## Cluster Rebalance |
| 157 | +Apache Fluss now supports cluster rebalancing, enabling automatic redistribution of buckets and leaders across TabletServers to maintain balanced load and efficient resource utilization. |
| 158 | + |
| 159 | +**Highlights** |
| 160 | +* On-demand rebalancing for common operational scenarios: |
| 161 | + * Scaling up or down the cluster |
| 162 | + * Decommissioning TabletServers |
| 163 | + * Planned maintenance |
| 164 | + * Resolving load imbalance |
| 165 | + |
| 166 | + |
| 167 | +* Goal-driven rebalance with prioritized objectives: |
| 168 | + * REPLICA_DISTRIBUTION to balance replicas |
| 169 | + * LEADER_DISTRIBUTION to balance leaders |
| 170 | + |
| 171 | + |
| 172 | +* Server-aware rebalancing using tags: |
| 173 | + * PERMANENT_OFFLINE for decommissioning |
| 174 | + * TEMPORARY_OFFLINE for maintenance scenarios |
| 175 | + |
| 176 | +* Operational visibility and control: |
| 177 | + * Track rebalance progress and status |
| 178 | + * Cancel an in-progress rebalance if needed |
| 179 | + * Only one rebalance operation runs at a time per cluster |
| 180 | + |
| 181 | +This feature simplifies cluster operations, improves stability during topology changes, and helps ensure consistent performance as Fluss clusters scale. |
| 182 | +More information on the docs [link to the docs] |
| 183 | +https://github.com/swuferhong/fluss/blob/a94b6db0859c938e3e49b1d1e630165dfdaabf78/website/docs/maintenance/operations/rebalance.md |
| 184 | + |
| 185 | +## Virtual Tables |
| 186 | +Apache Fluss 0.9 introduces Virtual Tables, a new system-level abstraction |
| 187 | +that provides access to metadata, change data, and other system information |
| 188 | +without storing additional data. Virtual tables are accessed by appending a |
| 189 | +suffix to the base table name, offering a clean and intuitive way to interact |
| 190 | +with internal data streams. |
| 191 | + |
| 192 | +The first virtual table type introduced in this release is the $changelog table, |
| 193 | +which exposes the raw changelog stream of any Fluss table. By simply appending |
| 194 | +$changelog to a table name, users can access a complete audit trail of every |
| 195 | +data modification, including inserts, updates, and deletes, along with rich metadata. |
| 196 | + |
| 197 | +- - - - Access the changelog of any table |
| 198 | + |
| 199 | +```sql |
| 200 | +SELECT * FROM orders$changelog; |
| 201 | +``` |
| 202 | + |
| 203 | + |
| 204 | +Each changelog record includes three metadata columns prepended to the original |
| 205 | +table columns: |
| 206 | +- **_change_type**: The type of change operation (+I for insert, -U/+U for update before/after, -D for delete, or +A for log table appends) |
| 207 | +- **_log_offset**: The position in the log, useful for tracking and replay |
| 208 | +- **_commit_timestamp**: The exact timestamp when the change was committed |
| 209 | + |
| 210 | +Support for Both Primary Key and Log Tables |
| 211 | + |
| 212 | +The $changelog virtual table works with both table types in Fluss: |
| 213 | + |
| 214 | +Primary Key Tables emit the full spectrum of change types (+I, -U, +U, -D), enabling precise tracking of inserts, updates, and deletes. |
| 215 | +Log Tables (append-only) emit +A for every appended row, providing a structured view of the append stream with offset and timestamp metadata. |
| 216 | + |
| 217 | +Flexible Startup Modes |
| 218 | + |
| 219 | +Users can control where the changelog reading begins using startup modes: |
| 220 | + |
| 221 | +earliest: Read from the beginning of the log for full history replay |
| 222 | +latest: Read only new changes from the current position for real-time monitoring |
| 223 | +timestamp: Start from a specific point in time for targeted replay |
| 224 | + |
| 225 | + |
| 226 | + |
| 227 | +The $changelog virtual table unlocks several real-time use cases: |
| 228 | + |
| 229 | +ML/AI (Feature Store): Stream only changed records to feature stores, process deltas instead of recomputing entire datasets. Perfect for keeping embeddings, feature vectors, and RAG systems fresh without full refreshes. |
| 230 | +Audit Trails: Build comprehensive audit logs with timestamps and offsets. Immutable append-only audit logs showing what changed, when, and by whom are critical for AI explainability and regulatory requirements |
| 231 | + |
| 232 | +## Dynamic Sink Shuffle For Partition Table |
| 233 | + |
| 234 | +Fluss supports two typical write strategies. The first is a shuffle-based approach that relies on downstream bucket IDs. While this method achieves effective batching, it encounters a bottleneck when writing to a single bucket with a single concurrency, as increasing parallelism does not mitigate the issue. The second strategy employs Flink's round-robin distribution, which, when writing to log tables, results in high metadata overhead and low actual data volume. This leads to inefficient read performance and an excessive number of Sink connections (each sink must establish connections with all Fluss servers). For multi-partition tables, uneven traffic distribution further amplifies these challenges. |
| 235 | + |
| 236 | +The newly introduced dynamic shuffle for partitioned tables dynamically detects traffic distribution across partitions at runtime. It allocates the number of write nodes proportionally based on traffic levels — high-traffic partitions are assigned more sink nodes, while low-traffic partitions are allocated fewer nodes. This ensures that each sink is responsible for writing to only one or a small number of buckets, significantly enhancing batching efficiency. Even in scenarios with uneven partition traffic, the write traffic for each sink remains balanced, ensuring optimal performance. |
| 237 | + |
| 238 | +## Other Important Features: |
| 239 | +- **Java Client Pojo Support** |
| 240 | +- **Azure FS** |
| 241 | +- **Flink 2.2 Support**: Support `ALTER TABLE ... SET ('datalake.freshness' = ...)` |
| 242 | +- **Spark Engine**: Support for catalogs, stream/batch read, and stream/batch write. |
| 243 | + |
| 244 | +## More Improvements |
| 245 | +* [Add RocksDB Block Cache Configuration Options for Index and Filter Blocks](https://github.com/apache/fluss/issues/2393) |
| 246 | +* [Add an implementation of AutoIncIDBuffer](https://github.com/apache/fluss/pull/2161) |
| 247 | +* [Add roaring bitmap aggregation](https://github.com/apache/fluss/pull/2390) |
| 248 | +* [Refactor LanceArrowWriter](https://github.com/apache/fluss/issues/1569) |
| 249 | +* [Support Flink 2.2 Support alter table.datalake.freshness](https://github.com/apache/fluss/pull/2365) |
| 250 | +* [Flink CALL producers to update cluster configs dynamically](https://github.com/apache/fluss/pull/2279) |
| 251 | +* [WAL mode for changelog images of primary key table](https://github.com/apache/fluss/pull/2105) |
| 252 | +* [Report rich RocksDB metrics for debugging and production-ready usage](https://github.com/apache/fluss/pull/2282 |
| 253 | +* [Improved stability for primary-key tables under spiky write throughput through rate limiting and memory usage control for statistics](https://github.com/apache/fluss/pull/2178] |
| 254 | +* [Improve stability for large datalake enabled tables with more than 10K buckets](https://github.com/apache/fluss/issues/2224) |
| 255 | +* [Fix correctness issue in union reads of Paimon tables with deletion vectors](https://github.com/apache/fluss/pull/2326) |
| 256 | +* Fluss no longer requires the three system columns (`__offset`, `__timestamp`, and `__bucket`) on lakehouse tables. This reduces intrusiveness into users’ existing lakehouse systems and makes it easier to upgrade an existing lakehouse table to a streaming lakehouse table. (https://github.com/apache/fluss/pull/2417) |
0 commit comments