
Commit b080905

polyzoswuchong authored and committed
[Docs] consistency & syntax fixes (#1243)
* change PrimaryKey table to Primary Key Table across pages
* syntactic fixes
* make some more minor fixes
* fix broken link
* address yuxia's comments
1 parent 7afdc99 commit b080905

8 files changed, +52 -56 lines changed


website/docs/engine-flink/ddl.md

Lines changed: 17 additions & 19 deletions
@@ -39,17 +39,17 @@ The following properties can be set if using the Fluss catalog:
 
 | Option | Required | Default | Description |
 |--------------------------------|----------|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| type | required | (none) | Catalog type, must to be 'fluss' here. |
+| type | required | (none) | Catalog type, must be 'fluss' here. |
 | bootstrap.servers | required | (none) | Comma separated list of Fluss servers. |
 | default-database | optional | fluss | The default database to use when switching to this catalog. |
 | client.security.protocol | optional | PLAINTEXT | The security protocol used to communicate with brokers. Currently, only `PLAINTEXT` and `SASL` are supported, the configuration value is case insensitive. |
-| `client.security.{protocol}.*` | optional | (none) | Client-side configuration properties for a specific authentication protocol. E.g., client.security.sasl.jaas.config. More Details in [authentication](../security/authentication.md) | (none) |
+| `client.security.{protocol}.*` | optional | (none) | Client-side configuration properties for a specific authentication protocol. E.g., client.security.sasl.jaas.config. More Details in [authentication](../security/authentication.md) |
 
-The following introduced statements assuming the current catalog is switched to the Fluss catalog using `USE CATALOG <catalog_name>` statement.
+The following statements assume that the current catalog has been switched to the Fluss catalog using the `USE CATALOG <catalog_name>` statement.
 
 ## Create Database
 
-By default, FlussCatalog will use the `fluss` database in Flink. Using the following example to create a separate database in order to avoid creating tables under the default `fluss` database:
+By default, FlussCatalog will use the `fluss` database in Flink. You can use the following example to create a separate database to avoid creating tables under the default `fluss` database:
 
 ```sql title="Flink SQL"
 CREATE DATABASE my_db;
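For orientation, the options in this hunk's table are supplied when the catalog itself is declared. A minimal sketch, assuming an illustrative catalog name and server address:

```sql title="Flink SQL"
-- Hedged sketch: registering a Fluss catalog with the options documented above.
-- 'fluss_catalog' and the bootstrap address are illustrative placeholders.
CREATE CATALOG fluss_catalog WITH (
  'type' = 'fluss',
  'bootstrap.servers' = 'fluss-server-1:9123',
  'default-database' = 'fluss'
);
USE CATALOG fluss_catalog;
```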
@@ -75,9 +75,9 @@ DROP DATABASE my_db;
 
 ## Create Table
 
-### PrimaryKey Table
+### Primary Key Table
 
-The following SQL statement will create a [PrimaryKey Table](table-design/table-types/pk-table/index.md) with a primary key consisting of shop_id and user_id.
+The following SQL statement will create a [Primary Key Table](table-design/table-types/pk-table/index.md) with a primary key consisting of shop_id and user_id.
 ```sql title="Flink SQL"
 CREATE TABLE my_pk_table (
   shop_id BIGINT,
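The hunk above cuts off inside the statement; a hedged reconstruction of a complete Primary Key Table definition (the non-key columns and the bucket option are illustrative assumptions):

```sql title="Flink SQL"
-- Hedged sketch: shop_id and user_id form the primary key, per the hunk above;
-- the remaining columns and 'bucket.num' are assumptions for illustration.
CREATE TABLE my_pk_table (
  shop_id BIGINT,
  user_id BIGINT,
  num_orders INT,
  total_amount INT,
  PRIMARY KEY (shop_id, user_id) NOT ENFORCED
) WITH (
  'bucket.num' = '4'
);
```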
@@ -105,14 +105,14 @@ CREATE TABLE my_log_table (
 );
 ```
 
-### Partitioned (PrimaryKey/Log) Table
+### Partitioned (Primary Key/Log) Table
 
 :::note
 1. Currently, Fluss only supports partitioned field with `STRING` type
-2. For the Partitioned PrimaryKey Table, the partitioned field (`dt` in this case) must be a subset of the primary key (`dt, shop_id, user_id` in this case)
+2. For the Partitioned Primary Key Table, the partitioned field (`dt` in this case) must be a subset of the primary key (`dt, shop_id, user_id` in this case)
 :::
 
-The following SQL statement creates a Partitioned PrimaryKey Table in Fluss.
+The following SQL statement creates a Partitioned Primary Key Table in Fluss.
 
 ```sql title="Flink SQL"
 CREATE TABLE my_part_pk_table (
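This hunk also ends mid-statement. A hedged sketch of a complete Partitioned Primary Key Table, using the `dt`, `shop_id`, `user_id` key spelled out in the note above (the extra column is illustrative):

```sql title="Flink SQL"
-- Hedged sketch: dt is the STRING partition field and is part of the primary
-- key, as the note in this hunk requires; num_orders is illustrative.
CREATE TABLE my_part_pk_table (
  dt STRING,
  shop_id BIGINT,
  user_id BIGINT,
  num_orders INT,
  PRIMARY KEY (dt, shop_id, user_id) NOT ENFORCED
) PARTITIONED BY (dt);
```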
@@ -145,7 +145,7 @@ But you can still use the [Add Partition](engine-flink/ddl.md#add-partition) sta
 
 #### Multi-Fields Partitioned Table
 
-Fluss also support [Multi-Fields Partitioning](table-design/data-distribution/partitioning.md#multi-field-partitioned-tables), the following SQL statement creates a Multi-Fields Partitioned Log Table in Fluss:
+Fluss also supports [Multi-Fields Partitioning](table-design/data-distribution/partitioning.md#multi-field-partitioned-tables), the following SQL statement creates a Multi-Fields Partitioned Log Table in Fluss:
 
 ```sql title="Flink SQL"
 CREATE TABLE my_multi_fields_part_log_table (
@@ -158,9 +158,9 @@ CREATE TABLE my_multi_fields_part_log_table (
 ) PARTITIONED BY (dt, nation);
 ```
 
-#### Auto partitioned (PrimaryKey/Log) table
+#### Auto Partitioned (Primary Key/Log) Table
 
-Fluss also support creat Auto Partitioned (PrimaryKey/Log) Table. The following SQL statement creates an Auto Partitioned PrimaryKey Table in Fluss.
+Fluss also supports creating Auto Partitioned (Primary Key/Log) Table. The following SQL statement creates an Auto Partitioned Primary Key Table in Fluss.
 
 ```sql title="Flink SQL"
 CREATE TABLE my_auto_part_pk_table (
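Again the hunk truncates the statement. A hedged sketch of what an auto-partitioned table could look like; both `table.auto-partition.*` option keys are assumptions about Fluss's auto-partitioning configuration, not text from this commit:

```sql title="Flink SQL"
-- Hedged sketch; the two auto-partition option keys below are assumptions.
CREATE TABLE my_auto_part_pk_table (
  dt STRING,
  shop_id BIGINT,
  user_id BIGINT,
  PRIMARY KEY (dt, shop_id, user_id) NOT ENFORCED
) PARTITIONED BY (dt) WITH (
  'table.auto-partition.enabled' = 'true',
  'table.auto-partition.time-unit' = 'day'
);
```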
@@ -193,7 +193,7 @@ CREATE TABLE my_auto_part_log_table (
 );
 ```
 
-For more details about Auto Partitioned (PrimaryKey/Log) Table, refer to [Auto Partitioning](table-design/data-distribution/partitioning.md#auto-partitioning).
+For more details about Auto Partitioned (Primary Key/Log) Table, refer to [Auto Partitioning](table-design/data-distribution/partitioning.md#auto-partitioning).
 
 
 ### Options
@@ -238,8 +238,8 @@ This will entirely remove all the data of the table in the Fluss cluster.
 
 ## Add Partition
 
-Fluss support manually add partitions to an exists partitioned table by Fluss Catalog. If the specified partition
-not exists, Fluss will create the partition. If the specified partition already exists, Fluss will ignore the request
+Fluss supports manually adding partitions to an existing partitioned table through the Fluss Catalog. If the specified partition
+does not exist, Fluss will create the partition. If the specified partition already exists, Fluss will ignore the request
 or throw an exception.
 
 To add partitions, run:
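The statement itself falls outside this hunk; a hedged sketch mirroring the DROP PARTITION example that appears later in this commit:

```sql title="Flink SQL"
-- Hedged sketch: IF NOT EXISTS selects the "ignore the request" behavior
-- described above; omitting it would surface an exception instead.
ALTER TABLE my_part_pk_table ADD IF NOT EXISTS PARTITION (dt = '2025-03-05');
```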
@@ -275,8 +275,8 @@ For more details, refer to the [Flink SHOW PARTITIONS](https://nightlies.apache.
 
 ## Drop Partition
 
-Fluss also support manually drop partitions from an exists partitioned table by Fluss Catalog. If the specified partition
-not exists, Fluss will ignore the request or throw an exception.
+Fluss also supports manually dropping partitions from an existing partitioned table through the Fluss Catalog. If the specified partition
+does not exist, Fluss will ignore the request or throw an exception.
 
 
 To drop partitions, run:
@@ -289,5 +289,3 @@ ALTER TABLE my_multi_fields_part_log_table DROP PARTITION (dt = '2025-03-05', na
 ```
 
 For more details, refer to the [Flink ALTER TABLE(DROP)](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/sql/alter/#drop) documentation.
-
-
website/docs/intro.md

Lines changed: 2 additions & 2 deletions
@@ -26,7 +26,7 @@ Fluss is a streaming storage built for real-time analytics which can serve as th
 
 ![arch](/img/fluss.png)
 
-It bridges the gap between **streaming data** and the data **Lakehouse** by enabling low-latency, high-throughput data ingestion and processing while seamlessly integrating with popular compute engines like **Apache Flink**, while **Apache Spark**, and **StarRocks** are coming soon.
+It bridges the gap between **streaming data** and the data **Lakehouse** by enabling low-latency, high-throughput data ingestion and processing while seamlessly integrating with popular compute engines like **Apache Flink**, with **Apache Spark** and **StarRocks** coming soon.
 
 Fluss supports `streaming reads` and `writes` with sub-second latency and stores data in a columnar format, enhancing query performance and reducing storage costs.
 It offers flexible table types, including append-only **Log Tables** and updatable **PrimaryKey Tables**, to accommodate diverse real-time analytics and processing needs.
@@ -44,7 +44,7 @@ The following is a list of (but not limited to) use-cases that Fluss shines ✨:
 * **📡 Real-time IoT Pipelines**
 * **🚓 Real-time Fraud Detection**
 * **🚨 Real-time Alerting Systems**
-* **💫 Real-tim ETL/Data Warehouses**
+* **💫 Real-time ETL/Data Warehouses**
 * **🌐 Real-time Geolocation Services**
 * **🚚 Real-time Shipment Update Tracking**
 
website/docs/table-design/overview.md

Lines changed: 7 additions & 9 deletions
@@ -32,13 +32,13 @@ Tables are classified into two types based on the presence of a primary key:
 - **Log Tables:**
   - Designed for append-only scenarios.
   - Support only INSERT operations.
-- **PrimaryKey Tables:**
+- **Primary Key Tables:**
   - Used for updating and managing data in business databases.
   - Support INSERT, UPDATE, and DELETE operations based on the defined primary key.
 
-A Table becomes a [Partitioned Table](table-design/data-distribution/partitioning.md) when a partition column is defined. Data with the same partition value is stored in the same partition. Partition columns can be applied to both Log Tables and PrimaryKey Tables, but with specific considerations:
+A Table becomes a [Partitioned Table](data-distribution/partitioning.md) when a partition column is defined. Data with the same partition value is stored in the same partition. Partition columns can be applied to both Log Tables and Primary Key Tables, but with specific considerations:
 - **For Log Tables**, partitioning is commonly used for log data, typically based on date columns, to facilitate data separation and cleaning.
-- **For PrimaryKey Tables**, the partition column must be a subset of the primary key to ensure uniqueness.
+- **For Primary Key Tables**, the partition column must be a subset of the primary key to ensure uniqueness.
 
 This design ensures efficient data organization, flexibility in handling different use cases, and adherence to data integrity constraints.
 
@@ -58,14 +58,12 @@ The number of buckets `N` can be configured per table. A bucket is the smallest
 The data of a bucket consists of a LogTablet and a (optional) KvTablet.
 
 ### LogTablet
-A **LogTablet** needs to be generated for each bucket of Log and PrimaryKey tables.
-For Log Tables, the LogTablet is both the primary table data and the log data. For PrimaryKey tables, the LogTablet acts
+A **LogTablet** needs to be generated for each bucket of Log and Primary Key Tables.
+For Log Tables, the LogTablet is both the primary table data and the log data. For Primary Key Tables, the LogTablet acts
 as the log data for the primary table data.
 - **Segment:** The smallest unit of log storage in the **LogTablet**. A segment consists of an **.index** file and a **.log** data file.
-  - **.index:** An `offset sparse index` that stores the mappings between the physical byte address in the message relative offset -> .log file.
+  - **.index:** An `offset sparse index` that maps message relative offsets to their corresponding physical byte addresses in the .log file.
   - **.log:** Compact arrangement of log data.
 
 ### KvTablet
-Each bucket of the PrimaryKey table needs to generate a KvTablet. Underlying, each KvTablet corresponds to an embedded RocksDB instance. RocksDB is an LSM (log structured merge) engine which helps KvTablet supports high-performance updates and lookup query.
-
-
+Each bucket of the Primary Key Table needs to generate a KvTablet. Underlying, each KvTablet corresponds to an embedded RocksDB instance. RocksDB is an LSM (log structured merge) engine which helps KvTablet support high-performance updates and lookup queries.
website/docs/table-design/table-types/log-table.md

Lines changed: 3 additions & 3 deletions
@@ -60,7 +60,7 @@ Log Tables in Fluss allow real-time data consumption, preserving the order of da
 ## Column Pruning
 
 Column pruning is a technique used to reduce the amount of data that needs to be read from storage by eliminating unnecessary columns from the query.
-Fluss supports column pruning for Log Tables and the changelog of PrimaryKey Tables, which can significantly improve query performance by reducing the amount of data that needs to be read from storage and lowering networking costs.
+Fluss supports column pruning for Log Tables and the changelog of Primary Key Tables, which can significantly improve query performance by reducing the amount of data that needs to be read from storage and lowering networking costs.
 
 What sets Fluss apart is its ability to apply **column pruning during streaming reads**, a capability that is both unique and industry-leading. This ensures that even in real-time streaming scenarios, only the required columns are processed, minimizing resource usage and maximizing efficiency.
 
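As a concrete illustration of what pruning means here (not part of the diff): projecting only the needed columns is what lets the source skip the rest, e.g. with hypothetical columns:

```sql title="Flink SQL"
-- Hypothetical sketch: only order_id and amount are read from storage and
-- shipped over the network; all other columns of the table are pruned.
SELECT order_id, amount FROM my_log_table;
```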
@@ -88,7 +88,7 @@ Additionally, compression is applied to each column independently, preserving th
 
 When compression is enabled:
 - For **Log Tables**, data is compressed by the writer on the client side, written in a compressed format, and decompressed by the log scanner on the client side.
-- For **PrimaryKey Table changelogs**, compression is performed server-side since the changelog is generated on the server.
+- For **Primary Key Table changelogs**, compression is performed server-side since the changelog is generated on the server.
 
 Log compression significantly reduces networking and storage costs. Benchmark results demonstrate that using the ZSTD compression with level 3 achieves a compression ratio of approximately **5x** (e.g., reducing 5GB of data to 1GB).
 Furthermore, read/write throughput improves substantially due to reduced networking overhead.
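For context, the page configures the codec and level through table options (the next hunk's header references an `LZ4_FRAME` example). A hedged sketch; both option keys below are assumptions rather than text from this commit:

```sql title="Flink SQL"
-- Hedged sketch with assumed option keys for the ZSTD/level-3 setup that the
-- benchmark above describes; verify the exact keys against the Fluss docs.
CREATE TABLE my_compressed_log_table (
  order_id BIGINT,
  amount INT
) WITH (
  'table.log.arrow.compression.type' = 'ZSTD',
  'table.log.arrow.compression.zstd-level' = '3'
);
```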
@@ -131,4 +131,4 @@ In the above example, we set the compression codec to `LZ4_FRAME` and the compre
 :::
 
 ## Log Tiering
-Log Table supports tiering data to different storage tiers. See more details about [Remote Log](maintenance/tiered-storage/remote-storage.md).
\ No newline at end of file
+Log Table supports tiering data to different storage tiers. See more details about [Remote Log](maintenance/tiered-storage/remote-storage.md).
Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 {
-  "label": "PrimaryKey Table",
+  "label": "Primary Key Table",
   "position": 1
 }

website/docs/table-design/table-types/pk-table/index.md

Lines changed: 17 additions & 17 deletions
@@ -1,5 +1,5 @@
 ---
-title: PrimaryKey Table
+title: Primary Key Table
 sidebar_position: 1
 ---
 
@@ -19,15 +19,15 @@ sidebar_position: 1
 limitations under the License.
 -->
 
-# PrimaryKey Table
+# Primary Key Table
 
 ## Basic Concept
 
-PrimaryKey Table in Fluss ensure the uniqueness of the specified primary key and supports `INSERT`, `UPDATE`,
+Primary Key Table in Fluss ensures the uniqueness of the specified primary key and supports `INSERT`, `UPDATE`,
 and `DELETE` operations.
 
-A PrimaryKey Table is created by specifying a `PRIMARY KEY` clause in the `CREATE TABLE` statement. For example, the
-following Flink SQL statement creates a PrimaryKey Table with `shop_id` and `user_id` as the primary key and distributes
+A Primary Key Table is created by specifying a `PRIMARY KEY` clause in the `CREATE TABLE` statement. For example, the
+following Flink SQL statement creates a Primary Key Table with `shop_id` and `user_id` as the primary key and distributes
 the data into 4 buckets:
 
 ```sql title="Flink SQL"
@@ -47,13 +47,13 @@ In Fluss primary key table, each row of data has a unique primary key.
 If multiple entries with the same primary key are written to the Fluss primary key table, only the last entry will be
 retained.
 
-For [Partitioned PrimaryKey Table](table-design/data-distribution/partitioning.md), the primary key must contain the
+For [Partitioned Primary Key Table](table-design/data-distribution/partitioning.md), the primary key must contain the
 partition key.
 
 ## Bucket Assigning
 
 For primary key tables, Fluss always determines which bucket the data belongs to based on the hash value of the bucket
-key (It must be a subset of the primary keys excluding partition keys of the primary key table) for each record. If the bucket key is not specified, the bucket key will used as the primary key (excluding the partition key).
+key (It must be a subset of the primary keys excluding partition keys of the primary key table) for each record. If the bucket key is not specified, the bucket key will be used as the primary key (excluding the partition key).
 Data with the same hash value will be distributed to the same bucket.
 
 ## Partial Update
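The Partial Update section this hunk ends on is conventionally illustrated by writing only a subset of columns; a hedged sketch reusing the table from earlier hunks (values are arbitrary):

```sql title="Flink SQL"
-- Hypothetical sketch: writing the primary key plus total_amount updates that
-- column for the matching key while leaving the other columns untouched.
INSERT INTO my_pk_table (shop_id, user_id, total_amount)
VALUES (1234, 5678, 3);
```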
@@ -92,20 +92,20 @@ follows:
 
 ## Merge Engines
 
-The **Merge Engine** in Fluss is a core component designed to efficiently handle and consolidate data updates for PrimaryKey Tables.
+The **Merge Engine** in Fluss is a core component designed to efficiently handle and consolidate data updates for Primary Key Tables.
 It offers users the flexibility to define how incoming data records are merged with existing records sharing the same primary key.
-However, users can specify a different merge engine to customize the merging behavior according to their specific use cases
+However, users can specify a different merge engine to customize the merging behavior according to their specific use cases.
 
 The following merge engines are supported:
 
-1. [Default Merge Engine (LastRow)](table-design/table-types/pk-table/merge-engines/default.md)
-2. [FirstRow Merge Engine](table-design/table-types/pk-table/merge-engines/first-row.md)
-3. [Versioned Merge Engine](table-design/table-types/pk-table/merge-engines/versioned.md)
+1. [Default Merge Engine (LastRow)](merge-engines/default.md)
+2. [FirstRow Merge Engine](merge-engines/first-row.md)
+3. [Versioned Merge Engine](merge-engines/versioned.md)
 
 
 ## Changelog Generation
 
-Fluss will capture the changes when inserting, updating, deleting records on the primary-key table, which is known as
+Fluss will capture the changes when inserting, updating, deleting records on the Primary Key Table, which is known as
 the changelog. Downstream consumers can directly consume the changelog to obtain the changes in the table. For example,
 consider the following primary key table in Fluss:
 
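For context on selecting one of the engines listed above, a hedged sketch; the `'table.merge-engine'` option key and the `'first_row'` value are assumptions, not taken from this diff:

```sql title="Flink SQL"
-- Hedged sketch; the option key and value are assumptions for illustration.
CREATE TABLE my_first_row_table (
  user_id BIGINT,
  name STRING,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'table.merge-engine' = 'first_row'
);
```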
@@ -119,7 +119,7 @@ CREATE TABLE T
 );
 ```
 
-If the data written to the primary-key table is
+If the data written to the Primary Key Table is
 sequentially `+I(1, 2.0, 'apple')`, `+I(1, 4.0, 'banana')`, `-D(1, 4.0, 'banana')`, then the following change data will
 be generated. For example, the following Flink SQL statements illustrate this behavior:
 
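As a hedged walk-through of that sequence: under the default last-row merging, the expected changelog is `+I(1, 2.0, 'apple')`, then the update pair `-U(1, 2.0, 'apple')` / `+U(1, 4.0, 'banana')`, and finally `-D(1, 4.0, 'banana')`.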
@@ -162,13 +162,13 @@ For primary key tables, Fluss supports various kinds of querying abilities.
 For a primary key table, the default read method is a full snapshot followed by incremental data. First, the
 snapshot data of the table is consumed, followed by the changelog data of the table.
 
-It is also possible to only consume the changelog data of the table. For more details, please refer to the [Flink Reads](engine-flink/reads.md)
+It is also possible to only consume the changelog data of the table. For more details, please refer to the [Flink Reads](../../../engine-flink/reads.md)
 
 ### Lookup
 
-Fluss primary key table can lookup data by the primary keys. If the key exists in Fluss, lookup will return a unique row. it always used in [Flink Lookup Join](engine-flink/lookups.md#lookup).
+Fluss primary key table can lookup data by the primary keys. If the key exists in Fluss, lookup will return a unique row. It is always used in [Flink Lookup Join](../../../engine-flink/lookups.md#lookup).
 
 ### Prefix Lookup
 
 Fluss primary key table can also do prefix lookup by the prefix subset primary keys. Unlike lookup, prefix lookup
-will scan data based on the prefix of primary keys and may return multiple rows. It always used in [Flink Prefix Lookup Join](engine-flink/lookups.md#prefix-lookup).
+will scan data based on the prefix of primary keys and may return multiple rows. It is always used in [Flink Prefix Lookup Join](../../../engine-flink/lookups.md#prefix-lookup).
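As context for the Lookup section above, a hedged sketch of a Flink lookup join against a Fluss primary key table; the `orders`/`customers` tables and their columns are illustrative, and `proc_time` is assumed to be a processing-time attribute:

```sql title="Flink SQL"
-- Hedged sketch: each incoming order row looks up the matching customer row
-- in the Fluss primary key table by its primary key.
SELECT o.order_id, c.name
FROM orders AS o
JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c
  ON o.customer_id = c.customer_id;
```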
