Commit cc47b85

docs: revise docs for multiple key support
1 parent 09e8584 commit cc47b85

3 files changed (+18, -12 lines changed)

docs/docs/core/basics.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ Each piece of data has a **data type**, falling into one of the following catego
2323

2424
* *Basic type*.
2525
* *Struct type*: a collection of **fields**, each with a name and a type.
26-
* *Table type*: a collection of **rows**, each of which is a struct with specified schema. A table type can be a *KTable* (which has a key field) or a *LTable* (ordered but without key field).
26+
* *Table type*: a collection of **rows**, each of which is a struct with specified schema. A table type can be a *KTable* (with key columns that uniquely identify each row) or a *LTable* (rows are ordered but without keys).
2727

2828
An indexing flow always has a top-level struct, containing all data within and managed by the flow.
2929

docs/docs/core/data_types.mdx

Lines changed: 16 additions & 10 deletions
@@ -148,21 +148,27 @@ We have two specific types of *Table* types: *KTable* and *LTable*.
148148

149149
#### KTable
150150

151-
*KTable* is a *Table* type whose first column serves as the key.
151+
*KTable* is a *Table* type whose one or more columns together serve as the key.
152152
The row order of a *KTable* is not preserved.
153-
Type of the first column (key column) must be a [key type](#key-types).
153+
Each key column must be a [key type](#key-types). When multiple key columns are present, they form a composite key.
154154

155-
In Python, a *KTable* type is represented by `dict[K, V]`.
156-
The `K` should be the type binding to a key type,
157-
and the `V` should be the type binding to a *Struct* type representing the value fields of each row.
158-
When the specific type annotation is not provided,
159-
the key type is bound to a tuple with its key parts when it's a *Struct* type, the value type is bound to `dict[str, Any]`.
155+
In Python, a *KTable* type is represented by `dict[K, V]`.
156+
`K` represents the key and `V` represents the value for each row:
157+
158+
- `K` can be a Struct type (either a frozen dataclass or a `NamedTuple`) that contains all key parts as fields. This is the general way to model multi-part keys.
159+
- When there is only a single key part and it is a basic type (e.g. `str`, `int`), you may use that basic type directly as the dictionary key instead of wrapping it in a Struct.
160+
- `V` should be the type bound to a *Struct* representing the non-key value fields of each row.
161+
162+
When a specific type annotation is not provided:
163+
- For composite keys (multiple key parts), the key binds to a Python tuple of the key parts, e.g. `tuple[str, str]`.
164+
- For a single basic key part, the key binds to that basic Python type.
165+
- The value binds to `dict[str, Any]`.
160166

161167

162168
For example, you can use `dict[str, Person]` or `dict[str, PersonTuple]` to represent a *KTable*, with 4 columns: key (*Str*), `first_name` (*Str*), `last_name` (*Str*), `dob` (*Date*).
163169
It's bound to `dict[str, dict[str, Any]]` if you don't annotate the function argument with a specific type.
164170

165-
Note that if you want to use a *Struct* as the key, you need to ensure its value in Python is immutable. For `dataclass`, annotate it with `@dataclass(frozen=True)`. For `NamedTuple`, immutability is built-in. For example:
171+
Note that when using a Struct as the key, it must be immutable in Python. For a dataclass, annotate it with `@dataclass(frozen=True)`. For `NamedTuple`, immutability is built-in. For example:
166172

167173
```python
168174
@dataclass(frozen=True)
@@ -175,8 +181,8 @@ class PersonKeyTuple(NamedTuple):
175181
id: str
176182
```
177183

178-
Then you can use `dict[PersonKey, Person]` or `dict[PersonKeyTuple, PersonTuple]` to represent a KTable keyed by `PersonKey` or `PersonKeyTuple`.
179-
It's bound to `dict[(str, str), dict[str, Any]]` if you don't annotate the function argument with a specific type.
184+
Then you can use `dict[PersonKey, Person]` or `dict[PersonKeyTuple, PersonTuple]` to represent a KTable keyed by both `id_kind` and `id`.
185+
If you don't annotate the function argument with a specific type, it's bound to `dict[tuple[str, str], dict[str, Any]]`.
180186

181187

182188
#### LTable
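
Reviewer sketch for the *KTable* changes in `docs/docs/core/data_types.mdx` above: a minimal, plain-Python illustration of the key shapes the revised text describes. It uses only the standard library and does not call any library API. The field names `id_kind` and `id` come from the revised sentence in the diff; the `Person` value fields follow the columns listed in the docs; the sample values and the variable names `people` and `raw` are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import date
from typing import Any, NamedTuple


# Composite key as a frozen dataclass: frozen=True makes instances
# immutable and hashable, so they can be used as dict keys.
@dataclass(frozen=True)
class PersonKey:
    id_kind: str
    id: str


# Same composite key as a NamedTuple (immutable by construction).
class PersonKeyTuple(NamedTuple):
    id_kind: str
    id: str


# Value struct holding only the non-key fields of each row.
@dataclass
class Person:
    first_name: str
    last_name: str
    dob: date


# Annotated shape: dict[K, V] with a Struct key.
people: dict[PersonKey, Person] = {
    PersonKey("passport", "X123"): Person("Ada", "Lovelace", date(1815, 12, 10)),
}

# Unannotated shape described in the diff:
# dict[tuple[str, str], dict[str, Any]]
raw: dict[tuple[str, str], dict[str, Any]] = {
    ("passport", "X123"): {
        "first_name": "Ada",
        "last_name": "Lovelace",
        "dob": date(1815, 12, 10),
    },
}

# Attempting to mutate a frozen key raises FrozenInstanceError.
key = PersonKey("passport", "X123")
try:
    key.id = "Y999"  # type: ignore[misc]
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

A `NamedTuple` instance compares and hashes like a plain tuple with the same contents, so `raw[PersonKeyTuple("passport", "X123")]` finds the same row as `raw[("passport", "X123")]`. A frozen dataclass key does not have that property and only matches another `PersonKey` with equal fields.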

docs/docs/getting_started/quickstart.md

Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@ Notes:
105105
* `chunk`, representing each row of `chunks`.
106106

107107
3. A *data source* extracts data from an external source.
108-
In this example, the `LocalFile` data source imports local files as a KTable (table with a key field, see [KTable](../core/data_types#ktable) for details), each row has `"filename"` and `"content"` fields.
108+
In this example, the `LocalFile` data source imports local files as a KTable (table with key columns, see [KTable](../core/data_types#ktable) for details), each row has `"filename"` and `"content"` fields.
109109

110110
4. After defining the KTable, we extend a new field `"chunks"` to each row by *transforming* the `"content"` field using `SplitRecursively`. The output of the `SplitRecursively` is also a KTable representing each chunk of the document, with `"location"` and `"text"` fields.
111111
