docs/docs/core/basics.md
Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ Each piece of data has a **data type**, falling into one of the following catego
 
 * *Basic type*.
 * *Struct type*: a collection of **fields**, each with a name and a type.
-* *Table type*: a collection of **rows**, each of which is a struct with specified schema. A table type can be a *KTable* (which has a key field) or a *LTable* (ordered but without key field).
+* *Table type*: a collection of **rows**, each of which is a struct with specified schema. A table type can be a *KTable* (with key columns that uniquely identify each row) or a *LTable* (rows are ordered but without keys).
 
 An indexing flow always has a top-level struct, containing all data within and managed by the flow.
docs/docs/core/data_types.mdx
Lines changed: 16 additions & 10 deletions
@@ -148,21 +148,27 @@ We have two specific types of *Table* types: *KTable* and *LTable*.
 
 #### KTable
 
-*KTable* is a *Table* type whose first column serves as the key.
+*KTable* is a *Table* type whose one or more columns together serve as the key.
 The row order of a *KTable* is not preserved.
-Type of the first column (key column) must be a [key type](#key-types).
+Each key column must be a [key type](#key-types). When multiple key columns are present, they form a composite key.
 
-In Python, a *KTable* type is represented by `dict[K, V]`.
-The `K` should be the type binding to a key type,
-and the `V` should be the type binding to a *Struct* type representing the value fields of each row.
-When the specific type annotation is not provided,
-the key type is bound to a tuple with its key parts when it's a *Struct* type, the value type is bound to `dict[str, Any]`.
+In Python, a *KTable* type is represented by `dict[K, V]`.
+`K` represents the key and `V` represents the value for each row:
+
+- `K` can be a Struct type (either a frozen dataclass or a `NamedTuple`) that contains all key parts as fields. This is the general way to model multi-part keys.
+- When there is only a single key part and it is a basic type (e.g. `str`, `int`), you may use that basic type directly as the dictionary key instead of wrapping it in a Struct.
+- `V` should be the type bound to a *Struct* representing the non-key value fields of each row.
+
+When a specific type annotation is not provided:
+- For composite keys (multiple key parts), the key binds to a Python tuple of the key parts, e.g. `tuple[str, str]`.
+- For a single basic key part, the key binds to that basic Python type.
+- The value binds to `dict[str, Any]`.
 
 
 For example, you can use `dict[str, Person]` or `dict[str, PersonTuple]` to represent a *KTable*, with 4 columns: key (*Str*), `first_name` (*Str*), `last_name` (*Str*), `dob` (*Date*).
 It's bound to `dict[str, dict[str, Any]]` if you don't annotate the function argument with a specific type.
 
-Note that if you want to use a *Struct* as the key, you need to ensure its value in Python is immutable. For `dataclass`, annotate it with `@dataclass(frozen=True)`. For `NamedTuple`, immutability is built-in. For example:
+Note that when using a Struct as the key, it must be immutable in Python. For a dataclass, annotate it with `@dataclass(frozen=True)`. For `NamedTuple`, immutability is built-in. For example:
 
 ```python
 @dataclass(frozen=True)
@@ -175,8 +181,8 @@ class PersonKeyTuple(NamedTuple):
     id: str
 ```
 
-Then you can use `dict[PersonKey, Person]` or `dict[PersonKeyTuple, PersonTuple]` to represent a KTable keyed by `PersonKey` or `PersonKeyTuple`.
-It's bound to `dict[(str, str), dict[str, Any]]` if you don't annotate the function argument with a specific type.
+Then you can use `dict[PersonKey, Person]` or `dict[PersonKeyTuple, PersonTuple]` to represent a KTable keyed by both `id_kind` and `id`.
+If you don't annotate the function argument with a specific type, it's bound to `dict[tuple[str, str], dict[str, Any]]`.
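
For illustration, the composite-key modeling described in this change can be sketched in plain Python, without any CocoIndex API. The field order of `PersonKey`, the non-frozen `Person` value class, and the sample data are assumptions made for this sketch; the diff itself only names the `id_kind`, `id`, `first_name`, `last_name`, and `dob` fields.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import date
from typing import Any


@dataclass(frozen=True)
class PersonKey:
    """Composite key. frozen=True makes instances immutable and hashable,
    which is what lets them serve as dict keys."""
    id_kind: str  # assumed field order: id_kind first, then id
    id: str


@dataclass
class Person:
    """Non-key value fields of each row (assumed shape for this sketch)."""
    first_name: str
    last_name: str
    dob: date


# Annotated form: the KTable appears as dict[PersonKey, Person].
people: dict[PersonKey, Person] = {
    PersonKey("passport", "X123"): Person("Ada", "Lovelace", date(1815, 12, 10)),
}

# Unannotated form: key parts arrive as a tuple, value fields as a plain dict.
people_untyped: dict[tuple[str, str], dict[str, Any]] = {
    ("passport", "X123"): {
        "first_name": "Ada",
        "last_name": "Lovelace",
        "dob": date(1815, 12, 10),
    },
}

# Lookups build an equal key; equality and hashing are field-based.
ada = people[PersonKey("passport", "X123")]


def try_mutate(key: PersonKey) -> bool:
    """Return True if assigning to a key field is rejected, as expected for a frozen dataclass."""
    try:
        key.id = "changed"  # type: ignore[misc]
    except FrozenInstanceError:
        return True
    return False
```

A frozen dataclass keeps named access to each key part, while the tuple form is what you get when no annotation tells the binding which Struct type to construct.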
docs/docs/getting_started/quickstart.md
Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@ Notes:
 * `chunk`, representing each row of `chunks`.
 
 3. A *data source* extracts data from an external source.
-In this example, the `LocalFile` data source imports local files as a KTable (table with a key field, see [KTable](../core/data_types#ktable) for details), each row has `"filename"` and `"content"` fields.
+In this example, the `LocalFile` data source imports local files as a KTable (table with key columns, see [KTable](../core/data_types#ktable) for details), each row has `"filename"` and `"content"` fields.
 
 4. After defining the KTable, we extend a new field `"chunks"` to each row by *transforming* the `"content"` field using `SplitRecursively`. The output of the `SplitRecursively` is also a KTable representing each chunk of the document, with `"location"` and `"text"` fields.
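
As a rough picture of the nested shape these notes describe, the sketch below models the file table and the per-row `chunks` table as plain Python dicts. It does not call CocoIndex: `split_fixed` is a hypothetical stand-in for `SplitRecursively`, and treating `"filename"` and `"location"` as the key columns, with a location modeled as a `(start, end)` tuple, is an assumption of this sketch.

```python
from typing import Any


def split_fixed(text: str, size: int) -> dict[tuple[int, int], dict[str, Any]]:
    """Hypothetical chunker: fixed-size slices keyed by their (start, end) offsets."""
    return {
        (start, min(start + size, len(text))): {"text": text[start:start + size]}
        for start in range(0, len(text), size)
    }


# Outer table: one row per file, keyed by filename (assumed key column).
documents: dict[str, dict[str, Any]] = {
    "notes.md": {"content": "KTables are keyed."},
}

# Extend each row with a nested "chunks" table derived from its "content" field.
for row in documents.values():
    row["chunks"] = split_fixed(row["content"], size=8)

chunks = documents["notes.md"]["chunks"]
```

Because both levels are keyed tables, a specific chunk is addressed by a filename plus a location rather than by its position in a list.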