Commit a304fef

docs: simplify data types docs - general typed annotation always allowed (#790)
1 parent 6bc61ab commit a304fef

1 file changed: +17 -9 lines changed

docs/docs/core/data_types.mdx

Lines changed: 17 additions & 9 deletions
@@ -21,12 +21,13 @@ All you need to do is to make sure the data passed to functions and targets are
 Each type in CocoIndex type system is mapped to one or multiple types in Python.
 When you define a [custom function](/docs/core/custom_function), you need to annotate the data types of arguments and return values.

-* When you pass a Python value to the engine (e.g. return values of a custom function), type annotation is required,
-as it provides the ground truth of the data type in the flow.
+* When you pass a Python value to the engine (e.g. return values of a custom function), a specific type annotation is required.
+The type annotation needs to be specific in describing the target data type, as it provides the ground truth of the data type in the flow.

 * When you use a Python variable to bind to an engine value (e.g. arguments of a custom function),
-we use the type annotation as a guidance to construct the Python value.
-Type annotation is optional for basic types and struct types, and required for table types.
+the engine already knows the specific data type, so we don't require a specific type annotation, e.g. type annotations can be omitted, or you can use `Any` at any level.
+When a specific type annotation is provided, it's still used as a guidance to construct the Python value with compatible type.
+Otherwise, we will bind to a default Python type.

 ### Basic Types

@@ -54,7 +55,7 @@ This is the list of all primitive types supported by CocoIndex:
 Notes:

 * For some CocoIndex types, we support multiple Python types. You can annotate with any of these Python types.
-The first one is the default type, i.e. CocoIndex will create a value with this type when the type annotation is not provided (e.g. for arguments of a custom function).
+The first one is the default type, i.e. CocoIndex will create a value with this type when a specific type annotation is not provided (e.g. for arguments of a custom function).

 * All Python types starting with `cocoindex.` are type aliases exported by CocoIndex. They're annotated types based on certain Python types:

@@ -136,7 +137,7 @@ Both `Person` and `PersonTuple` are valid Struct types in CocoIndex, with identi
 Choose `dataclass` for mutable objects or when you need additional methods, and `NamedTuple` for immutable, lightweight structures.

 Besides, for arguments of custom functions, CocoIndex also supports using dictionaries (`dict[str, Any]`) to represent a *Struct* type.
-It's the default Python type if you don't annotate the function argument.
+It's the default Python type if you don't annotate the function argument with a specific type.

 ### Table Types

@@ -152,11 +153,16 @@ The row order of a *KTable* is not preserved.
 Type of the first column (key column) must be a [key type](#key-types).

 In Python, a *KTable* type is represented by `dict[K, V]`.
-The `V` should be a *Struct* type, either a `dataclass` or `NamedTuple`, representing the value fields of each row.
+The `K` should be the type binding to a key type,
+and the `V` should be the type binding to a *Struct* type representing the value fields of each row.
+When the specific type annotation is not provided,
+the key type is bound to a tuple with its key parts when it's a *Struct* type, the value type is bound to `dict[str, Any]`.
+
+
 For example, you can use `dict[str, Person]` or `dict[str, PersonTuple]` to represent a *KTable*, with 4 columns: key (*Str*), `first_name` (*Str*), `last_name` (*Str*), `dob` (*Date*).
+It's bound to `dict[str, dict[str, Any]]` if you don't annotate the function argument with a specific type.

 Note that if you want to use a *Struct* as the key, you need to ensure its value in Python is immutable. For `dataclass`, annotate it with `@dataclass(frozen=True)`. For `NamedTuple`, immutability is built-in. For example:
-For example:

 ```python
 @dataclass(frozen=True)
@@ -170,14 +176,16 @@ class PersonKeyTuple(NamedTuple):
 ```

 Then you can use `dict[PersonKey, Person]` or `dict[PersonKeyTuple, PersonTuple]` to represent a KTable keyed by `PersonKey` or `PersonKeyTuple`.
+It's bound to `dict[(str, str), dict[str, Any]]` if you don't annotate the function argument with a specific type.


 #### LTable

 *LTable* is a *Table* type whose row order is preserved. *LTable* has no key column.

-In Python, a *LTable* type is represented by `list[R]`, where `R` is a dataclass representing a row.
+In Python, a *LTable* type is represented by `list[R]`, where `R` is the type binding to the *Struct* type representing the value fields of each row.
 For example, you can use `list[Person]` to represent a *LTable* with 3 columns: `first_name` (*Str*), `last_name` (*Str*), `dob` (*Date*).
+It's bound to `list[dict[str, Any]]` if you don't annotate the function argument with a specific type.

 ## Key Types

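Reader's note: the default bindings described in this change can be pictured with a small plain-Python sketch. It does not call CocoIndex. The functions are ordinary Python functions and the sample values are built by hand in the shapes the docs describe, so it only illustrates what an argument would look like with and without a specific annotation. The `Person` fields follow the columns named in the docs; the function names, the sample row, and the two parts of the tuple key are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


@dataclass
class Person:
    first_name: str
    last_name: str
    dob: date


def names_from_typed_ktable(people: dict[str, Person]) -> list[str]:
    """Specific annotation: each row value is a Person instance."""
    return [f"{p.first_name} {p.last_name}" for p in people.values()]


def names_from_default_ktable(people: Any) -> list[str]:
    """No specific annotation: each row value is a dict[str, Any]."""
    return [f"{row['first_name']} {row['last_name']}" for row in people.values()]


def names_from_default_ltable(rows: Any) -> list[str]:
    """No specific annotation: an LTable is a list[dict[str, Any]]."""
    return [f"{row['first_name']} {row['last_name']}" for row in rows]


# Hand-built sample values (assumed shapes, not produced by the engine).
ada = {"first_name": "Ada", "last_name": "Lovelace", "dob": date(1815, 12, 10)}
typed_ktable = {"p1": Person(**ada)}        # dict[str, Person]
default_ktable = {"p1": ada}                # dict[str, dict[str, Any]]
default_struct_keyed = {("gb", "p1"): ada}  # Struct key as a tuple of its parts
default_ltable = [ada]                      # list[dict[str, Any]]
```

In this sketch the only difference on the consuming side is attribute access (`p.first_name`) versus key access (`row['first_name']`); the tuple-keyed table can be read by the same default-binding function because only the row values are inspected.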