* The `cocoindex.op.function()` function decorator also takes optional parameters.
35
+
See [Parameters for custom functions](#parameters-for-custom-functions) for details.
36
+
* Types of arguments and the return value must be annotated, so that CocoIndex has information about the data types of the operation's output fields.
37
+
See [Data Types](/docs/core/data_types) for supported types.
38
+
39
+
</TabItem>
40
+
</Tabs>
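As a minimal sketch of this option, the snippet below registers a plain, fully annotated Python function as a custom function. The names `op_function` and `file_extension` are hypothetical and exist only for this illustration. The `try`/`except` fallback is an assumption made so that the sketch also runs where CocoIndex is not installed; in a real flow you would simply write `@cocoindex.op.function()` above the `def`.

```python
import os

try:
    import cocoindex

    op_function = cocoindex.op.function
except ImportError:
    # Assumption: stand-in decorator factory, used only so this sketch runs
    # without CocoIndex installed. It returns the function unchanged.
    def op_function(**_kwargs):
        def decorate(fn):
            return fn

        return decorate


def file_extension(filename: str) -> str:
    """Return the extension of a file name, including the leading dot."""
    # Both the argument and the return value are annotated, which is what
    # lets the output field's data type be inferred.
    return os.path.splitext(filename)[1]


# Same effect as placing `@cocoindex.op.function()` above the `def`.
extract_extension = op_function()(file_extension)
```

The undecorated `file_extension` stays available, which makes it easy to unit-test the logic independently of any flow.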
41
+
42
+
### Examples
43
+
44
+
The cocoindex repository contains the following examples of custom functions defined in this way:
45
+
46
+
* In the [code_embedding](https://github.com/cocoindex-io/cocoindex/blob/main/examples/code_embedding/main.py) example,
47
+
`extract_extension` is a custom function to extract the extension of a file name.
48
+
* In the [manuals_llm_extraction](https://github.com/cocoindex-io/cocoindex/blob/main/examples/manuals_llm_extraction/main.py) example,
49
+
`summarize_manuals` is a custom function to summarize structured information of a manual page.
50
+
51
+
52
+
## Option 2: By a function spec and an executor
53
+
54
+
This is a more advanced and flexible way to define a custom function.
55
+
It allows a function to be configured through the function spec, and allows preparation logic to run before execution, e.g. initializing a model based on the spec.
56
+
57
+
### Function Spec
58
+
59
+
The function spec of a function configures the behavior of a specific instance of the function.
17
60
When you use this function in a flow (typically via [`transform()`](/docs/core/flow_def#transform)), you instantiate this function spec with specific parameter values.
18
61
19
62
<Tabs>
@@ -22,8 +65,7 @@ When you use this function in a flow (typically by a [`transform()`](/docs/core/
22
65
A function spec is defined as a class that inherits from `cocoindex.op.FunctionSpec`.
23
66
24
67
```python
25
-
26
-
class DemoFunctionSpec(cocoindex.op.FunctionSpec):
68
+
class ComputeSomething(cocoindex.op.FunctionSpec):
27
69
    """
28
70
    Documentation for the function.
29
71
    """
@@ -40,69 +82,97 @@ Notes:
40
82
</Tabs>
41
83
42
84
43
-
## Function Executor
85
+
### Function Executor
44
86
45
87
A function executor defines the behavior of a function. It's instantiated for each operation that uses this function.
46
88
47
89
The function executor is responsible for:
48
90
49
-
* **Prepare** for the function execution, based on the spec. It happens once and only once before execution. e.g. if the function calls a machine learning model, the model name can be a parameter as a field of the spec, and we may load the model in this phase.
50
-
* **Run** the function, for each specific input arguments. This happens multiple times, for each specific rows of data.
91
+
* **Prepare** for the function execution, based on the spec.
92
+
It happens once and only once before execution.
93
+
For example, if the function calls a machine learning model, the model name can be a field of the spec, and the model can be loaded in this phase.
94
+
* **Run** the function, for each specific set of input arguments. This happens multiple times, once for each row of data.
51
95
52
96
<Tabs>
53
97
<TabItem value="python" label="Python" default>
54
98
55
-
A function executor is defined as a class annotated by `@cocoindex.op.executor_class()`.
99
+
A function executor is defined as a class decorated by `@cocoindex.op.executor_class()`.
* The `cocoindex.op.executor_class()` class decorator also takes optional parameters.
118
+
See [Parameters for custom functions](#parameters-for-custom-functions) for details.
92
119
93
120
* A `spec` field must be present in the class, and must be annotated with the spec class name.
94
121
* The `prepare()` method is optional. It's executed once and only once before any `__call__` execution, to prepare the function execution.
95
122
* The `__call__()` method is required. It's executed for each specific row of data.
96
-
Types of arguments and the return value must be annotated, so that CocoIndex will have information about data types of the operation's output fields. See [Data Types](/docs/core/data_types) for supported types.
123
+
Types of arguments and the return value must be annotated, so that CocoIndex has information about the data types of the operation's output fields.
124
+
See [Data Types](/docs/core/data_types) for supported types.
97
125
98
126
</TabItem>
99
127
</Tabs>
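The prepare-once, run-per-row lifecycle described above can be sketched in plain Python. This is an illustration under stated assumptions, not CocoIndex's implementation: `AddPrefix`, `AddPrefixExecutor`, and the `run_rows` driver are hypothetical names, the spec is a plain dataclass instead of a `cocoindex.op.FunctionSpec` subclass, and the executor is left undecorated so the sketch runs without CocoIndex installed.

```python
from dataclasses import dataclass


@dataclass
class AddPrefix:
    """Spec. In CocoIndex this would inherit from cocoindex.op.FunctionSpec."""

    prefix: str


class AddPrefixExecutor:
    """Executor. In CocoIndex this class would be decorated by
    @cocoindex.op.executor_class()."""

    spec: AddPrefix  # the `spec` field, annotated with the spec class

    def __init__(self, spec: AddPrefix) -> None:
        # Hypothetical constructor for this sketch only; it just stores the spec.
        self.spec = spec
        self.prepare_calls = 0

    def prepare(self) -> None:
        # One-time setup derived from the spec (stands in for loading a model).
        self.prepare_calls += 1
        self._normalized_prefix = self.spec.prefix.strip() + ": "

    def __call__(self, text: str) -> str:
        # Runs once per row; argument and return types are annotated.
        return self._normalized_prefix + text


def run_rows(executor: AddPrefixExecutor, rows: list[str]) -> list[str]:
    """Hypothetical driver that mimics the lifecycle: prepare once, then call per row."""
    executor.prepare()
    return [executor(row) for row in rows]


executor = AddPrefixExecutor(AddPrefix(prefix=" note "))
results = run_rows(executor, ["first", "second"])
```

Keeping the expensive setup in `prepare()` means `__call__()` only does per-row work, which matters when the same executor processes many rows.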
100
128
101
-
## Examples
129
+
### Examples
102
130
103
-
The cocoindex repository contains the following examples of custom functions:
131
+
The cocoindex repository contains the following examples of custom functions defined in this way:
104
132
105
-
* In the [pdf_embedding](https://github.com/cocoindex-io/cocoindex/blob/main/examples/pdf_embedding/pdf_embedding.py) example, we define a custom function `PdfToMarkdown`
133
+
* In the [pdf_embedding](https://github.com/cocoindex-io/cocoindex/blob/main/examples/pdf_embedding/main.py) example, we define a custom function `PdfToMarkdown`.
106
134
* The `SentenceTransformerEmbed` function shipped with the CocoIndex Python package is defined using the Python SDK.
107
-
Search for [`SentenceTransformerEmbedExecutor`](https://github.com/search?q=repo%3Acocoindex-io%2Fcocoindex+SentenceTransformerEmbedExecutor&type=code) to see the code.
108
-
135
+
Search for [`SentenceTransformerEmbedExecutor`](https://github.com/search?q=repo%3Acocoindex-io%2Fcocoindex+lang%3Apython+SentenceTransformerEmbedExecutor&type=code) to see the code.
136
+
137
+
## Parameters for custom functions
138
+
139
+
Custom functions take the following additional parameters:
140
+
141
+
* `gpu: bool`: Whether the executor will use GPU. It will affect the way the function is scheduled.
142
+
143
+
* `cache: bool`: Whether the executor will enable cache for this function.
144
+
When `True`, the executor will cache the result of the function for reuse during reprocessing.
145
+
We recommend setting this to `True` for any function that is computationally intensive.
146
+
147
+
* `behavior_version: int`: The version of the behavior of the function.
148
+
When the version is changed, the function will be re-executed even if cache is enabled.
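
As a hedged sketch of how these parameters might be passed, the snippet below enables caching for a function and pins its behavior version. The names `op_function` and `content_fingerprint` are hypothetical, the hashing merely stands in for an expensive computation, and the `try`/`except` fallback is an assumption made so that the sketch also runs where CocoIndex is not installed.

```python
import hashlib

try:
    import cocoindex

    op_function = cocoindex.op.function
except ImportError:
    # Assumption: stand-in decorator factory, used only so this sketch runs
    # without CocoIndex installed. It ignores the parameters.
    def op_function(**_kwargs):
        def decorate(fn):
            return fn

        return decorate


def content_fingerprint(text: str) -> str:
    """Stand-in for an expensive computation: SHA-256 hex digest of the text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Same effect as `@cocoindex.op.function(cache=True, behavior_version=1)`
# placed above the `def`. Bump `behavior_version` whenever the logic changes,
# so that cached results from the old logic are not reused.
fingerprint = op_function(cache=True, behavior_version=1)(content_fingerprint)
```

If the digest algorithm were later changed, raising `behavior_version` to `2` would signal that previously cached results no longer reflect the function's behavior.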