Commit b1ab6d2

Update docs for custom functions: add standalone functions. (#149)
1 parent 85c6e58 commit b1ab6d2

1 file changed

+109
-39
lines changed

docs/docs/core/custom_function.mdx

Lines changed: 109 additions & 39 deletions
@@ -6,14 +6,57 @@ description: Build Custom Functions
66
import Tabs from '@theme/Tabs';
77
import TabItem from '@theme/TabItem';
88

9-
# Build Custom Functions
9+
A custom function can be defined in one of the following ways:
1010

11-
To build a custom function, you need to define a function spec and an executor.
11+
* A standalone function. It's simpler, but doesn't allow additional configuration or setup logic.
12+
* A function spec and an executor. It's more powerful, and allows additional configuration and setup logic.
1213

13-
## Function Spec
14+
## Option 1: By a standalone function
1415

15-
The function spec of a function defines the function's parameters.
16-
These parameters configures behavior of a specific instance of the function.
16+
It fits simple cases where the function doesn't need additional configuration or extra setup logic.
17+
18+
<Tabs>
19+
<TabItem value="python" label="Python" default>
20+
21+
The standalone function needs to be decorated by `@cocoindex.op.function()`, like this:
22+
23+
```python
24+
@cocoindex.op.function(...)
25+
def compute_something(arg1: str, arg2: int | None = None) -> str:
26+
"""
27+
Documentation for the function.
28+
"""
29+
...
30+
```
31+
32+
Notes:
33+
34+
* The `cocoindex.op.function()` function decorator also takes optional parameters.
35+
See [Parameters for custom functions](#parameters-for-custom-functions) for details.
36+
* Types of arguments and the return value must be annotated, so that CocoIndex will have information about data types of the operation's output fields.
37+
See [Data Types](/docs/core/data_types) for supported types.
38+
39+
</TabItem>
40+
</Tabs>
41+
42+
### Examples
43+
44+
The cocoindex repository contains the following examples of custom functions defined in this way:
45+
46+
* In the [code_embedding](https://github.com/cocoindex-io/cocoindex/blob/main/examples/code_embedding/main.py) example,
47+
`extract_extension` is a custom function to extract the extension of a file name.
48+
* In the [manuals_llm_extraction](https://github.com/cocoindex-io/cocoindex/blob/main/examples/manuals_llm_extraction/main.py) example,
49+
`summarize_manuals` is a custom function to summarize structured information of a manual page.
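
As a rough illustration of how small such a standalone function can be, here is a minimal sketch of an extension-extracting function. It is not copied from the repository; the body (using `os.path.splitext`) is an assumption about one reasonable way to write it. In a flow, the function would carry the `@cocoindex.op.function()` decorator described above; the decorator is left out here so the snippet runs without cocoindex installed.

```python
import os


def extract_extension(filename: str) -> str:
    """Return the extension of a file name, including the leading dot."""
    # In a flow, this function would be decorated with @cocoindex.op.function().
    # The argument and return type annotations are what the docs require.
    return os.path.splitext(filename)[1]


print(extract_extension("main.py"))         # .py
print(extract_extension("archive.tar.gz"))  # .gz
print(repr(extract_extension("README")))    # ''
```

Only the last suffix is treated as the extension, and a name without a dot yields an empty string.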
50+
51+
52+
## Option 2: By a function spec and an executor
53+
54+
This is a more advanced and flexible way to define a custom function.
55+
It allows a function to be configured with the function spec, and allows preparation logic before execution, e.g. initializing a model based on the spec.
56+
57+
### Function Spec
58+
59+
The function spec of a function configures behavior of a specific instance of the function.
1760
When you use this function in a flow (typically by a [`transform()`](/docs/core/flow_def#transform)), you instantiate this function spec, with specific parameter values.
1861

1962
<Tabs>
@@ -22,8 +65,7 @@ When you use this function in a flow (typically by a [`transform()`](/docs/core/
2265
A function spec is defined as a class that inherits from `cocoindex.op.FunctionSpec`.
2366

2467
```python
25-
26-
class DemoFunctionSpec(cocoindex.op.FunctionSpec):
68+
class ComputeSomething(cocoindex.op.FunctionSpec):
2769
"""
2870
Documentation for the function.
2971
"""
@@ -40,69 +82,97 @@ Notes:
4082
</Tabs>
4183

4284

43-
## Function Executor
85+
### Function Executor
4486

4587
A function executor defines behavior of a function. It's instantiated for each operation that uses this function.
4688

4789
The function executor is responsible for:
4890

49-
* *Prepare* for the function execution, based on the spec. It happens once and only once before execution. e.g. if the function calls a machine learning model, the model name can be a parameter as a field of the spec, and we may load the model in this phase.
50-
* *Run* the function, for each specific input arguments. This happens multiple times, for each specific rows of data.
91+
* *Prepare* for the function execution, based on the spec.
92+
It happens once and only once before execution.
93+
e.g. if the function calls a machine learning model, the model name can be a parameter as a field of the spec, and we may load the model in this phase.
94+
* *Run* the function, for each specific set of input arguments. This happens multiple times, once for each row of data.
5195

5296
<Tabs>
5397
<TabItem value="python" label="Python" default>
5498

55-
A function executor is defined as a class annotated by `@cocoindex.op.executor_class()`.
99+
A function executor is defined as a class decorated by `@cocoindex.op.executor_class()`.
56100

57101

58102
```python
59103
@cocoindex.op.executor_class(...)
60-
class DemoFunctionExecutor:
61-
spec: DemoFunctionSpec
104+
class ComputeSomethingExecutor:
105+
spec: ComputeSomething
62106
...
63107

64108
def prepare(self) -> None:
65109
...
66110

67-
def __call__(self, input_value: str) -> str:
111+
def __call__(self, arg1: str, arg2: int | None = None) -> str:
68112
...
69113
```
70114

71115
Notes:
72116

73-
* The `cocoindex.op.executor_class()` class decorator also takes the following optional arguments:
74-
75-
* `gpu: bool`: Whether the executor will use GPU. It will affect the way the function is scheduled.
76-
77-
* `cache: bool`: Whether the executor will enable cache for this function.
78-
When `True`, the executor will cache the result of the function for reuse during reprocessing.
79-
We recommend to set this to `True` for any function that is computationally intensive.
80-
81-
* `behavior_version: int`: The version of the behavior of the function.
82-
When the version is changed, the function will be re-executed even if cache is enabled.
83-
It's required to be set if `cache` is `True`.
84-
85-
For example, this enables cache for the function:
86-
87-
```python
88-
@cocoindex.op.executor_class(cache=True, behavior_version=1)
89-
class DemoFunctionExecutor:
90-
...
91-
```
117+
* The `cocoindex.op.executor_class()` class decorator also takes optional parameters.
118+
See [Parameters for custom functions](#parameters-for-custom-functions) for details.
92119

93120
* A `spec` field must be present in the class, and must be annotated with the spec class name.
94121
* The `prepare()` method is optional. It's executed once and only once before any `__call__` execution, to prepare the function execution.
95122
* The `__call__()` method is required. It's executed for each specific row of data.
96-
Types of arugments and the return value must be annotated, so that CocoIndex will have information about data types of the operation's output fields. See [Data Types](/docs/core/data_types) for supported types.
123+
Types of arguments and the return value must be annotated, so that CocoIndex will have information about data types of the operation's output fields.
124+
See [Data Types](/docs/core/data_types) for supported types.
97125

98126
</TabItem>
99127
</Tabs>
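
The prepare-once, call-per-row lifecycle described in this section can be sketched in plain Python. Everything here is a hypothetical stand-in: the `prefix` field, the `__init__` that receives the spec, the `prepare_calls` counter, and the `run_rows` driver are assumptions made for illustration, and none of it reflects how CocoIndex actually constructs or schedules executors.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ComputeSomething:
    """Hypothetical stand-in for a spec; a real one inherits cocoindex.op.FunctionSpec."""
    prefix: str


class ComputeSomethingExecutor:
    spec: ComputeSomething

    def __init__(self, spec: ComputeSomething) -> None:
        # Assumption: the spec is handed over at construction time.
        self.spec = spec
        self.prepare_calls = 0

    def prepare(self) -> None:
        # One-time setup derived from the spec (e.g. loading a model).
        self.prepare_calls += 1
        self._prefix = self.spec.prefix.upper()

    def __call__(self, arg1: str, arg2: int | None = None) -> str:
        # Per-row work reuses whatever prepare() set up.
        suffix = "" if arg2 is None else f"#{arg2}"
        return f"{self._prefix}:{arg1}{suffix}"


def run_rows(executor: ComputeSomethingExecutor, rows: list[tuple]) -> list[str]:
    """Hypothetical driver: prepare once, then call the executor for each row."""
    executor.prepare()
    return [executor(*row) for row in rows]


executor = ComputeSomethingExecutor(ComputeSomething(prefix="doc"))
results = run_rows(executor, [("a",), ("b", 2)])
print(results)  # ['DOC:a', 'DOC:b#2']
```

Keeping expensive setup in `prepare()` means its cost is paid once per operation rather than once per row.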
100128

101-
## Examples
129+
### Examples
102130

103-
The cocoindex repository contains the following examples of custom functions:
131+
The cocoindex repository contains the following examples of custom functions defined in this way:
104132

105-
* In the [pdf_embedding](https://github.com/cocoindex-io/cocoindex/blob/main/examples/pdf_embedding/pdf_embedding.py) example, we define a custom function `PdfToMarkdown`
133+
* In the [pdf_embedding](https://github.com/cocoindex-io/cocoindex/blob/main/examples/pdf_embedding/main.py) example, we define a custom function `PdfToMarkdown`
106134
* The `SentenceTransformerEmbed` function shipped with the CocoIndex Python package is defined with the Python SDK.
107-
Search for [`SentenceTransformerEmbedExecutor`](https://github.com/search?q=repo%3Acocoindex-io%2Fcocoindex+SentenceTransformerEmbedExecutor&type=code) to see the code.
108-
135+
Search for [`SentenceTransformerEmbedExecutor`](https://github.com/search?q=repo%3Acocoindex-io%2Fcocoindex+lang%3Apython+SentenceTransformerEmbedExecutor&type=code) to see the code.
136+
137+
## Parameters for custom functions
138+
139+
Custom functions take the following additional parameters:
140+
141+
* `gpu: bool`: Whether the executor will use GPU. It will affect the way the function is scheduled.
142+
143+
* `cache: bool`: Whether the executor will enable cache for this function.
144+
When `True`, the executor will cache the result of the function for reuse during reprocessing.
145+
We recommend setting this to `True` for any function that is computationally intensive.
146+
147+
* `behavior_version: int`: The version of the behavior of the function.
148+
When the version is changed, the function will be re-executed even if cache is enabled.
149+
It's required to be set if `cache` is `True`.
150+
151+
For example:
152+
153+
<Tabs>
154+
<TabItem value="python" label="Python" default>
155+
156+
This enables cache for a standalone function:
157+
158+
```python
159+
@cocoindex.op.function(cache=True, behavior_version=1)
160+
def compute_something(arg1: str, arg2: int | None = None) -> str:
161+
...
162+
```
163+
164+
This enables cache for a function defined by a spec and an executor:
165+
166+
```python
167+
class ComputeSomething(cocoindex.op.FunctionSpec):
168+
...
169+
170+
@cocoindex.op.executor_class(cache=True, behavior_version=1)
171+
class ComputeSomethingExecutor:
172+
spec: ComputeSomething
173+
174+
...
175+
```
176+
177+
</TabItem>
178+
</Tabs>
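
To make the interaction between `cache` and `behavior_version` concrete, here is a conceptual sketch in plain Python. It is not how CocoIndex stores or looks up cached results; the in-memory dictionary, the `cached_call` helper, and the `expensive` function are all hypothetical. It only illustrates the stated rule that a changed version forces re-execution even when caching is enabled.

```python
from __future__ import annotations

# Hypothetical in-memory cache keyed by (behavior_version, argument).
_cache: dict[tuple[int, str], str] = {}
calls = 0  # counts how often the underlying function really runs


def expensive(text: str) -> str:
    """Stand-in for a computationally intensive function."""
    global calls
    calls += 1
    return text[::-1]


def cached_call(fn, behavior_version: int, arg: str) -> str:
    """Reuse a stored result unless the version or the argument changed."""
    key = (behavior_version, arg)
    if key not in _cache:
        _cache[key] = fn(arg)
    return _cache[key]


cached_call(expensive, 1, "abc")  # computed
cached_call(expensive, 1, "abc")  # same version and input: served from cache
cached_call(expensive, 2, "abc")  # version bumped: computed again
print(calls)  # 2
```

Bumping the version whenever the function's logic changes keeps stale results from being reused.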
