
Commit 9a97191

refactor(targets): rename storage to target (#625)
* docs(targets): rename `storage` to `target`
* refactor(targets): update Python to use `storages` instead of `targets`
* refactor(targets): rename `storages` to `targets` for remaining code
* examples: revert upgrades for now - will bring back after release
1 parent 96e704e commit 9a97191

28 files changed (+99 -90 lines changed)

README.md

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoind
    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
-        cocoindex.storages.Postgres(),
+        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
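
For code that uses CocoIndex, the practical effect of this rename is a handful of new spellings. A minimal before/after sketch, assuming the `doc_embeddings` collector and `text_embedding_flow` defined in the README flow above:

```python
import cocoindex

# Old spellings (before this commit):
#   doc_embeddings.export("doc_embeddings", cocoindex.storages.Postgres(), ...)
#   cocoindex.utils.get_target_storage_default_name(text_embedding_flow, "doc_embeddings")

# New spellings (after this commit):
doc_embeddings.export(
    "doc_embeddings",
    cocoindex.targets.Postgres(),
    primary_key_fields=["filename", "location"],
)
table_name = cocoindex.utils.get_target_default_name(text_embedding_flow, "doc_embeddings")
```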

docs/docs/core/basics.md

Lines changed: 7 additions & 7 deletions
@@ -9,15 +9,15 @@ An **index** is a collection of data stored in a way that is easy for retrieval.

CocoIndex is an ETL framework for building indexes from specified data sources, a.k.a. **indexing**. It also offers utilities for users to retrieve data from the indexes.

-An **indexing flow** extracts data from specified data sources, upon specified transformations, and puts the transformed data into specified storage for later retrieval.
+An **indexing flow** extracts data from specified data sources, upon specified transformations, and puts the transformed data into specified target for later retrieval.

## Indexing flow elements

An indexing flow has two aspects: data and operations on data.

### Data

-An indexing flow involves source data and transformed data (either as an intermediate result or the final result to be put into storage). All data within the indexing flow has **schema** determined at flow definition time.
+An indexing flow involves source data and transformed data (either as an intermediate result or the final result to be put into targets). All data within the indexing flow has **schema** determined at flow definition time.

Each piece of data has a **data type**, falling into one of the following categories:

@@ -36,8 +36,8 @@ An **operation** in an indexing flow defines a step in the flow. An operation is
* **Action**, which defines the behavior of the operation, e.g. *import*, *transform*, *for each*, *collect* and *export*.
  See [Flow Definition](flow_def) for more details for each action.

-* Some actions (i.e. "import", "transform" and "export") require an **Operation Spec**, which describes the specific behavior of the operation, e.g. a source to import from, a function describing the transformation behavior, a target storage to export to (as an index).
-* Each operation spec has a **operation type**, e.g. `LocalFile` (data source), `SplitRecursively` (function), `SentenceTransformerEmbed` (function), `Postgres` (storage).
+* Some actions (i.e. "import", "transform" and "export") require an **Operation Spec**, which describes the specific behavior of the operation, e.g. a source to import from, a function describing the transformation behavior, a target to export to (as an index).
+* Each operation spec has a **operation type**, e.g. `LocalFile` (data source), `SplitRecursively` (function), `SentenceTransformerEmbed` (function), `Postgres` (target).
* CocoIndex framework maintains a set of supported operation types. Users can also implement their own.

"import" and "transform" operations produce output data, whose data type is determined based on the operation spec and data types of input data (for "transform" operation only).

@@ -62,11 +62,11 @@ This shows schema and example data for the indexing flow:

## Life cycle of an indexing flow

-An indexing flow, once set up, maintains a long-lived relationship between data source and data in target storage. This means:
+An indexing flow, once set up, maintains a long-lived relationship between data source and target. This means:

-1. The target storage created by the flow remain available for querying at any time
+1. The target created by the flow remain available for querying at any time

-2. As source data changes (new data added, existing data updated or deleted), data in the target storage are updated to reflect those changes,
+2. As source data changes (new data added, existing data updated or deleted), data in the target are updated to reflect those changes,
on certain pace, according to the update mode:

* **One time update**: Once triggered, CocoIndex updates the target data to reflect the version of source data up to the current moment.
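
The "collect" and "export" actions described in this file map onto flow code roughly as follows; a minimal sketch assuming the collected field names (`filename`, `location`, `embedding`) and omitting flow registration:

```python
import cocoindex

def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    doc_embeddings = data_scope.add_collector()

    # "import", "transform" and "for each" steps would populate the collector here,
    # e.g. doc_embeddings.collect(filename=..., location=..., embedding=...)

    # "export" writes the collected rows into a target (operation type `Postgres`).
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef("embedding", cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)
        ],
    )
```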

docs/docs/core/cli.mdx

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ The following subcommands are available:
| ---------- | ----------- |
| `ls` | List all flows present in the given file/module. Or list all persisted flows under the current app namespace if no file/module specified. |
| `show` | Show the spec and schema for a specific flow. |
-| `setup` | Check and apply backend setup changes for flows, including the internal and target storage (to export). |
+| `setup` | Check and apply backend setup changes for flows, including the internal storage and target (to export). |
| `drop` | Drop the backend setup for specified flows. |
| `update` | Update the index defined by the flow. |
| `evaluate` | Evaluate the flow and dump flow outputs to files. Instead of updating the index, it dumps what should be indexed to files. Mainly used for evaluation purpose. |

docs/docs/core/data_types.mdx

Lines changed: 3 additions & 3 deletions
@@ -13,7 +13,7 @@ This makes schema of data processed by CocoIndex clear, and easily determine the

You don't need to spell out data types in CocoIndex, when you define the flow using existing operations (source, function, etc).
These operations decide data types of fields produced by them based on the spec and input data types.
-All you need to do is to make sure the data passed to functions and storage targets are accepted by them.
+All you need to do is to make sure the data passed to functions and targets are accepted by them.

When you define [custom functions](/docs/core/custom_function), you need to specify the data types of arguments and return values.

@@ -40,7 +40,7 @@ This is the list of all basic types supported by CocoIndex:
| Vector[*T*, *Dim*?] | *T* can be a basic type or a numeric type. *Dim* is a positive integer and optional. | `cocoindex.Vector[T]` or `cocoindex.Vector[T, Dim]` | `numpy.typing.NDArray[T]` or `list[T]` |

Values of all data types can be represented by values in Python's native types (as described under the Native Python Type column).
-However, the underlying execution engine and some storage system (like Postgres) has finer distinctions for some types, specifically:
+However, the underlying execution engine has finer distinctions for some types, specifically:

* *Float32* and *Float64* for `float`, with different precision.
* *LocalDateTime* and *OffsetDateTime* for `datetime.datetime`, with different timezone awareness.

@@ -50,7 +50,7 @@ However, the underlying execution engine and some storage system (like Postgres)

The native Python type is always more permissive and can represent a superset of possible values.
* Only when you annotate the return type of a custom function, you should use the specific type,
-so that CocoIndex will have information about the precise type to be used in the execution engine and storage system.
+so that CocoIndex will have information about the precise type to be used in the execution engine and target.
* For all other purposes, e.g. to provide annotation for argument types of a custom function, or used internally in your custom function,
you can choose whatever to use.
The native Python type is usually simpler.
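
To illustrate the annotation guidance above, a hedged sketch; the function itself is hypothetical and only the type spellings come from the table in this file:

```python
import numpy as np
import cocoindex

# Hypothetical custom function (assumed spelling for the vector annotation):
# the specific return type tells CocoIndex the precise type to use in the
# execution engine and the target, while the argument keeps a native type.
def embed_text(text: str) -> cocoindex.Vector[np.float32]:
    return np.zeros(3, dtype=np.float32)  # placeholder body
```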

docs/docs/core/flow_def.mdx

Lines changed: 18 additions & 18 deletions
@@ -1,14 +1,14 @@
---
title: Flow Definition
-description: Define a CocoIndex flow, by specifying source, transformations and storages, and connect input/output data of them.
+description: Define a CocoIndex flow, by specifying source, transformations and targets, and connect input/output data of them.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# CocoIndex Flow Definition

-In CocoIndex, to define an indexing flow, you provide a function to import source, transform data and put them into target storage (sinks).
+In CocoIndex, to define an indexing flow, you provide a function to import source, transform data and put them into targets.
You connect input/output of these operations with fields of data scopes.

## Entry Point

@@ -246,14 +246,14 @@ and generates a `id` field with UUID and remains stable when `filename` and `sum

### Export

-The `export()` method exports the collected data to an external storage.
+The `export()` method exports the collected data to an external target.

-A *storage spec* needs to be provided for any export operation, to describe the storage and parameters related to the storage.
+A *target spec* needs to be provided for any export operation, to describe the target and parameters related to the target.

Export must happen at the top level of a flow, i.e. not within any child scopes created by "for each row". It takes the following arguments:

* `name`: the name to identify the export target.
-* `target_spec`: the storage spec as the export target.
+* `target_spec`: the target spec as the export target.
* `setup_by_user` (optional):
whether the export target is setup by user.
By default, CocoIndex is managing the target setup (surfaced by the `cocoindex setup` CLI subcommand), e.g. create related tables/collections/etc. with compatible schema, and update them upon change.

@@ -270,22 +270,22 @@ def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataSco
    demo_collector = data_scope.add_collector()
    ...
    demo_collector.export(
-        "demo_storage", DemoStorageSpec(...),
+        "demo_target", DemoTargetSpec(...),
        primary_key_fields=["field1"],
        vector_indexes=[cocoindex.VectorIndexDef("field2", cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
```

</TabItem>
</Tabs>

-The target storage is managed by CocoIndex, i.e. it'll be created by [CocoIndex CLI](/docs/core/cli) when you run `cocoindex setup`, and the data will be automatically updated (including stale data removal) when updating the index.
-The `name` for the same storage should remain stable across different runs.
-If it changes, CocoIndex will treat it as an old storage removed and a new one created, and perform setup changes and reindexing accordingly.
+The target is managed by CocoIndex, i.e. it'll be created by [CocoIndex CLI](/docs/core/cli) when you run `cocoindex setup`, and the data will be automatically updated (including stale data removal) when updating the index.
+The `name` for the same target should remain stable across different runs.
+If it changes, CocoIndex will treat it as an old target removed and a new one created, and perform setup changes and reindexing accordingly.

## Storage Indexes

-Many storage supports indexes, to boost efficiency in retrieving data.
-CocoIndex provides a common way to configure indexes for various storages.
+Many targets are storage systems supporting indexes, to boost efficiency in retrieving data.
+CocoIndex provides a common way to configure indexes for various targets.

### Primary Key

@@ -330,7 +330,7 @@ For example,
```python
doc_embeddings.export(
    "doc_embeddings",
-    cocoindex.storages.Qdrant(
+    cocoindex.targets.Qdrant(
        collection_name=cocoindex.get_app_namespace(trailing_delimiter='__') + "doc_embeddings",
        ...
    ),

@@ -345,8 +345,8 @@ It will use `Staging__doc_embeddings` as the collection name if the current app

### Target Declarations

-Most time a target storage is created by calling `export()` method on a collector, and this `export()` call comes with configurations needed for the target storage, e.g. options for storage indexes.
-Occasionally, you may need to specify some configurations for target storage out of the context of any specific data collector.
+Most time a target is created by calling `export()` method on a collector, and this `export()` call comes with configurations needed for the target, e.g. options for storage indexes.
+Occasionally, you may need to specify some configurations for the target out of the context of any specific data collector.

For example, for graph database targets like `Neo4j` and `Kuzu`, you may have a data collector to export data to relationships, which will create nodes referenced by various relationships in turn.
These nodes don't directly come from any specific data collector (consider relationships from different data collectors may share the same nodes).

@@ -359,7 +359,7 @@ To specify configurations for these nodes, you can *declare* spec for related no

```python
flow_builder.declare(
-    cocoindex.storages.Neo4jDeclarations(...)
+    cocoindex.targets.Neo4jDeclarations(...)
)
```

@@ -389,7 +389,7 @@ You can add an auth entry by `cocoindex.add_auth_entry()` function, which return
```python
my_graph_conn = cocoindex.add_auth_entry(
    "my_graph_conn",
-    cocoindex.storages.Neo4jConnectionSpec(
+    cocoindex.targets.Neo4jConnectionSpec(
        uri="bolt://localhost:7687",
        user="neo4j",
        password="cocoindex",

@@ -403,7 +403,7 @@ Then reference it when building a spec that takes an auth entry:
```python
demo_collector.export(
    "MyGraph",
-    cocoindex.storages.Neo4jRelationship(connection=my_graph_conn, ...)
+    cocoindex.targets.Neo4jRelationship(connection=my_graph_conn, ...)
)
```

@@ -412,7 +412,7 @@ Then reference it when building a spec that takes an auth entry:
```python
demo_collector.export(
    "MyGraph",
-    cocoindex.storages.Neo4jRelationship(connection=cocoindex.ref_auth_entry("my_graph_conn"), ...))
+    cocoindex.targets.Neo4jRelationship(connection=cocoindex.ref_auth_entry("my_graph_conn"), ...))
```

</TabItem>
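
The declaration and auth-entry hunks above fit together in one flow. A hedged sketch using the renamed `cocoindex.targets.*` specs, with the doc's placeholder connection values and omitted spec fields marked in comments (flow registration omitted):

```python
import cocoindex

# Auth entry holding the Neo4j connection, referenced by the target spec below.
my_graph_conn = cocoindex.add_auth_entry(
    "my_graph_conn",
    cocoindex.targets.Neo4jConnectionSpec(
        uri="bolt://localhost:7687",
        user="neo4j",
        password="cocoindex",
    ),
)

def demo_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Declare node configuration that doesn't belong to any single collector.
    flow_builder.declare(
        cocoindex.targets.Neo4jDeclarations(...)  # declaration fields omitted here
    )

    demo_collector = data_scope.add_collector()
    # ... collect relationship rows here ...
    demo_collector.export(
        "MyGraph",
        cocoindex.targets.Neo4jRelationship(connection=my_graph_conn),  # other relationship fields omitted
    )
```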

docs/docs/core/flow_methods.mdx

Lines changed: 6 additions & 6 deletions
@@ -1,7 +1,7 @@
---
title: Run a Flow
toc_max_heading_level: 4
-description: Run a CocoIndex Flow, including build / update data in the target storage and evaluate the flow without changing the target storage.
+description: Run a CocoIndex Flow, including build / update data in the target and evaluate the flow without changing the target.
---

import Tabs from '@theme/Tabs';

@@ -37,7 +37,7 @@ It creates a `demo_flow` object in `cocoindex.Flow` type.

## Build / update target data

-The major goal of a flow is to perform the transformations on source data and build / update data in the target storage (the index).
+The major goal of a flow is to perform the transformations on source data and build / update data in the target.
This action has two modes:

* **One time update.**

@@ -53,7 +53,7 @@ This action has two modes:
:::info

For both modes, CocoIndex is performing *incremental processing*,
-i.e. we only perform computations and storage mutations on source data that are changed, or the flow has changed.
+i.e. we only perform computations and target mutations on source data that are changed, or the flow has changed.
This is to achieve best efficiency.

:::

@@ -63,7 +63,7 @@ This is to achieve best efficiency.

#### CLI

-The `cocoindex update` subcommand creates/updates data in the target storage.
+The `cocoindex update` subcommand creates/updates data in the target.

Once it's done, the target data is fresh up to the moment when the function is called.

@@ -76,7 +76,7 @@ cocoindex update main.py
<Tabs>
<TabItem value="python" label="Python">

-The `update()` async method creates/updates data in the target storage.
+The `update()` async method creates/updates data in the target.

Once the function returns, the target data is fresh up to the moment when the function is called.

@@ -207,7 +207,7 @@ CocoIndex also provides asynchronous versions of APIs for blocking operations, i

## Evaluate the flow

-CocoIndex allows you to run the transformations defined by the flow without updating the target storage.
+CocoIndex allows you to run the transformations defined by the flow without updating the target.

### CLI
docs/docs/getting_started/quickstart.md

Lines changed: 3 additions & 3 deletions
@@ -87,7 +87,7 @@ def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoind
    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
-        cocoindex.storages.Postgres(),
+        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(

@@ -214,7 +214,7 @@ from pgvector.psycopg import register_vector

def search(pool: ConnectionPool, query: str, top_k: int = 5):
    # Get the table name, for the export target in the text_embedding_flow above.
-    table_name = cocoindex.utils.get_target_storage_default_name(text_embedding_flow, "doc_embeddings")
+    table_name = cocoindex.utils.get_target_default_name(text_embedding_flow, "doc_embeddings")
    # Evaluate the transform flow defined above with the input query, to get the embedding.
    query_vector = text_to_embedding.eval(query)
    # Run the query and get the results.

@@ -237,7 +237,7 @@ There're two CocoIndex-specific logic:
1. Get the table name from the export target in the `text_embedding_flow` above.
Since the table name for the `Postgres` target is not explicitly specified in the `export()` call,
CocoIndex uses a default name.
-`cocoindex.utils.get_target_storage_default_name()` is a utility function to get the default table name for this case.
+`cocoindex.utils.get_target_default_name()` is a utility function to get the default table name for this case.

2. Evaluate the transform flow defined above with the input query, to get the embedding.
It's done by the `eval()` method of the transform flow `text_to_embedding`.
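
The quickstart hunk above shows only fragments of `search()`. A hedged sketch of the full function around the renamed utility, assuming `text_embedding_flow` and `text_to_embedding` from earlier in the quickstart; the SQL, column names, and connection handling are assumptions beyond what the diff shows:

```python
from psycopg_pool import ConnectionPool
from pgvector.psycopg import register_vector
import cocoindex

def search(pool: ConnectionPool, query: str, top_k: int = 5):
    # Default table name for the export target in text_embedding_flow
    # (the Postgres target doesn't specify one explicitly).
    table_name = cocoindex.utils.get_target_default_name(text_embedding_flow, "doc_embeddings")
    # Embed the query with the same transform flow used at indexing time.
    query_vector = text_to_embedding.eval(query)
    # Assumed table schema: `filename`, `text`, `embedding`; `<=>` is pgvector's
    # cosine-distance operator.
    with pool.connection() as conn:
        register_vector(conn)
        with conn.cursor() as cur:
            cur.execute(
                "SELECT filename, text, embedding <=> %s AS distance "
                f"FROM {table_name} ORDER BY distance LIMIT %s",
                (query_vector, top_k),
            )
            return [
                {"filename": row[0], "text": row[1], "score": 1.0 - row[2]}
                for row in cur.fetchall()
            ]
```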
