Commit d3c6aa1

feat(settings): support @cocoindex.settings and make init() optional (#993)
* refactor(init): make lib initialization logic async
* feat(settings): support `@cocoindex.settings` and make `init()` optional
1 parent 6ffb322 commit d3c6aa1

File tree

13 files changed

+257
-113
lines changed

docs/docs/core/settings.mdx

Lines changed: 63 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -10,49 +10,82 @@ import TabItem from '@theme/TabItem';
1010

1111
Certain settings need to be provided for CocoIndex to work, e.g. database connections, app namespace, etc.
1212

13-
## Launch CocoIndex
13+
## Configure CocoIndex Settings
1414

15-
You have two ways to launch CocoIndex:
15+
Note that in general, you have two ways to launch CocoIndex:
1616

17+
* Call CocoIndex APIs from your own Python application or library.
1718
* Use [CocoIndex CLI](cli). It's handy for most routine index building and management tasks.
18-
It will load settings from environment variables, either already set in your environment, or specified in `.env` file.
19-
See [CLI](cli#environment-variables) for more details.
2019

21-
* Call CocoIndex functionality from your own Python application or library.
22-
It's needed when you want to leverage CocoIndex support for query, or have your custom logic to trigger indexing, etc.
2320

24-
<Tabs>
25-
<TabItem value="python" label="Python" default>
21+
CocoIndex exposes process-level settings, specified by the `cocoindex.Settings` dataclass.
22+
Settings can be configured in three different ways.
23+
The following sections list them in order of increasing priority: each later one overrides the earlier ones.
2624

27-
You need to explicitly call `cocoindex.init()` before doing anything with CocoIndex, and settings will be loaded during the call.
25+
### Environment Variables
2826

29-
* If it's called without any argument, it will load settings from environment variables.
30-
Only existing environment variables already set in your environment will be used.
31-
If you want to load environment variables from a specific `.env` file, consider call `load_dotenv()` provided by the [`python-dotenv`](https://github.com/theskumar/python-dotenv) package.
27+
The simplest approach is to set corresponding environment variables.
28+
See [List of Environment Variables](#list-of-environment-variables) for specific environment variables.
3229

33-
```py
34-
from dotenv import load_dotenv
35-
import cocoindex
30+
:::tip
3631

37-
load_dotenv()
38-
cocoindex.init()
39-
```
32+
Consider placing a `.env` file in your directory.
33+
The CLI will load environment variables from the `.env` file (see [CLI](cli#environment-variables) for more details).
34+
From your own main module, you can also load environment variables with a package like [`python-dotenv`](https://github.com/theskumar/python-dotenv).
4035

41-
* It takes an optional `cocoindex.Settings` dataclass object as argument, so you can also construct settings explicitly and pass to it:
36+
:::
37+
38+
### Setting Function
39+
40+
A more flexible approach is to provide a setting function that returns a `cocoindex.Settings` dataclass object.
41+
The setting function can have any name, and needs to be decorated with the `@cocoindex.settings` decorator, for example:
42+
43+
```py
44+
@cocoindex.settings
45+
def cocoindex_settings() -> cocoindex.Settings:
46+
    return cocoindex.Settings(
47+
        database=cocoindex.DatabaseConnectionSpec(
48+
            url="postgres://cocoindex:cocoindex@localhost/cocoindex"
49+
        )
50+
    )
51+
```
52+
53+
This setting function will be called once when CocoIndex is initialized.
54+
Once the settings function is provided, environment variables will be ignored.
4255

43-
```py
44-
import cocoindex
56+
### `cocoindex.init()` function
4557

46-
cocoindex.init(
47-
cocoindex.Settings(
48-
database=cocoindex.DatabaseConnectionSpec(
49-
url="postgres://cocoindex:cocoindex@localhost/cocoindex"
50-
)
51-
)
58+
You can also call `cocoindex.init()` with a `cocoindex.Settings` dataclass object as argument, for example:
59+
60+
```py
61+
cocoindex.init(
62+
    cocoindex.Settings(
63+
        database=cocoindex.DatabaseConnectionSpec(
64+
            url="postgres://cocoindex:cocoindex@localhost/cocoindex"
5265
        )
53-
```
54-
</TabItem>
55-
</Tabs>
66+
    )
67+
)
68+
```
69+
70+
For example, you can call it in the main function of your application.
71+
Once `cocoindex.init()` is called with a `cocoindex.Settings` dataclass object as argument, the `@cocoindex.settings` function and environment variables will be ignored.
72+
73+
This is more flexible, as you can more easily construct `cocoindex.Settings` from other values you loaded earlier.
74+
However, if you call `cocoindex.init()` only on the main path (e.g. within an `if __name__ == "__main__":` guard), it won't be executed when you use the CocoIndex CLI, because the CLI doesn't execute your main logic.
75+
76+
:::info
77+
78+
`cocoindex.init()` is optional:
79+
80+
- You can call `cocoindex.init()` with a `cocoindex.Settings` dataclass object as argument, or without any argument.
81+
When called without an argument, the settings will be loaded from the `@cocoindex.settings` function if one is provided, otherwise from environment variables.
82+
83+
- You don't have to explicitly call `cocoindex.init()`.
84+
CocoIndex will be automatically initialized when needed, e.g. when any method of any flow is called for the first time.
85+
Calling `cocoindex.init()` explicitly (usually at startup, e.g. in the main function of your application) ensures the CocoIndex library is initialized and surfaces any exceptions early, before the application proceeds.
86+
If you want this early failure, you can call it explicitly even if you don't provide settings through the `cocoindex.init()` call.
87+
88+
:::
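The override order described in this section (environment variables, then the settings function, then `cocoindex.init()` with an explicit argument) can be summarized with a minimal model. This is only a sketch of the documented precedence, not the library's implementation: `SketchSettings`, `resolve_settings`, and the `SKETCH_DATABASE_URL` variable name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class SketchSettings:
    """Hypothetical stand-in for `cocoindex.Settings` (illustration only)."""

    database_url: Optional[str] = None


def resolve_settings(
    explicit: Optional[SketchSettings],
    settings_fn: Optional[Callable[[], SketchSettings]],
    env: Mapping[str, str],
) -> SketchSettings:
    """Pick the settings source, highest priority first."""
    if explicit is not None:  # passed to init(): overrides everything else
        return explicit
    if settings_fn is not None:  # registered settings function: env is ignored
        return settings_fn()
    # Fallback: environment variables. The variable name is an assumption.
    return SketchSettings(database_url=env.get("SKETCH_DATABASE_URL"))
```

Note that in this model a higher-priority source replaces the lower ones entirely rather than merging field by field, which matches the wording "will be ignored" above.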
5689

5790
## List of Settings
5891

python/cocoindex/__init__.py

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,15 +6,20 @@
66

77
from . import targets as storages # Deprecated: Use targets instead
88

9-
from .auth_registry import AuthEntryReference, add_auth_entry, add_transient_auth_entry
9+
from .auth_registry import (
10+
AuthEntryReference,
11+
add_auth_entry,
12+
add_transient_auth_entry,
13+
ref_auth_entry,
14+
)
1015
from .flow import FlowBuilder, DataScope, DataSlice, Flow, transform_flow
1116
from .flow import flow_def
1217
from .flow import EvaluateAndDumpOptions, GeneratedField
1318
from .flow import FlowLiveUpdater, FlowLiveUpdaterOptions, FlowUpdaterStatusUpdates
1419
from .flow import open_flow
1520
from .flow import add_flow_def, remove_flow # DEPRECATED
1621
from .flow import update_all_flows_async, setup_all_flows, drop_all_flows
17-
from .lib import init, start_server, stop
22+
from .lib import settings, init, start_server, stop
1823
from .llm import LlmSpec, LlmApiType
1924
from .index import VectorSimilarityMetric, VectorIndexDef, IndexOptions
2025
from .setting import DatabaseConnectionSpec, Settings, ServerSettings
@@ -65,6 +70,7 @@
6570
"setup_all_flows",
6671
"drop_all_flows",
6772
# Lib
73+
"settings",
6874
"init",
6975
"start_server",
7076
"stop",

python/cocoindex/cli.py

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -91,8 +91,6 @@ def _load_user_app(app_target: str) -> None:
9191

9292

9393
def _initialize_cocoindex_in_process() -> None:
94-
settings = setting.Settings.from_env()
95-
lib.init(settings)
9694
atexit.register(lib.stop)
9795

9896

python/cocoindex/flow.py

Lines changed: 5 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -694,16 +694,11 @@ class Flow:
694694
"""
695695

696696
_name: str
697-
_full_name: str
698697
_lazy_engine_flow: Callable[[], _engine.Flow] | None
699698

700-
def __init__(
701-
self, name: str, full_name: str, engine_flow_creator: Callable[[], _engine.Flow]
702-
):
699+
def __init__(self, name: str, engine_flow_creator: Callable[[], _engine.Flow]):
703700
validate_flow_name(name)
704-
validate_full_flow_name(full_name)
705701
self._name = name
706-
self._full_name = full_name
707702
engine_flow = None
708703
lock = Lock()
709704

@@ -762,7 +757,7 @@ def full_name(self) -> str:
762757
"""
763758
Get the full name of the flow.
764759
"""
765-
return self._full_name
760+
return get_flow_full_name(self._name)
766761

767762
def update(self, /, *, reexport_targets: bool = False) -> _engine.IndexUpdateInfo:
768763
"""
@@ -861,9 +856,10 @@ def _create_lazy_flow(
861856
The flow will be built the first time when it's really needed.
862857
"""
863858
flow_name = _flow_name_builder.build_name(name, prefix="_flow_")
864-
flow_full_name = get_flow_full_name(flow_name)
865859

866860
def _create_engine_flow() -> _engine.Flow:
861+
flow_full_name = get_flow_full_name(flow_name)
862+
validate_full_flow_name(flow_full_name)
867863
flow_builder_state = _FlowBuilderState(flow_full_name)
868864
root_scope = DataScope(
869865
flow_builder_state, flow_builder_state.engine_flow_builder.root_scope()
@@ -873,7 +869,7 @@ def _create_engine_flow() -> _engine.Flow:
873869
execution_context.event_loop
874870
)
875871

876-
return Flow(flow_name, flow_full_name, _create_engine_flow)
872+
return Flow(flow_name, _create_engine_flow)
877873

878874

879875
_flows_lock = Lock()
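The change above stops caching the full name at construction time and computes it on access instead, presumably because the app namespace is not known until settings are loaded. A minimal sketch of that idea follows, with hypothetical names (`LazyNamedFlow`, `get_namespace`) and an assumed `.` delimiter between namespace and flow name.

```python
from typing import Callable


class LazyNamedFlow:
    """Hypothetical stand-in for a flow whose full name depends on settings."""

    def __init__(self, name: str, get_namespace: Callable[[], str]):
        self._name = name
        self._get_namespace = get_namespace  # looked up on each access

    @property
    def full_name(self) -> str:
        namespace = self._get_namespace()
        # The "." delimiter is an assumption made for this sketch.
        return f"{namespace}.{self._name}" if namespace else self._name
```

A flow object can then be created at import time, before any settings exist, and still report the right full name once the namespace becomes available.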

python/cocoindex/lib.py

Lines changed: 47 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2,12 +2,57 @@
22
Library level functions and states.
33
"""
44

5+
import threading
56
import warnings
6-
from typing import Callable, Any
77

88
from . import _engine # type: ignore
99
from . import flow, setting
1010
from .convert import dump_engine_object
11+
from .validation import validate_app_namespace_name
12+
from typing import Any, Callable, overload
13+
14+
15+
def prepare_settings(settings: setting.Settings) -> Any:
16+
    """Prepare the settings for the engine."""
17+
    if settings.app_namespace:
18+
        validate_app_namespace_name(settings.app_namespace)
19+
    return dump_engine_object(settings)
20+
21+
22+
_engine.set_settings_fn(lambda: prepare_settings(setting.Settings.from_env()))
23+
24+
25+
_prev_settings_fn: Callable[[], setting.Settings] | None = None
26+
_prev_settings_fn_lock: threading.Lock = threading.Lock()
27+
28+
29+
@overload
30+
def settings(fn: Callable[[], setting.Settings]) -> Callable[[], setting.Settings]: ...
31+
@overload
32+
def settings(
33+
    fn: None,
34+
) -> Callable[[Callable[[], setting.Settings]], Callable[[], setting.Settings]]: ...
35+
def settings(fn: Callable[[], setting.Settings] | None = None) -> Any:
36+
    """
37+
    Decorate a function that returns a settings.Settings object.
38+
    It registers the function as a settings provider.
39+
    """
40+
41+
    def _inner(fn: Callable[[], setting.Settings]) -> Callable[[], setting.Settings]:
42+
        global _prev_settings_fn  # pylint: disable=global-statement
43+
        with _prev_settings_fn_lock:
44+
            if _prev_settings_fn is not None:
45+
                warnings.warn(
46+
                    f"Setting a new settings function will override the previous one {_prev_settings_fn}."
47+
                )
48+
            _prev_settings_fn = fn
49+
        _engine.set_settings_fn(lambda: prepare_settings(fn()))
50+
        return fn
51+
52+
    if fn is not None:
53+
        return _inner(fn)
54+
    else:
55+
        return _inner
1156

1257

1358
def init(settings: setting.Settings | None = None) -> None:
@@ -16,9 +61,7 @@ def init(settings: setting.Settings | None = None) -> None:
1661
1762
If the settings are not provided, they are loaded from the environment variables.
1863
"""
19-
settings = settings or setting.Settings.from_env()
20-
_engine.init(dump_engine_object(settings))
21-
setting.set_app_namespace(settings.app_namespace)
64+
_engine.init(prepare_settings(settings) if settings is not None else None)
2265

2366

2467
def start_server(settings: setting.ServerSettings) -> None:
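The `settings` decorator in this file accepts both the bare form (`@settings`) and the called form (`@settings()`), and warns when a second provider replaces the first. The stand-alone sketch below mirrors that pattern without the engine call. `settings_provider` and `current_provider` are hypothetical names used only for illustration.

```python
import threading
import warnings
from typing import Any, Callable, Optional

_provider: Optional[Callable[[], Any]] = None
_provider_lock = threading.Lock()


def settings_provider(fn: Optional[Callable[[], Any]] = None) -> Any:
    """Register a provider; usable as @settings_provider or @settings_provider()."""

    def _register(target: Callable[[], Any]) -> Callable[[], Any]:
        global _provider
        with _provider_lock:
            if _provider is not None:
                warnings.warn(f"Overriding previous settings provider {_provider}.")
            _provider = target
        return target  # returned unchanged, so the function stays callable

    # Bare use passes the function directly; called use passes nothing.
    return _register(fn) if fn is not None else _register


def current_provider() -> Optional[Callable[[], Any]]:
    """Return the most recently registered provider, if any."""
    return _provider
```

Returning the original function unchanged keeps the decorated name usable in tests and elsewhere, while the lock only guards the bookkeeping of which provider was registered last.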

python/cocoindex/setting.py

Lines changed: 5 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -6,16 +6,15 @@
66

77
from typing import Callable, Self, Any, overload
88
from dataclasses import dataclass
9-
from .validation import validate_app_namespace_name
10-
11-
_app_namespace: str = ""
9+
from . import _engine # type: ignore
1210

1311

1412
def get_app_namespace(*, trailing_delimiter: str | None = None) -> str:
1513
"""Get the application namespace. Append the `trailing_delimiter` if not empty."""
16-
if _app_namespace == "" or trailing_delimiter is None:
17-
return _app_namespace
18-
return f"{_app_namespace}{trailing_delimiter}"
14+
app_namespace: str = _engine.get_app_namespace()
15+
if app_namespace == "" or trailing_delimiter is None:
16+
return app_namespace
17+
return f"{app_namespace}{trailing_delimiter}"
1918

2019

2120
def split_app_namespace(full_name: str, delimiter: str) -> tuple[str, str]:
@@ -26,14 +25,6 @@ def split_app_namespace(full_name: str, delimiter: str) -> tuple[str, str]:
2625
return (parts[0], parts[1])
2726

2827

29-
def set_app_namespace(app_namespace: str) -> None:
30-
"""Set the application namespace."""
31-
if app_namespace:
32-
validate_app_namespace_name(app_namespace)
33-
global _app_namespace # pylint: disable=global-statement
34-
_app_namespace = app_namespace
35-
36-
3728
@dataclass
3829
class DatabaseConnectionSpec:
3930
"""

src/builder/analyzer.rs

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -995,7 +995,7 @@ pub async fn analyze_flow(
995995
impl Future<Output = Result<ExecutionPlan>> + Send + use<>,
996996
)> {
997997
let analyzer_ctx = AnalyzerContext {
998-
lib_ctx: get_lib_context()?,
998+
lib_ctx: get_lib_context().await?,
999999
flow_ctx,
10001000
};
10011001
let root_data_scope = Arc::new(Mutex::new(DataScopeBuilder::new()));
@@ -1109,7 +1109,7 @@ pub async fn analyze_transient_flow<'a>(
11091109
)> {
11101110
let mut root_data_scope = DataScopeBuilder::new();
11111111
let analyzer_ctx = AnalyzerContext {
1112-
lib_ctx: get_lib_context()?,
1112+
lib_ctx: get_lib_context().await?,
11131113
flow_ctx,
11141114
};
11151115
let mut input_fields = vec![];

src/builder/flow_builder.rs

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -247,8 +247,12 @@ pub struct FlowBuilder {
247247
#[pymethods]
248248
impl FlowBuilder {
249249
#[new]
250-
pub fn new(name: &str) -> PyResult<Self> {
251-
let lib_context = get_lib_context().into_py_result()?;
250+
pub fn new(py: Python<'_>, name: &str) -> PyResult<Self> {
251+
let lib_context = py
252+
.allow_threads(|| -> anyhow::Result<Arc<LibContext>> {
253+
get_runtime().block_on(get_lib_context())
254+
})
255+
.into_py_result()?;
252256
let root_op_scope = OpScope::new(
253257
spec::ROOT_SCOPE_NAME.to_string(),
254258
None,

src/execution/db_tracking_setup.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -252,7 +252,7 @@ impl ResourceSetupChange for TrackingTableSetupChange {
252252

253253
impl TrackingTableSetupChange {
254254
pub async fn apply_change(&self) -> Result<()> {
255-
let lib_context = get_lib_context()?;
255+
let lib_context = get_lib_context().await?;
256256
let pool = lib_context.require_builtin_db_pool()?;
257257
if let Some(desired) = &self.desired_state {
258258
for lagacy_name in self.legacy_tracking_table_names.iter() {
