93 changes: 63 additions & 30 deletions docs/docs/core/settings.mdx
@@ -10,49 +10,82 @@ import TabItem from '@theme/TabItem';
 
 Certain settings need to be provided for CocoIndex to work, e.g. database connections, app namespace, etc.
 
-## Launch CocoIndex
+## Configure CocoIndex Settings
 
-You have two ways to launch CocoIndex:
+Note that in general, you have two ways to launch CocoIndex:
 
+* Call CocoIndex APIs from your own Python application or library.
 * Use [Cocoindex CLI](cli). It's handy for most routine indexing building and management tasks.
-  It will load settings from environment variables, either already set in your environment, or specified in `.env` file.
-  See [CLI](cli#environment-variables) for more details.
 
-* Call CocoIndex functionality from your own Python application or library.
-  It's needed when you want to leverage CocoIndex support for query, or have your custom logic to trigger indexing, etc.
 
-  <Tabs>
-  <TabItem value="python" label="Python" default>
+CocoIndex exposes process-level settings specified by `cocoindex.Settings` dataclass.
+Settings can be configured in three different ways.
+In the following sections, the later ones will override the earlier ones.
 
-  You need to explicitly call `cocoindex.init()` before doing anything with CocoIndex, and settings will be loaded during the call.
+### Environment Variables
 
-  * If it's called without any argument, it will load settings from environment variables.
-    Only existing environment variables already set in your environment will be used.
-    If you want to load environment variables from a specific `.env` file, consider call `load_dotenv()` provided by the [`python-dotenv`](https://github.com/theskumar/python-dotenv) package.
+The simplest approach is to set corresponding environment variables.
+See [List of Environment Variables](#list-of-environment-variables) for specific environment variables.
 
-    ```py
-    from dotenv import load_dotenv
-    import cocoindex
+:::tip
 
-    load_dotenv()
-    cocoindex.init()
-    ```
+You can consider place a `.env` file in your directory.
+The [CLI](cli#environment-variables) will load environment variables from the `.env` file (see [CLI](cli#environment-variables) for more details).
+From your own main module, you can also load environment variables with a package like [`python-dotenv`](https://github.com/theskumar/python-dotenv).
 
-  * It takes an optional `cocoindex.Settings` dataclass object as argument, so you can also construct settings explicitly and pass to it:
+:::
 
+### Setting Function
+
+A more flexible approach is to provide a setting function that returns a `cocoindex.Settings` dataclass object.
+The setting function can have any name, and needs to be decorated with the `@cocoindex.settings` decorator, for example:
+
+```py
+@cocoindex.settings
+def cocoindex_settings() -> cocoindex.Settings:
+    return cocoindex.Settings(
+        database=cocoindex.DatabaseConnectionSpec(
+            url="postgres://cocoindex:cocoindex@localhost/cocoindex"
+        )
+    )
+```
+
+This setting function will be called once when CocoIndex is initialized.
+Once the settings function is provided, environment variables will be ignored.
+
-    ```py
-    import cocoindex
+### `cocoindex.init()` function
 
-    cocoindex.init(
-      cocoindex.Settings(
-        database=cocoindex.DatabaseConnectionSpec(
-          url="postgres://cocoindex:cocoindex@localhost/cocoindex"
-        )
-      )
+You can also call `cocoindex.init()` with a `cocoindex.Settings` dataclass object as argument, for example:
+
+```py
+cocoindex.init(
+  cocoindex.Settings(
+    database=cocoindex.DatabaseConnectionSpec(
+      url="postgres://cocoindex:cocoindex@localhost/cocoindex"
     )
-    ```
-  </TabItem>
-  </Tabs>
+  )
+)
+```
+
+For example, you can call it in the main function of your application.
+Once the `cocoindex.init()` is called with a `cocoindex.Settings` dataclass object as argument, the `@cocoindex.settings` function and environment variables will be ignored.
+
+This is more flexible, as you can more easily construct `cocoindex.Settings` based on other stuffs you loaded earlier.
+But be careful that if you call `cocoindex.init()` only under the path of main (e.g. within `if __name__ == "__main__":` guard), it won't be executed when you're using CocoIndex CLI, as it won't execute your main logic.
+
+:::info
+
+`cocoindex.init()` is optional:
+
+- You can call `cocoindex.init()` with a `cocoindex.Settings` dataclass object as argument, or without any argument.
+  When without argument, the settings will be loaded from the `@cocoindex.settings` function or environment variables.
+
+- You don't have to explicitly call `cocoindex.init()`.
+  CocoIndex will be automatically initialized when needed, e.g. when any method of any flow is called the first time.
+  But calling `cocoindex.init()` explicitly (usually at startup time, e.g. in the main function of your application) has the benefit of making sure CocoIndex library is initialized and any potential exceptions are raised earlier before proceeding with the application.
+  If you need this clarity, you can call it explicitly even if you don't want to provide settings by the `cocoindex.init()` call.
+
+:::
 
 ## List of Settings
 
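The precedence the rewritten doc describes (an explicit `cocoindex.init(settings)` first, then the `@cocoindex.settings` function, then environment variables) can be modelled in a few lines of plain Python. This is a hypothetical sketch, not CocoIndex code: `Settings`, `register_settings_fn`, `resolve_settings` and the `DEMO_DATABASE_URL` variable are illustrative names only.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for cocoindex.Settings, reduced to one field."""

    database_url: Optional[str] = None


# Hypothetical slot for a user-provided settings function.
_settings_fn: Optional[Callable[[], Settings]] = None


def register_settings_fn(fn: Callable[[], Settings]) -> Callable[[], Settings]:
    """Decorator: remember `fn` as the settings provider and return it unchanged."""
    global _settings_fn
    _settings_fn = fn
    return fn


def resolve_settings(
    explicit: Optional[Settings] = None,
    env: Mapping[str, str] = os.environ,
) -> Settings:
    """Apply the documented precedence, highest first."""
    if explicit is not None:  # 1. settings passed explicitly, as to init()
        return explicit
    if _settings_fn is not None:  # 2. registered function; env vars are ignored
        return _settings_fn()
    # 3. fall back to environment variables (the variable name is illustrative)
    return Settings(database_url=env.get("DEMO_DATABASE_URL"))
```

Passing the environment in as a mapping keeps the sketch testable without mutating `os.environ`.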
10 changes: 8 additions & 2 deletions python/cocoindex/__init__.py
@@ -6,15 +6,20 @@
 
 from . import targets as storages  # Deprecated: Use targets instead
 
-from .auth_registry import AuthEntryReference, add_auth_entry, add_transient_auth_entry
+from .auth_registry import (
+    AuthEntryReference,
+    add_auth_entry,
+    add_transient_auth_entry,
+    ref_auth_entry,
+)
 from .flow import FlowBuilder, DataScope, DataSlice, Flow, transform_flow
 from .flow import flow_def
 from .flow import EvaluateAndDumpOptions, GeneratedField
 from .flow import FlowLiveUpdater, FlowLiveUpdaterOptions, FlowUpdaterStatusUpdates
 from .flow import open_flow
 from .flow import add_flow_def, remove_flow  # DEPRECATED
 from .flow import update_all_flows_async, setup_all_flows, drop_all_flows
-from .lib import init, start_server, stop
+from .lib import settings, init, start_server, stop
 from .llm import LlmSpec, LlmApiType
 from .index import VectorSimilarityMetric, VectorIndexDef, IndexOptions
 from .setting import DatabaseConnectionSpec, Settings, ServerSettings
@@ -65,6 +70,7 @@
     "setup_all_flows",
     "drop_all_flows",
     # Lib
+    "settings",
     "init",
     "start_server",
     "stop",
2 changes: 0 additions & 2 deletions python/cocoindex/cli.py
@@ -91,8 +91,6 @@ def _load_user_app(app_target: str) -> None:
 
 
 def _initialize_cocoindex_in_process() -> None:
-    settings = setting.Settings.from_env()
-    lib.init(settings)
     atexit.register(lib.stop)
 
 
14 changes: 5 additions & 9 deletions python/cocoindex/flow.py
@@ -694,16 +694,11 @@ class Flow:
     """
 
     _name: str
-    _full_name: str
     _lazy_engine_flow: Callable[[], _engine.Flow] | None
 
-    def __init__(
-        self, name: str, full_name: str, engine_flow_creator: Callable[[], _engine.Flow]
-    ):
+    def __init__(self, name: str, engine_flow_creator: Callable[[], _engine.Flow]):
         validate_flow_name(name)
-        validate_full_flow_name(full_name)
         self._name = name
-        self._full_name = full_name
         engine_flow = None
         lock = Lock()
 
Expand Down Expand Up @@ -762,7 +757,7 @@ def full_name(self) -> str:
"""
Get the full name of the flow.
"""
return self._full_name
return get_flow_full_name(self._name)

def update(self, /, *, reexport_targets: bool = False) -> _engine.IndexUpdateInfo:
"""
Expand Down Expand Up @@ -861,9 +856,10 @@ def _create_lazy_flow(
The flow will be built the first time when it's really needed.
"""
flow_name = _flow_name_builder.build_name(name, prefix="_flow_")
flow_full_name = get_flow_full_name(flow_name)

def _create_engine_flow() -> _engine.Flow:
flow_full_name = get_flow_full_name(flow_name)
validate_full_flow_name(flow_full_name)
flow_builder_state = _FlowBuilderState(flow_full_name)
root_scope = DataScope(
flow_builder_state, flow_builder_state.engine_flow_builder.root_scope()
@@ -873,7 +869,7 @@ def _create_engine_flow() -> _engine.Flow:
             execution_context.event_loop
         )
 
-    return Flow(flow_name, flow_full_name, _create_engine_flow)
+    return Flow(flow_name, _create_engine_flow)
 
 
 _flows_lock = Lock()
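With this change `Flow` no longer stores a full name: `full_name` is recomputed from the short name on every access, and `validate_full_flow_name` now runs inside the lazily invoked `_create_engine_flow`. A simplified, hypothetical model of that shape follows. `current_namespace`, the `"."` delimiter and this `get_flow_full_name` body are assumptions, since the real helper is not part of this diff.

```python
import threading
from typing import Callable, Dict, Generic, Optional, TypeVar

T = TypeVar("T")

# Hypothetical namespace source; in the diff the namespace now comes from the engine.
current_namespace: Dict[str, str] = {"value": ""}


def get_flow_full_name(name: str) -> str:
    """Assumed format: "<namespace>.<name>", or just "<name>" with no namespace."""
    ns = current_namespace["value"]
    return f"{ns}.{name}" if ns else name


class LazyFlow(Generic[T]):
    """Keeps only the short name; the engine-side object is built on first use."""

    def __init__(self, name: str, engine_flow_creator: Callable[[], T]) -> None:
        self._name = name
        self._creator = engine_flow_creator
        self._engine_flow: Optional[T] = None
        self._lock = threading.Lock()

    @property
    def full_name(self) -> str:
        # Recomputed on each access instead of being frozen at construction.
        return get_flow_full_name(self._name)

    def engine_flow(self) -> T:
        # Build at most once, even if several threads race on first access.
        with self._lock:
            if self._engine_flow is None:
                self._engine_flow = self._creator()
            return self._engine_flow
```

The observable effect is that a namespace resolved after the flow object exists is still reflected in `full_name` and in the name used when the engine flow is finally built.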
51 changes: 47 additions & 4 deletions python/cocoindex/lib.py
@@ -2,12 +2,57 @@
 Library level functions and states.
 """
 
+import threading
 import warnings
-from typing import Callable, Any
 
 from . import _engine  # type: ignore
 from . import flow, setting
 from .convert import dump_engine_object
+from .validation import validate_app_namespace_name
+from typing import Any, Callable, overload
+
+
+def prepare_settings(settings: setting.Settings) -> Any:
+    """Prepare the settings for the engine."""
+    if settings.app_namespace:
+        validate_app_namespace_name(settings.app_namespace)
+    return dump_engine_object(settings)
+
+
+_engine.set_settings_fn(lambda: prepare_settings(setting.Settings.from_env()))
+
+
+_prev_settings_fn: Callable[[], setting.Settings] | None = None
+_prev_settings_fn_lock: threading.Lock = threading.Lock()
+
+
+@overload
+def settings(fn: Callable[[], setting.Settings]) -> Callable[[], setting.Settings]: ...
+@overload
+def settings(
+    fn: None,
+) -> Callable[[Callable[[], setting.Settings]], Callable[[], setting.Settings]]: ...
+def settings(fn: Callable[[], setting.Settings] | None = None) -> Any:
+    """
+    Decorate a function that returns a settings.Settings object.
+    It registers the function as a settings provider.
+    """
+
+    def _inner(fn: Callable[[], setting.Settings]) -> Callable[[], setting.Settings]:
+        global _prev_settings_fn  # pylint: disable=global-statement
+        with _prev_settings_fn_lock:
+            if _prev_settings_fn is not None:
+                warnings.warn(
+                    f"Setting a new settings function will override the previous one {_prev_settings_fn}."
+                )
+            _prev_settings_fn = fn
+            _engine.set_settings_fn(lambda: prepare_settings(fn()))
+        return fn
+
+    if fn is not None:
+        return _inner(fn)
+    else:
+        return _inner
 
 
 def init(settings: setting.Settings | None = None) -> None:
@@ -16,9 +61,7 @@ def init(settings: setting.Settings | None = None) -> None:
 
     If the settings are not provided, they are loaded from the environment variables.
     """
-    settings = settings or setting.Settings.from_env()
-    _engine.init(dump_engine_object(settings))
-    setting.set_app_namespace(settings.app_namespace)
+    _engine.init(prepare_settings(settings) if settings is not None else None)
 
 
 def start_server(settings: setting.ServerSettings) -> None:
19 changes: 5 additions & 14 deletions python/cocoindex/setting.py
@@ -6,16 +6,15 @@
 
 from typing import Callable, Self, Any, overload
 from dataclasses import dataclass
-from .validation import validate_app_namespace_name
-
-_app_namespace: str = ""
+from . import _engine  # type: ignore
 
 
 def get_app_namespace(*, trailing_delimiter: str | None = None) -> str:
     """Get the application namespace. Append the `trailing_delimiter` if not empty."""
-    if _app_namespace == "" or trailing_delimiter is None:
-        return _app_namespace
-    return f"{_app_namespace}{trailing_delimiter}"
+    app_namespace: str = _engine.get_app_namespace()
+    if app_namespace == "" or trailing_delimiter is None:
+        return app_namespace
+    return f"{app_namespace}{trailing_delimiter}"
 
 
 def split_app_namespace(full_name: str, delimiter: str) -> tuple[str, str]:
@@ -26,14 +25,6 @@ def split_app_namespace(full_name: str, delimiter: str) -> tuple[str, str]:
     return (parts[0], parts[1])
 
 
-def set_app_namespace(app_namespace: str) -> None:
-    """Set the application namespace."""
-    if app_namespace:
-        validate_app_namespace_name(app_namespace)
-    global _app_namespace  # pylint: disable=global-statement
-    _app_namespace = app_namespace
-
-
 @dataclass
 class DatabaseConnectionSpec:
     """
4 changes: 2 additions & 2 deletions src/builder/analyzer.rs
@@ -995,7 +995,7 @@ pub async fn analyze_flow(
     impl Future<Output = Result<ExecutionPlan>> + Send + use<>,
 )> {
     let analyzer_ctx = AnalyzerContext {
-        lib_ctx: get_lib_context()?,
+        lib_ctx: get_lib_context().await?,
         flow_ctx,
     };
     let root_data_scope = Arc::new(Mutex::new(DataScopeBuilder::new()));
@@ -1109,7 +1109,7 @@ pub async fn analyze_transient_flow<'a>(
 )> {
     let mut root_data_scope = DataScopeBuilder::new();
     let analyzer_ctx = AnalyzerContext {
-        lib_ctx: get_lib_context()?,
+        lib_ctx: get_lib_context().await?,
         flow_ctx,
     };
     let mut input_fields = vec![];
8 changes: 6 additions & 2 deletions src/builder/flow_builder.rs
@@ -247,8 +247,12 @@ pub struct FlowBuilder {
 #[pymethods]
 impl FlowBuilder {
     #[new]
-    pub fn new(name: &str) -> PyResult<Self> {
-        let lib_context = get_lib_context().into_py_result()?;
+    pub fn new(py: Python<'_>, name: &str) -> PyResult<Self> {
+        let lib_context = py
+            .allow_threads(|| -> anyhow::Result<Arc<LibContext>> {
+                get_runtime().block_on(get_lib_context())
+            })
+            .into_py_result()?;
         let root_op_scope = OpScope::new(
             spec::ROOT_SCOPE_NAME.to_string(),
             None,
2 changes: 1 addition & 1 deletion src/execution/db_tracking_setup.rs
@@ -252,7 +252,7 @@ impl ResourceSetupChange for TrackingTableSetupChange {
 
 impl TrackingTableSetupChange {
     pub async fn apply_change(&self) -> Result<()> {
-        let lib_context = get_lib_context()?;
+        let lib_context = get_lib_context().await?;
         let pool = lib_context.require_builtin_db_pool()?;
         if let Some(desired) = &self.desired_state {
             for lagacy_name in self.legacy_tracking_table_names.iter() {