Commit 2808ce1

feat: Update doc (#5521)
update doc Signed-off-by: HaoXuAI <[email protected]>
1 parent b49cea1 commit 2808ce1

13 files changed: +212 −54 lines changed


docs/getting-started/components/README.md

Lines changed: 2 additions & 2 deletions

@@ -16,8 +16,8 @@
 [feature-server.md](feature-server.md)
 {% endcontent-ref %}

-{% content-ref url="batch-materialization-engine.md" %}
-[batch-materialization-engine.md](batch-materialization-engine.md)
+{% content-ref url="compute-engine.md" %}
+[compute-engine.md](compute-engine.md)
 {% endcontent-ref %}

 {% content-ref url="provider.md" %}

docs/getting-started/components/batch-materialization-engine.md

Lines changed: 0 additions & 11 deletions
This file was deleted.
Lines changed: 111 additions & 0 deletions

# Compute Engine (Batch Materialization Engine)

Note: Materialization is now constructed via the unified compute engine interface.

A Compute Engine in Feast is a component that handles materialization and historical retrieval tasks. It is responsible for executing the logic defined in feature views, such as aggregations, transformations, and custom user-defined functions (UDFs).

A materialization task abstracts over the specific technologies or frameworks used to materialize data. It lets users use a purely local, serialized approach (the default LocalComputeEngine) or delegate materialization to separate components (e.g. AWS Lambda, as implemented by the LambdaComputeEngine).

If the built-in engines are not sufficient, you can create your own custom materialization engine. Please see [this guide](../../how-to-guides/customizing-feast/creating-a-custom-compute-engine.md) for more details.

Please see [feature\_store.yaml](../../reference/feature-repository/feature-store-yaml.md#overview) for configuring engines.

### Supported Compute Engines

| Compute Engine      | Description                                                                              | Supported | Link |
|---------------------|------------------------------------------------------------------------------------------|-----------|------|
| LocalComputeEngine  | Runs on Arrow + Pandas/Polars/Dask etc., designed for lightweight transformations.        | ✅         |      |
| SparkComputeEngine  | Runs on Apache Spark, designed for large-scale distributed feature generation.            | ✅         |      |
| LambdaComputeEngine | Runs on AWS Lambda, designed for serverless feature generation.                           | ✅         |      |
| FlinkComputeEngine  | Runs on Apache Flink, designed for stream processing and real-time feature generation.    | ❌         |      |
| RayComputeEngine    | Runs on Ray, designed for distributed feature generation and machine learning workloads.  | ❌         |      |
### Batch Engine

The batch engine config can be set in the `feature_store.yaml` file, where it serves as the default configuration for all materialization and historical retrieval tasks. It can be overridden per feature view via the `batch_engine` argument of `BatchFeatureView`. For example, in `feature_store.yaml`:

```yaml
batch_engine:
  type: SparkComputeEngine
  config:
    spark_master: "local[*]"
    spark_app_name: "Feast Batch Engine"
    spark_conf:
      spark.sql.shuffle.partitions: 100
      spark.executor.memory: "4g"
```

and in a `BatchFeatureView`:

```python
from feast import BatchFeatureView

fv = BatchFeatureView(
    batch_engine={
        "spark_conf": {
            "spark.sql.shuffle.partitions": 200,
            "spark.executor.memory": "8g",
        },
    },
)
```

Then, when you materialize the feature view, it uses the `batch_engine` configuration specified in the feature view, which sets shuffle partitions to 200 and executor memory to 8g.
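The precedence between the repo-level default and the per-view override works like a shallow dict merge, with the feature view's keys winning. A minimal sketch of that behavior (illustrative only, not Feast's actual code):

```python
from typing import Optional


def merge_engine_config(repo_default: dict, view_override: Optional[dict]) -> dict:
    """Illustrative sketch: per-view engine config overrides the repo default."""
    if not view_override:
        return dict(repo_default)
    # Shallow merge: top-level keys in the view override replace the defaults.
    return {**repo_default, **view_override}


repo_default = {"spark_conf": {"spark.sql.shuffle.partitions": 100,
                               "spark.executor.memory": "4g"}}
view_override = {"spark_conf": {"spark.sql.shuffle.partitions": 200,
                                "spark.executor.memory": "8g"}}

merged = merge_engine_config(repo_default, view_override)
# merged["spark_conf"]["spark.sql.shuffle.partitions"] == 200
```

Note that the merge is shallow: an overriding `spark_conf` dict replaces the default `spark_conf` wholesale rather than being merged with it key by key.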
### Stream Engine

The stream engine config can likewise be set in the `feature_store.yaml` file, where it serves as the default configuration for all stream materialization and historical retrieval tasks. It can be overridden per feature view via the `stream_engine` argument of `StreamFeatureView`. For example:

```yaml
stream_engine:
  type: SparkComputeEngine
  config:
    spark_master: "local[*]"
    spark_app_name: "Feast Stream Engine"
    spark_conf:
      spark.sql.shuffle.partitions: 100
      spark.executor.memory: "4g"
```

```python
from feast import StreamFeatureView

fv = StreamFeatureView(
    stream_engine={
        "spark_conf": {
            "spark.sql.shuffle.partitions": 200,
            "spark.executor.memory": "8g",
        },
    },
)
```

Then, when you materialize the feature view, it uses the `stream_engine` configuration specified in the feature view, which sets shuffle partitions to 200 and executor memory to 8g.
### API

The compute engine builds the execution plan as a DAG via the FeatureBuilder. It derives feature generation from Feature View definitions, including:

```
1. Transformation (via Transformation API)
2. Aggregation (via Aggregation API)
3. Join (join with entity datasets, a customized JOIN, or a join with another Feature View)
4. Filter (point-in-time filter, TTL filter, filter by custom expression)
...
```
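The steps above can be sketched as a DAG of operation nodes executed in dependency order, in the spirit of what the FeatureResolver and ExecutionPlan do (node names and structure here are illustrative, not Feast's actual classes):

```python
# Illustrative sketch: execute DAG nodes in topological (dependency) order,
# caching each node's output so downstream nodes (join, filter) can consume it.
from graphlib import TopologicalSorter

# Each node maps to the nodes it depends on (its inputs).
dag = {
    "read": [],
    "transform": ["read"],
    "aggregate": ["transform"],
    "join_entities": ["aggregate"],
    "filter_ttl": ["join_entities"],
}


def run_node(name: str, inputs: list) -> str:
    # Stand-in for real work (scan, UDF, aggregation, join, filter).
    return f"{name}({', '.join(inputs)})" if inputs else name


outputs: dict = {}
for name in TopologicalSorter(dag).static_order():
    outputs[name] = run_node(name, [outputs[d] for d in dag[name]])

plan = outputs["filter_ttl"]
# plan == "filter_ttl(join_entities(aggregate(transform(read))))"
```

`TopologicalSorter.static_order()` guarantees every node runs only after its dependencies, which is exactly the property an execution plan over feature operations needs.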
### Components

The compute engine is responsible for executing the materialization and retrieval tasks defined in the feature views. It builds a directed acyclic graph (DAG) of the operations that need to be performed to generate the features. The core components of the compute engine are:

#### Feature Builder

The Feature Builder is responsible for resolving the features from the feature views and executing the operations defined in the DAG. It handles the execution of transformations, aggregations, joins, and filters.

#### Feature Resolver

The Feature Resolver is the core component of the compute engine that constructs the execution plan for feature generation. It takes the definitions from feature views and builds a directed acyclic graph (DAG) of the operations needed to generate the features.

docs/getting-started/concepts/README.md

Lines changed: 8 additions & 0 deletions

@@ -20,6 +20,14 @@
 [feature-view.md](feature-view.md)
 {% endcontent-ref %}

+{% content-ref url="batch-feature-view.md" %}
+[batch-feature-view.md](batch-feature-view.md)
+{% endcontent-ref %}
+
+{% content-ref url="stream-feature-view.md" %}
+[stream-feature-view.md](stream-feature-view.md)
+{% endcontent-ref %}
+
 {% content-ref url="feature-retrieval.md" %}
 [feature-retrieval.md](feature-retrieval.md)
 {% endcontent-ref %}

docs/getting-started/concepts/batch-feature-view.md

Lines changed: 6 additions & 0 deletions

@@ -2,6 +2,12 @@

 `BatchFeatureView` is a flexible abstraction in Feast that allows users to define features derived from batch data sources or even other `FeatureView`s, enabling composable and reusable feature pipelines. It is an extension of the `FeatureView` class, with support for user-defined transformations, aggregations, and recursive chaining of feature logic.

+## Supported Compute Engines
+- [x] LocalComputeEngine
+- [x] SparkComputeEngine
+- [ ] LambdaComputeEngine
+- [ ] KubernetesComputeEngine
+
 ---

 ## ✅ Key Capabilities
Lines changed: 16 additions & 0 deletions

# Stream Feature View

`StreamFeatureView` is a type of feature view in Feast that allows you to define features that are continuously updated from a streaming source. It is designed to handle real-time data ingestion and feature generation, making it suitable for use cases where features need to be updated frequently as new data arrives.

### Supported Compute Engines
- [x] LocalComputeEngine
- [x] SparkComputeEngine
- [ ] FlinkComputeEngine

### Key Capabilities
- **Real-time Feature Generation**: Supports defining features that are continuously updated from a streaming source.
- **Transformations**: Apply transformation logic (e.g., `feature_transformation` or `udf`) to the raw data source.
- **Aggregations**: Define time-windowed aggregations (e.g., `sum`, `avg`) over event-timestamped data.
- **Feature resolution & execution**: Automatically resolves and executes dependent views during materialization or retrieval.
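The time-windowed aggregation idea can be illustrated with a small, self-contained sketch (illustrative only; in Feast such aggregations are declared on the feature view and executed by the compute engine):

```python
# Illustrative sketch of a time-windowed aggregation over event-timestamped
# rows: sum the values whose timestamps fall inside a trailing window.
from datetime import datetime, timedelta

events = [
    (datetime(2024, 1, 1, 12, 0), 10.0),
    (datetime(2024, 1, 1, 12, 30), 5.0),
    (datetime(2024, 1, 1, 13, 45), 7.5),
]


def windowed_sum(rows, as_of: datetime, window: timedelta) -> float:
    """Sum values with timestamps in the half-open window (as_of - window, as_of]."""
    start = as_of - window
    return sum(amount for ts, amount in rows if start < ts <= as_of)


# Trailing 1-hour sum as of 13:50 includes only the 13:45 event.
total = windowed_sum(events, datetime(2024, 1, 1, 13, 50), timedelta(hours=1))
# total == 7.5
```

A streaming engine continuously re-evaluates such windows as new events arrive, which is what keeps the feature values fresh.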

docs/reference/compute-engine/README.md

Lines changed: 2 additions & 2 deletions

@@ -19,8 +19,8 @@ This system builds and executes DAGs (Directed Acyclic Graphs) of typed operatio
 | `FeatureBuilder` | Constructs a DAG from Feature View definition for a specific backend | [link](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/compute_engines/feature_builder.py) |
 | `FeatureResolver` | Resolves feature DAG by topological order for execution | [link](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/compute_engines/feature_resolver.py) |
 | `DAG` | Represents a logical DAG operation (read, aggregate, join, etc.) | [link](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/compute_engines/dag/README.md) |
-| `ExecutionPlan` | Executes nodes in dependency order and stores intermediate outputs | [link]([link](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/compute_engines/dag/README.md)) |
-| `ExecutionContext` | Holds config, registry, stores, entity data, and node outputs | [link]([link](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/compute_engines/dag/README.md)) |
+| `ExecutionPlan` | Executes nodes in dependency order and stores intermediate outputs | [link](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/compute_engines/dag/README.md) |
+| `ExecutionContext` | Holds config, registry, stores, entity data, and node outputs | [link](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/compute_engines/dag/README.md) |

---
sdk/python/feast/batch_feature_view.py

Lines changed: 13 additions & 17 deletions

@@ -34,6 +34,7 @@ class BatchFeatureView(FeatureView):

     Attributes:
         name: The unique name of the batch feature view.
+        mode: The transformation mode to use for the batch feature view. This can be one of TransformationMode
         entities: List of entities or entity join keys.
         ttl: The amount of time this group of features lives. A ttl of 0 indicates that
             this group of features lives forever. Note that large ttl's or a ttl of 0
@@ -46,6 +47,13 @@ class BatchFeatureView(FeatureView):
         description: A human-readable description.
         tags: A dictionary of key-value pairs to store arbitrary metadata.
         owner: The owner of the batch feature view, typically the email of the primary maintainer.
+        udf: A user-defined function that applies transformations to the data in the batch feature view.
+        udf_string: A string representation of the user-defined function.
+        feature_transformation: A transformation object that defines how features are transformed.
+            Note, feature_transformation has precedence over udf and udf_string.
+        batch_engine: A dictionary containing configuration for the batch engine used to process the feature view.
+            Note, it will override the repo-level default batch engine config defined in the yaml file.
+        aggregations: A list of aggregations to be applied to the features in the batch feature view.
     """

     name: str
@@ -67,7 +75,7 @@ class BatchFeatureView(FeatureView):
     udf: Optional[Callable[[Any], Any]]
     udf_string: Optional[str]
     feature_transformation: Optional[Transformation]
-    batch_engine: Optional[Field]
+    batch_engine: Optional[Dict[str, Any]]
     aggregations: Optional[List[Aggregation]]

     def __init__(
@@ -88,7 +96,7 @@ def __init__(
         udf: Optional[Callable[[Any], Any]] = None,
         udf_string: Optional[str] = "",
         feature_transformation: Optional[Transformation] = None,
-        batch_engine: Optional[Field] = None,
+        batch_engine: Optional[Dict[str, Any]] = None,
         aggregations: Optional[List[Aggregation]] = None,
     ):
         if not flags_helper.is_test():
@@ -162,21 +170,9 @@ def batch_feature_view(
         schema: Optional[List[Field]] = None,
     ):
         """
-        Args:
-            name:
-            mode:
-            entities:
-            ttl:
-            source:
-            tags:
-            online:
-            offline:
-            description:
-            owner:
-            schema:
-
-        Returns:
-
+        Creates a BatchFeatureView object with the given user-defined function (UDF) as the transformation.
+        Please make sure that the udf contains all non-built in imports within the function to ensure that the execution
+        of a deserialized function does not miss imports.
         """

     def mainify(obj):

sdk/python/feast/infra/compute_engines/base.py

Lines changed: 31 additions & 0 deletions

@@ -131,3 +131,34 @@ def get_execution_context(
         entity_defs=entity_defs,
         entity_df=entity_df,
     )
+
+    def _get_feature_view_engine_config(
+        self, feature_view: Union[BatchFeatureView, StreamFeatureView, FeatureView]
+    ) -> dict:
+        """
+        Merge repo-level default batch engine config with runtime engine overrides defined in the feature view.
+
+        Priority:
+        1. Repo config (`self.repo_config.batch_engine_config`) - baseline
+        2. FeatureView overrides (`batch_engine` for BatchFeatureView, `stream_engine` for StreamFeatureView) - highest priority
+
+        Args:
+            feature_view: A BatchFeatureView or StreamFeatureView.
+
+        Returns:
+            dict: The merged engine configuration.
+        """
+        default_conf = self.repo_config.batch_engine_config or {}
+
+        runtime_conf = None
+        if isinstance(feature_view, BatchFeatureView):
+            runtime_conf = feature_view.batch_engine
+        elif isinstance(feature_view, StreamFeatureView):
+            runtime_conf = feature_view.stream_engine
+
+        if runtime_conf is not None and not isinstance(runtime_conf, dict):
+            raise TypeError(
+                f"Engine config for {feature_view.name} must be a dict, got {type(runtime_conf)}."
+            )
+
+        return {**default_conf, **runtime_conf} if runtime_conf else dict(default_conf)

sdk/python/feast/infra/compute_engines/local/README.md

Whitespace-only changes.
