
Commit 7411adf

Merge remote-tracking branch 'upstream/master'

2 parents 8c454d8 + ea53b2b

File tree

13 files changed: +101 −33 lines changed


docs/SUMMARY.md

Lines changed: 5 additions & 5 deletions
@@ -36,7 +36,7 @@
   * [Offline store](getting-started/components/offline-store.md)
   * [Online store](getting-started/components/online-store.md)
   * [Feature server](getting-started/components/feature-server.md)
-  * [Batch Materialization Engine](getting-started/components/batch-materialization-engine.md)
+  * [Compute Engine](getting-started/components/compute-engine.md)
   * [Provider](getting-started/components/provider.md)
   * [Authorization Manager](getting-started/components/authz_manager.md)
   * [OpenTelemetry Integration](getting-started/components/open-telemetry.md)
@@ -139,10 +139,10 @@
   * [Google Cloud Platform](reference/providers/google-cloud-platform.md)
   * [Amazon Web Services](reference/providers/amazon-web-services.md)
   * [Azure](reference/providers/azure.md)
-* [Batch Materialization Engines](reference/batch-materialization/README.md)
-  * [Snowflake](reference/batch-materialization/snowflake.md)
-  * [AWS Lambda (alpha)](reference/batch-materialization/lambda.md)
-  * [Spark (contrib)](reference/batch-materialization/spark.md)
+* [Compute Engines](reference/compute-engine/README.md)
+  * [Snowflake](reference/compute-engine/snowflake.md)
+  * [AWS Lambda (alpha)](reference/compute-engine/lambda.md)
+  * [Spark (contrib)](reference/compute-engine/spark.md)
 * [Feature repository](reference/feature-repository/README.md)
   * [feature\_store.yaml](reference/feature-repository/feature-store-yaml.md)
   * [.feastignore](reference/feature-repository/feast-ignore.md)

docs/getting-started/architecture/write-patterns.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ There are two ways a client (or Data Producer) can *_send_* data to the online s
   - Using a synchronous API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](../../reference/data-sources/push.md#pushing-data)) or the Feature Server's [`push` endpoint](../../reference/feature-servers/python-feature-server.md#pushing-features-to-the-online-and-offline-stores))
 2. Asynchronously
   - Using an asynchronous API call for a small number of entities or a single entity (e.g., using the [`push` or `write_to_online_store` methods](../../reference/data-sources/push.md#pushing-data)) or the Feature Server's [`push` endpoint](../../reference/feature-servers/python-feature-server.md#pushing-features-to-the-online-and-offline-stores))
-  - Using a "batch job" for a large number of entities (e.g., using a [batch materialization engine](../components/batch-materialization-engine.md))
+  - Using a "batch job" for a large number of entities (e.g., using a [compute engine](../components/compute-engine.md))

 Note, in some contexts, developers may "batch" a group of entities together and write them to the online store in a
 single API call. This is a common pattern when writing data to the online store to reduce write loads but we would
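For orientation, here is a minimal sketch of the synchronous write path described in the hunk above, assuming a local feature repository with a push source; the push source name, columns, and values are illustrative, not part of this commit:

```python
import pandas as pd
from feast import FeatureStore
from feast.data_source import PushMode

# Load feature_store.yaml from the current directory (path is illustrative).
store = FeatureStore(repo_path=".")

# A single-entity event; column names must match the push source's schema.
event_df = pd.DataFrame.from_records([
    {
        "driver_id": 1001,
        "conv_rate": 0.85,
        "event_timestamp": pd.Timestamp.utcnow(),
    },
])

# Synchronous path: push one small batch straight to the online store.
store.push("driver_stats_push_source", event_df, to=PushMode.ONLINE)
```

The batch-job path in the same list is what the compute engine handles during materialization, shown later in this commit.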

docs/getting-started/components/compute-engine.md

Lines changed: 53 additions & 5 deletions
@@ -1,4 +1,4 @@
-# Compute Engine (Batch Materialization Engine)
+# Compute Engine

 Note: The materialization is now constructed via unified compute engine interface.

@@ -20,8 +20,9 @@ engines.
 ```markdown
 | Compute Engine | Description | Supported | Link |
 |-------------------------|-------------------------------------------------------------------------------------------------|------------|------|
-| LocalComputeEngine | Runs on Arrow + Pandas/Polars/Dask etc., designed for light weight transformation. | ✅ | |
+| LocalComputeEngine | Runs on Arrow + Pandas/Polars/Dask etc., designed for light weight transformation. | ✅ | |
 | SparkComputeEngine | Runs on Apache Spark, designed for large-scale distributed feature generation. | ✅ | |
+| SnowflakeComputeEngine | Runs on Snowflake, designed for scalable feature generation using Snowflake SQL. | ✅ | |
 | LambdaComputeEngine | Runs on AWS Lambda, designed for serverless feature generation. | ✅ | |
 | FlinkComputeEngine | Runs on Apache Flink, designed for stream processing and real-time feature generation. | ❌ | |
 | RayComputeEngine | Runs on Ray, designed for distributed feature generation and machine learning workloads. | ❌ | |
@@ -31,7 +32,7 @@ engines.
 Batch Engine Config can be configured in the `feature_store.yaml` file, and it serves as the default configuration for all materialization and historical retrieval tasks. The `batch_engine` config in BatchFeatureView. E.g
 ```yaml
 batch_engine:
-  type: SparkComputeEngine
+  type: spark.engine
   config:
     spark_master: "local[*]"
     spark_app_name: "Feast Batch Engine"
```
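A rough sketch of how the `batch_engine` configured above is exercised from Python; the repo path and date range are illustrative assumptions, not part of the commit:

```python
from datetime import datetime, timedelta

from feast import FeatureStore

# Loads feature_store.yaml (including the batch_engine section) from the repo path.
store = FeatureStore(repo_path=".")

# Materialize the last day of data; the configured compute engine
# (e.g. spark.engine above) runs the underlying materialization job.
store.materialize(
    start_date=datetime.utcnow() - timedelta(days=1),
    end_date=datetime.utcnow(),
)
```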
@@ -59,7 +60,7 @@ Then, when you materialize the feature view, it will use the batch_engine config
 Stream Engine Config can be configured in the `feature_store.yaml` file, and it serves as the default configuration for all stream materialization and historical retrieval tasks. The `stream_engine` config in FeatureView. E.g
 ```yaml
 stream_engine:
-  type: SparkComputeEngine
+  type: spark.engine
   config:
     spark_master: "local[*]"
     spark_app_name: "Feast Stream Engine"
```
@@ -108,4 +109,51 @@ defined in the DAG. It handles the execution of transformations, aggregations, j

 The Feature resolver is the core component of the compute engine that constructs the execution plan for feature
 generation. It takes the definitions from feature views and builds a directed acyclic graph (DAG) of operations that
-need to be performed to generate the features.
+need to be performed to generate the features.
+
+#### DAG
+The DAG represents the directed acyclic graph of operations that need to be performed to generate the features. It
+contains nodes for each operation, such as transformations, aggregations, joins, and filters. The DAG is built by the
+Feature Resolver and executed by the Feature Builder.
+
+DAG nodes are defined as follows:
+```
+  +---------------------+
+  |   SourceReadNode    |  <- Read data from offline store (e.g. Snowflake, BigQuery, etc. or custom source)
+  +---------------------+
+            |
+            v
+  +--------------------------------------+
+  |  TransformationNode / JoinNode (*)   |  <- Merge data sources, custom transformations by user, or default join
+  +--------------------------------------+
+            |
+            v
+  +---------------------+
+  |     FilterNode      |  <- used for point-in-time filtering
+  +---------------------+
+            |
+            v
+  +---------------------+
+  | AggregationNode (*) |  <- only if aggregations are defined
+  +---------------------+
+            |
+            v
+  +---------------------+
+  |  DeduplicationNode  |  <- used if no aggregation and for history
+  +---------------------+     retrieval
+            |
+            v
+  +---------------------+
+  | ValidationNode (*)  |  <- optional validation checks
+  +---------------------+
+            |
+            v
+       +----------+
+       |  Output  |
+       +----------+
+        /        \
+       v          v
+ +------------------+  +-------------------+
+ | OnlineStoreWrite |  | OfflineStoreWrite |
+ +------------------+  +-------------------+
+```
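To make the node structure above concrete, a hypothetical and heavily simplified sketch follows; the class and field names are illustrative only and do not reflect Feast's actual internal API:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DAGNode:
    """Hypothetical DAG node: one operation in the feature-generation plan."""
    name: str
    inputs: List["DAGNode"] = field(default_factory=list)


@dataclass
class SourceReadNode(DAGNode):
    """Reads raw rows from the offline store or a custom source."""
    source: Optional[str] = None


@dataclass
class AggregationNode(DAGNode):
    """Applies aggregations; present only when aggregations are defined."""
    functions: List[str] = field(default_factory=list)


# A toy plan mirroring the top of the diagram: read -> aggregate.
read = SourceReadNode(name="read_driver_stats", source="driver_hourly_stats")
agg = AggregationNode(name="agg_trips", inputs=[read], functions=["sum", "avg"])
```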

docs/getting-started/components/overview.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ A complete Feast deployment contains the following components:
   * Retrieve online features.
 * **Feature Server:** The Feature Server is a REST API server that serves feature values for a given entity key and feature reference. The Feature Server is designed to be horizontally scalable and can be deployed in a distributed manner.
 * **Stream Processor:** The Stream Processor can be used to ingest feature data from streams and write it into the online or offline stores. Currently, there's an experimental Spark processor that's able to consume data from Kafka.
-* **Batch Materialization Engine:** The [Batch Materialization Engine](batch-materialization-engine.md) component launches a process which loads data into the online store from the offline store. By default, Feast uses a local in-process engine implementation to materialize data. However, additional infrastructure can be used for a more scalable materialization process.
+* **Compute Engine:** The [Compute Engine](compute-engine.md) component launches a process which loads data into the online store from the offline store. By default, Feast uses a local in-process engine implementation to materialize data. However, additional infrastructure can be used for a more scalable materialization process.
 * **Online Store:** The online store is a database that stores only the latest feature values for each entity. The online store is either populated through materialization jobs or through [stream ingestion](../../reference/data-sources/push.md).
 * **Offline Store:** The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. For feature retrieval and materialization, Feast does not manage the offline store directly, but runs queries against it. However, offline stores can be configured to support writes if Feast configures logging functionality of served features.
 * **Authorization Manager**: The authorization manager detects authentication tokens from client requests to Feast servers and uses this information to enforce permission policies on the requested services.

docs/getting-started/genai.md

Lines changed: 1 addition & 1 deletion
@@ -162,4 +162,4 @@ For more detailed information and examples:
 * [MCP Feature Server Reference](../reference/feature-servers/mcp-feature-server.md)
 * [Spark Data Source](../reference/data-sources/spark.md)
 * [Spark Offline Store](../reference/offline-stores/spark.md)
-* [Spark Batch Materialization](../reference/batch-materialization/spark.md)
+* [Spark Compute Engine](../reference/compute-engine/spark.md)

docs/how-to-guides/running-feast-in-production.md

Lines changed: 2 additions & 2 deletions
@@ -57,8 +57,8 @@ To keep your online store up to date, you need to run a job that loads feature d
 Out of the box, Feast's materialization process uses an in-process materialization engine. This engine loads all the data being materialized into memory from the offline store, and writes it into the online store.

 This approach may not scale to large amounts of data, which users of Feast may be dealing with in production.
-In this case, we recommend using one of the more [scalable materialization engines](./scaling-feast.md#scaling-materialization), such as [Snowflake Materialization Engine](../reference/batch-materialization/snowflake.md).
-Users may also need to [write a custom materialization engine](../how-to-guides/customizing-feast/creating-a-custom-materialization-engine.md) to work on their existing infrastructure.
+In this case, we recommend using one of the more [scalable compute engines](./scaling-feast.md#scaling-materialization), such as [Snowflake Compute Engine](../reference/compute-engine/snowflake.md).
+Users may also need to [write a custom compute engine](../how-to-guides/customizing-feast/creating-a-custom-compute-engine.md) to work on their existing infrastructure.


 ### 2.2 Scheduled materialization with Airflow
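As a rough sketch of the step a scheduler such as Airflow would invoke for the scheduled materialization this hunk introduces, incremental materialization can be driven from Python; the repository path below is an illustrative assumption:

```python
from datetime import datetime

from feast import FeatureStore

# Point at the production feature repository (path is illustrative).
store = FeatureStore(repo_path="/opt/feast/feature_repo")

# Materialize everything between the last materialized timestamp and now;
# the configured compute engine performs the actual load into the online store.
store.materialize_incremental(end_date=datetime.utcnow())
```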

docs/how-to-guides/scaling-feast.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ The recommended solution in this case is to use the [SQL based registry](../tuto
 The default Feast materialization process is an in-memory process, which pulls data from the offline store before writing it to the online store.
 However, this process does not scale for large data sets, since it's executed on a single-process.

-Feast supports pluggable [Materialization Engines](../getting-started/components/batch-materialization-engine.md), that allow the materialization process to be scaled up.
+Feast supports pluggable [Compute Engines](../getting-started/components/compute-engine.md), that allow the materialization process to be scaled up.
 Aside from the local process, Feast supports a [Lambda-based materialization engine](https://rtd.feast.dev/en/master/#alpha-lambda-based-engine), and a [Bytewax-based materialization engine](https://rtd.feast.dev/en/master/#bytewax-engine).

 Users may also be able to build an engine to scale up materialization using existing infrastructure in their organizations.

docs/reference/batch-materialization/README.md

Lines changed: 0 additions & 11 deletions
This file was deleted.

docs/reference/codebase-structure.md

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ There are also several important submodules:
 * `ui/` contains the embedded Web UI, to be launched on the `feast ui` command.

 Of these submodules, `infra/` is the most important.
-It contains the interfaces for the [provider](getting-started/components/provider.md), [offline store](getting-started/components/offline-store.md), [online store](getting-started/components/online-store.md), [batch materialization engine](getting-started/components/batch-materialization-engine.md), and [registry](getting-started/components/registry.md), as well as all of their individual implementations.
+It contains the interfaces for the [provider](getting-started/components/provider.md), [offline store](getting-started/components/offline-store.md), [online store](getting-started/components/online-store.md), [compute engine](getting-started/components/compute-engine.md), and [registry](getting-started/components/registry.md), as well as all of their individual implementations.

 ```
 $ tree --dirsfirst -L 1 infra

docs/reference/compute-engine/README.md

Lines changed: 17 additions & 0 deletions
@@ -48,18 +48,35 @@ An example of built output from FeatureBuilder:

 ## ✨ Available Engines

+
 ### 🔥 SparkComputeEngine

+{% page-ref page="spark.md" %}
+
 - Distributed DAG execution via Apache Spark
 - Supports point-in-time joins and large-scale materialization
 - Integrates with `SparkOfflineStore` and `SparkMaterializationJob`

 ### 🧪 LocalComputeEngine

+{% page-ref page="local.md" %}
+
 - Runs on Arrow + Specified backend (e.g., Pandas, Polars)
 - Designed for local dev, testing, or lightweight feature generation
 - Supports `LocalMaterializationJob` and `LocalHistoricalRetrievalJob`

+### 🧊 SnowflakeComputeEngine
+
+- Runs entirely in Snowflake
+- Supports Snowflake SQL for feature transformations and aggregations
+- Integrates with `SnowflakeOfflineStore` and `SnowflakeMaterializationJob`
+
+{% page-ref page="snowflake.md" %}
+
+### LambdaComputeEngine
+
+{% page-ref page="lambda.md" %}
+
 ---

 ## 🛠️ Feature Builder Flow
