# Compute Engine (Batch Materialization Engine)

Note: Materialization is now constructed via the unified compute engine interface.

A Compute Engine in Feast is a component that handles materialization and historical retrieval tasks. It is responsible
for executing the logic defined in feature views, such as aggregations, transformations, and custom user-defined
functions (UDFs).

A materialization task abstracts over the specific technologies or frameworks used to materialize data. It allows
users to use a purely local serialized approach (the default LocalComputeEngine), or to delegate
materialization to separate components (e.g. AWS Lambda, as implemented by the LambdaComputeEngine).

If the built-in engines are not sufficient, you can create your own custom materialization engine. Please
see [this guide](../../how-to-guides/customizing-feast/creating-a-custom-compute-engine.md) for more details.

Please see [feature\_store.yaml](../../reference/feature-repository/feature-store-yaml.md#overview) for configuring
engines.
### Supported Compute Engines

| Compute Engine | Description | Supported | Link |
|-------------------------|-------------------------------------------------------------------------------------------------|------------|------|
| LocalComputeEngine | Runs on Arrow with Pandas, Polars, Dask, etc.; designed for lightweight transformations. | ✅ | |
| SparkComputeEngine | Runs on Apache Spark; designed for large-scale, distributed feature generation. | ✅ | |
| LambdaComputeEngine | Runs on AWS Lambda; designed for serverless feature generation. | ✅ | |
| FlinkComputeEngine | Runs on Apache Flink; designed for stream processing and real-time feature generation. | ❌ | |
| RayComputeEngine | Runs on Ray; designed for distributed feature generation and machine learning workloads. | ❌ | |

### Batch Engine

Batch engine config can be set in the `feature_store.yaml` file, where it serves as the default configuration for all materialization and historical retrieval tasks. It can be overridden per feature view via the `batch_engine` argument of `BatchFeatureView`. For example, in `feature_store.yaml`:

```yaml
batch_engine:
  type: SparkComputeEngine
  config:
    spark_master: "local[*]"
    spark_app_name: "Feast Batch Engine"
    spark_conf:
      spark.sql.shuffle.partitions: 100
      spark.executor.memory: "4g"
```

And in `BatchFeatureView`:
```python
from feast import BatchFeatureView

fv = BatchFeatureView(
    ...,  # name, source, entities, and other view parameters
    batch_engine={
        "spark_conf": {
            "spark.sql.shuffle.partitions": 200,
            "spark.executor.memory": "8g",
        },
    },
)
```
Then, when you materialize the feature view, it uses the `batch_engine` configuration specified in the feature view: shuffle partitions set to 200 and executor memory set to 8g.
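
Conceptually, the per-view override takes precedence over the repo-level default, similar to a deep merge of the two config dictionaries. The sketch below illustrates this resolution; `deep_merge` is a hypothetical helper for illustration, not part of Feast's API:

```python
# Illustrative sketch of how a per-view batch_engine override could take
# precedence over the feature_store.yaml default (not Feast's actual code).

def deep_merge(default: dict, override: dict) -> dict:
    """Recursively merge `override` on top of `default`."""
    merged = dict(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Default from feature_store.yaml (as in the example above).
repo_default = {
    "spark_conf": {
        "spark.sql.shuffle.partitions": 100,
        "spark.executor.memory": "4g",
    }
}

# Override from the BatchFeatureView definition.
view_override = {
    "spark_conf": {
        "spark.sql.shuffle.partitions": 200,
        "spark.executor.memory": "8g",
    }
}

effective = deep_merge(repo_default, view_override)
print(effective["spark_conf"]["spark.sql.shuffle.partitions"])  # 200
```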

### Stream Engine

Stream engine config can be set in the `feature_store.yaml` file, where it serves as the default configuration for all stream materialization and historical retrieval tasks. It can be overridden per feature view via the `stream_engine` argument of `StreamFeatureView`. For example, in `feature_store.yaml`:

```yaml
stream_engine:
  type: SparkComputeEngine
  config:
    spark_master: "local[*]"
    spark_app_name: "Feast Stream Engine"
    spark_conf:
      spark.sql.shuffle.partitions: 100
      spark.executor.memory: "4g"
```
And in `StreamFeatureView`:
```python
from feast import StreamFeatureView

fv = StreamFeatureView(
    ...,  # name, source, entities, and other view parameters
    stream_engine={
        "spark_conf": {
            "spark.sql.shuffle.partitions": 200,
            "spark.executor.memory": "8g",
        },
    },
)
```
Then, when you materialize the feature view, it uses the `stream_engine` configuration specified in the feature view: shuffle partitions set to 200 and executor memory set to 8g.

### API

The compute engine builds the execution plan as a DAG via the FeatureBuilder. It derives feature generation from
Feature View definitions, including:

1. Transformation (via the Transformation API)
2. Aggregation (via the Aggregation API)
3. Join (join with entity datasets, a customized JOIN, or a join with another Feature View)
4. Filter (point-in-time filter, TTL filter, filter by custom expression)
...
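
As a conceptual illustration of what these operations express, the sketch below applies each of the four steps to plain Python rows. The real engines execute equivalent logic on Spark, Arrow, and so on; all column and entity names here are illustrative:

```python
# Conceptual sketch of the four DAG operation types on plain Python rows
# (illustrative only; not how any Feast engine actually executes them).
from collections import defaultdict

rows = [
    {"driver_id": 1, "event_ts": 10, "trip_amount": 5.0},
    {"driver_id": 1, "event_ts": 20, "trip_amount": 7.0},
    {"driver_id": 2, "event_ts": 15, "trip_amount": 3.0},
]

# 1. Transformation: derive a new column per row.
transformed = [{**r, "amount_x2": r["trip_amount"] * 2} for r in rows]

# 2. Aggregation: sum trip_amount per driver.
sums = defaultdict(float)
for r in transformed:
    sums[r["driver_id"]] += r["trip_amount"]
aggregated = [{"driver_id": k, "total_amount": v} for k, v in sums.items()]

# 3. Join: attach entity attributes keyed by driver_id.
entities = {1: {"region": "east"}, 2: {"region": "west"}}
joined = [{**r, **entities[r["driver_id"]]} for r in aggregated]

# 4. Filter: keep only drivers above a threshold.
result = [r for r in joined if r["total_amount"] > 4.0]
print(result)  # [{'driver_id': 1, 'total_amount': 12.0, 'region': 'east'}]
```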

### Components

The compute engine is responsible for executing the materialization and retrieval tasks defined in the feature views. It
builds a directed acyclic graph (DAG) of operations that need to be performed to generate the features.
The core components of the compute engine are:

#### Feature Builder

The Feature Builder is responsible for resolving the features from the feature views and executing the operations
defined in the DAG. It handles the execution of transformations, aggregations, joins, and filters.
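
A minimal sketch of this execution loop, assuming the plan has already been resolved into execution order (the `Node` and `FeatureBuilder` shapes here are illustrative, not Feast's internal API):

```python
# Minimal sketch of a feature builder executing a resolved plan in order
# (names and structure are illustrative, not Feast internals).

class Node:
    def __init__(self, name, op):
        self.name = name
        self.op = op  # callable: data -> data

class FeatureBuilder:
    def __init__(self, plan):
        self.plan = plan  # nodes already in execution order

    def build(self, data):
        # Run each operation (transform, aggregate, join, filter, ...)
        # over the output of the previous one.
        for node in self.plan:
            data = node.op(data)
        return data

plan = [
    Node("transform", lambda rows: [{**r, "v2": r["v"] * 2} for r in rows]),
    Node("filter", lambda rows: [r for r in rows if r["v2"] > 2]),
]
out = FeatureBuilder(plan).build([{"v": 1}, {"v": 2}])
print(out)  # [{'v': 2, 'v2': 4}]
```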

#### Feature Resolver

The Feature Resolver is the core component of the compute engine that constructs the execution plan for feature
generation. It takes the definitions from feature views and builds a directed acyclic graph (DAG) of operations that
need to be performed to generate the features.
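
Constructing such a plan amounts to a topological sort of the operation DAG. A minimal sketch using Python's standard library (the node names are illustrative, not Feast's internals):

```python
# Illustrative sketch of resolving a DAG of operations into execution order
# via topological sort (hypothetical node names, not Feast's internal API).
from graphlib import TopologicalSorter

# Each operation maps to the set of operations it depends on.
dag = {
    "source": set(),
    "transform": {"source"},
    "aggregate": {"transform"},
    "join_entities": {"aggregate"},
    "filter": {"join_entities"},
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # ['source', 'transform', 'aggregate', 'join_entities', 'filter']
```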