# Feast Architecture

This document describes the high-level architecture of Feast, the open-source feature store for machine learning. It is intended for contributors, AI agents, and anyone who needs to understand how the codebase is organized.

## System Overview

Feast manages the lifecycle of ML features: from batch data sources through offline storage, materialization into online stores, and low-latency serving for real-time inference. The system is designed around pluggable backends—every storage layer, compute engine, and registry can be swapped independently.

### Component Overview

| Layer | Component | Implementations |
|-------|-----------|-----------------|
| **SDK / CLI** | `feast.feature_store` | Python, Go, Java |
| **Registry** | Metadata catalog | File (S3/GCS), SQL (Postgres/MySQL/SQLite) |
| **Provider** | Orchestrator | PassthroughProvider |
| **Offline Store** | Historical retrieval | BigQuery, Snowflake, Redshift, Spark, DuckDB, Postgres, Trino, Athena |
| **Online Store** | Low-latency serving | Redis, DynamoDB, Bigtable, Postgres, SQLite, Cassandra, Milvus, Qdrant |
| **Compute Engine** | Materialization jobs | Local, Spark, Kubernetes, Ray, Snowflake, AWS Lambda |

## Core Concepts

| Concept | Description | Definition File |
|---------|-------------|-----------------|
| **Entity** | A real-world object (user, product) that features describe | `sdk/python/feast/entity.py` |
| **FeatureView** | A group of features sourced from a single data source | `sdk/python/feast/feature_view.py` |
| **OnDemandFeatureView** | Features computed at request time via transformations | `sdk/python/feast/on_demand_feature_view.py` |
| **StreamFeatureView** | Features derived from streaming data sources | `sdk/python/feast/stream_feature_view.py` |
| **FeatureService** | A named collection of feature views for a use case | `sdk/python/feast/feature_service.py` |
| **DataSource** | Connection to raw data (file, warehouse, stream) | `sdk/python/feast/data_source.py` |
| **Permission** | Authorization policy controlling access to resources | `sdk/python/feast/permissions/permission.py` |

## Key Abstractions

### FeatureStore (`sdk/python/feast/feature_store.py`)

The main entry point for all SDK operations. Users interact with Feast through this class:

- `apply()` — register feature definitions in the registry
- `get_historical_features()` — point-in-time correct feature retrieval for training
- `get_online_features()` — low-latency feature retrieval for inference
- `materialize()` / `materialize_incremental()` — copy features from offline to online store
- `push()` — push features directly to the online store
- `teardown()` — remove infrastructure

### Provider (`sdk/python/feast/infra/provider.py`)

Orchestrates the offline store, online store, and compute engine. All built-in providers (local, GCP, AWS, Azure) resolve to `PassthroughProvider`, which delegates directly to the configured store implementations.

### OfflineStore (`sdk/python/feast/infra/offline_stores/offline_store.py`)

Abstract base class for historical feature retrieval. Key methods:

- `get_historical_features()` — point-in-time join of features with entity timestamps
- `pull_latest_from_table_or_query()` — extract the latest entity rows for materialization
- `pull_all_from_table_or_query()` — extract all rows in a time range
- `offline_write_batch()` — write features to the offline store

Implementations: BigQuery, Snowflake, Redshift, Spark, Dask, DuckDB, Postgres, Trino, Athena, and more under `infra/offline_stores/contrib/`.

### OnlineStore (`sdk/python/feast/infra/online_stores/online_store.py`)

Abstract base class for low-latency feature serving. Key methods:

- `online_read()` — read features by entity keys
- `online_write_batch()` — write materialized features
- `update()` — create/update cloud resources
- `retrieve_online_documents()` — vector similarity search (for embedding stores)

Implementations: Redis, DynamoDB, Bigtable, Snowflake, SQLite, Postgres, Cassandra, MongoDB, MySQL, Elasticsearch, Milvus, Qdrant, and a HybridOnlineStore that combines multiple backends.

### Registry (`sdk/python/feast/infra/registry/`)

The metadata catalog that stores all feature definitions (entities, feature views, feature services, permissions). Two main implementations:

- **FileRegistry** (`registry.py`) — serializes the entire registry as a single protobuf file, stored on local disk, S3, GCS, or Azure Blob. Uses `RegistryStore` backends for storage.
- **SqlRegistry** (`sql.py`) — stores metadata in a SQL database (PostgreSQL, MySQL, SQLite).

### ComputeEngine (`sdk/python/feast/infra/compute_engines/base.py`)

Abstract base class for materialization — the process of copying features from the offline store to the online store. Key method:

- `materialize()` — execute materialization tasks, each representing a (feature_view, time_range) pair

Implementations: Local (single-machine), Spark, Kubernetes (K8s Jobs), Ray, Snowflake (SQL-based), AWS Lambda.

The compute engine also includes a DAG module (`compute_engines/dag/`) for building execution plans with nodes, values, and contexts.

### Feature Server (`sdk/python/feast/feature_server.py`)

A FastAPI application that exposes Feast operations over HTTP:

- `POST /get-online-features` — retrieve online features
- `POST /push` — push features to online/offline stores
- `POST /materialize` — trigger materialization
- `POST /materialize-incremental` — incremental materialization

Started via the `feast serve` CLI command.
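For illustration, the body of a `/get-online-features` request can be built like this (the feature reference and entity values are hypothetical; `feast serve` listens on port 6566 by default):

```python
import json

# Request body for POST /get-online-features: a list of
# "feature_view:feature" references plus one value list per join key.
payload = {
    "features": ["driver_stats:conv_rate"],
    "entities": {"driver_id": [1001, 1002]},
}
body = json.dumps(payload)

# With a running server this would be sent as, e.g.:
#   requests.post("http://localhost:6566/get-online-features", data=body)
```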

## Permissions and Authorization

The permissions system (`sdk/python/feast/permissions/`) provides fine-grained access control:

| Component | File | Purpose |
|-----------|------|---------|
| `Permission` | `permission.py` | Policy definition (resource type + action + roles) |
| `SecurityManager` | `security_manager.py` | Runtime permission enforcement |
| `AuthManager` | `auth/auth_manager.py` | Token extraction and parsing |
| `AuthConfig` | `auth_model.py` | Auth configuration (OIDC, Kubernetes, NoAuth) |

Auth flow: Client sends token → AuthManager extracts identity → SecurityManager checks Permission policies → access granted or denied.

Server-side enforcement is implemented for REST (`permissions/server/rest.py`), gRPC (`permissions/server/grpc.py`), and Arrow Flight protocols. Client-side interceptors handle token injection for each transport.

## CLI

The Feast CLI (`sdk/python/feast/cli/cli.py`) is built with Click and provides commands for:

- `feast apply` — register feature definitions
- `feast materialize` / `feast materialize-incremental` — run materialization
- `feast serve` — start the feature server
- `feast plan` — preview changes before applying
- `feast teardown` — remove infrastructure
- `feast init` — scaffold a new feature repository

## Kubernetes Operator

The Feast Operator (`infra/feast-operator/`) is a Go-based Kubernetes operator built with controller-runtime (Kubebuilder):

| Component | Location | Purpose |
|-----------|----------|---------|
| CRD (`FeatureStore`) | `api/v1/featurestore_types.go` | Custom Resource Definition |
| Reconciler | `internal/controller/featurestore_controller.go` | Main control loop |
| Service handlers | `internal/controller/services/` | Manage Deployments, Services, ConfigMaps |
| AuthZ | `internal/controller/authz/` | RBAC/authorization setup |

The operator watches `FeatureStore` custom resources and reconciles Deployments, Services, ConfigMaps, Secrets, CronJobs, and HPAs to run Feast components in Kubernetes.

**Phases**: Ready, Pending, Failed
**Conditions**: ClientReady, OfflineStoreReady, OnlineStoreReady, RegistryReady, UIReady, AuthorizationReady, CronJobReady

## Protobuf Definitions

All cross-language data models and service interfaces are defined in Protocol Buffers (`protos/feast/`):

```
protos/feast/
├── core/      # Data models: Entity, FeatureView, FeatureService, Permission, Registry
├── serving/   # ServingService, TransformationService gRPC APIs
├── registry/  # RegistryServer gRPC API
├── storage/   # Redis storage format
└── types/     # Primitive types: Value, EntityKey, Field
```

Protos are compiled to Python (`make compile-protos-python`), Go (`make compile-protos-go`), and Java.

## Multi-Language SDKs

| SDK | Location | Purpose |
|-----|----------|---------|
| **Python** | `sdk/python/` | Primary SDK — full feature store implementation |
| **Go** | `go/` | Embedded online feature retrieval |
| **Java** | `java/` | Serving client and feature server |

The Python SDK is the canonical implementation. The Go and Java SDKs provide serving capabilities and client libraries.

## Directory Structure

```
feast/
├── sdk/python/feast/        # Python SDK (primary implementation)
│   ├── cli/                 # CLI commands (Click)
│   ├── infra/               # Infrastructure abstractions
│   │   ├── offline_stores/  # Offline store implementations
│   │   ├── online_stores/   # Online store implementations
│   │   ├── compute_engines/ # Materialization engines
│   │   ├── registry/        # Registry implementations
│   │   ├── feature_servers/ # Feature server deployments
│   │   └── common/          # Shared infra code
│   ├── permissions/         # Authorization system
│   ├── transformation/      # Feature transformations
│   ├── templates/           # Project templates
│   └── feature_store.py     # Main FeatureStore class
├── go/                      # Go SDK
├── java/                    # Java SDK (serving + client)
├── protos/                  # Protocol Buffer definitions
├── ui/                      # React/TypeScript web UI
├── infra/                   # Infrastructure and deployment
│   ├── feast-operator/      # Kubernetes operator (Go)
│   ├── charts/              # Helm charts
│   ├── scripts/             # Build and release scripts
│   ├── terraform/           # Cloud infrastructure (IaC)
│   └── templates/           # Configuration templates
├── docs/                    # Documentation (GitBook)
├── examples/                # Example feature repositories
└── Makefile                 # Build system (80+ targets)
```

## Data Flow

### Training (Offline)

```
Data Source → OfflineStore.get_historical_features() → Point-in-Time Join → Training DataFrame
```

1. User defines `FeatureView` + `Entity` + `DataSource`
2. User calls `store.get_historical_features(entity_df, features)`
3. OfflineStore performs a point-in-time join against the data source
4. Returns a `RetrievalJob` that materializes to a DataFrame or Arrow table

### Serving (Online)

```
OfflineStore → ComputeEngine.materialize() → OnlineStore → FeatureServer → Inference
```

1. `feast materialize` triggers the compute engine
2. ComputeEngine reads the latest values from the offline store
3. Values are written to the online store via `OnlineStore.online_write_batch()`
4. The feature server or SDK reads from the online store via `OnlineStore.online_read()`

### Push-Based Ingestion

```
Application → FeatureStore.push() → OnlineStore (+ optionally OfflineStore)
```

Features can be pushed directly to the stores without a materialization step, which is useful for streaming and real-time use cases.

## Extension Points

Feast is designed for extensibility. To add a new backend:

1. **Offline Store**: Subclass `OfflineStore` and `OfflineStoreConfig` in `infra/offline_stores/contrib/`
2. **Online Store**: Subclass `OnlineStore` and `OnlineStoreConfig` in `infra/online_stores/`
3. **Compute Engine**: Subclass `ComputeEngine` in `infra/compute_engines/`
4. **Registry Store**: Subclass `RegistryStore` in `infra/registry/`

Register the new implementation in `RepoConfig` (see `repo_config.py` for the class resolution logic).

## Related Documents

- [Development Guide](docs/project/development-guide.md) — build, test, and debug instructions
- [ADR Index](docs/adr/README.md) — architecture decision records
- [Operator README](infra/feast-operator/README.md) — Kubernetes operator documentation
- [Helm Charts](infra/charts/) — deployment configuration