
Commit 2fef7d9

Ambient Code Bot and Claude committed
docs: Add ARCHITECTURE.md and Architecture Decision Records
Add top-level architecture documentation and ADRs for the Feast project to improve codebase understanding for contributors and AI agents.

ARCHITECTURE.md covers:
- System overview with architecture diagram
- Component overview table (Registry, Provider, Offline/Online Store, Compute Engine)
- Core concepts (Entity, FeatureView, FeatureService, DataSource, Permission)
- Key abstractions (FeatureStore, Provider, OfflineStore, OnlineStore, Registry, ComputeEngine)
- Feature Server endpoints
- Permissions and authorization system
- CLI commands
- Kubernetes Operator structure
- Protobuf definitions layout
- Multi-language SDK overview
- Directory structure map
- Data flow diagrams (Training, Serving, Push-Based)
- Extension points guide

ADRs (docs/adr/):
- 001: Pluggable Offline and Online Store Architecture
- 002: Registry as Serialized Protobuf Metadata Store
- 003: PassthroughProvider as Universal Provider
- 004: Compute Engine Abstraction for Materialization
- 005: Push-Based Feature Ingestion
- 006: Fine-Grained Permissions and Authorization
- 007: Kubernetes Operator for Feast Deployment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ambient Code Bot <bot@ambient-code.local>
1 parent c3332dc commit 2fef7d9

9 files changed: +580 −0 lines changed

ARCHITECTURE.md

Lines changed: 248 additions & 0 deletions
@@ -0,0 +1,248 @@
# Feast Architecture

This document describes the high-level architecture of Feast, the open-source feature store for machine learning. It is intended for contributors, AI agents, and anyone who needs to understand how the codebase is organized.

## System Overview

Feast manages the lifecycle of ML features: from batch data sources through offline storage, materialization into online stores, and low-latency serving for real-time inference. The system is designed around pluggable backends—every storage layer, compute engine, and registry can be swapped independently.

![Feast Architecture](docs/.gitbook/assets/feast-marchitecture-211014.png)

### Component Overview

| Layer | Component | Implementations |
|-------|-----------|-----------------|
| **SDK / CLI** | `feast.feature_store` | Python, Go, Java |
| **Registry** | Metadata catalog | File (S3/GCS), SQL (Postgres/MySQL/SQLite) |
| **Provider** | Orchestrator | PassthroughProvider |
| **Offline Store** | Historical retrieval | BigQuery, Snowflake, Redshift, Spark, DuckDB, Postgres, Trino, Athena |
| **Online Store** | Low-latency serving | Redis, DynamoDB, Bigtable, Postgres, SQLite, Cassandra, Milvus, Qdrant |
| **Compute Engine** | Materialization jobs | Local, Spark, Kubernetes, Ray, Snowflake, AWS Lambda |
## Core Concepts

| Concept | Description | Definition File |
|---------|-------------|-----------------|
| **Entity** | A real-world object (user, product) that features describe | `sdk/python/feast/entity.py` |
| **FeatureView** | A group of features sourced from a single data source | `sdk/python/feast/feature_view.py` |
| **OnDemandFeatureView** | Features computed at request time via transformations | `sdk/python/feast/on_demand_feature_view.py` |
| **StreamFeatureView** | Features derived from streaming data sources | `sdk/python/feast/stream_feature_view.py` |
| **FeatureService** | A named collection of feature views for a use case | `sdk/python/feast/feature_service.py` |
| **DataSource** | Connection to raw data (file, warehouse, stream) | `sdk/python/feast/data_source.py` |
| **Permission** | Authorization policy controlling access to resources | `sdk/python/feast/permissions/permission.py` |
## Key Abstractions

### FeatureStore (`sdk/python/feast/feature_store.py`)

The main entry point for all SDK operations. Users interact with Feast through this class:

- `apply()` — register feature definitions in the registry
- `get_historical_features()` — point-in-time correct feature retrieval for training
- `get_online_features()` — low-latency feature retrieval for inference
- `materialize()` / `materialize_incremental()` — copy features from offline to online store
- `push()` — push features directly to the online store
- `teardown()` — remove infrastructure

### Provider (`sdk/python/feast/infra/provider.py`)

Orchestrates the offline store, online store, and compute engine. All cloud providers (GCP, AWS, Azure, local) use `PassthroughProvider`, which delegates directly to the configured store implementations.
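The delegation pattern can be sketched in a few lines of plain Python. This is an illustration of the idea, not Feast's actual classes or signatures — the store and method names here are simplified stand-ins:

```python
# Illustrative sketch of a passthrough provider: every call is forwarded
# to whatever store implementation the configuration selected.

class InMemoryOnlineStore:
    """Toy online store backend (stand-in for Redis, DynamoDB, etc.)."""

    def __init__(self):
        self._data = {}

    def online_write_batch(self, rows):
        for key, features in rows:
            self._data[key] = features

    def online_read(self, keys):
        return [self._data.get(k) for k in keys]


class PassthroughProvider:
    """Delegates directly to the configured store — no cloud logic of its own."""

    def __init__(self, online_store):
        self.online_store = online_store

    def online_write_batch(self, rows):
        self.online_store.online_write_batch(rows)

    def online_read(self, keys):
        return self.online_store.online_read(keys)


provider = PassthroughProvider(InMemoryOnlineStore())
provider.online_write_batch([("user:1", {"clicks": 7})])
print(provider.online_read(["user:1", "user:2"]))  # [{'clicks': 7}, None]
```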
### OfflineStore (`sdk/python/feast/infra/offline_stores/offline_store.py`)

Abstract base class for historical feature retrieval. Key methods:

- `get_historical_features()` — point-in-time join of features with entity timestamps
- `pull_latest_from_table_or_query()` — extract latest entity rows for materialization
- `pull_all_from_table_or_query()` — extract all rows in a time range
- `offline_write_batch()` — write features to the offline store

Implementations: BigQuery, Snowflake, Redshift, Spark, Dask, DuckDB, Postgres, Trino, Athena, and more under `infra/offline_stores/contrib/`.
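The point-in-time join semantics can be illustrated with plain Python (a sketch of the behavior, not Feast's implementation — real backends push this join down into SQL or a dataframe engine): for each entity row, pick the latest feature value whose event timestamp does not exceed the entity timestamp.

```python
# Point-in-time join sketch over toy data.
feature_rows = [  # (entity_id, event_timestamp, features)
    ("u1", 10, {"amount": 5.0}),
    ("u1", 20, {"amount": 9.0}),
    ("u2", 15, {"amount": 3.0}),
]

entity_df = [("u1", 15), ("u1", 25), ("u2", 10)]  # (entity_id, timestamp)

def point_in_time_join(entity_df, feature_rows):
    joined = []
    for entity_id, ts in entity_df:
        # Only feature values observed at or before the entity timestamp
        # are eligible — this is what prevents training/serving leakage.
        candidates = [
            (ev_ts, feats)
            for eid, ev_ts, feats in feature_rows
            if eid == entity_id and ev_ts <= ts
        ]
        feats = max(candidates)[1] if candidates else None
        joined.append((entity_id, ts, feats))
    return joined

print(point_in_time_join(entity_df, feature_rows))
# [('u1', 15, {'amount': 5.0}), ('u1', 25, {'amount': 9.0}), ('u2', 10, None)]
```

Note that `u1` at timestamp 15 gets the value observed at 10, not the later value from 20 — future data never leaks into training rows.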
### OnlineStore (`sdk/python/feast/infra/online_stores/online_store.py`)

Abstract base class for low-latency feature serving. Key methods:

- `online_read()` — read features by entity keys
- `online_write_batch()` — write materialized features
- `update()` — create/update cloud resources
- `retrieve_online_documents()` — vector similarity search (for embedding stores)

Implementations: Redis, DynamoDB, Bigtable, Snowflake, SQLite, Postgres, Cassandra, MongoDB, MySQL, Elasticsearch, Milvus, Qdrant, and a HybridOnlineStore that combines multiple backends.
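The shape of the contract can be sketched with `abc` — method names follow the list above, but the signatures are simplified stand-ins (the real base class takes richer arguments such as config and feature view objects):

```python
import abc

class OnlineStore(abc.ABC):
    """Simplified sketch of the online store contract."""

    @abc.abstractmethod
    def online_write_batch(self, table, rows): ...

    @abc.abstractmethod
    def online_read(self, table, entity_keys): ...


class DictOnlineStore(OnlineStore):
    """Toy backend: one dict per feature view table."""

    def __init__(self):
        self._tables = {}

    def online_write_batch(self, table, rows):
        self._tables.setdefault(table, {}).update(rows)

    def online_read(self, table, entity_keys):
        t = self._tables.get(table, {})
        return [t.get(k) for k in entity_keys]


store = DictOnlineStore()
store.online_write_batch("driver_stats", {"d1": {"trips": 12}})
print(store.online_read("driver_stats", ["d1"]))  # [{'trips': 12}]
```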
### Registry (`sdk/python/feast/infra/registry/`)

The metadata catalog that stores all feature definitions (entities, feature views, feature services, permissions). Two main implementations:

- **FileRegistry** (`registry.py`) — serializes the entire registry as a single protobuf file, stored on local disk, S3, GCS, or Azure Blob. Uses `RegistryStore` backends for storage.
- **SqlRegistry** (`sql.py`) — stores metadata in a SQL database (PostgreSQL, MySQL, SQLite).
### ComputeEngine (`sdk/python/feast/infra/compute_engines/base.py`)

Abstract base class for materialization — the process of copying features from the offline store to the online store. Key method:

- `materialize()` — execute materialization tasks, each representing a (feature_view, time_range) pair

Implementations: Local (single-machine), Spark, Kubernetes (K8s Jobs), Ray, Snowflake (SQL-based), AWS Lambda.

The compute engine also includes a DAG module (`compute_engines/dag/`) for building execution plans with nodes, values, and contexts.
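The core of a materialization task can be sketched in a few lines (toy data and simplified logic — the real engines do this at scale, partitioned and parallelized): take the latest row per entity key within the task's time range, ready to be written to the online store.

```python
# Materialization sketch: reduce an "offline" event log to the latest
# value per entity key within a (feature_view, time_range) task.

offline_rows = [  # (entity_id, event_timestamp, features)
    ("u1", 10, {"clicks": 1}),
    ("u1", 20, {"clicks": 4}),
    ("u2", 5, {"clicks": 2}),
]

def materialize(offline_rows, start, end):
    latest = {}
    for entity_id, ts, feats in offline_rows:
        in_range = start <= ts <= end
        if in_range and (entity_id not in latest or ts > latest[entity_id][0]):
            latest[entity_id] = (ts, feats)
    # The result is what online_write_batch() would receive.
    return {eid: feats for eid, (ts, feats) in latest.items()}

print(materialize(offline_rows, start=0, end=15))
# {'u1': {'clicks': 1}, 'u2': {'clicks': 2}}
```

Running the same task with `end=30` would instead pick up `u1`'s later row — which is exactly what `materialize_incremental()` automates by tracking the last materialized timestamp.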
### Feature Server (`sdk/python/feast/feature_server.py`)

A FastAPI application that exposes Feast operations over HTTP:

- `POST /get-online-features` — retrieve online features
- `POST /push` — push features to online/offline stores
- `POST /materialize` — trigger materialization
- `POST /materialize-incremental` — incremental materialization

Started via the `feast serve` CLI command.
## Permissions and Authorization

The permissions system (`sdk/python/feast/permissions/`) provides fine-grained access control:

| Component | File | Purpose |
|-----------|------|---------|
| `Permission` | `permission.py` | Policy definition (resource type + action + roles) |
| `SecurityManager` | `security_manager.py` | Runtime permission enforcement |
| `AuthManager` | `auth/auth_manager.py` | Token extraction and parsing |
| `AuthConfig` | `auth_model.py` | Auth configuration (OIDC, Kubernetes, NoAuth) |

Auth flow: Client sends token → AuthManager extracts identity → SecurityManager checks Permission policies → access granted or denied.
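The final policy check in that flow reduces to a role-intersection test, sketched below (names and data shapes are illustrative, not Feast's actual signatures):

```python
# Illustrative permission check: a policy grants an action on a resource
# type to a set of roles; access requires a non-empty role intersection.

permissions = [
    {"resource": "FeatureView", "action": "read", "roles": {"analyst", "admin"}},
    {"resource": "FeatureView", "action": "write", "roles": {"admin"}},
]

def is_allowed(user_roles, resource, action):
    return any(
        p["resource"] == resource
        and p["action"] == action
        and p["roles"] & user_roles  # shared role → grant
        for p in permissions
    )

print(is_allowed({"analyst"}, "FeatureView", "read"))   # True
print(is_allowed({"analyst"}, "FeatureView", "write"))  # False
```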
Server-side enforcement is implemented for REST (`permissions/server/rest.py`), gRPC (`permissions/server/grpc.py`), and Arrow Flight protocols. Client-side interceptors handle token injection for each transport.

## CLI

The Feast CLI (`sdk/python/feast/cli/cli.py`) is built with Click and provides commands for:

- `feast apply` — register feature definitions
- `feast materialize` / `feast materialize-incremental` — run materialization
- `feast serve` — start the feature server
- `feast plan` — preview changes before applying
- `feast teardown` — remove infrastructure
- `feast init` — scaffold a new feature repository
## Kubernetes Operator

The Feast Operator (`infra/feast-operator/`) is a Go-based Kubernetes operator built with controller-runtime (Kubebuilder):

| Component | Location | Purpose |
|-----------|----------|---------|
| CRD (`FeatureStore`) | `api/v1/featurestore_types.go` | Custom Resource Definition |
| Reconciler | `internal/controller/featurestore_controller.go` | Main control loop |
| Service handlers | `internal/controller/services/` | Manage Deployments, Services, ConfigMaps |
| AuthZ | `internal/controller/authz/` | RBAC/authorization setup |

The operator watches `FeatureStore` custom resources and reconciles Deployments, Services, ConfigMaps, Secrets, CronJobs, and HPAs to run Feast components in Kubernetes.

- **Phases**: Ready, Pending, Failed
- **Conditions**: ClientReady, OfflineStoreReady, OnlineStoreReady, RegistryReady, UIReady, AuthorizationReady, CronJobReady
## Protobuf Definitions

All cross-language data models and service interfaces are defined in Protocol Buffers (`protos/feast/`):

```
protos/feast/
├── core/     # Data models: Entity, FeatureView, FeatureService, Permission, Registry
├── serving/  # ServingService, TransformationService gRPC APIs
├── registry/ # RegistryServer gRPC API
├── storage/  # Redis storage format
└── types/    # Primitive types: Value, EntityKey, Field
```

Protos are compiled to Python (`make compile-protos-python`), Go (`make compile-protos-go`), and Java.
## Multi-Language SDKs

| SDK | Location | Purpose |
|-----|----------|---------|
| **Python** | `sdk/python/` | Primary SDK — full feature store implementation |
| **Go** | `go/` | Embedded online feature retrieval |
| **Java** | `java/` | Serving client and feature server |

The Python SDK is the canonical implementation. The Go and Java SDKs provide serving capabilities and client libraries.
## Directory Structure

```
feast/
├── sdk/python/feast/          # Python SDK (primary implementation)
│   ├── cli/                   # CLI commands (Click)
│   ├── infra/                 # Infrastructure abstractions
│   │   ├── offline_stores/    # Offline store implementations
│   │   ├── online_stores/     # Online store implementations
│   │   ├── compute_engines/   # Materialization engines
│   │   ├── registry/          # Registry implementations
│   │   ├── feature_servers/   # Feature server deployments
│   │   └── common/            # Shared infra code
│   ├── permissions/           # Authorization system
│   ├── transformation/        # Feature transformations
│   ├── templates/             # Project templates
│   └── feature_store.py       # Main FeatureStore class
├── go/                        # Go SDK
├── java/                      # Java SDK (serving + client)
├── protos/                    # Protocol Buffer definitions
├── ui/                        # React/TypeScript web UI
├── infra/                     # Infrastructure and deployment
│   ├── feast-operator/        # Kubernetes operator (Go)
│   ├── charts/                # Helm charts
│   ├── scripts/               # Build and release scripts
│   ├── terraform/             # Cloud infrastructure (IaC)
│   └── templates/             # Configuration templates
├── docs/                      # Documentation (GitBook)
├── examples/                  # Example feature repositories
└── Makefile                   # Build targets (80+ targets)
```
## Data Flow

### Training (Offline)

```
Data Source → OfflineStore.get_historical_features() → Point-in-Time Join → Training DataFrame
```

1. User defines `FeatureView` + `Entity` + `DataSource`
2. User calls `store.get_historical_features(entity_df, features)`
3. OfflineStore performs point-in-time join against the data source
4. Returns a `RetrievalJob` that materializes to a DataFrame or Arrow table
### Serving (Online)

```
OfflineStore → ComputeEngine.materialize() → OnlineStore → FeatureServer → Inference
```

1. `feast materialize` triggers the compute engine
2. ComputeEngine reads latest values from the offline store
3. Values are written to the online store via `OnlineStore.online_write_batch()`
4. Feature server or SDK reads from online store via `OnlineStore.online_read()`
### Push-Based Ingestion

```
Application → FeatureStore.push() → OnlineStore (+ optionally OfflineStore)
```

Features can be pushed directly without materialization, which is useful for streaming or real-time features.
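The push path can be sketched as follows (toy in-memory stores, illustrative names — the real `push()` routes through the configured store implementations and accepts a push mode selecting online, offline, or both):

```python
# Push-based ingestion sketch: write a feature row straight into the
# online view, optionally also appending to the offline log, skipping
# the batch materialization path entirely.

online_view = {}
offline_log = []

def push(entity_id, features, to_offline=False):
    online_view[entity_id] = features              # immediately servable
    if to_offline:
        offline_log.append((entity_id, features))  # retained for training

push("u1", {"last_txn_amount": 42.0}, to_offline=True)
print(online_view["u1"])  # {'last_txn_amount': 42.0}
print(len(offline_log))   # 1
```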
## Extension Points

Feast is designed for extensibility. To add a new backend:

1. **Offline Store**: Subclass `OfflineStore` and `OfflineStoreConfig` in `infra/offline_stores/contrib/`
2. **Online Store**: Subclass `OnlineStore` and `OnlineStoreConfig` in `infra/online_stores/`
3. **Compute Engine**: Subclass `ComputeEngine` in `infra/compute_engines/`
4. **Registry Store**: Subclass `RegistryStore` in `infra/registry/`

Register the new implementation in `RepoConfig` (see `repo_config.py` for the class resolution logic).
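The class resolution mechanism is, in spirit, a dotted-path import at runtime. The sketch below uses a stdlib class so it runs anywhere; the helper name is illustrative, not the actual function in `repo_config.py`:

```python
import importlib

def get_class_from_type(class_path: str):
    """Resolve a dotted 'module.ClassName' string to the class object."""
    module_name, _, class_name = class_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)

# A config string like "my_pkg.MyOnlineStore" resolves the same way.
cls = get_class_from_type("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

This is why a new backend needs no changes to core code: once the class is importable, a configuration string is enough to select it.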
## Related Documents

- [Development Guide](docs/project/development-guide.md) — build, test, and debug instructions
- [ADR Index](docs/adr/README.md) — architecture decision records
- [Operator README](infra/feast-operator/README.md) — Kubernetes operator documentation
- [Helm Charts](infra/charts/) — deployment configuration
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
# ADR-001: Pluggable Offline and Online Store Architecture

## Status

Accepted (January 2021)

## Context

Feast needs to support a wide variety of data infrastructure backends. Different organizations use different data warehouses (BigQuery, Snowflake, Redshift, Spark) for historical feature storage and different databases (Redis, DynamoDB, Bigtable, Postgres) for low-latency serving. A monolithic approach would require every user to install dependencies for all backends and would make it difficult for the community to contribute new integrations.

## Decision

Define abstract base classes `OfflineStore` and `OnlineStore` that declare the interface each backend must implement. Each backend is a separate module that can be selected via `RepoConfig`. Contributed backends live under `infra/offline_stores/contrib/` to keep them separate from core-maintained implementations.

Key interface methods:

- **OfflineStore**: `get_historical_features()`, `pull_latest_from_table_or_query()`, `pull_all_from_table_or_query()`, `offline_write_batch()`
- **OnlineStore**: `online_read()`, `online_write_batch()`, `update()`, `teardown()`, `retrieve_online_documents()`

Backend selection is done via string identifiers in `feature_store.yaml` (e.g., `offline_store: bigquery`), which are resolved to Python classes at runtime through `RepoConfig`.
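For instance, a `feature_store.yaml` along these lines selects backends by identifier (values are illustrative; consult each store's documentation for its exact config fields):

```yaml
project: my_project
registry: data/registry.db
provider: local
offline_store:
  type: bigquery
online_store:
  type: redis
  connection_string: localhost:6379
```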
## Consequences

**Positive:**

- Users only install dependencies for their chosen backends
- New backends can be added without modifying core code
- Community contributions are isolated under `contrib/`
- Configuration-driven backend selection simplifies deployment

**Negative:**

- Interface changes require updates across all implementations
- Testing matrix grows with each new backend
- Contributed backends may have inconsistent quality or maintenance levels

## References

- `sdk/python/feast/infra/offline_stores/offline_store.py` — OfflineStore base class
- `sdk/python/feast/infra/online_stores/online_store.py` — OnlineStore base class
- `sdk/python/feast/repo_config.py` — Backend resolution logic

docs/adr/002-registry-design.md

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
# ADR-002: Registry as Serialized Protobuf Metadata Store

## Status

Accepted (January 2021)

## Context

Feast needs a metadata catalog to store feature definitions (entities, feature views, feature services, data sources, permissions). This registry must be accessible from multiple environments (local development, CI/CD, production serving) and should not require heavy infrastructure for simple deployments.

## Decision

Implement the registry as a single serialized Protocol Buffer file (`Registry.proto`) that can be stored on local disk or cloud object storage (S3, GCS, Azure Blob). This is the `FileRegistry` implementation, backed by pluggable `RegistryStore` classes.

For production deployments needing concurrent access and transactional updates, provide `SqlRegistry` as an alternative that stores metadata in a SQL database (PostgreSQL, MySQL, SQLite).

Both implementations share the `BaseRegistry` abstract interface, ensuring consistent behavior regardless of backend.

**Registry store backends:**

- `FileRegistryStore` — local filesystem
- `S3RegistryStore` — Amazon S3
- `GCSRegistryStore` — Google Cloud Storage
- `AzureRegistryStore` — Azure Blob Storage
- `HDFSRegistryStore` — Hadoop HDFS
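The single-file pattern can be sketched as follows. JSON stands in for the Registry protobuf here, and the two functions stand in for the `RegistryStore` role — this is an illustration of the design, not Feast's code:

```python
import json
import os
import tempfile

# Single-file registry sketch: the whole catalog is one serialized blob,
# read and written as a unit by a pluggable storage backend.

def registry_store_write(path, registry):
    """Stand-in for a RegistryStore backend's write (e.g. local filesystem)."""
    with open(path, "w") as f:
        json.dump(registry, f)

def registry_store_read(path):
    """Stand-in for a RegistryStore backend's read."""
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "registry.json")
registry = {"entities": ["driver"], "feature_views": ["driver_stats"]}
registry_store_write(path, registry)
print(registry_store_read(path)["entities"])  # ['driver']
```

Swapping local disk for S3 or GCS only changes where the blob lives — which is exactly why the design scales from laptop to cloud, and also why concurrent writers get last-writer-wins semantics (see Consequences below).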
## Consequences

**Positive:**

- Zero-infrastructure setup for local development (SQLite file)
- Cloud-native storage for production (S3/GCS)
- SQL backend provides transactional semantics for concurrent access
- Protobuf serialization ensures cross-language compatibility

**Negative:**

- FileRegistry has no built-in concurrency control (last-writer-wins)
- Full registry serialization/deserialization on every read (mitigated by TTL-based caching)
- Two distinct implementations to maintain (File and SQL)

## References

- `sdk/python/feast/infra/registry/registry.py` — FileRegistry implementation
- `sdk/python/feast/infra/registry/sql.py` — SqlRegistry implementation
- `sdk/python/feast/infra/registry/base_registry.py` — BaseRegistry interface
- `protos/feast/core/Registry.proto` — Registry protobuf definition
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
# ADR-003: PassthroughProvider as Universal Provider

## Status

Accepted (June 2021)

## Context

The `Provider` abstraction was originally intended to encapsulate cloud-specific logic. Separate providers existed for GCP, AWS, Azure, and local deployments. However, the pluggable offline/online store architecture (ADR-001) already handles backend-specific logic, making separate providers redundant. Maintaining multiple providers with nearly identical code increased maintenance burden.

## Decision

Collapse all cloud-specific providers into a single `PassthroughProvider` that delegates all operations to the configured offline store, online store, and compute engine. The provider string in configuration (`gcp`, `aws`, `azure`, `local`) still exists for backward compatibility, but all values resolve to `PassthroughProvider`.

```python
PROVIDERS_CLASS_FOR_TYPE = {
    "gcp": "feast.infra.passthrough_provider.PassthroughProvider",
    "aws": "feast.infra.passthrough_provider.PassthroughProvider",
    "local": "feast.infra.passthrough_provider.PassthroughProvider",
    "azure": "feast.infra.passthrough_provider.PassthroughProvider",
}
```
## Consequences

**Positive:**

- Single provider implementation to maintain
- Backend-specific logic lives where it belongs (in store implementations)
- Reduced code duplication across providers
- Simpler mental model for contributors

**Negative:**

- The `Provider` abstraction is now a thin orchestration layer with minimal logic
- The provider config field is still required but functionally meaningless (any value maps to the same class)

## References

- `sdk/python/feast/infra/provider.py` — Provider base class and type mapping
- `sdk/python/feast/infra/passthrough_provider.py` — PassthroughProvider implementation
