Commit 611fe05

docs: Update type system reference with missing types (#6069)
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
1 parent 0719c06 commit 611fe05

3 files changed (+158, -12 lines changed)

docs/getting-started/concepts/feast-types.md

Lines changed: 2 additions & 1 deletion
@@ -8,8 +8,9 @@ Feast's type system is built on top of [protobuf](https://github.com/protocolbuf
88
Feast supports the following categories of data types:
99

1010
- **Primitive types**: numerical values (`Int32`, `Int64`, `Float32`, `Float64`), `String`, `Bytes`, `Bool`, and `UnixTimestamp`.
11+
- **Domain-specific primitives**: `PdfBytes` (PDF binary data for RAG/document pipelines) and `ImageBytes` (image binary data for multimodal pipelines). These are semantic aliases over `Bytes` and must be explicitly declared in schema — no backend infers them.
1112
- **Array types**: ordered lists of any primitive type, e.g. `Array(Int64)`, `Array(String)`.
12-
- **Set types**: unordered collections of unique values for any primitive type, e.g. `Set(String)`, `Set(Int64)`.
13+
- **Set types**: unordered collections of unique values for any primitive type, e.g. `Set(String)`, `Set(Int64)`. Set types are not inferred by any backend and must be explicitly declared. They are best suited for online serving use cases.
1314
- **Map types**: dictionary-like structures with string keys and values that can be any supported Feast type (including nested maps), e.g. `Map`, `Array(Map)`.
1415
- **JSON type**: opaque JSON data stored as a string at the proto level but semantically distinct from `String` — backends use native JSON types (`jsonb`, `VARIANT`, etc.), e.g. `Json`, `Array(Json)`.
1516
- **Struct type**: schema-aware structured type with named, typed fields. Unlike `Map` (which is schema-free), a `Struct` declares its field names and their types, enabling schema validation, e.g. `Struct({"name": String, "age": Int32})`.

docs/reference/data-sources/overview.md

Lines changed: 7 additions & 1 deletion
@@ -5,7 +5,7 @@
55
In Feast, each batch data source is associated with corresponding offline stores.
66
For example, a `SnowflakeSource` can only be processed by the Snowflake offline store, while a `FileSource` can be processed by both File and DuckDB offline stores.
77
Otherwise, the primary difference between batch data sources is the set of supported types.
8-
Feast has an internal type system, and aims to support eight primitive types (`bytes`, `string`, `int32`, `int64`, `float32`, `float64`, `bool`, and `timestamp`) along with the corresponding array types.
8+
Feast has an internal type system that supports primitive types (`bytes`, `string`, `int32`, `int64`, `float32`, `float64`, `bool`, `timestamp`), array types, set types, map/JSON types, and struct types.
99
However, not every batch data source supports all of these types.
1010

1111
For more details on the Feast type system, see [here](../type-system.md).
@@ -29,3 +29,9 @@ Below is a matrix indicating which data sources support which types.
2929
| `bool` | yes | yes | yes | yes | yes | yes | yes | yes |
3030
| `timestamp` | yes | yes | yes | yes | yes | yes | yes | yes |
3131
| array types | yes | yes | yes | no | yes | yes | yes | no |
32+
| `Map` | yes | no | yes | yes | yes | yes | yes | no |
33+
| `Json` | yes | yes | yes | yes | yes | no | no | no |
34+
| `Struct` | yes | yes | no | no | yes | yes | no | no |
35+
| set types | yes* | no | no | no | no | no | no | no |
36+
37+
\* **Set types** are defined in Feast's proto and Python type system but are **not inferred** by any backend. They must be explicitly declared in the feature view schema and are best suited for online serving use cases. See [Type System](../type-system.md#set-types) for details.

docs/reference/type-system.md

Lines changed: 149 additions & 10 deletions
@@ -3,7 +3,7 @@
33
## Motivation
44

55
Feast uses an internal type system to provide guarantees on training and serving data.
6-
Feast supports primitive types, array types, set types, and map types for feature values.
6+
Feast supports primitive types, array types, set types, map types, JSON, and struct types for feature values.
77
Null types are not supported, although the `UNIX_TIMESTAMP` type is nullable.
88
The type system is controlled by [`Value.proto`](https://github.com/feast-dev/feast/blob/master/protos/feast/types/Value.proto) in protobuf and by [`types.py`](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/types.py) in Python.
99
Type conversion logic can be found in [`type_map.py`](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/type_map.py).
@@ -25,6 +25,19 @@ Feast supports the following data types:
2525
| `Bool` | `bool` | Boolean value |
2626
| `UnixTimestamp` | `datetime` | Unix timestamp (nullable) |
2727

28+
### Domain-Specific Primitive Types
29+
30+
These types are semantic aliases over `Bytes` for domain-specific use cases (e.g., RAG pipelines, image processing). They are stored as `bytes` at the proto level.
31+
32+
| Feast Type | Python Type | Description |
33+
|------------|-------------|-------------|
34+
| `PdfBytes` | `bytes` | PDF document binary data (used in RAG / document processing pipelines) |
35+
| `ImageBytes` | `bytes` | Image binary data (used in image processing / multimodal pipelines) |
36+
37+
{% hint style="warning" %}
38+
`PdfBytes` and `ImageBytes` are not natively supported by any backend's type inference. You must explicitly declare them in your feature view schema. Backend storage treats them as raw `bytes`.
39+
{% endhint %}
40+
2841
### Array Types
2942

3043
All primitive types have corresponding array (list) types:
@@ -42,7 +55,7 @@ All primitive types have corresponding array (list) types:
4255

4356
### Set Types
4457

45-
All primitive types (except Map) have corresponding set types for storing unique values:
58+
All primitive types have corresponding set types for storing unique values (`Map` and `Json` have no set form):
4659

4760
| Feast Type | Python Type | Description |
4861
|------------|-------------|-------------|
@@ -57,16 +70,95 @@ All primitive types (except Map) have corresponding set types for storing unique
5770

5871
**Note:** Set types automatically remove duplicate values. When converting from lists or other iterables to sets, duplicates are eliminated.
5972

73+
{% hint style="warning" %}
74+
**Backend limitations for Set types:**
75+
76+
- **No backend infers Set types from schema.** No offline store (BigQuery, Snowflake, Redshift, PostgreSQL, Spark, Athena, MSSQL) maps its native types to Feast Set types. You **must** explicitly declare Set types in your feature view schema.
77+
- **No native PyArrow set type.** Feast converts Sets to `pyarrow.list_()` internally, but `feast_value_type_to_pa()` in `type_map.py` does not include Set mappings, which can cause errors in some code paths.
78+
- **Online stores** that serialize proto bytes (e.g., SQLite, Redis, DynamoDB) handle Sets correctly.
79+
- **Offline stores** may not handle Set types correctly during retrieval. For example, the Ray offline store only special-cases `_LIST` types, not `_SET`.
80+
- Set types are best suited for **online serving** use cases where feature values are written as Python sets and retrieved via `get_online_features`.
81+
{% endhint %}
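
As a rough illustration of the deduplication note and the list-based internal representation mentioned above, here is a plain-Python sketch. It does not import Feast, and `to_list_encoding` is a hypothetical helper rather than a Feast function. The sorting step is an assumption made only to get a deterministic result, since sets carry no order.

```python
def to_list_encoding(values):
    """Hypothetical helper: drop duplicates, then return a list.

    Sorting is an assumption used here for determinism; a set itself has no order.
    """
    return sorted(set(values))


visited_pages = ["home", "pricing", "home", "docs", "pricing"]

# Converting to a set removes the repeated "home" and "pricing" entries.
unique_pages = set(visited_pages)

# A list-shaped encoding of the same values, as a stand-in for a list-typed column.
encoded = to_list_encoding(visited_pages)

print(len(visited_pages), len(unique_pages))
print(encoded)
```

Because element order is not preserved, downstream code should compare Set features by membership rather than by position.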
82+
6083
### Map Types
6184

6285
Map types allow storing dictionary-like data structures:
6386

6487
| Feast Type | Python Type | Description |
6588
|------------|-------------|-------------|
66-
| `Map` | `Dict[str, Any]` | Dictionary with string keys and any supported Feast type as values (including nested maps) |
89+
| `Map` | `Dict[str, Any]` | Dictionary with string keys and values of any supported Feast type (including nested maps) |
6790
| `Array(Map)` | `List[Dict[str, Any]]` | List of dictionaries |
6891

69-
**Note:** Map keys must always be strings. Map values can be any supported Feast type, including primitives, arrays, or nested maps.
92+
**Note:** Map keys must always be strings. Map values can be any supported Feast type, including primitives, arrays, or nested maps at the proto level. However, the PyArrow representation is `map<string, string>`, which means backends that rely on PyArrow schemas (e.g., during materialization) treat Map as string-to-string.
93+
94+
**Backend support for Map:**
95+
96+
| Backend | Native Type | Notes |
97+
|---------|-------------|-------|
98+
| PostgreSQL | `jsonb`, `jsonb[]` | `jsonb` → `Map`, `jsonb[]` → `Array(Map)` |
99+
| Snowflake | `VARIANT`, `OBJECT` | Inferred as `Map` |
100+
| Redshift | `SUPER` | Inferred as `Map` |
101+
| Spark | `map<string,string>` | `map<>` → `Map`, `array<map<>>` → `Array(Map)` |
102+
| Athena | `map` | Inferred as `Map` |
103+
| MSSQL | `nvarchar(max)` | Serialized as string |
104+
| DynamoDB / Redis | Proto bytes | Full proto Map support |
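
Since some code paths treat `Map` as string-to-string, one possible workaround is to stringify non-string values before writing them. The sketch below is an assumption about how a caller might do that in plain Python. `stringify_map_values` is a hypothetical helper and is not part of Feast.

```python
import json


def stringify_map_values(mapping):
    """Hypothetical helper: keep string values, JSON-encode everything else."""
    return {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in mapping.items()
    }


user_preferences = {"theme": "dark", "retries": 3, "limits": {"daily": 5}}
flat = stringify_map_values(user_preferences)
print(flat)

# Nested values can be recovered by decoding the string again.
print(json.loads(flat["limits"]))
```

The trade-off is that readers of the feature must know which values were encoded and decode them explicitly.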
105+
106+
### JSON Type
107+
108+
The `Json` type represents opaque JSON data. Whereas `Map` is schema-free key-value storage, a `Json` value is stored as a string at the proto level, and backends store it in a native JSON type where one is available.
109+
110+
| Feast Type | Python Type | Description |
111+
|------------|-------------|-------------|
112+
| `Json` | `str` (JSON-encoded) | JSON data stored as a string at the proto level |
113+
| `Array(Json)` | `List[str]` | List of JSON strings |
114+
115+
**Backend support for Json:**
116+
117+
| Backend | Native Type |
118+
|---------|-------------|
119+
| PostgreSQL | `jsonb` |
120+
| Snowflake | `JSON` / `VARIANT` |
121+
| Redshift | `json` |
122+
| BigQuery | `JSON` |
123+
| Spark | Not natively distinguished from `String` |
124+
| MSSQL | `nvarchar(max)` |
125+
126+
{% hint style="info" %}
127+
When a backend's native type is ambiguous (e.g., PostgreSQL `jsonb` could be `Map` or `Json`), **the schema-declared Feast type takes precedence**. The backend-to-Feast mappings are only used during schema inference when no explicit type is provided.
128+
{% endhint %}
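
The precedence rule in the hint above can be sketched in a few lines of plain Python. Type names are plain strings here, `resolve_feast_type` is a hypothetical function, and the inference table contains only the PostgreSQL `jsonb` row from the Map table. None of this is Feast's actual implementation.

```python
# Hypothetical inference table; the single entry mirrors the PostgreSQL row above.
INFERRED_FROM_NATIVE = {"jsonb": "Map"}


def resolve_feast_type(column, native_type, declared_schema):
    """Return the declared type if one exists, otherwise fall back to inference."""
    if column in declared_schema:
        return declared_schema[column]
    return INFERRED_FROM_NATIVE[native_type]


declared = {"raw_event": "Json"}

# Both columns are jsonb in the backend, but only one is declared explicitly.
print(resolve_feast_type("raw_event", "jsonb", declared))
print(resolve_feast_type("metadata", "jsonb", declared))
```

In this sketch the declared `raw_event` column resolves to `Json`, while the undeclared `metadata` column falls back to the inferred `Map`.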
129+
130+
### Struct Type
131+
132+
The `Struct` type represents a schema-aware structured type with named, typed fields. Unlike `Map` (which is schema-free), a `Struct` declares its field names and their types, enabling schema validation.
133+
134+
| Feast Type | Python Type | Description |
135+
|------------|-------------|-------------|
136+
| `Struct({"field": Type, ...})` | `Dict[str, Any]` | Named fields with typed values |
137+
| `Array(Struct({"field": Type, ...}))` | `List[Dict[str, Any]]` | List of structs |
138+
139+
**Example:**
140+
```python
from feast import Field
from feast.types import Array, Int32, String, Struct

# Struct with named, typed fields
address_type = Struct({"street": String, "city": String, "zip": Int32})
address_field = Field(name="address", dtype=address_type)

# Array of structs
items_type = Array(Struct({"name": String, "quantity": Int32}))
order_items_field = Field(name="order_items", dtype=items_type)
```
151+
152+
**Backend support for Struct:**
153+
154+
| Backend | Native Type |
155+
|---------|-------------|
156+
| BigQuery | `STRUCT` / `RECORD` |
157+
| Spark | `struct<...>` / `array<struct<...>>` |
158+
| PostgreSQL | `jsonb` (serialized) |
159+
| Snowflake | `VARIANT` (serialized) |
160+
| MSSQL | `nvarchar(max)` (serialized) |
161+
| DynamoDB / Redis | Proto bytes |
70162

71163
## Complete Feature View Example
72164

@@ -77,7 +169,7 @@ from datetime import timedelta
77169
from feast import Entity, FeatureView, Field, FileSource
78170
from feast.types import (
79171
Int32, Int64, Float32, Float64, String, Bytes, Bool, UnixTimestamp,
80-
Array, Set, Map
172+
Array, Set, Map, Json, Struct
81173
)
82174

83175
# Define a data source
@@ -107,7 +199,7 @@ user_features = FeatureView(
107199
Field(name="profile_picture", dtype=Bytes),
108200
Field(name="is_active", dtype=Bool),
109201
Field(name="last_login", dtype=UnixTimestamp),
110-
202+
111203
# Array types
112204
Field(name="daily_steps", dtype=Array(Int32)),
113205
Field(name="transaction_history", dtype=Array(Int64)),
@@ -117,17 +209,24 @@ user_features = FeatureView(
117209
Field(name="document_hashes", dtype=Array(Bytes)),
118210
Field(name="notification_settings", dtype=Array(Bool)),
119211
Field(name="login_timestamps", dtype=Array(UnixTimestamp)),
120-
121-
# Set types (unique values only)
212+
213+
# Set types (unique values only — see backend caveats above)
122214
Field(name="visited_pages", dtype=Set(String)),
123215
Field(name="unique_categories", dtype=Set(Int32)),
124216
Field(name="tag_ids", dtype=Set(Int64)),
125217
Field(name="preferred_languages", dtype=Set(String)),
126-
218+
127219
# Map types
128220
Field(name="user_preferences", dtype=Map),
129221
Field(name="metadata", dtype=Map),
130222
Field(name="activity_log", dtype=Array(Map)),
223+
224+
# JSON type
225+
Field(name="raw_event", dtype=Json),
226+
227+
# Struct type
228+
Field(name="address", dtype=Struct({"street": String, "city": String, "zip": Int32})),
229+
Field(name="order_items", dtype=Array(Struct({"name": String, "qty": Int32}))),
131230
],
132231
source=user_features_source,
133232
)
@@ -184,6 +283,42 @@ activity_log = [
184283
]
185284
```
186285

286+
### JSON Type Usage Examples
287+
288+
Feast's `Json` type stores values as JSON strings at the proto level. You can pass either a
289+
pre-serialized JSON string or a Python dict/list — Feast will call `json.dumps()` automatically
290+
when the value is not already a string:
291+
292+
```python
import json
from datetime import datetime

import pandas as pd

# Option 1: pass a Python dict; Feast calls json.dumps() internally during proto conversion
raw_event = {"type": "click", "target": "button_1", "metadata": {"page": "home"}}

# Option 2: pass an already-serialized JSON string; Feast validates it via json.loads()
raw_event = '{"type": "click", "target": "button_1", "metadata": {"page": "home"}}'

# When building a DataFrame for store.push(), values must be strings since
# Pandas/PyArrow columns expect uniform types.
# Assumes `store` is an initialized FeatureStore and "event_push_source" is a registered push source.
event_df = pd.DataFrame({
    "user_id": ["user_1"],
    "event_timestamp": [datetime.now()],
    "raw_event": [json.dumps({"type": "click", "target": "button_1"})],
})
store.push("event_push_source", event_df)
```
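
The two input options described above amount to "serialize if it is not a string, validate if it is". The following standalone sketch mimics that behavior with the standard library only. `to_json_string` is a hypothetical name, and Feast's real conversion code may differ in its details.

```python
import json


def to_json_string(value):
    """Hypothetical helper: return a JSON string for a dict, list, or JSON string."""
    if isinstance(value, str):
        json.loads(value)  # raises json.JSONDecodeError (a ValueError) if invalid
        return value
    return json.dumps(value)


from_dict = to_json_string({"type": "click", "target": "button_1"})
from_str = to_json_string('{"type": "click", "target": "button_1"}')
print(from_dict == from_str)

try:
    to_json_string("{not valid json")
except ValueError as exc:
    print("rejected:", type(exc).__name__)
```

Validating pre-serialized strings up front surfaces malformed JSON at write time instead of when the feature is read.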
311+
312+
### Struct Type Usage Examples
313+
314+
```python
315+
# Struct — schema-aware, fields and types are declared
316+
from feast.types import Struct, String, Int32
317+
318+
address = Struct({"street": String, "city": String, "zip": Int32})
319+
# Value: {"street": "123 Main St", "city": "Springfield", "zip": 62704}
320+
```
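
To make "schema validation" concrete, here is a minimal plain-Python sketch of checking a value against declared field names and types. It uses built-in Python types in place of Feast types, and `validate_struct` is a hypothetical function. Feast's own validation may behave differently.

```python
def validate_struct(value, schema):
    """Return a list of problems; an empty list means the value matches the schema."""
    problems = []
    for name, expected in schema.items():
        if name not in value:
            problems.append(f"missing field: {name}")
        elif not isinstance(value[name], expected):
            actual = type(value[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    for name in value:
        if name not in schema:
            problems.append(f"unexpected field: {name}")
    return problems


# Stand-in for Struct({"street": String, "city": String, "zip": Int32})
address_schema = {"street": str, "city": str, "zip": int}

good = {"street": "123 Main St", "city": "Springfield", "zip": 62704}
bad = {"street": "123 Main St", "zip": "62704", "country": "US"}

print(validate_struct(good, address_schema))
print(validate_struct(bad, address_schema))
```

A schema-free `Map` would accept both values, which is the practical difference the declared field list buys.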
321+
187322
## Type System in Practice
188323

189324
The sections below explain how Feast uses its type system in different contexts.
@@ -195,7 +330,11 @@ For example, if the `schema` parameter is not specified for a feature view, Feas
195330
Each of these columns must be associated with a Feast type, which requires conversion from the data source type system to the Feast type system.
196331
* The feature inference logic calls `_infer_features_and_entities`.
197332
* `_infer_features_and_entities` calls `source_datatype_to_feast_value_type`.
198-
* `source_datatype_to_feast_value_type` cals the appropriate method in `type_map.py`. For example, if a `SnowflakeSource` is being examined, `snowflake_python_type_to_feast_value_type` from `type_map.py` will be called.
333+
* `source_datatype_to_feast_value_type` calls the appropriate method in `type_map.py`. For example, if a `SnowflakeSource` is being examined, `snowflake_python_type_to_feast_value_type` from `type_map.py` will be called.
334+
335+
{% hint style="info" %}
336+
**Types that cannot be inferred:** `Set`, `Json`, `Struct`, `PdfBytes`, and `ImageBytes` types are never inferred from backend schemas. If you use these types, you must declare them explicitly in your feature view schema.
337+
{% endhint %}
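
The inference chain above is essentially a dispatch from the source kind to a per-backend mapping function. The sketch below imitates that shape in plain Python. The mapper functions are simplified, hypothetical stand-ins whose entries are taken from the Map backend table, and they are not the real functions in `type_map.py`.

```python
# Types the hint above says are never produced by inference.
NON_INFERABLE = {"Set", "Json", "Struct", "PdfBytes", "ImageBytes"}


def snowflake_to_feast(native_type):
    """Hypothetical, simplified stand-in for a Snowflake mapping function."""
    return {"VARIANT": "Map", "OBJECT": "Map"}[native_type]


def postgres_to_feast(native_type):
    """Hypothetical, simplified stand-in for a PostgreSQL mapping function."""
    return {"jsonb": "Map", "jsonb[]": "Array(Map)"}[native_type]


MAPPERS = {"snowflake": snowflake_to_feast, "postgres": postgres_to_feast}


def infer_feast_type(source_kind, native_type):
    """Dispatch to the mapper for this source and return the inferred type name."""
    feast_type = MAPPERS[source_kind](native_type)
    # Inference never yields one of the explicitly-declared-only types.
    assert feast_type not in NON_INFERABLE
    return feast_type


print(infer_feast_type("snowflake", "VARIANT"))
print(infer_feast_type("postgres", "jsonb[]"))
```

Any column that should be a `Set`, `Json`, `Struct`, `PdfBytes`, or `ImageBytes` therefore has to come from the explicit schema rather than from a mapper like these.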
199338

200339
### Materialization
201340
