 - **Domain-specific primitives**: `PdfBytes` (PDF binary data for RAG/document pipelines) and `ImageBytes` (image binary data for multimodal pipelines). These are semantic aliases over `Bytes` and must be explicitly declared in schema — no backend infers them.
 - **Array types**: ordered lists of any primitive type, e.g. `Array(Int64)`, `Array(String)`.
-- **Set types**: unordered collections of unique values for any primitive type, e.g. `Set(String)`, `Set(Int64)`.
+- **Set types**: unordered collections of unique values for any primitive type, e.g. `Set(String)`, `Set(Int64)`. Set types are not inferred by any backend and must be explicitly declared. They are best suited for online serving use cases.
 - **Map types**: dictionary-like structures with string keys and values that can be any supported Feast type (including nested maps), e.g. `Map`, `Array(Map)`.
 - **JSON type**: opaque JSON data stored as a string at the proto level but semantically distinct from `String` — backends use native JSON types (`jsonb`, `VARIANT`, etc.), e.g. `Json`, `Array(Json)`.
 - **Struct type**: schema-aware structured type with named, typed fields. Unlike `Map` (which is schema-free), a `Struct` declares its field names and their types, enabling schema validation, e.g. `Struct({"name": String, "age": Int32})`.
docs/reference/data-sources/overview.md (7 additions & 1 deletion)
@@ -5,7 +5,7 @@
 In Feast, each batch data source is associated with corresponding offline stores.
 For example, a `SnowflakeSource` can only be processed by the Snowflake offline store, while a `FileSource` can be processed by both File and DuckDB offline stores.
 Otherwise, the primary difference between batch data sources is the set of supported types.
-Feast has an internal type system, and aims to support eight primitive types (`bytes`, `string`, `int32`, `int64`, `float32`, `float64`, `bool`, and `timestamp`) along with the corresponding array types.
+Feast has an internal type system that supports primitive types (`bytes`, `string`, `int32`, `int64`, `float32`, `float64`, `bool`, `timestamp`), array types, set types, map/JSON types, and struct types.
 However, not every batch data source supports all of these types.

 For more details on the Feast type system, see [here](../type-system.md).
@@ -29,3 +29,9 @@ Below is a matrix indicating which data sources support which types.
 | array types | yes | yes | yes | no | yes | yes | yes | no |
+| `Map` | yes | no | yes | yes | yes | yes | yes | no |
+| `Json` | yes | yes | yes | yes | yes | no | no | no |
+| `Struct` | yes | yes | no | no | yes | yes | no | no |
+| set types | yes* | no | no | no | no | no | no | no |
+
+\***Set types** are defined in Feast's proto and Python type system but are **not inferred** by any backend. They must be explicitly declared in the feature view schema and are best suited for online serving use cases. See [Type System](../type-system.md#set-types) for details.
 Feast uses an internal type system to provide guarantees on training and serving data.
-Feast supports primitive types, array types, set types, and map types for feature values.
+Feast supports primitive types, array types, set types, map types, JSON, and struct types for feature values.
 Null types are not supported, although the `UNIX_TIMESTAMP` type is nullable.
 The type system is controlled by [`Value.proto`](https://github.com/feast-dev/feast/blob/master/protos/feast/types/Value.proto) in protobuf and by [`types.py`](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/types.py) in Python.
 Type conversion logic can be found in [`type_map.py`](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/type_map.py).
@@ -25,6 +25,19 @@ Feast supports the following data types:
 These types are semantic aliases over `Bytes` for domain-specific use cases (e.g., RAG pipelines, image processing). They are stored as `bytes` at the proto level.
+
+| Feast Type | Python Type | Description |
+|------------|-------------|-------------|
+| `PdfBytes` | `bytes` | PDF document binary data (used in RAG / document processing pipelines) |
+| `ImageBytes` | `bytes` | Image binary data (used in image processing / multimodal pipelines) |
+
+{% hint style="warning" %}
+`PdfBytes` and `ImageBytes` are not natively supported by any backend's type inference. You must explicitly declare them in your feature view schema. Backend storage treats them as raw `bytes`.
+{% endhint %}
+
 ### Array Types

 All primitive types have corresponding array (list) types:
@@ -42,7 +55,7 @@ All primitive types have corresponding array (list) types:

 ### Set Types

-All primitive types (except Map) have corresponding set types for storing unique values:
+All primitive types (except `Map` and `Json`) have corresponding set types for storing unique values:

 | Feast Type | Python Type | Description |
 |------------|-------------|-------------|
@@ -57,16 +70,95 @@ All primitive types (except Map) have corresponding set types for storing unique

 **Note:** Set types automatically remove duplicate values. When converting from lists or other iterables to sets, duplicates are eliminated.

+{% hint style="warning" %}
+**Backend limitations for Set types:**
+
+- **No backend infers Set types from schema.** No offline store (BigQuery, Snowflake, Redshift, PostgreSQL, Spark, Athena, MSSQL) maps its native types to Feast Set types. You **must** explicitly declare Set types in your feature view schema.
+- **No native PyArrow set type.** Feast converts Sets to `pyarrow.list_()` internally, but `feast_value_type_to_pa()` in `type_map.py` does not include Set mappings, which can cause errors in some code paths.
+- **Online stores** that serialize proto bytes (e.g., SQLite, Redis, DynamoDB) handle Sets correctly.
+- **Offline stores** may not handle Set types correctly during retrieval. For example, the Ray offline store only special-cases `_LIST` types, not `_SET`.
+- Set types are best suited for **online serving** use cases where feature values are written as Python sets and retrieved via `get_online_features`.
+{% endhint %}
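
The duplicate-removal note in this hunk matches how Python's built-in `set` behaves. A minimal sketch, assuming feature values are prepared as plain Python sets before being written; the variable names are illustrative and nothing here calls Feast:

```python
# Illustrative only: plain Python, no Feast imports.
raw_tags = ["sports", "news", "sports", "music", "news"]

# Converting a list to a set drops duplicates; element order is not preserved.
unique_tags = set(raw_tags)

# Sort when a stable order is needed, e.g. for logging or comparisons.
sorted_tags = sorted(unique_tags)
print(sorted_tags)
```

Because sets are unordered, any code that compares retrieved values should compare sets (or sorted lists) rather than rely on insertion order.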
+
 ### Map Types

 Map types allow storing dictionary-like data structures:

 | Feast Type | Python Type | Description |
 |------------|-------------|-------------|
-| `Map` | `Dict[str, Any]` | Dictionary with string keys and any supported Feast type as values (including nested maps) |
+| `Map` | `Dict[str, Any]` | Dictionary with string keys and values of any supported Feast type (including nested maps) |
 | `Array(Map)` | `List[Dict[str, Any]]` | List of dictionaries |

-**Note:** Map keys must always be strings. Map values can be any supported Feast type, including primitives, arrays, or nested maps.
+**Note:** Map keys must always be strings. Map values can be any supported Feast type, including primitives, arrays, or nested maps at the proto level. However, the PyArrow representation is `map<string, string>`, which means backends that rely on PyArrow schemas (e.g., during materialization) treat Map as string-to-string.
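
To make the string-to-string caveat concrete, here is a hedged sketch of one way to flatten a nested map so that it fits a `map<string, string>` schema. The helper `to_string_map` is hypothetical and is not Feast's conversion logic; it simply JSON-encodes every non-string value, which is an assumption made for illustration:

```python
import json
from typing import Any, Dict


def to_string_map(values: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: coerce every map value to a string."""
    out: Dict[str, str] = {}
    for key, value in values.items():
        if not isinstance(key, str):
            raise TypeError(f"Map keys must be strings, got {type(key).__name__}")
        # Strings pass through; everything else is JSON-encoded.
        out[key] = value if isinstance(value, str) else json.dumps(value, sort_keys=True)
    return out


nested = {"city": "Berlin", "visits": 3, "geo": {"lat": 52.5, "lon": 13.4}}
flat = to_string_map(nested)
print(flat)
```

With this approach the nested structure survives only as encoded text, so a reader has to decode values such as `flat["geo"]` again before using them.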
 | DynamoDB / Redis | Proto bytes | Full proto Map support |
+
+### JSON Type
+
+The `Json` type represents opaque JSON data. Unlike `Map`, which is schema-free key-value storage, `Json` is stored as a string at the proto level but backends use native JSON types where available.
+
+| Feast Type | Python Type | Description |
+|------------|-------------|-------------|
+| `Json` | `str` (JSON-encoded) | JSON data stored as a string at the proto level |
+| `Array(Json)` | `List[str]` | List of JSON strings |
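
Since the table above gives the Python type of `Json` as a JSON-encoded `str`, the following sketch shows the encode/decode round trip with the standard `json` module. It assumes values are encoded by the caller before being written; the variable names are illustrative and no Feast API is used:

```python
import json

profile = {"plan": "pro", "seats": 5, "tags": ["a", "b"]}

# A Json feature value is a JSON-encoded string, not a dict.
json_value = json.dumps(profile)

# An Array(Json) value is a list of such strings.
json_array_value = [json.dumps(item) for item in ({"id": 1}, {"id": 2})]

# Consumers decode the string back into Python objects after retrieval.
decoded = json.loads(json_value)
print(type(json_value).__name__, decoded["seats"])
```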
+
+**Backend support for Json:**
+
+| Backend | Native Type |
+|---------|-------------|
+| PostgreSQL | `jsonb` |
+| Snowflake | `JSON` / `VARIANT` |
+| Redshift | `json` |
+| BigQuery | `JSON` |
+| Spark | Not natively distinguished from `String` |
+| MSSQL | `nvarchar(max)` |
+
+{% hint style="info" %}
+When a backend's native type is ambiguous (e.g., PostgreSQL `jsonb` could be `Map` or `Json`), **the schema-declared Feast type takes precedence**. The backend-to-Feast mappings are only used during schema inference when no explicit type is provided.
+{% endhint %}
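
The precedence rule in the hint can be sketched as a small resolver. `resolve_feast_type` is a hypothetical function written for this illustration, not part of Feast, and type names are plain strings here; which type a backend actually infers for a `jsonb` column is not assumed, so `"Map"` below is only a placeholder:

```python
from typing import Optional


def resolve_feast_type(declared: Optional[str], inferred: Optional[str]) -> str:
    """Hypothetical resolver: an explicit schema type wins over an inferred one."""
    if declared is not None:
        return declared
    if inferred is not None:
        return inferred
    raise ValueError("no declared type and nothing could be inferred")


# Schema declares Json, so the inferred placeholder is ignored.
explicit = resolve_feast_type(declared="Json", inferred="Map")

# No declaration: the inferred type is used as a fallback.
fallback = resolve_feast_type(declared=None, inferred="Map")
print(explicit, fallback)
```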
+
+### Struct Type
+
+The `Struct` type represents a schema-aware structured type with named, typed fields. Unlike `Map` (which is schema-free), a `Struct` declares its field names and their types, enabling schema validation.
+
+| Feast Type | Python Type | Description |
+|------------|-------------|-------------|
+| `Struct({"field": Type, ...})` | `Dict[str, Any]` | Named fields with typed values |
+| `Array(Struct({"field": Type, ...}))` | `List[Dict[str, Any]]` | List of structs |
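
As a rough illustration of what declaring field names and types makes possible, the sketch below validates a dictionary against a declared schema. It is plain Python: `USER_SCHEMA` and `validate_struct` are hypothetical stand-ins that use Python types instead of Feast types, and rejecting extra fields is an assumption of this sketch rather than documented Feast behavior:

```python
from typing import Any, Dict

# Hypothetical stand-in for a struct with a string "name" and an integer "age".
USER_SCHEMA: Dict[str, type] = {"name": str, "age": int}


def validate_struct(value: Dict[str, Any], schema: Dict[str, type]) -> None:
    """Raise if field names or value types do not match the declared schema."""
    missing = schema.keys() - value.keys()
    extra = value.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for field, expected in schema.items():
        if not isinstance(value[field], expected):
            raise TypeError(
                f"field {field!r} expected {expected.__name__}, "
                f"got {type(value[field]).__name__}"
            )


validate_struct({"name": "Ada", "age": 36}, USER_SCHEMA)  # passes silently
```

A schema-free map has nothing comparable to check against, which is the practical difference the section describes.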
+
+**Example:**
+```python
+from feast.types import Struct, String, Int32, Array
-* `source_datatype_to_feast_value_type` cals the appropriate method in `type_map.py`. For example, if a `SnowflakeSource` is being examined, `snowflake_python_type_to_feast_value_type` from `type_map.py` will be called.
+* `source_datatype_to_feast_value_type` calls the appropriate method in `type_map.py`. For example, if a `SnowflakeSource` is being examined, `snowflake_python_type_to_feast_value_type` from `type_map.py` will be called.
+
+{% hint style="info" %}
+**Types that cannot be inferred:** `Set`, `Json`, `Struct`, `PdfBytes`, and `ImageBytes` types are never inferred from backend schemas. If you use these types, you must declare them explicitly in your feature view schema.