File: docs/README.md
Lines changed: 8 additions & 9 deletions
Original file line number
Diff line number
Diff line change
@@ -28,7 +28,6 @@ Feathr automatically computes your feature values and joins them to your trainin
28
28
- **Native cloud integration** with a simplified and scalable architecture, which is illustrated in the next section.
29
29
- **Feature sharing and reuse made easy:** Feathr has a built-in feature registry so that features can be easily shared across different teams, boosting team productivity.
30
30
31
-
32
31
## Running Feathr on Azure with 3 Simple Steps
33
32
34
33
Feathr has native cloud integration. To use Feathr on Azure, you only need three steps:
@@ -50,7 +49,7 @@ Feathr has native cloud integration. To use Feathr on Azure, you only need three
50
49
If you are not using the above Jupyter Notebook and want to install the Feathr client locally, use this:
51
50
52
51
```bash
53
-
pip install -U feathr
52
+
pip install feathr
54
53
```
55
54
56
55
Or use the latest code from GitHub:
@@ -126,31 +125,30 @@ Read the [Streaming Source Ingestion Guide](https://linkedin.github.io/feathr/ho
126
125
127
126
Read [Point-in-time Correctness and Point-in-time Join in Feathr](https://linkedin.github.io/feathr/concepts/point-in-time-join.html) for more details.
128
127
129
-
130
128
## Running Feathr Examples
131
129
132
-
Follow the [quick start Jupyter Notebook](./feathr_project/feathrcli/data/feathr_user_workspace/product_recommendation_demo.ipynb) to try it out. There is also a companion [quick start guide](https://linkedin.github.io/feathr/quickstart.html) containing a bit more explanation on the notebook.
133
-
130
+
Follow the [quick start Jupyter Notebook](https://github.com/linkedin/feathr/blob/main/feathr_project/feathrcli/data/feathr_user_workspace/product_recommendation_demo.ipynb) to try it out.
131
+
There is also a companion [quick start guide](https://linkedin.github.io/feathr/quickstart_synapse.html) containing a bit more explanation on the notebook.
134
132
135
133
## Cloud Architecture
136
134
137
135
Feathr has native integration with Azure and other cloud services, and here's the high-level architecture to help you get started.
138
136

139
137
140
-
# Next Steps
138
+
## Next Steps
141
139
142
-
## Quickstart
140
+
### Quickstart
143
141
144
142
- [Quickstart for Azure Synapse](quickstart_synapse.md)
Feature generation is the process to create features from raw source data into a certain persisted storage.
7
+
# Feature Generation and Materialization
9
8
10
-
User could utilize feature generation to pre-compute and materialize pre-defined features to online and/or offline storage. This is desirable when the feature transformation is computation intensive or when the features can be reused(usually in offline setting). Feature generation is also useful in generating embedding features. Embedding distill information from large data and it is usually more compact.
9
+
Feature generation (also known as feature materialization) is the process of creating features from raw source data and persisting them in either an offline store (for further reuse) or an online store (for online inference).
10
+
11
+
Users can use feature generation to pre-compute and materialize pre-defined features to online and/or offline storage. This is desirable when the feature transformation is computation-intensive or when the features can be reused (usually in an offline setting). Feature generation is also useful for generating embedding features, since embeddings distill information from large data and are usually more compact.
11
12
12
13
## Generating Features to Online Store
13
-
When we need to serve the models online, we also need to serve the features online. We provide APIs to generate features to online storage for future consumption. For example:
14
+
15
+
When models are served in an online environment, the corresponding features must be served in the same online environment. Feathr provides APIs to generate features to online storage for future consumption. For example:
([MaterializationSettings API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.MaterializationSettings),
25
-
[RedisSink API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.RedisSink)
27
+
For more details on the APIs, see:
28
+
29
+
- [MaterializationSettings API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.MaterializationSettings)
30
+
- [RedisSink API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.RedisSink)
26
31
27
32
In the above example, we define a Redis table called `nycTaxiDemoFeature` and materialize two features called `f_location_avg_fare` and `f_location_max_fare` to Redis.
28
33
29
-
It is also possible to backfill the features for a previous time range, like below. If the `BackfillTime` part is not specified, it's by default to `now()` (i.e. if not specified, it's equivilant to `BackfillTime(start=now, end=now, step=timedelta(days=1))`).
34
+
## Feature Backfill
35
+
36
+
It is also possible to backfill the features for a particular time range, like below. If the `BackfillTime` part is not specified, it defaults to `now()` (i.e. it is equivalent to `BackfillTime(start=now, end=now, step=timedelta(days=1))`).
([BackfillTime API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.BackfillTime),
43
-
[client.materialize_features() API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.FeathrClient.materialize_features))
49
+
Note that if no feature data is available at `now`, you should specify a `BackfillTime` range for which feature data exists.
44
50
45
-
## Consuming the online features
51
+
Also, Feathr will submit one materialization job for each step for performance reasons. For example, if you have
52
+
`BackfillTime(start=datetime(2022, 2, 1), end=datetime(2022, 2, 20), step=timedelta(days=1))`, Feathr will submit 20 jobs to run in parallel for maximum performance.
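The job count above can be checked with a small standalone sketch. It does not call Feathr; `backfill_steps` is a hypothetical helper that assumes one job per step and that both `start` and `end` are included, which is what the figure of 20 jobs implies.

```python
from datetime import datetime, timedelta


def backfill_steps(start, end, step):
    """Return one timestamp per step from start to end, both inclusive.

    Hypothetical helper for illustration only; it is not part of Feathr.
    """
    steps = []
    current = start
    while current <= end:
        steps.append(current)
        current += step
    return steps


# Same range as the BackfillTime example in the text.
steps = backfill_steps(
    start=datetime(2022, 2, 1),
    end=datetime(2022, 2, 20),
    step=timedelta(days=1),
)
print(len(steps))  # number of per-step jobs under the assumptions above
```

Under these assumptions, the range from February 1 to February 20 with a one-day step yields 20 steps, one per submitted job.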
46
53
47
-
```python
48
-
client.wait_job_to_finish(timeout_sec=600)
54
+
For more details on the APIs, see:
55
+
56
+
- [BackfillTime API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.BackfillTime)
57
+
- [client.materialize_features() API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.FeathrClient.materialize_features)
49
58
50
-
res = client.get_online_features('nycTaxiDemoFeature', '265', [
51
-
'f_location_avg_fare', 'f_location_max_fare'])
59
+
60
+
61
+
## Consuming Features in Online Environment
62
+
63
+
After the materialization job is finished, we can get the online features by specifying the `feature table`, the corresponding `entity key`, and a list of `feature names`. In the example below, we query the online features called `f_location_avg_fare` and `f_location_max_fare` with the key `265` (which is the location ID).
64
+
65
+
```python
66
+
res = client.get_online_features('nycTaxiDemoFeature', '265', ['f_location_avg_fare', 'f_location_max_fare'])
52
67
```
53
68
54
-
([client.get_online_features API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.FeathrClient.get_online_features))
69
+
For more details on the APIs, see:
70
+
- [client.get_online_features API doc](https://feathr.readthedocs.io/en/latest/feathr.html#feathr.FeathrClient.get_online_features)
55
71
56
-
After we finish running the materialization job, we can get the online features by querying the feature name, with the
57
-
corresponding keys. In the example above, we query the online features called `f_location_avg_fare` and
58
-
`f_location_max_fare`, and query with a key `265` (which is the location ID).
72
+
## Materializing Features to Offline Store
59
73
60
-
## Generating Features to Offline Store
74
+
This is useful when the feature transformation is compute-intensive and the features can be reused. For example, you have a feature that needs more than 24 hours to compute and that can be reused by more than one model training pipeline. In this case, you should consider generating the features to the offline store.
61
75
62
-
This is a useful when the feature transformation is computation intensive and features can be re-used. For example, you
63
-
have a feature that needs more than 24 hours to compute and the feature can be reused by more than one model training
64
-
pipeline. In this case, you should consider generate features to offline. Here is an API example:
76
+
The API call is very similar to materializing features to the online store. Here is an API example:
This will generate features from `2020/05/10` to `2020/05/20` and the output will have 11 folders, from
105
+
`abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/df0/daily/2020/05/10` to `abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/df0/daily/2020/05/20`. Note that Feathr currently only supports materializing data in daily steps (i.e. even if you specify an hourly step, the generated features in the offline store will still be presented in a daily hierarchy).
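As a rough illustration of the daily hierarchy, the sketch below lists the folder paths for the date range in the text. It is plain Python, not a Feathr API: `daily_output_folders` is a hypothetical helper, and the `df0/daily/YYYY/MM/DD` layout is inferred only from the two example paths above.

```python
from datetime import date, timedelta


def daily_output_folders(base_path, start, end):
    """List one output folder per day from start to end, both inclusive.

    Hypothetical helper; the df0/daily/YYYY/MM/DD layout is an assumption
    taken from the example paths in the text.
    """
    folders = []
    current = start
    while current <= end:
        folders.append(f"{base_path}/df0/daily/{current:%Y/%m/%d}")
        current += timedelta(days=1)
    return folders


base = (
    "abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net"
    "/materialize_offline_test_data"
)
folders = daily_output_folders(base, date(2020, 5, 10), date(2020, 5, 20))
print(len(folders))
print(folders[0])
print(folders[-1])
```

Counting both endpoints, May 10 through May 20 gives the 11 daily folders mentioned above.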
106
+
107
+
You can also specify the format of the materialized features in the offline store by using `execution_configurations`, like below. Please refer to the [documentation](../how-to-guides/feathr-job-configuration.md) for those configuration details.
For reading those materialized features, Feathr has a convenient helper function called `get_result_df` to help you view the data. For example, you can use the sample code below to read from the materialized result in offline store:
File: docs/how-to-guides/client-callback-function.md
Lines changed: 10 additions & 8 deletions
@@ -10,27 +10,29 @@ A callback function is a function that is sent to another function as an argumen
10
10
11
11
## How to use callback functions
12
12
13
-
Currently the below functions in feathr client support passing a callback as an argument:
13
+
We can pass a callback function when initializing the Feathr client.
14
+
15
+
```python
16
+
client = FeathrClient(config_path, callback)
17
+
```
18
+
19
+
The functions below accept an optional parameter named **params**, a dictionary in which users can pass the arguments for the callback function.
14
20
15
21
- get_online_features
16
22
- multi_get_online_features
17
23
- get_offline_features
18
24
- monitor_features
19
25
- materialize_features
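The pattern described above (a callback registered once at client creation, then `params` supplied per call) can be sketched with a stand-in class. `StubClient` and `log_params` are hypothetical names used only for illustration; this is not Feathr's implementation, and how the real client invokes the callback may differ.

```python
class StubClient:
    """Hypothetical stand-in for a client; NOT the Feathr client."""

    def __init__(self, callback=None):
        # The callback is registered once, when the client is created.
        self.callback = callback

    def get_online_features(self, table, key, feature_names, params=None):
        # Placeholder result: one empty value per requested feature.
        result = [None for _ in feature_names]
        # After the operation, hand the caller-supplied params to the callback.
        if self.callback is not None:
            self.callback(params)
        return result


received = []


def log_params(params):
    """Example callback that records the params it was called with."""
    received.append(params)


client = StubClient(callback=log_params)
client.get_online_features(
    "nycTaxiDemoFeature",
    "265",
    ["f_location_avg_fare", "f_location_max_fare"],
    params={"request_id": "demo"},
)
print(received)
```

Registering the callback on the client, instead of passing it to every function, keeps each call site down to the optional `params` dictionary.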
20
26
21
-
These functions accept two optional parameters named **callback** and **params**.
22
-
callback is of type function and params is a dictionary where user can pass the arguments for the callback function.