Hive is not on the current Feast roadmap. This project aims to add Hive support as an Offline Store.
For more details, see this Feast issue.
Important: This project is still under development and is not ready for use yet. If you need it, please let me know and I will likely give it higher priority.
pip install feast
Install the latest dev version with pip:
pip install git+https://github.com/baineng/feast-hive.git
or by cloning the repo:
git clone https://github.com/baineng/feast-hive.git
cd feast-hive
python setup.py install
feast init feature_repo
cd feature_repo
Set the offline_store type to feast_hive.HiveOfflineStore:
project: ...
registry: ...
provider: local
offline_store:
    type: feast_hive.HiveOfflineStore
online_store:
    ...
# This is an example feature definition file
from google.protobuf.duration_pb2 import Duration
from feast import Entity, Feature, FeatureView, ValueType
from feast_hive import HiveSource
# Read data from a Hive table.
# Make sure the table referenced by table_ref exists and contains data before continuing.
driver_hourly_stats = HiveSource(
    host='localhost',
    port=10000,
    table_ref='example.driver_stats',
    event_timestamp_column="datetime",
    created_timestamp_column="created",
)
# Define an entity for the driver.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)
# Define FeatureView
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    input=driver_hourly_stats,
    tags={},
)
feast apply
The rest is the same as in the Feast Quickstart.
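The `ttl=Duration(seconds=86400 * 1)` setting in the feature view above limits how far back Feast looks when joining feature rows to an entity timestamp. The following is a minimal, standard-library-only sketch of that idea, not the code Feast or feast-hive actually runs. The helper names `within_ttl` and `latest_valid_row` are hypothetical, and the inclusive window boundaries are an assumption made for illustration.

```python
from datetime import datetime, timedelta

# Mirrors ttl=Duration(seconds=86400 * 1) from the feature view above (one day).
TTL = timedelta(seconds=86400 * 1)


def within_ttl(feature_ts: datetime, entity_ts: datetime, ttl: timedelta = TTL) -> bool:
    """Return True if a feature row is recent enough to join to an entity row.

    Assumption: the window is [entity_ts - ttl, entity_ts], both ends inclusive.
    """
    return entity_ts - ttl <= feature_ts <= entity_ts


def latest_valid_row(rows, entity_ts: datetime, ttl: timedelta = TTL):
    """Pick the newest row (a dict with a "datetime" key) inside the TTL window.

    Returns None when no row falls inside the window.
    """
    candidates = [r for r in rows if within_ttl(r["datetime"], entity_ts, ttl)]
    return max(candidates, key=lambda r: r["datetime"], default=None)


if __name__ == "__main__":
    entity_ts = datetime(2021, 6, 2, 12, 0)
    rows = [
        {"datetime": datetime(2021, 6, 1, 9, 0), "conv_rate": 0.1},   # older than the TTL
        {"datetime": datetime(2021, 6, 2, 8, 0), "conv_rate": 0.5},   # inside the window
        {"datetime": datetime(2021, 6, 2, 13, 0), "conv_rate": 0.9},  # later than entity_ts
    ]
    print(latest_valid_row(rows, entity_ts))
```

The "datetime" key matches the `event_timestamp_column` configured on the `HiveSource`. In a real repository the join is performed by the offline store against the Hive table rather than in Python.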
git clone https://github.com/baineng/feast-hive.git
cd feast-hive
# creating virtual env ...
pip install .[dev]
# before commit
make format
make lint
pip install .[test]
FEAST_HIVE_HOST=localhost FEAST_HIVE_PORT=10000 pytest --verbose --color=yes tests
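The test command above passes the Hive endpoint through the `FEAST_HIVE_HOST` and `FEAST_HIVE_PORT` environment variables. As a rough sketch of how a test helper might read them, the function below uses only the standard library. The name `hive_test_target` is hypothetical and not taken from this repository, and the fallback values `localhost` and `10000` are assumptions borrowed from the examples in this README.

```python
import os
from typing import Mapping, Tuple


def hive_test_target(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Return the (host, port) of the Hive server the tests should connect to.

    Hypothetical helper: reads FEAST_HIVE_HOST and FEAST_HIVE_PORT and falls
    back to localhost:10000 (an assumed default) when they are not set.
    """
    host = env.get("FEAST_HIVE_HOST", "localhost")
    port = int(env.get("FEAST_HIVE_PORT", "10000"))  # env values are strings
    return host, port


if __name__ == "__main__":
    host, port = hive_test_target()
    print(f"Tests would target Hive at {host}:{port}")
```

Accepting the environment as a parameter keeps the helper easy to exercise without touching the real process environment.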