fix: Check duplicate names for feature view across types#5999
fix: Check duplicate names for feature view across types#5999ntkathole merged 5 commits intofeast-dev:masterfrom
Conversation
77cc24c to
c299a71
Compare
|
|
||
| # Check StreamFeatureView before FeatureView since StreamFeatureView is a subclass | ||
| # Note: All getters raise FeatureViewNotFoundException (not type-specific exceptions) | ||
| if isinstance(feature_view, StreamFeatureView): |
There was a problem hiding this comment.
Does this mean it will call to retrieve feature view many times? Can we optimize it?
There was a problem hiding this comment.
If I understand correctly, you're concerned that when creating 100 feature views, the apply_feature_view function calls the getters for the other two types each time. That results in 2 database calls per feature view — 200 total — and you’re wondering if we should optimize this part.
There was a problem hiding this comment.
@HaoXuAI
I have a plan. We need to validate in two places:
- In the incoming request — ensure the new feature view name doesn’t exist in the other two types within the request (and vice versa).
- In the database — check against existing records (for the SQL registry case).
Currently, the cache util doesn’t have an exists function to check if a name is already in the registry. Even if it did, calling it 100 times wouldn’t be efficient.
So I’m planning to create a util function per feature view type that takes a list of names and a project, and checks that none of the names exist in the DB.
For example, if 50 feature views come in:
- I’ll call the new util once with those 50 names against the stream and on-demand tables.
- Do the same vice versa for the other types.
This way, instead of calling per feature view, we make only two DB calls per type.
That’s my current idea. Do you see any better optimization?
There was a problem hiding this comment.
@HaoXuAI I can go ahead to implement, if you find we are good with above logic 😄
There was a problem hiding this comment.
i think that makes sense. Feel free to create a new issue for this. We can go with the current solution for now
3d8682d to
0332891
Compare
| f"More than one feature view with name {case_insensitive_fv_name} found. " | ||
| f"Please ensure that all feature view names are case-insensitively unique. " | ||
| f"It may be necessary to ignore certain files in your feature repository by using a .feastignore file." | ||
| if case_insensitive_fv_name in fv_by_name: |
There was a problem hiding this comment.
_validate_feature_views() in feature_store.py uses case-insensitive names (fv.name.lower()), while
_ensure_feature_view_name_is_unique() relies on the registry getters, which are typically case-sensitive. Worth aligning the behavior.
There was a problem hiding this comment.
@ntkathole Thanks for your comment.
On trying to work on that, I have a found an inconsistency (might be a potential issue).
I'm holding my progress on above issue, till I get clarified on this.
In current Feast master, FeatureStore.apply() behaves as an incremental upsert (not a full registry refresh): it validates only the objects in the current request and writes those objects, while existing unrelated records remain untouched. Because of this, duplicate-name protection is incomplete for previously registered objects—especially when names differ only by case (for example, existing driver and new Driver). The in-request validation checks only the incoming payload, and the SQL registry uniqueness constraints are per table and case-sensitive by default, so this scenario can pass and create ambiguous registry state. Expected behavior (per my understanding) is case-insensitive uniqueness for feature view names, so this appears to be a gap.
There was a problem hiding this comment.
@Prathap-P I think it's fine if you want to handle it separately and we keep scope of this PR for sql registry duplicate names
There was a problem hiding this comment.
@ntkathole Solving the test cases took a lot of time, finally was able to fix it. Do you have any other points to discuss?
There was a problem hiding this comment.
I know Devin is reporting these changes are not implemented in snowflake, but as you said to scope this pr to sql, ignoring it...
4c5be75 to
a2a0e29
Compare
c2acd43 to
7404904
Compare
786b948 to
86781e9
Compare
There was a problem hiding this comment.
Devin Review found 1 new potential issue.
🐛 1 issue in files not directly in the diff
🐛 Cross-type uniqueness check missing from SnowflakeRegistry.apply_feature_view (sdk/python/feast/infra/registry/snowflake.py:261-272)
The PR adds _ensure_feature_view_name_is_unique to BaseRegistry and calls it from SqlRegistry.apply_feature_view (sdk/python/feast/infra/registry/sql.py:580), but the same call is missing from SnowflakeRegistry.apply_feature_view (sdk/python/feast/infra/registry/snowflake.py:261-272). This means the "defense-in-depth" cross-type name conflict check is not enforced when users use the Snowflake registry and directly call apply_feature_view.
Incomplete application of fix across registry implementations
The SnowflakeRegistry.apply_feature_view at sdk/python/feast/infra/registry/snowflake.py:261-272 does not call self._ensure_feature_view_name_is_unique(feature_view, project) before applying the feature view:
def apply_feature_view(
self, feature_view: BaseFeatureView, project: str, commit: bool = True
):
fv_table_str = self._infer_fv_table(feature_view)
fv_column_name = fv_table_str[:-1]
return self._apply_object(...)Compare with the updated SqlRegistry.apply_feature_view at sdk/python/feast/infra/registry/sql.py:577-585 which does call it:
def apply_feature_view(
self, feature_view: BaseFeatureView, project: str, commit: bool = True
):
self._ensure_feature_view_name_is_unique(feature_view, project) # added
fv_table = self._infer_fv_table(feature_view)
...Impact: Snowflake registry users who directly call apply_feature_view (bypassing feast apply) can still register a FeatureView and a StreamFeatureView with the same name, leading to the same silent data correctness bug (issue #5995) that this PR intends to fix.
View 8 additional findings in Devin Review.
There was a problem hiding this comment.
Devin Review found 1 new potential issue.
🐛 1 issue in files not directly in the diff
🐛 Cross-type uniqueness check missing from SnowflakeRegistry.apply_feature_view (sdk/python/feast/infra/registry/snowflake.py:261-272)
The PR adds _ensure_feature_view_name_is_unique to BaseRegistry and calls it from SqlRegistry.apply_feature_view (sdk/python/feast/infra/registry/sql.py:580), but the same call is missing from SnowflakeRegistry.apply_feature_view (sdk/python/feast/infra/registry/snowflake.py:261-272). This means the "defense-in-depth" cross-type name conflict check is not enforced when users use the Snowflake registry and directly call apply_feature_view.
Incomplete application of fix across registry implementations
The SnowflakeRegistry.apply_feature_view at sdk/python/feast/infra/registry/snowflake.py:261-272 does not call self._ensure_feature_view_name_is_unique(feature_view, project) before applying the feature view:
def apply_feature_view(
self, feature_view: BaseFeatureView, project: str, commit: bool = True
):
fv_table_str = self._infer_fv_table(feature_view)
fv_column_name = fv_table_str[:-1]
return self._apply_object(...)Compare with the updated SqlRegistry.apply_feature_view at sdk/python/feast/infra/registry/sql.py:577-585 which does call it:
def apply_feature_view(
self, feature_view: BaseFeatureView, project: str, commit: bool = True
):
self._ensure_feature_view_name_is_unique(feature_view, project) # added
fv_table = self._infer_fv_table(feature_view)
...Impact: Snowflake registry users who directly call apply_feature_view (bypassing feast apply) can still register a FeatureView and a StreamFeatureView with the same name, leading to the same silent data correctness bug (issue #5995) that this PR intends to fix.
View 10 additional findings in Devin Review.
d2972de to
c299a71
Compare
There was a problem hiding this comment.
Devin Review found 1 new potential issue.
🐛 1 issue in files not directly in the diff
🐛 Cross-type uniqueness check missing from SnowflakeRegistry.apply_feature_view (sdk/python/feast/infra/registry/snowflake.py:261-272)
The PR adds _ensure_feature_view_name_is_unique to BaseRegistry and calls it from SqlRegistry.apply_feature_view (sdk/python/feast/infra/registry/sql.py:580), but the same call is missing from SnowflakeRegistry.apply_feature_view (sdk/python/feast/infra/registry/snowflake.py:261-272). This means the "defense-in-depth" cross-type name conflict check is not enforced when users use the Snowflake registry and directly call apply_feature_view.
Incomplete application of fix across registry implementations
The SnowflakeRegistry.apply_feature_view at sdk/python/feast/infra/registry/snowflake.py:261-272 does not call self._ensure_feature_view_name_is_unique(feature_view, project) before applying the feature view:
def apply_feature_view(
self, feature_view: BaseFeatureView, project: str, commit: bool = True
):
fv_table_str = self._infer_fv_table(feature_view)
fv_column_name = fv_table_str[:-1]
return self._apply_object(...)Compare with the updated SqlRegistry.apply_feature_view at sdk/python/feast/infra/registry/sql.py:577-585 which does call it:
def apply_feature_view(
self, feature_view: BaseFeatureView, project: str, commit: bool = True
):
self._ensure_feature_view_name_is_unique(feature_view, project) # added
fv_table = self._infer_fv_table(feature_view)
...Impact: Snowflake registry users who directly call apply_feature_view (bypassing feast apply) can still register a FeatureView and a StreamFeatureView with the same name, leading to the same silent data correctness bug (issue #5995) that this PR intends to fix.
View 13 additional findings in Devin Review.
b7e9e68 to
eac7fe0
Compare
There was a problem hiding this comment.
Devin Review found 1 new potential issue.
🐛 1 issue in files not directly in the diff
🐛 Cross-type uniqueness check missing from SnowflakeRegistry.apply_feature_view (sdk/python/feast/infra/registry/snowflake.py:261-272)
The PR adds _ensure_feature_view_name_is_unique to BaseRegistry and calls it from SqlRegistry.apply_feature_view (sdk/python/feast/infra/registry/sql.py:580), but the same call is missing from SnowflakeRegistry.apply_feature_view (sdk/python/feast/infra/registry/snowflake.py:261-272). This means the "defense-in-depth" cross-type name conflict check is not enforced when users use the Snowflake registry and directly call apply_feature_view.
Incomplete application of fix across registry implementations
The SnowflakeRegistry.apply_feature_view at sdk/python/feast/infra/registry/snowflake.py:261-272 does not call self._ensure_feature_view_name_is_unique(feature_view, project) before applying the feature view:
def apply_feature_view(
self, feature_view: BaseFeatureView, project: str, commit: bool = True
):
fv_table_str = self._infer_fv_table(feature_view)
fv_column_name = fv_table_str[:-1]
return self._apply_object(...)Compare with the updated SqlRegistry.apply_feature_view at sdk/python/feast/infra/registry/sql.py:577-585 which does call it:
def apply_feature_view(
self, feature_view: BaseFeatureView, project: str, commit: bool = True
):
self._ensure_feature_view_name_is_unique(feature_view, project) # added
fv_table = self._infer_fv_table(feature_view)
...Impact: Snowflake registry users who directly call apply_feature_view (bypassing feast apply) can still register a FeatureView and a StreamFeatureView with the same name, leading to the same silent data correctness bug (issue #5995) that this PR intends to fix.
View 12 additional findings in Devin Review.
There was a problem hiding this comment.
Devin Review found 1 new potential issue.
🐛 1 issue in files not directly in the diff
🐛 Cross-type uniqueness check missing from SnowflakeRegistry.apply_feature_view (sdk/python/feast/infra/registry/snowflake.py:261-272)
The PR adds _ensure_feature_view_name_is_unique to BaseRegistry and calls it from SqlRegistry.apply_feature_view (sdk/python/feast/infra/registry/sql.py:580), but the same call is missing from SnowflakeRegistry.apply_feature_view (sdk/python/feast/infra/registry/snowflake.py:261-272). This means the "defense-in-depth" cross-type name conflict check is not enforced when users use the Snowflake registry and directly call apply_feature_view.
Incomplete application of fix across registry implementations
The SnowflakeRegistry.apply_feature_view at sdk/python/feast/infra/registry/snowflake.py:261-272 does not call self._ensure_feature_view_name_is_unique(feature_view, project) before applying the feature view:
def apply_feature_view(
self, feature_view: BaseFeatureView, project: str, commit: bool = True
):
fv_table_str = self._infer_fv_table(feature_view)
fv_column_name = fv_table_str[:-1]
return self._apply_object(...)Compare with the updated SqlRegistry.apply_feature_view at sdk/python/feast/infra/registry/sql.py:577-585 which does call it:
def apply_feature_view(
self, feature_view: BaseFeatureView, project: str, commit: bool = True
):
self._ensure_feature_view_name_is_unique(feature_view, project) # added
fv_table = self._infer_fv_table(feature_view)
...Impact: Snowflake registry users who directly call apply_feature_view (bypassing feast apply) can still register a FeatureView and a StreamFeatureView with the same name, leading to the same silent data correctness bug (issue #5995) that this PR intends to fix.
View 13 additional findings in Devin Review.
Signed-off-by: Prathap P <436prathap@gmail.com>
Signed-off-by: Prathap P <436prathap@gmail.com>
Signed-off-by: Prathap P <436prathap@gmail.com>
Signed-off-by: Prathap P <436prathap@gmail.com>
7a20e7b to
d7bac33
Compare
What this PR does / why we need it:
get_online_featuresresolved feature view names viaget_any_feature_view, which checks registry tables in a fixed order: FeatureView → StreamFeatureView → OnDemandFeatureView.Which issue(s) this PR fixes:
Fixes #5995
Misc