Open
Description
Describe the bug
Calling CountEmbedder.transform(...) raises a ColumnNotFoundError for the expected output feature (e.g. "accident_occurred"), even though that column is present in features_gdf and the GeoDataFrame is indexed by feature_id.
Code:
joint_gdf = IntersectionJoiner().transform(gdf_h3, gdf_accidents)
count_embeddings = CountEmbedder(
count_subcategories=False,
expected_output_features=["accident_occurred"]
).transform(gdf_h3, gdf_accidents, joint_gdf)
Error message:
---> FAILED HERE RESOLVING 'sink' <---
DF ["accident_occurred"]; PROJECT */1 COLUMNS
It seems there may be an issue in srai/embedders/count_embedder.py:163, or perhaps I didn't define the columns or index as expected. Any input would be appreciated.
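To rule out a problem with my inputs, here is a minimal sketch of the checks I would expect to pass before calling the embedder. It uses plain pandas with toy stand-ins for gdf_h3 and gdf_accidents; the helper name check_embedder_inputs is hypothetical and not part of srai, and the index names region_id and feature_id are taken from the printed output in this report.

```python
import pandas as pd


def check_embedder_inputs(
    regions,
    features,
    expected_features,
    regions_index="region_id",    # index name printed for gdf_h3 in this report
    features_index="feature_id",  # index name printed for gdf_accidents in this report
):
    """Hypothetical helper (not part of srai): list problems with the inputs.

    An empty list means the index names and requested columns look as expected.
    """
    problems = []
    if regions.index.name != regions_index:
        problems.append(
            f"regions index is {regions.index.name!r}, expected {regions_index!r}"
        )
    if features.index.name != features_index:
        problems.append(
            f"features index is {features.index.name!r}, expected {features_index!r}"
        )
    missing = [col for col in expected_features if col not in features.columns]
    if missing:
        problems.append(f"missing feature columns: {missing}")
    return problems


# Toy stand-ins for the real GeoDataFrames (geometry omitted for brevity).
regions = pd.DataFrame(
    {"type_of_surface": ["asphalt"]},
    index=pd.Index(["r1"], name="region_id"),
)
features = pd.DataFrame(
    {"accident_occurred": [True]},
    index=pd.Index([0], name="feature_id"),
)

problems = check_embedder_inputs(regions, features, ["accident_occurred"])
print(problems)
```

With inputs shaped like mine the list comes back empty, which is why I suspect the failure happens inside the embedder rather than in the data I pass in.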
Index and column names:
print(gdf_accidents.index.name)
print(gdf_accidents.columns)
print(gdf_h3.index.name)
print(gdf_h3.columns)
returns
feature_id
Index(['geometry', 'accident_occurred'], dtype='object')
region_id
Index(['geometry', 'type_of_surface', 'roundabout', 'traffic_significance',
'object_type', 'engineering_structure'],
dtype='object')
Environment:
- Python: 3.12
- srai version: 0.9.7
- Polars: 1.31.0
- pyarrow: 15.0.0
- OS: macOS
Full error description:
---------------------------------------------------------------------------
ColumnNotFoundError Traceback (most recent call last)
Cell In[11], line 19
10 print(gdf_h3.columns)
14 joint_gdf = IntersectionJoiner().transform(gdf_h3, gdf_accidents)
16 count_embeddings = CountEmbedder(
17 count_subcategories=False,
18 expected_output_features=["accident_occurred"]
---> 19 ).transform(gdf_h3, gdf_accidents, joint_gdf)
File ~/Documents/Projects/EPFL_apply_data_sciences/capstone-project-adsml-ibex-c5-s2-8710-4645/.venv/lib/python3.12/site-packages/srai/embedders/count_embedder.py:163, in CountEmbedder.transform(self, regions_gdf, features_gdf, joint_gdf)
147 region_embeddings = joint_with_encodings.drop(FEATURES_INDEX).group_by(REGIONS_INDEX).sum()
148 region_embeddings, feature_columns = self._maybe_filter_to_expected_features(
149 region_embeddings, feature_columns
150 )
152 region_embeddings_df = (
153 (
154 regions_df.join(region_embeddings, on=REGIONS_INDEX, how="left")
155 .fill_null(0)
156 .with_columns(
157 [
158 pl.col(REGIONS_INDEX),
159 *(pl.col(col).cast(pl.Int32) for col in feature_columns),
160 ]
161 )
...
Resolved plan until failure:
---> FAILED HERE RESOLVING 'sink' <---
DF ["accident_occurred"]; PROJECT */1 COLUMNS