πŸ› Bug: CountEmbedder raises ColumnNotFoundError even when feature column existsΒ #536

@regislon

Describe the bug

Using CountEmbedder.transform(...) raises a ColumnNotFoundError on the expected output feature (e.g. "accident_occurred") despite it being present in the features_gdf and correctly indexed by feature_id.

Code:

joint_gdf = IntersectionJoiner().transform(gdf_h3, gdf_accidents)

count_embeddings = CountEmbedder(
    count_subcategories=False,
    expected_output_features=["accident_occurred"]
).transform(gdf_h3, gdf_accidents, joint_gdf)

Error message:

---> FAILED HERE RESOLVING 'sink' <---
DF ["accident_occurred"]; PROJECT */1 COLUMNS

It seems there may be an issue in srai/embedders/count_embedder.py:163, or perhaps I didn't define the columns or index as expected. Any input would be appreciated.
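As a cross-check that does not go through CountEmbedder, the expected per-region counts can be computed with plain pandas. This is only a minimal sketch, not srai's implementation: the helper name `count_features_per_region` is hypothetical, it assumes `joint_gdf` is indexed by a `(region_id, feature_id)` MultiIndex, and it assumes that `count_subcategories=False` amounts to counting features with a non-null value in the column.

```python
import pandas as pd


def count_features_per_region(regions, features, joint, column):
    """Count, per region, the joined features whose `column` is non-null.

    Hypothetical helper for cross-checking; assumes `joint` has a
    MultiIndex with levels named "region_id" and "feature_id".
    """
    # One row per (region_id, feature_id) pair found by the joiner.
    pairs = joint.index.to_frame(index=False)

    # Keep only the pairs whose feature has a value in the column.
    valid_ids = features.index[features[column].notna()]
    pairs = pairs[pairs["feature_id"].isin(valid_ids)]

    # Count pairs per region, then add regions with no match as 0.
    counts = pairs.groupby("region_id").size()
    counts = counts.reindex(regions.index, fill_value=0)
    return counts.astype("int32").rename(column)


# Usage with the frames from this report:
# count_features_per_region(gdf_h3, gdf_accidents, joint_gdf, "accident_occurred")
```

If this returns sensible counts, the input frames are probably fine and the failure is more likely inside the embedder's Polars pipeline.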

Index and column names

print(gdf_accidents.index.name)       
print(gdf_accidents.columns)
print(gdf_h3.index.name)
print(gdf_h3.columns)

returns

feature_id
Index(['geometry', 'accident_occurred'], dtype='object')
region_id
Index(['geometry', 'type_of_surface', 'roundabout', 'traffic_significance',
       'object_type', 'engineering_structure'],
      dtype='object')
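To rule out a problem with how the columns or indexes are defined, a small check along these lines could be run before calling the embedder. It is a sketch under assumptions: `check_embedder_inputs` is a hypothetical helper (not part of srai), the index names `region_id` and `feature_id` are taken from the output above, and the joint frame is assumed to be indexed by both levels in that order.

```python
def check_embedder_inputs(regions, features, joint, expected_features):
    """Return a list of problems found in the inputs (empty if none).

    Hypothetical helper: works on any objects exposing `.index.names`
    and `.columns`, such as pandas or geopandas frames.
    """
    problems = []

    if list(regions.index.names) != ["region_id"]:
        problems.append(f"regions index is {list(regions.index.names)}")

    if list(features.index.names) != ["feature_id"]:
        problems.append(f"features index is {list(features.index.names)}")

    # Assumption: the joiner output is indexed by (region_id, feature_id).
    if list(joint.index.names) != ["region_id", "feature_id"]:
        problems.append(f"joint index is {list(joint.index.names)}")

    # Every expected output feature should be a column of the features frame.
    for name in expected_features:
        if name not in features.columns:
            problems.append(f"expected feature {name!r} is not a features column")

    return problems


# Usage with the frames from this report:
# check_embedder_inputs(gdf_h3, gdf_accidents, joint_gdf, ["accident_occurred"])
```

An empty list means the inputs match these assumptions. Going by the index and column names printed above, none of the first, second, or fourth checks would be expected to fail here.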

Environment:

  • Python: 3.12
  • srai version: 0.9.7
  • Polars: 1.31.0
  • pyarrow: 15.0.0
  • OS: macOS

Full error description:

---------------------------------------------------------------------------
ColumnNotFoundError                       Traceback (most recent call last)
Cell In[11], line 19
     10 print(gdf_h3.columns)
     14 joint_gdf = IntersectionJoiner().transform(gdf_h3, gdf_accidents)
     16 count_embeddings = CountEmbedder(
     17     count_subcategories=False,
     18     expected_output_features=["accident_occurred"]
---> 19 ).transform(gdf_h3, gdf_accidents, joint_gdf)

File ~/Documents/Projects/EPFL_apply_data_sciences/capstone-project-adsml-ibex-c5-s2-8710-4645/.venv/lib/python3.12/site-packages/srai/embedders/count_embedder.py:163, in CountEmbedder.transform(self, regions_gdf, features_gdf, joint_gdf)
    147 region_embeddings = joint_with_encodings.drop(FEATURES_INDEX).group_by(REGIONS_INDEX).sum()
    148 region_embeddings, feature_columns = self._maybe_filter_to_expected_features(
    149     region_embeddings, feature_columns
    150 )
    152 region_embeddings_df = (
    153     (
    154         regions_df.join(region_embeddings, on=REGIONS_INDEX, how="left")
    155         .fill_null(0)
    156         .with_columns(
    157             [
    158                 pl.col(REGIONS_INDEX),
    159                 *(pl.col(col).cast(pl.Int32) for col in feature_columns),
    160             ]
    161         )
...

Resolved plan until failure:

---> FAILED HERE RESOLVING 'sink' <---
DF ["accident_occurred"]; PROJECT */1 COLUMNS
