feat(mlop-2760): Create metadata module (feature, feature_set, reader, writer and pipeline) #418

Merged: albjoaov merged 57 commits into staging from feat/mlop-2760 on Jun 6, 2025

albjoaov (Collaborator) commented May 20, 2025

Why

This change introduces a robust metadata generation system for Butterfree components. The primary motivation is to create a standardized and automated way to:

  • Document the schema of feature set pipelines and their components.
  • Facilitate serialization of pipeline metadata for various use cases, such as cataloging, lineage tracking, and integration with other tools.

This will improve the maintainability, understandability, and interoperability of feature set pipelines within the Butterfree ecosystem.

What

New metadata module:

- Introduced a new `butterfree/metadata` directory to house all Pydantic models for metadata.

Dependencies

 - `Pydantic` added to requirements

Tests

- Unit tests for every behavior of the new classes
- Integration tests for pipeline with `FeatureSet` and with `AggregatedFeatureSet`

How

The solution refactors and centralizes metadata definition using Pydantic models within a dedicated `butterfree/metadata` directory.

  1. Hierarchical Metadata Construction:

    • Individual components (`Feature`, `KeyFeature`, `TimestampFeature`, `Window`, `Reader` subclasses, `Writer`) are responsible for building their own metadata Pydantic models (`FeatureMetadata`, `ReaderMetadata` variants, `WriterMetadata`).

    • Container classes (`FeatureSet`, `AggregatedFeatureSet`, `FeatureSetPipeline`) aggregate metadata from their constituent parts. For instance, `FeatureSet.build_metadata()` calls `build_metadata()` on its keys and timestamp, then constructs metadata for its transformed features before packaging it all into a `FeatureSetMetadata` object.

  2. Pydantic Model Usage: Each metadata type (e.g., for a feature, a reader, a feature set) has a corresponding Pydantic model, ensuring structure and validation.
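
The delegation pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: to stay dependency-free it uses standard-library dataclasses as stand-ins for the Pydantic models, and the simplified `Feature` and `FeatureSet` classes, their constructor arguments, and the metadata field names (`dtype`, `description`, and so on) are all assumptions made for the example.

```python
from dataclasses import asdict, dataclass
from typing import List


# Stand-ins for the Pydantic models (dataclasses here, to avoid a dependency).
# Field names are assumptions, not the actual butterfree/metadata schema.
@dataclass
class FeatureMetadata:
    name: str
    dtype: str
    description: str


@dataclass
class FeatureSetMetadata:
    name: str
    keys: List[FeatureMetadata]
    timestamp: FeatureMetadata
    features: List[FeatureMetadata]


class Feature:
    """Hypothetical leaf component: builds only its own metadata."""

    def __init__(self, name: str, dtype: str, description: str = "") -> None:
        self.name = name
        self.dtype = dtype
        self.description = description

    def build_metadata(self) -> FeatureMetadata:
        return FeatureMetadata(self.name, self.dtype, self.description)


class FeatureSet:
    """Hypothetical container: delegates to its children, then packages."""

    def __init__(self, name, keys, timestamp, features) -> None:
        self.name = name
        self.keys = keys
        self.timestamp = timestamp
        self.features = features

    def build_metadata(self) -> FeatureSetMetadata:
        # Each child is asked for its own metadata; the parent only aggregates.
        return FeatureSetMetadata(
            name=self.name,
            keys=[key.build_metadata() for key in self.keys],
            timestamp=self.timestamp.build_metadata(),
            features=[feature.build_metadata() for feature in self.features],
        )


feature_set = FeatureSet(
    name="house_listings",
    keys=[Feature("id", "bigint")],
    timestamp=Feature("timestamp", "timestamp"),
    features=[Feature("rent", "double", "Monthly rent")],
)
metadata = feature_set.build_metadata()
metadata_dict = asdict(metadata)  # nested plain dict, ready to serialize
```

With Pydantic models in place of the dataclasses, the same structure would also validate field types at construction time, which is the benefit item 2 refers to.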

albjoaov requested a review from a team as a code owner on May 20, 2025, 00:31
homesbot commented May 20, 2025

🎉 Snyk checks have passed. No issues have been found so far.

code/snyk check is complete. No issues have been found.

lecardozo (Collaborator) left a comment

Nice work! I have some minor questions:

  • We have the `butterfree/reports/metadata.py` logic that is used behind the scenes to build the markdown catalog. Do you think we would benefit from it here?
  • If we decide to replace that abstraction with this Pydantic-based one, I think each component could generate its own "serialized" version, so the "parent" component would just call its children's `build_metadata` method.

albjoaov changed the title from "feat(mlop-2760): adding metadata from feature set pipeline" to "feat(mlop-2760): Create metadata module (feature, feature_set, reader, writer and pipeline)" on May 29, 2025
sonarqubecloud bot commented Jun 6, 2025

@albjoaov albjoaov merged commit f5c12f1 into staging Jun 6, 2025
4 checks passed
albjoaov pushed a commit that referenced this pull request Jun 6, 2025
🤖 I have created a release *beep* *boop*
---


## [1.8.0](1.7.2...1.8.0) (2025-06-06)


### Features

* **mlop-2760:** Create metadata module (feature, feature_set, reader, writer and pipeline) ([#418](#418)) ([f5c12f1](f5c12f1))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

5 participants