feat: add regexp metadata extraction step #39

Open
NoeFlandre wants to merge 3 commits into timothepearce:main from NoeFlandre:feat/metadata-regexp

@NoeFlandre

Description

This PR introduces a new regexp method for the metadata step, allowing users to extract metadata using regular expressions.

It also includes a bug fix in synda/model/step.py to correctly handle Annotated types using TypeAdapter during step configuration validation.
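To illustrate why `TypeAdapter` is needed here, the following is a minimal sketch assuming Pydantic v2 and a discriminated union of metadata step configs. The model shapes, the `method` discriminator field, and the `WordPosition` sibling model are hypothetical and are not taken from this PR's diff. An `Annotated[...]` alias is not a `BaseModel` subclass, so it has no `model_validate`; wrapping it in a `TypeAdapter` provides validation for it.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


# Hypothetical config models; the real ones in synda may differ.
class RegexpParameters(BaseModel):
    pattern: str


class Regexp(BaseModel):
    method: Literal["regexp"]
    parameters: RegexpParameters


class WordPosition(BaseModel):  # hypothetical sibling metadata method
    method: Literal["word-position"]


# An Annotated union is a typing construct, not a model class,
# so it cannot be validated by calling a model method on it.
MetadataStep = Annotated[
    Union[Regexp, WordPosition], Field(discriminator="method")
]

# TypeAdapter validates arbitrary type annotations, including Annotated ones.
adapter = TypeAdapter(MetadataStep)
step = adapter.validate_python(
    {"method": "regexp", "parameters": {"pattern": r"\d+"}}
)
print(type(step).__name__, step.parameters.pattern)
```

The discriminator lets the adapter pick the concrete model from the `method` value instead of trying each union member in turn.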

Changes

  • Config: Added Regexp and RegexpParameters models.
  • Pipeline: Implemented Regexp executor with regex search logic and progress tracking.
  • Bug Fix: Updated Step.get_step_config to use TypeAdapter for metadata steps.
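The executor described above could look roughly like the sketch below. It uses only the standard library; `RegexpParameters`, `Match`, `run_regexp`, the `ignore_case` option, and the `on_progress` callback are illustrative names chosen for this example, not the identifiers used in the PR.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RegexpParameters:  # hypothetical stand-in for the PR's config model
    pattern: str
    ignore_case: bool = False


@dataclass
class Match:
    text: str
    start: int
    end: int


def run_regexp(
    values: list[str],
    params: RegexpParameters,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> list[Optional[Match]]:
    """Search each value once and record the first match with its offsets."""
    flags = re.IGNORECASE if params.ignore_case else 0
    compiled = re.compile(params.pattern, flags)
    results: list[Optional[Match]] = []
    total = len(values)
    for done, value in enumerate(values, start=1):
        found = compiled.search(value)
        results.append(
            Match(found.group(0), found.start(), found.end()) if found else None
        )
        if on_progress is not None:
            on_progress(done, total)  # a progress bar could hook in here
    return results


progress: list[tuple[int, int]] = []
results = run_regexp(
    ["order 42 shipped", "no digits"],
    RegexpParameters(pattern=r"\d+"),
    on_progress=lambda done, total: progress.append((done, total)),
)
print(results)
```

Compiling the pattern once outside the loop avoids recompiling it for every input value.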

Verification

Verified with a manual smoke test using a CSV source and a regex pattern for email addresses.

  • Successfully extracted email addresses with correct start/end offsets.
  • Verified progress bar UI and CSV output.
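A smoke test along these lines can be reproduced with the standard library alone. The sketch below is an approximation, not the script used for this PR: the column names, the sample rows, and the deliberately simple email pattern are all assumptions made for illustration.

```python
import csv
import io
import re

# Simplified email pattern for demonstration; not a full RFC 5322 matcher.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# In-memory CSV standing in for the source file (hypothetical columns).
source = io.StringIO(
    "id,content\n"
    "1,Contact alice@example.com for details\n"
    "2,No address here\n"
)

output = io.StringIO()
writer = csv.writer(output, lineterminator="\n")
writer.writerow(["id", "match", "start", "end"])

rows = []
for row in csv.DictReader(source):
    found = EMAIL.search(row["content"])
    if found:
        record = (row["id"], found.group(0), found.start(), found.end())
    else:
        record = (row["id"], "", "", "")  # no match: leave fields empty
    rows.append(record)
    writer.writerow(record)

print(output.getvalue())
```

The offsets follow Python slicing conventions: `start` is inclusive and `end` is exclusive, so `content[start:end]` yields the matched text.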
