@markpollack
Member

The Document class previously allowed multiple media entries while also having a
text field, leading to ambiguity in content handling. This change enforces a
clear separation between text and media documents to prevent content type
confusion and simplify document processing.

A Document now must contain either text content or a single media entry, but
never both. This aligns with the class's primary use in ETL pipelines where
clear content type boundaries are essential for proper embedding generation and
vector database storage.
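
The either/or rule above can be sketched as a builder that rejects ambiguous construction. This is only an illustration under assumptions: `DocumentSketch`, `MediaSketch`, and every method on them are hypothetical stand-ins, not the actual Document or Media classes touched by this PR, and the real validation may differ.

```java
// Hypothetical stand-in for a media payload; not the project's Media class.
final class MediaSketch {
    private final String mimeType;
    private final String uri;

    MediaSketch(String mimeType, String uri) {
        this.mimeType = mimeType;
        this.uri = uri;
    }

    String getMimeType() { return mimeType; }
    String getUri() { return uri; }
}

// Hypothetical document that holds text or one media entry, never both.
final class DocumentSketch {
    private final String text;       // null for a media document
    private final MediaSketch media; // null for a text document

    private DocumentSketch(String text, MediaSketch media) {
        this.text = text;
        this.media = media;
    }

    static Builder builder() { return new Builder(); }

    boolean isText() { return text != null; }
    String getText() { return text; }
    MediaSketch getMedia() { return media; } // a single entry, not a collection

    static final class Builder {
        private String text;
        private MediaSketch media;

        Builder text(String text) { this.text = text; return this; }
        Builder media(MediaSketch media) { this.media = media; return this; }

        DocumentSketch build() {
            // Exactly one of the two fields must be set: both-null and both-set are rejected.
            if ((text == null) == (media == null)) {
                throw new IllegalArgumentException(
                        "exactly one of text or media must be specified");
            }
            return new DocumentSketch(text, media);
        }
    }
}
```

In this sketch, `DocumentSketch.builder().text("hello").build()` yields a text document, while setting both `text(...)` and `media(...)`, or neither, fails at `build()` with an `IllegalArgumentException`.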

Additional architectural changes:

  • Document now implements a cleaner API by removing deprecated methods
  • Removed MediaContent interface implementation from Document class
  • Document.getMedia() now returns a single Media object instead of Collection
  • Removed EMPTY_TEXT constant in favor of proper null handling
  • Constructor signatures simplified
  • Builder pattern improved to enforce single content type constraint

The breaking changes include:

  • Media is now a single entry instead of a collection
  • Content field renamed to text for clarity
  • Removed support for mixed content types
  • Simplified builder API to prevent ambiguous construction
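
For callers that previously kept text and several media entries in one document, one possible adaptation is to split the content into one document per piece. The following is a rough sketch under assumptions: `SingleDoc`, `MediaRef`, and `MixedContentSplitter` are hypothetical names invented for this example, and the PR itself does not prescribe a migration strategy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical media reference; not the project's Media class.
final class MediaRef {
    final String mimeType;
    final String uri;

    MediaRef(String mimeType, String uri) {
        this.mimeType = mimeType;
        this.uri = uri;
    }
}

// Hypothetical single-content document: exactly one field is non-null.
final class SingleDoc {
    final String text;
    final MediaRef media;

    SingleDoc(String text, MediaRef media) {
        if ((text == null) == (media == null)) {
            throw new IllegalArgumentException(
                    "exactly one of text or media is required");
        }
        this.text = text;
        this.media = media;
    }
}

final class MixedContentSplitter {
    // Turns legacy-style mixed content into one document per piece of content,
    // keeping the text first and the media entries in their original order.
    static List<SingleDoc> split(String text, List<MediaRef> media) {
        List<SingleDoc> out = new ArrayList<>();
        if (text != null && !text.isEmpty()) {
            out.add(new SingleDoc(text, null));
        }
        for (MediaRef m : media) {
            out.add(new SingleDoc(null, m));
        }
        return out;
    }
}
```

Splitting this way keeps each resulting document unambiguous for embedding, at the cost of losing the implicit grouping; a caller that needs the grouping would have to carry it separately, for example in metadata.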

Prefer using text-related methods over deprecated content methods to
better reflect the actual content type being handled and improve API clarity.

@ilayaperumalg ilayaperumalg self-assigned this Dec 9, 2024
@ilayaperumalg ilayaperumalg added this to the 1.0.0-M5 milestone Dec 9, 2024
@markpollack
Member Author

merged in dfbc394
