Open
Labels
enhancement: New feature or request
Description
✨ Feature Summary
Implement a multimodal parser that supports multiple modes (with and without LLM assistance). With such a parser, quantmind can easily convert video, audio, image, text, or file objects into a ParserResult.
# The mock parser result
from typing import Dict
from pydantic import BaseModel

class ParserResult(BaseModel):
    text: str
    images: Dict[str, bytes]

@bridgeqiqi I'd love for you to add the remaining parts! ✨ (feel free to remove useless parts)
🎯 Motivation
Detailed Description
🔧 Proposed Implementation
API Design
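As a starting point for discussion, here is a minimal sketch of what a mode-aware parser could look like. Everything except the ParserResult fields is an assumption made for illustration: the names MultimodalParser, ParseMode, and parse, the use of a MIME-type string for dispatch, and the llm callable are all hypothetical, and ParserResult is written as a stdlib dataclass stand-in for the pydantic model so the sketch runs without dependencies. Video, audio, and generic file handling are deliberately left unimplemented.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Optional


@dataclass
class ParserResult:
    """Stdlib stand-in for the pydantic ParserResult mocked in this issue."""
    text: str
    images: Dict[str, bytes] = field(default_factory=dict)


class ParseMode(Enum):
    """Hypothetical mode switch: with or without an LLM."""
    RULE_BASED = "rule_based"  # no LLM involved
    LLM = "llm"                # LLM-powered post-processing


class MultimodalParser:
    """Hypothetical entry point; not an existing quantmind class."""

    def __init__(
        self,
        mode: ParseMode = ParseMode.RULE_BASED,
        llm: Optional[Callable[[str], str]] = None,
    ) -> None:
        if mode is ParseMode.LLM and llm is None:
            raise ValueError("LLM mode requires an `llm` callable")
        self.mode = mode
        self.llm = llm

    def parse(self, data: bytes, media_type: str, name: str = "input") -> ParserResult:
        # Dispatch on the top-level media type, e.g. "image" in "image/png".
        kind = media_type.split("/", 1)[0]
        if kind == "text":
            result = ParserResult(text=data.decode("utf-8"))
        elif kind == "image":
            # Keep the raw bytes; OCR or captioning is out of scope here.
            result = ParserResult(text="", images={name: data})
        else:
            # Video, audio, and generic files would need real decoders.
            raise NotImplementedError(f"no handler for {media_type!r} in this sketch")
        if self.mode is ParseMode.LLM and result.text:
            # In LLM mode, pass the extracted text through the supplied callable.
            result.text = self.llm(result.text)
        return result
```

Under these assumptions, `MultimodalParser().parse(b"some text", "text/plain")` would return a ParserResult with the decoded text and no images, while passing `ParseMode.LLM` together with any str-to-str callable would route the extracted text through that callable.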
# If applicable, show how you envision the API would look
Configuration
# If applicable, show any new configuration options
🎨 User Experience
Use Cases
- Use Case 1:
- Use Case 2:
- Use Case 3:
Related Issues
- Relates to #
- Blocks #
- Depends on #
Implementation Considerations
Breaking Changes
- This feature would introduce breaking changes
- This feature is backward compatible
Dependencies
- Requires new dependencies
- Uses existing dependencies only
Checklist
- I have searched existing issues to avoid duplicates
- I have provided a clear and detailed description
- I have explained the motivation and use cases
- I have considered the implementation approach
- I have thought about potential breaking changes