feat: mistral ocr converter #2376
Updated example usage in MCPTool documentation to reflect Streamable HTTP usage and mentioned deprecated SSE.
Updated documentation to reflect changes in connection types for MCPToolset.
docs: recommend Streamable HTTP instead of SSE
Thanks for your contribution. I have left some initial comments about the implementation. In general, I would make this component more similar to the AzureOCRDocumentConverter.
Other points you mentioned:
- It is OK to use Mistral SDK
- We want to keep Python 3.9 compatibility for the moment, so let's use Union.
- Once we agree on the main aspects of the implementation, let's add unit and integration tests.
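To illustrate the Python 3.9 point: the `X | Y` annotation syntax is only evaluated at runtime from Python 3.10 on, so `Union` and `Optional` from `typing` are the compatible spelling. The following is a minimal sketch; `normalize_pages` and its parameter are hypothetical names used only for illustration and are not taken from this PR.

```python
from typing import List, Optional, Union


# Hypothetical helper, not part of the PR: it only demonstrates annotations
# that still work on Python 3.9.
# On Python >= 3.10 the same signature could be written as
#     pages: int | list[int] | None = None
def normalize_pages(pages: Optional[Union[int, List[int]]] = None) -> List[int]:
    """Return the requested page indices as a list."""
    if pages is None:
        return []
    if isinstance(pages, int):
        return [pages]
    return list(pages)
```

Calling `normalize_pages(3)` and `normalize_pages([1, 2])` both yield a list, so callers do not need to branch on the input type.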
...ns/mistral/src/haystack_integrations/components/converters/mistral/ocr_document_converter.py
…integrations into add-mistral-ocr
I think the last remaining issue is that I can't run hatch run fmt, because then my […] will become "haystack_integrations.components.converters.mistral.ocr_document_converter.MistralOCRDocumentConverter", and then three lines exceed the line limit (125 > 120).
Co-authored-by: Stefano Fiorucci <[email protected]>
…erters/mistral/ocr_document_converter.py Co-authored-by: Stefano Fiorucci <[email protected]>
Hey @Hansehart, I've been a bit busy... I'll review it again soon
Take your time. I have no deadline on this feature. I really appreciate your valuable feedback. It is really fun to contribute.
Please update https://github.com/deepset-ai/haystack-core-integrations/blob/main/README.md
In general, we are close to merging this PR. 👍
I ran integration tests locally (they don't run on PRs from forks) and all works well.
Thank you!
Related Issues
Proposed Changes:
This PR adds a new OCR document converter component for the Mistral integration in Haystack.
The MistralOCRDocumentConverter uses Mistral’s Document AI / OCR API to extract text and structured annotations from documents and images.
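As a rough sketch of the extraction step described above, the snippet below turns an OCR-style response into a single text string. It does not call the Mistral SDK and is not the converter's actual implementation: the response shape (a `pages` list whose entries carry `index` and `markdown`), the function name `pages_to_text`, and the form-feed page separator are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def pages_to_text(ocr_response: Dict[str, Any], separator: str = "\f") -> str:
    """Join per-page markdown into one string, ordered by page index.

    Assumes (hypothetically) a response shaped like
    {"pages": [{"index": 0, "markdown": "..."}, ...]}.
    """
    pages: List[Dict[str, Any]] = ocr_response.get("pages", [])
    # Sort defensively in case the pages arrive out of order.
    ordered = sorted(pages, key=lambda page: page.get("index", 0))
    return separator.join(page.get("markdown", "") for page in ordered)


# Hand-written example data, not real API output.
example_response = {
    "pages": [
        {"index": 1, "markdown": "Second page"},
        {"index": 0, "markdown": "# Title"},
    ]
}
text = pages_to_text(example_response)
```

The resulting string could then be stored as the content of a document object; keeping a page separator makes it possible to split the text back into pages later.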
Key features:
How did you test it?
Notes for the reviewer
Some things are definitely still missing. It works great with my structure, but it needs adjustments for Haystack, including:
- … haystack_integrations.components.converters.mistral.ocr_document_converter, which cannot be resolved
- … | instead of Optional[]; this is for Python >=3.10 but can be adjusted
Checklist
- I added unit tests and updated the docstrings
- I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.