Add structure extraction vlm #279
…using Vision Language Models
- Updated the table of contents to include a new section for "Structured Generation from Documents Using Vision Language Models".
- Added a new Jupyter notebook that demonstrates how to extract structured information from documents using the SmolVLM-500M-Instruct model, including installation instructions, model initialization, and example usage.
…age Models
- Introduced a new notebook demonstrating the extraction of structured information from documents using the SmolVLM-500M-Instruct model.
- Included installation instructions, model initialization, and example usage with a focus on generating structured tags and confidence scores from images.
- Added detailed markdown explanations for each step of the process.
…uage Models
- Updated the description to clarify the use of the SmolVLM-Instruct model and its integration with the HuggingFace Transformers and Outlines libraries.
- Added a reference to an outlines tutorial for better guidance.
- Modified the installation command to remove the Gradio library, streamlining the dependencies.
stevhliu
left a comment
This is nice, thanks for your contribution! I think it can be even better if you demonstrate a real world end-to-end application of it though 😄
I agree, let me rewrite a bit for a real-life scenario, making the outlines variant more reproducible and usable while creating some visibility. I wanted to keep the effort minimal but some additional work would be great.
…nthetic data extraction
- Updated notebook to use the RLAIF-V-Dataset for structured information extraction
- Implemented a function to generate synthetic questions, descriptions, and quality tags for images
- Added code to push the augmented dataset to the Hugging Face Hub
- Simplified the notebook's imports and removed unused code
- Updated markdown sections to provide clearer context and explanation
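The commit above describes generating questions, descriptions, and quality tags for each image. As a rough illustration of what "structured" means here, the sketch below checks a model's JSON reply against a fixed set of fields. The class name `ImageAnnotation`, the field names, and the example reply are all hypothetical and not taken from the notebook; a constrained-generation library such as Outlines would enforce the schema during decoding, whereas this stdlib-only sketch validates the text after the fact.

```python
import json
from dataclasses import dataclass


# Hypothetical schema: field names are illustrative, not taken from the notebook.
@dataclass
class ImageAnnotation:
    question: str
    description: str
    quality_tags: list[str]


def parse_annotation(raw: str) -> ImageAnnotation:
    """Parse a model's JSON reply and check it against the expected fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in ("question", "description"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"'{key}' must be a string")
    tags = data.get("quality_tags")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("'quality_tags' must be a list of strings")
    return ImageAnnotation(data["question"], data["description"], tags)


# Hand-written example reply for illustration (not real model output).
reply = (
    '{"question": "What is shown?", '
    '"description": "A street scene.", '
    '"quality_tags": ["sharp", "outdoor"]}'
)
annotation = parse_annotation(reply)
print(annotation.quality_tags)
```

Validated records of this shape could then be added as new columns to a dataset before it is pushed to the Hub, though the notebook's actual column names may differ.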
- Corrected the filename from `structured_generation_vision_languag_models.ipynb` to `structured_generation_vision_language_models.ipynb`
- Updated the index.md to reflect the corrected notebook title and link
- Updated the _toctree.yml to use the corrected notebook filename
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
stevhliu
left a comment
Awesome thanks! Just need to resolve the conflict and we should be good 😄
- Fixed a small punctuation error in the introduction paragraph
- Corrected "HuggingFaceTB" to "Hugging Face"
- Expanded notebook title to clarify generation from both images and documents
- Minor enhancement to improve clarity of the notebook's scope
Looks like the issue is resolved now!
What does this PR do?
Fixes # (issue)
Who can review?
Feel free to tag members/contributors who may be interested in your PR.