[DEVX-454]: Added Support for Docx & Markdown in Data Ingestion Pipeline #32
mogith-pn
left a comment
Just added one comment. Rest looks good.
llama-index-core==0.10.33
llama-index-llms-clarifai==0.1.2
pi_heif==0.18.0
markdown==3.7
I don't think this needs to be added here; it looks redundant. @sanjaychelliah, please share your suggestions.
If installing the forked repo (unstructured[pdf] @ git+https://github.com/clarifai/unstructured.git@support_clarifai_model) does not pull in the libraries required for DOCX and MD, we should add these; otherwise, no.
The forked repo's installation does not pull in the required libraries, so we need to add them.
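One way to confirm this in a fresh environment is to check whether the relevant modules can be found after installing the fork. The following is only a minimal sketch: the module-to-package mapping (`docx` from `python-docx`, `markdown` from `markdown`) is an assumption about what the DOCX and Markdown partitioners need, and `missing_packages` is a hypothetical helper, not part of this PR.

```python
from importlib.util import find_spec

# Assumption: import name -> pip package name for the libraries the
# DOCX and Markdown partitioners are expected to rely on.
REQUIRED_MODULES = {"docx": "python-docx", "markdown": "markdown"}


def missing_packages(required=None):
    """Return the pip package names whose import module cannot be found."""
    required = REQUIRED_MODULES if required is None else required
    return sorted(pkg for mod, pkg in required.items() if find_spec(mod) is None)


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed by the fork:", ", ".join(missing))
    else:
        print("All required libraries are already available.")
```

If the script reports anything missing right after installing only the fork, those packages would need explicit pins in the requirements.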
sanjaychelliah
left a comment
LGTM except for the one comment.
Even though that is not part of this PR, we can quickly debug, based on the demo call discussion.
The pipeline names are being added as concepts in the platform. That should be debugged.
Done, along with additional changes in the clarifai-python repo: Clarifai/clarifai-python#471
sanjaychelliah
left a comment
LGTM
Added support for DOCX and Markdown formats in Data Ingestion Pipelines using the unstructured library.
Ref - https://clarifai.atlassian.net/browse/DEVX-454
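For readers unfamiliar with how the unstructured library handles these formats, the routing can be sketched roughly as below. This is not the code from this PR: `resolve_partitioner` and `partition_file` are hypothetical helpers, and the sketch assumes the fork exposes the upstream `partition_docx` and `partition_md` functions under `unstructured.partition`.

```python
import importlib
from pathlib import Path

# File extension -> (module, function) in the unstructured library.
# Assumption: the fork keeps the upstream module layout.
PARTITIONERS = {
    ".docx": ("unstructured.partition.docx", "partition_docx"),
    ".md": ("unstructured.partition.md", "partition_md"),
}


def resolve_partitioner(path):
    """Return the (module, function) names for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in PARTITIONERS:
        raise ValueError(f"Unsupported file type: {suffix or path}")
    return PARTITIONERS[suffix]


def partition_file(path):
    """Partition a DOCX or Markdown file into unstructured elements."""
    module_name, func_name = resolve_partitioner(path)
    # Imported lazily so the routing logic works even without unstructured installed.
    module = importlib.import_module(module_name)
    return getattr(module, func_name)(filename=str(path))
```

Calling `partition_file("notes.md")` would then return a list of elements, provided the Markdown dependencies are installed.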