@srikanthbachala20
Contributor

Added support for DOCX and Markdown formats in the data ingestion pipelines, using the unstructured library.
Ref: https://clarifai.atlassian.net/browse/DEVX-454

Contributor

@mogith-pn mogith-pn left a comment

Just added one comment. Rest looks good.

llama-index-core==0.10.33
llama-index-llms-clarifai==0.1.2
pi_heif==0.18.0
markdown==3.7
Contributor

I don't think this needs to be added here; it looks redundant. @sanjaychelliah, please share your suggestions.

Contributor

If installing the forked repo (unstructured[pdf] @ git+https://github.com/clarifai/unstructured.git@support_clarifai_model) does not pull in the libraries required for DOCX and MD, we should add these; otherwise, no.
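
One way to settle this is to check, in a fresh environment with only the fork installed, whether the needed modules are importable. A minimal sketch, assuming the import names are markdown (matching the markdown==3.7 pin) and docx (from python-docx, which this thread does not name, so treat it as an assumption); the helper missing_modules is hypothetical and not part of the PR:

```python
from importlib.util import find_spec

# Assumed import names: "markdown" matches the pin added in this PR;
# "docx" (provided by python-docx) is an assumption, not confirmed in the thread.
REQUIRED_MODULES = ["markdown", "docx"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed by the fork; pin these explicitly:", ", ".join(missing))
    else:
        print("All required modules are importable; extra pins would be redundant.")
```

If the script reports missing modules after installing only the fork, the extra pins in the requirements file are justified; if it reports none, they can likely be dropped.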

Contributor Author

The forked repo's installation does not pull in the required libraries, so we need to add them.

Contributor

@sanjaychelliah sanjaychelliah left a comment

LGTM except for the one comment.
Even though it is not part of this PR, we can quickly debug it, based on the demo call discussion.

Contributor

The pipeline names are being added as concepts in the platform. That should be debugged.

Contributor Author

Done, with additional changes in the clarifai-python repo: Clarifai/clarifai-python#471

Contributor

@sanjaychelliah sanjaychelliah left a comment

LGTM

@srikanthbachala20 srikanthbachala20 merged commit 13cf2e7 into main Jan 2, 2025
10 checks passed

4 participants