Thanks for the PR, @HackyRoot. I highly appreciate the effort you put into this feature; it has great potential to add support for many file formats! I left some comments here and there, take a look and let me know what you think.
Oh, also note that we are using Ruff for formatting the files, which is why the Lint action is failing. To resolve this, run pip install pre-commit and then pre-commit install before committing any changes.
pre-commit.ci run
for more information, see https://pre-commit.ci
@daavoo it seems the pre-commit CI is not working properly?
It is because the remaining errors can't be auto-fixed. If load_pdf and load_docx are not used in DATA_LOADERS, they should not be imported.
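A minimal sketch of what that fix could look like, assuming a single load_file function now handles every format. The extension keys and the body of load_file are illustrative assumptions, not the repository's actual code:

```python
# Hypothetical sketch: once every format goes through one loader,
# importing load_pdf / load_docx would trip Ruff's unused-import check (F401),
# so those imports are dropped and only load_file is referenced.


def load_file(path: str) -> str:
    """Stand-in for the PR's loader (assumed body, for illustration only)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Every extension maps to the same loader; the keys are assumptions.
DATA_LOADERS = {
    ".pdf": load_file,
    ".docx": load_file,
    ".html": load_file,
}
```

If load_pdf and load_docx are still needed elsewhere, the alternative is to keep them in DATA_LOADERS so the imports are actually used.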
…oad_file function
…ument-to-podcast into markitdown_support
… markitdown_support
…oad_file function
814e263 to 5087aab
…ument-to-podcast into markitdown_support
@daavoo any idea why I have to merge the remote branch before pushing?
The branch in your fork is missing the latest changes from You can undo the last merge you pushed and then just
@HackyRoot - As discussed, going to close this PR and instead we will feature your fork as an extension on the Blueprints Hub. As you suggested, we're going to start tagging certain Issues as 'extensions' and will get feedback from users.
What's changing
… .pdf and .html files.
… to data_loaders and markdown_to_text to data_cleaners.
… load_file which uses MarkItDown.
… data_loaders and data_cleaners functions.
Closes #66
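For readers unfamiliar with MarkItDown, the two functions named above might look roughly like the sketch below. The thread does not show the PR's actual implementations, so both bodies are assumptions; in particular, the regex-based cleaner is a simplified stand-in and handles only a few Markdown constructs.

```python
import re


def load_file(path: str) -> str:
    """Convert a document to Markdown with MarkItDown (assumed wrapper)."""
    # Imported lazily so this module still imports when markitdown is absent.
    from markitdown import MarkItDown

    result = MarkItDown().convert(path)
    return result.text_content


def markdown_to_text(markdown: str) -> str:
    """Strip a few common Markdown markers (illustrative, not exhaustive)."""
    # Drop leading '#' heading markers on each line.
    text = re.sub(r"^#{1,6}\s+", "", markdown, flags=re.MULTILINE)
    # Replace [label](url) links with just the label.
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Remove asterisks (emphasis, bullets) and backticks (inline code).
    text = re.sub(r"[*`]", "", text)
    return text.strip()
```

Under these assumptions, markdown_to_text(load_file("paper.pdf")) would yield plain text from a PDF; the file name is a placeholder.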
How to test it
Steps to test the changes:
Additional notes for reviewers
The changes include support for the MarkItDown package. The Streamlit app has been updated to use the new load_file and markdown_to_text functions.
I already...
/docs)