Add illuminator notebook #6
adb4520 to
420d143
Compare
def analyze_pdf_with_docling(file_path) -> Dict[str, Union[int, List[Any], set]]:
def convert_pdf_with_docling(file_path: str) -> DoclingDocument:
Might be nice to rename this to `convert_to_docling_document` since it's not PDF-specific anymore. It would also be nice for this function to take an argument that controls whether the markdown is saved. I'm not sure we want that in every case.
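A minimal sketch of what this suggestion could look like, assuming Docling's `DocumentConverter` does the conversion. The `save_markdown`, `output_dir`, and `converter` parameters are hypothetical names chosen for illustration, not the signature the PR settled on.

```python
from pathlib import Path
from typing import Any, Optional


def convert_to_docling_document(
    file_path: str,
    save_markdown: bool = False,
    output_dir: str = ".",
    converter: Optional[Any] = None,
) -> Any:
    """Convert a supported source file; optionally save a Markdown copy."""
    if converter is None:
        # Imported lazily so this module loads even without docling installed.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    # convert() returns a result object whose .document is the DoclingDocument.
    document = converter.convert(file_path).document

    if save_markdown:
        # Hypothetical naming scheme: <output_dir>/<source file stem>.md
        out_path = Path(output_dir) / f"{Path(file_path).stem}.md"
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(document.export_to_markdown(), encoding="utf-8")

    return document
```

Defaulting `save_markdown` to `False` keeps the conversion free of side effects unless the caller opts in, which seems to match the concern that saving isn't wanted in every case.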
@alinaryan This PR title mentions adding an illuminator notebook but there isn't one in here. Maybe the title should change to reflect the refactoring you're doing. Also if you added the illuminator into the
974086f to
f6f8fc0
Compare
@alimaredia |
f6f8fc0 to
f276f14
Compare
I think it's worth going through this and clearing all the outputs from the
I like the little emojis in the console output
f276f14 to
d625f13
Compare
@JustinXHale LMK what you think about the UX design here
This looks great @alinaryan!
UX Review of Data Pre-Processing: From source PDF to SDG-ready
Reviewer Notes
The notebook provides a clear and well-structured workflow for data pre-processing. Improving the initial setup instructions and adding more descriptive inline comments in key code sections could further enhance its user experience. The use of markdown headers and logical code separation is generally good. The numbered list in the introduction serves well as a table of contents, guiding the user through the notebook's flow. The "Why/goal" statement before each section could be improved, similar to what is done for chunking.
"    try:\n",
"        generate_summary(results)\n",
"    finally:\n",
"        sys.stdout = original_stdout\n",
@alinaryan when I ran through the notebook I wasn't seeing any output in the illuminator_readable_summary.txt. Any idea why this might be happening?
I updated it. The full output now prints to both the summary file and the notebook cell output.
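For reference, one way to get output in both places is a small "tee" stream. This is a sketch under assumptions, not necessarily how the notebook does it: `Tee` and `run_and_save` are hypothetical helpers, and `contextlib.redirect_stdout` stands in for the manual `try`/`finally` that restores `sys.stdout`.

```python
import contextlib
import sys
from typing import Any, Callable, TextIO


class Tee:
    """Minimal file-like object that forwards writes to several streams."""

    def __init__(self, *streams: TextIO) -> None:
        self._streams = streams

    def write(self, text: str) -> int:
        for stream in self._streams:
            stream.write(text)
        return len(text)

    def flush(self) -> None:
        for stream in self._streams:
            stream.flush()


def run_and_save(func: Callable[..., Any], summary_path: str, *args: Any, **kwargs: Any) -> Any:
    """Call func, sending everything it prints to stdout and to summary_path."""
    with open(summary_path, "w", encoding="utf-8") as summary_file:
        # redirect_stdout restores the previous sys.stdout on exit,
        # even if func raises, so no manual try/finally is needed.
        with contextlib.redirect_stdout(Tee(sys.stdout, summary_file)):
            return func(*args, **kwargs)
```

With the names from the diff above, usage would be along the lines of `run_and_save(generate_summary, "illuminator_readable_summary.txt", results)`.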
7b00ea8 to
1f1cd45
Compare
1f1cd45 to
0243883
Compare
"\n",
"***"
]
},
Why is this removed?
I added it back!
notebooks/instructlab-knowledge/utils/illuminator/illuminator.py
Signed-off-by: Alina Ryan <[email protected]>
385f1b1 to
d06d067
Compare
Signed-off-by: Alina Ryan <[email protected]>
d06d067 to
34107d9
Compare
This change adds the Illuminator tool’s core functions to the instructlab-knowledge notebook for analyzing a converted document and summarizing merged table cell issues for each table.
It also refactors the Illuminator to accept JSON as input and adjusts some imports to be relative.
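To make "summarizing merged table cell issues" from JSON input concrete, here is a rough sketch. It is not the Illuminator's actual implementation. It assumes the input is a Docling document exported as JSON, with tables under `tables[*].data.table_cells` and per-cell `row_span`/`col_span` fields; `summarize_merged_cells` and `summarize_json_file` are hypothetical names.

```python
import json
from typing import Any, Dict, List


def summarize_merged_cells(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """For each table, list the cells spanning more than one row or column."""
    summary = []
    for index, table in enumerate(doc.get("tables", [])):
        # Assumed layout: each table keeps its cells under data.table_cells.
        cells = table.get("data", {}).get("table_cells", [])
        merged = [
            {
                "text": cell.get("text", ""),
                "row_span": cell.get("row_span", 1),
                "col_span": cell.get("col_span", 1),
            }
            for cell in cells
            if cell.get("row_span", 1) > 1 or cell.get("col_span", 1) > 1
        ]
        summary.append(
            {"table": index, "merged_count": len(merged), "merged_cells": merged}
        )
    return summary


def summarize_json_file(path: str) -> List[Dict[str, Any]]:
    """Load a converted document from a JSON file and summarize its tables."""
    with open(path, encoding="utf-8") as handle:
        return summarize_merged_cells(json.load(handle))
```

Working on the parsed JSON as a plain dict means the analysis doesn't need to re-run the conversion, which is one plausible reason for accepting JSON as input.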