Commit 66bf4b0
feat: support extracting image url in html (#3955)
also removes mimetype when base64 is not included in image metadata
---------
Co-authored-by: ryannikolaidis <[email protected]>
1 parent 2dceac3 commit 66bf4b0
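The behavior described in the commit message can be illustrated with a minimal sketch. This is not the code from this commit: it uses only the Python standard library, and the helper names (`ImageMetadataParser`, `extract_image_metadata`) and the metadata keys (`image_url`, `image_base64`, `image_mime_type`) are assumptions chosen for illustration. It records the URL of an `<img>` whose `src` is a link, and sets a mime type only when the `src` is a base64 data URI.

```python
from html.parser import HTMLParser


class ImageMetadataParser(HTMLParser):
    """Collect one metadata dict per <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src") or ""
        meta = {"alt": attr_map.get("alt") or ""}
        if src.startswith("data:"):
            # Data URI layout: data:[<mime>][;params];base64,<payload>
            header, _, payload = src.partition(",")
            if header.endswith(";base64") and payload:
                mime = header[len("data:"):-len(";base64")].split(";")[0]
                meta["image_base64"] = payload
                # The mime type is set only alongside base64 data.
                meta["image_mime_type"] = mime
        elif src:
            # Plain link: keep the URL and leave the mime type unset.
            meta["image_url"] = src
        self.images.append(meta)


def extract_image_metadata(html):
    """Return the metadata dicts for every <img> in the given HTML string."""
    parser = ImageMetadataParser()
    parser.feed(html)
    parser.close()
    return parser.images
```

Under these assumptions, calling `extract_image_metadata('<img src="https://example.com/a.png" alt="A">')` yields a dict with `image_url` and no mime-type key, while a `data:image/png;base64,...` source yields `image_base64` together with `image_mime_type`.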
File tree
18 files changed, +63 -8 lines changed
- example-docs
- test_unstructured_ingest
- expected-structured-output
- confluence-diff
- MFS
- testteamsp
- notion
- salesforce/EmailMessage
- src
- test_unstructured/partition
- html
- unstructured
- documents
- partition/html
File renamed without changes.
Lines changed: 9 additions & 0 deletions
Lines changed: 6 additions & 0 deletions
Lines changed: 6 additions & 0 deletions
Lines changed: 2 additions & 0 deletions
Lines changed: 2 additions & 0 deletions