-Unstructured supports processing of the following file types:
+The Unstructured user interface (UI) and Unstructured API support processing of the following file types:
 
 By file extension:
 
@@ -8,19 +8,14 @@ By file extension:
 | `.bmp` |
 | `.csv` |
 | `.cwk` |
-| `.dbf` |
-| `.dif` |
+| `.dif` [*](#notes) |
 | `.doc` |
-| `.docm` |
 | `.docx` |
 | `.dot` |
-| `.dotm` |
 | `.eml` |
 | `.epub` |
 | `.et` |
 | `.eth` |
-| `.fods` |
-| `.gif` |
 | `.heic` |
 | `.htm` |
 | `.html` |
@@ -29,66 +24,57 @@ By file extension:
 | `.jpg` |
 | `.md` |
 | `.mcw` |
+| `.msg` |
 | `.mw` |
-| `.odt` |
 | `.org` |
 | `.p7s` |
-| `.pages` |
 | `.pbd` |
 | `.pdf` |
 | `.png` |
 | `.pot` |
-| `.potm` |
 | `.ppt` |
 | `.pptm` |
 | `.pptx` |
 | `.prn` |
 | `.rst` |
 | `.rtf` |
 | `.sdp` |
-| `.sgl` |
 | `.svg` |
 | `.sxg` |
 | `.tiff` |
 | `.txt` |
 | `.tsv` |
-| `.uof` |
-| `.uos1` |
-| `.uos2` |
-| `.web` |
-| `.webp` |
-| `.wk2` |
 | `.xls` |
-| `.xlsb` |
 | `.xlsm` |
 | `.xlsx` |
-| `.xlw` |
 | `.xml` |
 | `.zabw` |
 
 By file type:
 
 | Category | File types |
 | --- | --- |
-| Apple | `.cwk`, `.mcw`, `.pages`
+| Apple | `.cwk`, `.mcw`
 | CSV | `.csv` |
-| Data interchange | `.dif` |
-| dBase | `.dbf` |
-| E-mail | `.eml`, `.p7s` |
+| E-mail | `.eml`, `.msg`, `.p7s` |
 | EPUB | `.epub` |
 | HTML | `.htm`, `.html` |
-| Image | `.bmp`, `.gif`, `.heic`, `.jpeg`, `.jpg`, `.png`, `.prn`, `.svg`, `.tiff`, `.webp` |
+| Image | `.bmp`, `.heic`, `.jpeg`, `.jpg`, `.png`, `.prn`, `.svg`, `.tiff` |
 | Markdown | `.md` |
 | Org Mode | `.org` |
-| Open Office | `.odt`, `.sgl` |
-| Other | `.eth`, `.mw`, `.pbd`, `.sdp`, `.uof`, `.web` |
+| Other | `.dif` [*](#notes), `.eth`, `.mw`, `.pbd`, `.sdp` |
 | PDF | `.pdf` |
 | Plain text | `.txt` |
-| PowerPoint | `.pot`, `.potm`, `.ppt`, `.pptm`, `.pptx` |
+| PowerPoint | `.pot`, `.ppt`, `.pptm`, `.pptx` |
 | reStructured Text | `.rst` |
 | Rich Text | `.rtf` |
-| Spreadsheet | `.et`, `.fods`, `.uos1`, `.uos2`, `.wk2`, `.xls`, `.xlsb`, `.xlsm`, `.xlsx`, `.xlw` |
+| Spreadsheet | `.et`, `.xls`, `.xlsm`, `.xlsx` |
 | StarOffice | `.sxg` |
 | TSV | `.tsv` |
-| Word processing | `.abw`, `.doc`, `.docm`, `.docx`, `.dot`, `.dotm`, `.hwp`, `.zabw` |
+| Word processing | `.abw`, `.doc`, `.docx`, `.dot`, `.hwp`, `.zabw` |
 | XML | `.xml` |
+
+## Notes
+
+* In `.dif` files, `\n` line endings are supported, but `\r\n` line endings raise the error
+  `UnsupportedFileFormatError: Partitioning is not supported for the FileType.UNK file type`.
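
One possible workaround for the `.dif` note above is to rewrite `\r\n` as `\n` before submitting the file. The following is a minimal sketch under that assumption, not a documented Unstructured procedure; the helper name `normalize_dif_line_endings` and the file paths are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def normalize_dif_line_endings(src: str, dst: str) -> int:
    """Copy src to dst with every CRLF replaced by LF.

    Returns the number of CRLF sequences that were replaced.
    Hypothetical helper: it is not part of any Unstructured library.
    """
    # Read raw bytes so that no implicit newline translation takes place.
    data = Path(src).read_bytes()
    replaced = data.count(b"\r\n")
    Path(dst).write_bytes(data.replace(b"\r\n", b"\n"))
    return replaced
```

For example, `normalize_dif_line_endings("report.dif", "report-lf.dif")` would write an LF-only copy that could then be submitted in place of the original. Whether this is sufficient for a particular `.dif` file has not been verified here.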