open-source/introduction/overview.mdx
Lines changed: 9 additions & 4 deletions
@@ -46,17 +46,22 @@ The Unstructured open source library has the following limits as compared to the
 
 * Not designed for production scenarios.
 * Significantly decreased performance on document and table extraction.
-* Access only to older and less sophisticated vision transformer models.
+* No access to Unstructured's latest vision language model (VLM) offerings.
 * No access to Unstructured's fine-tuned OCR models.
 * No access to Unstructured's by-page and by-similarity chunking strategies.
-* Lack of security and SOC2 and HIPAA compliance.
-* No authentication or identity management.
+* No support for generating embeddings in the core [Unstructured](https://github.com/Unstructured-IO/unstructured) open source offering. (However, you can
+generate embeddings as a separate step manually. [Learn how](/open-source/core-functionality/embedding). Also, there is built-in support for generating embeddings by using the open source's
+[Unstructured Ingest CLI](/open-source/ingestion/ingest-cli) and [Unstructured Ingest Python library](/open-source/ingestion/python-ingest) offerings.
+[Learn more](/open-source/how-to/embedding).)
+* No support for Unstructured's enrichment types such as image descriptions, table descriptions, and named entity recognition (NER).
+* Lack of support for SOC2 Type 2, HIPAA, and GDPR compliance.
+* No authentication or identity management in the core open source offering for local document processing.
 * No incremental data loading.
 * No ETL job scheduling or monitoring.
 * No image extraction from documents.
 * Less sophisticated document hierarchy detection.
 * You must manage many of your own code dependencies, for instance for libraries such as Poppler and Tesseract.
-* You must manage your own infrastructure, including parallelization and other performance optimizations.
+* For local document processing, you must manage your own infrastructure, including parallelization and other performance optimizations.