Commit 43176cc

Open source: additional limits (#669)
1 parent 7a4e038 commit 43176cc

1 file changed, +9 -4 lines changed

open-source/introduction/overview.mdx

Lines changed: 9 additions & 4 deletions
@@ -46,17 +46,22 @@ The Unstructured open source library has the following limits as compared to the
 
 * Not designed for production scenarios.
 * Significantly decreased performance on document and table extraction.
-* Access only to older and less sophisticated vision transformer models.
+* No access to Unstructured's latest vision language model (VLM) offerings.
 * No access to Unstructured's fine-tuned OCR models.
 * No access to Unstructured's by-page and by-similarity chunking strategies.
-* Lack of security and SOC2 and HIPAA compliance.
-* No authentication or identity management.
+* No support for generating embeddings in the core [Unstructured](https://github.com/Unstructured-IO/unstructured) open source offering. (However, you can
+generate embeddings as a separate step manually. [Learn how](/open-source/core-functionality/embedding). Also, there is built-in support for generating embeddings by using the open source's
+[Unstructured Ingest CLI](/open-source/ingestion/ingest-cli) and [Unstructured Ingest Python library](/open-source/ingestion/python-ingest) offerings.
+[Learn more](/open-source/how-to/embedding).)
+* No support for Unstructured's enrichment types such as image descriptions, table descriptions, and named entity recognition (NER).
+* Lack of support for SOC2 Type 2, HIPAA, and GDPR compliance.
+* No authentication or identity management in the core open source offering for local document processing.
 * No incremental data loading.
 * No ETL job scheduling or monitoring.
 * No image extraction from documents.
 * Less sophisticated document hierarchy detection.
 * You must manage many of your own code dependencies, for instance for libraries such as Poppler and Tesseract.
-* You must manage your own infrastructure, including parallelization and other performance optimizations.
+* For local document processing, you must manage your own infrastructure, including parallelization and other performance optimizations.
 
 ## Pricing

0 commit comments
