|
| 1 | +You can connect any available source connector to any available destination connector. However, the source connector code examples in the |
| 2 | +documentation show connecting only to the local destination connector. Similarly, the destination connector code examples in the |
| 3 | +documentation show connecting only to the local source connector. |
| 4 | + |
| 5 | +To quickly generate an Unstructured Ingest Python library code example that connects _any_ available source connector to _any_ available destination connector, |
| 6 | +do the following: |
| 7 | + |
| 8 | +1. Open the [Unstructured Ingest Code Generator](https://huggingface.co/spaces/MariaK/unstructured-pipeline-builder) webpage. |
| 9 | +2. Select your input (source) location type from the **Get unstructured documents from** drop-down list. |
| 10 | +3. Select your output (destination) location type from the **Upload RAG-ready documents to** drop-down list. |
| 11 | +4. Select your chunking strategy from the **Chunking strategy** drop-down list: |
| 12 | + |
| 13 | + - **None** - Do not chunk the data elements' content. |
| 14 | + - **basic** - Combine sequential data elements to maximally fill each chunk. However, do not mix `Table` and non-`Table` elements in the same chunk. |
| 15 | + - **by_title** - Use the `basic` strategy and also preserve section boundaries. Optionally preserve page boundaries as well. |
| 16 | + - **by_page** - Use the `basic` strategy and also preserve page boundaries. |
| 17 | + - **by_similarity** - Use the `sentence-transformers/multi-qa-mpnet-base-dot-v1` embedding model to identify topically similar sequential elements and combine them into chunks. This strategy is available only when calling Unstructured API services. |
| 18 | + |
| 19 | + To learn more, see [Chunking strategies](/api-reference/api-services/chunking) and [Chunking configuration](/api-reference/ingest/ingest-configuration/chunking-configuration). |
| 20 | + |
| 21 | +5. For any chunking strategy other than **None**: |
| 22 | + |
| 23 | + - Enter your chunk size in the **Chunk size (characters)** box, or leave the default of **1000** characters. |
| 24 | + - If you want adjacent chunks to overlap, enter the overlap size in the **Chunk overlap (characters)** box, or leave the default of **20** characters. |
| 25 | + |
| 26 | + To learn more, see [Chunking configuration](/api-reference/ingest/ingest-configuration/chunking-configuration). |
| 27 | + |
| 28 | +6. To generate vector embeddings, select the provider in the **Embedding provider** drop-down list. |
| 29 | + |
| 30 | + To learn more, see [Embedding configuration](/api-reference/ingest/ingest-configuration/embedding-configuration). |
| 31 | + |
| 32 | +7. Click **Generate code**. |
| 33 | +8. Copy the example code from the **Generated Code** pane into your code project. |
| 34 | +9. Set the environment variables that the code example requires in order to run correctly. To learn which values to use, |
| 35 | +click the documentation links below the **Generated Code** pane. |
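
The generated code reads its settings from environment variables, so it can help to check that they are set before running it. The following is a minimal sketch of such a check using only the Python standard library. The variable names `EXAMPLE_SOURCE_PATH` and `EXAMPLE_DESTINATION_PATH` and both helper functions are hypothetical placeholders, not names that the code generator emits; replace the names with the variables that your generated code actually reads.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def require_env_vars(required, environ=None):
    """Raise a RuntimeError that lists every missing variable at once."""
    missing = missing_env_vars(required, environ)
    if missing:
        raise RuntimeError(
            "Set these environment variables first: " + ", ".join(missing)
        )


# Hypothetical names for illustration only; use the names from your generated code.
REQUIRED = ["EXAMPLE_SOURCE_PATH", "EXAMPLE_DESTINATION_PATH"]
```

Calling `require_env_vars(REQUIRED)` at the top of your script reports all missing variables together, rather than failing on the first one partway through a run.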
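
To make the chunking options in steps 4 and 5 more concrete, the following sketch illustrates the idea behind the `basic` strategy and the chunk overlap setting in plain Python. It is a simplified illustration and not Unstructured's implementation: the `Element` class, `basic_chunks`, and `add_overlap` are hypothetical names, elements longer than the chunk size are not split, and the overlap is simply prepended to each chunk without counting toward the chunk size.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a document element (not the Unstructured class)."""
    kind: str  # for example "Title", "NarrativeText", or "Table"
    text: str


def basic_chunks(elements, max_characters=1000):
    """Greedily pack sequential elements into chunks of at most max_characters.

    A Table element always becomes its own chunk, so Table and non-Table
    text never share a chunk.
    """
    chunks = []
    current = []      # texts collected for the chunk being built
    current_len = 0   # length of the chunk once the texts are joined

    def flush():
        nonlocal current, current_len
        if current:
            chunks.append("\n\n".join(current))
        current, current_len = [], 0

    for el in elements:
        if el.kind == "Table":
            flush()
            chunks.append(el.text)
            continue
        sep = 2 if current else 0  # length of the "\n\n" separator
        if current and current_len + sep + len(el.text) > max_characters:
            flush()
            sep = 0
        current.append(el.text)
        current_len += sep + len(el.text)
    flush()
    return chunks


def add_overlap(chunks, overlap=20):
    """Prefix each chunk after the first with the tail of the previous chunk."""
    result = []
    for i, chunk in enumerate(chunks):
        if i > 0 and overlap > 0:
            chunk = chunks[i - 1][-overlap:] + chunk
        result.append(chunk)
    return result
```

The defaults of 1000 and 20 characters mirror the defaults of the **Chunk size (characters)** and **Chunk overlap (characters)** boxes. The `by_title` and `by_page` strategies would additionally call `flush()` at section or page boundaries.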