Commit c34b138

Platform: "Reprocess all" check box applies only to S3 and Azure source connectors (#313)
1 parent fea369f commit c34b138

2 files changed: +22 -4 lines changed

platform/workflows.mdx

Lines changed: 17 additions & 3 deletions
@@ -55,7 +55,11 @@ To create an automatic workflow:
 - **Basic** is a good choice if you have text-only documents that have no images or tables in them.
 - **Advanced** is a good choice if you have complex documents that have images or tables or both in them.
 
-9. If you want to overwrite any files in the destination location that might have been previously processed, check the **Reprocess all** box.
+9. The **Reprocess all** box applies only to the Amazon S3 and Azure Blob Storage source connectors:
+
+- Checking this box reprocesses all documents in the source location on every workflow run.
+- Unchecking this box causes only new documents that are added to the source location since the last workflow run to be processed on future runs. Previously processed documents are not processed again, even if those documents' contents change.
+
 10. If you want to retry processing any documents that failed to process, check the **Retry Failed Documents** box.
 11. Click **Continue**.
 12. If you want this workflow to run on a schedule, in the **Repeat Run** dropdown list, select one of the scheduling options, and fill in the scheduling settings. Otherwise, select **Don't repeat**.
@@ -186,8 +190,12 @@ There are two ways to create a custom workflow:
 - [Embedding overview](/platform/embedding)
 - [Understanding embedding models: make an informed choice for your RAG](https://unstructured.io/blog/understanding-embedding-models-make-an-informed-choice-for-your-rag).
 
-17. Check the **Reprocess all** box if you want to overwrite any files in the destination location that might have been previously processed,
-18. Check the **Retry Failed Documents** box if you want to retry processing any documents that failed to process,
+17. The **Reprocess all** box applies only to the Amazon S3 and Azure Blob Storage source connectors:
+
+- Checking this box reprocesses all documents in the source location on every workflow run.
+- Unchecking this box causes only new documents that are added to the source location since the last workflow run to be processed on future runs. Previously processed documents are not processed again, even if those documents' contents change.
+
+18. Check the **Retry Failed Documents** box if you want to retry processing any documents that failed to process.
 19. Click **Continue**.
 20. If you want this workflow to run on a schedule, in the **Repeat Run** dropdown list, select one of the scheduling options, and fill in the scheduling settings. Otherwise, select **Don't repeat**.
 21. Click **Complete**.
@@ -212,6 +220,12 @@ There are two ways to create a custom workflow:
 5. Next to **Name**, click the pencil icon, enter some unique name for this workflow, and then click the check mark icon.
 6. If you want this workflow to run on a schedule, click the **Schedule** button. In the **Repeat Run** dropdown list, select one of the scheduling options, and fill in the scheduling settings.
 7. To overwrite any previously processed files, or to retry any documents that fail to process, click the **Settings** button, and check either or both of the boxes.
+
+The **Reprocess all** box applies only to the Amazon S3 and Azure Blob Storage source connectors:
+
+- Checking this box reprocesses all documents in the source location on every workflow run.
+- Unchecking this box causes only new documents that are added to the source location since the last workflow run to be processed on future runs. Previously processed documents are not processed again, even if those documents' contents change.
+
 8. In the pipeline designer, click the **Source** node. In the **Source** pane, select the source location. Then click **Save**.
 
 ![Workflow designer](/img/platform/Workflow-Designer.png)

snippets/quickstarts/platform.mdx

Lines changed: 5 additions & 1 deletion
@@ -77,7 +77,11 @@ allowfullscreen
 - **Basic** is a good choice if you have text-only documents that have no images or tables in them.
 - **Advanced** is a good choice if you have complex documents that have images or tables or both in them.
 
-9. If you want to overwrite any files in the destination location that might have been previously processed, check the **Reprocess all** box.
+9. The **Reprocess all** box applies only to the Amazon S3 and Azure Blob Storage source connectors:
+
+- Checking this box reprocesses all documents in the source location on every workflow run.
+- Unchecking this box causes only new documents that are added to the source location since the last workflow run to be processed on future runs. Previously processed documents are not processed again, even if those documents' contents change.
+
 10. If you want to retry processing any documents that failed to process, check the **Retry Failed Documents** box.
 11. Click **Continue**.
 12. If you want this workflow to run on a schedule, in the **Repeat Run** dropdown list, select one of the scheduling options, and fill in the scheduling settings. Otherwise, select **Don't repeat**.
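
The **Reprocess all** wording added in this commit can be read as a selection rule over the documents in the source location. The sketch below only illustrates that reading and is not the platform's implementation: `SourceDocument`, `select_documents`, and the idea of tracking processed documents by path are all hypothetical names and assumptions. It models only the Amazon S3 and Azure Blob Storage case, since the changed text does not describe other connectors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDocument:
    # Hypothetical stand-in for one file in the source location.
    path: str
    content_hash: str  # Included only to show that content changes are ignored.


def select_documents(documents, already_processed_paths, reprocess_all):
    """Return the documents a workflow run would process (illustrative only).

    Assumption: previously processed documents are tracked by path.
    """
    if reprocess_all:
        # Box checked: every document is processed on every run.
        return list(documents)
    # Box unchecked: only documents not seen by an earlier run are processed,
    # even if an already-processed document's contents have changed.
    return [d for d in documents if d.path not in already_processed_paths]


if __name__ == "__main__":
    docs = [
        SourceDocument("a.pdf", "v2"),  # processed before, contents changed since
        SourceDocument("b.pdf", "v1"),  # added since the last run
    ]
    seen = {"a.pdf"}
    print([d.path for d in select_documents(docs, seen, reprocess_all=False)])
    print([d.path for d in select_documents(docs, seen, reprocess_all=True)])
```

In this sketch, a changed `a.pdf` is skipped when the box is unchecked because only its path is consulted, which mirrors the statement that previously processed documents are not processed again even if their contents change.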
