articles/search/search-how-to-index-markdown-blobs.md
37 additions & 15 deletions
@@ -444,24 +444,34 @@ The resulting search document in the index would look as follows:
444
444
> These examples specify how to use these parsing modes entirely with or without field mappings, but you can leverage both in one scenario if that suits your needs.
445
445
>
446
446
447
-
## Managing Stale Documents from Markdown Re-indexing
447
+
## Managing stale documents from Markdown re-indexing
448
448
449
-
When working with Markdown files in Azure AI Search, it's important to understand how deletions are handled during re-indexing. The indexer does not automatically delete previously indexed documents when a file is modified or sections are removed. This can lead to duplicate or stale documents remaining in the index.
449
+
The indexer does not automatically delete previously indexed documents when a file is modified or sections are removed. This can lead to duplicate or stale documents remaining in the index.
450
450
451
451
### Behavior overview
452
452
453
453
* **No automatic deletion**: If sections are removed from a Markdown file and the file is re-indexed, the indexer will overwrite existing documents with new ones. However, it does not delete documents that no longer correspond to any content in the updated file.
454
454
* **Potential for duplicates**: This behavior can result in duplicate documents, especially in `oneToMany` parsing mode where each section becomes a separate document. This issue typically arises only when more Markdown sections are deleted than inserted between indexing runs. In such cases, the index retains documents from the previous version that no longer match any current content, leading to stale entries.
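The following toy model illustrates how overwrite-without-delete can leave a stale entry behind. It is a sketch, not the indexer's actual implementation: the in-memory `index` dictionary, the `reindex` helper, and the key scheme (file path plus section ordinal) are all assumptions made for illustration.

```python
# Toy model only. Assumption: each section's document key is derived from
# the file path and the section's position (ordinal) in the file.

def reindex(index: dict, path: str, sections: list) -> None:
    """Upsert one document per section; never delete anything."""
    for ordinal, text in enumerate(sections):
        index[f"{path}#{ordinal}"] = text


index = {}

# First run: three sections produce three documents.
reindex(index, "guide.md", ["Intro", "Setup", "Usage"])

# Second run: the "Setup" section was removed from the file.
reindex(index, "guide.md", ["Intro", "Usage"])

# Keys #0 and #1 were overwritten, but key #2 from the first run survives,
# so "Usage" now appears twice in the index.
for key in sorted(index):
    print(key, "->", index[key])
```

In this model, the leftover document only appears because the second run produced fewer sections than the first, which matches the condition described above.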
455
+
455
456
### Workaround options
456
457
457
458
To ensure the index reflects the current state of your Markdown files, consider one of the following approaches:
458
459
459
-
#### Option 1. Use the delete API
460
+
#### Option 1. Soft delete with metadata
461
+
This method uses a soft delete to remove the documents associated with a specific blob. For more information, see [Change and delete detection using indexers for Azure Storage in Azure AI Search](search-howto-index-changed-deleted-blobs.md#soft-delete-strategy-using-custom-metadata). The steps are:
462
+
463
+
1. Mark the blob as deleted by setting a metadata field.
464
+
2. Let the indexer run. It will delete all documents in the index associated with that blob.
465
+
3. Remove the soft-delete marker and re-index the file.
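For the indexer to act on the marker in step 1, the data source needs a soft-delete detection policy. The sketch below builds that policy fragment as a Python dictionary. The helper name `soft_delete_policy` is hypothetical, and the metadata name `IsDeleted` and marker value `true` are example values; use whatever metadata key and value you actually set on the blob.

```python
import json


def soft_delete_policy(column: str = "IsDeleted", marker: str = "true") -> dict:
    """Build the dataDeletionDetectionPolicy fragment of a data source definition.

    `column` is the blob metadata key and `marker` is the value that flags
    the blob as deleted. Both defaults are example values (assumptions).
    """
    return {
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": column,
            "softDeleteMarkerValue": marker,
        }
    }


policy = soft_delete_policy()
print(json.dumps(policy, indent=2))
```

The resulting fragment would be merged into the data source definition before the indexer runs; the rest of that definition (name, type, credentials, container) is omitted here.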
466
+
467
+
#### Option 2. Use the delete API
468
+
469
+
Before re-indexing a modified Markdown file, explicitly delete the existing documents associated with that file using the [delete API](https://learn.microsoft.com/en-us/rest/api/searchservice/preview-api/add-update-delete-documents). You can either:
470
+
471
+
* Manually identify individual stale documents by finding the duplicates in the index to delete.
472
+
* (**Recommended**) Bulk delete all documents generated from the same parent file before re-indexing.
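The bulk-delete approach can be sketched as two request bodies: one that looks up the keys of every document generated from a file, and one that deletes them in a single batch. This is a minimal sketch that only builds the JSON payloads. The helper names are hypothetical, and the key field `id` and path field `metadata_storage_path` are assumed to match your index schema. The path field is also assumed to be filterable.

```python
import json


def odata_quote(value: str) -> str:
    """Quote a string literal for an OData filter (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_lookup_body(path_field: str, blob_url: str, key_field: str = "id") -> dict:
    """Body for a search request that returns only the keys of one file's documents."""
    return {
        "filter": f"{path_field} eq {odata_quote(blob_url)}",
        "select": key_field,
    }


def build_delete_batch(keys: list, key_field: str = "id") -> dict:
    """Body for an index batch that deletes every document with the given keys."""
    return {
        "value": [{"@search.action": "delete", key_field: key} for key in keys]
    }


# Placeholder URL and keys, for illustration only.
lookup = build_lookup_body(
    "metadata_storage_path",
    "https://<storage-account>.blob.core.windows.net/<container>/guide.md",
)
batch = build_delete_batch(["doc-1", "doc-2"])
print(json.dumps(lookup, indent=2))
print(json.dumps(batch, indent=2))
```

The lookup body would typically be sent with `POST` to the index's `docs/search` endpoint, and the batch with `POST` to its `docs/index` endpoint, using the same `api-version` as the rest of your requests.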
460
473
461
-
Before re-indexing a modified Markdown file, explicitly delete the existing documents associated with that file using the https://learn.microsoft.com/en-us/rest/api/searchservice/delete-documents.
462
-
Steps:
463
-
464
-
1. Identify the id of the documents associated with the file. Use a query like the one below to retrieve the document IDs (e.g., `id`, `chunk_id`, etc.) for all documents tied to a specific file. Replace `metadata_storage_path` with the appropriate field in your index that maps to the file path or blob URI.
474
+
1. Identify the IDs of the documents associated with the file. Use a query like the one below to retrieve the document keys (for example, `id` or `chunk_id`) for all documents tied to a specific file. Replace `metadata_storage_path` with the appropriate field in your index that maps to the file path or blob URI. Note that this field must be a key.
465
475
```
466
476
POST https://<search-service>.search.windows.net/indexes/<index-name>/docs/search?api-version=2025-05-01-preview
467
477
Content-Type: application/json
@@ -472,15 +482,27 @@ Steps:
472
482
"select": "id"
473
483
}
474
484
```
475
-
2. Issue a delete request for those documents.
476
-
3. Re-index the updated file.
477
-
478
-
#### Option 2. Soft delete with metadata
479
-
If identifying stale documents in the index is difficult, you can use a soft-delete approach:
480
485
481
-
1. Mark the blob as deleted by setting a metadata field (e.g., deleted=true).
482
-
2. Let the indexer run. It will delete all documents in the index associated with that blob.
483
-
3. Remove the soft-delete marker and re-index the file.
486
+
2. Issue a delete request for the documents with the identified keys.
487
+
```
488
+
POST https://<search-service>.search.windows.net/indexes/<index-name>/docs/index?api-version=2025-05-01-preview