Commit f941e04

Update search-indexer-troubleshooting.md
Updated several parts of the document to adjust for Acrolinx compliance.
1 parent 5edc4aa commit f941e04

1 file changed: +20 -20 lines changed

articles/search/search-indexer-troubleshooting.md

Lines changed: 20 additions & 20 deletions
@@ -45,7 +45,7 @@ Details for configuring IP address range restrictions for each data source type
4545

4646
* [Azure SQL](/azure/azure-sql/database/firewall-configure#create-and-manage-ip-firewall-rules)
4747

48-
**Limitation**: As stated in the documentation above for Azure Storage, IP address range restrictions will only work if your search service and your storage account are in different regions.
48+
**Limitation**: IP address range restrictions only work if your search service, and your storage account are in different regions.
4949

5050
Azure functions (that could be used as a [Custom Web Api skill](cognitive-search-custom-skill-web-api.md)) also support [IP address restrictions](../azure-functions/ip-addresses.md#ip-address-restrictions). The list of IP addresses to configure would be the IP address of your search service and the IP address range of `AzureCognitiveSearch` service tag.
5151

@@ -80,28 +80,28 @@ When you are receiving any of those errors:
8080

8181
If your SQL database is on a [serverless compute tier](/azure/azure-sql/database/serverless-tier-overview), make sure that the database is running (and not paused) when the indexer connects to it.
8282

83-
If the database is paused, the first login from your search service will auto-resume the database, but it will also return an error stating that the database is unavailable with error code 40613. After the database is running, retry the login to establish connectivity.
83+
If the database is paused, the first login from your search service is expected to auto-resume the database, but returning an error stating that the database is unavailable with error code 40613. After the database is running, retry the login to establish connectivity.
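
As a rough illustration of the retry advice above, the following Python sketch retries a login only when the failure carries error code 40613. It is not part of the changed file. `LoginError`, `login_with_retry`, and the `login` callable are hypothetical names, and the attempt count and wait time are arbitrary assumptions.

```python
import time

PAUSED_DB_ERROR = 40613  # error code quoted in the paragraph above


class LoginError(Exception):
    """Hypothetical error type that carries a SQL error code."""

    def __init__(self, code):
        super().__init__(f"login failed with error {code}")
        self.code = code


def login_with_retry(login, attempts=3, wait_seconds=30.0, sleep=time.sleep):
    """Call login(); retry only when the database reports it is unavailable.

    The first login against a paused serverless database is expected to fail
    with 40613 while the database resumes, so that error alone is retried.
    """
    for attempt in range(1, attempts + 1):
        try:
            return login()
        except LoginError as err:
            # Give up on unrelated errors or when the attempts are used up.
            if err.code != PAUSED_DB_ERROR or attempt == attempts:
                raise
            sleep(wait_seconds)
```

In practice the "login" is performed by the indexer, so the equivalent action is usually to run the indexer again once the database has resumed.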
8484

8585
## Azure Active Directory Conditional Access policies
8686

87-
When creating a SharePoint indexer you will go through a step that requires you to sign in to your Azure AD app after providing a device code. If you receive a message that says `"Your sign-in was successful but your admin requires the device requesting access to be managed"` the indexer is likely being blocked from accessing the SharePoint document library due to a [Conditional Access](../active-directory/conditional-access/overview.md) policy.
87+
When creating a SharePoint indexer, you will go through a step that requires you to sign in to your Azure AD app after providing a device code. If you receive a message that says `"Your sign-in was successful but your admin requires the device requesting access to be managed"` the indexer is likely being blocked from accessing the SharePoint document library due to a [Conditional Access](../active-directory/conditional-access/overview.md) policy.
8888

8989
To update the policy to allow the indexer access to the document library, follow the below steps:
9090

91-
1. Open the Azure portal and search **Azure AD Conditional Access**, then select **Policies** on the left menu. If you don't have access to view this page you will need to either find someone who has access or get access.
91+
1. Open the Azure portal and search **Azure AD Conditional Access**, then select **Policies** on the left menu. If you don't have access to view this page, you need to either find someone who has access or get access.
9292

93-
1. Determine which policy is blocking the SharePoint indexer from accessing the document library. The policy that might be blocking the indexer will include the user account that you used to authenticate during the indexer creation step in the **Users and groups** section. The policy also might have **Conditions** that:
93+
1. Determine which policy is blocking the SharePoint indexer from accessing the document library. The policy that might be blocking the indexer includes the user account that you used to authenticate during the indexer creation step in the **Users and groups** section. The policy also might have **Conditions** that:
9494
* Restrict **Windows** platforms.
9595
* Restrict **Mobile apps and desktop clients**.
9696
* Have **Device state** configured to **Yes**.
9797

9898
1. Once you've confirmed there is a policy that is blocking the indexer, you next need to make an exemption for the indexer. Retrieve the search service IP address.
9999

100-
1. Obtain the fully qualified domain name (FQDN) of your search service. This will look like `<search-service-name>.search.windows.net`. You can find out the FQDN by looking up your search service on the Azure portal.
100+
1. Obtain the fully qualified domain name (FQDN) of your search service. The FQDN looks like `<search-service-name>.search.windows.net`. You can find out the FQDN by looking up your search service on the Azure portal.
101101

102102
![Obtain service FQDN](media\search-indexer-howto-secure-access\search-service-portal.png "Obtain service FQDN")
103103

104-
The IP address of the search service can be obtained by performing a `nslookup` (or a `ping`) of the FQDN. In the example below, you would add "150.0.0.1" to an inbound rule on the Azure Storage firewall. It might take up to 15 minutes after the firewall settings have been updated for the search service indexer to be able to access the Azure Storage account.
104+
The IP address of the search service can be obtained by performing a `nslookup` (or a `ping`) of the FQDN. In the following example, you would add "150.0.0.1" to an inbound rule on the Azure Storage firewall. It might take up to 15 minutes after the firewall settings have been updated for the search service indexer to be able to access the Azure Storage account.
105105

106106
```azurepowershell
107107
@@ -117,7 +117,7 @@ To update the policy to allow the indexer access to the document library, follow
117117
118118
1. Get the IP address ranges for the indexer execution environment for your region.
119119
120-
Additional IP addresses are used for requests that originate from the indexer's [multi-tenant execution environment](search-indexer-securing-resources.md#indexer-execution-environment). You can get this IP address range from the service tag.
120+
Extra IP addresses are used for requests that originate from the indexer's [multi-tenant execution environment](search-indexer-securing-resources.md#indexer-execution-environment). You can get this IP address range from the service tag.
121121
122122
The IP address ranges for the `AzureCognitiveSearch` service tag can be either obtained via the [discovery API](../virtual-network/service-tags-overview.md#use-the-service-tag-discovery-api) or the [downloadable JSON file](../virtual-network/service-tags-overview.md#discover-service-tags-by-using-downloadable-json-files).
123123
@@ -145,7 +145,7 @@ To update the policy to allow the indexer access to the document library, follow
145145
```
146146
147147
1. Back on the Conditional Access page in Azure portal, select **Named locations** from the menu on the left, then select **+ IP ranges location**. Give your new named location a name and add the IP ranges for your search service and indexer execution environments that you collected in the last two steps.
148-
* For your search service IP address you may need to add "/32" to the end of the IP address since it only accepts valid IP ranges.
148+
* For your search service IP address, you may need to add "/32" to the end of the IP address since it only accepts valid IP ranges.
149149
* Remember that for the indexer execution environment IP ranges, you only need to add the IP ranges for the region that your search service is in.
150150
151151
1. Exclude the new Named location from the policy.
@@ -164,7 +164,7 @@ To update the policy to allow the indexer access to the document library, follow
164164
165165
## Indexing unsupported document types
166166
167-
If you are indexing content from Azure Blob Storage, and the container includes blobs of an [unsupported content type](search-howto-indexing-azure-blob-storage.md#SupportedFormats), the indexer will skip that document. In other cases, there may be problems with individual documents.
167+
If you are indexing content from Azure Blob Storage, and the container includes blobs of an [unsupported content type](search-howto-indexing-azure-blob-storage.md#SupportedFormats), the indexer skips that document. In other cases, there may be problems with individual documents.
168168
169169
You can [set configuration options](search-howto-indexing-azure-blob-storage.md#DealingWithErrors) to allow indexer processing to continue in the event of problems with individual documents.
170170
@@ -181,12 +181,12 @@ api-key: [admin key]
181181

182182
## Missing documents
183183

184-
Indexers extract documents or rows from an external [data source](/rest/api/searchservice/create-data-source) and create *search documents* which are then indexed by the search service. Occasionally, a document that exists in data source fails to appear in a search index. This unexpected result can occur due to the following reasons:
184+
Indexers extract documents or rows from an external [data source](/rest/api/searchservice/create-data-source) and create *search documents*, which are then indexed by the search service. Occasionally, a document that exists in data source fails to appear in a search index. This unexpected result can occur due to the following reasons:
185185

186-
* The document was updated after the indexer was run. If your indexer is on a [schedule](/rest/api/searchservice/create-indexer#indexer-schedule), it will eventually rerun and pick up the document.
187-
* The indexer timed out before the document could be ingested. There are [maximum processing time limits](search-limits-quotas-capacity.md#indexer-limits) after which no documents will be processed. You can check indexer status in the portal or by calling [Get Indexer Status (REST API)](/rest/api/searchservice/get-indexer-status).
186+
* The document was updated after the indexer was run. If your indexer is on a [schedule](/rest/api/searchservice/create-indexer#indexer-schedule), it eventually reruns and picks up the document.
187+
* The indexer timed out before the document could be ingested. There are [maximum processing time limits](search-limits-quotas-capacity.md#indexer-limits) after which no documents are processed. You can check indexer status in the portal or by calling [Get Indexer Status (REST API)](/rest/api/searchservice/get-indexer-status).
188188
* [Field mappings](/rest/api/searchservice/create-indexer#fieldmappings) or [AI enrichment](./cognitive-search-concept-intro.md) have changed the document and its articulation in the search index is different from what you expect.
189-
* [Change tracking](/rest/api/searchservice/create-data-source#data-change-detection-policies) values are erroneous or prerequisites are missing. If your high watermark value is a date set to a future time, then any documents that have a date less than this will be skipped by the indexer. You can understand your indexer's change tracking state using the 'initialTrackingState' and 'finalTrackingState' fields in the [indexer status](/rest/api/searchservice/get-indexer-status#indexer-execution-result). Indexers for Azure SQL and MySQL must have an index on the high water mark column of the source table, or queries used by the indexer may time out.
189+
* [Change tracking](/rest/api/searchservice/create-data-source#data-change-detection-policies) values are erroneous or prerequisites are missing. If your high watermark value is a date set to a future time, then any documents that have a date less than this are skipped by the indexer. You can understand your indexer's change tracking state using the 'initialTrackingState' and 'finalTrackingState' fields in the [indexer status](/rest/api/searchservice/get-indexer-status#indexer-execution-result). Indexers for Azure SQL and MySQL must have an index on the high water mark column of the source table, or queries used by the indexer may time out.
190190

191191
> [!TIP]
192192
> If documents are missing, check the [query](/rest/api/searchservice/search-documents) you are using to make sure it isn't excluding the document in question. To query for a specific document, use the [Lookup Document REST API](/rest/api/searchservice/lookup-document).
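
The change tracking check described above can be sketched in Python. This is a reader's illustration, not part of the changed file. It builds a Get Indexer Status request and reads the two tracking-state fields. The API version and the `lastResult` field name are assumptions based on the REST reference, and the function names are hypothetical.

```python
import urllib.request

API_VERSION = "2020-06-30"  # assumed; use a version your service supports


def indexer_status_request(service_name, indexer_name, admin_key):
    """Build (but do not send) a GET request for the indexer status."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexers/{indexer_name}/status?api-version={API_VERSION}"
    )
    return urllib.request.Request(url, headers={"api-key": admin_key})


def tracking_states(status):
    """Return (initialTrackingState, finalTrackingState) from a status payload.

    Assumes the states sit under 'lastResult'; returns None for missing values.
    """
    last = status.get("lastResult") or {}
    return last.get("initialTrackingState"), last.get("finalTrackingState")
```

To use it, pass the request to `urllib.request.urlopen`, parse the body with `json.load`, and hand the resulting dictionary to `tracking_states`. A high water mark that lies in the future would show up in these values.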
@@ -218,10 +218,10 @@ Azure Cognitive Search has an implicit dependency on Azure Cosmos DB indexing. I
218218

219219
## Indexer reflects a different document count than data source or index
220220

221-
Indexer may show a different document count than either the data source, the index or count in your code in a point in time, depending on specific circumstances. Here are some possible causes of why this may occur:
221+
Indexer may show a different document count than either the data source, the index or count in your code, depending on specific circumstances. Here are some possible causes of why this behavior may occur:
222222

223223
- The indexer has a Deleted Document Policy. The deleted documents get counted on the indexer end if they are indexed before they get deleted.
224-
- If the ID column in the data source is not unique. This is for data sources that have the concept of columns, such as Azure Cosmos DB.
224+
- If the ID column in the data source is not unique. This applies to data sources that have the concept of columns, such as Azure Cosmos DB.
225225
- If the data source definition has a different query than the one you are using to estimate the number of records. In example, in your data base you are querying all your data base record count, while in the data source definition query you may be selecting just a subset of records to index.
226226
- The counts are being checked in different intervals for each component of the pipeline: data source, indexer and index.
227227
- The index may take some minutes to show the real document count.
@@ -231,7 +231,7 @@ Indexer may show a different document count than either the data source, the ind
231231

232232
## Documents processed multiple times
233233

234-
Indexers leverage a conservative buffering strategy to ensure that every new and changed document in the data source is picked up during indexing. In certain situations, these buffers can overlap, causing an indexer to index a document two or more times resulting in the processed documents count to be more than actual number of documents in the data source. This behavior does **not** affect the data stored in the index, such as duplicating documents, only that it may take longer to reach eventual consistency. This can be especially prevalent if any of the following conditions are true:
234+
Indexers use a conservative buffering strategy to ensure that every new and changed document in the data source is picked up during indexing. In certain situations, these buffers can overlap, causing an indexer to index a document two or more times resulting in the processed documents count to be more than actual number of documents in the data source. This behavior does **not** affect the data stored in the index, such as duplicating documents, only that it may take longer to reach eventual consistency. This condition can be especially prevalent if any of the following criteria are true:
235235

236236
- On-demand indexer requests are issued in quick succession
237237
- The data source's topology includes multiple replicas and partitions (one such example is discussed [here](../cosmos-db/consistency-levels.md))
@@ -241,7 +241,7 @@ Indexers are not intended to be invoked multiple times in quick succession. If y
241241

242242
### Example of duplicate document processing with 30 second buffer
243243

244-
Conditions under which a document is processed twice is explained below in a timeline that notes each action and counter action. The following timeline illustrates the issue:
244+
Conditions under which a document is processed twice is explained in the following timeline that notes each action and counter action. The following timeline illustrates the issue:
245245

246246
| Timeline (hh:mm:ss) | Event | Indexer High Water Mark | Comment |
247247
|---------------------|-------|-------------------------|---------|
@@ -264,14 +264,14 @@ Conditions under which a document is processed twice is explained below in a tim
264264
| 00:01:40 | Indexer starts | 00:01:32 | |
265265
| 00:01:41 | Indexer queries for all changes between 00:01:02 and 00:01:40; retrieves `doc2` | 00:01:32 | Indexer requests for all changes between current high water mark minus the 30 second buffer, and starting timestamp of current indexer execution. |
266266
| 00:01:42 | Indexer processes `doc2` for the fourth time | 00:01:32 | |
267-
| 00:01:43 | Indexer ends | 00:01:40 | Notice this indexer execution started more than 30 seconds after the last write to the data source and also processed `doc2`. This is the expected behavior because if all indexer executions before 00:01:35 are eliminated, this will become the first and only execution to process `doc1` and `doc2`. |
267+
| 00:01:43 | Indexer ends | 00:01:40 | Notice this indexer execution started more than 30 seconds after the last write to the data source and also processed `doc2`. This is the expected behavior because if all indexer executions before 00:01:35 are eliminated, this becomes the first and only execution to process `doc1` and `doc2`. |
268268

269269
In practice, this scenario only happens when on-demand indexers are manually invoked within minutes of each other, for certain data sources. It may result in mismatched numbers (like the indexer processed 345 documents total according to the indexer execution stats, but there are 340 documents in the data source and index) or potentially increased billing if you are running the same skills for the same document multiple times. Running an indexer using a schedule is the preferred recommendation.
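
The window arithmetic in the timeline can be reproduced with a short Python sketch. It is illustrative only and not part of the changed file. The function names are hypothetical, and the 30 second buffer is the value used in the example, not a documented constant.

```python
from datetime import datetime, timedelta

BUFFER = timedelta(seconds=30)  # buffer size taken from the example timeline


def change_query_window(high_water_mark, run_start, buffer=BUFFER):
    """Return the (start, end) range of changes one indexer run asks for.

    The run queries from the current high water mark minus the buffer
    up to the timestamp at which the run itself started.
    """
    return high_water_mark - buffer, run_start


def t(hms):
    """Parse an hh:mm:ss timeline value into a datetime on an arbitrary day."""
    return datetime.strptime(hms, "%H:%M:%S")


# Run starting at 00:01:40 with a high water mark of 00:01:32.
start, end = change_query_window(t("00:01:32"), t("00:01:40"))
print(start.time(), end.time())  # 00:01:02 00:01:40
```

Any document whose change timestamp falls inside that window is picked up again, which is why runs that start close together can process the same document more than once.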
270270

271271

272272
## Indexing documents with sensitivity labels
273273

274-
If you have [sensitivity labels set on documents](/microsoft-365/compliance/sensitivity-labels) you might not be able to index them. If you're getting errors, remove the labels prior to indexing.
274+
If you have [sensitivity labels set on documents](/microsoft-365/compliance/sensitivity-labels), you might not be able to index them. If you're getting errors, remove the labels prior to indexing.
275275

276276

277277
## See also
