articles/search/search-get-started-portal-import-vectors.md
Lines changed: 1 addition & 1 deletion
@@ -481,7 +481,7 @@ Search Explorer accepts text strings as input and then vectorizes the text for v
 
 Each document is a chunk of the original PDF. The `title` field shows which PDF the chunk comes from. Each `chunk` is quite long. You can copy and paste one into a text editor to read the entire value.
 
-1. To see all of the chunks from a specific document, add a filter for the `text_parent_id` field for a specific PDF. You can check the **Fields** tab of your index to confirm this field is filterable.
+1. To see all of the chunks from a specific document, add a filter for the `title_parent_id` field for a specific PDF. You can check the **Fields** tab of your index to confirm this field is filterable.
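The updated step filters on `title_parent_id`. Outside Search Explorer, you can send the same filter to the index's `docs/search` endpoint. The following minimal Python sketch only builds the request and does not send it. The service name, index name, key, and parent ID value are hypothetical placeholders. The field names come from the diff, so confirm them on the **Fields** tab of your own index.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own service, index, and key.
SERVICE = "my-search-service"
INDEX = "my-vector-index"
API_VERSION = "2024-05-01-preview"
API_KEY = "<query-or-admin-key>"


def build_parent_filter_request(parent_id: str) -> urllib.request.Request:
    """Build (but don't send) a search request that returns every chunk of one PDF."""
    # OData string literals use single quotes; an embedded quote is escaped by doubling it.
    escaped = parent_id.replace("'", "''")
    body = {
        "search": "*",
        "filter": f"title_parent_id eq '{escaped}'",
        "select": "title,chunk",
        "count": True,
    }
    url = (f"https://{SERVICE}.search.windows.net/indexes/{INDEX}"
           f"/docs/search?api-version={API_VERSION}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": API_KEY},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would execute the query. The filter only works if the field is marked filterable.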
articles/search/search-howto-index-sharepoint-online.md
Lines changed: 26 additions & 23 deletions
@@ -1,7 +1,7 @@
 ---
-title: SharePoint and OneDrive indexer (preview)
+title: SharePoint Online indexer (preview)
 titleSuffix: Azure AI Search
-description: Set up a SharePoint and OneDrive indexer to automate indexing of document library content in Azure AI Search.
+description: Set up a SharePoint Online indexer to automate indexing of document library content in Azure AI Search.
 author: gmndrg
 ms.author: gimondra
 
@@ -15,7 +15,7 @@ ms.date: 08/20/2024
 # Index data from SharePoint document libraries
 
 > [!IMPORTANT]
-> SharePoint and OneDrive indexer support is in public preview. It's offered "as-is", under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/) and supported on best effort only. Preview features aren't recommended for production workloads and aren't guaranteed to become generally available.
+> SharePoint Online indexer support is in public preview. It's offered "as-is", under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/) and supported on best effort only. Preview features aren't recommended for production workloads and aren't guaranteed to become generally available.
 >
 > Be sure to visit the [known limitations](#limitations-and-considerations) section before you start.
 >
@@ -25,7 +25,7 @@ This article explains how to configure a [search indexer](search-indexer-overvie
 
 ## Functionality
 
-An indexer in Azure AI Search is a crawler that extracts searchable data and metadata from a data source. The SharePoint and OneDrive indexer connects to your SharePoint site and indexes documents from one or more document libraries. The indexer provides the following functionality:
+An indexer in Azure AI Search is a crawler that extracts searchable data and metadata from a data source. The SharePoint Online indexer connects to your SharePoint site and indexes documents from one or more document libraries. The indexer provides the following functionality:
 
 + Index files and metadata from one or more document libraries.
 + Index incrementally, picking up just the new and changed files and metadata.
@@ -34,13 +34,13 @@ An indexer in Azure AI Search is a crawler that extracts searchable data and met
 
 ## Prerequisites
 
-+ [SharePoint and OneDrive](/sharepoint/introduction) cloud service
++ [SharePoint in Microsoft 365](/sharepoint/introduction) cloud service
 
 + Files in a [document library](https://support.microsoft.com/office/what-is-a-document-library-3b5976dd-65cf-4c9e-bf5a-713c10ca2872)
 
 ## Supported document formats
 
-The SharePoint and OneDrive indexer can extract text from the following document formats:
+The SharePoint Online indexer can extract text from the following document formats:
@@ -64,21 +64,24 @@ Here are the limitations of this feature:
 
 + Indexing sub-sites recursively from a specific site provided isn't supported.
 
-+ SharePoint and OneDrive indexer isn't supported when [Microsoft ENTRA ID Conditional Access](/entra/identity/conditional-access/overview) is enabled.
++ SharePoint Online indexer isn't supported when [Microsoft Entra ID Conditional Access](/entra/identity/conditional-access/overview) is enabled.
 
 Here are the considerations when using this feature:
 
 + If you need to create a custom Copilot / RAG (Retrieval Augmented Generation) application to chat with SharePoint data, the recommended approach is to use [Microsoft Copilot Studio](https://www.microsoft.com/microsoft-copilot/microsoft-copilot-studio) instead of this preview feature.
 
 + If you need a SharePoint content indexing solution in a production environment, consider creating a custom connector with [SharePoint Webhooks](/sharepoint/dev/apis/webhooks/overview-sharepoint-webhooks), calling [Microsoft Graph API](/graph/use-the-api) to export the data to an Azure Blob container, and then use the [Azure blob indexer](search-howto-indexing-azure-blob-storage.md) for incremental indexing.
 
-<!-- + There could be Microsoft 365 processes that update SharePoint file system-metadata (based on different configurations in SharePoint) and will cause the SharePoint and OneDrive indexer to trigger. Make sure that you test your setup and understand the document processing count prior to using any AI enrichment. Since this is a third-party connector to Azure (SharePoint is located in Microsoft 365), SharePoint configuration is not checked by the indexer. -->
+<!-- + There could be Microsoft 365 processes that update SharePoint file system-metadata (based on different configurations in SharePoint) and will cause the SharePoint Online indexer to trigger. Make sure that you test your setup and understand the document processing count prior to using any AI enrichment. Since this is a third-party connector to Azure (SharePoint is located in Microsoft 365), SharePoint configuration is not checked by the indexer. -->
 
-+ If your SharePoint configuration allows Microsoft 365 processes to update SharePoint file system metadata, be aware that these updates can trigger the SharePoint and OneDrive indexer, causing the indexer to ingest documents multiple times. Because the SharePoint and OneDrive indexer is a third-party connector to Azure, the indexer can't read the configuration or vary its behavior. It responds to changes in new and changed content, regardless of how those updates are made. For this reason, make sure that you test your setup and understand the document processing count prior to using the indexer and any AI enrichment.
++ If your SharePoint configuration allows Microsoft 365 processes to update SharePoint file system metadata, be aware that these updates can trigger the SharePoint Online indexer, causing the indexer to ingest documents multiple times. Because the SharePoint Online indexer is a non-Microsoft connector to Azure, the indexer can't read the configuration or vary its behavior. It responds to changes in new and changed content, regardless of how those updates are made. For this reason, make sure that you test your setup and understand the document processing count prior to using the indexer and any AI enrichment.
 
-## Configure the SharePoint and OneDrive indexer
 
-To set up the SharePoint and OneDrive indexer, use both the Azure portal and a preview REST API. You can use 2020-06-30-preview or later. We recommend the latest preview API.
+
+
+## Configure the SharePoint Online indexer
+
+To set up the SharePoint Online indexer, use both the Azure portal and a preview REST API. You can use 2020-06-30-preview or later. We recommend the latest preview API.
 
 This section provides the steps. You can also watch the following video.
 
@@ -98,20 +101,20 @@ After selecting **Save**, you get an Object ID that has been assigned to your se
 
 ### Step 2: Decide which permissions the indexer requires
 
-The SharePoint and OneDrive indexer supports both [delegated and application](/graph/auth/auth-concepts#delegated-and-application-permissions) permissions. Choose which permissions you want to use based on your scenario.
+The SharePoint Online indexer supports both [delegated and application](/graph/auth/auth-concepts#delegated-and-application-permissions) permissions. Choose which permissions you want to use based on your scenario.
 
 We recommend app-based permissions. See [limitations](#limitations-and-considerations) for known issues related to delegated permissions.
 
 + Application permissions (recommended), where the indexer runs under the [identity of the SharePoint tenant](/sharepoint/dev/solution-guidance/security-apponly-azureacs) with access to all sites and files. The indexer requires a [client secret](/azure/active-directory/develop/v2-oauth2-client-creds-grant-flow). The indexer will also require [tenant admin approval](/azure/active-directory/manage-apps/grant-admin-consent) before it can index any content.
 
-+ Delegated permissions, where the indexer runs under the identity of the user or app sending the request. Data access is limited to the sites and files to which the caller has access. To support delegated permissions, the indexer requires a [device code prompt](/azure/active-directory/develop/v2-oauth2-device-code) to sign in on behalf of the user. User-delegated permissions enforces token expiration every 75 minutes, per the most recent security libraries used to implement this authentication type. This is not a behavior that can be adjusted. An expired token requires manual indexing using [Run Indexer (preview)](/rest/api/searchservice/indexers/run?view=rest-searchservice-2024-05-01-preview&tabs=HTTP&preserve-view=true). For this reason, you might want app-based permissions instead.
++ Delegated permissions, where the indexer runs under the identity of the user or app sending the request. Data access is limited to the sites and files to which the caller has access. To support delegated permissions, the indexer requires a [device code prompt](/azure/active-directory/develop/v2-oauth2-device-code) to sign in on behalf of the user. User-delegated permissions enforce token expiration every 75 minutes, per the most recent security libraries used to implement this authentication type. This isn't a behavior that can be adjusted. An expired token requires manual indexing using [Run Indexer (preview)](/rest/api/searchservice/indexers/run?view=rest-searchservice-2024-05-01-preview&tabs=HTTP&preserve-view=true). For this reason, you might want app-based permissions instead.
 ### Step 3: Create a Microsoft Entra application registration
 
-The SharePoint and OneDrive indexer uses this Microsoft Entra application for authentication.
+The SharePoint Online indexer uses this Microsoft Entra application for authentication.
 
 1. Sign in to the [Azure portal](https://portal.azure.com).
 
@@ -244,13 +247,13 @@ api-key: [admin key]
 ```
 
 > [!IMPORTANT]
-> Only [`metadata_spo_site_library_item_id`](#metadata) may be used as the key field in an index populated by the SharePoint and OneDriveindexer. If a key field doesn't exist in the data source, `metadata_spo_site_library_item_id` is automatically mapped to the key field.
+> Only [`metadata_spo_site_library_item_id`](#metadata) may be used as the key field in an index populated by the SharePoint Online indexer. If a key field doesn't exist in the data source, `metadata_spo_site_library_item_id` is automatically mapped to the key field.
 
 ### Step 6: Create an indexer
 
 An indexer connects a data source with a target search index and provides a schedule to automate the data refresh. Once the index and data source are created, you can create the indexer.
 
-If you are using delegated permissions, during this step, you’re asked to sign in with organization credentials that have access to the SharePoint site. If possible, we recommend creating a new organizational user account and giving that new user the exact permissions that you want the indexer to have.
+If you're using delegated permissions, during this step, you’re asked to sign in with organization credentials that have access to the SharePoint site. If possible, we recommend creating a new organizational user account and giving that new user the exact permissions that you want the indexer to have.
 
 There are a few steps to creating the indexer:
 
@@ -289,7 +292,7 @@ There are a few steps to creating the indexer:
 }
 ```
 
-If you're using application permissions, it's necessary to wait until the initial run is complete before starting to query your index. The following instructions provided in this step pertain specifically to delegated permissions, and are not applicable to application permissions.
+If you're using application permissions, it's necessary to wait until the initial run is complete before starting to query your index. The following instructions provided in this step pertain specifically to delegated permissions, and aren't applicable to application permissions.
 
 1. When you create the indexer for the first time, the [Create Indexer (preview)](/rest/api/searchservice/indexers/create-or-update?view=rest-searchservice-2024-05-01-preview&tabs=HTTP&preserve-view=true) request waits until you complete the next step. You must call [Get Indexer Status](/rest/api/searchservice/indexers/get-status?view=rest-searchservice-2024-05-01-preview&tabs=HTTP&preserve-view=true) to get the link and enter your new device code.
 
@@ -316,7 +319,7 @@ There are a few steps to creating the indexer:
 
 :::image type="content" source="media/search-howto-index-sharepoint-online/enter-device-code.png" alt-text="Screenshot showing how to enter a device code.":::
 
-1. The SharePoint and OneDrive indexer will access the SharePoint content as the signed-in user. The user that logs in during this step will be that signed-in user. So, if you sign in with a user account that doesn’t have access to a document in the Document Library that you want to index, the indexer won’t have access to that document.
+1. The SharePoint Online indexer will access the SharePoint content as the signed-in user. The user that logs in during this step will be that signed-in user. So, if you sign in with a user account that doesn’t have access to a document in the Document Library that you want to index, the indexer won’t have access to that document.
 
 If possible, we recommend creating a new user account and giving that new user the exact permissions that you want the indexer to have.
 
@@ -374,7 +377,7 @@ Here are the steps for updating a data source, assuming an expired device code:
 
 ## Indexing document metadata
 
-If you're indexing document metadata (`"dataToExtract": "contentAndMetadata"`), the following metadata will be available to index.
+If you're indexing document metadata (`"dataToExtract": "contentAndMetadata"`), the following metadata is available to index.
 | metadata_spo_item_weburi | Edm.String | The URI of the item. |
 | metadata_spo_item_path | Edm.String | The combination of the parent path and item name. |
 
-The SharePoint and OneDrive indexer also supports metadata specific to each document type. More information can be found in [Content metadata properties used in Azure AI Search](search-blob-metadata-properties.md).
+The SharePoint Online indexer also supports metadata specific to each document type. More information can be found in [Content metadata properties used in Azure AI Search](search-blob-metadata-properties.md).
 
 > [!NOTE]
 > To index custom metadata, "additionalColumns" must be specified in the [query parameter of the data source](#query).
@@ -417,7 +420,7 @@ PUT /indexers/[indexer name]?api-version=2024-05-01-preview
 
 ## Controlling which documents are indexed
 
-A single SharePoint and OneDrive indexer can index content from one or more document libraries. Use the "container" parameter on the data source definition to indicate which sites and document libraries to index from.
+A single SharePoint Online indexer can index content from one or more document libraries. Use the "container" parameter on the data source definition to indicate which sites and document libraries to index from.
 
 The [data source "container" section](#create-data-source) has two properties for this task: "name" and "query".
 
@@ -450,7 +453,7 @@ The "query" parameter of the data source is made up of keyword/value pairs. The
 
 ## Handling errors
 
-By default, the SharePoint and OneDrive indexer stops as soon as it encounters a document with an unsupported content type (for example, an image). You can use the `excludedFileNameExtensions` parameter to skip certain content types. However, you might need to index documents without knowing all the possible content types in advance. To continue indexing when an unsupported content type is encountered, set the `failOnUnsupportedContentType` configuration parameter to false:
+By default, the SharePoint Online indexer stops as soon as it encounters a document with an unsupported content type (for example, an image). You can use the `excludedFileNameExtensions` parameter to skip certain content types. However, you might need to index documents without knowing all the possible content types in advance. To continue indexing when an unsupported content type is encountered, set the `failOnUnsupportedContentType` configuration parameter to false:
 
 ```http
 PUT https://[service name].search.windows.net/indexers/[indexer name]?api-version=2024-05-01-preview
@@ -484,7 +487,7 @@ You can also continue indexing if errors happen at any point of processing, eith
 }
 ```
 
-If a file on the SharePoint site has encryption enabled, an error message similar to the following may be encountered:
+If a file on the SharePoint site has encryption enabled, you might see the following error message:
 
 ```
 Code: resourceModified Message: The resource has changed since the caller last read it; usually an eTag mismatch Inner error: Code: irmEncryptFailedToFindProtector
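The "Handling errors" section of this diff relies on the indexer's `parameters` settings. The minimal Python sketch below assembles that part of an indexer definition:

- `failOnUnsupportedContentType` and `excludedFileNameExtensions` are the settings named in the diff.
- `maxFailedItems` and `maxFailedItemsPerBatch` set to `-1` are the general Azure AI Search indexer settings for tolerating any number of failed documents.

The helper function, the placeholder indexer name, and the comma-separated extension format are assumptions to check against the article's full example. The resulting JSON would go in the body of the `PUT .../indexers/[indexer name]?api-version=2024-05-01-preview` request.

```python
import json
from typing import Iterable


def build_error_handling_parameters(
    excluded_extensions: Iterable[str] = (),
    tolerate_all_failures: bool = False,
) -> dict:
    """Assemble the "parameters" section of an indexer definition (hypothetical helper)."""
    # Keep indexing instead of stopping when a document's content type isn't supported.
    configuration = {"failOnUnsupportedContentType": False}

    # Normalize extensions to ".ext" form and join them into one comma-separated string.
    exts = [e if e.startswith(".") else "." + e for e in excluded_extensions]
    if exts:
        configuration["excludedFileNameExtensions"] = ",".join(exts)

    parameters = {"configuration": configuration}
    if tolerate_all_failures:
        # -1 removes the limit on failed documents, both overall and per batch.
        parameters["maxFailedItems"] = -1
        parameters["maxFailedItemsPerBatch"] = -1
    return parameters


if __name__ == "__main__":
    # "sharepoint-indexer" is a placeholder; merge these parameters into your full definition.
    definition = {
        "name": "sharepoint-indexer",
        "parameters": build_error_handling_parameters([".png", "jpeg"], tolerate_all_failures=True),
    }
    print(json.dumps(definition, indent=2))
```

Tolerating every failure keeps a run going past per-document errors, but it can also hide problems. Check the indexer execution history after each run.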