articles/search/search-howto-index-sharepoint-online.md
Lines changed: 18 additions & 21 deletions
Original file line number
Diff line number
Diff line change
@@ -1,7 +1,7 @@
1
1
---
2
-
title: SharePoint Online indexer (preview)
2
+
title: SharePoint and OneDrive indexer (preview)
3
3
titleSuffix: Azure AI Search
4
-
description: Set up a SharePoint Online indexer to automate indexing of document library content in Azure AI Search.
4
+
description: Set up a SharePoint and OneDrive indexer to automate indexing of document library content in Azure AI Search.
5
5
author: gmndrg
6
6
ms.author: gimondra
7
7
@@ -15,7 +15,7 @@ ms.date: 08/20/2024
15
15
# Index data from SharePoint document libraries
16
16
17
17
> [!IMPORTANT]
18
-
> SharePoint Online indexer support is in public preview. It's offered "as-is", under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/) and supported on best effort only. Preview features aren't recommended for production workloads and aren't guaranteed to become generally available.
18
+
> SharePoint and OneDrive indexer support is in public preview. It's offered "as-is", under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/) and supported on best effort only. Preview features aren't recommended for production workloads and aren't guaranteed to become generally available.
19
19
>
20
20
> Be sure to visit the [known limitations](#limitations-and-considerations) section before you start.
21
21
>
@@ -25,7 +25,7 @@ This article explains how to configure a [search indexer](search-indexer-overvie
25
25
26
26
## Functionality
27
27
28
-
An indexer in Azure AI Search is a crawler that extracts searchable data and metadata from a data source. The SharePoint Online indexer connects to your SharePoint site and indexes documents from one or more document libraries. The indexer provides the following functionality:
28
+
An indexer in Azure AI Search is a crawler that extracts searchable data and metadata from a data source. The SharePoint and OneDrive indexer connects to your SharePoint site and indexes documents from one or more document libraries. The indexer provides the following functionality:
29
29
30
30
+ Index files and metadata from one or more document libraries.
31
31
+ Index incrementally, picking up just the new and changed files and metadata.
@@ -34,13 +34,13 @@ An indexer in Azure AI Search is a crawler that extracts searchable data and met
34
34
35
35
## Prerequisites
36
36
37
-
+[SharePoint in Microsoft 365](/sharepoint/introduction) cloud service
37
+
+[SharePoint and OneDrive](/sharepoint/introduction) cloud service
38
38
39
39
+ Files in a [document library](https://support.microsoft.com/office/what-is-a-document-library-3b5976dd-65cf-4c9e-bf5a-713c10ca2872)
40
40
41
41
## Supported document formats
42
42
43
-
The SharePoint Online indexer can extract text from the following document formats:
43
+
The SharePoint and OneDrive indexer can extract text from the following document formats:
@@ -64,24 +64,21 @@ Here are the limitations of this feature:
64
64
65
65
+ Indexing sub-sites recursively from a specific site provided isn't supported.
66
66
67
-
+ SharePoint Online indexer isn't supported when [Microsoft ENTRA ID Conditional Access](/entra/identity/conditional-access/overview) is enabled.
67
+
+ SharePoint and OneDrive indexer isn't supported when [Microsoft Entra ID Conditional Access](/entra/identity/conditional-access/overview) is enabled.
68
68
69
69
Here are the considerations when using this feature:
70
70
71
71
+ If you need to create a custom Copilot / RAG (Retrieval Augmented Generation) application to chat with SharePoint data, the recommended approach is to use [Microsoft Copilot Studio](https://www.microsoft.com/microsoft-copilot/microsoft-copilot-studio) instead of this preview feature.
72
72
73
73
+ If you need a SharePoint content indexing solution in a production environment, consider creating a custom connector with [SharePoint Webhooks](/sharepoint/dev/apis/webhooks/overview-sharepoint-webhooks), calling [Microsoft Graph API](/graph/use-the-api) to export the data to an Azure Blob container, and then use the [Azure blob indexer](search-howto-indexing-azure-blob-storage.md) for incremental indexing.
74
74
75
-
<!-- + There could be Microsoft 365 processes that update SharePoint file system-metadata (based on different configurations in SharePoint) and will cause the SharePoint Online indexer to trigger. Make sure that you test your setup and understand the document processing count prior to using any AI enrichment. Since this is a third-party connector to Azure (SharePoint is located in Microsoft 365), SharePoint configuration is not checked by the indexer. -->
75
+
<!-- + There could be Microsoft 365 processes that update SharePoint file system-metadata (based on different configurations in SharePoint) and will cause the SharePoint and OneDrive indexer to trigger. Make sure that you test your setup and understand the document processing count prior to using any AI enrichment. Since this is a third-party connector to Azure (SharePoint is located in Microsoft 365), SharePoint configuration is not checked by the indexer. -->
76
76
77
-
+ If your SharePoint configuration allows Microsoft 365 processes to update SharePoint file system metadata, be aware that these updates can trigger the SharePoint Online indexer, causing the indexer to ingest documents multiple times. Because the SharePoint Online indexer is a third-party connector to Azure, the indexer can't read the configuration or vary its behavior. It responds to changes in new and changed content, regardless of how those updates are made. For this reason, make sure that you test your setup and understand the document processing count prior to using the indexer and any AI enrichment.
77
+
+ If your SharePoint configuration allows Microsoft 365 processes to update SharePoint file system metadata, be aware that these updates can trigger the SharePoint and OneDrive indexer, causing the indexer to ingest documents multiple times. Because the SharePoint and OneDrive indexer is a third-party connector to Azure, the indexer can't read the configuration or vary its behavior. It responds to changes in new and changed content, regardless of how those updates are made. For this reason, make sure that you test your setup and understand the document processing count prior to using the indexer and any AI enrichment.
78
78
79
+
## Configure the SharePoint and OneDrive indexer
79
80
80
-
81
-
82
-
## Configure the SharePoint Online indexer
83
-
84
-
To set up the SharePoint Online indexer, use both the Azure portal and a preview REST API. You can use 2020-06-30-preview or later. We recommend the latest preview API.
81
+
To set up the SharePoint and OneDrive indexer, use both the Azure portal and a preview REST API. You can use 2020-06-30-preview or later. We recommend the latest preview API.
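
As a small illustration of the version requirement, the following Python sketch checks whether an `api-version` string is a preview version dated 2020-06-30 or later. The helper names are hypothetical, and the check assumes preview versions follow the `YYYY-MM-DD-preview` pattern used in this article.

```python
from datetime import date

# Earliest preview API version named in this article.
MINIMUM_PREVIEW = "2020-06-30-preview"
SUFFIX = "-preview"


def preview_date(api_version: str) -> date:
    """Parse the date part of a 'YYYY-MM-DD-preview' version string."""
    if not api_version.endswith(SUFFIX):
        raise ValueError(f"not a preview API version: {api_version!r}")
    return date.fromisoformat(api_version[: -len(SUFFIX)])


def meets_minimum_preview(api_version: str) -> bool:
    """Return True if the version is at least the minimum preview version."""
    return preview_date(api_version) >= preview_date(MINIMUM_PREVIEW)


print(meets_minimum_preview("2024-05-01-preview"))
```

A stable (non-preview) version string raises `ValueError` in this sketch, because the article calls for a preview REST API.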
85
82
86
83
This section provides the steps. You can also watch the following video.
87
84
@@ -101,7 +98,7 @@ After selecting **Save**, you get an Object ID that has been assigned to your se
101
98
102
99
### Step 2: Decide which permissions the indexer requires
103
100
104
-
The SharePoint Online indexer supports both [delegated and application](/graph/auth/auth-concepts#delegated-and-application-permissions) permissions. Choose which permissions you want to use based on your scenario.
101
+
The SharePoint and OneDrive indexer supports both [delegated and application](/graph/auth/auth-concepts#delegated-and-application-permissions) permissions. Choose which permissions you want to use based on your scenario.
105
102
106
103
We recommend app-based permissions. See [limitations](#limitations-and-considerations) for known issues related to delegated permissions.
107
104
@@ -114,7 +111,7 @@ We recommend app-based permissions. See [limitations](#limitations-and-considera
114
111
115
112
### Step 3: Create a Microsoft Entra application registration
116
113
117
-
The SharePoint Online indexer uses this Microsoft Entra application for authentication.
114
+
The SharePoint and OneDrive indexer uses this Microsoft Entra application for authentication.
118
115
119
116
1. Sign in to the [Azure portal](https://portal.azure.com).
120
117
@@ -247,7 +244,7 @@ api-key: [admin key]
247
244
```
248
245
249
246
> [!IMPORTANT]
250
-
> Only [`metadata_spo_site_library_item_id`](#metadata) may be used as the key field in an index populated by the SharePoint Online indexer. If a key field doesn't exist in the data source, `metadata_spo_site_library_item_id` is automatically mapped to the key field.
247
+
> Only [`metadata_spo_site_library_item_id`](#metadata) may be used as the key field in an index populated by the SharePoint and OneDrive indexer. If a key field doesn't exist in the data source, `metadata_spo_site_library_item_id` is automatically mapped to the key field.
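
To make the key-field rule concrete, here's a minimal Python sketch that builds an index definition with a single `Edm.String` key and lists the fields marked as keys. The index name, the key field name `id`, the `content` field, and both helper functions are assumptions for illustration. Only the `metadata_spo_*` names come from this article.

```python
import json


def build_index_definition(index_name: str) -> dict:
    """Return a minimal index definition (hypothetical field layout)."""
    return {
        "name": index_name,
        "fields": [
            # Key field. The indexer maps metadata_spo_site_library_item_id to it.
            {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
            {"name": "metadata_spo_item_path", "type": "Edm.String", "key": False, "searchable": False},
            {"name": "metadata_spo_item_weburi", "type": "Edm.String", "key": False, "searchable": False},
            # Assumed field for the extracted document text.
            {"name": "content", "type": "Edm.String", "key": False, "searchable": True},
        ],
    }


def key_field_names(definition: dict) -> list[str]:
    """Return the names of all fields flagged as the document key."""
    return [field["name"] for field in definition["fields"] if field.get("key")]


definition = build_index_definition("sharepoint-index")
assert key_field_names(definition) == ["id"]
print(json.dumps(definition, indent=2))
```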
251
248
252
249
### Step 6: Create an indexer
253
250
@@ -319,7 +316,7 @@ There are a few steps to creating the indexer:
319
316
320
317
:::image type="content" source="media/search-howto-index-sharepoint-online/enter-device-code.png" alt-text="Screenshot showing how to enter a device code.":::
321
318
322
-
1. The SharePoint Online indexer will access the SharePoint content as the signed-in user. The user that logs in during this step will be that signed-in user. So, if you sign in with a user account that doesn’t have access to a document in the Document Library that you want to index, the indexer won’t have access to that document.
319
+
1. The SharePoint and OneDrive indexer will access the SharePoint content as the signed-in user. The user that logs in during this step will be that signed-in user. So, if you sign in with a user account that doesn’t have access to a document in the Document Library that you want to index, the indexer won’t have access to that document.
323
320
324
321
If possible, we recommend creating a new user account and giving that new user the exact permissions that you want the indexer to have.
| metadata_spo_item_weburi | Edm.String | The URI of the item. |
394
391
| metadata_spo_item_path | Edm.String | The combination of the parent path and item name. |
395
392
396
-
The SharePoint Online indexer also supports metadata specific to each document type. More information can be found in [Content metadata properties used in Azure AI Search](search-blob-metadata-properties.md).
393
+
The SharePoint and OneDrive indexer also supports metadata specific to each document type. More information can be found in [Content metadata properties used in Azure AI Search](search-blob-metadata-properties.md).
397
394
398
395
> [!NOTE]
399
396
> To index custom metadata, "additionalColumns" must be specified in the [query parameter of the data source](#query).
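
One possible way to assemble the `additionalColumns` part of the query is sketched below in Python. The `keyword=value` shape with comma-separated column names is an assumption, and the column names and helper function are hypothetical. Escaping of special characters isn't covered, so the sketch rejects names that contain them.

```python
def additional_columns_clause(columns: list[str]) -> str:
    """Build an 'additionalColumns=...' clause from custom column names."""
    if not columns:
        raise ValueError("at least one column name is required")
    for name in columns:
        # Escaping rules are out of scope for this sketch, so refuse such names.
        if any(ch in name for ch in ",;=\\"):
            raise ValueError(f"column name needs escaping: {name!r}")
    return "additionalColumns=" + ",".join(columns)


print(additional_columns_clause(["Department", "ReviewStatus"]))
```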
@@ -420,7 +417,7 @@ PUT /indexers/[indexer name]?api-version=2024-05-01-preview
420
417
421
418
## Controlling which documents are indexed
422
419
423
-
A single SharePoint Online indexer can index content from one or more document libraries. Use the "container" parameter on the data source definition to indicate which sites and document libraries to index from.
420
+
A single SharePoint and OneDrive indexer can index content from one or more document libraries. Use the "container" parameter on the data source definition to indicate which sites and document libraries to index from.
424
421
425
422
The [data source "container" section](#create-data-source) has two properties for this task: "name" and "query".
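
The relationship between "name" and "query" can be sketched in Python as follows. The container names `defaultSiteLibrary` and `useQuery`, the `includeLibrary` keyword, the data source type `sharepoint`, and the helper function aren't shown in this excerpt and are assumptions here, so verify them against the data source section before use. The bracketed values are placeholders.

```python
import json
from typing import Optional


def build_data_source(name: str, connection_string: str,
                      query: Optional[str] = None) -> dict:
    """Build a data source body; a query switches the container to 'useQuery'."""
    # Assumed convention: without a query, index the site's default library.
    container_name = "useQuery" if query else "defaultSiteLibrary"
    return {
        "name": name,
        "type": "sharepoint",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container_name, "query": query},
    }


body = build_data_source(
    "sharepoint-datasource",
    "[connection string]",
    query="includeLibrary=[document library URL]",
)
print(json.dumps(body, indent=2))
```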
426
423
@@ -453,7 +450,7 @@ The "query" parameter of the data source is made up of keyword/value pairs. The
453
450
454
451
## Handling errors
455
452
456
-
By default, the SharePoint Online indexer stops as soon as it encounters a document with an unsupported content type (for example, an image). You can use the `excludedFileNameExtensions` parameter to skip certain content types. However, you might need to index documents without knowing all the possible content types in advance. To continue indexing when an unsupported content type is encountered, set the `failOnUnsupportedContentType` configuration parameter to false:
453
+
By default, the SharePoint and OneDrive indexer stops as soon as it encounters a document with an unsupported content type (for example, an image). You can use the `excludedFileNameExtensions` parameter to skip certain content types. However, you might need to index documents without knowing all the possible content types in advance. To continue indexing when an unsupported content type is encountered, set the `failOnUnsupportedContentType` configuration parameter to false:
457
454
458
455
```http
459
456
PUT https://[service name].search.windows.net/indexers/[indexer name]?api-version=2024-05-01-preview