articles/search/search-howto-index-sharepoint-online.md
Lines changed: 21 additions & 21 deletions
@@ -9,15 +9,15 @@ manager: liamca
9
9
10
10
ms.service: cognitive-search
11
11
ms.topic: how-to
12
-
ms.date: 06/01/2022
12
+
ms.date: 08/25/2022
13
13
---
14
14
15
15
# Index data from SharePoint document libraries
16
16
17
17
> [!IMPORTANT]
18
18
> SharePoint indexer support is currently in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). [Request access](https://aka.ms/azure-cognitive-search/indexer-preview) to this feature, and after access is enabled, use a [preview REST API (2020-06-30-preview or later)](search-api-preview.md) to index your content. There is currently limited portal support and no .NET SDK support.
19
19
20
-
Configure a [search indexer](search-indexer-overview.md) to index documents stored in SharePoint document libraries for full text search in Azure Cognitive Search. This article explains the configuration steps, followed by a deeper exploration of behaviors and scenarios you are likely to encounter.
20
+
This article explains how to configure a [search indexer](search-indexer-overview.md) to index documents stored in SharePoint document libraries for full text search in Azure Cognitive Search. Configuration steps are followed by a deeper exploration of behaviors and scenarios you're likely to encounter.
21
21
22
22
> [!NOTE]
23
23
> SharePoint supports a granular authorization model that determines per-user access at the document level. The SharePoint indexer does not pull these permissions into the search index, and Cognitive Search does not support document-level authorization. When a document is indexed from SharePoint into a search service, the content is available to anyone who has read access to the index. If you require document-level permissions, you should investigate [security filters to trim results](search-security-trimming-for-azure-search-with-aad.md) of unauthorized content.
@@ -45,33 +45,33 @@ The SharePoint indexer can extract text from the following document formats:
45
45
46
46
## Configure the SharePoint indexer
47
47
48
-
To set up the SharePoint indexer, you will need to perform some tasks in the Azure portal, and other tasks through the preview REST API.
48
+
To set up the SharePoint indexer, you'll need to perform some tasks in the Azure portal and others through the preview REST API.
49
49
50
-
The following video shows how to set up the SharePoint indexer.
50
+
The following video shows you how to set up the SharePoint indexer.
### Step 1 (Optional): Enable system-assigned managed identity
55
55
56
56
When a system-assigned managed identity is enabled, Azure creates an identity for your search service that can be used by the indexer. This identity is used to automatically detect the tenant the search service is provisioned in.
57
57
58
-
If the SharePoint site is in the same tenant as the search service, you will need to enable the system-assigned managed identity for the search service in the Azure portal. If the SharePoint site is in a different tenant from the search service, skip this step.
58
+
If the SharePoint site is in the same tenant as the search service, you'll need to enable the system-assigned managed identity for the search service in the Azure portal. If the SharePoint site is in a different tenant from the search service, skip this step.
59
59
60
60
:::image type="content" source="media/search-howto-index-sharepoint-online/enable-managed-identity.png" alt-text="Enable system assigned managed identity":::
61
61
62
-
After selecting **Save** you will see an Object ID that has been assigned to your search service.
62
+
After selecting **Save** you'll see an Object ID that has been assigned to your search service.
### Step 2: Decide which permissions the indexer requires
67
67
68
68
The SharePoint indexer supports both [delegated and application](/graph/auth/auth-concepts#delegated-and-application-permissions) permissions. Choose which permissions you want to use based on your scenario:
69
69
70
-
+ Delegated permissions, where the indexer runs under the identity of the user or app that sent the request. Data access is limited to the sites and files to which the user has access. To support deleted permissions, the indexer requires a [device code prompt](../active-directory/develop/v2-oauth2-device-code.md) to log in on behalf of the user.
70
+
+ Delegated permissions, where the indexer runs under the identity of the user or app sending the request. Data access is limited to the sites and files to which the user has access. To support delegated permissions, the indexer requires a [device code prompt](../active-directory/develop/v2-oauth2-device-code.md) to sign in on behalf of the user.
71
71
72
72
+ Application permissions, where the indexer runs under the identity of the SharePoint tenant with access to all sites and files within the SharePoint tenant. The indexer requires a [client secret](../active-directory/develop/v2-oauth2-client-creds-grant-flow.md) to access the SharePoint tenant. The indexer will also require [tenant admin approval](../active-directory/manage-apps/grant-admin-consent.md) before it can index any content.
73
73
74
-
Note that if your Azure Active Directory organization has [Conditional Access enabled](../active-directory/conditional-access/overview.md) and your administrator is not able to grant any device access for Delegated permissions, you should consider Application permissions instead. For more information, refer to[SharePoint Conditional Access policies](./search-indexer-troubleshooting.md#sharepoint-conditional-access-policies).
74
+
If your Azure Active Directory organization has [Conditional Access enabled](../active-directory/conditional-access/overview.md) and your administrator isn't able to grant any device access for Delegated permissions, you should consider Application permissions instead. For more information, see [SharePoint Conditional Access policies](./search-indexer-troubleshooting.md#sharepoint-conditional-access-policies).
75
75
76
76
### Step 3: Create an Azure AD application
77
77
@@ -110,7 +110,7 @@ The SharePoint indexer will use this Azure Active Directory (Azure AD) applicati
110
110
111
111
1. Give admin consent.
112
112
113
-
Tenant admin consent is required when using application API permissions. Some tenants are locked down in such a way that tenant admin consent is required for delegated API permissions as well. If either of these are the case, you’ll need to have a tenant admin grant consent for this Azure AD application before creating the indexer.
113
+
Tenant admin consent is required when using application API permissions. Some tenants are locked down in such a way that tenant admin consent is required for delegated API permissions as well. If either of these conditions applies, you’ll need to have a tenant admin grant consent for this Azure AD application before creating the indexer.
114
114
115
115
:::image type="content" source="media/search-howto-index-sharepoint-online/aad-app-grant-admin-consent.png" alt-text="Azure AD app grant admin consent":::
116
116
@@ -128,7 +128,7 @@ The SharePoint indexer will use this Azure Active Directory (Azure AD) applicati
+ In the menu that pops up, enter a description for the new client secret. Adjust the expiration date if necessary. If the secret expires it will need to be recreated and the indexer needs to be updated with the new secret.
131
+
+ In the menu that pops up, enter a description for the new client secret. Adjust the expiration date if necessary. If the secret expires, it must be recreated and the indexer must be updated with the new secret.
1. The SharePoint indexer will access the SharePoint content as the signed-in user. The user that logs in during this step will be that signed-in user. So, if you log in with a user account that doesn’t have access to a document in the Document Library that you want to index, the indexer won’t have access to that document.
271
+
1. The SharePoint indexer will access the SharePoint content as the signed-in user. The user who signs in during this step will be that signed-in user. So, if you sign in with a user account that doesn’t have access to a document in the Document Library that you want to index, the indexer won’t have access to that document.
272
272
273
273
If possible, we recommend creating a new user account and giving that new user the exact permissions that you want the indexer to have.
274
274
@@ -326,7 +326,7 @@ api-key: [admin key]
326
326
327
327
## Updating the data source
328
328
329
-
If there are no updates to the data source object, the indexer can run on a schedule without any user interaction. However, every time the Azure Cognitive Search data source object is updated, you will need to sign in again in order for the indexer to run. For example, if you change the data source query, sign in again using the `https://microsoft.com/devicelogin` and a new code.
329
+
If there are no updates to the data source object, the indexer can run on a schedule without any user interaction. However, every time the Azure Cognitive Search data source object is updated, you'll need to sign in again in order for the indexer to run. For example, if you change the data source query, sign in again using the `https://microsoft.com/devicelogin` and a new code.
330
330
331
331
Once the data source has been updated, follow the steps below:
332
332
@@ -358,7 +358,7 @@ If you have set the indexer to index document metadata (`"dataToExtract": "conte
358
358
359
359
| Identifier | Type | Description |
360
360
| ------------- | -------------- | ----------- |
361
-
| metadata_spo_site_library_item_id | Edm.String | The combination key of site ID, library ID and item ID which uniquely identifies an item in a document library for a site. |
361
+
| metadata_spo_site_library_item_id | Edm.String | The combination key of site ID, library ID, and item ID which uniquely identifies an item in a document library for a site. |
362
362
| metadata_spo_site_id | Edm.String | The ID of the SharePoint site. |
363
363
| metadata_spo_library_id | Edm.String | The ID of the document library. |
364
364
| metadata_spo_item_id | Edm.String | The ID of the (document) item in the library. |
@@ -407,9 +407,9 @@ The "name" property is required and must be one of three values:
407
407
408
408
| Value | Description |
409
409
|-|-|
410
-
| defaultSiteLibrary | Index all the content from the sites default document library. |
411
-
| allSiteLibraries | Index all the content from all the document libraries in a site. This will not index document libraries from a subsite. Those can be specified in the "query" though. |
412
-
| useQuery | Only index content defined in the "query". |
410
+
| defaultSiteLibrary | Index all content from the site's default document library. |
411
+
| allSiteLibraries | Index all content from all document libraries in a site. Document libraries from a subsite are out of scope. If you need content from subsites, choose "useQuery" and specify "includeLibrariesInSite". |
412
+
| useQuery | Only index the content defined in the "query". |
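
As a minimal sketch of the rule above, the following Python snippet builds the "container" object of a data source definition and rejects names outside the three listed values. The helper `make_container` is hypothetical (it is not part of any SDK), and the requirement that "useQuery" comes with a non-empty query is an assumption inferred from the table, not a documented validation rule.

```python
import json

# The three container names listed in the table above.
VALID_CONTAINER_NAMES = {"defaultSiteLibrary", "allSiteLibraries", "useQuery"}


def make_container(name, query=None):
    """Return the "container" object for a SharePoint data source definition.

    Hypothetical helper for illustration only.
    """
    if name not in VALID_CONTAINER_NAMES:
        raise ValueError(f"unsupported container name: {name!r}")
    # Assumption: "useQuery" is only meaningful with a non-empty query.
    if name == "useQuery" and not query:
        raise ValueError('"useQuery" requires a non-empty query')
    return {"name": name, "query": query}


container = make_container(
    "useQuery",
    "includeLibrariesInSite=https://mycompany.sharepoint.com/mysite",
)
# Serialize the fragment as it would appear inside the request body.
print(json.dumps({"container": container}, indent=2))
```

Validating the name on the client side is only a convenience; the search service remains the authority on which values it accepts.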
413
413
414
414
<a name="query"></a>
415
415
@@ -423,14 +423,14 @@ The "query" parameter of the data source is made up of keyword/value pairs. The
423
423
| Keyword | Value description and examples |
424
424
| ------- | ------------------------ |
425
425
| null | If null or empty, index either the default document library or all document libraries depending on the container name. <br><br>Example: <br><br>``` "container" : { "name" : "defaultSiteLibrary", "query" : null } ```|
426
-
| includeLibrariesInSite | Index content from all libraries under the specified site in the connection string. These are limited to subsites of your site. The value should be the URI of the site or subsite. <br><br>Example: <br><br>```"container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mycompany.sharepoint.com/mysite" }```|
427
-
| includeLibrary | Index all content from this library. The value is the fully-qualified path to the library, which can be copied from your browser: <br><br>Example 1 (fully-qualified path): <br><br>```"container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/mysite/MyDocumentLibrary" }``` <br><br>Example 2 (URI copied from your browser): <br><br>```"container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/teams/mysite/MyDocumentLibrary/Forms/AllItems.aspx" }```|
428
-
| excludeLibrary |Do not index content from this library. The value is the fully-qualified path to the library, which can be copied from your browser: <br><br> Example 1 (fully-qualified path): <br><br>```"container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mysite.sharepoint.com/subsite1; excludeLibrary=https://mysite.sharepoint.com/subsite1/MyDocumentLibrary" }``` <br><br> Example 2 (URI copied from your browser): <br><br>```"container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mycompany.sharepoint.com/teams/mysite; excludeLibrary=https://mycompany.sharepoint.com/teams/mysite/MyDocumentLibrary/Forms/AllItems.aspx" }```|
426
+
| includeLibrariesInSite | Index content from all libraries under the specified site in the connection string. The scope includes any subsites of your site. The value should be the URI of the site or subsite. <br><br>Example: <br><br>```"container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mycompany.sharepoint.com/mysite" }```|
427
+
| includeLibrary | Index all content from this library. The value is the fully qualified path to the library, which can be copied from your browser: <br><br>Example 1 (fully qualified path): <br><br>```"container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/mysite/MyDocumentLibrary" }``` <br><br>Example 2 (URI copied from your browser): <br><br>```"container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/teams/mysite/MyDocumentLibrary/Forms/AllItems.aspx" }```|
428
+
| excludeLibrary | Don't index content from this library. The value is the fully qualified path to the library, which can be copied from your browser: <br><br> Example 1 (fully qualified path): <br><br>```"container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mysite.sharepoint.com/subsite1; excludeLibrary=https://mysite.sharepoint.com/subsite1/MyDocumentLibrary" }``` <br><br> Example 2 (URI copied from your browser): <br><br>```"container" : { "name" : "useQuery", "query" : "includeLibrariesInSite=https://mycompany.sharepoint.com/teams/mysite; excludeLibrary=https://mycompany.sharepoint.com/teams/mysite/MyDocumentLibrary/Forms/AllItems.aspx" }```|
429
429
| additionalColumns | Index columns from the document library. The value is a comma-separated list of column names you want to index. Use a double backslash to escape semicolons and commas in column names: <br><br> Example 1 (additionalColumns=MyCustomColumn,MyCustomColumn2): <br><br>```"container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/mysite/MyDocumentLibrary;additionalColumns=MyCustomColumn,MyCustomColumn2" }``` <br><br> Example 2 (escape characters using double backslash): <br><br> ```"container" : { "name" : "useQuery", "query" : "includeLibrary=https://mycompany.sharepoint.com/teams/mysite/MyDocumentLibrary/Forms/AllItems.aspx;additionalColumns=MyCustomColumnWith\\,,MyCustomColumnWith\\;" }```|
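
The keyword/value pairs in the table can be assembled programmatically. The sketch below is one possible approach in Python: `build_query` and `escape_column` are hypothetical helper names, and it assumes that the double backslash in the table's examples is the JSON encoding of a single backslash, so the helper inserts one backslash and lets `json.dumps` produce the doubled form.

```python
import json


def escape_column(name):
    """Prefix commas and semicolons in a column name with a backslash."""
    return name.replace(",", "\\,").replace(";", "\\;")


def build_query(include_libraries_in_site=None, include_library=None,
                exclude_library=None, additional_columns=None):
    """Join the supplied keywords into a semicolon-separated query string."""
    parts = []
    if include_libraries_in_site:
        parts.append(f"includeLibrariesInSite={include_libraries_in_site}")
    if include_library:
        parts.append(f"includeLibrary={include_library}")
    if exclude_library:
        parts.append(f"excludeLibrary={exclude_library}")
    if additional_columns:
        columns = ",".join(escape_column(c) for c in additional_columns)
        parts.append(f"additionalColumns={columns}")
    return ";".join(parts)


query = build_query(
    include_library="https://mycompany.sharepoint.com/mysite/MyDocumentLibrary",
    additional_columns=["MyCustomColumnWith,", "MyCustomColumnWith;"],
)
# json.dumps doubles each backslash, matching the escaped form in the table.
print(json.dumps({"container": {"name": "useQuery", "query": query}}))
```

Building the string in one place keeps the escaping rule out of hand-edited request bodies, where a missed backslash would silently split a column name.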
430
430
431
431
## Handling errors
432
432
433
-
By default, the SharePoint indexer stops as soon as it encounters a document with an unsupported content type (for example, an image). You can of course use the `excludedFileNameExtensions` parameter to skip certain content types. However, you may need to index documents without knowing all the possible content types in advance. To continue indexing when an unsupported content type is encountered, set the `failOnUnsupportedContentType` configuration parameter to false:
433
+
By default, the SharePoint indexer stops as soon as it encounters a document with an unsupported content type (for example, an image). You can use the `excludedFileNameExtensions` parameter to skip certain content types. However, you may need to index documents without knowing all the possible content types in advance. To continue indexing when an unsupported content type is encountered, set the `failOnUnsupportedContentType` configuration parameter to false:
434
434
435
435
```http
436
436
PUT https://[service name].search.windows.net/indexers/[indexer name]?api-version=2020-06-30-Preview