Commit 8ebfafd

SharePoint source connector: replace SharePoint app principals with Entra ID app registrations for authentication (#467)
1 parent ded99e2 commit 8ebfafd

File tree

8 files changed

+152
-152
lines changed

Lines changed: 5 additions & 2 deletions
@@ -1,6 +1,9 @@
11
- `<name>` (_required_) - A unique name for this connector.
22
- `<client-id>` (_required_) - The client ID provided by SharePoint for the app registration.
33
- `<site>` (_required_) - The base URL of the SharePoint site to connect to.
4-
- `<client-cred>` (_required_) - The client secret associated with the client ID.
4+
- `<tenant>` (_required_) - The **Directory (tenant) ID** for the Microsoft Entra ID app registration with the correct set of Microsoft Graph access permissions.
5+
- `<authority-url>` - The authentication token provider URL for the Entra ID app registration. The default is `https://login.microsoftonline.com`.
6+
- `<user-pname>` (_required_) - The UPN for the OneDrive account in the Entra ID tenant.
7+
- `<client-cred>` (_required_) - The **Client secret** for the Entra ID app registration.
58
- `<path>` - The path from which to start parsing files. The default is `Shared Documents` if not otherwise specified.
6-
- For `recursive` (source connector only), set to `true` to recursively process data from subfolders within the specified path. The default is `false` if not otherwise specified.
9+
- For `recursive`, set to `true` to recursively process data from subfolders within the specified path. The default is `false` if not otherwise specified.

snippets/general-shared-text/sharepoint-cli-api.mdx

Lines changed: 7 additions & 10 deletions
@@ -10,13 +10,10 @@ import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-d
1010

1111
The following environment variables:
1212

13-
- `SHAREPOINT_APP_CLIENT_ID` - The application (client) ID for the SharePoint app principal, represented by `--client-id` (CLI) or `client_id` (Python).
14-
- `SHAREPOINT_APP_CLIENT_SECRET` - The client secret for the SharePoint app principal, represented by `--client-cred` (CLI) or `client_cred` (Python).
15-
- `SHAREPOINT_SITE` - The SharePoint site URL, represented by `--site` (CLI) or `site` (Python).
16-
- `SHAREPOINT_PATH` - The path in the SharePoint site from which to start parsing files, represented by `--path` (CLI) or `path` (Python).
17-
18-
{/*
19-
- `SHAREPOINT_APP_PERMISSIONS_CLIENT_ID` - The associated Azure application (client) ID, represented by `--permissions-application-id` (CLI) or `permissions_application_id` (Python).
20-
- `SHAREPOINT_APP_PERMISSIONS_CLIENT_SECRET` - The client secret for the Azure application, represented by `--permissions-client-cred` (CLI) or `permissions_client_cred` (Python).
21-
- `SHAREPOINT_APP_PERMISSIONS_TENANT` - The domain name of the tenant for the Azure application, which is typically `<organization-name>.onmicrosoft.com`, and which is represented by `--permissions-tenant` (CLI) or `permissions_tenant` (Python).
22-
*/}
13+
- `ENTRA_ID_USER_PRINCIPAL_NAME` - The User Principal Name (UPN) for the target OneDrive account in the Microsoft Entra ID tenant, represented by `--user-pname` (CLI) or `user_pname` (Python).
14+
- `SHAREPOINT_SITE_URL` - The SharePoint site URL, represented by `--site` (CLI) or `site` (Python).
15+
- `SHAREPOINT_SITE_PATH` - The path in the SharePoint site from which to start parsing files, represented by `--path` (CLI) or `path` (Python).
16+
- `ENTRA_ID_APP_CLIENT_ID` - The **Application (client) ID** value for the Microsoft Entra ID app registration, represented by `--client-id` (CLI) or `client_id` (Python).
17+
- `ENTRA_ID_APP_TENANT_ID` - The **Directory (tenant) ID** value for the Entra ID app registration, represented by `--tenant` (CLI) or `tenant` (Python).
18+
- `ENTRA_ID_APP_CLIENT_SECRET` - The **Client secret** value for the Entra ID app registration, represented by `--client-cred` (CLI) or `client_cred` (Python).
19+
- `ENTRA_ID_TOKEN_AUTHORITY_URL` - The token authority URL for the Entra ID app registration (which is typically `https://login.microsoftonline.com`), represented by `--authority-url` (CLI) or `authority_url` (Python).
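
Before running the connector, it can help to confirm that the variables listed above are actually set. The following is a minimal sketch, not part of the connector itself: the function name `check_sharepoint_env` is hypothetical, and the choice of which variables to treat as mandatory is an assumption based on the settings marked as required in these snippets (the path and the authority URL are skipped because they have documented defaults).

```bash
# Hypothetical pre-flight check (not part of unstructured-ingest).
# Assumption: these five variables back the settings marked as required;
# SHAREPOINT_SITE_PATH and ENTRA_ID_TOKEN_AUTHORITY_URL have defaults.
check_sharepoint_env() {
  missing=0
  for name in \
    ENTRA_ID_USER_PRINCIPAL_NAME \
    SHAREPOINT_SITE_URL \
    ENTRA_ID_APP_CLIENT_ID \
    ENTRA_ID_APP_TENANT_ID \
    ENTRA_ID_APP_CLIENT_SECRET
  do
    # Indirect variable lookup; eval keeps this POSIX sh compatible.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  # Returns 0 when everything is set, 1 otherwise.
  return "$missing"
}
```

Calling `check_sharepoint_env || exit 1` at the top of a wrapper script stops the run early and names every missing variable on standard error.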

snippets/general-shared-text/sharepoint-platform.mdx

Lines changed: 6 additions & 3 deletions
@@ -3,6 +3,9 @@ Fill in the following fields:
33
- **Name** (_required_): A unique name for this connector.
44
- **Site URL** (_required_): The base URL of the SharePoint site to connect to.
55
- **Path** (_required_): The path from which to start parsing files, for example `Shared Documents`.
6-
- **Recursive** (source connector only): Check this box to recursively process data from subfolders within the specified path.
7-
- **Client ID** (_required_): The client ID provided by SharePoint for the app principal.
8-
- **Client Credentials** (_required_): The client secret associated with the client ID.
6+
- **Recursive**: Check this box to recursively process data from subfolders within the specified path.
7+
- **Client ID** (_required_): The **Application (client) ID** for the Microsoft Entra ID app registration with the correct set of Microsoft Graph access permissions.
8+
- **Tenant ID** (_required_): The **Directory (tenant) ID** for the Entra ID app registration.
9+
- **User Principal Name (UPN)** (_required_): The UPN for the OneDrive account in the Entra ID tenant.
10+
- **Client Credentials** (_required_): The **Client secret** for the Entra ID app registration.
11+
- **Authority URL** (_required_): The authentication token provider URL for the Entra ID app registration. The default is `https://login.microsoftonline.com`.
Lines changed: 103 additions & 111 deletions
@@ -1,117 +1,109 @@
1-
<iframe
2-
width="560"
3-
height="315"
4-
src="https://www.youtube.com/embed/HHCV7rV8fS0"
5-
title="YouTube video player"
6-
frameborder="0"
7-
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
8-
allowfullscreen
9-
></iframe>
10-
11-
- The SharePoint site URL.
1+
<Note>
2+
If you are setting up the SharePoint connector for the first time, you can skip past this note.
3+
4+
Previous versions of the SharePoint connector relied on SharePoint app principals for authentication. Current versions of the
5+
SharePoint connector no longer support these SharePoint app principals. Microsoft deprecated support for SharePoint app principals on November 27, 2023.
6+
SharePoint app principals will no longer work for SharePoint tenants that were created on or after November 1, 2024, and they will stop working
7+
for all SharePoint tenants as of April 2, 2026. [Learn more](https://learn.microsoft.com/sharepoint/dev/sp-add-ins/retirement-announcement-for-azure-acs).
8+
9+
Current versions of the SharePoint connector now rely on Microsoft Entra ID app registrations for authentication.
10+
11+
To migrate from SharePoint app principals to Entra ID app registrations, replace the following settings in your existing SharePoint connector,
12+
as listed in the requirements following this note:
13+
14+
- Replace the deprecated SharePoint app principal's application client ID value with your replacement Entra ID app registration's **Application (client) ID** value.
15+
- Replace the deprecated SharePoint app principal's client secret value with your replacement Entra ID app registration's **Client secret** value.
16+
- Add your replacement Entra ID app registration's **Directory (tenant) ID** value, token authority URL value, and the correct set of Microsoft Graph access permissions for SharePoint Online.
17+
18+
If you need migration help, get assistance from our [Slack community](https://short.unstructured.io/pzw05l7) or [contact us](https://unstructured.io/contact) directly.
19+
</Note>
20+
21+
- A SharePoint Online plan, or a Microsoft 365 or Office 365 Business or enterprise plan that includes SharePoint Online.
22+
[Learn more](https://www.microsoft.com/en-us/microsoft-365/SharePoint/compare-SharePoint-plans).
23+
[Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
24+
[Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).
25+
- A OneDrive for business plan, or a Microsoft 365 or Office 365 Business or enterprise plan that includes OneDrive.
26+
(Even if you only plan to use SharePoint Online, you still need a plan that includes OneDrive, because the SharePoint connector is built on OneDrive technology.)
27+
[Learn more](https://www.microsoft.com/microsoft-365/onedrive/compare-onedrive-plans).
28+
[Shop for business plans](https://www.microsoft.com/microsoft-365/business/compare-all-microsoft-365-business-products).
29+
[Shop for enterprise plans](https://www.microsoft.com/microsoft-365/enterprise/microsoft365-plans-and-pricing).
30+
OneDrive personal accounts, and Microsoft 365 Free, Basic, Personal, and Family plans are not supported.
31+
- The SharePoint Online and OneDrive plans must share the same Microsoft Entra ID tenant.
32+
[Learn more](https://learn.microsoft.com/microsoft-365/enterprise/subscriptions-licenses-accounts-and-tenants-for-microsoft-cloud-offerings?view=o365-worldwide).
33+
- The User Principal Name (UPN) for the OneDrive account in the Microsoft Entra ID tenant. This is typically the OneDrive account user's email address. To find a UPN:
34+
35+
1. Depending on your plan, sign in to your Microsoft 365 admin center (typically [https://admin.microsoft.com](https://admin.microsoft.com)) using your administrator credentials,
36+
or sign in to your Office 365 portal (typically [https://portal.office.com](https://portal.office.com)) using your credentials.
37+
2. In the **Users** section, click **Active users**.
38+
3. Locate the user account in the list of active users.
39+
4. The UPN is displayed in the **Username** column.
40+
41+
The following video shows how to get a UPN:
42+
43+
<iframe
44+
width="560"
45+
height="315"
46+
src="https://www.youtube.com/embed/H0yYfhfyCE0"
47+
title="YouTube video player"
48+
frameborder="0"
49+
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
50+
allowfullscreen
51+
></iframe>
52+
53+
- The SharePoint Online site URL.
1254

1355
- Site collection-level URLs typically have the format `https://<tenant>.sharepoint.com/sites/<site-collection-name>`.
1456
- Root site collection-level URLs typically have the format `https://<tenant>.sharepoint.com`.
15-
- To process all sites within a tenant, use a site URL of `https://<tenant>-admin.sharepoint.com`.
57+
- To process all sites within a SharePoint tenant, use a site URL of `https://<tenant>-admin.sharepoint.com`.
1658

1759
[Learn more](https://learn.microsoft.com/microsoft-365/community/query-string-url-tricks-sharepoint-m365).
1860

19-
- The path in the SharePoint site from which to start parsing files, for example `"Shared Documents"`. If the connector is to process all sites within the tenant, this filter will be applied to all site document libraries.
20-
- A SharePoint app principal with its application (client) ID, client secret, and the appropriate access permissions.
21-
22-
Complete the steps in the following sections, depending on whether you want to access sites at the site collection level, the
23-
root site collection level, or all sites within a tenant.
24-
25-
<Note>
26-
Two of the main factors in the following sections are the scope of access
27-
and the level of administrative permissions required to create the app principal. Tenant-wide app principals offer the broadest access
28-
but require the highest level of administrative rights, while site collection app principals are more restricted but can be created by users
29-
with lower-level permissions.
30-
</Note>
31-
32-
## Tenant-wide SharePoint app principals
33-
34-
Create a tenant-wide SharePoint app principal when you want the power and flexibility of a principal that can process all sites within a tenant.
35-
36-
SharePoint app principals that are created in the SharePoint admin center have tenant-wide scope and can potentially access all sites within the tenant.
37-
Only global or SharePoint administrators typically have access to the following URLs.
38-
39-
1. To create a tenant-wide SharePoint app principal and then get its client ID and client secret, go to the following URL:
40-
41-
`https://<tenant>-admin.sharepoint.com/_layouts/15/appregnew.aspx`
42-
43-
2. To add access permissions to a tenant-wide SharePoint app principal and then get its client ID and client secret, go to the following URL:
44-
45-
`https://<tenant>.sharepoint.com/_layouts/15/appinv.aspx`
46-
47-
3. Apply the following permissions XML to the tenant-wide SharePoint app principal:
48-
49-
```xml
50-
<AppPermissionRequests AllowAppOnlyPolicy="true">
51-
<AppPermissionRequest Scope="http://sharepoint/content/tenant" Right="FullControl" />
52-
</AppPermissionRequests>
53-
```
54-
Available `Right` settings include `Read`, `Write`, `Manage`, and `FullControl`. To learn more, see
55-
[Add-in permissions in SharePoint](https://learn.microsoft.com/sharepoint/dev/sp-add-ins/add-in-permissions-in-sharepoint).
56-
57-
[Learn how to complete these preceding steps](https://github.com/vgrem/Office365-REST-Python-Client/wiki/How-to-connect-to-SharePoint-Online-and-and-SharePoint-2013-2016-2019-on-premises--with-app-principal).
58-
Be sure to substitute the URLs and XML in the linked article with the ones in these preceding steps accordingly.
59-
60-
## Root site collection-level SharePoint app principals
61-
62-
Create a root site collection-level SharePoint app principal when you want a principal that can only access a root site collection, for example with a URL
63-
that has the format `https://<tenant>.sharepoint.com`.
64-
65-
SharePoint app principals that are created at the root site collection level have a scope limited to the root site collection. Site collection administrators can usually access the following URLs.
66-
67-
1. To create a root site collection-level SharePoint app principal and then get its client ID and client secret, go to the following URL:
68-
69-
`https://<tenant>.sharepoint.com/_layouts/15/appregnew.aspx`
70-
71-
2. To add access permissions to a root site collection-level SharePoint app principal, go to the following URL:
72-
73-
`https://<tenant>.sharepoint.com/_layouts/15/appinv.aspx`
74-
75-
3. Apply the following permissions XML to the root site collection-level SharePoint app principal:
76-
77-
```xml
78-
<AppPermissionRequests AllowAppOnlyPolicy="true">
79-
<AppPermissionRequest Scope="http://sharepoint/content/sitecollection" Right="FullControl" />
80-
</AppPermissionRequests>
81-
```
82-
83-
Available `Right` settings include `Read`, `Write`, `Manage`, and `FullControl`. To learn more, see
84-
[Add-in permissions in SharePoint](https://learn.microsoft.com/sharepoint/dev/sp-add-ins/add-in-permissions-in-sharepoint).
85-
86-
[Learn how to complete these preceding steps](https://github.com/vgrem/Office365-REST-Python-Client/wiki/How-to-connect-to-SharePoint-Online-and-and-SharePoint-2013-2016-2019-on-premises--with-app-principal).
87-
Be sure to substitute the URLs and XML in the linked article with the ones in these preceding steps accordingly.
88-
89-
## Site collection-level SharePoint app principals
90-
91-
Create a site collection-level SharePoint app principal when you want a principal that can only access a specific site collection, for example with a URL
92-
that has or starts with the format `https://<tenant>.sharepoint.com/sites/<site-collection-name>`.
93-
94-
SharePoint app principals that are created at the site collection level have the most limited scope, restricted to the specific subsite and its subsites.
95-
Site owners or those with appropriate permissions on the subsite can access the following URLs.
96-
97-
1. To create a site collection-level SharePoint app principal, go to the following URL:
98-
99-
`https://<tenant>.sharepoint.com/sites/<site-collection-name>/_layouts/15/appregnew.aspx`
100-
101-
2. To add access permissions to a site collection-level SharePoint app principal, go to the following URL:
102-
103-
`https://<tenant>.sharepoint.com/sites/<site-collection-name>/_layouts/15/appinv.aspx`
104-
105-
3. Apply the following permissions XML to the site collection-level SharePoint app principal:
106-
107-
```xml
108-
<AppPermissionRequests AllowAppOnlyPolicy="true">
109-
<AppPermissionRequest Scope="http://sharepoint/content/sitecollection" Right="FullControl" />
110-
</AppPermissionRequests>
111-
```
112-
113-
Available `Right` settings include `Read`, `Write`, `Manage`, and `FullControl`. To learn more, see
114-
[Add-in permissions in SharePoint](https://learn.microsoft.com/sharepoint/dev/sp-add-ins/add-in-permissions-in-sharepoint).
115-
116-
[Learn how to complete these preceding steps](https://github.com/vgrem/Office365-REST-Python-Client/wiki/How-to-connect-to-SharePoint-Online-and-and-SharePoint-2013-2016-2019-on-premises--with-app-principal).
117-
Be sure to substitute the URLs and XML in the linked article with the ones in these preceding steps accordingly.
61+
- The path in the SharePoint Online site from which to start parsing files, for example `"Shared Documents"`. If the SharePoint connector is to process all sites within the tenant, this filter will be applied to all site document libraries.
62+
63+
The following video shows how to get the site URL and a path within the site:
64+
65+
<iframe
66+
width="560"
67+
height="315"
68+
src="https://www.youtube.com/embed/E3fRwJU-KTc"
69+
title="YouTube video player"
70+
frameborder="0"
71+
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
72+
allowfullscreen
73+
></iframe>
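
The three site URL formats described above determine how much of the tenant the connector processes. As an illustration only, the following sketch classifies a URL by matching it against those formats. The function name `classify_site_url` and its output labels are hypothetical, and the patterns are a simplification that assumes the typical `sharepoint.com` formats listed above.

```bash
# Hypothetical helper: report which documented URL format a site URL matches.
# Assumes the typical formats only; custom domains are reported as "unknown".
classify_site_url() {
  case "$1" in
    # https://<tenant>-admin.sharepoint.com processes all sites in the tenant.
    https://*-admin.sharepoint.com|https://*-admin.sharepoint.com/)
      echo "tenant-wide" ;;
    # https://<tenant>.sharepoint.com/sites/<site-collection-name>
    https://*.sharepoint.com/sites/?*)
      echo "site-collection" ;;
    # https://<tenant>.sharepoint.com
    https://*.sharepoint.com|https://*.sharepoint.com/)
      echo "root-site-collection" ;;
    *)
      echo "unknown" ;;
  esac
}
```

For example, `classify_site_url "https://contoso.sharepoint.com/sites/marketing"` prints `site-collection`, where `contoso` and `marketing` are placeholder names.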
74+
75+
- The **Application (client) ID**, **Directory (tenant) ID**, and **Client secret** for the Microsoft Entra ID app registration with
76+
the correct set of Microsoft Graph access permissions. These permissions include:
77+
78+
- `Sites.ReadWrite.All` (if both reading and writing are needed)
79+
- `User.Read.All`
80+
[Learn more](https://learn.microsoft.com/answers/questions/2116616/service-principal-access-to-sharepoint-online).
81+
1. [Create an Entra ID app registration](https://learn.microsoft.com/entra/identity-platform/quickstart-register-app?pivots=portal).
82+
2. [Add Graph access permissions to an app registration](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions?pivots=portal#add-permissions-to-an-application).
83+
3. [Grant consent for the added Graph permissions](https://learn.microsoft.com/entra/identity-platform/howto-update-permissions?pivots=portal#grant-consent-for-the-added-permissions-for-the-enterprise-application).
84+
85+
The following video shows how to create an Entra ID app registration:
86+
87+
<iframe
88+
width="560"
89+
height="315"
90+
src="https://www.youtube.com/embed/aBAY-LKLPSo"
91+
title="YouTube video player"
92+
frameborder="0"
93+
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
94+
allowfullscreen
95+
></iframe>
96+
97+
The following video shows how to add the correct set of Graph access permissions to the Entra ID app registration:
98+
99+
<iframe
100+
width="560"
101+
height="315"
102+
src="https://www.youtube.com/embed/X7fnRYyxy0Q"
103+
title="YouTube video player"
104+
frameborder="0"
105+
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
106+
allowfullscreen
107+
></iframe>
108+
109+
- The token authority URL for your Microsoft Entra ID app registration. This is typically `https://login.microsoftonline.com`.
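
One way to sanity-check the client ID, tenant ID, client secret, and authority URL together is to request a Microsoft Graph token with the OAuth 2.0 client credentials flow. The sketch below is an assumption about how you might verify the values yourself; it does not describe how the connector authenticates internally. The function names `token_endpoint` and `request_graph_token` are hypothetical, and the environment variable names are the ones used for the CLI elsewhere in this change.

```bash
# Hypothetical helper: build the Microsoft identity platform v2.0 token
# endpoint from an authority URL ($1) and a Directory (tenant) ID ($2).
token_endpoint() {
  # ${1%/} drops a single trailing slash from the authority URL, if present.
  printf '%s/%s/oauth2/v2.0/token\n' "${1%/}" "$2"
}

# Hypothetical helper: request an app-only Graph token with curl.
# On success the response is a JSON document containing an access_token field.
request_graph_token() {
  curl --silent --show-error --request POST \
    --data-urlencode "client_id=$ENTRA_ID_APP_CLIENT_ID" \
    --data-urlencode "client_secret=$ENTRA_ID_APP_CLIENT_SECRET" \
    --data-urlencode "scope=https://graph.microsoft.com/.default" \
    --data-urlencode "grant_type=client_credentials" \
    "$(token_endpoint \
      "${ENTRA_ID_TOKEN_AUTHORITY_URL:-https://login.microsoftonline.com}" \
      "$ENTRA_ID_APP_TENANT_ID")"
}
```

If `request_graph_token` returns an error document instead of a token, the likely causes are a mistyped tenant ID, an expired client secret, or missing admin consent for the Graph permissions.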

snippets/source_connectors/sharepoint.sh.mdx

Lines changed: 10 additions & 10 deletions
@@ -3,19 +3,19 @@
33

44
unstructured-ingest \
55
sharepoint \
6-
--client-id $SHAREPOINT_APP_CLIENT_ID \
7-
--client-cred $SHAREPOINT_APP_CLIENT_SECRET \
8-
--site $SHAREPOINT_SITE \
9-
--path $SHAREPOINT_PATH \
10-
--no-omit-files \
11-
--omit-pages \
12-
--omit-lists \
13-
--output-dir $LOCAL_FILE_OUTPUT_DIR \
14-
--num-processes 2 \
15-
--verbose \
6+
--client-cred $ENTRA_ID_APP_CLIENT_SECRET \
7+
--client-id $ENTRA_ID_APP_CLIENT_ID \
8+
--user-pname $ENTRA_ID_USER_PRINCIPAL_NAME \
9+
--tenant $ENTRA_ID_APP_TENANT_ID \
10+
--authority-url $ENTRA_ID_TOKEN_AUTHORITY_URL \
11+
--site $SHAREPOINT_SITE_URL \
12+
--path $SHAREPOINT_SITE_PATH \
13+
--recursive \
14+
--download-dir $LOCAL_FILE_DOWNLOAD_DIR \
1615
--partition-by-api \
1716
--api-key $UNSTRUCTURED_API_KEY \
1817
--partition-endpoint $UNSTRUCTURED_API_URL \
1918
--strategy hi_res \
19+
--output-dir $LOCAL_FILE_OUTPUT_DIR \
2020
--additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}"
2121
```
