
Commit f55247b

Update indexing guide for OneLake files
Updated the date and clarified the requirements for using a shared private link between Azure AI Search and a Microsoft Fabric workspace.
1 parent cfa72b7 commit f55247b

1 file changed: +23 -6 lines changed

articles/search/search-how-to-index-onelake-files.md

Lines changed: 23 additions & 6 deletions
````diff
@@ -7,7 +7,7 @@ ms.author: gimondra
 manager: nitinme
 ms.service: azure-ai-search
 ms.topic: how-to
-ms.date: 09/17/2025
+ms.date: 09/26/2025
 ms.custom:
 - build-2024
 - ignite-2024
````
````diff
@@ -88,7 +88,12 @@ The following OneLake shortcuts are supported by the OneLake files indexer:
 
 + This indexer doesn't support SQL queries. The `"query"` property in the data source configuration is used only to optionally specify the folder or shortcut to access.
 
-+ Ingesting files from the **My Workspace** workspace in OneLake isn't supported, because it's a personal repository for each user.
++ Ingesting files from the **My Workspace** workspace in OneLake isn't supported, because it's a personal repository for each user.
+
++ Microsoft Purview sensitivity labels applied through Data Map aren't currently supported. If sensitivity labels are applied to OneLake artifacts through [Microsoft Purview Data Map](/purview/data-map-sensitivity-labels-apply), the indexer might fail to run. To work around this restriction, ask the IT team in your organization that manages Purview sensitivity labels and Data Map configurations to grant an exception.
+
++ Workspace role-based permissions in OneLake can restrict indexer access to files. Make sure that the managed identity (service principal) of your Azure AI Search service has sufficient permissions on the files you want to index in the target [Microsoft Fabric workspace](/fabric/fundamentals/workspaces).
+
 
 ## Prepare data for indexing
 
````
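The `"query"` limitation above can be made concrete with a minimal Python sketch. It assumes, as the data source steps in this article describe, that the `container` object takes the lakehouse GUID as `"name"` and an optional folder or shortcut path as `"query"`. The helper `build_container`, its SQL check, and the sample values are hypothetical. The check is a client-side guard written for illustration, not behavior the service documents.

```python
from typing import Optional


def build_container(lakehouse_guid: str, folder_or_shortcut: Optional[str] = None) -> dict:
    """Return a data source "container" object for the OneLake files indexer (sketch).

    "name" holds the lakehouse GUID. "query" is optional and, for this indexer,
    names a lakehouse folder or shortcut rather than a SQL statement.
    """
    container = {"name": lakehouse_guid}
    path = (folder_or_shortcut or "").strip()
    if not path:
        # No folder or shortcut given: only the lakehouse GUID is sent.
        return container

    # Hypothetical client-side guard: the indexer doesn't support SQL queries,
    # so reject input that looks like one before it reaches the service.
    if path.lower().startswith(("select ", "with ")):
        raise ValueError("query must be a folder or shortcut path, not a SQL statement")
    container["query"] = path
    return container
```

For example, `build_container("11111111-1111-1111-1111-111111111111", "reports")` returns a dict whose `"name"` is the GUID and whose `"query"` is `"reports"`.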
````diff
@@ -162,19 +167,23 @@ The minimum role assignment for your search service identity is Contributor.
 
 :::image type="content" source="media/search-how-to-index-onelake-files/add-user-assigned-managed-identity.png" alt-text="Screenshot showing a Contributor role assignment for a search service user-assigned managed identity in the Azure portal." lightbox="media/search-how-to-index-onelake-files/add-user-assigned-managed-identity.png":::
 
+## Configure a shared private link (required if you use a Fabric workspace-level private link)
+
+If your Fabric workspace is secured with a [private link](/fabric/security/security-workspace-level-private-links-overview), Azure AI Search can't access your lakehouse data over the public internet, and you can't configure the indexer or its dependencies, such as the data source. To enable access, configure a [shared private link](search-indexer-howto-access-private.md) between Azure AI Search and your Fabric workspace.
+
 ## Define the data source
 
-A data source is defined as an independent resource so that it can be used by multiple indexers.
+A data source is defined as an independent resource so that it can be used by multiple indexers.
 
 1. Use the [Create or update a data source REST API](/rest/api/searchservice/data-sources/create-or-update) to set its definition. These are the most significant steps of the definition.
 
 1. Set `"type"` to `"onelake"` (required).
 
 1. Get the Microsoft Fabric workspace GUID and the lakehouse GUID:
 
-+ In Power BI, open the lakehouse you'd like to import data from. Notice the lakehouse URL in the browser. It should look similar to this example: "https://msit.powerbi.com/groups/00000000-0000-0000-0000-000000000000/lakehouses/11111111-1111-1111-1111-111111111111". The URL contains both the workspace GUID and the lakehouse GUID.
++ In Power BI, open the lakehouse you'd like to import data from. Notice the lakehouse URL in the browser. It should look similar to this example: "https://msit.powerbi.com/groups/00000000-0000-0000-0000-000000000000/lakehouses/11111111-1111-1111-1111-111111111111". The URL contains both the workspace GUID and the lakehouse GUID. If the Fabric workspace is secured with a private link, the URL starts with "https://{FabricWorkspaceGuid}.z{xy}.blob.fabric.microsoft.com" instead.
 
-+ Copy the workspace GUID, which is listed to the right of "groups" in the URL. In this example, it would be 00000000-0000-0000-0000-000000000000. In your REST file, create an environment variable for `{FabricWorkspaceGuid}` and set it to the workspace GUID.
++ Copy the workspace GUID, which is listed to the right of "groups" in the URL. In this example, it would be 00000000-0000-0000-0000-000000000000. In your REST file, create an environment variable for `{FabricWorkspaceGuid}` and set it to the workspace GUID. If your workspace uses a private link, copy the workspace GUID from the host name instead: it's the `{FabricWorkspaceGuid}` label at the start of "https://{FabricWorkspaceGuid}.z{xy}.blob.fabric.microsoft.com".
 
 :::image type="content" source="media/search-how-to-index-onelake-files/fabric-guid.png" alt-text="Screenshot of the Fabric workspace GUID in the Azure portal." lightbox="media/search-how-to-index-onelake-files/fabric-guid.png" :::
 
````
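The two URL forms described above can be parsed with a short Python sketch. It assumes the Power BI form `.../groups/{workspace GUID}/lakehouses/{lakehouse GUID}` and the private link host `{FabricWorkspaceGuid}.z{xy}.blob.fabric.microsoft.com`, with the workspace GUID as the first label of the host name. The GUID in the host name might be written without dashes, so the sketch normalizes both spellings with Python's `uuid.UUID`. That dash handling is an assumption that this article doesn't state. The function name `parse_lakehouse_url` is hypothetical, and the lakehouse GUID is returned only when the path contains `lakehouses/<GUID>`.

```python
import uuid
from urllib.parse import urlparse

PRIVATE_LINK_SUFFIX = ".blob.fabric.microsoft.com"


def parse_lakehouse_url(url: str) -> dict:
    """Extract the Fabric workspace GUID (and lakehouse GUID, if present) from a URL.

    Sketch assuming two URL shapes:
      public:       https://<host>/groups/<workspace GUID>/lakehouses/<lakehouse GUID>
      private link: https://<workspace GUID>.z<xy>.blob.fabric.microsoft.com/...
    """
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    result = {}

    # Public form: each GUID is the path segment right after its marker.
    for key, marker in (("workspace", "groups"), ("lakehouse", "lakehouses")):
        if marker in segments:
            position = segments.index(marker)
            if position + 1 < len(segments):
                result[key] = str(uuid.UUID(segments[position + 1]))

    # Private link form: the workspace GUID is the first label of the host name.
    host = parsed.hostname or ""
    if host.endswith(PRIVATE_LINK_SUFFIX):
        # Assumption: the label may omit dashes; uuid.UUID accepts both spellings.
        result["workspace"] = str(uuid.UUID(host.split(".")[0]))

    if "workspace" not in result:
        raise ValueError(f"no workspace GUID found in {url!r}")
    return result
```

With the example URL from this article, the function returns `00000000-0000-0000-0000-000000000000` as the workspace GUID and `11111111-1111-1111-1111-111111111111` as the lakehouse GUID.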
````diff
@@ -190,6 +199,14 @@ A data source is defined as an independent resource so that it can be used by mu
 }
 ```
 
+If you use a [shared private link](search-indexer-howto-access-private.md), configure the managed identity with the following connection string. It differs from the one used when Azure AI Search connects over the public internet: it uses `WorkspaceEndpoint` instead of `ResourceId`, and its value is the private link workspace URL. This applies to both system-assigned and user-assigned managed identity setups.
+
+```json
+"credentials": {
+    "connectionString": "WorkspaceEndpoint=https://{FabricWorkspaceGuid}.z{xy}.blob.fabric.microsoft.com"
+}
+```
+
 1. Set `"container.name"` to the lakehouse GUID, replacing `{LakehouseGuid}` with the value you copied in the previous step. Use `"query"` to optionally specify a lakehouse subfolder or shortcut.
 
 ```json
````
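The following Python sketch shows how to choose between the two connection strings. Only the `WorkspaceEndpoint=` form appears in this change. The public-internet form `ResourceId={FabricWorkspaceGuid}` is an assumption, because the lines that define it fall outside this diff. `build_credentials` is a hypothetical helper. It takes the private link endpoint exactly as you configured it instead of computing the `z{xy}` label.

```python
from typing import Optional

PRIVATE_LINK_SUFFIX = ".blob.fabric.microsoft.com"


def build_credentials(workspace_guid: str, private_link_endpoint: Optional[str] = None) -> dict:
    """Return the data source "credentials" object (sketch).

    Without a shared private link, assume the ResourceId form built from the
    workspace GUID. With one, use WorkspaceEndpoint and the private link
    workspace URL; the workspace GUID is then already part of that URL.
    """
    if private_link_endpoint is None:
        # Assumed public-internet form; it isn't shown in this change.
        return {"connectionString": f"ResourceId={workspace_guid}"}

    # Drop a trailing slash so the value matches the documented form.
    endpoint = private_link_endpoint.rstrip("/")
    if not (endpoint.startswith("https://") and endpoint.endswith(PRIVATE_LINK_SUFFIX)):
        raise ValueError(
            "expected https://{FabricWorkspaceGuid}.z{xy}.blob.fabric.microsoft.com"
        )
    return {"connectionString": f"WorkspaceEndpoint={endpoint}"}
```

The same `credentials` object applies whether the indexer authenticates with a system-assigned or a user-assigned managed identity. Only the connection string changes with the network path.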
````diff
@@ -199,7 +216,7 @@ A data source is defined as an independent resource so that it can be used by mu
 }
 ```
 
-1. Set the authentication method using the user-assigned managed identity, or skip to the next step for a system-assigned managed identity.
+1. Set the authentication method using the user-assigned managed identity, or skip to the next step for a system-assigned managed identity.
 
 ```json
 {
````
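The following Python sketch shows how the pieces fit into one Create or Update Data Source call. It builds the request with the standard library but doesn't send it. The service name, data source name, API version, and admin key are placeholders you supply. The `identity` object for a user-assigned managed identity follows the `#Microsoft.Azure.Search.DataUserAssignedIdentity` shape that Azure AI Search uses for other indexer data sources. Treat that shape as an assumption and confirm it against the REST API reference, because the corresponding JSON for this article isn't shown in this diff. `build_data_source_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Optional


def build_data_source_request(
    service_name: str,
    data_source_name: str,
    api_version: str,
    admin_key: str,
    credentials: dict,
    container: dict,
    user_assigned_identity: Optional[str] = None,
) -> urllib.request.Request:
    """Prepare, but don't send, a Create or Update Data Source request (sketch)."""
    body = {
        "name": data_source_name,
        "type": "onelake",  # required for the OneLake files indexer
        "credentials": credentials,
        "container": container,
    }
    if user_assigned_identity:
        # Assumed shape for a user-assigned managed identity. For a
        # system-assigned identity, this sketch omits "identity".
        body["identity"] = {
            "@odata.type": "#Microsoft.Azure.Search.DataUserAssignedIdentity",
            "userAssignedIdentity": user_assigned_identity,
        }
    url = (
        f"https://{service_name}.search.windows.net/datasources/"
        f"{data_source_name}?api-version={api_version}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="PUT",
    )
```

To send the prepared request, pass it to `urllib.request.urlopen`. Keeping request construction separate from sending makes it easy to inspect the JSON body before it reaches the service.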
