|
5 | 5 | "id": "aba4346f", |
6 | 6 | "metadata": {}, |
7 | 7 | "source": [ |
8 | | - "## Document Permissions in Azure AI Search" |
| 8 | + "# Document level access in Azure AI Search using the indexer pull APIs\n", |
| 9 | + "\n", |
| 10 | + "In Azure AI Search, you can use an indexer to pull content into a search index. This notebook shows you how to index blobs that have access control lists (ACLs) in Azure Data Lake Storage (ADLS) Gen2, and then query the index to return only those results that the user is authorized to view.\n", |
| 11 | + "\n", |
| 12 | + "The security principal behind the query access token determines the \"user\". The ACLs on folders and files determine whether the user has authorization to the content, and that metadata is pulled into the index along with document content. Internally, the search engine filters out any documents that aren't associated with the security principal.\n", |
| 13 | + "\n", |
| 14 | + "This feature is currently in preview.\n", |
| 15 | + "\n", |
| 16 | + "For an alternative approach that uses the push APIs to index any data, see [Quickstart-Document-Permissions-Push-API](../Quickstart-Document-Permissions-Push-API/document-permissions-push-api.ipynb).\n", |
| 17 | + "\n", |
| 18 | + "\n", |
| 19 | + "## Prerequisites\n", |
| 20 | + "\n", |
| 21 | + "+ Azure AI Search, basic tier or higher, with a [system-assigned managed identity](https://learn.microsoft.com/azure/search/search-howto-managed-identities-data-sources) and [role-based access control](https://learn.microsoft.com/azure/search/search-security-enable-roles).\n", |
| 22 | + "\n", |
| 23 | + "+ Azure Storage, general purpose account, with a [hierarchical namespace](https://learn.microsoft.com/azure/storage/blobs/create-data-lake-storage-account).\n", |
| 24 | + "\n", |
| 25 | + "+ Folders and files, where each file has an [access control list specified](https://learn.microsoft.com/azure/storage/blobs/data-lake-storage-access-control). We recommend group IDs.\n", |
| 26 | + "\n", |
| 27 | + "We recommend creating a virtual environment to run this sample code. In Visual Studio Code, open the command palette (Ctrl+Shift+P) to create an environment. This notebook was tested on Python 3.10.\n", |
| 28 | + "\n", |
| 29 | + "## Permissions\n", |
| 30 | + "\n", |
| 31 | + "+ On Azure Storage, **Storage Blob Data Reader** permissions are required for both the search service identity and your user account, because you are testing locally. Your user account also needs **Storage Blob Data Contributor**, because this sample includes code that creates and configures the container and blobs used in this demonstration.\n", |
| 32 | + "\n", |
| 33 | + "+ On Azure AI Search, assign yourself **Search Service Contributor**, **Search Index Data Contributor**, and **Search Index Data Reader** permissions to create objects and run queries. For more information, see [Connect to Azure AI Search using roles](https://learn.microsoft.com/azure/search/search-security-rbac) and [Quickstart: Connect without keys for local testing](https://learn.microsoft.com/azure/search/search-get-started-rbac).\n", |
| 34 | + "\n", |
| 35 | + "## Limitations\n", |
| 36 | + "\n", |
| 37 | + "+ Parsing indexer options aren't currently supported." |
9 | 38 | ] |
10 | 39 | }, |
11 | 40 | { |
12 | 41 | "cell_type": "markdown", |
13 | 42 | "id": "f445040a", |
14 | 43 | "metadata": {}, |
15 | 44 | "source": [ |
16 | | - "## 1. Load Connections" |
| 45 | + "## Set up connections\n", |
| 46 | + "\n", |
| 47 | + "Save the `sample.env` file as `.env` and then modify the environment variables to use your Azure endpoints. You need endpoints for:\n", |
| 48 | + "\n", |
| 49 | + "+ Azure AI Search\n", |
| 50 | + "+ Azure Storage\n", |
| 51 | + "\n", |
| 52 | + "For Azure AI Search, find the endpoint in the [Azure portal](https://portal.azure.com), in the **Essentials** section of the Overview page.\n", |
| 53 | + "\n", |
| 54 | + "For Azure Storage, follow the guidance in [Get storage account configuration information](https://learn.microsoft.com/azure/storage/common/storage-account-get-info).\n", |
| 55 | + "\n", |
| 56 | + "## Load connections\n", |
| 57 | + "\n", |
| 58 | + "Load the environment variables to set up connections and object names." |
17 | 59 | ] |
18 | 60 | }, |
19 | 61 | { |
|
32 | 74 | "# The following variables from your .env file are used in this notebook\n", |
33 | 75 | "endpoint = os.environ[\"AZURE_SEARCH_ENDPOINT\"]\n", |
34 | 76 | "credential = DefaultAzureCredential()\n", |
35 | | - "index_name = os.getenv(\"AZURE_SEARCH_INDEX\", \"document-permissions-sample\")\n", |
36 | | - "indexer_name = os.getenv(\"AZURE_SEARCH_INDEXER\", \"document-permissions-sample-indexer\")\n", |
37 | | - "datasource_name = os.getenv(\"AZURE_SEARCH_DATASOURCE\", \"document-permissions-sample-datasource\")\n", |
38 | | - "adls_gen2_account_name = os.getenv(\"AZURE_STORAGE_ACCOUNT_NAME\", \"documentpermissionssample\")\n", |
39 | | - "adls_gen2_container_name = os.getenv(\"AZURE_STORAGE_CONTAINER_NAME\", \"documentpermissionssample\")\n", |
| 77 | + "index_name = os.getenv(\"AZURE_SEARCH_INDEX\", \"document-permissions-indexer-idx\")\n", |
| 78 | + "indexer_name = os.getenv(\"AZURE_SEARCH_INDEXER\", \"document-permissions-indexer-idxr\")\n", |
| 79 | + "datasource_name = os.getenv(\"AZURE_SEARCH_DATASOURCE\", \"document-permissions-indexer-ds\")\n", |
| 80 | + "adls_gen2_account_name = os.getenv(\"AZURE_STORAGE_ACCOUNT_NAME\")\n", |
| 81 | + "adls_gen2_container_name = os.getenv(\"AZURE_STORAGE_CONTAINER_NAME\")\n", |
40 | 82 | "adls_gen2_connection_string = os.environ[\"AZURE_STORAGE_CONNECTION_STRING\"]\n", |
41 | 83 | "adls_gen2_resource_id = os.environ[\"AZURE_STORAGE_RESOURCE_ID\"]\n", |
42 | 84 | "token_provider = get_bearer_token_provider(credential, \"https://search.azure.com/.default\")" |
|
47 | 89 | "id": "2d46b940", |
48 | 90 | "metadata": {}, |
49 | 91 | "source": [ |
50 | | - "## 2. Create Index" |
| 92 | + "## Create an index\n", |
| 93 | + "\n", |
| 94 | + "The search index must include fields for your content and for permission metadata. Assign the new permission filter option to a string field and make sure the field is filterable.\n", |
| 95 | + "\n", |
| 96 | + "For local testing, `retrievable` can be **true**, but be sure to change it back to **false** if you make the solution available to others." |
51 | 97 | ] |
52 | 98 | }, |
53 | 99 | { |
|
66 | 112 | " fields=[\n", |
67 | 113 | " SearchField(name=\"id\", type=\"Edm.String\", key=True, filterable=True, sortable=True),\n", |
68 | 114 | " SearchField(name=\"content\", type=\"Edm.String\", searchable=True, filterable=False, sortable=False),\n", |
69 | | - " SearchField(name=\"oids\", type=\"Collection(Edm.String)\", filterable=True, permission_filter=PermissionFilter.USER_IDS),\n", |
70 | | - " SearchField(name=\"groups\", type=\"Collection(Edm.String)\", filterable=True, permission_filter=PermissionFilter.GROUP_IDS),\n", |
| 115 | + " SearchField(name=\"oids\", type=\"Collection(Edm.String)\", filterable=True, retrievable=True, permission_filter=PermissionFilter.USER_IDS),\n", |
| 116 | + " SearchField(name=\"groups\", type=\"Collection(Edm.String)\", filterable=True, retrievable=True, permission_filter=PermissionFilter.GROUP_IDS),\n", |
71 | 117 | " SearchField(name=\"metadata_storage_path\", type=\"Edm.String\", searchable=True),\n", |
72 | 118 | " SearchField(name=\"metadata_storage_name\", type=\"Edm.String\", searchable=True)\n", |
73 | 119 | " ],\n", |
|
83 | 129 | "id": "2b8945a2", |
84 | 130 | "metadata": {}, |
85 | 131 | "source": [ |
86 | | - "## 3. Create data source" |
| 132 | + "## Create a data source\n", |
| 133 | + "\n", |
| 134 | + "Set the `IndexerPermissionOption` so that the indexer knows to retrieve the permission metadata." |
87 | 135 | ] |
88 | 136 | }, |
89 | 137 | { |
|
113 | 161 | "id": "ff5b912d", |
114 | 162 | "metadata": {}, |
115 | 163 | "source": [ |
116 | | - "## 4. Get group ids" |
| 164 | + "## Get group IDs\n", |
| 165 | + "\n", |
| 166 | + "This step calls the Graph APIs to get a few group IDs for your Microsoft Entra identity. Your group IDs will be added to the access control list of the objects created in the next step. Two group identifiers are retrieved. Each one is assigned to a different file." |
117 | 167 | ] |
118 | 168 | }, |
119 | 169 | { |
|
136 | 186 | "id": "20588dc3", |
137 | 187 | "metadata": {}, |
138 | 188 | "source": [ |
139 | | - "## 5. Upload sample directory and file" |
| 189 | + "## Upload sample directory and file\n", |
| 190 | + "\n", |
| 191 | + "This step creates the container and folders, and then uploads the files into Azure Storage. It assigns your group IDs to the access control list for each folder." |
140 | 192 | ] |
141 | 193 | }, |
142 | 194 | { |
|
179 | 231 | "id": "ca6de2ad", |
180 | 232 | "metadata": {}, |
181 | 233 | "source": [ |
182 | | - "## 6. Run indexer" |
| 234 | + "## Run the indexer\n", |
| 235 | + "\n", |
| 236 | + "Start the indexer to run all operations, from data retrieval to indexing. Any connection errors or permission problems become evident here." |
183 | 237 | ] |
184 | 238 | }, |
185 | 239 | { |
|
210 | 264 | "id": "987dd496", |
211 | 265 | "metadata": {}, |
212 | 266 | "source": [ |
213 | | - "## 7. Search sample data using x-ms-query-source-authorization " |
| 267 | + "## Search sample data using x-ms-query-source-authorization\n", |
| 268 | + "\n", |
| 269 | + "This query uses an empty search string (`*`) to provide an unqualified search. It returns the file name and permission metadata associated with each file. Notice that each file is associated with a different group ID." |
214 | 270 | ] |
215 | 271 | }, |
216 | 272 | { |
|
233 | 289 | "id": "c712ab8c", |
234 | 290 | "metadata": {}, |
235 | 291 | "source": [ |
236 | | - "## 8. Search sample data without x-ms-query-source-authorization " |
| 292 | + "## Search sample data without x-ms-query-source-authorization\n", |
| 293 | + "\n", |
| 294 | + "This step demonstrates the user experience when the query omits the `x-ms-query-source-authorization` header, so authorization fails. No results are returned in the response." |
237 | 295 | ] |
238 | 296 | }, |
239 | 297 | { |
|
250 | 308 | "for result in results:\n", |
251 | 309 | " print(f\"Path: {result['metadata_storage_path']}, OID: {result['oids']}, Group: {result['groups']}\")" |
252 | 310 | ] |
| 311 | + }, |
| 312 | + { |
| 313 | + "cell_type": "markdown", |
| 314 | + "id": "e1ac3c84", |
| 315 | + "metadata": {}, |
| 316 | + "source": [ |
| 317 | + "## Next steps\n", |
| 318 | + "\n", |
| 319 | + "To learn more, see [Document-level access control in Azure AI Search](https://learn.microsoft.com/azure/search/search-document-level-access-overview)." |
| 320 | + ] |
253 | 321 | } |
254 | 322 | ], |
255 | 323 | "metadata": { |
|
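A note on the connection-loading change in this diff: `adls_gen2_account_name` and `adls_gen2_container_name` now come from `os.getenv(...)` with no default, so a missing `.env` entry yields `None` and may only surface as an error in a later cell. The following is a minimal sketch, not part of the notebook, of a check that could run right after the variables are loaded. The helper name `missing_settings` and the `REQUIRED` tuple are hypothetical; the variable names themselves are taken from the diff.

```python
import os

# Environment variables that the notebook reads without a fallback value.
# The names come from the diff; the grouping into REQUIRED is an assumption.
REQUIRED = (
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_STORAGE_ACCOUNT_NAME",
    "AZURE_STORAGE_CONTAINER_NAME",
    "AZURE_STORAGE_CONNECTION_STRING",
    "AZURE_STORAGE_RESOURCE_ID",
)


def missing_settings(env=os.environ, required=REQUIRED):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]
```

Calling `missing_settings()` after loading `.env` returns an empty list when every required value is present. A non-empty list names the entries to add before running the data source and upload cells.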