|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "810ce279", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# Document-level access example using the push document APIs\n", |
| 9 | + "\n", |
| 10 | + "In Azure AI Search, you can upload any JSON document payload to a search index for indexing. This notebook shows you how to index documents that contain [user access permissions at the document level](https://learn.microsoft.com/azure/search/search-document-level-access-overview), and then query the index to return only those results that the user is authorized to view.\n", |
| 11 | + "\n", |
| 12 | + "The security principal behind the query access token determines the \"user\". The permission metadata in the document determines whether the user is authorized to view the content. Internally, the search engine filters out any documents that aren't associated with the security principal.\n", |
| 13 | + "\n", |
| 14 | + "This feature is currently in preview.\n", |
| 15 | + "\n", |
| 16 | + "For an alternative approach using indexers and the pull API, see [Quickstart-Document-Permissions-Pull-API](../Quickstart-Document-Permissions-Pull-API/document-permissions-pull-api.ipynb).\n" |
| 17 | + ] |
| 18 | + }, |
| 19 | + { |
| 20 | + "cell_type": "markdown", |
| 21 | + "id": "b6585426", |
| 22 | + "metadata": {}, |
| 23 | + "source": [ |
| 24 | + "## Prerequisites\n", |
| 25 | + "\n", |
| 26 | + "+ Azure AI Search, with [role-based access control](https://learn.microsoft.com/azure/search/search-security-enable-roles).\n", |
| 27 | + "\n", |
| 28 | + "## Permissions\n", |
| 29 | + "\n", |
| 30 | + "This walkthrough uses Microsoft Entra ID authentication and authorization.\n", |
| 31 | + "\n", |
| 32 | + "On Azure AI Search, you must have role assignments to create objects and run queries:\n", |
| 33 | + "\n", |
| 34 | + "+ **Search Service Contributor**\n", |
| 35 | + "+ **Search Index Data Contributor**\n", |
| 36 | + "+ **Search Index Data Reader**\n", |
| 37 | + "\n", |
| 38 | + "For more information, see [Connect to Azure AI Search using roles](https://learn.microsoft.com/azure/search/search-security-rbac) and [Quickstart: Connect without keys for local testing](https://learn.microsoft.com/azure/search/search-get-started-rbac).\n", |
| 39 | + "\n", |
| 40 | + "## Set the environment variables\n", |
| 41 | + "\n", |
| 42 | + "1. Rename `sample.env` to `.env`.\n", |
| 43 | + "1. In the `.env` file, provide a full endpoint to your search service (https://your-search-service.search.windows.net).\n", |
| 44 | + "1. Replace the default index name if you want a different name.\n", |
| 45 | + "\n", |
| 46 | + "## Load connections\n", |
| 47 | + "\n", |
| 48 | + "We recommend creating a virtual environment to run this sample code. In Visual Studio Code, open the Command Palette (Ctrl+Shift+P) to create an environment. This notebook was tested on Python 3.10.\n", |
| 49 | + "\n", |
| 50 | + "Once your environment is created, load the environment variables." |
| 51 | + ] |
| 52 | + }, |
| 53 | + { |
| 54 | + "cell_type": "code", |
| 55 | + "execution_count": null, |
| 56 | + "id": "2975a7f5", |
| 57 | + "metadata": {}, |
| 58 | + "outputs": [], |
| 59 | + "source": [ |
| 60 | + "from dotenv import load_dotenv\n", |
| 61 | + "from azure.identity import DefaultAzureCredential, get_bearer_token_provider\n", |
| 62 | + "import os\n", |
| 63 | + "\n", |
| 64 | + "load_dotenv(override=True) # take environment variables from .env.\n", |
| 65 | + "\n", |
| 66 | + "# The following variables from your .env file are used in this notebook\n", |
| 67 | + "endpoint = os.environ[\"AZURE_SEARCH_ENDPOINT\"]\n", |
| 68 | + "credential = DefaultAzureCredential()\n", |
| 69 | + "index_name = os.getenv(\"AZURE_SEARCH_INDEX\")\n", |
| 70 | + "token_provider = get_bearer_token_provider(credential, \"https://search.azure.com/.default\")\n" |
| 71 | + ] |
| 72 | + }, |
| 73 | + { |
| 74 | + "cell_type": "markdown", |
| 75 | + "id": "9327cf01", |
| 76 | + "metadata": {}, |
| 77 | + "source": [ |
| 78 | + "## Create sample index\n", |
| 79 | + "\n", |
| 80 | + "The search index must include fields for your content and for permission metadata. Assign the new permission filter option to a string field and make sure the field is filterable. The search engine builds the filter internally at query time." |
| 81 | + ] |
| 82 | + }, |
| 83 | + { |
| 84 | + "cell_type": "code", |
| 85 | + "execution_count": null, |
| 86 | + "id": "9863061f", |
| 87 | + "metadata": {}, |
| 88 | + "outputs": [], |
| 89 | + "source": [ |
| 90 | + "from azure.search.documents.indexes.models import SearchField, SearchIndex, PermissionFilter, SearchIndexPermissionFilterOption\n", |
| 91 | + "from azure.search.documents.indexes import SearchIndexClient\n", |
| 92 | + "\n", |
| 93 | + "index_client = SearchIndexClient(endpoint=endpoint, credential=credential)\n", |
| 94 | + "index = SearchIndex(\n", |
| 95 | + " name=index_name,\n", |
| 96 | + " fields=[\n", |
| 97 | + " SearchField(name=\"id\", type=\"Edm.String\", key=True, filterable=True, sortable=True),\n", |
| 98 | + " SearchField(name=\"oid\", type=\"Collection(Edm.String)\", retrievable=True, filterable=True, permission_filter=PermissionFilter.USER_IDS),\n", |
| 99 | + " SearchField(name=\"group\", type=\"Collection(Edm.String)\", retrievable=True, filterable=True, permission_filter=PermissionFilter.GROUP_IDS),\n", |
| 100 | + " SearchField(name=\"name\", type=\"Edm.String\", searchable=True)\n", |
| 101 | + " ],\n", |
| 102 | + " permission_filter_option=SearchIndexPermissionFilterOption.ENABLED\n", |
| 103 | + ")\n", |
| 104 | + "\n", |
| 105 | + "index_client.create_index(index=index)\n", |
| 106 | + "print(f\"Index '{index_name}' created with permission filter option enabled.\")" |
| 107 | + ] |
| 108 | + }, |
| 109 | + { |
| 110 | + "cell_type": "markdown", |
| 111 | + "id": "f5cf4169", |
| 112 | + "metadata": {}, |
| 113 | + "source": [ |
| 114 | + "## Connect to Graph to find your object ID (OID) and groups\n", |
| 115 | + "\n", |
| 116 | + "This step calls the Graph APIs to get your object ID and a few group IDs for your Microsoft Entra identity. These IDs are added to the permission fields of the documents created in the next step." |
| 117 | + ] |
| 118 | + }, |
| 119 | + { |
| 120 | + "cell_type": "code", |
| 121 | + "execution_count": null, |
| 122 | + "id": "63904f09", |
| 123 | + "metadata": {}, |
| 124 | + "outputs": [], |
| 125 | + "source": [ |
| 126 | + "from msgraph import GraphServiceClient\n", |
| 127 | + "client = GraphServiceClient(credentials=credential, scopes=[\"https://graph.microsoft.com/.default\"])\n", |
| 128 | + "\n", |
| 129 | + "groups = await client.me.member_of.get()\n", |
| 130 | + "me = await client.me.get()\n", |
| 131 | + "oid = me.id" |
| 132 | + ] |
| 133 | + }, |
| 134 | + { |
| 135 | + "cell_type": "markdown", |
| 136 | + "id": "a9ce6d0f", |
| 137 | + "metadata": {}, |
| 138 | + "source": [ |
| 139 | + "## Upload sample data\n", |
| 140 | + "\n", |
| 141 | + "This step uploads five sample documents directly to the search index. Each document carries its permission metadata in the `oid` and `group` fields, set to your object ID, your first group ID, or the values `all` or `none`." |
| 142 | + ] |
| 143 | + }, |
| 144 | + { |
| 145 | + "cell_type": "code", |
| 146 | + "execution_count": null, |
| 147 | + "id": "8fb830a1", |
| 148 | + "metadata": {}, |
| 149 | + "outputs": [], |
| 150 | + "source": [ |
| 151 | + "from azure.search.documents import SearchClient\n", |
| 152 | + "search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)\n", |
| 153 | + "\n", |
| 154 | + "documents = [\n", |
| 155 | + " { \"id\": \"1\", \"oid\": [oid], \"group\": [groups.value[0].id], \"name\": \"Document 1\" },\n", |
| 156 | + " { \"id\": \"2\", \"oid\": [\"all\"], \"group\": [groups.value[0].id], \"name\": \"Document 2\" },\n", |
| 157 | + " { \"id\": \"3\", \"oid\": [oid], \"group\": [\"all\"], \"name\": \"Document 3\" },\n", |
| 158 | + " { \"id\": \"4\", \"oid\": [\"none\"], \"group\": [\"none\"], \"name\": \"Document 4\" },\n", |
| 159 | + " { \"id\": \"5\", \"oid\": [\"none\"], \"group\": [groups.value[0].id], \"name\": \"Document 5\" },\n", |
| 160 | + "]\n", |
| 161 | + "search_client.upload_documents(documents=documents)\n", |
| 162 | + "print(\"Documents uploaded to the index.\")\n" |
| 163 | + ] |
| 164 | + }, |
| 165 | + { |
| 166 | + "cell_type": "markdown", |
| 167 | + "id": "e5c93f76", |
| 168 | + "metadata": {}, |
| 169 | + "source": [ |
| 170 | + "## Search sample data with x-ms-query-source-authorization\n", |
| 171 | + "\n", |
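| 172 | + "This query uses a wildcard search string (`*`) to provide an unqualified search, and passes your access token so the search engine can apply the permission filter. It returns the name and permission metadata of each document that you're authorized to view. Notice that the documents carry different combinations of user and group IDs." |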
| 173 | + ] |
| 174 | + }, |
| 175 | + { |
| 176 | + "cell_type": "code", |
| 177 | + "execution_count": null, |
| 178 | + "id": "cd872e8c", |
| 179 | + "metadata": {}, |
| 180 | + "outputs": [], |
| 181 | + "source": [ |
| 182 | + "results = search_client.search(search_text=\"*\", x_ms_query_source_authorization=token_provider(), select=\"name,oid,group\", order_by=\"id asc\")\n", |
| 183 | + "\n", |
| 184 | + "for result in results:\n", |
| 185 | + " print(f\"Name: {result['name']}, OID: {result['oid']}, Group: {result['group']}\")" |
| 186 | + ] |
| 187 | + }, |
| 188 | + { |
| 189 | + "cell_type": "markdown", |
| 190 | + "id": "d31b67d8", |
| 191 | + "metadata": {}, |
| 192 | + "source": [ |
| 193 | + "## Search sample data without x-ms-query-source-authorization\n", |
| 194 | + "\n", |
| 195 | + "This step demonstrates the user experience when the query omits the user's access token, so authorization fails. No results are returned in the response." |
| 196 | + ] |
| 197 | + }, |
| 198 | + { |
| 199 | + "cell_type": "code", |
| 200 | + "execution_count": null, |
| 201 | + "id": "a1f2f2a0", |
| 202 | + "metadata": {}, |
| 203 | + "outputs": [], |
| 204 | + "source": [ |
| 205 | + "results = search_client.search(search_text=\"*\", x_ms_query_source_authorization=None, select=\"name,oid,group\", order_by=\"id asc\")\n", |
| 206 | + "\n", |
| 207 | + "for result in results:\n", |
| 208 | + " print(f\"Name: {result['name']}, OID: {result['oid']}, Group: {result['group']}\")" |
| 209 | + ] |
| 210 | + }, |
| 211 | + { |
| 212 | + "cell_type": "markdown", |
| 213 | + "id": "5ad253ec", |
| 214 | + "metadata": {}, |
| 215 | + "source": [ |
| 216 | + "## Next steps\n", |
| 217 | + "\n", |
| 218 | + "To learn more, see [Document-level access control in Azure AI Search](https://learn.microsoft.com/azure/search/search-document-level-access-overview)." |
| 219 | + ] |
| 220 | + } |
| 221 | + ], |
| 222 | + "metadata": { |
| 223 | + "kernelspec": { |
| 224 | + "display_name": ".venv", |
| 225 | + "language": "python", |
| 226 | + "name": "python3" |
| 227 | + }, |
| 228 | + "language_info": { |
| 229 | + "codemirror_mode": { |
| 230 | + "name": "ipython", |
| 231 | + "version": 3 |
| 232 | + }, |
| 233 | + "file_extension": ".py", |
| 234 | + "mimetype": "text/x-python", |
| 235 | + "name": "python", |
| 236 | + "nbconvert_exporter": "python", |
| 237 | + "pygments_lexer": "ipython3", |
| 238 | + "version": "3.12.10" |
| 239 | + } |
| 240 | + }, |
| 241 | + "nbformat": 4, |
| 242 | + "nbformat_minor": 5 |
| 243 | +} |
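
A note on the filtering behavior the notebook describes: the snippet below is a minimal local sketch of how the five sample documents might be filtered for a given user. It is not the service's implementation. It assumes that a document is visible when either the `oid` field or the `group` field matches the caller (OR semantics), that `all` matches every user, and that `none` matches no one. The function `is_visible` and the IDs `user-1` and `group-a` are hypothetical placeholders introduced only for illustration.

```python
from typing import Optional

# Hypothetical local model of the permission check. The real filtering
# happens inside the search service and may differ in detail.
def is_visible(doc: dict, user_oid: Optional[str], user_groups: list[str]) -> bool:
    """Return True if this sketch would show `doc` to the given user."""
    if user_oid is None:
        # No user token supplied: the notebook states that no results come back.
        return False

    def matches(acl: list[str], identities: set[str]) -> bool:
        if "all" in acl:
            # Assumed wildcard: every user matches.
            return True
        # "none" is assumed to match nobody, so it is excluded from the overlap.
        return bool(identities.intersection(acl) - {"none"})

    # Assumption: a match on either permission field is enough (OR semantics).
    return matches(doc["oid"], {user_oid}) or matches(doc["group"], set(user_groups))


# Same shape as the notebook's sample documents, with placeholder IDs.
documents = [
    {"id": "1", "oid": ["user-1"], "group": ["group-a"], "name": "Document 1"},
    {"id": "2", "oid": ["all"], "group": ["group-a"], "name": "Document 2"},
    {"id": "3", "oid": ["user-1"], "group": ["all"], "name": "Document 3"},
    {"id": "4", "oid": ["none"], "group": ["none"], "name": "Document 4"},
    {"id": "5", "oid": ["none"], "group": ["group-a"], "name": "Document 5"},
]

visible_names = [d["name"] for d in documents if is_visible(d, "user-1", ["group-a"])]
print(visible_names)
```

Under these assumptions, Document 4 is the only one hidden from `user-1`, and a call with `user_oid=None` hides everything, which mirrors the notebook's query without the `x-ms-query-source-authorization` header.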