|
24 | 24 | "+ [Azure subscription](https://Azure.Microsoft.com/subscription/free)\n", |
25 | 25 | "+ [Azure Cognitive Search service](https://docs.microsoft.com/azure/search/search-create-service-portal) (get the full service endpoint and an admin API key)\n", |
26 | 26 | "+ [Azure Blob storage service](https://docs.microsoft.com/azure/storage/common/storage-account-create) (get the connection string)\n", |
27 | | - "+ [Azure Cognitive Services](https://docs.microsoft.com/azure/cognitive-services/cognitive-services-apis-create-account) (get the account name)\n", |
28 | 27 | "+ [Python 3.6+](https://www.python.org/downloads/)\n", |
29 | 28 | "+ [Jupyter Notebook](https://jupyter.org/install)\n", |
30 | | - "+ [Visual Studio Code](https://code.visualstudio.com/download) with the [Azure Functions extension](https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-azurefunctions) and the [Python extension](https://marketplace.visualstudio.com/items?itemName=ms-python.python)\n" |
| 29 | + "+ [Visual Studio Code](https://code.visualstudio.com/download) with the [Azure Functions extension](https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-azurefunctions) and the [Python extension](https://marketplace.visualstudio.com/items?itemName=ms-python.python)\n", |
| 30 | + "\n", |
| 31 | + "If you adapt this exercise to include more image files, add [Azure Cognitive Services](https://docs.microsoft.com/azure/cognitive-services/cognitive-services-apis-create-account)." |
31 | 32 | ] |
32 | 33 | }, |
33 | 34 | { |
|
69 | 70 | "\n", |
70 | 71 | "# Replace with a full search service endpoint in the format \"https://searchservicename.search.windows.net\"\n", |
71 | 72 | "# Paste in an admin API key. Both values can be obtained from the Azure portal.\n", |
72 | | - "search_service = \"https://<YOUR-SEARCH-SERVICE>.search.windows.net\"\n", |
73 | | - "api_key = '<YOUR-ADMIN-API-KEY>'\n", |
| 73 | + "search_service = \"https://<YOUR-SEARCH-SERVICE-NAME>.search.windows.net\"\n", |
| 74 | + "api_key = '<YOUR-SEARCH-ADMIN-API-KEY>'\n", |
74 | 75 | "\n", |
75 | 76 | "# Leave the API version and content_type as they are listed here.\n", |
76 | 77 | "api_version = '2020-06-30'\n", |
77 | 78 | "content_type = 'application/json'\n", |
78 | 79 | "\n", |
79 | 80 | "# Replace with a Cognitive Services account name and all-in-one key.\n", |
80 | | - "cog_svcs_key = '' #Required only if processing more than 20 documents\n", |
81 | | - "cog_svcs_acct = '<YOUR-COGNITIVE-SERVICE-ACCOUNT-NAME>'\n", |
| 81 | + "# Required only if processing more than 20 documents\n", |
| 82 | + "cog_svcs_key = '' \n", |
| 83 | + "cog_svcs_acct = '' \n", |
82 | 84 | "\n", |
83 | | - "# Your Azure Storage account will be used for the datasource, knowledge store and cache\n", |
| 85 | + "# Your Azure Storage account will be used for the datasource input and knowledge store output\n", |
84 | 86 | "# Replace with a connection string to your Azure Storage account. \n", |
85 | | - "STORAGECONNSTRING = \"DefaultEndpointsProtocol=https;AccountName=<YOUR-ACCOUNT>;AccountKey=<YOUR-ACCOUNT-KEY>;EndpointSuffix=core.windows.net\"\n", |
86 | | - "# Replace with the blob container containing your image files\n", |
| 87 | + "STORAGECONNSTRING = \"DefaultEndpointsProtocol=https;AccountName=<YOUR-STORAGE-ACCOUNT>;AccountKey=<YOUR-ACCOUNT-KEY>;EndpointSuffix=core.windows.net\"\n", |
| 88 | + "# Replace with the blob container containing your image file\n", |
87 | 89 | "datasource_container = 'bfr-sample' \n", |
88 | | - "# Use the same storage account for knowledge store and indexer cache. The knowledge store will contain the projected images\n", |
89 | | - "know_store_cache = STORAGECONNSTRING\n", |
90 | 90 | "# Container where the sliced images will be projected to. Use the value provided below.\n", |
91 | 91 | "know_store_container = \"obfuscated\"\n", |
92 | 92 | "\n", |
93 | 93 | "# Replace with the Function HTTP URL of the app deployed to Azure Functions\n", |
94 | | - "skill_uri = \"<YOUR-FUNCTION-APP-URL>\"" |
| 94 | + "skill_uri = \"<YOUR-FUNCTION-APP-URL>\"" |
95 | 95 | ] |
96 | 96 | }, |
97 | 97 | { |
|
165 | 165 | "source": [ |
166 | 166 | "#### Create the skillset\n", |
167 | 167 | "\n", |
168 | | - "Besides skills, a skillset also specifies the knowledge store that will contain the final output." |
| 168 | + "Binary image references are passed as skill inputs and outputs, starting with \"/document/normalized_images/*\" in the OCR skill. OCR output consists of text and layout. Only the text component is passed to PIIDetection for analysis and redaction. In the custom skill, the image is sliced into its component parts, using the text and layout from OCR and the PII entities produced in the PIIDetection step.\n", |
| 169 | + "\n", |
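| | + "A minimal sketch of the skill chain, in the same Python-dict form this notebook uses for REST payloads. The @odata.type values are the documented skill types, but the custom skill's input and output names are illustrative assumptions; the full definition is created in the next code cell.\n", |
| | + "\n", |
| | + "```python\n", |
| | + "skills_sketch = [\n", |
| | + "    # Built-in OCR skill: extracts text and layout from each normalized image.\n", |
| | + "    { \"@odata.type\": \"#Microsoft.Skills.Vision.OcrSkill\",\n", |
| | + "      \"context\": \"/document/normalized_images/*\",\n", |
| | + "      \"inputs\": [ { \"name\": \"image\", \"source\": \"/document/normalized_images/*\" } ],\n", |
| | + "      \"outputs\": [ { \"name\": \"text\" }, { \"name\": \"layoutText\" } ] },\n", |
| | + "    # Built-in PII detection skill: analyzes only the text component of the OCR output.\n", |
| | + "    { \"@odata.type\": \"#Microsoft.Skills.Text.PIIDetectionSkill\",\n", |
| | + "      \"context\": \"/document/normalized_images/*\",\n", |
| | + "      \"inputs\": [ { \"name\": \"text\", \"source\": \"/document/normalized_images/*/text\" } ],\n", |
| | + "      \"outputs\": [ { \"name\": \"piiEntities\" } ] },\n", |
| | + "    # Custom Web API skill: receives the image plus OCR layout and PII entities,\n", |
| | + "    # and returns the sliced (obfuscated) image parts.\n", |
| | + "    { \"@odata.type\": \"#Microsoft.Skills.Custom.WebApiSkill\",\n", |
| | + "      \"uri\": skill_uri,\n", |
| | + "      \"context\": \"/document/normalized_images/*\",\n", |
| | + "      \"inputs\": [ { \"name\": \"image\", \"source\": \"/document/normalized_images/*\" },\n", |
| | + "                  { \"name\": \"layoutText\", \"source\": \"/document/normalized_images/*/layoutText\" },\n", |
| | + "                  { \"name\": \"piiEntities\", \"source\": \"/document/normalized_images/*/piiEntities\" } ],\n", |
| | + "      \"outputs\": [ { \"name\": \"slices\" } ] }\n", |
| | + "]\n", |
| | + "```\n", |
| | + "\n", |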
| 170 | + "Besides skills, a skillset also specifies the knowledge store projections that shape the final output in Blob storage." |
169 | 171 | ] |
170 | 172 | }, |
171 | 173 | { |
|
358 | 360 | "source": [ |
359 | 361 | "#### Create the index\n", |
360 | 362 | "\n", |
361 | | - "This exercise doesn't have steps for using the index, but having an index is an indexer requirement. You can use Search Explorer in the Azure portal to query the index on your own." |
| 363 | + "A search index isn't used in this exercise, but because an index is an indexer requirement, you'll create one anyway. It will contain the text extracted from the image, and you can use Search Explorer in the Azure portal to query it on your own.\n", |
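| | + "\n", |
| | + "As a rough sketch, the index might look like the following. The image_text field name comes from the indexer's output field mapping later in this notebook; the index name and field attributes are illustrative assumptions:\n", |
| | + "\n", |
| | + "```python\n", |
| | + "index_sketch = {\n", |
| | + "    \"name\": \"pii-image-index\",  # illustrative name\n", |
| | + "    \"fields\": [\n", |
| | + "        # Required key field, populated from document metadata by the indexer.\n", |
| | + "        { \"name\": \"id\", \"type\": \"Edm.String\", \"key\": True },\n", |
| | + "        # One OCR string per normalized image, mapped from /document/normalized_images/*/text.\n", |
| | + "        { \"name\": \"image_text\", \"type\": \"Collection(Edm.String)\", \"searchable\": True }\n", |
| | + "    ]\n", |
| | + "}\n", |
| | + "```" |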
362 | 364 | ] |
363 | 365 | }, |
364 | 366 | { |
|
531 | 533 | "source": [ |
532 | 534 | "#### Create the indexer\n", |
533 | 535 | "\n", |
534 | | - "The indexer connects to the data source, invokes the skillset, and outputs results. This indexer is scheduled to run every two hours. In the following step, you'll run the indexer to start the process immediately." |
| 536 | + "This step creates the indexer (you'll run it in a separate step). At run time, the indexer connects to the data source, invokes the skillset, and outputs results. This indexer is scheduled to run every two hours." |
535 | 537 | ] |
536 | 538 | }, |
537 | 539 | { |
|
577 | 579 | " \"sourceFieldName\": \"/document/normalized_images/*/text\",\n", |
578 | 580 | " \"targetFieldName\": \"image_text\"\n", |
579 | 581 | " }\n", |
580 | | - " ],\n", |
581 | | - " \"cache\": {\n", |
582 | | - " \"enableReprocessing\": True,\n", |
583 | | - " \"storageConnectionString\": f'{know_store_cache}'\n", |
584 | | - " }\n", |
| 582 | + " ]\n", |
585 | 583 | "}\n", |
586 | 584 | "r = requests.post(construct_Url(search_service, \"indexers\", None, None, api_version), data=json.dumps(indexer_def), headers=headers)\n", |
587 | 585 | "print(r)\n", |
|
593 | 591 | "cell_type": "markdown", |
594 | 592 | "metadata": {}, |
595 | 593 | "source": [ |
596 | | - "#### Run the indexer" |
| 594 | + "#### Run the indexer\n", |
| 595 | + "\n", |
| 596 | + "This step executes the indexer you just created. It can take several minutes to complete.\n", |
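| | + "\n", |
| | + "The next cell issues the run request through the REST API. As a sketch of that call (using the construct_Url helper and the indexername variable used elsewhere in this notebook):\n", |
| | + "\n", |
| | + "```python\n", |
| | + "# POST /indexers/{name}/run queues an on-demand execution of the indexer.\n", |
| | + "r = requests.post(construct_Url(search_service, \"indexers\", indexername, \"run\", api_version), data=None, headers=headers)\n", |
| | + "print(r)  # 202 indicates the run request was accepted\n", |
| | + "```" |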
597 | 597 | ] |
598 | 598 | }, |
599 | 599 | { |
|
608 | 608 | "#print(json.dumps(res, indent=2))" |
609 | 609 | ] |
610 | 610 | }, |
| 611 | + { |
| 612 | + "cell_type": "markdown", |
| 613 | + "metadata": {}, |
| 614 | + "source": [ |
| 615 | + "#### Check status\n", |
| 616 | + "\n", |
| 617 | + "The final step in this exercise is to view results. Before doing so, make sure the `lastResult` status message indicates \"success\", confirming that the indexer completed its work and that the revised image now exists in Blob storage." |
| 618 | + ] |
| 619 | + }, |
| 620 | + { |
| 621 | + "cell_type": "code", |
| 622 | + "execution_count": null, |
| 623 | + "metadata": {}, |
| 624 | + "outputs": [], |
| 625 | + "source": [ |
| 626 | + "r = requests.get(construct_Url(search_service, \"indexers\", indexername, \"status\", api_version), data=None, headers=headers)\n", |
| 627 | + "print(r)\n", |
| 628 | + "res = r.json()\n", |
| 629 | + "print(res[\"lastResult\"])" |
| 630 | + ] |
| 631 | + }, |
611 | 632 | { |
612 | 633 | "cell_type": "markdown", |
613 | 634 | "metadata": {}, |
614 | 635 | "source": [ |
615 | 636 | "### View Results\n", |
616 | | - "The following cell downloads the output image so that you can verify skillset success." |
| 637 | + "The following cell downloads the output image so that you can verify skillset success. If this cell fails, check the indexer status to confirm that the indexer has finished and reported no errors." |
617 | 638 | ] |
618 | 639 | }, |
619 | 640 | { |
|
638 | 659 | " if(count == 3):\n", |
639 | 660 | " break\n", |
640 | 661 | "\n", |
641 | | - "Image(filename='image2.jpg') " |
| 662 | + "Image(filename='image0.jpg') " |
642 | 663 | ] |
643 | 664 | }, |
644 | 665 | { |
645 | 666 | "cell_type": "markdown", |
646 | 667 | "metadata": {}, |
647 | 668 | "source": [ |
648 | 669 | "### Next Steps\n", |
649 | | - "You now know how to pass images into skills and return the modified images to the skillset for further processing. \n", |
| 670 | + "In this exercise, you learned how to pass images into skills and return the modified images to the skillset for further processing. \n", |
650 | 671 | "\n", |
651 | 672 | "As a next step, you can start from scratch and build a [custom AML Skill](https://docs.microsoft.com/azure/search/cognitive-search-aml-skill) to perform inferences on images, or use the Custom Vision service to build a skill. The Power Skills GitHub repository has a [sample Custom Vision skill](https://github.com/Azure-Samples/azure-search-power-skills/tree/master/Vision/CustomVision) to help you get started." |
652 | 673 | ] |
|