Merge pull request #91105 from HeidiSteen/heidist-master

ShannonLeavitt · web-flow · commit 72b0bdae6b9e · 2019-10-13T10:22:33.000-07:00
AzS: Use AI with blobs
diff --git a/articles/search/TOC.yml b/articles/search/TOC.yml
@@ -169,7 +169,7 @@
       href: search-howto-complex-data-types.md
     - name: Model relational data
       href: index-sql-relational-data.md
-  - name: Load data
+  - name: Indexing any data
     items:
     - name: Data import overview
       href: search-what-is-data-import.md
@@ -181,7 +181,21 @@
       href: search-howto-large-index.md
     - name: Handle concurrent updates
       href: search-howto-concurrency.md
-  - name: Load data with indexers
+  - name: Indexing Azure Blob data
+    items:      
+    - name: Use AI with blob data
+      href: search-blob-ai-integration.md
+    - name: Add full text search
+      href: search-blob-storage-integration.md
+    - name: Set up a blob indexer
+      href: search-howto-indexing-azure-blob-storage.md
+    - name: Index one-to-many blobs
+      href: search-howto-index-one-to-many-blobs.md
+    - name: Index CSV blobs
+      href: search-howto-index-csv-blobs.md
+    - name: Index JSON blobs
+      href: search-howto-index-json-blobs.md
+  - name: Indexing with "indexers"
     items:
     - name: Indexers overview
       href: search-indexer-overview.md
@@ -191,16 +205,6 @@
       href: search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md
     - name: Azure Cosmos DB indexer
       href: search-howto-index-cosmosdb.md
-    - name: Azure Blob Storage indexer
-      items:      
-      - name: Set up a blob indexer
-        href: search-howto-indexing-azure-blob-storage.md
-      - name: Index one-to-many blobs
-        href: search-howto-index-one-to-many-blobs.md
-      - name: Index CSV blobs
-        href: search-howto-index-csv-blobs.md
-      - name: Index JSON blobs
-        href: search-howto-index-json-blobs.md
     - name: Schedule indexers
       href: search-howto-schedule-indexers.md
     - name: Map fields
diff --git a/articles/search/index.yml b/articles/search/index.yml
@@ -50,6 +50,22 @@ landingContent:
           - text: Introduction to Azure Search
             url: https://docs.microsoft.com/learn/modules/intro-to-azure-search/
 
+  # Card
+  - title: Use with Blob storage
+    linkLists:
+      - linkListType: how-to-guide
+        links:
+          - text: Use AI to understand blob data
+            url: search-blob-ai-integration.md
+          - text: Add full text search to blob data
+            url: search-blob-storage-integration.md
+          - text: Store AI enrichments
+            url: knowledge-store-create-portal.md
+      - linkListType: tutorial
+        links:
+          - text: Index semi-structured blob data
+            url: search-semi-structured-data.md
+
   # Card
   - title: Index your data
     linkLists:
diff --git a/articles/search/search-blob-ai-integration.md b/articles/search/search-blob-ai-integration.md
@@ -0,0 +1,107 @@
+---
+title: Use AI to understand Blob data
+titleSuffix: Azure Search
+description: Add semantic, natural language processing and image analysis to Azure blobs using an AI enrichment pipeline in Azure Search.
+
+manager: nitinme
+author: HeidiSteen
+ms.author: heidist
+ms.service: search
+ms.topic: conceptual
+ms.date: 10/09/2019
+---
+
+# Use AI to understand Blob data
+
+Data in Azure Blob storage is often a variety of unstructured content such as images, long text, PDFs, and Office documents. By using the AI capabilities in Azure Search, you can understand and extract valuable information from blobs in a variety of ways. Examples of applying AI to blob content include:
+
++ Extract text from images using optical character recognition (OCR)
++ Produce a scene description or tags from a photo
++ Detect language and translate text into different languages
++ Process text with named entity recognition (NER) to find references to people, dates, places, or organizations 
+
+While you might need just one of these AI capabilities, it’s common to combine several of them in the same pipeline (for example, extracting text from a scanned image and then finding all the dates and places referenced in it).
+
+AI enrichment creates new information, captured as text, stored in fields. Post-enrichment, you can access this information from a search index through full text search, or send enriched documents back to Azure storage to power new application experiences that include exploring data for discovery or analytics scenarios. 
+
+In this article, we view AI enrichment through a wide lens so that you can quickly grasp the entire process, from raw data in blobs to queryable information in either a search index or a knowledge store.
+
+## What it means to "enrich" blob data
+
+*AI enrichment* is part of the indexing architecture of Azure Search that integrates built-in AI from Microsoft or custom AI that you provide. It helps you implement end-to-end scenarios where you need to process blobs (both existing ones and new ones as they come in or are updated), crack open all file formats to extract images and text, extract the desired information using various AI capabilities, and index the results in an Azure Search index for fast search, retrieval, and exploration.
+
+Inputs are your blobs, in a single container, in Azure Blob storage. Blobs can be almost any kind of text or image data. 
+
+Output is always an Azure Search index, used for fast text search, retrieval, and exploration in client applications. Optionally, output can also be a *knowledge store* that projects enriched documents into Azure blobs or Azure tables for downstream analysis in tools like Power BI or in data science workloads.
+
+In between is the pipeline architecture itself. The pipeline is based on the *indexer* feature, to which you can assign a *skillset*, which is composed of one or more *skills* providing the AI. The purpose of the pipeline is to produce *enriched documents* that enter as raw content but pick up additional structure, context, and information while moving through the pipeline. Enriched documents are consumed during indexing to create inverted indexes and other structures used in full text search or exploration and analytics.
+
+## How to get started
+
+You can start directly in your storage account portal page. Click **Add Azure Search** and create a new Azure Search service or select an existing one. If you already have an existing search service in the same subscription, clicking **Add Azure Search** opens the Import data wizard so that you can immediately step through indexing, enrichment, and index definition.
+
+Once you add Azure Search to your storage account, you can follow the standard process to enrich data in any Azure data source. Assuming you already have blob content, you can use the Import data wizard in Azure Search for an easy initial introduction to AI enrichment. This quickstart explains the steps: [Create an AI enrichment pipeline in the portal](cognitive-search-quickstart-blob.md). 
+
+In the following sections, we'll explore more components and concepts.
+
+## Use Blob indexers
+
+AI enrichment is an add-on to an indexing pipeline, and in Azure Search, those pipelines are built on top of an *indexer*. An indexer is a data-source-aware subservice equipped with internal logic for sampling data, reading metadata, retrieving data, and serializing data from native formats into JSON documents for subsequent import. Indexers are often used by themselves for import, separate from AI, but if you want to build an AI enrichment pipeline, you will need an indexer and a skillset to go with it. In this section, we'll focus on the indexer itself.
+
+Blobs in Azure Storage are indexed using the [Azure Search Blob storage indexer](search-howto-indexing-azure-blob-storage.md). You invoke this indexer by setting the type, and by providing connection information that includes an Azure Storage account along with a blob container. Unless you've previously organized blobs into a virtual directory, which you can then pass as a parameter, the Blob indexer pulls from the entire container.
+
+An indexer does the "document cracking", and after connecting to the data source, it's the first step in the pipeline. For blob data, this is where PDFs, Office documents, images, and other content types are detected. Document cracking with text extraction is free of charge. Document cracking with image extraction is charged at rates you can find on the Azure Search [pricing page](https://azure.microsoft.com/pricing/details/search/).
+
+Although all documents will be cracked, enrichment only occurs if you explicitly provide the skills to do so. For example, if your pipeline consists exclusively of text analytics, any images in your container or documents will be ignored.
+
+The Blob indexer comes with configuration parameters and supports change tracking if the underlying data provides sufficient information. You can learn more about the core functionality in [Azure Search Blob storage indexer](search-howto-indexing-azure-blob-storage.md).
+
+## Add AI
+
+*Skills* are the individual components of AI processing that you can use standalone or in combination with other skills for sequential processing. 
+
++ Built-in skills are backed by Cognitive Services, with image analysis based on Computer Vision, and natural language processing based on Text Analytics. A few examples are [OCR](cognitive-search-skill-ocr.md), [Entity Recognition](cognitive-search-skill-entity-recognition.md), and [Image Analysis](cognitive-search-skill-image-analysis.md). You can review the full list of built-in skills in [Predefined skills for content enrichment](cognitive-search-predefined-skills.md).
+
++ Custom skills are custom code, wrapped in an interface definition that allows for integration into the pipeline. In customer solutions, it's common practice to use both built-in and custom skills, with custom skills providing open-source, third-party, or first-party AI modules.
+
+A *skillset* is the collection of skills used in a pipeline, and it's invoked after the document cracking phase makes content available. An indexer can consume exactly one skillset, but that skillset exists independently of an indexer so that you can reuse it in other scenarios.
+
+Custom skills might sound complex but can be simple and straightforward in terms of implementation. If you have existing packages that provide pattern matching or classification models, the content you extract from blobs could be passed to these models for processing. Since AI enrichment is Azure-based, your model should be on Azure also. Some common hosting methodologies include using [Azure Functions](cognitive-search-create-custom-skill-example.md) or [Containers](https://github.com/Microsoft/SkillsExtractorCognitiveSearch).
+
+Built-in skills backed by Cognitive Services require an [attached Cognitive Services](cognitive-search-attach-cognitive-services.md) all-in-one subscription key that gives you access to the resource. An all-in-one key gives you image analysis, language detection, text translation, and text analytics. Other built-in skills are features of Azure Search and require no additional service or key. Text shaper, splitter, and merger are examples of helper skills that are sometimes necessary when designing the pipeline.
+
+If you use only custom skills and built-in utility skills, there is no dependency on Cognitive Services and no cost related to it.
+
+## Order of operations
+
+Now that we've covered indexers, content extraction, and skills, we can take a closer look at pipeline mechanisms and order of operations.
+
+A skillset is a composition of one or more skills. When multiple skills are involved, the skillset operates as a sequential pipeline, producing dependency graphs, where output from one skill becomes input to another.
+
+For example, given a large blob of unstructured text, a sample order of operations for text analytics might be as follows:
+
+1. Use Text Splitter to break the blob into smaller parts.
+1. Use Language Detection to determine if content is English or another language.
+1. Use Text Translator to get all text into a common language.
+1. Run Entity Recognition, Key Phrase Extraction, or Sentiment Analysis on chunks of text. In this step, new fields are created and populated. Entities might be locations, people, organizations, or dates. Key phrases are short combinations of words that appear to belong together. Sentiment score is a rating on a continuum from negative (0) to positive (1) sentiment.
+1. Use Text Merger to reconstitute the document from the smaller chunks.
+
+
+## Outputs and use cases
+
+An enriched document at the end of the pipeline differs from its original input version by the presence of additional fields containing new information that was extracted or generated during enrichment. As such, you can work with a combination of original and created values in several ways.
+
+The possible outputs are a search index in Azure Search or a knowledge store in Azure Storage.
+
+In Azure Search, enriched documents are formatted in JSON and can be indexed in the same way all documents are indexed, with the benefits an indexer provides. Fields from enriched documents are mapped to an index schema. During indexing, the blob indexer refers to configuration parameters and settings to apply any field mappings or change detection logic that you've specified. Post-indexing, when content is stored in Azure Search, you can build rich queries and filter expressions to understand your content.
+
+In Azure Storage, a knowledge store has two manifestations: a blob container, or tables in Table storage. A blob container captures enriched documents in their entirety, which is useful if you want to feed into other processes. In contrast, Table storage can accommodate physical projections of enriched documents. You can create slices or layers of enriched documents that include or exclude specific parts. For analysis in Power BI, the tables in Azure Table storage become the data source for further visualization and exploration.
+
+## Next steps
+
+There’s a lot more you can do with AI enrichment to get the most out of your data in Azure Storage, including combining Cognitive Services in different ways, and authoring custom skills for cases where there’s no existing Cognitive Service for the scenario. You can learn more by following the links below.
+
+> [!div class="nextstepaction"]
+> [AI enrichment overview](cognitive-search-concept-intro.md) 
+> [Create a skillset](cognitive-search-defining-skillset.md)
+> [Map nodes in an annotation tree](cognitive-search-output-field-mapping.md)
diff --git a/articles/search/search-blob-storage-integration.md b/articles/search/search-blob-storage-integration.md
@@ -1,17 +1,17 @@
 ---
-title: Add full text search to Azure Blob Storage - Azure Search
-description: Crawl text content in Azure Blob storage for Azure Search indexing, in code using the HTTP REST API.
-services: search
+title: Add full text search to Azure Blob Storage
+titleSuffix: Azure Search
+description: Extract content and add structure to Azure blobs when building a full text search index in Azure Search.
+
+manager: nitinme
+author: HeidiSteen
+ms.author: heidist
 ms.service: search
 ms.topic: conceptual
-ms.date: 03/01/2019
-author: mgottein 
-manager: nitinme
-ms.author: magottei
-ms.custom: seodec2018
+ms.date: 10/09/2019
 ---
 
-# Searching Blob storage with Azure Search
+# Add full text search to Azure blob data using Azure Search
 
 Searching across the variety of content types stored in Azure Blob storage can be a difficult problem to solve. However, you can index and search the content of your Blobs in just a few clicks by using Azure Search. Searching over Blob storage requires provisioning an Azure Search service. The various service limits and pricing tiers of Azure Search can be found on the [pricing page](https://aka.ms/azspricing).
 
@@ -40,12 +40,12 @@ Azure Search can be configured to extract structured content found in blobs that
 
 JSON parsing is not currently configurable through the portal. [Learn more about JSON parsing in Azure Search.](https://aka.ms/azsjsonblobindexing)
 
-## Quick start
+## Quickstart
 Azure Search can be added to blobs directly from the Blob storage portal page.
 
 ![](./media/search-blob-storage-integration/blob-blade.png)
 
 Click **Add Azure Search** to launch a flow where you can select an existing Azure Search service or create a new service. If you create a new service, you are navigated out of your Storage account's portal experience. You can navigate back to the Storage portal page and re-select the **Add Azure Search** option, where you can select the existing service.
 
-## Next Steps
+## Next steps
 Learn more about the Azure Search Blob Indexer in the full [documentation](https://aka.ms/azsblobindexer).