articles/search/cognitive-search-concept-intro.md
+10 -8 (10 additions & 8 deletions)
@@ -10,19 +10,21 @@ ms.service: cognitive-search
ms.custom:
- ignite-2023
ms.topic: conceptual
-
ms.date: 10/27/2023
+
ms.date: 01/30/2024
---
# AI enrichment in Azure AI Search
-
In Azure AI Search, *AI enrichment* calls the APIs of [Azure AI services](/azure/ai-services/what-are-ai-services) to process content that isn't full text searchable in its raw form. Through enrichment, analysis and inference are used to create searchable content and structure where none previously existed.
+
In Azure AI Search, *AI enrichment* refers to integration with [Azure AI services](/azure/ai-services/what-are-ai-services) to process content that isn't searchable in its raw form. Through enrichment, analysis and inference are used to create searchable content and structure where none previously existed.
-
Because Azure AI Search is a full text search solution, the purpose of AI enrichment is to improve the utility of your content in search-related scenarios:
+
Because Azure AI Search is a text and vector search solution, the purpose of AI enrichment is to improve the utility of your content in search-related scenarios. Source content must be textual (you can't enrich vectors), but the content created by an enrichment pipeline can be vectorized and indexed in a vector store using skills like the [Text Split skill](cognitive-search-skill-textsplit.md) for chunking and the [AzureOpenAIEmbedding skill](cognitive-search-skill-azure-openai-embedding.md) for encoding.
-
+ Apply translation and language detection for multi-lingual search
-
+ Apply entity recognition to extract people names, places, and other entities from large chunks of text
-
+ Apply key phrase extraction to identify and output important terms
-
+ Apply Optical Character Recognition (OCR) to recognize printed and handwritten text in binary files
-
+ Apply image analysis to describe image content, and output the descriptions as searchable text fields
+
Built-in skills apply the following transformation and processing to raw content:
+
+
+ Translation and language detection for multi-lingual search
+
+ Entity recognition to extract people names, places, and other entities from large chunks of text
+
+ Key phrase extraction to identify and output important terms
+
+ Optical Character Recognition (OCR) to recognize printed and handwritten text in binary files
+
+ Image analysis to describe image content, and output the descriptions as searchable text fields
AI enrichment is an extension of an [**indexer pipeline**](search-indexer-overview.md) that connects to Azure data sources. An enrichment pipeline has all of the components of an indexer pipeline (indexer, data source, index), plus a [**skillset**](cognitive-search-working-with-skillsets.md) that specifies atomic enrichment steps.
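As a rough sketch of the chunk-then-embed pattern described in this file, the following Python builds a skillset definition as a plain dictionary that could be serialized as a REST request body. The skillset name, resource URI, deployment name, page sizes, and the `pages` and `vector` target names are placeholders chosen for illustration, and the property names are assumed from the Text Split and Azure OpenAI Embedding skill references, so verify them against the API version you target.

```python
import json


def build_chunk_and_embed_skillset(name, aoai_resource_uri, deployment_id):
    """Return a skillset definition (dict) that chunks text, then embeds each chunk.

    All values are illustrative placeholders, not recommended settings.
    """
    split_skill = {
        "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
        "context": "/document",
        "textSplitMode": "pages",
        "maximumPageLength": 2000,  # assumed chunk size, in characters
        "pageOverlapLength": 500,   # assumed overlap between adjacent chunks
        "inputs": [{"name": "text", "source": "/document/content"}],
        "outputs": [{"name": "textItems", "targetName": "pages"}],
    }
    embedding_skill = {
        "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
        # Run once per chunk produced by the split skill.
        "context": "/document/pages/*",
        "resourceUri": aoai_resource_uri,
        "deploymentId": deployment_id,
        "inputs": [{"name": "text", "source": "/document/pages/*"}],
        "outputs": [{"name": "embedding", "targetName": "vector"}],
    }
    # Order matters for readability only; skill execution follows input dependencies.
    return {"name": name, "skills": [split_skill, embedding_skill]}


skillset = build_chunk_and_embed_skillset(
    "demo-skillset", "https://example.openai.azure.com", "demo-embedding-deployment"
)
request_body = json.dumps(skillset, indent=2)
```

The embedding skill reads from the `pages` output of the split skill, which is what ties chunking to encoding in this sketch.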
articles/search/knowledge-store-concept-intro.md
+6 -6 (6 additions & 6 deletions)
@@ -14,13 +14,13 @@ ms.date: 01/10/2024
# Knowledge store in Azure AI Search
-
Knowledge store is secondary storage for [AI-enriched content created by a skillset](cognitive-search-concept-intro.md) in Azure AI Search. In Azure AI Search, an indexing job always sends output to a search index, but if you attach a skillset to an indexer, you can optionally also send AI-enriched output to a container or table in Azure Storage. A knowledge store can be used for independent analysis or downstream processing in non-search scenarios like knowledge mining.
+
Knowledge store is secondary storage for [AI-enriched content created by a skillset](cognitive-search-concept-intro.md) in Azure AI Search. In Azure AI Search, an indexing job always sends output to a search index, but if you attach a skillset to an indexer, you can optionally also send AI-enriched output to a container or table in Azure Storage. A knowledge store can be used for independent analysis or downstream processing in non-search scenarios like knowledge mining.
The two outputs of indexing, a search index and knowledge store, are mutually exclusive products of the same pipeline. They're derived from the same inputs and contain the same data, but their content is structured, stored, and used in different applications.
:::image type="content" source="media/knowledge-store-concept-intro/knowledge-store-concept-intro.svg" alt-text="Pipeline with skillset" border="false":::
-
Physically, a knowledge store is [Azure Storage](../storage/common/storage-account-overview.md), either Azure Table Storage, Azure Blob Storage, or both. Any tool or process that can connect to Azure Storage can consume the contents of a knowledge store.
+
Physically, a knowledge store is [Azure Storage](../storage/common/storage-account-overview.md), either Azure Table Storage, Azure Blob Storage, or both. Any tool or process that can connect to Azure Storage can consume the contents of a knowledge store. There's no query support in Azure AI Search for retrieving content from a knowledge store.
When viewed through Azure portal, a knowledge store looks like any other collection of tables, objects, or files. The following screenshot shows a knowledge store composed of three tables. You can adopt a naming convention, such as a `kstore` prefix, to keep your content together.
@@ -63,13 +63,13 @@ The type of projection you specify in this structure determines the type of stor
+ `tables` project enriched content into Table Storage. Define a table projection when you need tabular reporting structures for inputs to analytical tools or export as data frames to other data stores. You can specify multiple `tables` within the same projection group to get a subset or cross section of enriched documents. Within the same projection group, table relationships are preserved so that you can work with all of them.
-
Projected content is not aggregated or normalized. The following screenshot shows a table, sorted by key phrase, with the parent document indicated in the adjacent column. In contrast with data ingestion during indexing, there is no linguistic analysis or aggregation of content. Plural forms and differences in casing are considered unique instances.
+
Projected content isn't aggregated or normalized. The following screenshot shows a table, sorted by key phrase, with the parent document indicated in the adjacent column. In contrast with data ingestion during indexing, there's no linguistic analysis or aggregation of content. Plural forms and differences in casing are considered unique instances.
:::image type="content" source="media/knowledge-store-concept-intro/kstore-keyphrases-per-document.png" alt-text="Screenshot of key phrases and documents in a table" border="true":::
+ `objects` project JSON documents into Blob storage. The physical representation of an `object` is a hierarchical JSON structure that represents an enriched document.
-
+ `files` project image files into Blob storage. A `file` is an image extracted from a document, transferred intact to Blob storage. Although it is named "files", it shows up in Blob Storage, not file storage.
+
+ `files` project image files into Blob storage. A `file` is an image extracted from a document, transferred intact to Blob storage. Although it's named "files", it shows up in Blob Storage, not file storage.
## Create a knowledge store
@@ -139,15 +139,15 @@ For data sources that support change tracking, an indexer will process new and c
### Changes to a skillset
-
If you are making changes to a skillset, you should [enable caching of enriched documents](cognitive-search-incremental-indexing-conceptual.md) to reuse existing enrichments where possible.
+
If you're making changes to a skillset, you should [enable caching of enriched documents](cognitive-search-incremental-indexing-conceptual.md) to reuse existing enrichments where possible.
Without incremental caching, the indexer will always process documents in order of the high water mark, without going backwards. For blobs, the indexer would process blobs sorted by `lastModified`, regardless of any changes to indexer settings or the skillset. If you change a skillset, previously processed documents aren't updated to reflect the new skillset. Documents processed after the skillset change will use the new skillset, resulting in index documents being a mix of old and new skillsets.
With incremental caching, and after a skillset update, the indexer will reuse any enrichments that are unaffected by the skillset change. Upstream enrichments are pulled from cache, as are any enrichments that are independent and isolated from the skill that was changed.
### Deletions
-
Although an indexer creates and updates structures and content in Azure Storage, it does not delete them. Projections continue to exist even when the indexer or skillset is deleted. As the owner of the storage account, you should delete a projection if it is no longer needed.
+
Although an indexer creates and updates structures and content in Azure Storage, it doesn't delete them. Projections continue to exist even when the indexer or skillset is deleted. As the owner of the storage account, you should delete a projection if it's no longer needed.
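For orientation, the projections that this file refers to (`tables`, `objects`, and `files`) are declared in the `knowledgeStore` section of a skillset. The following Python is a minimal sketch of that section as a dictionary. The `kstore` prefix follows the naming convention suggested earlier in the article, while the table names, container names, and `source` paths are hypothetical and assume an upstream skill has already shaped the enriched document. Property names are assumed from the knowledge store projection reference and should be checked against your API version.

```python
import json


def build_knowledge_store(storage_connection_string, prefix="kstore"):
    """Return a `knowledgeStore` definition with table, object, and file projections.

    Names and source paths are hypothetical placeholders for illustration.
    """
    table_group = {
        # Tables in the same projection group keep their relationships.
        "tables": [
            {
                "tableName": f"{prefix}Documents",
                "generatedKeyName": "DocumentId",
                "source": "/document/tableprojection",
            },
            {
                "tableName": f"{prefix}KeyPhrases",
                "generatedKeyName": "KeyPhraseId",
                "source": "/document/tableprojection/KeyPhrases/*",
            },
        ],
        "objects": [],
        "files": [],
    }
    blob_group = {
        "tables": [],
        # Objects are JSON documents; files are extracted images. Both land in Blob Storage.
        "objects": [
            {"storageContainer": f"{prefix}-objects", "source": "/document/objectprojection"}
        ],
        "files": [
            {"storageContainer": f"{prefix}-images", "source": "/document/normalized_images/*"}
        ],
    }
    return {
        "storageConnectionString": storage_connection_string,
        "projections": [table_group, blob_group],
    }


knowledge_store = build_knowledge_store("<your-storage-connection-string>")
knowledge_store_json = json.dumps(knowledge_store, indent=2)
```

Because the indexer never deletes this content, a shared prefix like the one above makes it easier to find and remove projections you no longer need.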
articles/search/retrieval-augmented-generation-overview.md
+6 -6 (6 additions & 6 deletions)
@@ -25,7 +25,7 @@ The decision about which information retrieval system to use is critical because
+ Security, global reach, and reliability for both data and operations.
-
+ Integration with LLMs.
+
+ Integration with embedding models for indexing, and chat models or language understanding models for retrieval.
Azure AI Search is a [proven solution for information retrieval](/azure/developer/python/get-started-app-chat-template?tabs=github-codespaces) in a RAG architecture. It provides indexing and query capabilities, with the infrastructure and security of the Azure cloud. Through code and other components, you can design a comprehensive RAG solution that includes all of the elements for generative AI over your proprietary content.
@@ -73,7 +73,7 @@ The web app provides the user experience, providing the presentation, context, a
The app server or orchestrator is the integration code that coordinates the handoffs between information retrieval and the LLM. One option is to use [LangChain](https://python.langchain.com/docs/get_started/introduction) to coordinate the workflow. LangChain [integrates with Azure AI Search](https://python.langchain.com/docs/integrations/retrievers/azure_cognitive_search), making it easier to include Azure AI Search as a [retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/) in your workflow.
-
The information retrieval system provides the searchable index, query logic, and the payload (query response). The search index can contain vectors or non-vector content. Although most samples and demos include vector fields, it's not a requirement. The query is executed using the existing search engine in Azure AI Search, which can handle keyword (or term) and vector queries. The index is created in advance, based on a schema you define, and loaded with your content that's sourced from files, databases, or storage.
+
The information retrieval system provides the searchable index, query logic, and the payload (query response). The search index can contain vectors or nonvector content. Although most samples and demos include vector fields, it's not a requirement. The query is executed using the existing search engine in Azure AI Search, which can handle keyword (or term) and vector queries. The index is created in advance, based on a schema you define, and loaded with your content that's sourced from files, databases, or storage.
The LLM receives the original prompt, plus the results from Azure AI Search. The LLM analyzes the results and formulates a response. If the LLM is ChatGPT, the user interaction might be a back and forth conversation. If you're using Davinci, the prompt might be a fully composed answer. An Azure solution most likely uses Azure OpenAI, but there's no hard dependency on this specific service.
@@ -114,11 +114,11 @@ There's no query type in Azure AI Search - not even semantic or vector search -
| Query feature | Purpose | Why use it |
|---------------|---------|------------|
-
|[Simple or full Lucene syntax](search-query-create.md)| Query execution over text and non-vector numeric content | Full text search is best for exact matches, rather than similar matches. Full text search queries are ranked using the [BM25 algorithm](index-similarity-and-scoring.md) and support relevance tuning through scoring profiles. It also supports filters and facets. |
-
|[Filters](search-filters.md) and [facets](search-faceted-navigation.md)| Applies to text or numeric (non-vector) fields only. Reduces the search surface area based on inclusion or exclusion criteria. | Adds precision to your queries. |
+
|[Simple or full Lucene syntax](search-query-create.md)| Query execution over text and nonvector numeric content | Full text search is best for exact matches, rather than similar matches. Full text search queries are ranked using the [BM25 algorithm](index-similarity-and-scoring.md) and support relevance tuning through scoring profiles. It also supports filters and facets. |
+
|[Filters](search-filters.md) and [facets](search-faceted-navigation.md)| Applies to text or numeric (nonvector) fields only. Reduces the search surface area based on inclusion or exclusion criteria. | Adds precision to your queries. |
|[Semantic ranking](semantic-how-to-query-request.md)| Re-ranks a BM25 result set using semantic models. Produces short-form captions and answers that are useful as LLM inputs. | Easier than scoring profiles, and depending on your content, a more reliable technique for relevance tuning. |
|[Vector search](vector-search-how-to-query.md) | Query execution over vector fields for similarity search, where the query string is one or more vectors. | Vectors can represent all types of content, in any language. |
-
|[Hybrid search](hybrid-search-how-to-query.md)| Combines any or all of the above query techniques. Vector and non-vector queries execute in parallel and are returned in a unified result set. | The most significant gains in precision and recall are through hybrid queries. |
+
|[Hybrid search](hybrid-search-how-to-query.md)| Combines any or all of the above query techniques. Vector and nonvector queries execute in parallel and are returned in a unified result set. | The most significant gains in precision and recall are through hybrid queries. |
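
To make the hybrid row in the table above more concrete, here is a hedged Python sketch that assembles a query request body combining a keyword search with a vector query, optionally adding semantic ranking. The function name, the `contentVector` field, and the `demo-semantic-config` configuration name are hypothetical. The property names (`search`, `vectorQueries`, `kind`, `fields`, `k`, `queryType`, `semanticConfiguration`) are assumed from the 2023-11-01 Search Documents REST API and should be verified before use.

```python
def build_hybrid_query(text, vector, vector_field, k=5, semantic_configuration=None):
    """Return a request body for a hybrid (keyword + vector) query.

    `vector` is a query embedding you generated separately; this sketch
    doesn't call an embedding model.
    """
    body = {
        # Keyword part: ranked with BM25 over searchable text fields.
        "search": text,
        # Vector part: similarity search over the named vector field.
        "vectorQueries": [
            {"kind": "vector", "vector": list(vector), "fields": vector_field, "k": k}
        ],
        "top": k,
    }
    if semantic_configuration:
        # Optional: rerank the merged results with semantic ranking.
        body["queryType"] = "semantic"
        body["semanticConfiguration"] = semantic_configuration
    return body


hybrid_body = build_hybrid_query(
    "historic hotel near the beach",
    [0.01, -0.02, 0.03],  # toy 3-dimensional vector for illustration only
    "contentVector",
    k=3,
    semantic_configuration="demo-semantic-config",
)
```

Omitting `search` would make this a pure vector query, and omitting `vectorQueries` would make it a pure keyword query, which is one way to compare the techniques against your own content.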
### Structure the query response
@@ -135,7 +135,7 @@ Rows are matches to the query, ranked by relevance, similarity, or both. By defa
When you're working with complex processes, a large amount of data, and expectations for millisecond responses, it's critical that each step adds value and improves the quality of the end result. On the information retrieval side, *relevance tuning* is an activity that improves the quality of the results sent to the LLM. Only the most relevant or the most similar matching documents should be included in results.
-
Relevance applies to keyword (non-vector) search and to hybrid queries (over the non-vector fields). In Azure AI Search, there's no relevance tuning for similarity search and vector queries. [BM25 ranking](index-similarity-and-scoring.md) is the ranking algorithm for full text search.
+
Relevance applies to keyword (nonvector) search and to hybrid queries (over the nonvector fields). In Azure AI Search, there's no relevance tuning for similarity search and vector queries. [BM25 ranking](index-similarity-and-scoring.md) is the ranking algorithm for full text search.
Relevance tuning is supported through features that enhance BM25 ranking. These approaches include:
articles/search/search-get-started-vector.md
+6 -4 (6 additions & 4 deletions)
@@ -14,12 +14,14 @@ ms.date: 01/19/2024
# Quickstart: Vector search using REST APIs
-
Get started with vector search in Azure AI Search using the **2023-11-01** REST APIs that create, load, and query a search index.
+
Get started with vector stores in Azure AI Search using the **2023-11-01** REST APIs that load and query vectors.
-
Search indexes can have vector and nonvector fields. You can execute pure vector queries, or hybrid queries targeting both vector *and* textual fields configured for filters, sorts, facets, and semantic reranking.
+
In Azure AI Search, a *vector store* has an index schema that defines vector and nonvector fields, a vector configuration for algorithms that create the embedding space, and settings on vector field definitions that are used in query requests. The [Create Index](/rest/api/searchservice/indexes/create-or-update) API creates the vector store.
+
+
You can execute pure vector queries, or hybrid queries targeting both vector *and* textual fields configured for filters, sorts, facets, and semantic reranking.
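
The three parts of a vector store named above (an index schema, a vector configuration, and settings on vector fields) can be sketched as a Create Index request body. The following Python is an illustration under stated assumptions: the index, field, algorithm, and profile names are hypothetical, the HNSW parameter values are arbitrary examples, and the property names are assumed from the 2023-11-01 Create Index reference.

```python
def build_vector_index(name, dimensions):
    """Return a Create Index request body with vector and nonvector fields."""
    return {
        "name": name,
        "fields": [
            # Nonvector fields: key, searchable text, and a filterable/facetable field.
            {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
            {"name": "content", "type": "Edm.String", "searchable": True},
            {"name": "category", "type": "Edm.String", "filterable": True, "facetable": True},
            # Vector field: dimensions must match the embedding model's output size.
            {
                "name": "contentVector",
                "type": "Collection(Edm.Single)",
                "searchable": True,
                "dimensions": dimensions,
                "vectorSearchProfile": "demo-profile",
            },
        ],
        "vectorSearch": {
            # Vector configuration: the algorithm used to build and search the embedding space.
            "algorithms": [
                {
                    "name": "demo-hnsw",
                    "kind": "hnsw",
                    "hnswParameters": {
                        "m": 4,
                        "efConstruction": 400,
                        "efSearch": 500,
                        "metric": "cosine",
                    },
                }
            ],
            # A profile links a vector field to an algorithm configuration by name.
            "profiles": [{"name": "demo-profile", "algorithm": "demo-hnsw"}],
        },
    }


index_definition = build_vector_index("demo-vector-index", dimensions=1536)
```

The `dimensions` value of 1536 is only an example; use the size produced by whichever embedding model generates your vectors.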
> [!NOTE]
-
> The stable REST API version depends on external modules for data chunking and embedding. If you want test-drive the [built-in data chunking and vectorization (public preview)](vector-search-integrated-vectorization.md) features, try the [**Import and vectorize data** wizard](search-get-started-portal-import-vectors.md) for an end-to-end walkthrough.
+
> The stable REST API version depends on external solutions for data chunking and embedding. If you want to evaluate the [built-in data chunking and vectorization (public preview)](vector-search-integrated-vectorization.md) features, try the [**Import and vectorize data** wizard](search-get-started-portal-import-vectors.md) for an end-to-end walkthrough.
## Prerequisites
@@ -29,7 +31,7 @@ Search indexes can have vector and nonvector fields. You can execute pure vector
+ An Azure subscription. [Create one for free](https://azure.microsoft.com/free/).
-
+ Azure AI Search, in any region and on any tier. Most existing services support vector search. For a small subset of services created prior to January 2019, an index containing vector fields will fail on creation. In this situation, a new service must be created.
+
+ Azure AI Search, in any region and on any tier. Most existing services support vector search. For a small subset of services created prior to January 2019, an index containing vector fields will fail on creation. In this situation, a new service must be created. You can use the Free tier for this quickstart, but Basic or higher is recommended for larger data files.
+ Optionally, for [semantic reranking](semantic-search-overview.md) shown in the last example, your search service must be Basic tier or higher, with [semantic ranking enabled](semantic-how-to-enable-disable.md).