Skip to content

Commit d4104f4

Browse files
Merge pull request #272513 from HeidiSteen/heidist-vectors
[azure search] indexer consistency pass #1
2 parents 13c0d6c + 8a07704 commit d4104f4

8 files changed

+26
-27
lines changed

articles/search/cognitive-search-skill-custom-entity-lookup.md

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -404,7 +404,6 @@ This warning will be emitted if the number of matches detected is greater than t
404404

405405
## See also
406406

407-
+ [Custom Entity Lookup sample and readme](https://github.com/Azure-Samples/azure-search-rest-samples/tree/main/skill-examples/custom-entity-lookup-skill)
408407
+ [Built-in skills](cognitive-search-predefined-skills.md)
409408
+ [How to define a skillset](cognitive-search-defining-skillset.md)
410409
+ [Entity Recognition skill (to search for well known entities)](cognitive-search-skill-entity-recognition-v3.md)

articles/search/cognitive-search-tutorial-debug-sessions.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,7 @@ If you don't have an Azure subscription, create a [free account](https://azure.m
3131

3232
+ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
3333

34-
+ [Sample PDFs (clinical trials)](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/clinical-trials/clinical-trials-pdf-19).
34+
+ [Sample PDFs (clinical trials)](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/_ARCHIVE/clinical-trials/clinical-trials-pdf-19).
3535

3636
+ [Sample debug-sessions.rest file](https://github.com/Azure-Samples/azure-search-rest-samples/blob/main/Debug-sessions/debug-sessions.rest) used to create the enrichment pipeline.
3737

@@ -42,7 +42,7 @@ If you don't have an Azure subscription, create a [free account](https://azure.m
4242

4343
This section creates the sample data set in Azure Blob Storage so that the indexer and skillset have content to work with.
4444

45-
1. [Download sample data (clinical-trials-pdf-19)](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/clinical-trials/clinical-trials-pdf-19), consisting of 19 files.
45+
1. [Download sample data (clinical-trials-pdf-19)](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/_ARCHIVE/clinical-trials/clinical-trials-pdf-19), consisting of 19 files.
4646

4747
1. [Create an Azure storage account](../storage/common/storage-account-create.md?tabs=azure-portal) or [find an existing account](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Storage%2storageAccounts/).
4848

articles/search/retrieval-augmented-generation-overview.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -71,13 +71,13 @@ RAG patterns that include Azure AI Search have the elements indicated in the fol
7171

7272
The web app provides the user experience, providing the presentation, context, and user interaction. Questions or prompts from a user start here. Inputs pass through the integration layer, going first to information retrieval to get the search results, but also go to the LLM to set the context and intent.
7373

74-
The app server or orchestrator is the integration code that coordinates the handoffs between information retrieval and the LLM. One option is to use [LangChain](https://python.langchain.com/docs/get_started/introduction) to coordinate the workflow. LangChain [integrates with Azure AI Search](https://python.langchain.com/docs/integrations/retrievers/azure_cognitive_search), making it easier to include Azure AI Search as a [retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/) in your workflow.
74+
The app server or orchestrator is the integration code that coordinates the handoffs between information retrieval and the LLM. One option is to use [LangChain](https://python.langchain.com/docs/get_started/introduction) to coordinate the workflow. LangChain [integrates with Azure AI Search](https://python.langchain.com/docs/integrations/retrievers/azure_ai_search/), making it easier to include Azure AI Search as a [retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/) in your workflow.
7575

7676
The information retrieval system provides the searchable index, query logic, and the payload (query response). The search index can contain vectors or nonvector content. Although most samples and demos include vector fields, it's not a requirement. The query is executed using the existing search engine in Azure AI Search, which can handle keyword (or term) and vector queries. The index is created in advance, based on a schema you define, and loaded with your content that's sourced from files, databases, or storage.
7777

7878
The LLM receives the original prompt, plus the results from Azure AI Search. The LLM analyzes the results and formulates a response. If the LLM is ChatGPT, the user interaction might be a back and forth conversation. If you're using Davinci, the prompt might be a fully composed answer. An Azure solution most likely uses Azure OpenAI, but there's no hard dependency on this specific service.
7979

80-
Azure AI Search doesn't provide native LLM integration, web frontends, or vector encoding (embeddings) out of the box, so you need to write code that handles those parts of the solution. You can review demo source ([Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo)) for a blueprint of what a full solution entails.
80+
Azure AI Search doesn't provide native LLM integration for prompt flows or chat preservation, so you need to write code that handles orchestration and state. You can review demo source ([Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo)) for a blueprint of what a full solution entails. We also recommend Azure AI Studio or [Azure OpenAI Studio](/azure/ai-services/openai/use-your-data-quickstart) to create RAG-based Azure AI Search solutions that integrate with LLMs.
8181

8282
## Searchable content in Azure AI Search
8383

articles/search/search-api-preview.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,7 @@ Preview features are removed from this list if they're retired or transition to
3131
| [**Text Split skill**](cognitive-search-skill-textsplit.md) | AI enrichment (skills) | Text Split has two new chunking-related properties in preview: `maximumPagesToTake`, `pageOverlapLength`. | [Create or Update Skillset (preview)](/rest/api/searchservice/preview-api/create-or-update-skillset), 2023-10-01-Preview or later. Also available in the portal through the [Import and vectorize data wizard](search-get-started-portal-import-vectors.md). |
3232
| [**Index projections**](index-projections-concept-intro.md) | AI enrichment (skills) | A component of a skillset definition that defines the shape of a secondary index, supporting a one-to-many index pattern, where content from an enrichment pipeline can target multiple indexes.| [Create or Update Skillset (preview)](/rest/api/searchservice/preview-api/create-or-update-skillset), 2023-10-01-Preview or later. Also available in the portal through the [Import and vectorize data wizard](search-get-started-portal-import-vectors.md). |
3333
| [**Azure Files indexer**](search-file-storage-integration.md) | Indexer data source | New data source for indexer-based indexing from [Azure Files](https://azure.microsoft.com/services/storage/files/) | [Create or Update Data Source (preview)](/rest/api/searchservice/preview-api/create-or-update-data-source), 2021-04-30-Preview or later. |
34-
| [**SharePoint Indexer**](search-howto-index-sharepoint-online.md) | Indexer data source | New data source for indexer-based indexing of SharePoint content. | [Sign up](https://aka.ms/azure-cognitive-search/indexer-preview) to enable the feature. Use [Create or Update Data Source (preview)](/rest/api/searchservice/preview-api/create-or-update-data-source), 2020-06-30-Preview or later, or the Azure portal. |
34+
| [**SharePoint Online indexer**](search-howto-index-sharepoint-online.md) | Indexer data source | New data source for indexer-based indexing of SharePoint content. | [Sign up](https://aka.ms/azure-cognitive-search/indexer-preview) to enable the feature. Use [Create or Update Data Source (preview)](/rest/api/searchservice/preview-api/create-or-update-data-source), 2020-06-30-Preview or later, or the Azure portal. |
3535
| [**MySQL indexer**](search-howto-index-mysql.md) | Indexer data source | New data source for indexer-based indexing of Azure MySQL data sources.| [Sign up](https://aka.ms/azure-cognitive-search/indexer-preview) to enable the feature. Use [Create or Update Data Source (preview)](/rest/api/searchservice/preview-api/create-or-update-data-source), 2020-06-30-Preview or later, [.NET SDK 11.2.1](/dotnet/api/azure.search.documents.indexes.models.searchindexerdatasourcetype.mysql), and Azure portal. |
3636
| [**Azure Cosmos DB for MongoDB indexer**](search-howto-index-cosmosdb.md) | Indexer data source | New data source for indexer-based indexing through the MongoDB APIs in Azure Cosmos DB. | [Sign up](https://aka.ms/azure-cognitive-search/indexer-preview) to enable the feature. Use [Create or Update Data Source (preview)](/rest/api/searchservice/preview-api/create-or-update-data-source), 2020-06-30-Preview or later, or the Azure portal.|
3737
| [**Azure Cosmos DB for Apache Gremlin indexer**](search-howto-index-cosmosdb.md) | Indexer data source | New data source for indexer-based indexing through the Apache Gremlin APIs in Azure Cosmos DB. | [Sign up](https://aka.ms/azure-cognitive-search/indexer-preview) to enable the feature. Use [Create or Update Data Source (preview)](/rest/api/searchservice/preview-api/create-or-update-data-source), 2020-06-30-Preview or later.|

articles/search/search-blob-metadata-properties.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ Azure AI Search supports blob indexing and SharePoint document indexing for the
2424

2525
## Properties by document format
2626

27-
The following table summarizes processing for each document format, and describes the metadata properties extracted by a blob indexer and the SharePoint indexer.
27+
The following table summarizes processing for each document format, and describes the metadata properties extracted by a blob indexer and the SharePoint Online indexer.
2828

2929
| Document format / content type | Extracted metadata | Processing details |
3030
| --- | --- | --- |

articles/search/search-howto-index-json-blobs.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -109,9 +109,9 @@ api-key: [admin key]
109109
}
110110
```
111111

112-
### jsonArrays example (clinical trials sample data)
112+
### jsonArrays example
113113

114-
The [clinical trials JSON data set](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/clinical-trials/clinical-trials-json) on GitHub is helpful for testing JSON array parsing. You can upload the data files to Blob storage and use the [**Import data** wizard](search-get-started-portal.md) to quickly evaluate how this content is parsed into individual search documents.
114+
The [New York Philharmonic JSON data set](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/ny-philharmonic) on GitHub is helpful for testing JSON array parsing. You can upload the data files to Blob storage and use the [**Import data** wizard](search-get-started-portal.md) to quickly evaluate how this content is parsed into individual search documents.
115115

116116
The data set consists of eight blobs, each containing a JSON array of entities, for a total of 100 entities. The entities vary as to which fields are populated, but the end result is one search document per entity, from all arrays, in all blobs.
117117

0 commit comments

Comments
 (0)