
Commit cf88a52

Merge pull request #272531 from HeidiSteen/heidist-vectors
[azure search] Indexer consistency pass #2
2 parents 8dd354a + 6546d81 commit cf88a52

20 files changed, +37 -37 lines changed

articles/search/cognitive-search-concept-annotations-syntax.md

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ The following list includes several common examples:
 + `/document/pages/*` or `/document/sentences/*` become the context if you're breaking a large document into smaller chunks for processing. If "context" is `/document/pages/*`, the skill executes once over each page in the document. Because there might be more than one page or sentence, you'll append `/*` to catch them all.
 + `/document/normalized_images/*` is created during document cracking if the document contains images. All paths to images start with normalized_images. Since there are often multiple images embedded in a document, append `/*`.

-Examples in the remainder of this article are based on the "content" field generated automatically by [Azure Blob indexers](search-howto-indexing-azure-blob-storage.md) as part of the [document cracking](search-indexer-overview.md#document-cracking) phase. When referring to documents from a Blob container, use a format such as `"/document/content"`, where the "content" field is part of the "document".
+Examples in the remainder of this article are based on the "content" field generated automatically by [Azure blob indexers](search-howto-indexing-azure-blob-storage.md) as part of the [document cracking](search-indexer-overview.md#document-cracking) phase. When referring to documents from a Blob container, use a format such as `"/document/content"`, where the "content" field is part of the "document".

 <a name="example-1"></a>

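Reviewer note: the context lines in this hunk describe how a trailing `/*` makes a skill run once per page or image. The Python sketch below is only an illustration of that fan-out against an in-memory dictionary. The `expand_context` helper, the sample document, and the index-based node paths it prints are assumptions made for this note; they are not part of Azure AI Search or of this commit.

```python
from typing import Any


def expand_context(doc: dict[str, Any], context: str) -> list[str]:
    """Return the node paths a context path resolves to (illustrative only)."""
    parts = [p for p in context.split("/") if p]
    if not parts or parts[0] != "document":
        raise ValueError("context must start with /document")

    # Each entry pairs a resolved path with the node found at that path.
    nodes: list[tuple[str, Any]] = [("/document", doc)]
    for part in parts[1:]:
        next_nodes: list[tuple[str, Any]] = []
        for path, node in nodes:
            if part == "*":
                # A wildcard fans out over every item of a collection.
                if isinstance(node, list):
                    next_nodes.extend(
                        (f"{path}/{i}", item) for i, item in enumerate(node)
                    )
            elif isinstance(node, dict) and part in node:
                next_nodes.append((f"{path}/{part}", node[part]))
        nodes = next_nodes
    return [path for path, _ in nodes]


# Hypothetical enriched document with three pages.
sample = {"content": "full text", "pages": ["page one", "page two", "page three"]}

# One entry per page: a skill with this context would execute three times.
print(expand_context(sample, "/document/pages/*"))
# A context without a wildcard resolves to a single node.
print(expand_context(sample, "/document/content"))
```

With this sample, the wildcard context resolves to three nodes and the plain context to one, which mirrors the "executes once over each page" wording in the article.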
articles/search/cognitive-search-skill-image-analysis.md

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ Parameters are case-sensitive.

 | Input name | Description |
 |---------------|------------------------------------------------------|
-| `image` | Complex Type. Currently only works with "/document/normalized_images" field, produced by the Azure Blob indexer when ```imageAction``` is set to a value other than ```none```. |
+| `image` | Complex Type. Currently only works with "/document/normalized_images" field, produced by the Azure blob indexer when ```imageAction``` is set to a value other than ```none```. |

 ## Skill outputs

articles/search/cognitive-search-skill-ocr.md

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ In previous versions, there was a parameter called "textExtractionAlgorithm" to

 | Input name | Description |
 |---------------|------------------------------------------------------|
-| `image` | Complex Type. Currently only works with "/document/normalized_images" field, produced by the Azure Blob indexer when ```imageAction``` is set to a value other than ```none```. |
+| `image` | Complex Type. Currently only works with "/document/normalized_images" field, produced by the Azure blob indexer when ```imageAction``` is set to a value other than ```none```. |

 ## Skill outputs

articles/search/search-blob-storage-integration.md

Lines changed: 3 additions & 3 deletions
@@ -66,23 +66,23 @@ Textual content of a document is extracted into a string field named "content".
 > [!NOTE]
 > Azure AI Search imposes [indexer limits](search-limits-quotas-capacity.md#indexer-limits) on how much text it extracts depending on the pricing tier. A warning will appear in the indexer status response if documents are truncated.

-## Use a Blob indexer for content extraction
+## Use a blob indexer for content extraction

 An *indexer* is a data-source-aware subservice in Azure AI Search, equipped with internal logic for sampling data, reading and retrieving data and metadata, and serializing data from native formats into JSON documents for subsequent import.

 Blobs in Azure Storage are indexed using the [blob indexer](search-howto-indexing-azure-blob-storage.md). You can invoke this indexer by using the **Azure AI Search** command in Azure Storage, the **Import data** wizard, a REST API, or the .NET SDK. In code, you use this indexer by setting the type, and by providing connection information that includes an Azure Storage account along with a blob container. You can subset your blobs by creating a virtual directory, which you can then pass as a parameter, or by filtering on a file type extension.

 An indexer ["cracks a document"](search-indexer-overview.md#document-cracking), opening a blob to inspect content. After connecting to the data source, it's the first step in the pipeline. For blob data, this is where PDF, Office docs, and other content types are detected. Document cracking with text extraction is no charge. If your blobs contain image content, images are ignored unless you [add AI enrichment](cognitive-search-concept-intro.md). Standard indexing applies only to text content.

-The Blob indexer comes with configuration parameters and supports change tracking if the underlying data provides sufficient information. You can learn more about the core functionality in [Blob indexer](search-howto-indexing-azure-blob-storage.md).
+The Azure blob indexer comes with configuration parameters and supports change tracking if the underlying data provides sufficient information. You can learn more about the core functionality in [Index data from Azure Blob Storage](search-howto-indexing-azure-blob-storage.md).

 ### Supported access tiers

 Blob storage [access tiers](../storage/blobs/access-tiers-overview.md) include hot, cool, and archive. Only hot and cool can be accessed by indexers.

 ### Supported content types

-By running a Blob indexer over a container, you can extract text and metadata from the following content types with a single query:
+By running a blob indexer over a container, you can extract text and metadata from the following content types with a single query:

 [!INCLUDE [search-blob-data-sources](../../includes/search-blob-data-sources.md)]

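Reviewer note: the unchanged paragraph in this hunk says that, in code, you use the blob indexer by setting the type and supplying a storage connection plus a container, optionally narrowed by a virtual directory or file extension. The Python sketch below builds JSON bodies of roughly that shape for the data source and indexer REST definitions. The helper names, resource names, and placeholder connection string are hypothetical, and the field names (`azureblob`, `container.query`, `indexedFileNameExtensions`) should be checked against the current REST reference before use. Nothing is sent to a service.

```python
import json
from typing import Optional


def blob_data_source(
    name: str,
    connection_string: str,
    container: str,
    virtual_directory: Optional[str] = None,
) -> dict:
    """Build a data source definition body for a blob container."""
    container_def = {"name": container}
    if virtual_directory:
        # Assumed field: "query" limits indexing to one virtual directory.
        container_def["query"] = virtual_directory
    return {
        "name": name,
        "type": "azureblob",  # the type that selects the blob indexer
        "credentials": {"connectionString": connection_string},
        "container": container_def,
    }


def blob_indexer(
    name: str,
    data_source_name: str,
    index_name: str,
    extensions: Optional[list] = None,
) -> dict:
    """Build an indexer definition body, optionally filtered by extension."""
    indexer = {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
    }
    if extensions:
        indexer["parameters"] = {
            "configuration": {"indexedFileNameExtensions": ",".join(extensions)}
        }
    return indexer


# Placeholder values only; substitute real names and a real connection string.
data_source = blob_data_source(
    "my-blob-datasource", "<storage-connection-string>", "my-container", "my-folder"
)
indexer = blob_indexer(
    "my-blob-indexer", "my-blob-datasource", "my-index", [".pdf", ".docx"]
)
print(json.dumps(data_source, indent=2))
print(json.dumps(indexer, indent=2))
```

Keeping the two definitions as separate helpers reflects that the data source and the indexer are separate resources, so the same data source can back more than one indexer.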
articles/search/search-file-storage-integration.md

Lines changed: 2 additions & 2 deletions
@@ -114,7 +114,7 @@ In the [search index](search-what-is-an-index.md), add fields to accept the cont

 1. Add a "content" field to store extracted text from each file through the blob's "content" property. You aren't required to use this name, but doing so lets you take advantage of implicit field mappings.

-1. Add fields for standard metadata properties. In file indexing, the standard metadata properties are the same as blob metadata properties. The file indexer automatically creates internal field mappings for these properties that converts hyphenated property names to underscored property names. You still have to add the fields you want to use the index definition, but you can omit creating field mappings in the data source.
+1. Add fields for standard metadata properties. In file indexing, the standard metadata properties are the same as blob metadata properties. The Azure Files indexer automatically creates internal field mappings for these properties that converts hyphenated property names to underscored property names. You still have to add the fields you want to use the index definition, but you can omit creating field mappings in the data source.

 + **metadata_storage_name** (`Edm.String`) - the file name. For example, if you have a file /my-share/my-folder/subfolder/resume.pdf, the value of this field is `resume.pdf`.
 + **metadata_storage_path** (`Edm.String`) - the full URI of the file, including the storage account. For example, `https://myaccount.file.core.windows.net/my-share/my-folder/subfolder/resume.pdf`
@@ -124,7 +124,7 @@ In the [search index](search-what-is-an-index.md), add fields to accept the cont
 + **metadata_storage_content_md5** (`Edm.String`) - MD5 hash of the file content, if available.
 + **metadata_storage_sas_token** (`Edm.String`) - A temporary SAS token that can be used by [custom skills](cognitive-search-custom-skill-interface.md) to get access to the file. This token shouldn't be stored for later use as it might expire.

-## Configure and run the file indexer
+## Configure and run the Azure Files indexer

 Once the index and data source have been created, you're ready to create the indexer. Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors.

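Reviewer note: the changed line in the first hunk says the indexer maps hyphenated property names to underscored field names internally. The Python sketch below shows only the naming rule as described, so that index field names can be derived ahead of time. The helper name and the sample property names are hypothetical; the actual mapping happens inside the service and may handle cases this sketch does not.

```python
def to_index_field_name(property_name: str) -> str:
    """Apply the hyphen-to-underscore rule described in the article."""
    return property_name.replace("-", "_")


# Hypothetical property names, used only to illustrate the rule.
source_properties = ["my-custom-property", "review-status", "owner"]

# Derive the field names you would add to the index definition.
field_names = {name: to_index_field_name(name) for name in source_properties}
for source, field in field_names.items():
    print(f"{source} -> {field}")
```

Names without hyphens pass through unchanged, which is why fields such as `content` need no mapping at all.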
articles/search/search-how-to-create-search-index.md

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ In this article, learn the steps for defining and publishing a search index. Cre

 ## Document keys

-A search index has one required field: a document key. A document key is the unique identifier of a search document. In Azure AI Search, it must be a string, and it must originate from unique values in the data source that's providing the content to be indexed. A search service doesn't generate key values, but in some scenarios (such as the [Azure Table indexer](search-howto-indexing-azure-tables.md)) it synthesizes existing values to create a unique key for the documents being indexed.
+A search index has one required field: a document key. A document key is the unique identifier of a search document. In Azure AI Search, it must be a string, and it must originate from unique values in the data source that's providing the content to be indexed. A search service doesn't generate key values, but in some scenarios (such as the [Azure table indexer](search-howto-indexing-azure-tables.md)) it synthesizes existing values to create a unique key for the documents being indexed.

 During incremental indexing, where new and updated content is indexed, incoming documents with new keys are added, while incoming documents with existing keys are either merged or overwritten, depending on whether index fields are null or populated.

articles/search/search-howto-index-cosmosdb-gremlin.md

Lines changed: 3 additions & 3 deletions
@@ -1,7 +1,7 @@
 ---
 title: Azure Cosmos DB Gremlin indexer
 titleSuffix: Azure AI Search
-description: Set up an Azure Cosmos DB indexer to automate indexing of Azure Cosmos DB for Apache Gremlin content for full text search in Azure AI Search. This article explains how index data using the Azure Cosmos DB for Apache Gremlin protocol.
+description: Set up an Azure Cosmos DB indexer to automate indexing of Apache Gremlin content for full text search in Azure AI Search. This article explains how index data using the Azure Cosmos DB for Apache Gremlin protocol.

 author: mgottein
 ms.author: magottei
@@ -14,7 +14,7 @@ ms.topic: how-to
 ms.date: 02/28/2024
 ---

-# Import data from Azure Cosmos DB for Apache Gremlin for queries in Azure AI Search
+# Index data from Azure Cosmos DB for Apache Gremlin for queries in Azure AI Search

 > [!IMPORTANT]
 > The Azure Cosmos DB for Apache Gremlin indexer is currently in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). Currently, there is no SDK support.
@@ -312,7 +312,7 @@ The Azure Cosmos DB for Apache Gremlin indexer will automatically map a couple p

 1. The indexer will map `_id` to an `id` field in the index if it exists.

-1. When querying your Azure Cosmos DB database using the Azure Cosmos DB for Apache Gremlin you may notice that the JSON output for each property has an `id` and a `value`. Azure AI Search Azure Cosmos DB indexer will automatically map the properties `value` into a field in your search index that has the same name as the property if it exists. In the following example, 450 would be mapped to a `pages` field in the search index.
+1. When querying your Azure Cosmos DB database using the Azure Cosmos DB for Apache Gremlin you may notice that the JSON output for each property has an `id` and a `value`. The indexer will automatically map the properties `value` into a field in your search index that has the same name as the property if it exists. In the following example, 450 would be mapped to a `pages` field in the search index.

 ```http
 {

articles/search/search-howto-index-cosmosdb-mongodb.md

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ ms.topic: how-to
 ms.date: 02/28/2024
 ---

-# Import data from Azure Cosmos DB for MongoDB for queries in Azure AI Search
+# Index data from Azure Cosmos DB for MongoDB for queries in Azure AI Search

 > [!IMPORTANT]
 > MongoDB API support is currently in public preview under [supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). Currently, there is no SDK support.
@@ -161,7 +161,7 @@ In a [search index](search-what-is-an-index.md), add fields to accept the source
 | GeoJSON objects such as { "type": "Point", "coordinates": [long, lat] } |Edm.GeographyPoint |
 | Other JSON objects |N/A |

-## Configure and run the Azure Cosmos DB indexer
+## Configure and run the Azure Cosmos DB for MongoDB indexer

 Once the index and data source have been created, you're ready to create the indexer. Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors.

articles/search/search-howto-index-cosmosdb.md

Lines changed: 3 additions & 3 deletions
@@ -13,7 +13,7 @@ ms.topic: how-to
 ms.date: 01/18/2024
 ---

-# Import data from Azure Cosmos DB for NoSQL for queries in Azure AI Search
+# Index data from Azure Cosmos DB for NoSQL for queries in Azure AI Search

 In this article, learn how to configure an [**indexer**](search-indexer-overview.md) that imports content from [Azure Cosmos DB for NoSQL](../cosmos-db/nosql/index.yml) and makes it searchable in Azure AI Search.

@@ -204,7 +204,7 @@ In a [search index](search-what-is-an-index.md), add fields to accept the source
 | GeoJSON objects such as { "type": "Point", "coordinates": [long, lat] } |Edm.GeographyPoint |
 | Other JSON objects |N/A |

-## Configure and run the Azure Cosmos DB indexer
+## Configure and run the Azure Cosmos DB for NoSQL indexer

 Once the index and data source have been created, you're ready to create the indexer. Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors.

@@ -311,7 +311,7 @@ If you're using a [custom query to retrieve documents](#flatten-structures), mak

 In some cases, even if your query contains an `ORDER BY [collection alias]._ts` clause, Azure AI Search might not infer that the query is ordered by the `_ts`. You can tell Azure AI Search that results are ordered by setting the `assumeOrderByHighWaterMarkColumn` configuration property.

-To specify this hint, [create or update your indexer definition](#configure-and-run-the-azure-cosmos-db-indexer) as follows:
+To specify this hint, [create or update your indexer definition](#configure-and-run-the-azure-cosmos-db-for-nosql-indexer) as follows:

 ```http
 {

articles/search/search-howto-index-csv-blobs.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 ---
 title: Search over CSV blobs
 titleSuffix: Azure AI Search
-description: Extract CSV blobs from Azure Blob Storage and import as search documents into Azure AI Search using the delimitedText parsing mode.
+description: Extract CSV blobs from Azure Blob Storage or Azure Files and import as search documents into Azure AI Search using the delimitedText parsing mode.

 manager: nitinme
 author: HeidiSteen
@@ -18,7 +18,7 @@ ms.date: 01/17/2024

 **Applies to**: [Blob indexers](search-howto-indexing-azure-blob-storage.md), [File indexers](search-file-storage-integration.md)

-In Azure AI Search, both blob indexers and file indexers support a `delimitedText` parsing mode for CSV files that treats each line in the CSV as a separate search document. For example, given the following comma-delimited text, the `delimitedText` parsing mode would result in two documents in the search index:
+In Azure AI Search, indexers for Azure Blob Storage and Azure Files support a `delimitedText` parsing mode for CSV files that treats each line in the CSV as a separate search document. For example, given the following comma-delimited text, the `delimitedText` parsing mode would result in two documents in the search index:

 ```text
 id, datePublished, tags
