Skip to content

Commit dafeae7

Browse files
Merge pull request #2037 from HeidiSteen/heidist-freshness
[azure search] review references to alphanumeric
2 parents 1607f66 + 4a84dca commit dafeae7

14 files changed

+28
-28
lines changed

articles/search/cognitive-search-concept-image-scenarios.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ Through [AI enrichment](cognitive-search-concept-intro.md), Azure AI Search give
2020
+ [Image Analysis](cognitive-search-skill-image-analysis.md) that describes images through visual features
2121
+ [Custom skills](#passing-images-to-custom-skills) to invoke any external image processing that you want to provide
2222

23-
By using OCR, you can extract text from photos or pictures containing alphanumeric text, such as the word *STOP* in a stop sign. Through image analysis, you can generate a text representation of an image, such as *dandelion* for a photo of a dandelion, or the color *yellow*. You can also extract metadata about the image, such as its size.
23+
By using OCR, you can extract text and from photos or pictures, such as the word *STOP* in a stop sign. Through image analysis, you can generate a text representation of an image, such as *dandelion* for a photo of a dandelion, or the color *yellow*. You can also extract metadata about the image, such as its size.
2424

2525
This article covers the fundamentals of working with images, and also describes several common scenarios, such as working with embedded images, custom skills, and overlaying visualizations on original images.
2626

articles/search/cognitive-search-create-custom-skill-example.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@ Although this example uses an Azure Function to host a web API, it isn't require
3232

3333
1. In Visual Studio, select **New** > **Project** from the File menu.
3434

35-
1. Choose **Azure Functions** as the template and select **Next**. Type a name for your project, and select **Create**. The function app name must be valid as a C# namespace, so don't use underscores, hyphens, or any other non-alphanumeric characters.
35+
1. Choose **Azure Functions** as the template and select **Next**. Type a name for your project, and select **Create**. The function app name must be valid as a C# namespace, so don't use underscores, hyphens, or any other special characters.
3636

3737
1. Select a framework that has long term support.
3838

articles/search/index-add-custom-analyzers.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -47,7 +47,7 @@ To create a custom analyzer, specify it in the `analyzers` section of an index a
4747

4848
An analyzer definition includes a name, type, one or more character filters, a maximum of one tokenizer, and one or more token filters for post-tokenization processing. Character filters are applied before tokenization. Token filters and character filters are applied from left to right.
4949

50-
- Names in a custom analyzer must be unique and can't be the same as any of the built-in analyzers, tokenizers, token filters, or characters filters. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
50+
- Names in a custom analyzer must be unique and can't be the same as any of the built-in analyzers, tokenizers, token filters, or characters filters. Names consist of letters, digits, spaces, dashes or underscores. Names must start and end with plain text characters. Names must be under 128 characters in length.
5151

5252
- Type must be #Microsoft.Azure.Search.CustomAnalyzer.
5353

articles/search/index-add-scoring-profiles.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -99,7 +99,7 @@ Scoring profiles supplement the default scoring algorithm by boosting the scores
9999

100100
For standalone text queries, scoring profiles identify the maximum 1,000 matches in a [BM25-ranked search](index-similarity-and-scoring.md), and the top 50 are returned in results.
101101

102-
For pure vectors, the query is vector-only, but if the [*k*-matching documents](vector-search-ranking.md) include alphanumeric fields that a scoring profile can process, a scoring profile is applied. The scoring profile revises the result set by boosting documents that match criteria in the profile.
102+
For pure vectors, the query is vector-only, but if the [*k*-matching documents](vector-search-ranking.md) include nonvector fields with human-readable ocntent, a scoring profile can be applied. The scoring profile revises the result set by boosting documents that match criteria in the profile.
103103

104104
For text queries in a hybrid query, scoring profiles identify the maximum 1,000 matches in a BM25-ranked search. However, once those 1,000 results are identified, they're restored to their original BM25 order so that they can be rescored alongside vectors results in the final [Reciprocal Ranking Function (RRF)](hybrid-search-ranking.md) ordering, where the scoring profile (identified as "final document boosting adjustment" in the illustration) is applied to the merged results, along with [vector weighting](vector-search-how-to-query.md#vector-weighting), and [semantic ranking](semantic-search-overview.md) as the last step.
105105

articles/search/query-simple-syntax.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -70,7 +70,7 @@ You can embed Boolean operators in a query string to improve the precision of a
7070

7171
## Prefix queries
7272

73-
For "starts with" queries, add a suffix operator (`*`) as the placeholder for the remainder of a term. A prefix query must begin with at least one alphanumeric character before you can add the suffix operator.
73+
For "starts with" queries, add a suffix operator (`*`) as the placeholder for the remainder of a term. A prefix query must begin with at least one plain text character before you can add the suffix operator.
7474

7575
| Character | Example | Usage |
7676
|----------- |--------|-------|

articles/search/search-filters.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -46,7 +46,7 @@ Filtering occurs in tandem with search, qualifying which documents to include in
4646

4747
## How filters are defined
4848

49-
Filters apply to alphanumeric content on fields that are attributed as `filterable`.
49+
Filters apply to text and numeric (nonvector) content on fields that are attributed as `filterable`.
5050

5151
Filters are OData expressions, articulated in the [filter syntax](search-query-odata-filter.md) supported by Azure AI Search.
5252

articles/search/search-how-to-create-indexers.md

Lines changed: 10 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ ms.service: azure-ai-search
1111
ms.custom:
1212
- ignite-2023
1313
ms.topic: how-to
14-
ms.date: 10/24/2024
14+
ms.date: 12/17/2024
1515
---
1616

1717
# Create an indexer in Azure AI Search
@@ -22,9 +22,9 @@ You can use an indexer to automate data import and indexing in Azure AI Search.
2222

2323
Indexers support two workflows:
2424

25-
+ **Text-based indexing**: Extract strings and metadata from textual content for full text search scenarios.
25+
+ **Raw content indexing (plain text or vectors)**: Extract strings and metadata from textual content for full text search scenarios. Extracts raw vector content for vector search (for example, vectors in an Azure SQL database or Azure Cosmos DB collection). In this workflow, indexing occurs only over existing content that you provide.
2626

27-
+ **Skills-based indexing**: Use built-in or custom skills that add integrated machine learning for analysis over images and large undifferentiated content, extracting or inferring text and structure. Skills-based indexing enables search over content that isn't otherwise easily full text searchable. To learn more, see [AI enrichment in Azure AI Search](cognitive-search-concept-intro.md).
27+
+ **Skills-based indexing**: Extends indexing through built-in or custom skills that create or generate new searchable content. For example, you can add integrated machine learning for analysis over images and unstructured text, extracting or inferring text and structure. Or, use skills to chunk and vectorize content from text and images. Skills-based indexing creates or generates new content that doesn't exist in your external data source. New content becomes part of your index when you add fields to the index schema that accepts the incoming data. To learn more, see [AI enrichment in Azure AI Search](cognitive-search-concept-intro.md).
2828

2929
## Prerequisites
3030

@@ -38,11 +38,11 @@ Indexers support two workflows:
3838

3939
## Indexer patterns
4040

41-
When you create an indexer, the definition is one of two patterns: *text-based indexing* or *skills-based indexing*. The patterns are the same, except that skills-based indexing has more definitions.
41+
When you create an indexer, the definition is one of two patterns: *content-based indexing* or *skills-based indexing*. The patterns are the same, except that skills-based indexing has more definitions.
4242

43-
### Indexer example for text-based indexing
43+
### Indexer example for content-based indexing
4444

45-
Text-based indexing for full text search is the primary use case for indexers. For this workflow, an indexer looks like this example.
45+
Content-based indexing for full text or vector search is the primary use case for indexers. For this workflow, an indexer looks like this example.
4646

4747
```json
4848
{
@@ -114,16 +114,16 @@ Indexers work with data sets. When you run an indexer, it connects to your data
114114

115115
| Source data | Tasks |
116116
|-------------|-------|
117-
| JSON documents | Make sure the structure or shape of incoming data corresponds to the schema of your search index. Most search indexes are fairly flat, where the fields collection consists of fields at the same level. However, hierarchical or nested structures are possible through [complex fields and collections](search-howto-complex-data-types.md). |
117+
| JSON documents | JSON documents can contain text, numbers, and vectors. Make sure the structure or shape of incoming data corresponds to the schema of your search index. Most search indexes are fairly flat, where the fields collection consists of fields at the same level. However, hierarchical or nested structures are possible through [complex fields and collections](search-howto-complex-data-types.md). |
118118
| Relational | Provide data as a flattened row set, where each row becomes a full or partial search document in the index. <br><br> To flatten relational data into a row set, you should create a SQL view, or build a query that returns parent and child records in the same row. For example, the built-in hotels sample dataset is an SQL database that has 50 records (one for each hotel), linked to room records in a related table. The query that flattens the collective data into a row set embeds all of the room information in JSON documents in each hotel record. The embedded room information is a generated by a query that uses a **FOR JSON AUTO** clause. <br><br> You can learn more about this technique in [define a query that returns embedded JSON](index-sql-relational-data.md#define-a-query-that-returns-embedded-json). This is just one example; you can find other approaches that produce the same result. |
119119
| Files | An indexer generally creates one search document for each file, where the search document consists of fields for content and metadata. Depending on the file type, the indexer can sometimes [parse one file into multiple search documents](search-howto-index-one-to-many-blobs.md). For example, in a CSV file, each row can become a standalone search document. |
120120

121121
Remember that you only need to pull in searchable and filterable data:
122122

123-
+ Searchable data is text
124-
+ Filterable data is alphanumeric
123+
+ Searchable data is text or vectors
124+
+ Filterable data is text and numbers (non-vector fields)
125125

126-
Azure AI Search can't search over binary data in any format, although it can extract and infer text descriptions of image files (see [AI enrichment](cognitive-search-concept-intro.md)) to create searchable content. Likewise, large text can be broken down and analyzed by natural language models to find structure or relevant information, generating new content that you can add to a search document.
126+
Azure AI Search can't do a full-text search over binary data in any format, although it can extract and infer text descriptions of image files (see [AI enrichment](cognitive-search-concept-intro.md)) to create searchable content. Likewise, large text can be broken down and analyzed by natural language models to find structure or relevant information, generating new content that you can add to a search document. It can also do vector search over embeddings, including quantized embeddings in a binary format.
127127

128128
Given that indexers don't fix data problems, other forms of data cleansing or manipulation might be needed. For more information, you should refer to the product documentation of your [Azure database product](/azure/?product=databases).
129129

articles/search/search-how-to-create-search-index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -64,7 +64,7 @@ Use this checklist to assist the design decisions for your search index.
6464

6565
1. Identify which source fields can be used as filters. Numeric content and short text fields, particularly those with repeating values, are good choices. When working with filters, remember:
6666

67-
+ Filters can be used in vector and nonvector queries, but the filter itself is applied to alphanumeric (nonvector) fields in your index.
67+
+ Filters can be used in vector and nonvector queries, but the filter itself is applied to human-readable (nonvector) fields in your index.
6868

6969
+ Filterable fields can optionally be used in faceted navigation.
7070

articles/search/search-how-to-load-search-index.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -18,11 +18,11 @@ This article explains how to import documents into a predefined search index. In
1818

1919
## How data import works
2020

21-
A search service accepts JSON documents that conform to the index schema. A search service imports and indexes plain text and vectors in JSON, used in full text search, vector search, hybrid search, and knowledge mining scenarios.
21+
A search service accepts JSON documents that conform to the index schema. A search service can import and index plain text content and vector content in JSON documents.
2222

23-
+ Plain text content is obtainable from alphanumeric fields in the external data source, metadata that's useful in search scenarios, or enriched content created by a [skillset](cognitive-search-working-with-skillsets.md) (skills can extract or infer textual descriptions from images and unstructured content).
23+
+ Plain text content is retrieved from fields in the external data source, from metadata properties, or from enriched content that's generated by a [skillset](cognitive-search-working-with-skillsets.md) (skills can extract or infer textual descriptions from images and unstructured content).
2424

25-
+ Vector content is vectorized using an [external embedding model](vector-search-how-to-generate-embeddings.md) or [integrated vectorization](vector-search-integrated-vectorization.md) using Azure AI Search features that integrate with applied AI.
25+
+ Vector content is retrieved from a data source that provides it, or it's created by a skillset that implements [integrated vectorization](vector-search-integrated-vectorization.md) in an Azure AI Search indexer workload.
2626

2727
You can prepare these documents yourself, but if content resides in a [supported data source](search-indexer-overview.md#supported-data-sources), running an [indexer](search-indexer-overview.md) or using an Import wizard can automate document retrieval, JSON serialization, and indexing.
2828

articles/search/search-howto-incremental-index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -134,7 +134,7 @@ PUT https://[YOUR-SEARCH-SERVICE].search.windows.net/indexers/[YOUR-INDEXER-NAME
134134
}
135135
```
136136

137-
If you now issue another GET request on the indexer, the response from the service includes an `ID` property in the cache object. The alphanumeric string is appended to the name of the container containing all the cached results and intermediate state of each document processed by this indexer. The ID is used to uniquely name the cache in Blob storage.
137+
If you now issue another GET request on the indexer, the response from the service includes an `ID` property in the cache object. The string is appended to the name of the container containing all the cached results and intermediate state of each document processed by this indexer. The ID is used to uniquely name the cache in Blob storage.
138138

139139
```http
140140
"cache": {

0 commit comments

Comments
 (0)