articles/search/cognitive-search-concept-image-scenarios.md: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ Through [AI enrichment](cognitive-search-concept-intro.md), Azure AI Search give
20
20
+ [Image Analysis](cognitive-search-skill-image-analysis.md) that describes images through visual features
21
21
+ [Custom skills](#passing-images-to-custom-skills) to invoke any external image processing that you want to provide
22
22
23
-
By using OCR, you can extract text from photos or pictures containing alphanumeric text, such as the word *STOP* in a stop sign. Through image analysis, you can generate a text representation of an image, such as *dandelion* for a photo of a dandelion, or the color *yellow*. You can also extract metadata about the image, such as its size.
23
+
By using OCR, you can extract text from photos or pictures, such as the word *STOP* in a stop sign. Through image analysis, you can generate a text representation of an image, such as *dandelion* for a photo of a dandelion, or the color *yellow*. You can also extract metadata about the image, such as its size.
24
24
25
25
This article covers the fundamentals of working with images, and also describes several common scenarios, such as working with embedded images, custom skills, and overlaying visualizations on original images.
articles/search/cognitive-search-create-custom-skill-example.md: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ Although this example uses an Azure Function to host a web API, it isn't require
32
32
33
33
1. In Visual Studio, select **New** > **Project** from the File menu.
34
34
35
-
1. Choose **Azure Functions** as the template and select **Next**. Type a name for your project, and select **Create**. The function app name must be valid as a C# namespace, so don't use underscores, hyphens, or any other non-alphanumeric characters.
35
+
1. Choose **Azure Functions** as the template and select **Next**. Type a name for your project, and select **Create**. The function app name must be valid as a C# namespace, so don't use underscores, hyphens, or any other special characters.
articles/search/index-add-custom-analyzers.md: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ To create a custom analyzer, specify it in the `analyzers` section of an index a
47
47
48
48
An analyzer definition includes a name, type, one or more character filters, a maximum of one tokenizer, and one or more token filters for post-tokenization processing. Character filters are applied before tokenization. Token filters and character filters are applied from left to right.
49
49
50
-
- Names in a custom analyzer must be unique and can't be the same as any of the built-in analyzers, tokenizers, token filters, or characters filters. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
50
+
- Names in a custom analyzer must be unique and can't be the same as any of the built-in analyzers, tokenizers, token filters, or character filters. Names consist of letters, digits, spaces, dashes, or underscores. Names must start and end with plain text characters. Names can't exceed 128 characters in length.
51
51
52
52
- Type must be #Microsoft.Azure.Search.CustomAnalyzer.
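The naming rules in the list above can be checked before an index definition is sent to the service. The following is a minimal sketch, not the service's own validation: it assumes that "letters" and "plain text characters" mean ASCII letters and digits, and that the length limit is 128 characters inclusive. The function name and the `reserved` parameter (a caller-supplied set of built-in names) are hypothetical.

```python
import re

# Assumption: names may contain ASCII letters, digits, spaces, dashes, and
# underscores, and must start and end with a letter or digit.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9 _-]*[A-Za-z0-9])?")

# Assumption: the limit is inclusive (a 128-character name is accepted).
MAX_NAME_LENGTH = 128


def is_valid_analyzer_name(name: str, reserved: frozenset = frozenset()) -> bool:
    """Return True if `name` satisfies the naming rules sketched above.

    `reserved` is a hypothetical, caller-supplied set of built-in analyzer,
    tokenizer, and filter names that a custom name must not reuse.
    """
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    if name in reserved:
        return False
    # fullmatch ensures the whole string fits the pattern, not just a prefix.
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_analyzer_name("my-analyzer_1")` is accepted, while `"-bad"` and `"bad_"` are rejected because they don't start or end with a letter or digit.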
articles/search/index-add-scoring-profiles.md: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ Scoring profiles supplement the default scoring algorithm by boosting the scores
99
99
100
100
For standalone text queries, scoring profiles identify the maximum 1,000 matches in a [BM25-ranked search](index-similarity-and-scoring.md), and the top 50 are returned in results.
101
101
102
-
For pure vectors, the query is vector-only, but if the [*k*-matching documents](vector-search-ranking.md) include alphanumeric fields that a scoring profile can process, a scoring profile is applied. The scoring profile revises the result set by boosting documents that match criteria in the profile.
102
+
For pure vectors, the query is vector-only, but if the [*k*-matching documents](vector-search-ranking.md) include nonvector fields with human-readable content, a scoring profile can be applied. The scoring profile revises the result set by boosting documents that match criteria in the profile.
103
103
104
104
For text queries in a hybrid query, scoring profiles identify the maximum 1,000 matches in a BM25-ranked search. However, once those 1,000 results are identified, they're restored to their original BM25 order so that they can be rescored alongside vector results in the final [Reciprocal Rank Fusion (RRF)](hybrid-search-ranking.md) ordering, where the scoring profile (identified as "final document boosting adjustment" in the illustration) is applied to the merged results, along with [vector weighting](vector-search-how-to-query.md#vector-weighting), and [semantic ranking](semantic-search-overview.md) as the last step.
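To make the RRF merge step in the paragraph above concrete, here is a minimal sketch of reciprocal rank fusion over two ranked lists (one from the text query, one from the vector query). It isn't the service's implementation: the constant `k=60` is an assumption (a commonly used default), the function name is hypothetical, and the scoring profile boost, vector weighting, and semantic ranking steps are deliberately left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ordering.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for stability.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists: BM25 order for text, similarity order for vectors.
text_results = ["a", "b", "c"]
vector_results = ["b", "c", "d"]
merged = reciprocal_rank_fusion([text_results, vector_results])
```

Documents that appear in both lists (`b` and `c` here) accumulate two contributions, so they rise above documents that appear in only one list.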
articles/search/query-simple-syntax.md: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ You can embed Boolean operators in a query string to improve the precision of a
70
70
71
71
## Prefix queries
72
72
73
-
For "starts with" queries, add a suffix operator (`*`) as the placeholder for the remainder of a term. A prefix query must begin with at least one alphanumeric character before you can add the suffix operator.
73
+
For "starts with" queries, add a suffix operator (`*`) as the placeholder for the remainder of a term. A prefix query must begin with at least one plain text character before you can add the suffix operator.
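The rule above can be expressed as a small client-side check. This is only a sketch under assumptions: it treats "plain text character" as an ASCII letter or digit, and the function name is hypothetical. It doesn't reproduce the service's query parser.

```python
import re

# Assumption: a prefix term is one or more non-space characters, starting
# with an ASCII letter or digit, followed by a single trailing `*`.
_PREFIX_QUERY = re.compile(r"[A-Za-z0-9][^\s*]*\*")


def is_prefix_query(term: str) -> bool:
    """Return True if `term` looks like a valid "starts with" query term."""
    return _PREFIX_QUERY.fullmatch(term) is not None
```

Under these assumptions, `is_prefix_query("lingui*")` is `True`, while a bare `*` or a leading operator such as `*ing` returns `False`.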
articles/search/search-how-to-create-indexers.md: 10 additions & 10 deletions
@@ -11,7 +11,7 @@ ms.service: azure-ai-search
11
11
ms.custom:
12
12
- ignite-2023
13
13
ms.topic: how-to
14
-
ms.date: 10/24/2024
14
+
ms.date: 12/17/2024
15
15
---
16
16
17
17
# Create an indexer in Azure AI Search
@@ -22,9 +22,9 @@ You can use an indexer to automate data import and indexing in Azure AI Search.
22
22
23
23
Indexers support two workflows:
24
24
25
-
+**Text-based indexing**: Extract strings and metadata from textual content for full text search scenarios.
25
+
+ **Raw content indexing (plain text or vectors)**: Extract strings and metadata from textual content for full text search scenarios. Extract raw vector content for vector search (for example, vectors in an Azure SQL database or Azure Cosmos DB collection). In this workflow, indexing occurs only over existing content that you provide.
26
26
27
-
+**Skills-based indexing**: Use built-in or custom skills that add integrated machine learning for analysis over images and large undifferentiated content, extracting or inferring text and structure. Skills-based indexing enables search over content that isn't otherwise easily full text searchable. To learn more, see [AI enrichment in Azure AI Search](cognitive-search-concept-intro.md).
27
+
+ **Skills-based indexing**: Extends indexing through built-in or custom skills that generate new searchable content. For example, you can add integrated machine learning for analysis over images and unstructured text, extracting or inferring text and structure. Or, use skills to chunk and vectorize content from text and images. Skills-based indexing generates new content that doesn't exist in your external data source. New content becomes part of your index when you add fields to the index schema that accept the incoming data. To learn more, see [AI enrichment in Azure AI Search](cognitive-search-concept-intro.md).
28
28
29
29
## Prerequisites
30
30
@@ -38,11 +38,11 @@ Indexers support two workflows:
38
38
39
39
## Indexer patterns
40
40
41
-
When you create an indexer, the definition is one of two patterns: *text-based indexing* or *skills-based indexing*. The patterns are the same, except that skills-based indexing has more definitions.
41
+
When you create an indexer, the definition is one of two patterns: *content-based indexing* or *skills-based indexing*. The patterns are the same, except that skills-based indexing has more definitions.
42
42
43
-
### Indexer example for text-based indexing
43
+
### Indexer example for content-based indexing
44
44
45
-
Text-based indexing for full text search is the primary use case for indexers. For this workflow, an indexer looks like this example.
45
+
Content-based indexing for full text or vector search is the primary use case for indexers. For this workflow, an indexer looks like this example.
46
46
47
47
```json
48
48
{
@@ -114,16 +114,16 @@ Indexers work with data sets. When you run an indexer, it connects to your data
114
114
115
115
| Source data | Tasks |
116
116
|-------------|-------|
117
-
| JSON documents | Make sure the structure or shape of incoming data corresponds to the schema of your search index. Most search indexes are fairly flat, where the fields collection consists of fields at the same level. However, hierarchical or nested structures are possible through [complex fields and collections](search-howto-complex-data-types.md). |
117
+
| JSON documents | JSON documents can contain text, numbers, and vectors. Make sure the structure or shape of incoming data corresponds to the schema of your search index. Most search indexes are fairly flat, where the fields collection consists of fields at the same level. However, hierarchical or nested structures are possible through [complex fields and collections](search-howto-complex-data-types.md). |
118
118
| Relational | Provide data as a flattened row set, where each row becomes a full or partial search document in the index. <br><br> To flatten relational data into a row set, you should create a SQL view, or build a query that returns parent and child records in the same row. For example, the built-in hotels sample dataset is an SQL database that has 50 records (one for each hotel), linked to room records in a related table. The query that flattens the collective data into a row set embeds all of the room information in JSON documents in each hotel record. The embedded room information is generated by a query that uses a **FOR JSON AUTO** clause. <br><br> You can learn more about this technique in [define a query that returns embedded JSON](index-sql-relational-data.md#define-a-query-that-returns-embedded-json). This is just one example; you can find other approaches that produce the same result. |
119
119
| Files | An indexer generally creates one search document for each file, where the search document consists of fields for content and metadata. Depending on the file type, the indexer can sometimes [parse one file into multiple search documents](search-howto-index-one-to-many-blobs.md). For example, in a CSV file, each row can become a standalone search document. |
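The flattening described in the Relational row of the table above can be pictured with an in-memory analog. This sketch doesn't use SQL or **FOR JSON AUTO**; it only shows the target shape, where each parent record carries its child records as an embedded collection. All field names (`hotel_id`, `name`, `type`, `rooms`) and the function name are hypothetical.

```python
from collections import defaultdict


def flatten_hotels(hotels, rooms):
    """Embed each hotel's rooms in the hotel record, one document per hotel."""
    rooms_by_hotel = defaultdict(list)
    for room in rooms:
        # Drop the foreign key; it's implied by the parent document.
        embedded = {k: v for k, v in room.items() if k != "hotel_id"}
        rooms_by_hotel[room["hotel_id"]].append(embedded)
    return [
        {**hotel, "rooms": rooms_by_hotel.get(hotel["hotel_id"], [])}
        for hotel in hotels
    ]


# Hypothetical parent and child rows.
hotels = [{"hotel_id": 1, "name": "A"}, {"hotel_id": 2, "name": "B"}]
rooms = [{"hotel_id": 1, "type": "suite"}, {"hotel_id": 1, "type": "standard"}]
documents = flatten_hotels(hotels, rooms)
```

Each element of `documents` corresponds to one search document; a hotel with no rooms gets an empty `rooms` collection rather than being dropped.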
120
120
121
121
Remember that you only need to pull in searchable and filterable data:
122
122
123
-
+ Searchable data is text
124
-
+ Filterable data is alphanumeric
123
+
+ Searchable data is text or vectors
124
+
+ Filterable data is text and numbers (non-vector fields)
125
125
126
-
Azure AI Search can't search over binary data in any format, although it can extract and infer text descriptions of image files (see [AI enrichment](cognitive-search-concept-intro.md)) to create searchable content. Likewise, large text can be broken down and analyzed by natural language models to find structure or relevant information, generating new content that you can add to a search document.
126
+
Azure AI Search can't do a full-text search over binary data in any format, although it can extract and infer text descriptions of image files (see [AI enrichment](cognitive-search-concept-intro.md)) to create searchable content. Likewise, large text can be broken down and analyzed by natural language models to find structure or relevant information, generating new content that you can add to a search document. Azure AI Search can also do vector search over embeddings, including quantized embeddings in a binary format.
127
127
128
128
Given that indexers don't fix data problems, other forms of data cleansing or manipulation might be needed. For more information, you should refer to the product documentation of your [Azure database product](/azure/?product=databases).
articles/search/search-how-to-create-search-index.md: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ Use this checklist to assist the design decisions for your search index.
64
64
65
65
1. Identify which source fields can be used as filters. Numeric content and short text fields, particularly those with repeating values, are good choices. When working with filters, remember:
66
66
67
-
+ Filters can be used in vector and nonvector queries, but the filter itself is applied to alphanumeric (nonvector) fields in your index.
67
+
+ Filters can be used in vector and nonvector queries, but the filter itself is applied to human-readable (nonvector) fields in your index.
68
68
69
69
+ Filterable fields can optionally be used in faceted navigation.
articles/search/search-how-to-load-search-index.md: 3 additions & 3 deletions
@@ -18,11 +18,11 @@ This article explains how to import documents into a predefined search index. In
18
18
19
19
## How data import works
20
20
21
-
A search service accepts JSON documents that conform to the index schema. A search service imports and indexes plain text and vectors in JSON, used in full text search, vector search, hybrid search, and knowledge mining scenarios.
21
+
A search service accepts JSON documents that conform to the index schema. A search service can import and index plain text content and vector content in JSON documents.
22
22
23
-
+ Plain text content is obtainable from alphanumeric fields in the external data source, metadata that's useful in search scenarios, or enriched content created by a [skillset](cognitive-search-working-with-skillsets.md) (skills can extract or infer textual descriptions from images and unstructured content).
23
+
+ Plain text content is retrieved from fields in the external data source, from metadata properties, or from enriched content that's generated by a [skillset](cognitive-search-working-with-skillsets.md) (skills can extract or infer textual descriptions from images and unstructured content).
24
24
25
-
+ Vector content is vectorized using an [external embedding model](vector-search-how-to-generate-embeddings.md)or [integrated vectorization](vector-search-integrated-vectorization.md)using Azure AI Search features that integrate with applied AI.
25
+
+ Vector content is retrieved from a data source that provides it, or it's created by a skillset that implements [integrated vectorization](vector-search-integrated-vectorization.md) in an Azure AI Search indexer workload.
26
26
27
27
You can prepare these documents yourself, but if content resides in a [supported data source](search-indexer-overview.md#supported-data-sources), running an [indexer](search-indexer-overview.md) or using an Import wizard can automate document retrieval, JSON serialization, and indexing.
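If you prepare the documents yourself, the payload is a JSON batch of documents that match the index schema. The sketch below builds such a batch without sending it. It assumes the request body shape of the REST document indexing operation (a `value` array whose items carry an `@search.action` property); the function name and the field names `hotelId`, `description`, and `descriptionVector` are hypothetical and stand in for whatever your index defines.

```python
import json


def build_index_batch(documents, key_field, action="upload"):
    """Wrap plain dictionaries in a batch payload for document indexing.

    Assumption: the service expects {"value": [...]} where each item has an
    "@search.action" property alongside the document's own fields.
    """
    batch = []
    for doc in documents:
        if key_field not in doc:
            raise ValueError(f"document is missing key field {key_field!r}")
        batch.append({"@search.action": action, **doc})
    return {"value": batch}


# Hypothetical document with plain text content and vector content.
payload = build_index_batch(
    [{"hotelId": "1", "description": "Quiet rooms", "descriptionVector": [0.1, 0.2]}],
    key_field="hotelId",
)
body = json.dumps(payload)  # JSON string to use as the request body
```

Checking for the key field up front surfaces schema mismatches before any request is made, which is cheaper than diagnosing a rejected batch.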
articles/search/search-howto-incremental-index.md: 1 addition & 1 deletion
@@ -134,7 +134,7 @@ PUT https://[YOUR-SEARCH-SERVICE].search.windows.net/indexers/[YOUR-INDEXER-NAME
134
134
}
135
135
```
136
136
137
-
If you now issue another GET request on the indexer, the response from the service includes an `ID` property in the cache object. The alphanumeric string is appended to the name of the container containing all the cached results and intermediate state of each document processed by this indexer. The ID is used to uniquely name the cache in Blob storage.
137
+
If you now issue another GET request on the indexer, the response from the service includes an `ID` property in the cache object. The string is appended to the name of the container containing all the cached results and intermediate state of each document processed by this indexer. The ID is used to uniquely name the cache in Blob storage.