articles/search/search-faceted-navigation-examples.md
60 additions & 12 deletions
@@ -9,13 +9,15 @@ author: HeidiSteen
9
9
ms.author: heidist
10
10
ms.service: azure-ai-search
11
11
ms.topic: how-to
12
-
ms.date: 03/21/2025
12
+
ms.date: 03/31/2025
13
13
---
14
14
15
15
# Faceted navigation examples
16
16
17
17
This section extends [faceted navigation configuration](search-faceted-navigation.md) with examples that demonstrate basic usage and other scenarios.
18
18
19
+
Facetable fields are defined in an index, but facet parameters and expressions are defined in query requests. If you have an index with facetable fields, you can try new features like [facet hierarchies](#facet-hierarchy-example) and [aggregations](#facet-aggregation-example) on existing indexes.
20
+
19
21
## Facet parameters and syntax
20
22
21
23
Depending on the API, a facet query is usually an array of facet expressions that are applied to search results. Each facet expression contains a facetable field name, optionally followed by a comma-separated list of name-value pairs.
@@ -202,22 +204,22 @@ Results from this query are as follows:
202
204
203
205
Starting in [2025-03-01-preview REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2025-03-01-preview&preserve-view=true) and available in the Azure portal, you can configure a facet hierarchy using the `>` and `;` operators.
204
206
205
-
The nesting (hierarchical) operator `>` denotes a parent–child relationship, and the semicolon operator `;` denotes children of a shared parent. The parent must contain only one field. Both the parent and child fields must be facetable.
207
+
The nesting (hierarchical) operator `>` denotes a parent–child relationship, and the semicolon operator `;` denotes multiple fields at the same nesting level, which are all children of the same parent. The parent must contain only one field. Both the parent and child fields must be `facetable`.
206
208
207
209
The order of operations in a facet expression that includes facet hierarchies is:
208
210
209
211
* options operator (comma `,`) that separates facet parameters for the facet field, such as the comma in `Rooms/BaseRate,values`
210
-
* parentheses, such as the ones enclosing `Rooms/BaseRate`.
212
+
* parentheses, such as the ones enclosing `(Rooms/BaseRate,values:50 ; Rooms/Type)`.
211
213
* nesting operator (angled bracket `>`)
212
-
* append operator (semicolon `;`), demonstrated in a second example `"Tags>(Rooms/BaseRate,values:50;Rooms/Type)"` in this section, where two child facets are peers under the Tags parent.
214
+
* append operator (semicolon `;`), demonstrated in a second example `"Tags>(Rooms/BaseRate,values:50 ; Rooms/Type)"` in this section, where two child facets are peers under the Tags parent.
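The operators in the list above can be sketched as simple string composition. This is a minimal illustration, not part of the service or any SDK: the helper names `child_group` and `nest` are hypothetical, and the `"search": "*"` entry in the request body is an assumption added only to make the payload complete.

```python
import json


def child_group(*children: str) -> str:
    """Join sibling facet expressions with the append operator (;) inside parentheses."""
    return "(" + " ; ".join(children) + ")"


def nest(parent: str, children: str) -> str:
    """Attach a child facet, or a parenthesized group of children, to one parent field with >."""
    return f"{parent}>{children}"


# The options operator (,) binds facet parameters to their field first.
base_rate = "Rooms/BaseRate,values:50"

# Parentheses group the two children; > then nests the group under the Tags parent.
expression = nest("Tags", child_group(base_rate, "Rooms/Type"))

# Hypothetical request body: only the "facets" entry comes from the article text.
request_body = {"search": "*", "facets": [expression]}
print(json.dumps(request_body, indent=2))
```

Building the expression from parts keeps the parentheses explicit, which matters because `>` is evaluated before `;` when they are omitted.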
213
215
214
-
Here's a query that returns just a few documents, which is helpful for viewing a full response. Facets count the parent document (Hotels) and not intermediate subdocuments (Rooms), so the response determines the number of *hotels* that have any rooms in each facet bucket.
216
+
There are several examples for facet hierarchies. The first example is a query that returns just a few documents, which is helpful for viewing a full response. Facets count the parent document (Hotels) and not intermediate subdocuments (Rooms), so the response determines the number of *hotels* that have any rooms in each facet bucket.
215
217
216
218
```rest
217
219
POST /indexes/hotels-sample-index/docs/search?api-version=2025-03-01-Preview
@@ -371,13 +373,13 @@ Results from this query are as follows. Both hotels have pools. For other tags,
371
373
}
372
374
```
373
375
374
-
This example extends the previous one, demonstrating multiple top-level facets with multiple children. Notice the semicolon (`;`) operator separates each child.
376
+
This second example extends the previous one, demonstrating multiple top-level facets with multiple children. Notice the semicolon (`;`) operator separates each child.
375
377
376
378
```rest
377
379
POST /indexes/hotels-sample-index/docs/search?api-version=2025-03-01-Preview
If you remove the innermost parentheses, Category and Rating are no longer siblings because the precedence rules mean that the `>` operator is evaluated before `;`.
@@ -438,7 +484,7 @@ Facet filtering enables you to constrain the facet values returned to those matc
438
484
* `includeTermFilter` filters the facet values to those that match the regular expression
439
485
* `excludeTermFilter` filters the facet values to those that don't match the regular expression
440
486
441
-
If a facet string satisfies both conditions, the `excludeTermFilter` takes precedence. Otherwise, the set of bucket strings are first evaluated with `includeTermFilter` and then excluded with `excludeTermFilter`.
487
+
If a facet string satisfies both conditions, the `excludeTermFilter` takes precedence because the set of bucket strings is first evaluated with `includeTermFilter` and then excluded with `excludeTermFilter`.
442
488
443
489
Only those facet values that match the regular expression are returned. You can combine these parameters with other facet options (for example, `count`, `sort`, and [hierarchical faceting](#facet-hierarchy-example)) on string fields.
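The include-then-exclude order described above can be modeled locally. This sketch only illustrates the evaluation order: it uses Python's `re` module as a stand-in for the service's regular expression dialect, assumes whole-value matching, and the function name `filter_facet_values` and the sample values are hypothetical.

```python
import re
from typing import List, Optional


def filter_facet_values(
    values: List[str],
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Keep values that match `include`, then drop values that match `exclude`."""
    kept = values
    if include is not None:
        kept = [v for v in kept if re.fullmatch(include, v)]
    if exclude is not None:
        # Applied second, so a value matching both patterns is excluded.
        kept = [v for v in kept if not re.fullmatch(exclude, v)]
    return kept


sample_values = ["pool", "free wifi", "free parking", "bar"]
filtered = filter_facet_values(sample_values, include=r"free.*", exclude=r".*parking")
print(filtered)
```

Here "free parking" satisfies both patterns and is dropped, which is the precedence rule in practice.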
444
490
@@ -449,7 +495,7 @@ The following example shows how to escape special characters in your regular exp
@@ -556,7 +602,9 @@ The following example is an abbreviated response (hotel documents are omitted fo
556
602
557
603
Starting in [2025-03-01-preview REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2025-03-01-preview&preserve-view=true) and available in the Azure portal, you can aggregate facets.
558
604
559
-
Facet aggregations allow you to compute metrics from facet values. The aggregation capability works alongside the existing faceting options. The only supported metric is `sum`. Adding `metric: sum` to a numeric facet aggregates all the values of each bucket.
605
+
Facet aggregations allow you to compute metrics from facet values. The aggregation capability works alongside the existing faceting options. The only supported metric is `sum`. Adding `metric: sum` to a numeric facet aggregates all the values of each bucket.
606
+
607
+
You can add a default value to use if a document contains a null for that field: `"facets": [ "Rooms/SleepsCount, metric: sum, default:2"]`. If a room has a null value for the Rooms/SleepsCount field, the default substitutes for the missing value.
560
608
561
609
You can sum any facetable field of a numeric data type (except vectors and geographic coordinates).
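As a rough sketch of the `sum` metric with a default, the following builds the facet expression shown above and mimics the null substitution on local data. The helper names `sum_facet` and `local_sum` and the sample values are hypothetical, and the local arithmetic only approximates what the service computes per bucket.

```python
from typing import Optional, Sequence


def sum_facet(field: str, default: Optional[int] = None) -> str:
    """Build a facet expression that requests the sum metric, with an optional default."""
    parts = [field, "metric: sum"]
    if default is not None:
        parts.append(f"default:{default}")
    return ", ".join(parts)


def local_sum(values: Sequence[Optional[float]], default: float) -> float:
    """Sum values, substituting `default` wherever a value is null (None)."""
    return sum(default if v is None else v for v in values)


facet = sum_facet("Rooms/SleepsCount", default=2)
print(facet)

# Hypothetical SleepsCount values for three rooms; the second one is null.
sleeps_counts = [2, None, 4]
total = local_sum(sleeps_counts, default=2)
print(total)
```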
articles/search/search-faceted-navigation.md
7 additions & 3 deletions
@@ -8,7 +8,7 @@ author: HeidiSteen
8
8
ms.author: heidist
9
9
ms.service: azure-ai-search
10
10
ms.topic: how-to
11
-
ms.date: 03/21/2025
11
+
ms.date: 03/31/2025
12
12
---
13
13
14
14
# Add faceted navigation to search results
@@ -121,7 +121,7 @@ Facets can be calculated over single-value fields and collections. Fields that w
121
121
* Low cardinality (a few distinct values that repeat throughout documents in your search corpus).
122
122
* Short descriptive values (one or two words) that render nicely in a navigation tree.
123
123
124
-
The values within a field, and not the field name itself, produce the facets in a faceted navigation structure. If the facet is a string field named *Color*, facets are blue, green, and any other value for that field. As a best practice, review field values to ensure there are no typos, nulls, or casing differences. Consider [assigning a normalizer](search-normalizers.md) to a "filterable" and "facetable" field to smooth out minor variations in the text.
124
+
The values within a field, and not the field name itself, produce the facets in a faceted navigation structure. If the facet is a string field named *Color*, facets are blue, green, and any other value for that field. As a best practice, review field values to ensure there are no typos, nulls, or casing differences. Consider [assigning a normalizer](search-normalizers.md) to a filterable and facetable field to smooth out minor variations in the text. For example, "Canada", "CANADA", and "canada" would all be normalized to one bucket.
125
125
126
126
You can't set facets on existing fields, on vector fields, or fields of type `Edm.GeographyPoint` or `Collection(Edm.GeographyPoint)`.
127
127
@@ -220,7 +220,7 @@ Here's a screenshot of the [basic facet query example](search-faceted-navigation
220
220
|`interval`| An integer interval greater than zero for numbers, or minute, hour, day, week, month, quarter, year for date time values. For example, `"facet=baseRate,interval:100"` produces buckets based on base rate ranges of size 100. If base rates are all between $60 and $600, there are buckets for 0-100, 100-200, 200-300, 300-400, 400-500, and 500-600. The string `"facet=lastRenovationDate,interval:year"` produces one bucket for each year when hotels were renovated. |
221
221
|`timeoffset`| Can be set to `[+-]hh:mm`, `[+-]hhmm`, or `[+-]hh`. If used, the `timeoffset` parameter must be combined with the `interval` option, and only when applied to a field of type `Edm.DateTimeOffset`. The value specifies the UTC time offset to account for in setting time boundaries. For example: `"facet=lastRenovationDate,interval:day,timeoffset:-01:00"` uses the day boundary that starts at 01:00:00 UTC (midnight in the target time zone). |
222
222
223
-
`count` and `sort` can be combined in the same facet specification, but they can't be combined with interval or values, and interval and values can't be combined together.
223
+
`count` and `sort` can be combined in the same facet specification, but they can't be combined with `interval` or `values`, and `interval` and `values` can't be combined together.
224
224
225
225
Interval facets on date time are computed based on the UTC time if `timeoffset` isn't specified. For example, for `"facet=lastRenovationDate,interval:day"`, the day boundary starts at 00:00:00 UTC.
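To make the numeric `interval` option concrete, here is a small local model of how values fall into fixed-size buckets. It is a sketch under assumptions: the function name `interval_buckets` and the sample rates are hypothetical, and how the service treats values that sit exactly on a bucket boundary isn't covered here.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def interval_buckets(values: Iterable[float], interval: int) -> Dict[str, int]:
    """Count values per fixed-size bucket, keyed by a "low-high" label."""
    if interval <= 0:
        raise ValueError("interval must be an integer greater than zero")
    # Each value maps to the lower bound of the bucket that contains it.
    counts = Counter(math.floor(v / interval) * interval for v in values)
    return {f"{low}-{low + interval}": counts[low] for low in sorted(counts)}


# Hypothetical base rates between $60 and $600.
base_rates = [60, 75, 150, 250, 310, 480, 599]
buckets = interval_buckets(base_rates, interval=100)
print(buckets)
```

With these sample rates, every bucket from 0-100 through 500-600 is populated, matching the ranges described for `"facet=baseRate,interval:100"`.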
226
226
@@ -250,6 +250,10 @@ Remember that you can't use `Edm.GeographyPoint` or `Collection(Edm.GeographyPoi
250
250
251
251
As you prepare data for indexing, check fields for null values, misspellings or case discrepancies, and single and plural versions of the same word. By default, filters and facets don't undergo lexical analysis or [spell check](speller-how-to-add.md), which means that all values of a "facetable" field are potential facets, even if the words differ by one character. Optionally, you can [assign a normalizer](search-normalizers.md) to a "filterable" and "facetable" field to smooth out variations in casing and characters.
252
252
253
+
### Ordering facet buckets
254
+
255
+
Although you can sort within a bucket, there are no parameters for controlling the order of facet buckets in the navigation structure as a whole. If you want facet buckets in a specific order, you must provide it in application code.
256
+
253
257
### Discrepancies in facet counts
254
258
255
259
Under certain circumstances, you might find that facet counts aren't fully accurate due to the [sharding architecture](index-similarity-and-scoring.md#sharding-effects-on-query-results). Every search index is spread across multiple shards, and each shard reports the top N facets by document count, which are then combined into a single result. Because it's just the top N facets for each shard, it's possible to miss or under-count matching documents in the facet response.
articles/search/vector-search-how-to-chunk-documents.md
4 additions & 2 deletions
@@ -9,7 +9,7 @@ ms.service: azure-ai-search
9
9
ms.custom:
10
10
- ignite-2023
11
11
ms.topic: conceptual
12
-
ms.date: 03/11/2025
12
+
ms.date: 03/31/2025
13
13
---
14
14
15
15
# Chunk large documents for vector search solutions in Azure AI Search
@@ -20,7 +20,9 @@ We recommend [integrated vectorization](vector-search-integrated-vectorization.m
20
20
21
21
## Common chunking techniques
22
22
23
-
Chunking is only required if the source documents are too large for the maximum input size imposed by models. Here are some common chunking techniques, associated with built-in features if you use [indexers](search-indexer-overview.md) and [skills](cognitive-search-working-with-skillsets.md).
23
+
Chunking is only required if the source documents are too large for the maximum input size imposed by models, but it's also beneficial if content is poorly represented as a single vector. Consider a wiki page that covers many varied subtopics. The entire page might be small enough to meet model input requirements, but you might get better results if you chunk at a finer grain.
24
+
25
+
Here are some common chunking techniques, associated with built-in features if you use [indexers](search-indexer-overview.md) and [skills](cognitive-search-working-with-skillsets.md).
articles/search/vector-search-how-to-quantization.md
12 additions & 6 deletions
@@ -52,15 +52,15 @@ Rescoring is technique used to offset information loss due to vector compression
52
52
Rescoring applies to:
53
53
54
54
- scalar quantization using Hierarchical Navigable Small World (HNSW) graphs for similarity search
55
-
- binary quantization using HNSW graphs
55
+
- binary quantization, also using HNSW graphs
56
56
57
57
Exhaustive K Nearest Neighbors (eKNN) doesn't support rescoring.
58
58
59
59
Rescoring occurs when you set a rescoring option in the index vector configuration:
60
60
61
61
- In version 2024-07-01, set `rerankWithOriginalVectors`
62
62
- In version 2024-11-01-preview, set `rescoringOptions.enableRescoring` and `rescoreStorageMethod.preserveOriginals`
63
-
- In version 2025-03-01-preview, set `rescoringOptions.enableRescoring` and `rescoringOptions.rescoreStorageMethod=preserveOriginals` for scalar quantization, or `rescoringOptions.enableRescoring` for binary quantization.
63
+
- In version 2025-03-01-preview, set `rescoringOptions.enableRescoring` and `rescoringOptions.rescoreStorageMethod=preserveOriginals` for scalar or binary quantization, or `rescoringOptions.enableRescoring` and `rescoringOptions.rescoreStorageMethod=discardOriginals` for binary quantization only
64
64
65
65
The generalized process for rescoring is:
66
66
@@ -372,11 +372,11 @@ Each component of the vector is mapped to the closest representative value withi
372
372
373
373
Binary quantization compresses high-dimensional vectors by representing each component as a single bit, either 0 or 1. This method drastically reduces the memory footprint and accelerates vector comparison operations, which are crucial for search and retrieval tasks. Benchmark tests show up to 96% reduction in vector index size.
374
374
375
-
It's particularly effective for embeddings with dimensions greater than 1024. For smaller dimensions, we recommend testing the quality of binary quantization, or trying scalar instead. Additionally, we’ve found BQ performs very well when embeddings are centered around zero. Most popular embedding models such as OpenAI, Cohere, and Mistral are centered around zero.
375
+
It's particularly effective for embeddings with dimensions greater than 1024. For smaller dimensions, we recommend testing the quality of binary quantization, or trying scalar instead. Additionally, we’ve found binary quantization performs very well when embeddings are centered around zero. Most popular embedding models such as OpenAI, Cohere, and Mistral are centered around zero.
376
376
377
377
## Query a quantized vector field using oversampling
378
378
379
-
Query syntax for a compressed or quantized vector field is the same as for noncompressed vector fields, unless you want to override parameters associated with oversampling or rescoring with original vectors.
379
+
Query syntax for a compressed or quantized vector field is the same as for noncompressed vector fields, unless you want to override parameters associated with oversampling and rescoring. You can add an `oversampling` parameter to invoke oversampling and rescoring at query time.
380
380
381
381
### [**2024-07-01**](#tab/query-2024-07-01)
382
382
@@ -430,9 +430,9 @@ POST https://[service-name].search.windows.net/indexes/demo-index/docs/search?ap
430
430
431
431
**Key points**:
432
432
433
-
- Applies to vector fields that undergo vector compression, per the vector profile assignment.
433
+
- Oversampling applies to vector fields that undergo vector compression, per the vector profile assignment.
434
434
435
-
- Overrides the `defaultOversampling` value or introduces oversampling at query time, even if the index's compression configuration didn't specify oversampling or reranking options.
435
+
- Oversampling in the query overrides the `defaultOversampling` value in the index, or invokes oversampling and rescoring at query time, even if the index's compression configuration didn't specify oversampling or reranking options.
@@ -454,4 +454,10 @@ POST https://[service-name].search.windows.net/indexes/demo-index/docs/search?ap
454
454
}
455
455
```
456
456
457
+
**Key points**:
458
+
459
+
- Oversampling applies to vector fields that undergo vector compression, per the vector profile assignment.
460
+
461
+
- Oversampling in the query overrides the `defaultOversampling` value in the index, or invokes oversampling and rescoring at query time, even if the index's compression configuration didn't specify oversampling or reranking options.
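The key points above can be illustrated with a query payload that sets `oversampling` on a vector query. This is a hedged sketch rather than the article's own example: the field name `myvector`, the three-element placeholder vector, and the values for `k` and `oversampling` are assumptions chosen for illustration.

```python
import json

# Hypothetical query body for a compressed vector field.
query_body = {
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [0.1, 0.2, 0.3],  # placeholder; real embeddings are much longer
            "fields": "myvector",       # hypothetical vector field name
            "k": 5,
            # Query-time value; overrides the index's defaultOversampling if one is set.
            "oversampling": 10.0,
        }
    ]
}

print(json.dumps(query_body, indent=2))
```

Omitting `oversampling` from the query would leave any `defaultOversampling` value from the index configuration in effect.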