
Commit e44f9d6

Merge pull request #6222 from HeidiSteen/heidist-freshness
[azure search] updated per the pablo/farzad/openai-guy thread
2 parents 34c119e + fec35c3 commit e44f9d6

5 files changed

+96
-38
lines changed

articles/search/index-add-scoring-profiles.md

Lines changed: 71 additions & 20 deletions
@@ -10,7 +10,7 @@ ms.service: azure-ai-search
1010
ms.custom:
1111
- ignite-2023
1212
ms.topic: how-to
13-
ms.date: 07/23/2025
13+
ms.date: 07/25/2025
1414
---
1515

1616
# Add scoring profiles to boost search scores
@@ -27,7 +27,7 @@ You can add a scoring profile to an index by editing its JSON definition in the
2727

2828
## Prerequisites
2929

30-
+ A search index with text or numeric fields.
30+
+ A search index with text or numeric (nonvector) fields.
3131

3232
## Rules for scoring profiles
3333

@@ -47,7 +47,7 @@ You can use [semantic ranker](semantic-how-to-query-request.md) with scoring pro
4747

4848
A scoring profile is defined in an index schema. It consists of weighted fields, functions, and parameters.
4949

50-
The following definition shows a simple profile named "geo". This example boosts results that have the search term in the hotelName field. It also uses the `distance` function to favor results that are within 10 kilometers of the current location. If someone searches on the term 'inn', and 'inn' happens to be part of the hotel name, documents that include hotels with 'inn' within a 10 kilometer radius of the current location appear higher in the search results.
50+
The following definition shows a simple profile named "geo". This example boosts results that have the search term in the hotelName field. It also uses the `distance` function to favor results that are within 10 kilometers of the current location. If someone searches on the term 'inn', and 'inn' happens to be part of the hotel name, documents that include hotels with 'inn' within a 10-kilometer radius of the current location appear higher in the search results.
5151

5252
```json
5353
"scoringProfiles": [
@@ -87,7 +87,7 @@ POST /indexes/hotels/docs&api-version=2024-07-01
8787

8888
Query parameters, including `scoringParameters`, are described in [Search Documents (REST API)](/rest/api/searchservice/documents/search-post).
8989

90-
For more scenarios, see the [example](#example-of-a-scoring-profile) in this article.
90+
For more scenarios, see the examples for [freshness and distance](#example-boosting-by-freshness-or-distance) and [weighted text and functions](#example-boosting-by-weighted-text-and-functions) in this article.
9191

9292
## Add a scoring profile to a search index
9393

@@ -122,7 +122,7 @@ Scoring profiles can be defined in the Azure portal as shown in the following sc
122122
"functions": (optional) [
123123
{
124124
"type": "magnitude | freshness | distance | tag",
125-
"boost": # (positive number used as multiplier for raw score != 1),
125+
"boost": # (positive or negative number used as multiplier for raw score != 1),
126126
"fieldName": "(...)",
127127
"interpolation": "constant | linear (default) | quadratic | logarithmic",
128128

@@ -160,7 +160,7 @@ Scoring profiles can be defined in the Azure portal as shown in the following sc
160160

161161
## Use text-weighted fields
162162

163-
Use text-weighted fields when field context is important and queries include `searchable` string fields. For example, if a query includes the term "airport", you might want "airport" in the HotelName field than the Description field.
163+
Use text-weighted fields when field context is important and queries include `searchable` string fields. For example, if a query includes the term "airport", you might want "airport" in the HotelName field rather than the Description field.
164164

165165
Weighted fields are name-value pairs composed of a `searchable` field and a positive number that is used as a multiplier. For example, if the original field score of HotelName is 3 and the field's weight is 2, the boosted score for that field becomes 6, contributing to a higher overall score for the parent document itself.
166166

@@ -186,8 +186,12 @@ Use functions when simple relative weights are insufficient or don't apply, as i
186186
|-|-|
187187
| distance | Boost by proximity or geographic location. This function can only be used with `Edm.GeographyPoint` fields. | Use for "find near me" scenarios. |
188188
| freshness | Boost by values in a datetime field (`Edm.DateTimeOffset`). [Set boostingDuration](#set-boostingduration-for-freshness-function) to specify a value representing a timespan over which boosting occurs. | Use when you want to boost by newer or older dates. Rank items like calendar events with future dates such that items closer to the present can be ranked higher than items further in the future. One end of the range is fixed to the current time. To boost a range of times in the past, use a positive boostingDuration. To boost a range of times in the future, use a negative boostingDuration. |
189-
| magnitude | Alter rankings based on the range of values for a numeric field. The value must be an integer or floating-point number. For star ratings of 1 through 4, this would be 1. For margins over 50%, this would be 50. This function can only be used with `Edm.Double` and `Edm.Int` fields. For the magnitude function, you can reverse the range, high to low, if you want the inverse pattern (for example, to boost lower-priced items more than higher-priced items). Given a range of prices from $100 to $1, you would set `boostingRangeStart` at 100 and `boostingRangeEnd` at 1 to boost the lower-priced items. | Use when you want to boost by profit margin, ratings, clickthrough counts, number of downloads, highest price, lowest price, or a count of downloads. When two items are relevant, the item with the higher rating will be displayed first. |
190-
| tag | Boost by tags that are common to both search documents and query strings. Tags are provided in a `tagsParameter`. This function can only be used with search fields of type `Edm.String` and `Collection(Edm.String)`. | Use when you have tag fields. If a given tag within the list is itself a comma-delimited list, you can [use a text normalizer](search-normalizers.md) on the field to strip out the commas at query time (map the comma character to a space). This approach will "flatten" the list so that all terms are a single, long string of comma-delimited terms. |
189+
| magnitude | Alter rankings based on the range of values for a numeric field. The value must be an integer or floating-point number. For star ratings of 1 through 4, this would be 1. For margins over 50%, this would be 50. This function can only be used with `Edm.Double` and `Edm.Int` fields. For the magnitude function, you can reverse the range, high to low, if you want the inverse pattern (for example, to boost lower-priced items more than higher-priced items). Given a range of prices from $100 to $1, you would set `boostingRangeStart` at 100 and `boostingRangeEnd` at 1 to boost the lower-priced items. | Use when you want to boost by profit margin, ratings, clickthrough counts, number of downloads, highest price, lowest price, or a count of downloads. When two items are relevant, the item with the higher rating is displayed first. |
190+
| tag | Boost by tags that are common to both search documents and query strings. Tags are provided in a `tagsParameter`. This function can only be used with search fields of type `Edm.String` and `Collection(Edm.String)`. | Use when you have tag fields. If a given tag within the list is itself a comma-delimited list, you can [use a text normalizer](search-normalizers.md) on the field to strip out the commas at query time (map the comma character to a space). This approach "flattens" the list so that all terms are a single, long string of comma-delimited terms. |
191+
192+
Magnitude is the computed distance between a field's value (such as a date or location) and a reference point (such as "now" or a target location). It's the input to the scoring function and determines how much boost is applied.
193+
194+
Freshness and distance scoring are special cases of magnitude-based scoring, where the magnitude is automatically computed from a datetime or geographic field. For intuitive boosting that promotes newer or closer values over older or farther values, use a negative boost value (see the [example](#example-boosting-by-freshness-or-distance) for more details).
191195

192196
### Rules for using functions
193197

@@ -198,20 +202,24 @@ Use functions when simple relative weights are insufficient or don't apply, as i
198202

199203
### Set interpolations
200204

201-
Interpolations set the shape of the slope used for scoring. Because scoring is high to low, the slope is always decreasing, but the interpolation determines the curve of the downward slope. The following interpolations can be used:
205+
Interpolations set the shape of the slope used for boosting freshness and distance.
206+
207+
When the boost value is positive, scoring is high to low, and the slope is always decreasing. With negative boosts, the slope is increasing (newer documents get higher scores). The interpolation value determines the curve of the upward or downward slope and how aggressively the boost score changes in response to date or distance changes. The following interpolations can be used:
202208

203209
| Interpolation | Description |
204210
|-|-|
205-
|`linear`|For items that are within the max and min range, boosting is applied in a constantly decreasing amount. Linear is the default interpolation for a scoring profile.|
206-
|`constant`|For items that are within the start and ending range, a constant boost is applied to the rank results.|
207-
|`quadratic`|In comparison to a linear interpolation that has a constantly decreasing boost, Quadratic initially decreases at smaller pace and then as it approaches the end range, it decreases at a much higher interval. This interpolation option isn't allowed in tag scoring functions.|
208-
|`logarithmic` |In comparison to a linear interpolation that has a constantly decreasing boost, logarithmic initially decreases at higher pace and then as it approaches the end range, it decreases at a much smaller interval. This interpolation option isn't allowed in tag scoring functions.|
211+
|`linear`|For items that are within the max and min range, boosting is applied in a constantly decreasing amount. A negative boost penalizes older documents proportionally. Good for gradual decay in relevance. Linear is the default interpolation for a scoring profile.|
212+
|`constant`|For items that are within the start and ending range, a constant boost is applied to the rank results. For freshness and distance, applies the same negative boost to all documents within the range. Use this when you want a flat penalty regardless of age.|
213+
|`quadratic`|Quadratic initially decreases at smaller pace and then as it approaches the end range, it decreases at a much higher interval. For negative boosting, it penalizes older documents increasingly more as they age. Use this when you want to strongly favor the most recent documents and sharply demote older ones. This interpolation option isn't allowed in the tag scoring function.|
214+
|`logarithmic` |Logarithmic initially decreases at higher pace and then as it approaches the end range, it decreases at a much smaller interval. For negative boosting, it penalizes older documents more sharply at first, then tapers off. Ideal when you want strong preference for very recent content but less sensitivity as documents age. This interpolation option isn't allowed in the tag scoring function.|
209215

210-
![Constant, linear, quadratic, log10 lines on graph](media/scoring-profiles/azuresearch_scorefunctioninterpolationgrapht.png "AzureSearch_ScoreFunctionInterpolationGrapht")
216+
<!-- ![Constant, linear, quadratic, log10 lines on graph](media/scoring-profiles/azuresearch_scorefunctioninterpolationgrapht.png "AzureSearch_ScoreFunctionInterpolationGrapht") -->
217+
218+
:::image type="content" source="media/scoring-profiles/interpolation-graph.png" alt-text="Diagram of slope shapes for constant, linear, logarithmic, and quadratic interpolations over a 365-day range":::
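As a rough illustration of the shapes described in the table, the following Python sketch evaluates four curves over a normalized range. The formulas are assumptions chosen only to match the descriptions (slow-then-fast for quadratic, fast-then-slow for logarithmic). They aren't the formulas the service uses, and the function name `curve` is hypothetical.

```python
import math


# Illustrative curve shapes only. These formulas are assumptions picked to
# match the table's descriptions, not the service's internal implementation.
# x is the normalized position in the boosting range: 0 = start, 1 = end.
def curve(interpolation: str, x: float) -> float:
    x = min(max(x, 0.0), 1.0)  # clamp to the boosting range
    if interpolation == "constant":
        return 1.0  # same value everywhere in the range
    if interpolation == "linear":
        return 1.0 - x  # steady decrease
    if interpolation == "quadratic":
        return 1.0 - x * x  # slow at first, steep near the end of the range
    if interpolation == "logarithmic":
        return 1.0 - math.log10(1.0 + 9.0 * x)  # steep at first, then flatter
    raise ValueError(f"unknown interpolation: {interpolation}")


if __name__ == "__main__":
    for name in ("constant", "linear", "quadratic", "logarithmic"):
        samples = [round(curve(name, x), 2) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
        print(name, samples)
```

Halfway through the range, the quadratic sketch is still above the linear one, while the logarithmic sketch has already dropped below it, which is the qualitative difference the table describes.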
211219

212220
### Set boostingDuration for freshness function
213221

214-
`boostingDuration` is an attribute of the `freshness` function. You use it to set an expiration period after which boosting will stop for a particular document. For example, to boost a product line or brand for a 10-day promotional period, you would specify the 10-day period as "P10D" for those documents.
222+
`boostingDuration` is an attribute of the `freshness` function. You use it to set an expiration period after which boosting stops for a particular document. For example, to boost a product line or brand for a 10-day promotional period, you would specify the 10-day period as "P10D" for those documents.
215223

216224
`boostingDuration` must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an ISO 8601 duration value). The pattern for this is: "P[nD][T[nH][nM][nS]]".
217225

@@ -222,18 +230,61 @@ The following table provides several examples.
222230
|1 day|"P1D"|
223231
|2 days and 12 hours|"P2DT12H"|
224232
|15 minutes|"PT15M"|
225-
|30 days, 5 hours, 10 minutes, and 6.334 seconds|"P30DT5H10M6.334S"|
233+
|30 days, 5 hours, 10 minutes, and 6.334 seconds|"P30DT5H10M6.334S"|
234+
|1 year | "P365D" |
226235

227236
For more examples, see [XML Schema: Datatypes (W3.org web site)](https://www.w3.org/TR/xmlschema11-2/#dayTimeDuration).
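To check a value before putting it in an index definition, a small validator can help. The following Python sketch is one way to parse the "P[nD][T[nH][nM][nS]]" pattern into a `timedelta`. The function name `parse_boosting_duration` is hypothetical, and accepting a leading minus sign for negative durations is an assumption based on the XSD type, not on service behavior.

```python
import re
from datetime import timedelta

# Regex for the dayTimeDuration subset "P[nD][T[nH][nM][nS]]".
# Assumption: an optional leading "-" marks a negative duration.
_DURATION = re.compile(
    r"^(?P<sign>-)?P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_boosting_duration(value: str) -> timedelta:
    """Parse a dayTimeDuration string, raising ValueError if it is malformed."""
    match = _DURATION.match(value)
    # Reject non-matches, a bare "P", and a "T" with no time components.
    if not match or value.lstrip("-") == "P" or value.endswith("T"):
        raise ValueError(f"not a valid dayTimeDuration: {value!r}")
    parts = match.groupdict()
    delta = timedelta(
        days=int(parts["days"] or 0),
        hours=int(parts["hours"] or 0),
        minutes=int(parts["minutes"] or 0),
        seconds=float(parts["seconds"] or 0),
    )
    return -delta if parts["sign"] else delta


if __name__ == "__main__":
    for text in ("P1D", "P2DT12H", "PT15M", "P30DT5H10M6.334S"):
        print(text, "->", parse_boosting_duration(text))
```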
228237

229-
## Example of a scoring profile
238+
## Example: boosting by freshness or distance
239+
240+
In Azure AI Search, freshness scoring converts date and time values into a numeric magnitude: a single number representing how far a document's date is from the current time. The older the date, the larger the magnitude. This leads to counterintuitive behavior: more recent documents have smaller magnitudes, which means that positive boosting factors favor older documents unless you explicitly invert the boost direction.
241+
242+
This same logic applies to distance boosting, where farther locations yield larger magnitudes.
243+
244+
To boost by freshness or distance, use negative boosting values to prioritize newer dates or closer locations. Inverting the boost direction through a negative boosting factor penalizes larger magnitudes (older dates), effectively boosting more recent ones. For example, assume a boosting function like `b * (1 - x)` (where `x` is the normalized magnitude from 0 to 1) that gives higher scores to smaller magnitudes (that is, newer dates).
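The `b * (1 - x)` example can be sketched in Python as follows. This is only an illustration of that hypothetical function, not the formula the service applies. The helper names, and the choice to normalize a document's age by dividing it by the boosting window, are assumptions.

```python
from datetime import datetime, timedelta, timezone


def normalized_magnitude(last_updated: datetime, now: datetime,
                         boosting_duration: timedelta) -> float:
    """Document age as a fraction of the boosting window, clamped to [0, 1].

    Assumption: magnitude is normalized by dividing age by the window length.
    """
    age = now - last_updated
    return min(max(age / boosting_duration, 0.0), 1.0)


def illustrative_boost(b: float, x: float) -> float:
    """The hypothetical b * (1 - x) function: smaller magnitudes score higher."""
    return b * (1.0 - x)


if __name__ == "__main__":
    now = datetime(2025, 7, 25, tzinfo=timezone.utc)
    window = timedelta(days=365)
    for days_old in (0, 30, 180, 365, 730):
        x = normalized_magnitude(now - timedelta(days=days_old), now, window)
        print(days_old, "days old ->", round(illustrative_boost(2.0, x), 3))
```

A document updated today has a magnitude of 0 and receives the full value of `b`, and a document at or beyond the end of the window receives 0.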
245+
246+
The shape of the boost curve (constant, linear, logarithmic, quadratic) affects how aggressively scores change across the range. With a negative factor, the curve’s behavior flips—for example, a quadratic curve tapers off more slowly for older dates, while a logarithmic curve shifts more sharply at the far end.
247+
248+
Here's an example scoring profile that demonstrates how to address counter-intuitive freshness scoring using negative boosting and explains how magnitude works in this context.
249+
250+
```json
251+
252+
"scoringProfiles": [
253+
{
254+
"name": "freshnessBoost",
255+
"text": {
256+
"weights": {
257+
"content": 1.0
258+
}
259+
},
260+
"functions": [
261+
{
262+
"type": "freshness",
263+
"fieldName": "lastUpdated",
264+
"boost": -2.0,
265+
"interpolation": "quadratic",
266+
"freshness": {
267+
"boostingDuration": "P365D"
268+
}
269+
}
270+
]
271+
}
272+
]
273+
```
274+
275+
+ `"fieldName": "lastUpdated"` is the datetime field used to calculate freshness.
276+
+ `"boost": -2.0` is a negative boosting factor, which inverts the default behavior. Since older dates have larger magnitudes, this penalizes them and boosts newer documents.
277+
+ `"interpolation": "quadratic"` means the boost effect is stronger for documents closer to the current date and tapers off more sharply for older ones.
278+
+ `"boostingDuration": "P365D"` defines the time window over which freshness is evaluated.
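
To apply the profile at query time, a request names it in the `scoringProfile` parameter. The following Python sketch only builds the URL and JSON body for a Search Documents POST request and doesn't send it. The service name `my-service`, the index name `articles`, and the helper name are hypothetical; the `api-version` value is the one used elsewhere in this article.

```python
import json


def build_search_request(service: str, index: str, query: str,
                         profile: str, api_version: str = "2024-07-01"):
    """Return (url, json_body) for a query that uses a scoring profile.

    Assumption: service and index names are placeholders supplied by the caller.
    """
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/docs/search?api-version={api_version}"
    )
    body = {
        "search": query,            # full-text query string
        "scoringProfile": profile,  # name of a profile defined in the index
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    url, body = build_search_request(
        "my-service", "articles", "release notes", "freshnessBoost"
    )
    print("POST", url)
    print(body)
```

Sending the request also requires an `api-key` or bearer token header, which is omitted here.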
279+
280+
## Example: boosting by weighted text and functions
230281

231282
> [!TIP]
232283
> See this [blog post](https://farzzy.hashnode.dev/enhance-azure-ai-search-document-boosting) and [notebook](https://github.com/farzad528/azure-ai-search-python-playground/blob/main/azure-ai-search-document-boosting.ipynb) for a demonstration of using scoring profiles and document boosting in vector and generative AI scenarios.
233284
234-
The following example shows the schema of an index with two scoring profiles (`boostGenre`, `newAndHighlyRated`). Any query against this index that includes either profile as a query parameter will use the profile to score the result set.
285+
The following example shows the schema of an index with two scoring profiles (`boostGenre`, `newAndHighlyRated`). Any query against this index that includes either profile as a query parameter uses the profile to score the result set.
235286

236-
The `boostGenre` profile uses weighted text fields, boosting matches found in albumTitle, genre, and artistName fields. The fields are boosted 1.5, 5, and 2 respectively. Why is genre boosted so much higher than the others? If search is conducted over data that is somewhat homogeneous (as is the case with 'genre' in the musicstoreindex), you might need a larger variance in the relative weights. For example, in the musicstoreindex, 'rock' appears as both a genre and in identically phrased genre descriptions. If you want genre to outweigh genre description, the genre field will need a much higher relative weight.
287+
The `boostGenre` profile uses weighted text fields, boosting matches found in albumTitle, genre, and artistName fields. The fields are boosted 1.5, 5, and 2 respectively. Why is genre boosted so much higher than the others? If search is conducted over data that is somewhat homogeneous (as is the case with 'genre' in the musicstoreindex), you might need a larger variance in the relative weights. For example, in the musicstoreindex, 'rock' appears as both a genre and in identically phrased genre descriptions. If you want genre to outweigh genre description, the genre field needs a much higher relative weight.
237288

238289
```json
239290
{
@@ -270,7 +321,7 @@ The `boostGenre` profile uses weighted text fields, boosting matches found in al
270321
{
271322
"type": "freshness",
272323
"fieldName": "lastUpdated",
273-
"boost": 10,
324+
"boost": -10,
274325
"interpolation": "quadratic",
275326
"freshness": {
276327
"boostingDuration": "P365D"

articles/search/query-lucene-syntax.md

Lines changed: 1 addition & 1 deletion
@@ -130,7 +130,7 @@ Proximity searches are used to find terms that are near each other in a document
130130

131131
Term boosting refers to ranking a document higher if it contains the boosted term, relative to documents that don't contain the term. This differs from scoring profiles in that scoring profiles boost certain fields, rather than specific terms.
132132

133-
The following example helps illustrate the differences. Suppose that there's a scoring profile that boosts matches in a certain field, say *genre* in the [musicstoreindex example](index-add-scoring-profiles.md#example-of-a-scoring-profile). Term boosting could be used to further boost certain search terms higher than others. For example, `rock^2 electronic` boosts documents that contain the search terms in the genre field higher than other searchable fields in the index. Further, documents that contain the search term *rock* are ranked higher than the other search term *electronic* as a result of the term boost value (2).
133+
The following example helps illustrate the differences. Suppose that there's a scoring profile that boosts matches in a certain field, say *genre* in the [musicstoreindex example](index-add-scoring-profiles.md#example-boosting-by-weighted-text-and-functions). Term boosting could be used to further boost certain search terms higher than others. For example, `rock^2 electronic` boosts documents that contain the search terms in the genre field higher than other searchable fields in the index. Further, documents that contain the search term *rock* are ranked higher than the other search term *electronic* as a result of the term boost value (2).
134134

135135
To boost a term, use the caret, `^`, symbol with a boost factor (a number) at the end of the term you're searching. You can also boost phrases. The higher the boost factor, the more relevant the term is relative to other search terms. By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (for example, 0.20).
136136
