Commit 3bfaf76

feat: [google-cloud-discoveryengine] Added ranking_expression_backend and rank_signals fields related to the Custom Ranking feature (googleapis#14393)
BEGIN_COMMIT_OVERRIDE
feat: Added `ranking_expression_backend` and `rank_signals` fields related to the Custom Ranking feature
docs: A comment for field `ranking_expression` in messages `.google.cloud.discoveryengine.v1alpha.SearchRequest` and `.google.cloud.discoveryengine.v1beta.SearchRequest` is changed to support the Custom Ranking use case
END_COMMIT_OVERRIDE

- [ ] Regenerate this pull request now.

docs: A comment for field `ranking_expression` in messages `.google.cloud.discoveryengine.v1alpha.SearchRequest` and `.google.cloud.discoveryengine.v1beta.SearchRequest` is changed to support the Custom Ranking use case

PiperOrigin-RevId: 805306845

Source-Link: googleapis/googleapis@0ef98bc

Source-Link: https://github.com/googleapis/googleapis-gen/commit/79a868820f3426367466456fc7e134a87b7c1ef1
Copy-Tag: eyJwIjoicGFja2FnZXMvZ29vZ2xlLWNsb3VkLWRpc2NvdmVyeWVuZ2luZS8uT3dsQm90LnlhbWwiLCJoIjoiNzlhODY4ODIwZjM0MjYzNjc0NjY0NTZmYzdlMTM0YTg3YjdjMWVmMSJ9

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
1 parent 1111f8c commit 3bfaf76

9 files changed

+695
-11
lines changed


packages/google-cloud-discoveryengine/google/cloud/discoveryengine_v1/types/search_service.py

Lines changed: 248 additions & 0 deletions
@@ -301,6 +301,112 @@ class SearchRequest(proto.Message):
301301
relevance_score_spec (google.cloud.discoveryengine_v1.types.SearchRequest.RelevanceScoreSpec):
302302
Optional. The specification for returning the
303303
relevance score.
304+
ranking_expression (str):
305+
The ranking expression controls the customized ranking on
306+
retrieval documents. This overrides
307+
[ServingConfig.ranking_expression][google.cloud.discoveryengine.v1.ServingConfig.ranking_expression].
308+
The syntax and supported features depend on the
309+
``ranking_expression_backend`` value. If
310+
``ranking_expression_backend`` is not provided, it defaults
311+
to ``RANK_BY_EMBEDDING``.
312+
313+
If
314+
[ranking_expression_backend][google.cloud.discoveryengine.v1.SearchRequest.ranking_expression_backend]
315+
is not provided or set to ``RANK_BY_EMBEDDING``, it should
316+
be a single function or multiple functions that are joined
317+
by "+".
318+
319+
- ranking_expression = function, { " + ", function };
320+
321+
Supported functions:
322+
323+
- double \* relevance_score
324+
- double \* dotProduct(embedding_field_path)
325+
326+
Function variables:
327+
328+
- ``relevance_score``: pre-defined keyword, used to
329+
measure relevance between query and document.
330+
- ``embedding_field_path``: the document embedding field
331+
used with query embedding vector.
332+
- ``dotProduct``: embedding function between
333+
``embedding_field_path`` and query embedding vector.
334+
335+
Example ranking expression:
336+
337+
::
338+
339+
If document has an embedding field doc_embedding, the ranking expression
340+
could be `0.5 * relevance_score + 0.3 * dotProduct(doc_embedding)`.
341+
342+
If
343+
[ranking_expression_backend][google.cloud.discoveryengine.v1.SearchRequest.ranking_expression_backend]
344+
is set to ``RANK_BY_FORMULA``, the following expression
345+
types (and combinations of those chained using + or
346+
\* operators) are supported:
347+
348+
349+
- ``double``
350+
- ``signal``
351+
- ``log(signal)``
352+
- ``exp(signal)``
353+
- ``rr(signal, double > 0)`` -- reciprocal rank
354+
transformation with second argument being a denominator
355+
constant.
356+
- ``is_nan(signal)`` -- returns 0 if signal is NaN, 1
357+
otherwise.
358+
- ``fill_nan(signal1, signal2 | double)`` -- if signal1 is
359+
NaN, returns signal2 \| double, else returns signal1.
360+
361+
Here are a few examples of ranking formulas that use the
362+
supported ranking expression types:
363+
364+
- ``0.2 * semantic_similarity_score + 0.8 * log(keyword_similarity_score)``
365+
-- mostly rank by the logarithm of
366+
``keyword_similarity_score`` with slight
367+
``semantic_similarity_score`` adjustment.
368+
- ``0.2 * exp(fill_nan(semantic_similarity_score, 0)) + 0.3 * is_nan(keyword_similarity_score)``
369+
-- rank by the exponent of ``semantic_similarity_score``
370+
filling the value with 0 if it's NaN, also add constant
371+
0.3 adjustment to the final score if
372+
``semantic_similarity_score`` is NaN.
373+
- ``0.2 * rr(semantic_similarity_score, 16) + 0.8 * rr(keyword_similarity_score, 16)``
374+
-- mostly rank by the reciprocal rank of
375+
``keyword_similarity_score`` with slight adjustment of
376+
reciprocal rank of ``semantic_similarity_score``.
377+
378+
The following signals are supported:
379+
380+
- ``semantic_similarity_score``: semantic similarity
381+
adjustment that is calculated using the embeddings
382+
generated by a proprietary Google model. This score
383+
determines how semantically similar a search query is to a
384+
document.
385+
- ``keyword_similarity_score``: keyword match adjustment
386+
uses the Best Match 25 (BM25) ranking function. This score
387+
is calculated using a probabilistic model to estimate the
388+
probability that a document is relevant to a given query.
389+
- ``relevance_score``: semantic relevance adjustment that
390+
uses a proprietary Google model to determine the meaning
391+
and intent behind a user's query in context with the
392+
content in the documents.
393+
- ``pctr_rank``: predicted conversion rate adjustment as a
394+
rank. Uses predicted click-through rate (pCTR) to gauge the
395+
relevance and attractiveness of a search result from a
396+
user's perspective. A higher pCTR suggests that the result
397+
is more likely to satisfy the user's query and intent,
398+
making it a valuable signal for ranking.
399+
- ``freshness_rank``: freshness adjustment as a rank
400+
- ``document_age``: The time in hours elapsed since the
401+
document was last updated, a floating-point number (e.g.,
402+
0.25 means 15 minutes).
403+
- ``topicality_rank``: topicality adjustment as a rank. Uses
404+
a proprietary Google model to determine the keyword-based
405+
overlap between the query and the document.
406+
- ``base_rank``: the default rank of the result
407+
ranking_expression_backend (google.cloud.discoveryengine_v1.types.SearchRequest.RankingExpressionBackend):
408+
The backend to use for the ranking expression
409+
evaluation.
304410
"""
305411

306412
class RelevanceThreshold(proto.Enum):
@@ -327,6 +433,23 @@ class RelevanceThreshold(proto.Enum):
327433
MEDIUM = 3
328434
HIGH = 4
329435

436+
class RankingExpressionBackend(proto.Enum):
437+
r"""The backend to use for the ranking expression evaluation.
438+
439+
Values:
440+
RANKING_EXPRESSION_BACKEND_UNSPECIFIED (0):
441+
Default option for unspecified/unknown
442+
values.
443+
RANK_BY_EMBEDDING (3):
444+
Ranking by custom embedding model, the
445+
default way to evaluate the ranking expression.
446+
RANK_BY_FORMULA (4):
447+
Ranking by custom formula.
448+
"""
449+
RANKING_EXPRESSION_BACKEND_UNSPECIFIED = 0
450+
RANK_BY_EMBEDDING = 3
451+
RANK_BY_FORMULA = 4
452+
330453
class ImageQuery(proto.Message):
331454
r"""Specifies the image query input.
332455
@@ -1540,6 +1663,15 @@ class RelevanceScoreSpec(proto.Message):
15401663
number=52,
15411664
message=RelevanceScoreSpec,
15421665
)
1666+
ranking_expression: str = proto.Field(
1667+
proto.STRING,
1668+
number=26,
1669+
)
1670+
ranking_expression_backend: RankingExpressionBackend = proto.Field(
1671+
proto.ENUM,
1672+
number=53,
1673+
enum=RankingExpressionBackend,
1674+
)
15431675

15441676

15451677
class SearchResponse(proto.Message):
@@ -1620,8 +1752,119 @@ class SearchResult(proto.Message):
16201752
model_scores (MutableMapping[str, google.cloud.discoveryengine_v1.types.DoubleList]):
16211753
Output only. Google provided available
16221754
scores.
1755+
rank_signals (google.cloud.discoveryengine_v1.types.SearchResponse.SearchResult.RankSignals):
1756+
A set of ranking signals associated with the
1757+
result.
16231758
"""
16241759

1760+
class RankSignals(proto.Message):
1761+
r"""A set of ranking signals.
1762+
1763+
.. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
1764+
1765+
Attributes:
1766+
keyword_similarity_score (float):
1767+
Keyword matching adjustment.
1768+
1769+
This field is a member of `oneof`_ ``_keyword_similarity_score``.
1770+
relevance_score (float):
1771+
Semantic relevance adjustment.
1772+
1773+
This field is a member of `oneof`_ ``_relevance_score``.
1774+
semantic_similarity_score (float):
1775+
Semantic similarity adjustment.
1776+
1777+
This field is a member of `oneof`_ ``_semantic_similarity_score``.
1778+
pctr_rank (float):
1779+
Predicted conversion rate adjustment as a
1780+
rank.
1781+
1782+
This field is a member of `oneof`_ ``_pctr_rank``.
1783+
topicality_rank (float):
1784+
Topicality adjustment as a rank.
1785+
1786+
This field is a member of `oneof`_ ``_topicality_rank``.
1787+
document_age (float):
1788+
Age of the document in hours.
1789+
1790+
This field is a member of `oneof`_ ``_document_age``.
1791+
boosting_factor (float):
1792+
Combined custom boosts for a doc.
1793+
1794+
This field is a member of `oneof`_ ``_boosting_factor``.
1795+
default_rank (float):
1796+
The default rank of the result.
1797+
custom_signals (MutableSequence[google.cloud.discoveryengine_v1.types.SearchResponse.SearchResult.RankSignals.CustomSignal]):
1798+
A list of custom clearbox signals.
1799+
"""
1800+
1801+
class CustomSignal(proto.Message):
1802+
r"""Custom clearbox signal represented by name and value pair.
1803+
1804+
Attributes:
1805+
name (str):
1806+
Name of the signal.
1807+
value (float):
1808+
Float value representing the ranking signal
1809+
(e.g. 1.25 for BM25).
1810+
"""
1811+
1812+
name: str = proto.Field(
1813+
proto.STRING,
1814+
number=1,
1815+
)
1816+
value: float = proto.Field(
1817+
proto.FLOAT,
1818+
number=2,
1819+
)
1820+
1821+
keyword_similarity_score: float = proto.Field(
1822+
proto.FLOAT,
1823+
number=1,
1824+
optional=True,
1825+
)
1826+
relevance_score: float = proto.Field(
1827+
proto.FLOAT,
1828+
number=2,
1829+
optional=True,
1830+
)
1831+
semantic_similarity_score: float = proto.Field(
1832+
proto.FLOAT,
1833+
number=3,
1834+
optional=True,
1835+
)
1836+
pctr_rank: float = proto.Field(
1837+
proto.FLOAT,
1838+
number=4,
1839+
optional=True,
1840+
)
1841+
topicality_rank: float = proto.Field(
1842+
proto.FLOAT,
1843+
number=6,
1844+
optional=True,
1845+
)
1846+
document_age: float = proto.Field(
1847+
proto.FLOAT,
1848+
number=7,
1849+
optional=True,
1850+
)
1851+
boosting_factor: float = proto.Field(
1852+
proto.FLOAT,
1853+
number=8,
1854+
optional=True,
1855+
)
1856+
default_rank: float = proto.Field(
1857+
proto.FLOAT,
1858+
number=32,
1859+
)
1860+
custom_signals: MutableSequence[
1861+
"SearchResponse.SearchResult.RankSignals.CustomSignal"
1862+
] = proto.RepeatedField(
1863+
proto.MESSAGE,
1864+
number=33,
1865+
message="SearchResponse.SearchResult.RankSignals.CustomSignal",
1866+
)
1867+
16251868
id: str = proto.Field(
16261869
proto.STRING,
16271870
number=1,
@@ -1642,6 +1885,11 @@ class SearchResult(proto.Message):
16421885
number=4,
16431886
message=common.DoubleList,
16441887
)
1888+
rank_signals: "SearchResponse.SearchResult.RankSignals" = proto.Field(
1889+
proto.MESSAGE,
1890+
number=7,
1891+
message="SearchResponse.SearchResult.RankSignals",
1892+
)
16451893

16461894
class Facet(proto.Message):
16471895
r"""A facet result.
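Review note on the `RANK_BY_FORMULA` docstring above: the helper functions it lists can be sketched locally in plain Python to sanity-check a formula before sending it. This is only an illustration, not the service's implementation. The form `1 / (k + signal)` for `rr` is an assumption (the docstring only says the second argument is a denominator constant), `is_nan` follows the docstring literally (0 for NaN, 1 otherwise), and `example_scores` with its keys is a hypothetical helper.

```python
import math
from typing import Dict, Mapping

NAN = float("nan")


def rr(signal: float, k: float) -> float:
    """Reciprocal rank transformation. Assumed form: 1 / (k + signal)."""
    if k <= 0:
        raise ValueError("the denominator constant must be > 0")
    return 1.0 / (k + signal)


def is_nan(signal: float) -> float:
    """Follows the docstring as written: 0 if the signal is NaN, 1 otherwise."""
    return 0.0 if math.isnan(signal) else 1.0


def fill_nan(signal: float, fallback: float) -> float:
    """Returns the fallback when the signal is NaN, else the signal itself."""
    return fallback if math.isnan(signal) else signal


def example_scores(signals: Mapping[str, float]) -> Dict[str, float]:
    """Evaluates the three example formulas from the docstring (hypothetical helper)."""
    sem = signals.get("semantic_similarity_score", NAN)
    kw = signals.get("keyword_similarity_score", NAN)
    return {
        # 0.2 * semantic_similarity_score + 0.8 * log(keyword_similarity_score)
        "weighted_log": 0.2 * sem + 0.8 * math.log(kw),
        # 0.2 * exp(fill_nan(semantic_similarity_score, 0)) + 0.3 * is_nan(keyword_similarity_score)
        "exp_fill": 0.2 * math.exp(fill_nan(sem, 0.0)) + 0.3 * is_nan(kw),
        # 0.2 * rr(semantic_similarity_score, 16) + 0.8 * rr(keyword_similarity_score, 16)
        "reciprocal": 0.2 * rr(sem, 16) + 0.8 * rr(kw, 16),
    }
```

Calling `example_scores({"semantic_similarity_score": 0.0, "keyword_similarity_score": 1.0})` exercises all three formulas with made-up signal values. A missing signal is treated as NaN here, which is a choice of this sketch rather than documented behavior.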

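Review note on the `RANK_BY_EMBEDDING` grammar (`ranking_expression = function, { " + ", function };`): a small local evaluator can make the grammar concrete. The sketch below is an assumption-laden illustration, not how the backend evaluates expressions. `evaluate_embedding_expression`, its parameters, and the regular expression are hypothetical, and only plain decimal weights are handled.

```python
import re
from typing import Mapping, Sequence

# One term: `double * relevance_score` or `double * dotProduct(embedding_field_path)`.
_TERM = re.compile(
    r"^\s*(?P<weight>-?\d+(?:\.\d+)?)\s*\*\s*"
    r"(?:(?P<rel>relevance_score)|dotProduct\((?P<path>[A-Za-z_][\w.]*)\))\s*$"
)


def evaluate_embedding_expression(
    expression: str,
    relevance_score: float,
    query_embedding: Sequence[float],
    doc_embeddings: Mapping[str, Sequence[float]],
) -> float:
    """Sums the weighted terms of an expression joined by '+' (hypothetical helper)."""
    total = 0.0
    for term in expression.split("+"):
        match = _TERM.match(term)
        if match is None:
            raise ValueError(f"unsupported term: {term.strip()!r}")
        weight = float(match.group("weight"))
        if match.group("rel"):
            total += weight * relevance_score
        else:
            doc_vector = doc_embeddings[match.group("path")]
            if len(doc_vector) != len(query_embedding):
                raise ValueError("embedding dimensions do not match")
            # Dot product between the query embedding and the document embedding field.
            total += weight * sum(q * d for q, d in zip(query_embedding, doc_vector))
    return total
```

With the docstring's example `0.5 * relevance_score + 0.3 * dotProduct(doc_embedding)`, a relevance score of 0.8, a query embedding of `[1.0, 0.0]`, and a document embedding of `[0.5, 0.5]` (all invented values), the two terms contribute 0.4 and 0.15.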