Commit c163923

Add seeded and earlyTermination examples in tutorial-vectors doc (#3797)
1 parent 512e02a commit c163923

1 file changed

+35
-24
lines changed

solr/solr-ref-guide/modules/getting-started/pages/tutorial-vectors.adoc

Lines changed: 35 additions & 24 deletions
@@ -67,7 +67,7 @@ $ curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:applicat
6767
"type":"pdate",
6868
"stored":true
6969
}
70-
]
70+
]
7171
}'
7272
----
7373

@@ -81,22 +81,22 @@ $ bin/solr post -c films example/films/films.json
8181
----
8282

8383
=== Let's do some Vector searches
84-
Before making the queries, we define an example target vector, simulating a person that
85-
watched 3 movies: _Finding Nemo_, _Bee Movie_, and _Harry Potter and the Chamber of Secrets_.
86-
We get the vector of each movie, then calculate the resulting average vector, which will
84+
Before making the queries, we define an example target vector, simulating a person that
85+
watched 3 movies: _Finding Nemo_, _Bee Movie_, and _Harry Potter and the Chamber of Secrets_.
86+
We get the vector of each movie, then calculate the resulting average vector, which will
8787
be used as the input vector for all the following example queries.
88-
88+
8989
```
9090
[-0.1784, 0.0096, -0.1455, 0.4167, -0.1148, -0.0053, -0.0651, -0.0415, 0.0859, -0.1789]
9191
```
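
As a rough illustration of the averaging step described above, the following Python sketch computes the element-wise mean of several vectors. The three input vectors are made-up placeholders (assumptions for illustration only), not the actual `film_vector` values stored in the films dataset, and `average_vector` is a hypothetical helper name.

```python
from typing import List


def average_vector(vectors: List[List[float]]) -> List[float]:
    """Return the element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    # zip(*vectors) groups the i-th component of every vector together
    return [sum(components) / len(vectors) for components in zip(*vectors)]


# Hypothetical 3-dimensional vectors standing in for the three watched movies
watched = [
    [0.1, 0.2, 0.3],
    [0.3, 0.0, -0.3],
    [0.2, 0.4, 0.0],
]
target = average_vector(watched)
print([round(x, 4) for x in target])
```

With the real 10-dimensional movie vectors as input, the same computation would produce a target vector of the form shown above.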
9292

9393
[NOTE]
9494
====
95-
Interested in calculating the vector using Solr's xref:query-guide:streaming-expressions.adoc[streaming capability]?
95+
Interested in calculating the vector using Solr's xref:query-guide:streaming-expressions.adoc[streaming capability]?
9696
Here is an example of a streaming expression that you can run via the xref:query-guide:stream-screen.adoc[Solr Admin Stream UI]:
9797
```
9898
let(
99-
a=select(
99+
a=select(
100100
search(films,
101101
qt="/select",
102102
q="name:"Finding Nemo" OR name:"Bee Movie" OR name:"Harry Potter and the Chamber of Secrets"",
@@ -141,43 +141,54 @@ The output is:
141141

142142
// The Solr URL examples below all contain [ and ] characters, which cause encoding issues when used with curl, so they are given as plain HTTP links
143143

144-
Search for the top 10 movies most similar to the target vector that we previously calculated (KNN Query for recommendation):
144+
**KNN Query for recommendation** - Search for the top 10 movies most similar to the target vector that we previously calculated:
145+
146+
http://localhost:8983/solr/films/query?q={!knn f=film_vector topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
147+
148+
- Notice that among the results, there are some animation family movies, such as _Curious George_ and _Bambi_, which makes sense, since the target vector was created with two other animation family movies (_Finding Nemo_ and _Bee Movie_).
149+
- We also notice that among the results there are two movies that the person already watched. In the next example we will filter them out.
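
If you would rather send the query from a script than paste the link into a browser, one option is to percent-encode the parameters before making the request. The following is a minimal Python sketch, assuming the same local Solr URL, field name, and target vector used in this tutorial; `knn_query` and `build_url` are hypothetical helper names, not part of Solr.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SOLR_QUERY_URL = "http://localhost:8983/solr/films/query"
TARGET_VECTOR = [-0.1784, 0.0096, -0.1455, 0.4167, -0.1148,
                 -0.0053, -0.0651, -0.0415, 0.0859, -0.1789]


def knn_query(vector, field="film_vector", top_k=10):
    """Build the raw (unencoded) {!knn ...}[...] query string."""
    vector_text = "[" + ",".join(str(x) for x in vector) + "]"
    return f"{{!knn f={field} topK={top_k}}}{vector_text}"


def build_url(params):
    """Percent-encode the parameters and append them to the query URL."""
    return f"{SOLR_QUERY_URL}?{urlencode(params)}"


raw_q = knn_query(TARGET_VECTOR)
url = build_url({"q": raw_q})

# Decoding the URL again should give back the original query string
decoded_q = parse_qs(urlsplit(url).query)["q"][0]
print(url)
```

The encoded URL contains no literal spaces, braces, or square brackets, so it can be passed to a command-line HTTP client without further escaping.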
150+
151+
**KNN query with Filter Query** - Search for the top 10 movies most similar to the resulting vector, excluding the movies already watched:
152+
153+
http://localhost:8983/solr/films/query?q={!knn f=film_vector topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq=-id:("/en/finding_nemo" "/en/bee_movie" "/en/harry_potter_and_the_chamber_of_secrets_2002")
154+
155+
**KNN as Filter Query** - Search for movies with "cinderella" in the name among the top 50 movies most similar to the target vector:
156+
157+
http://localhost:8983/solr/films/query?q=name:cinderella&fq={!knn f=film_vector topK=50}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
145158

146-
'http://localhost:8983/solr/films/query?q={%21knn%20f=film_vector%20topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]'
159+
- There are 3 "cinderella" movies in the index, but only 1 is among the top 50 most similar to the target vector (_Cinderella III: A Twist in Time_).
147160

148-
* Notice that among the results, there are some animation family movies, such as _Curious George_ and _Bambi_, which makes sense, since the target vector was created with two other animation family movies (_Finding Nemo_ and _Bee Movie_).
149-
* We also notice that among the results there are two movies that the person already watched. In the next example we will filter them out.
161+
*KNN with SeededQuery* - Search for the top 10 movies most similar to the target vector, guided by a seed lexical query on the `genre` field, which provides the initial entry points in the vector graph search:
150162

151-
Search for the top 10 movies most similar to the resulting vector, excluding the movies already watched (KNN query with Filter Query):
163+
http://localhost:8983/solr/films/query?seedQuery=genre:Family&q={!knn f=film_vector topK=10 seedQuery=$seedQuery}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
152164

153-
http://localhost:8983/solr/films/query?q={!knn%20f=film_vector%20topK=10}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq=-id:("%2Fen%2Ffinding_nemo"%20"%2Fen%2Fbee_movie"%20"%2Fen%2Fharry_potter_and_the_chamber_of_secrets_2002")
165+
- This allows the KNN algorithm to start the similarity exploration from documents that already match the lexical criteria, potentially improving relevance and reducing search time.
154166

155-
- Search for movies with "cinderella" in the name among the top 50 movies most similar to the target vector (KNN as Filter Query):
167+
*KNN with EarlyTermination* - Search for the top 10 movies most similar to the target vector, allowing the KNN search to stop early for lower latency:
156168

157-
http://localhost:8983/solr/films/query?q=name:cinderella&fq={!knn%20f=film_vector%20topK=50}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
169+
http://localhost:8983/solr/films/query?q={!knn f=film_vector topK=10 earlyTermination=true}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]
158170

159-
* There are 3 "cinderella" movies in the index, but only 1 is among the top 50 most similar to the target vector (_Cinderella III: A Twist in Time_).
171+
- This allows Solr to return results faster by stopping the graph search once a good enough set of neighbors is found, instead of exploring all nodes in the vector index.
160172

161-
- Search for movies with "animation" in the genre, and rerank the top 5 documents by combining (sum) the original query score with twice (2x) the similarity to the target vector (KNN with ReRanking):
173+
**KNN with ReRanking** - Search for movies with "animation" in the genre, and rerank the top 5 documents by combining (sum) the original query score with twice (2x) the similarity to the target vector:
162174

163-
http://localhost:8983/solr/films/query?q=genre:animation&rqq={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&rq={!rerank%20reRankQuery=$rqq%20reRankDocs=5%20reRankWeight=2}
175+
http://localhost:8983/solr/films/query?q=genre:animation&rqq={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&rq={!rerank reRankQuery=$rqq reRankDocs=5 reRankWeight=2}
164176

165-
* To guarantee we calculate the vector similarity score for all the movies, we set `topK=10000`, a number higher than the total number of documents (`1100`).
177+
- To guarantee we calculate the vector similarity score for all the movies, we set `topK=10000`, a number higher than the total number of documents (`1100`).
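
The arithmetic behind this re-ranking can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the additive combination described above (original score plus `reRankWeight` times the similarity), the document ids and scores are invented, and `rerank` is a hypothetical helper, not Solr code.

```python
def rerank(docs, rerank_docs=5, rerank_weight=2.0):
    """Re-score the first `rerank_docs` entries and re-sort only those.

    `docs` is a list of (doc_id, original_score, similarity) tuples,
    already sorted by original_score in descending order.
    """
    head = [(doc_id, orig + rerank_weight * sim)
            for doc_id, orig, sim in docs[:rerank_docs]]
    head.sort(key=lambda item: item[1], reverse=True)
    # Documents beyond the re-rank window keep their original score and order
    tail = [(doc_id, orig) for doc_id, orig, _ in docs[rerank_docs:]]
    return head + tail


# Invented scores for illustration only
results = [
    ("a", 3.0, 0.10),
    ("b", 2.9, 0.80),
    ("c", 2.5, 0.95),
    ("d", 2.0, 0.20),
]
reranked = rerank(results, rerank_docs=3)
print(reranked)
```

In this made-up example, document `b` moves ahead of `a` because its higher similarity outweighs the small gap in the original score, while `d` stays in place because it falls outside the re-rank window.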
166178

167-
* It's possible to combine the vector similarity scores with other scores, by using Sub-query,
168-
xref:query-guide:function-queries.adoc[Function Queries] and xref:query-guide:local-params.adoc#parameter-dereferencing[Parameter Dereferencing] Solr features:
179+
It's possible to combine the vector similarity scores with other scores, by using Sub-query, xref:query-guide:function-queries.adoc[Function Queries] and xref:query-guide:local-params.adoc#parameter-dereferencing[Parameter Dereferencing] Solr features:
169180

170181
- Search for "harry potter" movies, ranking the results by the similarity to the target vector instead of the lexical query score. Besides the `q` parameter, we define a "sub-query" named `q_vector`, which calculates the similarity score for all the movies (since we set `topK=10000`). Then we use the sub-query parameter name as input for the `sort`, specifying that we want to rank descending according to the vector similarity score (`sort=$q_vector desc`):
171182

172-
http://localhost:8983/solr/films/query?q=name:"harry%20potter"&q_vector={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&sort=$q_vector%20desc
183+
http://localhost:8983/solr/films/query?q=name:"harry potter"&q_vector={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&sort=$q_vector desc
173184

174185
- Search for movies with "the" in the name, keeping the original lexical query ranking, but returning only movies with similarity to the target vector of 0.8 or higher. As before, we define the sub-query `q_vector`, but this time we use it as input for the `frange` filter, specifying that we want documents with a vector similarity score of at least 0.8:
175186

176-
http://localhost:8983/solr/films/query?q=name:the&q_vector={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq={!frange%20l=0.8}$q_vector
187+
http://localhost:8983/solr/films/query?q=name:the&q_vector={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&fq={!frange l=0.8}$q_vector
177188

178189
- Search for "batman" movies, ranking the results by combining 70% of the original lexical query score and 30% of the similarity to the target vector. Besides the `q` main query and the `q_vector` sub-query, we also specify the `q_lexical` query, which will hold the lexical score of the main `q` query. Then we specify a parameter variable called `score_combined`, which scales the lexical and similarity scores, applies the 0.7 and 0.3 weights, then sums the results. We set the `sort` parameter to order according to the combined score, and also set the `fl` parameter so that we can view the intermediate and the combined score values in the response:
179190

180-
http://localhost:8983/solr/films/query?q=name:batman&q_lexical={!edismax%20v=$q}&q_vector={!knn%20f=film_vector%20topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&score_combined=sum(mul(scale($q_lexical,0,1),0.7),mul(scale($q_vector,0,1),0.3))&sort=$score_combined%20desc&fl=name,score,$q_lexical,$q_vector,$score_combined
191+
http://localhost:8983/solr/films/query?q=name:batman&q_lexical={!edismax v=$q}&q_vector={!knn f=film_vector topK=10000}[-0.1784,0.0096,-0.1455,0.4167,-0.1148,-0.0053,-0.0651,-0.0415,0.0859,-0.1789]&score_combined=sum(mul(scale($q_lexical,0,1),0.7),mul(scale($q_vector,0,1),0.3))&sort=$score_combined desc&fl=name,score,$q_lexical,$q_vector,$score_combined
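
To make the `score_combined` arithmetic concrete, here is a small Python sketch of the same formula: min-max scale each score list to the range 0 to 1, then take a 0.7/0.3 weighted sum. The scores are invented, and the sketch assumes the minimum and maximum are taken over just the listed documents, which is a simplification rather than a description of how Solr evaluates `scale`.

```python
def scale(values, target_min=0.0, target_max=1.0):
    """Min-max scale values into [target_min, target_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid division by zero; mapping everything to target_min
        # is a choice made for this sketch only
        return [target_min for _ in values]
    span = target_max - target_min
    return [target_min + (v - lo) * span / (hi - lo) for v in values]


def combine(lexical, vector, w_lexical=0.7, w_vector=0.3):
    """Weighted sum of the scaled lexical and vector scores."""
    return [w_lexical * l + w_vector * v
            for l, v in zip(scale(lexical), scale(vector))]


# Invented scores for three documents, for illustration only
lexical_scores = [4.0, 2.0, 3.0]
vector_scores = [0.5, 0.9, 0.7]

combined = combine(lexical_scores, vector_scores)
ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
print(combined, ranking)
```

Scaling both scores to the same range before weighting keeps the 70/30 split meaningful, since raw lexical scores and vector similarities are otherwise on different scales.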
181192

182193

183194
=== Exercise 5 Wrap Up
