@@ -56,7 +56,7 @@ import numpy as np

The first of these imports is the
`SentenceTransformer` class, which generates an embedding from a section of text.
- Here, we create an instance of `SentenceTransformer` that uses the
+ This example uses an instance of `SentenceTransformer` with the
[`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
model for the embeddings. This model generates vectors with 384 dimensions, regardless
of the length of the input text, but note that the input is truncated to 256
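As a quick check of the output dimensionality, you can encode a sample string and inspect the resulting array (a minimal sketch; the test sentence is an arbitrary placeholder):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# encode() returns a NumPy array whose length is fixed by the model,
# not by the length of the input text.
emb = model.encode("A short test sentence.")
print(emb.shape)  # -> (384,)
```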
@@ -71,8 +71,8 @@ model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

## Create the data

- For the example, we will use a dictionary of data that contains brief
- descriptions of some famous people:
+ The example data is contained in a dictionary with some brief
+ descriptions of famous people:

```python
peopleData = {
@@ -146,11 +146,11 @@ The code below uses the dictionary's
view to iterate through all the key-value pairs and add corresponding
elements to a vector set called `famousPeople`.

- We use the
+ Use the
[`encode()`](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.encode)
method of `SentenceTransformer` to generate the
embedding as an array of `float32` values. The `tobytes()` method converts
- the array to a byte string that we pass to the
+ the array to a byte string that you can pass to the
[`vadd()`]({{< relref "/commands/vadd" >}}) command to set the embedding.
Note that `vadd()` can also accept a list of `float` values to set the
vector, but the byte string format is more compact and saves a little
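Put together, the embedding step might look something like the sketch below. It assumes a local redis-py connection `r`, uses a placeholder element name and description, and sends the raw command via `execute_command()` rather than a client helper method:

```python
import numpy as np
import redis
from sentence_transformers import SentenceTransformer

r = redis.Redis()  # assumes a local Redis server with vector set support
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Hypothetical single element; the real loop iterates over peopleData.items().
description = "Pioneering physicist and chemist."  # placeholder text
emb = model.encode(description).astype(np.float32)

# VADD key FP32 <blob> <element>: the FP32 flag tells Redis the vector
# arrives as a packed byte string of 32-bit floats.
r.execute_command("VADD", "famousPeople", "FP32", emb.tobytes(), "Marie Curie")
```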
@@ -183,9 +183,9 @@ for name, details in peopleData.items():

## Query the vector set

- We can now query the data in the set. The basic approach is to use the
+ You can now query the data in the set. The basic approach is to use the
`encode()` method to generate another embedding vector for the query text.
- (This is the same method we used when we added the elements to the set.) Then, we pass
+ (This is the same method used to add the elements to the set.) Then, pass
the query vector to [`vsim()`]({{< relref "/commands/vsim" >}}) to return elements
of the set, ranked in order of similarity to the query.

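Continuing the sketch above, a basic query might look like this (again sending the raw command via `execute_command()`; the query text "actors" matches the example results below):

```python
# Embed the query text with the same model used for the elements...
query_vec = model.encode("actors").astype(np.float32)

# ...then ask VSIM to rank the set's elements by similarity to it.
results = r.execute_command("VSIM", "famousPeople", "FP32", query_vec.tobytes())
print([e.decode("utf-8") for e in results])  # most similar elements first
```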
@@ -211,8 +211,8 @@ This returns the following list of elements (formatted slightly for clarity):
```

The first two people in the list are the two actors, as expected, but none of the
- people from Linus Pauling onward was especially well-known for acting (and we certainly
- didn't include any information about that in the short description text).
+ people from Linus Pauling onward was especially well-known for acting (and there certainly
+ isn't any information about that in the short description text).
As it stands, the search attempts to rank all the elements in the set, based
on the information contained in the embedding model.
You can use the `count` parameter of `vsim()` to limit the list of elements
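For example, reusing `query_vec` and the connection `r` from the sketches above (`COUNT` is the command-level form of that parameter):

```python
# COUNT 2 caps the reply at the two closest elements.
two_actors_results = r.execute_command(
    "VSIM", "famousPeople", "FP32", query_vec.tobytes(), "COUNT", 2
)
```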
@@ -234,10 +234,9 @@ print(f"'actors (2)': {two_actors_results}")
The reason for using text embeddings rather than simple text search
is that the embeddings represent semantic information. This allows a query
to find elements with a similar meaning even if the text is
- different. For example, we
- don't use the word "entertainer" in any of the descriptions but
- if we use it as a query, the actors and musicians are ranked highest
- in the results list:
+ different. For example, the word "entertainer" doesn't appear in any of the
+ descriptions but if you use it as a query, the actors and musicians are ranked
+ highest in the results list:

```py
query_value = "entertainer"
@@ -253,7 +252,7 @@ print(f"'entertainer': {entertainer_results}")
# 'Paul Erdos', 'Maryam Mirzakhani', 'Marie Curie']
```

- Similarly, if we use "science" as a query, we get the following results:
+ Similarly, if you use "science" as a query, you get the following results:

```
'science': ['Marie Curie', 'Linus Pauling', 'Maryam Mirzakhani',