
Commit 902ec2d

authored
Update Semantic Search Example to use UV and better explain API keys (#493)
Replaced the current `pip` install pattern with uv for a much faster install. Added text explaining how API keys work, to see how that affects completion rates. Fixed the Colab API key setup to see whether a smoother experience improves usage.
1 parent d288ac9 commit 902ec2d
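
The install change described above can also be expressed outside a notebook. The following is a minimal sketch, not part of the commit: it assumes the package pins from the updated install cell, and the helper name `build_install_command` and its `prefer_uv` parameter are hypothetical. It uses uv when a `uv` executable is on the `PATH` and falls back to plain `pip` otherwise, mirroring the notebook's note that `uv` can be dropped from the command.

```python
import shutil
import sys

# Package pins copied from the updated install cell in this commit.
PACKAGES = ["pinecone~=7.3.0", "pinecone-notebooks==0.1.1", "datasets==3.5.1"]


def build_install_command(packages, prefer_uv=True):
    """Return an argv list that installs `packages` with uv if present, else pip.

    Hypothetical helper for illustration; it is not defined in the notebook.
    """
    if prefer_uv and shutil.which("uv") is not None:
        # Same flags as the notebook cell: quiet (-q) and upgrade (-U).
        return ["uv", "pip", "install", "-qU", *packages]
    # Fallback: plain pip, run through the current interpreter.
    return [sys.executable, "-m", "pip", "install", "-qU", *packages]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_install_command(PACKAGES)))
```

Passing the returned list to `subprocess.run(..., check=True)` would perform the install; the sketch only prints the command so it can be inspected first.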

File tree

1 file changed

+63
-129
lines changed

docs/semantic-search.ipynb

Lines changed: 63 additions & 129 deletions
Original file line number | Diff line number | Diff line change
@@ -10,20 +10,31 @@
1010
"\n",
1111
"# Semantic Search\n",
1212
"\n",
13-
"In this walkthrough, we'll learn how to use Pinecone for semantic search using a multilingual translation dataset. We'll grab English sentences and search over a corpus of related sentences, aiming to find the relevant subset to our query.\n",
13+
"In this walkthrough, we'll learn how to use Pinecone for semantic search using a multilingual translation dataset. \n",
1414
"\n",
15+
"We'll grab English sentences and search over a corpus of related sentences, aiming to find the relevant subset to our query.\n",
1516
"\n",
16-
"Semantic search is a form of retrieval that allows you to find documents that are similar in meaning to a given query, irrespective of the words used in each query. Semantic search is often in opposition to lexical search, where keywords are used to identify relevant documents to a given query, though it doesn't have to always be this way!\n",
17+
"\n",
18+
"Semantic search is a form of retrieval that allows you to find documents that are similar in meaning to a given query, irrespective of the words used in each query. \n",
19+
"\n",
20+
"Semantic search is often in opposition to lexical search, where keywords are used to identify relevant documents to a given query, though it doesn't have to always be this way!\n",
1721
"\n",
1822
" It's super helpful for applications that require an understanding of a query's intent (such as when a user queries with a question over a corpus), or for when traditional lexical search doesn't work (such as in multimodal or multilingual applications).\n",
1923
"\n",
2024
"\n",
2125
"To begin, let's install the following libraries:"
2226
]
2327
},
28+
{
29+
"cell_type": "markdown",
30+
"metadata": {},
31+
"source": [
32+
"## Installation"
33+
]
34+
},
2435
{
2536
"cell_type": "code",
26-
"execution_count": 1,
37+
"execution_count": null,
2738
"metadata": {
2839
"colab": {
2940
"base_uri": "https://localhost:8080/"
@@ -33,10 +44,10 @@
3344
},
3445
"outputs": [],
3546
"source": [
36-
"!pip install -qU \\\n",
37-
" pinecone==6.0.2 \\\n",
47+
"!uv pip install -qU \\\n",
48+
" pinecone~=7.3.0 \\\n",
3849
" pinecone-notebooks==0.1.1 \\\n",
39-
" datasets==3.5.1"
50+
" datasets==3.5.1 \\"
4051
]
4152
},
4253
{
@@ -47,7 +58,7 @@
4758
"source": [
4859
"---\n",
4960
"\n",
50-
"🚨 _Note: the above `pip install` is formatted for Jupyter notebooks. If running elsewhere you may need to drop the `!`._\n",
61+
"🚨 _Note: the above `uv pip install` is formatted for Colab Jupyter notebooks. If running elsewhere you may need to drop the `!`. If you want to run without uv, remove \"uv\"._\n",
5162
"\n",
5263
"---"
5364
]
@@ -61,9 +72,22 @@
6172
"## Setting up"
6273
]
6374
},
75+
{
76+
"cell_type": "markdown",
77+
"metadata": {},
78+
"source": [
79+
"### Get and Set the Pinecone API Key\n",
80+
"\n",
81+
"We'll first need a free Pinecone account and API key. \n",
82+
"\n",
83+
"This cell will help you create an account if you don't have one and then create an API key and save it in your Colab environment.\n",
84+
"\n",
85+
"Run the cell below, and click the Pinecone Connect button to create an account or log in, and follow the prompts to generate an API key:"
86+
]
87+
},
6488
{
6589
"cell_type": "code",
66-
"execution_count": 2,
90+
"execution_count": null,
6791
"metadata": {},
6892
"outputs": [
6993
{
@@ -75,58 +99,31 @@
7599
}
76100
],
77101
"source": [
78-
"import os\n",
79-
"from getpass import getpass\n",
80-
"\n",
81-
"def get_pinecone_api_key():\n",
82-
" \"\"\"\n",
83-
" Get Pinecone API key from environment variable or prompt user for input.\n",
84-
" Returns the API key as a string.\n",
85-
"\n",
86-
" Only necessary for notebooks. When using Pinecone yourself, \n",
87-
" you can use environment variables or the like to set your API key.\n",
88-
" \"\"\"\n",
89-
" api_key = os.environ.get(\"PINECONE_API_KEY\")\n",
90-
" \n",
91-
" if api_key is None:\n",
92-
" try:\n",
93-
" # Try Colab authentication if available\n",
94-
" from pinecone_notebooks.colab import Authenticate\n",
95-
" Authenticate()\n",
96-
" # If successful, key will now be in environment\n",
97-
" api_key = os.environ.get(\"PINECONE_API_KEY\")\n",
98-
" except ImportError:\n",
99-
" # If not in Colab or authentication fails, prompt user for API key\n",
100-
" print(\"Pinecone API key not found in environment.\")\n",
101-
" api_key = getpass(\"Please enter your Pinecone API key: \")\n",
102-
" # Save to environment for future use in session\n",
103-
" os.environ[\"PINECONE_API_KEY\"] = api_key\n",
104-
" \n",
105-
" return api_key\n",
106-
"\n",
107-
"api_key = get_pinecone_api_key()"
102+
"from pinecone_notebooks.colab import Authenticate\n",
103+
"\n",
104+
"Authenticate()"
105+
]
106+
},
107+
{
108+
"cell_type": "markdown",
109+
"metadata": {},
110+
"source": [
111+
"Now that our key is ready, we can retrieve it from our environment and proceed:"
108112
]
109113
},
110114
{
111115
"cell_type": "code",
112-
"execution_count": 3,
116+
"execution_count": null,
113117
"metadata": {
114118
"id": "mc66NEBAcQHY"
115119
},
116-
"outputs": [
117-
{
118-
"name": "stderr",
119-
"output_type": "stream",
120-
"text": [
121-
"/opt/miniconda3/envs/pinecone-examples/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
122-
" from .autonotebook import tqdm as notebook_tqdm\n"
123-
]
124-
}
125-
],
120+
"outputs": [],
126121
"source": [
127122
"from pinecone import Pinecone\n",
128-
"\n",
129123
"# Initialize client\n",
124+
"import os\n",
125+
"\n",
126+
"api_key = os.environ.get(\"PINECONE_API_KEY\")\n",
130127
"\n",
131128
"pc = Pinecone(\n",
132129
" # You can remove this for your own projects!\n",
@@ -153,7 +150,7 @@
153150
},
154151
{
155152
"cell_type": "code",
156-
"execution_count": 4,
153+
"execution_count": 5,
157154
"metadata": {
158155
"colab": {
159156
"base_uri": "https://localhost:8080/"
@@ -173,7 +170,7 @@
173170
" 'vector_type': 'dense'}"
174171
]
175172
},
176-
"execution_count": 4,
173+
"execution_count": 5,
177174
"metadata": {},
178175
"output_type": "execute_result"
179176
}
@@ -226,7 +223,7 @@
226223
},
227224
{
228225
"cell_type": "code",
229-
"execution_count": 5,
226+
"execution_count": 6,
230227
"metadata": {},
231228
"outputs": [],
232229
"source": [
@@ -244,7 +241,7 @@
244241
},
245242
{
246243
"cell_type": "code",
247-
"execution_count": 6,
244+
"execution_count": 7,
248245
"metadata": {},
249246
"outputs": [
250247
{
@@ -258,7 +255,7 @@
258255
" {'en': 'I have to go to sleep.', 'es': 'Tengo que irme a dormir.'}]}"
259256
]
260257
},
261-
"execution_count": 6,
258+
"execution_count": 7,
262259
"metadata": {},
263260
"output_type": "execute_result"
264261
}
@@ -269,18 +266,9 @@
269266
},
270267
{
271268
"cell_type": "code",
272-
"execution_count": 7,
269+
"execution_count": 8,
273270
"metadata": {},
274-
"outputs": [
275-
{
276-
"name": "stderr",
277-
"output_type": "stream",
278-
"text": [
279-
"Filter: 100%|██████████| 214127/214127 [00:00<00:00, 439387.27 examples/s]\n",
280-
"Flattening the indices: 100%|██████████| 416/416 [00:00<00:00, 237004.95 examples/s]\n"
281-
]
282-
}
283-
],
271+
"outputs": [],
284272
"source": [
285273
"keywords= [\"park\"]\n",
286274
"\n",
@@ -360,7 +348,7 @@
360348
},
361349
{
362350
"cell_type": "code",
363-
"execution_count": 8,
351+
"execution_count": 9,
364352
"metadata": {
365353
"colab": {
366354
"base_uri": "https://localhost:8080/",
@@ -387,7 +375,7 @@
387375
"name": "stderr",
388376
"output_type": "stream",
389377
"text": [
390-
"Upserting records batch: 100%|██████████| 5/5 [00:02<00:00, 1.79it/s]\n"
378+
"Upserting records batch: 100%|██████████| 5/5 [00:02<00:00, 1.91it/s]\n"
391379
]
392380
}
393381
],
@@ -434,36 +422,9 @@
434422
},
435423
{
436424
"cell_type": "code",
437-
"execution_count": 12,
425+
"execution_count": 10,
438426
"metadata": {},
439-
"outputs": [
440-
{
441-
"name": "stdout",
442-
"output_type": "stream",
443-
"text": [
444-
"Sentence: I have the afternoon off today, so I plan to go to the park, sit under a tree and read a book. Semantic Similarity Score: 0.4675264060497284\n",
445-
"\n",
446-
"Sentence: I went to the park to play tennis. Semantic Similarity Score: 0.4330753684043884\n",
447-
"\n",
448-
"Sentence: I go to the park. Semantic Similarity Score: 0.4261631369590759\n",
449-
"\n",
450-
"Sentence: I went to the park yesterday. Semantic Similarity Score: 0.42239895462989807\n",
451-
"\n",
452-
"Sentence: I went to the park last Sunday. Semantic Similarity Score: 0.42069774866104126\n",
453-
"\n",
454-
"Sentence: I like going for a walk in the park. Semantic Similarity Score: 0.41970351338386536\n",
455-
"\n",
456-
"Sentence: I went to the park last Saturday. Semantic Similarity Score: 0.4103226661682129\n",
457-
"\n",
458-
"Sentence: I need light plates because today my family is going to eat lunch in the park. Semantic Similarity Score: 0.40211308002471924\n",
459-
"\n",
460-
"Sentence: Linda went to the park to listen to music. Semantic Similarity Score: 0.4012303650379181\n",
461-
"\n",
462-
"Sentence: I'll go to the park. Semantic Similarity Score: 0.3996794819831848\n",
463-
"\n"
464-
]
465-
}
466-
],
427+
"outputs": [],
467428
"source": [
468429
"search_query = \"I want to go to the park and relax\"\n",
469430
"\n",
@@ -490,36 +451,9 @@
490451
},
491452
{
492453
"cell_type": "code",
493-
"execution_count": 13,
454+
"execution_count": 11,
494455
"metadata": {},
495-
"outputs": [
496-
{
497-
"name": "stdout",
498-
"output_type": "stream",
499-
"text": [
500-
"Sentence: I can't find a spot to park my spaceship. Semantic Similarity Score: 0.44190075993537903\n",
501-
"\n",
502-
"Sentence: I can't find a spot to park my spaceship. Semantic Similarity Score: 0.44190075993537903\n",
503-
"\n",
504-
"Sentence: There isn't anywhere else to park. Semantic Similarity Score: 0.4017431437969208\n",
505-
"\n",
506-
"Sentence: I have to park my car here. Semantic Similarity Score: 0.3978813886642456\n",
507-
"\n",
508-
"Sentence: Where can I park? Semantic Similarity Score: 0.39125218987464905\n",
509-
"\n",
510-
"Sentence: Where can I park? Semantic Similarity Score: 0.39125218987464905\n",
511-
"\n",
512-
"Sentence: I am parking my car near the office. Semantic Similarity Score: 0.37668246030807495\n",
513-
"\n",
514-
"Sentence: May I park here for a while? Semantic Similarity Score: 0.3707844614982605\n",
515-
"\n",
516-
"Sentence: I parked on the left side of the street just in front of the school. Semantic Similarity Score: 0.37002164125442505\n",
517-
"\n",
518-
"Sentence: Where can I park my car? Semantic Similarity Score: 0.3609045743942261\n",
519-
"\n"
520-
]
521-
}
522-
],
456+
"outputs": [],
523457
"source": [
524458
"search_query = \"I need a place to park\"\n",
525459
"\n",
@@ -563,13 +497,13 @@
563497
},
564498
{
565499
"cell_type": "code",
566-
"execution_count": 11,
500+
"execution_count": 12,
567501
"metadata": {
568502
"id": "-cWdeKzhAtww"
569503
},
570504
"outputs": [],
571505
"source": [
572-
"#pc.delete_index(name=index_name)"
506+
"pc.delete_index(name=index_name)"
573507
]
574508
},
575509
{
@@ -588,7 +522,7 @@
588522
},
589523
"gpuClass": "standard",
590524
"kernelspec": {
591-
"display_name": "Python 3 (ipykernel)",
525+
"display_name": "pinecone-examples",
592526
"language": "python",
593527
"name": "python3"
594528
},
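
The updated client cell reads the key with `os.environ.get("PINECONE_API_KEY")`, which returns `None` if `Authenticate()` has not populated the environment (for example, when the notebook runs outside Colab). A minimal sketch of a fallback follows, adapted from the `get_pinecone_api_key` helper that this commit removes. The `prompt` parameter is a hypothetical addition so the function can be exercised without interactive input.

```python
import os
from getpass import getpass


def get_pinecone_api_key(prompt=getpass):
    """Return the key from PINECONE_API_KEY, prompting once if it is unset.

    `prompt` is an assumption added for illustration; it defaults to getpass,
    which hides the typed key in interactive sessions.
    """
    api_key = os.environ.get("PINECONE_API_KEY")
    if not api_key:
        # Not set (or empty): ask the user rather than passing None to the client.
        api_key = prompt("Please enter your Pinecone API key: ")
        # Save to the environment so later cells in the session can reuse it.
        os.environ["PINECONE_API_KEY"] = api_key
    return api_key
```

In a local Jupyter session, calling `get_pinecone_api_key()` before constructing the `Pinecone` client would prompt once and then reuse the stored value for the rest of the session.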
