|
10 | 10 | "\n",
|
11 | 11 | "# Semantic Search\n",
|
12 | 12 | "\n",
|
13 |
| - "In this walkthrough, we'll learn how to use Pinecone for semantic search using a multilingual translation dataset. We'll grab English sentences and search over a corpus of related sentences, aiming to find the relevant subset to our query.\n", |
| 13 | + "In this walkthrough, we'll learn how to use Pinecone for semantic search using a multilingual translation dataset. \n", |
14 | 14 | "\n",
|
| 15 | + "We'll grab English sentences and search over a corpus of related sentences, aiming to find the relevant subset to our query.\n", |
15 | 16 | "\n",
|
16 |
| - "Semantic search is a form of retrieval that allows you to find documents that are similar in meaning to a given query, irrespective of the words used in each query. Semantic search is often in opposition to lexical search, where keywords are used to identify relevant documents to a given query, though it doesn't have to always be this way!\n", |
| 17 | + "\n", |
| 18 | + "Semantic search is a form of retrieval that allows you to find documents that are similar in meaning to a given query, irrespective of the words used in each query. \n", |
| 19 | + "\n", |
| 20 | + "Semantic search is often contrasted with lexical search, where keywords are used to identify documents relevant to a given query, though it doesn't always have to be this way!\n", |
17 | 21 | "\n",
|
18 | 22 | " It's super helpful for applications that require an understanding of a query's intent (such as when a user queries with a question over a corpus), or for when traditional lexical search doesn't work (such as in multimodal or multilingual applications).\n",
|
19 | 23 | "\n",
|
20 | 24 | "\n",
|
21 | 25 | "To begin, let's install the following libraries:"
|
22 | 26 | ]
|
23 | 27 | },
|
| 28 | + { |
| 29 | + "cell_type": "markdown", |
| 30 | + "metadata": {}, |
| 31 | + "source": [ |
| 32 | + "## Installation" |
| 33 | + ] |
| 34 | + }, |
24 | 35 | {
|
25 | 36 | "cell_type": "code",
|
26 |
| - "execution_count": 1, |
| 37 | + "execution_count": null, |
27 | 38 | "metadata": {
|
28 | 39 | "colab": {
|
29 | 40 | "base_uri": "https://localhost:8080/"
|
|
33 | 44 | },
|
34 | 45 | "outputs": [],
|
35 | 46 | "source": [
|
36 |
| - "!pip install -qU \\\n", |
37 |
| - " pinecone==6.0.2 \\\n", |
| 47 | + "!uv pip install -qU \\\n", |
| 48 | + " pinecone~=7.3.0 \\\n", |
38 | 49 | " pinecone-notebooks==0.1.1 \\\n",
|
39 |
| - " datasets==3.5.1" |
| 50 | + " datasets==3.5.1" |
40 | 51 | ]
|
41 | 52 | },
|
42 | 53 | {
|
|
47 | 58 | "source": [
|
48 | 59 | "---\n",
|
49 | 60 | "\n",
|
50 |
| - "🚨 _Note: the above `pip install` is formatted for Jupyter notebooks. If running elsewhere you may need to drop the `!`._\n", |
| 61 | + "🚨 _Note: the above `uv pip install` is formatted for Colab Jupyter notebooks. If running elsewhere you may need to drop the `!`. To run without uv, remove `uv` from the command._\n", |
51 | 62 | "\n",
|
52 | 63 | "---"
|
53 | 64 | ]
|
|
61 | 72 | "## Setting up"
|
62 | 73 | ]
|
63 | 74 | },
|
| 75 | + { |
| 76 | + "cell_type": "markdown", |
| 77 | + "metadata": {}, |
| 78 | + "source": [ |
| 79 | + "### Get and Set the Pinecone API Key\n", |
| 80 | + "\n", |
| 81 | + "We'll first need a free Pinecone account and API key. \n", |
| 82 | + "\n", |
| 83 | + "This cell will help you create an account if you don't have one and then create an API key and save it in your Colab environment.\n", |
| 84 | + "\n", |
| 85 | + "Run the cell below, click the Pinecone Connect button to create an account or log in, then follow the prompts to generate an API key:" |
| 86 | + ] |
| 87 | + }, |
64 | 88 | {
|
65 | 89 | "cell_type": "code",
|
66 |
| - "execution_count": 2, |
| 90 | + "execution_count": null, |
67 | 91 | "metadata": {},
|
68 | 92 | "outputs": [
|
69 | 93 | {
|
|
75 | 99 | }
|
76 | 100 | ],
|
77 | 101 | "source": [
|
78 |
| - "import os\n", |
79 |
| - "from getpass import getpass\n", |
80 |
| - "\n", |
81 |
| - "def get_pinecone_api_key():\n", |
82 |
| - " \"\"\"\n", |
83 |
| - " Get Pinecone API key from environment variable or prompt user for input.\n", |
84 |
| - " Returns the API key as a string.\n", |
85 |
| - "\n", |
86 |
| - " Only necessary for notebooks. When using Pinecone yourself, \n", |
87 |
| - " you can use environment variables or the like to set your API key.\n", |
88 |
| - " \"\"\"\n", |
89 |
| - " api_key = os.environ.get(\"PINECONE_API_KEY\")\n", |
90 |
| - " \n", |
91 |
| - " if api_key is None:\n", |
92 |
| - " try:\n", |
93 |
| - " # Try Colab authentication if available\n", |
94 |
| - " from pinecone_notebooks.colab import Authenticate\n", |
95 |
| - " Authenticate()\n", |
96 |
| - " # If successful, key will now be in environment\n", |
97 |
| - " api_key = os.environ.get(\"PINECONE_API_KEY\")\n", |
98 |
| - " except ImportError:\n", |
99 |
| - " # If not in Colab or authentication fails, prompt user for API key\n", |
100 |
| - " print(\"Pinecone API key not found in environment.\")\n", |
101 |
| - " api_key = getpass(\"Please enter your Pinecone API key: \")\n", |
102 |
| - " # Save to environment for future use in session\n", |
103 |
| - " os.environ[\"PINECONE_API_KEY\"] = api_key\n", |
104 |
| - " \n", |
105 |
| - " return api_key\n", |
106 |
| - "\n", |
107 |
| - "api_key = get_pinecone_api_key()" |
| 102 | + "from pinecone_notebooks.colab import Authenticate\n", |
| 103 | + "\n", |
| 104 | + "Authenticate()" |
| 105 | + ] |
| 106 | + }, |
| 107 | + { |
| 108 | + "cell_type": "markdown", |
| 109 | + "metadata": {}, |
| 110 | + "source": [ |
| 111 | + "Now that our key is ready, we can retrieve it from our environment and proceed:" |
108 | 112 | ]
|
109 | 113 | },
|
110 | 114 | {
|
111 | 115 | "cell_type": "code",
|
112 |
| - "execution_count": 3, |
| 116 | + "execution_count": null, |
113 | 117 | "metadata": {
|
114 | 118 | "id": "mc66NEBAcQHY"
|
115 | 119 | },
|
116 |
| - "outputs": [ |
117 |
| - { |
118 |
| - "name": "stderr", |
119 |
| - "output_type": "stream", |
120 |
| - "text": [ |
121 |
| - "/opt/miniconda3/envs/pinecone-examples/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", |
122 |
| - " from .autonotebook import tqdm as notebook_tqdm\n" |
123 |
| - ] |
124 |
| - } |
125 |
| - ], |
| 120 | + "outputs": [], |
126 | 121 | "source": [
|
127 | 122 | "from pinecone import Pinecone\n",
|
128 |
| - "\n", |
129 | 123 | "# Initialize client\n",
|
| 124 | + "import os\n", |
| 125 | + "\n", |
| 126 | + "api_key = os.environ.get(\"PINECONE_API_KEY\")\n", |
130 | 127 | "\n",
|
131 | 128 | "pc = Pinecone(\n",
|
132 | 129 | " # You can remove this for your own projects!\n",
|
|
153 | 150 | },
|
154 | 151 | {
|
155 | 152 | "cell_type": "code",
|
156 |
| - "execution_count": 4, |
| 153 | + "execution_count": 5, |
157 | 154 | "metadata": {
|
158 | 155 | "colab": {
|
159 | 156 | "base_uri": "https://localhost:8080/"
|
|
173 | 170 | " 'vector_type': 'dense'}"
|
174 | 171 | ]
|
175 | 172 | },
|
176 |
| - "execution_count": 4, |
| 173 | + "execution_count": 5, |
177 | 174 | "metadata": {},
|
178 | 175 | "output_type": "execute_result"
|
179 | 176 | }
|
|
226 | 223 | },
|
227 | 224 | {
|
228 | 225 | "cell_type": "code",
|
229 |
| - "execution_count": 5, |
| 226 | + "execution_count": 6, |
230 | 227 | "metadata": {},
|
231 | 228 | "outputs": [],
|
232 | 229 | "source": [
|
|
244 | 241 | },
|
245 | 242 | {
|
246 | 243 | "cell_type": "code",
|
247 |
| - "execution_count": 6, |
| 244 | + "execution_count": 7, |
248 | 245 | "metadata": {},
|
249 | 246 | "outputs": [
|
250 | 247 | {
|
|
258 | 255 | " {'en': 'I have to go to sleep.', 'es': 'Tengo que irme a dormir.'}]}"
|
259 | 256 | ]
|
260 | 257 | },
|
261 |
| - "execution_count": 6, |
| 258 | + "execution_count": 7, |
262 | 259 | "metadata": {},
|
263 | 260 | "output_type": "execute_result"
|
264 | 261 | }
|
|
269 | 266 | },
|
270 | 267 | {
|
271 | 268 | "cell_type": "code",
|
272 |
| - "execution_count": 7, |
| 269 | + "execution_count": 8, |
273 | 270 | "metadata": {},
|
274 |
| - "outputs": [ |
275 |
| - { |
276 |
| - "name": "stderr", |
277 |
| - "output_type": "stream", |
278 |
| - "text": [ |
279 |
| - "Filter: 100%|██████████| 214127/214127 [00:00<00:00, 439387.27 examples/s]\n", |
280 |
| - "Flattening the indices: 100%|██████████| 416/416 [00:00<00:00, 237004.95 examples/s]\n" |
281 |
| - ] |
282 |
| - } |
283 |
| - ], |
| 271 | + "outputs": [], |
284 | 272 | "source": [
|
285 | 273 | "keywords= [\"park\"]\n",
|
286 | 274 | "\n",
|
|
360 | 348 | },
|
361 | 349 | {
|
362 | 350 | "cell_type": "code",
|
363 |
| - "execution_count": 8, |
| 351 | + "execution_count": 9, |
364 | 352 | "metadata": {
|
365 | 353 | "colab": {
|
366 | 354 | "base_uri": "https://localhost:8080/",
|
|
387 | 375 | "name": "stderr",
|
388 | 376 | "output_type": "stream",
|
389 | 377 | "text": [
|
390 |
| - "Upserting records batch: 100%|██████████| 5/5 [00:02<00:00, 1.79it/s]\n" |
| 378 | + "Upserting records batch: 100%|██████████| 5/5 [00:02<00:00, 1.91it/s]\n" |
391 | 379 | ]
|
392 | 380 | }
|
393 | 381 | ],
|
|
434 | 422 | },
|
435 | 423 | {
|
436 | 424 | "cell_type": "code",
|
437 |
| - "execution_count": 12, |
| 425 | + "execution_count": 10, |
438 | 426 | "metadata": {},
|
439 |
| - "outputs": [ |
440 |
| - { |
441 |
| - "name": "stdout", |
442 |
| - "output_type": "stream", |
443 |
| - "text": [ |
444 |
| - "Sentence: I have the afternoon off today, so I plan to go to the park, sit under a tree and read a book. Semantic Similarity Score: 0.4675264060497284\n", |
445 |
| - "\n", |
446 |
| - "Sentence: I went to the park to play tennis. Semantic Similarity Score: 0.4330753684043884\n", |
447 |
| - "\n", |
448 |
| - "Sentence: I go to the park. Semantic Similarity Score: 0.4261631369590759\n", |
449 |
| - "\n", |
450 |
| - "Sentence: I went to the park yesterday. Semantic Similarity Score: 0.42239895462989807\n", |
451 |
| - "\n", |
452 |
| - "Sentence: I went to the park last Sunday. Semantic Similarity Score: 0.42069774866104126\n", |
453 |
| - "\n", |
454 |
| - "Sentence: I like going for a walk in the park. Semantic Similarity Score: 0.41970351338386536\n", |
455 |
| - "\n", |
456 |
| - "Sentence: I went to the park last Saturday. Semantic Similarity Score: 0.4103226661682129\n", |
457 |
| - "\n", |
458 |
| - "Sentence: I need light plates because today my family is going to eat lunch in the park. Semantic Similarity Score: 0.40211308002471924\n", |
459 |
| - "\n", |
460 |
| - "Sentence: Linda went to the park to listen to music. Semantic Similarity Score: 0.4012303650379181\n", |
461 |
| - "\n", |
462 |
| - "Sentence: I'll go to the park. Semantic Similarity Score: 0.3996794819831848\n", |
463 |
| - "\n" |
464 |
| - ] |
465 |
| - } |
466 |
| - ], |
| 427 | + "outputs": [], |
467 | 428 | "source": [
|
468 | 429 | "search_query = \"I want to go to the park and relax\"\n",
|
469 | 430 | "\n",
|
|
490 | 451 | },
|
491 | 452 | {
|
492 | 453 | "cell_type": "code",
|
493 |
| - "execution_count": 13, |
| 454 | + "execution_count": 11, |
494 | 455 | "metadata": {},
|
495 |
| - "outputs": [ |
496 |
| - { |
497 |
| - "name": "stdout", |
498 |
| - "output_type": "stream", |
499 |
| - "text": [ |
500 |
| - "Sentence: I can't find a spot to park my spaceship. Semantic Similarity Score: 0.44190075993537903\n", |
501 |
| - "\n", |
502 |
| - "Sentence: I can't find a spot to park my spaceship. Semantic Similarity Score: 0.44190075993537903\n", |
503 |
| - "\n", |
504 |
| - "Sentence: There isn't anywhere else to park. Semantic Similarity Score: 0.4017431437969208\n", |
505 |
| - "\n", |
506 |
| - "Sentence: I have to park my car here. Semantic Similarity Score: 0.3978813886642456\n", |
507 |
| - "\n", |
508 |
| - "Sentence: Where can I park? Semantic Similarity Score: 0.39125218987464905\n", |
509 |
| - "\n", |
510 |
| - "Sentence: Where can I park? Semantic Similarity Score: 0.39125218987464905\n", |
511 |
| - "\n", |
512 |
| - "Sentence: I am parking my car near the office. Semantic Similarity Score: 0.37668246030807495\n", |
513 |
| - "\n", |
514 |
| - "Sentence: May I park here for a while? Semantic Similarity Score: 0.3707844614982605\n", |
515 |
| - "\n", |
516 |
| - "Sentence: I parked on the left side of the street just in front of the school. Semantic Similarity Score: 0.37002164125442505\n", |
517 |
| - "\n", |
518 |
| - "Sentence: Where can I park my car? Semantic Similarity Score: 0.3609045743942261\n", |
519 |
| - "\n" |
520 |
| - ] |
521 |
| - } |
522 |
| - ], |
| 456 | + "outputs": [], |
523 | 457 | "source": [
|
524 | 458 | "search_query = \"I need a place to park\"\n",
|
525 | 459 | "\n",
|
|
563 | 497 | },
|
564 | 498 | {
|
565 | 499 | "cell_type": "code",
|
566 |
| - "execution_count": 11, |
| 500 | + "execution_count": 12, |
567 | 501 | "metadata": {
|
568 | 502 | "id": "-cWdeKzhAtww"
|
569 | 503 | },
|
570 | 504 | "outputs": [],
|
571 | 505 | "source": [
|
572 |
| - "#pc.delete_index(name=index_name)" |
| 506 | + "pc.delete_index(name=index_name)" |
573 | 507 | ]
|
574 | 508 | },
|
575 | 509 | {
|
|
588 | 522 | },
|
589 | 523 | "gpuClass": "standard",
|
590 | 524 | "kernelspec": {
|
591 |
| - "display_name": "Python 3 (ipykernel)", |
| 525 | + "display_name": "pinecone-examples", |
592 | 526 | "language": "python",
|
593 | 527 | "name": "python3"
|
594 | 528 | },
|
|