Commit 0988876

better documentation
1 parent 3dc5f66 commit 0988876

2 files changed: +116 -31 lines changed
README.md

Lines changed: 44 additions & 15 deletions
@@ -78,24 +78,24 @@ Now you can query for similar items:
 await vec.search([1.0, 9.0])
 ```

-[<Record id=UUID('1c35a4c3-8a04-4d3e-b74d-511585db052d') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>,
-<Record id=UUID('71e70e23-65fd-4555-be30-bc40710654be') metadata={'animal': 'fox'} contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) distance=0.14489260377438218>]
+[<Record id=UUID('e5dbaa7c-081b-4131-be18-c81ce47fc864') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>,
+<Record id=UUID('2cdb8cbd-5dd7-4555-926a-5efafb4b1cf0') metadata={'animal': 'fox'} contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) distance=0.14489260377438218>]

 You can specify the number of records to return.

 ``` python
 await vec.search([1.0, 9.0], limit=1)
 ```

-[<Record id=UUID('1c35a4c3-8a04-4d3e-b74d-511585db052d') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>]
+[<Record id=UUID('e5dbaa7c-081b-4131-be18-c81ce47fc864') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>]

 You can also specify a filter on the metadata as a simple dictionary

 ``` python
 await vec.search([1.0, 9.0], limit=1, filter={"action": "jump"})
 ```

-[<Record id=UUID('1c35a4c3-8a04-4d3e-b74d-511585db052d') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>]
+[<Record id=UUID('e5dbaa7c-081b-4131-be18-c81ce47fc864') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>]

 You can also specify a list of filter dictionaries, where an item is
 returned if it matches any dict
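
The filter semantics described in this hunk (a single dictionary versus a list of dictionaries) can be illustrated in plain Python. This is a minimal sketch, not the library's implementation: the helper name `matches` is hypothetical, and it assumes that a single dictionary matches when every key/value pair it lists is present in the record's metadata, while a list matches when any one of its dictionaries does.

```python
from typing import Any, Dict, List, Union

Filter = Union[Dict[str, Any], List[Dict[str, Any]]]


def matches(metadata: Dict[str, Any], flt: Filter) -> bool:
    """Return True if `metadata` satisfies `flt` (hypothetical helper).

    Assumption: a dict filter requires all of its key/value pairs to be
    present in the metadata; a list filter is an OR over its dicts.
    """
    if isinstance(flt, dict):
        return all(k in metadata and metadata[k] == v for k, v in flt.items())
    return any(matches(metadata, f) for f in flt)


# Metadata of the two records shown in the outputs above.
jumped = {"action": "jump", "animal": "fox"}
brown_fox = {"animal": "fox"}

only_jump = [m for m in (jumped, brown_fox) if matches(m, {"action": "jump"})]
either = [m for m in (jumped, brown_fox)
          if matches(m, [{"action": "jump"}, {"animal": "fox"}])]
```

Under these assumptions `only_jump` holds one record and `either` holds both, which lines up with the `limit=1` and `limit=2` outputs in the diff.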
@@ -104,8 +104,8 @@ returned if it matches any dict
 await vec.search([1.0, 9.0], limit=2, filter=[{"action": "jump"}, {"animal": "fox"}])
 ```

-[<Record id=UUID('1c35a4c3-8a04-4d3e-b74d-511585db052d') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>,
-<Record id=UUID('71e70e23-65fd-4555-be30-bc40710654be') metadata={'animal': 'fox'} contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) distance=0.14489260377438218>]
+[<Record id=UUID('e5dbaa7c-081b-4131-be18-c81ce47fc864') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>,
+<Record id=UUID('2cdb8cbd-5dd7-4555-926a-5efafb4b1cf0') metadata={'animal': 'fox'} contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) distance=0.14489260377438218>]

 You can access the fields as follows

@@ -114,7 +114,7 @@ records = await vec.search([1.0, 9.0], limit=1, filter={"action": "jump"})
 records[0][client.SEARCH_RESULT_ID_IDX]
 ```

-UUID('1c35a4c3-8a04-4d3e-b74d-511585db052d')
+UUID('e5dbaa7c-081b-4131-be18-c81ce47fc864')

 ``` python
 records[0][client.SEARCH_RESULT_METADATA_IDX]
@@ -185,8 +185,10 @@ await vec.drop_embedding_index()
 ```

 While we recommend the timescale-vector index type, we also have 2 more
-index types availabe: - The pgvector ivfflat index - The pgvector hnsw
-index
+index types availabe:
+
+- The pgvector ivfflat index
+- The pgvector hnsw index

 Usage examples below:

@@ -218,10 +220,12 @@ similariy-search index less effective.

 One approach to solving this is partitioning the data by time and
 creating ANN indexes on each partition individually. Then, during search
-you can: - Step 1: filter our partitions that don’t match the time
-predicate - Step 2: perform the similarity search on all matching
-partitions - Step 3: combine all the results from each partition in step
-2, rerank, and filter out results by time.
+you can:
+
+- Step 1: filter our partitions that don’t match the time predicate
+- Step 2: perform the similarity search on all matching partitions
+- Step 3: combine all the results from each partition in step 2, rerank,
+and filter out results by time.

 Step 1 makes the search a lot more effecient by filtering out whole
 swaths of data in one go.
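
The three steps listed in this hunk can be sketched in plain Python over in-memory data. This is only an illustration under stated assumptions: the function names are hypothetical, partitions are modeled as a dict keyed by partition start time, and a brute-force cosine-distance scan stands in for the per-partition ANN index.

```python
import math
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def partition_start(ts, interval):
    """Round a timestamp down to the start of its partition."""
    return EPOCH + ((ts - EPOCH) // interval) * interval


def build_partitions(rows, interval):
    """Group (timestamp, contents, embedding) rows by partition start."""
    partitions = {}
    for row in rows:
        partitions.setdefault(partition_start(row[0], interval), []).append(row)
    return partitions


def search(partitions, interval, query, start, end, limit):
    # Step 1: drop partitions whose time span cannot overlap [start, end).
    candidates = [rows for p_start, rows in partitions.items()
                  if p_start < end and p_start + interval > start]

    # Step 2: similarity search inside each remaining partition.
    # (Brute force here; an ANN index would do this in a real system.)
    per_partition = []
    for rows in candidates:
        scored = sorted((cosine_distance(query, emb), ts, contents)
                        for ts, contents, emb in rows)
        per_partition.extend(scored[:limit])

    # Step 3: combine, apply the exact time predicate, rerank, truncate.
    in_range = [r for r in per_partition if start <= r[1] < end]
    return sorted(in_range)[:limit]
```

With a `timedelta(hours=6)` interval, a query restricted to a single day touches at most a handful of partitions, and every other partition is skipped without computing a single distance.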
@@ -232,13 +236,38 @@ each partition when creating the client:

 ``` python
 from datetime import timedelta
+from datetime import datetime
 ```

 ``` python
-vec = client.Async(service_url, "data_table_with_time_partition", 2, time_partition_interval=timedelta(hours=6))
+vec = client.Async(service_url, "my_data_with_time_partition", 2, time_partition_interval=timedelta(hours=6))
+await vec.create_tables()
+```

+Then insert data where the ids use uuid’s v1 and the time component of
+the uuid specifies the time of the embedding. For example, to create an
+embedding for the current time simply do:
+
+``` python
 id = uuid.uuid1()
-vec.upsert([(id, {"key": "val"}, "the brown fox", [1.0, 1.2])])
+await vec.upsert([(id, {"key": "val"}, "the brown fox", [1.0, 1.2])])
+```
+
+To insert data for a specific time in the past, create the uuid using
+our
+[`uuid_from_time`](https://timescale.github.io/python-vector/vector.html#uuid_from_time)
+function
+
+``` python
+specific_datetime = datetime(2018, 8, 10, 15, 30, 0)
+await vec.upsert([(client.uuid_from_time(specific_datetime), {"key": "val"}, "the brown fox", [1.0, 1.2])])
+```
+
+You can then query the data by specifing a `uuid_time_filter` in the
+search call:
+
+``` python
+rec = await vec.search([1.0, 2.0], limit=4, uuid_time_filter=client.UUIDTimeRange(specific_datetime-timedelta(days=7), specific_datetime+timedelta(days=7)))
 ```

 ## Development
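
The added README text relies on the time component of a version-1 UUID. The sketch below uses only the Python standard library to show how a timestamp can be packed into a v1 UUID and read back. It is not the library's `uuid_from_time` implementation: the helper names are hypothetical, naive datetimes are assumed to be UTC, and the half-open range check is an assumption about how a time filter might behave.

```python
import random
import uuid
from datetime import datetime, timedelta, timezone

# UUID v1 timestamps count 100-nanosecond ticks since the Gregorian epoch.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_from_datetime(dt: datetime) -> uuid.UUID:
    """Build a version-1 UUID whose time field encodes `dt` (hypothetical helper)."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    delta = dt - GREGORIAN_EPOCH
    ticks = (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    clock_seq = random.getrandbits(14)
    node = random.getrandbits(48) | (1 << 40)  # multicast bit marks a random node id
    fields = (
        ticks & 0xFFFFFFFF,       # time_low
        (ticks >> 32) & 0xFFFF,   # time_mid
        (ticks >> 48) & 0x0FFF,   # time_hi; version bits are set by version=1
        (clock_seq >> 8) & 0x3F,  # clock_seq_hi; variant bits are set by version=1
        clock_seq & 0xFF,         # clock_seq_low
        node,
    )
    return uuid.UUID(fields=fields, version=1)


def datetime_from_uuid1(u: uuid.UUID) -> datetime:
    """Recover the UTC timestamp stored in a version-1 UUID."""
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


def in_time_range(u: uuid.UUID, start: datetime, end: datetime) -> bool:
    """True if the UUID's timestamp falls in [start, end) (assumed half-open)."""
    return start <= datetime_from_uuid1(u) < end
```

Because the timestamp lives inside the id itself, a time-range predicate can be evaluated from the primary key alone, which is what makes the partition pruning described earlier in this diff possible.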

nbs/index.ipynb

Lines changed: 72 additions & 16 deletions
@@ -107,6 +107,7 @@
 "#| hide\n",
 "con = await asyncpg.connect(service_url)\n",
 "await con.execute(\"DROP TABLE IF EXISTS my_data;\")\n",
+"await con.execute(\"DROP TABLE IF EXISTS my_data_with_time_partition;\")\n",
 "await con.close()"
 ]
 },
@@ -197,8 +198,8 @@
 {
 "data": {
 "text/plain": [
-"[<Record id=UUID('1c35a4c3-8a04-4d3e-b74d-511585db052d') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>,\n",
-" <Record id=UUID('71e70e23-65fd-4555-be30-bc40710654be') metadata={'animal': 'fox'} contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) distance=0.14489260377438218>]"
+"[<Record id=UUID('e5dbaa7c-081b-4131-be18-c81ce47fc864') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>,\n",
+" <Record id=UUID('2cdb8cbd-5dd7-4555-926a-5efafb4b1cf0') metadata={'animal': 'fox'} contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) distance=0.14489260377438218>]"
 ]
 },
 "execution_count": null,
@@ -226,7 +227,7 @@
 {
 "data": {
 "text/plain": [
-"[<Record id=UUID('1c35a4c3-8a04-4d3e-b74d-511585db052d') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>]"
+"[<Record id=UUID('e5dbaa7c-081b-4131-be18-c81ce47fc864') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>]"
 ]
 },
 "execution_count": null,
@@ -254,7 +255,7 @@
 {
 "data": {
 "text/plain": [
-"[<Record id=UUID('1c35a4c3-8a04-4d3e-b74d-511585db052d') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>]"
+"[<Record id=UUID('e5dbaa7c-081b-4131-be18-c81ce47fc864') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>]"
 ]
 },
 "execution_count": null,
@@ -282,8 +283,8 @@
 {
 "data": {
 "text/plain": [
-"[<Record id=UUID('1c35a4c3-8a04-4d3e-b74d-511585db052d') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>,\n",
-" <Record id=UUID('71e70e23-65fd-4555-be30-bc40710654be') metadata={'animal': 'fox'} contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) distance=0.14489260377438218>]"
+"[<Record id=UUID('e5dbaa7c-081b-4131-be18-c81ce47fc864') metadata={'action': 'jump', 'animal': 'fox'} contents='jumped over the' embedding=array([ 1. , 10.8], dtype=float32) distance=0.00016793422934946456>,\n",
+" <Record id=UUID('2cdb8cbd-5dd7-4555-926a-5efafb4b1cf0') metadata={'animal': 'fox'} contents='the brown fox' embedding=array([1. , 1.3], dtype=float32) distance=0.14489260377438218>]"
 ]
 },
 "execution_count": null,
@@ -311,7 +312,7 @@
 {
 "data": {
 "text/plain": [
-"UUID('1c35a4c3-8a04-4d3e-b74d-511585db052d')"
+"UUID('e5dbaa7c-081b-4131-be18-c81ce47fc864')"
 ]
 },
 "execution_count": null,
@@ -531,8 +532,9 @@
 "metadata": {},
 "source": [
 "While we recommend the timescale-vector index type, we also have 2 more index types availabe:\n",
-"- The pgvector ivfflat index\n",
-"- The pgvector hnsw index\n",
+"\n",
+"* The pgvector ivfflat index\n",
+"* The pgvector hnsw index\n",
 "\n",
 "Usage examples below:"
 ]
@@ -577,9 +579,10 @@
 "Yet, traditionally, searching by two components \"similarity\" and \"time\" is challenging approximate nearest neigbor (ANN) indexes and makes the similariy-search index less effective.\n",
 "\n",
 "One approach to solving this is partitioning the data by time and creating ANN indexes on each partition individually. Then, during search you can:\n",
-"- Step 1: filter our partitions that don't match the time predicate\n",
-"- Step 2: perform the similarity search on all matching partitions\n",
-"- Step 3: combine all the results from each partition in step 2, rerank, and filter out results by time.\n",
+"\n",
+" * Step 1: filter our partitions that don't match the time predicate\n",
+" * Step 2: perform the similarity search on all matching partitions\n",
+" * Step 3: combine all the results from each partition in step 2, rerank, and filter out results by time.\n",
 "\n",
 "Step 1 makes the search a lot more effecient by filtering out whole swaths of data in one go.\n",
 "\n",
@@ -592,7 +595,27 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from datetime import timedelta"
+"from datetime import timedelta\n",
+"from datetime import datetime"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"vec = client.Async(service_url, \"my_data_with_time_partition\", 2, time_partition_interval=timedelta(hours=6))\n",
+"await vec.create_tables()"
+]
+},
+{
+"attachments": {},
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Then insert data where the ids use uuid's v1 and the time component of the uuid specifies the time of the embedding.\n",
+"For example, to create an embedding for the current time simply do: "
 ]
 },
 {
@@ -601,10 +624,43 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"vec = client.Async(service_url, \"data_table_with_time_partition\", 2, time_partition_interval=timedelta(hours=6))\n",
-"\n",
 "id = uuid.uuid1()\n",
-"vec.upsert([(id, {\"key\": \"val\"}, \"the brown fox\", [1.0, 1.2])])\n"
+"await vec.upsert([(id, {\"key\": \"val\"}, \"the brown fox\", [1.0, 1.2])])"
+]
+},
+{
+"attachments": {},
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"To insert data for a specific time in the past, create the uuid using our `uuid_from_time` function"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"specific_datetime = datetime(2018, 8, 10, 15, 30, 0)\n",
+"await vec.upsert([(client.uuid_from_time(specific_datetime), {\"key\": \"val\"}, \"the brown fox\", [1.0, 1.2])])"
+]
+},
+{
+"attachments": {},
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"You can then query the data by specifing a `uuid_time_filter` in the search call:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"rec = await vec.search([1.0, 2.0], limit=4, uuid_time_filter=client.UUIDTimeRange(specific_datetime-timedelta(days=7), specific_datetime+timedelta(days=7)))"
 ]
 },
 {
