"By default, we set up indexes to query your data by the uuid and the metadata.\n",
498
498
"\n",
499
-
"If you have many rows, you also need to setup an index on the embedding. You can create an ivfflat index with the following command after the table has been populated."
499
+
"If you have many rows, you also need to set up an index on the embedding. You can create a timescale-vector index on the table with:"
"Please note it is very important to create the ivfflat index only after you have data in the table. "
558
+
]
559
+
},
530
560
{
531
561
"attachments": {},
532
562
"cell_type": "markdown",
@@ -535,12 +565,47 @@
535
565
"Please note the community is actively working on new indexing methods for embeddings. As they become available, we will add them to our client as well."
536
566
]
537
567
},
568
+
{
569
+
"attachments": {},
570
+
"cell_type": "markdown",
571
+
"metadata": {},
572
+
"source": [
573
+
"### Time-partitioning\n",
574
+
"\n",
575
+
"In many use cases with a large number of embeddings, time is an important component associated with each embedding. For example, when embedding news stories, you often search by time as well as similarity (e.g. stories related to bitcoin in the past week, or stories about Clinton in November 2016).\n",
576
+
"\n",
577
+
"Yet, traditionally, searching by two components, \"similarity\" and \"time\", is challenging for approximate nearest neighbor (ANN) indexes and makes the similarity-search index less effective.\n",
578
+
"\n",
579
+
"One approach to solving this is partitioning the data by time and creating ANN indexes on each partition individually. Then, during search you can:\n",
580
+
"- Step 1: filter out partitions that don't match the time predicate\n",
581
+
"- Step 2: perform the similarity search on all matching partitions\n",
582
+
"- Step 3: combine all the results from each partition in step 2, rerank, and filter out results by time.\n",
583
+
"\n",
584
+
"Step 1 makes the search a lot more efficient by filtering out whole swaths of data in one go.\n",
585
+
"\n",
586
+
"Timescale-vector supports time partitioning using TimescaleDB's hypertables. To use this feature, simply indicate the length in time for each partition when creating the client:"
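As a reading aid for the three search steps listed in the Time-partitioning cell, here is a minimal, library-independent sketch in plain Python. It is not the timescale-vector client API and does not show how the client is created. The names `partition_start`, `build_partitions`, and `search`, the row layout (`id`, `time`, `embedding`), and the half-open time range `[start, end)` are assumptions made for illustration, and a brute-force cosine-distance sort stands in for a per-partition ANN index.

```python
from datetime import datetime, timedelta
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def partition_start(ts, origin, interval):
    """Start of the fixed-length partition that contains timestamp ts."""
    return origin + ((ts - origin) // interval) * interval


def build_partitions(rows, origin, interval):
    """Group rows into a dict keyed by partition start time."""
    partitions = {}
    for row in rows:
        key = partition_start(row["time"], origin, interval)
        partitions.setdefault(key, []).append(row)
    return partitions


def search(partitions, interval, query, start, end, k):
    """Return up to k rows with start <= time < end, nearest to query first."""
    # Step 1: keep only partitions whose time span overlaps [start, end).
    matching = [
        rows
        for key, rows in partitions.items()
        if key < end and key + interval > start
    ]

    # Step 2: take the k nearest rows from each matching partition.
    # (Brute force here; a real system would use an ANN index per partition.)
    candidates = []
    for rows in matching:
        ranked = sorted(rows, key=lambda r: cosine_distance(query, r["embedding"]))
        candidates.extend(ranked[:k])

    # Step 3: combine, drop rows outside the time range, and rerank.
    in_range = [r for r in candidates if start <= r["time"] < end]
    in_range.sort(key=lambda r: cosine_distance(query, r["embedding"]))
    return in_range[:k]


if __name__ == "__main__":
    origin = datetime(2023, 1, 1)
    week = timedelta(days=7)
    rows = [
        {"id": "a", "time": datetime(2023, 1, 2), "embedding": [1.0, 0.0]},
        {"id": "b", "time": datetime(2023, 1, 10), "embedding": [0.9, 0.1]},
        {"id": "c", "time": datetime(2023, 1, 20), "embedding": [1.0, 0.0]},
        {"id": "d", "time": datetime(2023, 1, 11), "embedding": [0.0, 1.0]},
    ]
    parts = build_partitions(rows, origin, week)
    hits = search(parts, week, [1.0, 0.0], datetime(2023, 1, 8), datetime(2023, 1, 15), k=2)
    print([h["id"] for h in hits])
```

Because step 3 filters by time after each partition has already been cut to its top `k`, a query whose range covers only part of a partition can return fewer than `k` rows. Fetching more than `k` candidates per partition in step 2 is one way to reduce that effect.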