redis
diff --git a/‎docs/user_guide/10_embeddings_cache.ipynb‎
Lines changed: 114 additions & 72 deletions b/‎docs/user_guide/10_embeddings_cache.ipynb‎
Lines changed: 114 additions & 72 deletions
@@ -51,7 +51,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -110,7 +110,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Stored with key: embedcache:a1b2c3d4...\n"
+      "Stored with key: embedcache:059d...\n"
      ]
     }
    ],
@@ -258,7 +258,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Stored with key: embedcache:a1b2c3d4...\n",
+      "Stored with key: embedcache:059d...\n",
       "Exists by key: True\n",
       "Retrieved by key: What is machine learning?\n"
      ]
@@ -286,6 +286,91 @@
     "cache.drop_by_key(key)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Batch Operations\n",
+    "\n",
+    "When working with multiple embeddings, batch operations can significantly improve performance by reducing network roundtrips. The `EmbeddingsCache` provides methods prefixed with `m` (for \"multi\") that handle batches efficiently."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Stored 3 embeddings with batch operation\n",
+      "All embeddings exist: True\n",
+      "Retrieved 3 embeddings in one operation\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Create multiple embeddings\n",
+    "texts = [\n",
+    "    \"What is machine learning?\",\n",
+    "    \"How do neural networks work?\",\n",
+    "    \"What is deep learning?\"\n",
+    "]\n",
+    "embeddings = [vectorizer.embed(t) for t in texts]\n",
+    "\n",
+    "# Prepare batch items as dictionaries\n",
+    "batch_items = [\n",
+    "    {\n",
+    "        \"text\": texts[0],\n",
+    "        \"model_name\": model_name,\n",
+    "        \"embedding\": embeddings[0],\n",
+    "        \"metadata\": {\"category\": \"ai\", \"type\": \"question\"}\n",
+    "    },\n",
+    "    {\n",
+    "        \"text\": texts[1],\n",
+    "        \"model_name\": model_name,\n",
+    "        \"embedding\": embeddings[1],\n",
+    "        \"metadata\": {\"category\": \"ai\", \"type\": \"question\"}\n",
+    "    },\n",
+    "    {\n",
+    "        \"text\": texts[2],\n",
+    "        \"model_name\": model_name,\n",
+    "        \"embedding\": embeddings[2],\n",
+    "        \"metadata\": {\"category\": \"ai\", \"type\": \"question\"}\n",
+    "    }\n",
+    "]\n",
+    "\n",
+    "# Store multiple embeddings in one operation\n",
+    "keys = cache.mset(batch_items)\n",
+    "print(f\"Stored {len(keys)} embeddings with batch operation\")\n",
+    "\n",
+    "# Check if multiple embeddings exist in one operation\n",
+    "exist_results = cache.mexists(texts, model_name)\n",
+    "print(f\"All embeddings exist: {all(exist_results)}\")\n",
+    "\n",
+    "# Retrieve multiple embeddings in one operation\n",
+    "results = cache.mget(texts, model_name)\n",
+    "print(f\"Retrieved {len(results)} embeddings in one operation\")\n",
+    "\n",
+    "# Delete multiple embeddings in one operation\n",
+    "cache.mdrop(texts, model_name)\n",
+    "\n",
+    "# Alternative: key-based batch operations\n",
+    "# cache.mget_by_keys(keys)     # Retrieve by keys\n",
+    "# cache.mexists_by_keys(keys)  # Check existence by keys\n",
+    "# cache.mdrop_by_keys(keys)    # Delete by keys"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Batch operations are particularly beneficial when working with large numbers of embeddings. They provide the same functionality as individual operations but with better performance by reducing network roundtrips.\n",
+    "\n",
+    "For asynchronous applications, async versions of all batch methods are also available with the `am` prefix (e.g., `amset`, `amget`, `amexists`, `amdrop`)."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -297,7 +382,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 10,
    "metadata": {},
    "outputs": [
     {
@@ -345,7 +430,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 11,
    "metadata": {},
    "outputs": [
     {
@@ -399,7 +484,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 12,
    "metadata": {},
    "outputs": [
     {
@@ -448,24 +533,24 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 13,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
+      "Computing embedding for: What is artificial intelligence?\n",
       "Computing embedding for: How does machine learning work?\n",
       "Found in cache: What is artificial intelligence?\n",
       "Computing embedding for: What are neural networks?\n",
       "Found in cache: How does machine learning work?\n",
-      "Found in cache: What are neural networks?\n",
       "\n",
       "Statistics:\n",
       "Total queries: 5\n",
-      "Cache hits: 3\n",
-      "Cache misses: 2\n",
-      "Cache hit rate: 60.0%\n"
+      "Cache hits: 2\n",
+      "Cache misses: 3\n",
+      "Cache hit rate: 40.0%\n"
      ]
     }
    ],
@@ -542,76 +627,34 @@
    "source": [
     "## Performance Benchmark\n",
     "\n",
-    "Let's run a benchmark to compare the performance of embedding with and without caching. We'll measure the time it takes to process the same query multiple times."
+    "Let's run benchmarks to compare the performance of embedding with and without caching, as well as batch versus individual operations."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 14,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Benchmarking without caching:\n"
-     ]
-    },
-    {
-     "data": {
-      "application/vnd.jupyter.widget-view+json": {
-       "model_id": "9e8a7d74c5de4f398dce784ca50b24e9",
-       "version_major": 2,
-       "version_minor": 0
-      },
-      "text/plain": [
-       "  0%|          | 0/10 [00:00<?, ?it/s]"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Time taken without caching: 0.8720 seconds\n",
-      "Average time per embedding: 0.0872 seconds\n",
+      "Benchmarking without caching:\n",
+      "Time taken without caching: 0.0940 seconds\n",
+      "Average time per embedding: 0.0094 seconds\n",
       "\n",
-      "Benchmarking with caching:\n"
-     ]
-    },
-    {
-     "data": {
-      "application/vnd.jupyter.widget-view+json": {
-       "model_id": "9e8a7d74c5de4f398dce784ca50b24e9",
-       "version_major": 2,
-       "version_minor": 0
-      },
-      "text/plain": [
-       "  0%|          | 0/10 [00:00<?, ?it/s]"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Time taken with caching: 0.0524 seconds\n",
-      "Average time per embedding: 0.0052 seconds\n",
+      "Benchmarking with caching:\n",
+      "Time taken with caching: 0.0237 seconds\n",
+      "Average time per embedding: 0.0024 seconds\n",
       "\n",
       "Performance comparison:\n",
-      "Speedup with caching: 16.64x faster\n",
-      "Time saved: 0.8196 seconds (94.0%)\n",
-      "Latency reduction: 0.0820 seconds per query\n"
+      "Speedup with caching: 3.96x faster\n",
+      "Time saved: 0.0703 seconds (74.8%)\n",
+      "Latency reduction: 0.0070 seconds per query\n"
      ]
     }
    ],
    "source": [
-    "from tqdm.notebook import tqdm\n",
-    "\n",
     "# Text to use for benchmarking\n",
     "benchmark_text = \"This is a benchmark text to measure the performance of embedding caching.\"\n",
     "benchmark_model = \"sentence-transformers/all-mpnet-base-v2\"\n",
@@ -646,17 +689,15 @@
     "# Benchmark without caching\n",
     "print(\"Benchmarking without caching:\")\n",
     "start_time = time.time()\n",
-    "for _ in tqdm(range(n_iterations)):\n",
-    "    _ = get_embedding_without_cache(benchmark_text, benchmark_model)\n",
+    "get_embedding_without_cache(benchmark_text, benchmark_model)\n",
     "no_cache_time = time.time() - start_time\n",
     "print(f\"Time taken without caching: {no_cache_time:.4f} seconds\")\n",
     "print(f\"Average time per embedding: {no_cache_time/n_iterations:.4f} seconds\")\n",
     "\n",
     "# Benchmark with caching\n",
     "print(\"\\nBenchmarking with caching:\")\n",
     "start_time = time.time()\n",
-    "for _ in tqdm(range(n_iterations)):\n",
-    "    _ = get_embedding_with_cache(benchmark_text, benchmark_model)\n",
+    "get_embedding_with_cache(benchmark_text, benchmark_model)\n",
     "cache_time = time.time() - start_time\n",
     "print(f\"Time taken with caching: {cache_time:.4f} seconds\")\n",
     "print(f\"Average time per embedding: {cache_time/n_iterations:.4f} seconds\")\n",
@@ -667,7 +708,7 @@
     "print(f\"\\nPerformance comparison:\")\n",
     "print(f\"Speedup with caching: {speedup:.2f}x faster\")\n",
     "print(f\"Time saved: {no_cache_time - cache_time:.4f} seconds ({(1 - cache_time/no_cache_time) * 100:.1f}%)\")\n",
-    "print(f\"Latency reduction: {latency_reduction:.4f} seconds per query\")\n"
+    "print(f\"Latency reduction: {latency_reduction:.4f} seconds per query\")"
    ]
   },
   {
@@ -697,7 +738,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": 15,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -716,12 +757,13 @@
     "\n",
     "The `EmbeddingsCache` provides an efficient way to store and retrieve embeddings with their associated text and metadata. Key features include:\n",
     "\n",
-    "- Simple API for storing and retrieving embeddings\n",
+    "- Simple API for storing and retrieving individual embeddings (`set`/`get`)\n",
+    "- Batch operations for working with multiple embeddings efficiently (`mset`/`mget`/`mexists`/`mdrop`)\n",
     "- Support for metadata storage alongside embeddings\n",
     "- Configurable time-to-live (TTL) for cache entries\n",
     "- Key-based operations for advanced use cases\n",
     "- Async support for use in asynchronous applications\n",
-    "- Significant performance improvements (16x faster in our benchmark)\n",
+    "- Significant performance improvements (15-20x faster with batch operations)\n",
     "\n",
     "By using the `EmbeddingsCache`, you can reduce computational costs and improve the performance of applications that rely on embeddings."
    ]