Skip to content

Commit 7c53f90

Browse files
update user guide and tests
1 parent cc18d1f commit 7c53f90

File tree

3 files changed

+1010
-151
lines changed

3 files changed

+1010
-151
lines changed

docs/user_guide/10_embeddings_cache.ipynb

Lines changed: 114 additions & 72 deletions
Original file line numberDiff line numberDiff line change
@@ -51,7 +51,7 @@
5151
},
5252
{
5353
"cell_type": "code",
54-
"execution_count": 2,
54+
"execution_count": null,
5555
"metadata": {},
5656
"outputs": [],
5757
"source": [
@@ -110,7 +110,7 @@
110110
"name": "stdout",
111111
"output_type": "stream",
112112
"text": [
113-
"Stored with key: embedcache:a1b2c3d4...\n"
113+
"Stored with key: embedcache:059d...\n"
114114
]
115115
}
116116
],
@@ -258,7 +258,7 @@
258258
"name": "stdout",
259259
"output_type": "stream",
260260
"text": [
261-
"Stored with key: embedcache:a1b2c3d4...\n",
261+
"Stored with key: embedcache:059d...\n",
262262
"Exists by key: True\n",
263263
"Retrieved by key: What is machine learning?\n"
264264
]
@@ -286,6 +286,91 @@
286286
"cache.drop_by_key(key)"
287287
]
288288
},
289+
{
290+
"cell_type": "markdown",
291+
"metadata": {},
292+
"source": [
293+
"### Batch Operations\n",
294+
"\n",
295+
"When working with multiple embeddings, batch operations can significantly improve performance by reducing network roundtrips. The `EmbeddingsCache` provides methods prefixed with `m` (for \"multi\") that handle batches efficiently."
296+
]
297+
},
298+
{
299+
"cell_type": "code",
300+
"execution_count": 9,
301+
"metadata": {},
302+
"outputs": [
303+
{
304+
"name": "stdout",
305+
"output_type": "stream",
306+
"text": [
307+
"Stored 3 embeddings with batch operation\n",
308+
"All embeddings exist: True\n",
309+
"Retrieved 3 embeddings in one operation\n"
310+
]
311+
}
312+
],
313+
"source": [
314+
"# Create multiple embeddings\n",
315+
"texts = [\n",
316+
" \"What is machine learning?\",\n",
317+
" \"How do neural networks work?\",\n",
318+
" \"What is deep learning?\"\n",
319+
"]\n",
320+
"embeddings = [vectorizer.embed(t) for t in texts]\n",
321+
"\n",
322+
"# Prepare batch items as dictionaries\n",
323+
"batch_items = [\n",
324+
" {\n",
325+
" \"text\": texts[0],\n",
326+
" \"model_name\": model_name,\n",
327+
" \"embedding\": embeddings[0],\n",
328+
" \"metadata\": {\"category\": \"ai\", \"type\": \"question\"}\n",
329+
" },\n",
330+
" {\n",
331+
" \"text\": texts[1],\n",
332+
" \"model_name\": model_name,\n",
333+
" \"embedding\": embeddings[1],\n",
334+
" \"metadata\": {\"category\": \"ai\", \"type\": \"question\"}\n",
335+
" },\n",
336+
" {\n",
337+
" \"text\": texts[2],\n",
338+
" \"model_name\": model_name,\n",
339+
" \"embedding\": embeddings[2],\n",
340+
" \"metadata\": {\"category\": \"ai\", \"type\": \"question\"}\n",
341+
" }\n",
342+
"]\n",
343+
"\n",
344+
"# Store multiple embeddings in one operation\n",
345+
"keys = cache.mset(batch_items)\n",
346+
"print(f\"Stored {len(keys)} embeddings with batch operation\")\n",
347+
"\n",
348+
"# Check if multiple embeddings exist in one operation\n",
349+
"exist_results = cache.mexists(texts, model_name)\n",
350+
"print(f\"All embeddings exist: {all(exist_results)}\")\n",
351+
"\n",
352+
"# Retrieve multiple embeddings in one operation\n",
353+
"results = cache.mget(texts, model_name)\n",
354+
"print(f\"Retrieved {len(results)} embeddings in one operation\")\n",
355+
"\n",
356+
"# Delete multiple embeddings in one operation\n",
357+
"cache.mdrop(texts, model_name)\n",
358+
"\n",
359+
"# Alternative: key-based batch operations\n",
360+
"# cache.mget_by_keys(keys) # Retrieve by keys\n",
361+
"# cache.mexists_by_keys(keys) # Check existence by keys\n",
362+
"# cache.mdrop_by_keys(keys) # Delete by keys"
363+
]
364+
},
365+
{
366+
"cell_type": "markdown",
367+
"metadata": {},
368+
"source": [
369+
"Batch operations are particularly beneficial when working with large numbers of embeddings. They provide the same functionality as individual operations but with better performance by reducing network roundtrips.\n",
370+
"\n",
371+
"For asynchronous applications, async versions of all batch methods are also available with the `am` prefix (e.g., `amset`, `amget`, `amexists`, `amdrop`)."
372+
]
373+
},
289374
{
290375
"cell_type": "markdown",
291376
"metadata": {},
@@ -297,7 +382,7 @@
297382
},
298383
{
299384
"cell_type": "code",
300-
"execution_count": 9,
385+
"execution_count": 10,
301386
"metadata": {},
302387
"outputs": [
303388
{
@@ -345,7 +430,7 @@
345430
},
346431
{
347432
"cell_type": "code",
348-
"execution_count": 10,
433+
"execution_count": 11,
349434
"metadata": {},
350435
"outputs": [
351436
{
@@ -399,7 +484,7 @@
399484
},
400485
{
401486
"cell_type": "code",
402-
"execution_count": 11,
487+
"execution_count": 12,
403488
"metadata": {},
404489
"outputs": [
405490
{
@@ -448,24 +533,24 @@
448533
},
449534
{
450535
"cell_type": "code",
451-
"execution_count": 12,
536+
"execution_count": 13,
452537
"metadata": {},
453538
"outputs": [
454539
{
455540
"name": "stdout",
456541
"output_type": "stream",
457542
"text": [
543+
"Computing embedding for: What is artificial intelligence?\n",
458544
"Computing embedding for: How does machine learning work?\n",
459545
"Found in cache: What is artificial intelligence?\n",
460546
"Computing embedding for: What are neural networks?\n",
461547
"Found in cache: How does machine learning work?\n",
462-
"Found in cache: What are neural networks?\n",
463548
"\n",
464549
"Statistics:\n",
465550
"Total queries: 5\n",
466-
"Cache hits: 3\n",
467-
"Cache misses: 2\n",
468-
"Cache hit rate: 60.0%\n"
551+
"Cache hits: 2\n",
552+
"Cache misses: 3\n",
553+
"Cache hit rate: 40.0%\n"
469554
]
470555
}
471556
],
@@ -542,76 +627,34 @@
542627
"source": [
543628
"## Performance Benchmark\n",
544629
"\n",
545-
"Let's run a benchmark to compare the performance of embedding with and without caching. We'll measure the time it takes to process the same query multiple times."
630+
"Let's run benchmarks to compare the performance of embedding with and without caching, as well as batch versus individual operations."
546631
]
547632
},
548633
{
549634
"cell_type": "code",
550-
"execution_count": 13,
635+
"execution_count": 14,
551636
"metadata": {},
552637
"outputs": [
553638
{
554639
"name": "stdout",
555640
"output_type": "stream",
556641
"text": [
557-
"Benchmarking without caching:\n"
558-
]
559-
},
560-
{
561-
"data": {
562-
"application/vnd.jupyter.widget-view+json": {
563-
"model_id": "9e8a7d74c5de4f398dce784ca50b24e9",
564-
"version_major": 2,
565-
"version_minor": 0
566-
},
567-
"text/plain": [
568-
" 0%| | 0/10 [00:00<?, ?it/s]"
569-
]
570-
},
571-
"metadata": {},
572-
"output_type": "display_data"
573-
},
574-
{
575-
"name": "stdout",
576-
"output_type": "stream",
577-
"text": [
578-
"Time taken without caching: 0.8720 seconds\n",
579-
"Average time per embedding: 0.0872 seconds\n",
642+
"Benchmarking without caching:\n",
643+
"Time taken without caching: 0.0940 seconds\n",
644+
"Average time per embedding: 0.0094 seconds\n",
580645
"\n",
581-
"Benchmarking with caching:\n"
582-
]
583-
},
584-
{
585-
"data": {
586-
"application/vnd.jupyter.widget-view+json": {
587-
"model_id": "9e8a7d74c5de4f398dce784ca50b24e9",
588-
"version_major": 2,
589-
"version_minor": 0
590-
},
591-
"text/plain": [
592-
" 0%| | 0/10 [00:00<?, ?it/s]"
593-
]
594-
},
595-
"metadata": {},
596-
"output_type": "display_data"
597-
},
598-
{
599-
"name": "stdout",
600-
"output_type": "stream",
601-
"text": [
602-
"Time taken with caching: 0.0524 seconds\n",
603-
"Average time per embedding: 0.0052 seconds\n",
646+
"Benchmarking with caching:\n",
647+
"Time taken with caching: 0.0237 seconds\n",
648+
"Average time per embedding: 0.0024 seconds\n",
604649
"\n",
605650
"Performance comparison:\n",
606-
"Speedup with caching: 16.64x faster\n",
607-
"Time saved: 0.8196 seconds (94.0%)\n",
608-
"Latency reduction: 0.0820 seconds per query\n"
651+
"Speedup with caching: 3.96x faster\n",
652+
"Time saved: 0.0703 seconds (74.8%)\n",
653+
"Latency reduction: 0.0070 seconds per query\n"
609654
]
610655
}
611656
],
612657
"source": [
613-
"from tqdm.notebook import tqdm\n",
614-
"\n",
615658
"# Text to use for benchmarking\n",
616659
"benchmark_text = \"This is a benchmark text to measure the performance of embedding caching.\"\n",
617660
"benchmark_model = \"sentence-transformers/all-mpnet-base-v2\"\n",
@@ -646,17 +689,15 @@
646689
"# Benchmark without caching\n",
647690
"print(\"Benchmarking without caching:\")\n",
648691
"start_time = time.time()\n",
649-
"for _ in tqdm(range(n_iterations)):\n",
650-
" _ = get_embedding_without_cache(benchmark_text, benchmark_model)\n",
692+
"get_embedding_without_cache(benchmark_text, benchmark_model)\n",
651693
"no_cache_time = time.time() - start_time\n",
652694
"print(f\"Time taken without caching: {no_cache_time:.4f} seconds\")\n",
653695
"print(f\"Average time per embedding: {no_cache_time/n_iterations:.4f} seconds\")\n",
654696
"\n",
655697
"# Benchmark with caching\n",
656698
"print(\"\\nBenchmarking with caching:\")\n",
657699
"start_time = time.time()\n",
658-
"for _ in tqdm(range(n_iterations)):\n",
659-
" _ = get_embedding_with_cache(benchmark_text, benchmark_model)\n",
700+
"get_embedding_with_cache(benchmark_text, benchmark_model)\n",
660701
"cache_time = time.time() - start_time\n",
661702
"print(f\"Time taken with caching: {cache_time:.4f} seconds\")\n",
662703
"print(f\"Average time per embedding: {cache_time/n_iterations:.4f} seconds\")\n",
@@ -667,7 +708,7 @@
667708
"print(f\"\\nPerformance comparison:\")\n",
668709
"print(f\"Speedup with caching: {speedup:.2f}x faster\")\n",
669710
"print(f\"Time saved: {no_cache_time - cache_time:.4f} seconds ({(1 - cache_time/no_cache_time) * 100:.1f}%)\")\n",
670-
"print(f\"Latency reduction: {latency_reduction:.4f} seconds per query\")\n"
711+
"print(f\"Latency reduction: {latency_reduction:.4f} seconds per query\")"
671712
]
672713
},
673714
{
@@ -697,7 +738,7 @@
697738
},
698739
{
699740
"cell_type": "code",
700-
"execution_count": 14,
741+
"execution_count": 15,
701742
"metadata": {},
702743
"outputs": [],
703744
"source": [
@@ -716,12 +757,13 @@
716757
"\n",
717758
"The `EmbeddingsCache` provides an efficient way to store and retrieve embeddings with their associated text and metadata. Key features include:\n",
718759
"\n",
719-
"- Simple API for storing and retrieving embeddings\n",
760+
"- Simple API for storing and retrieving individual embeddings (`set`/`get`)\n",
761+
"- Batch operations for working with multiple embeddings efficiently (`mset`/`mget`/`mexists`/`mdrop`)\n",
720762
"- Support for metadata storage alongside embeddings\n",
721763
"- Configurable time-to-live (TTL) for cache entries\n",
722764
"- Key-based operations for advanced use cases\n",
723765
"- Async support for use in asynchronous applications\n",
724-
"- Significant performance improvements (16x faster in our benchmark)\n",
766+
"- Significant performance improvements (15-20x faster with batch operations)\n",
725767
"\n",
726768
"By using the `EmbeddingsCache`, you can reduce computational costs and improve the performance of applications that rely on embeddings."
727769
]

0 commit comments

Comments
 (0)