|
51 | 51 | }, |
52 | 52 | { |
53 | 53 | "cell_type": "code", |
54 | | - "execution_count": 2, |
| 54 | + "execution_count": null, |
55 | 55 | "metadata": {}, |
56 | 56 | "outputs": [], |
57 | 57 | "source": [ |
|
110 | 110 | "name": "stdout", |
111 | 111 | "output_type": "stream", |
112 | 112 | "text": [ |
113 | | - "Stored with key: embedcache:a1b2c3d4...\n" |
| 113 | + "Stored with key: embedcache:059d...\n" |
114 | 114 | ] |
115 | 115 | } |
116 | 116 | ], |
|
258 | 258 | "name": "stdout", |
259 | 259 | "output_type": "stream", |
260 | 260 | "text": [ |
261 | | - "Stored with key: embedcache:a1b2c3d4...\n", |
| 261 | + "Stored with key: embedcache:059d...\n", |
262 | 262 | "Exists by key: True\n", |
263 | 263 | "Retrieved by key: What is machine learning?\n" |
264 | 264 | ] |
|
286 | 286 | "cache.drop_by_key(key)" |
287 | 287 | ] |
288 | 288 | }, |
| 289 | + { |
| 290 | + "cell_type": "markdown", |
| 291 | + "metadata": {}, |
| 292 | + "source": [ |
| 293 | + "### Batch Operations\n", |
| 294 | + "\n", |
| 295 | + "When working with multiple embeddings, batch operations can significantly improve performance by reducing network roundtrips. The `EmbeddingsCache` provides methods prefixed with `m` (for \"multi\") that handle batches efficiently." |
| 296 | + ] |
| 297 | + }, |
| 298 | + { |
| 299 | + "cell_type": "code", |
| 300 | + "execution_count": 9, |
| 301 | + "metadata": {}, |
| 302 | + "outputs": [ |
| 303 | + { |
| 304 | + "name": "stdout", |
| 305 | + "output_type": "stream", |
| 306 | + "text": [ |
| 307 | + "Stored 3 embeddings with batch operation\n", |
| 308 | + "All embeddings exist: True\n", |
| 309 | + "Retrieved 3 embeddings in one operation\n" |
| 310 | + ] |
| 311 | + } |
| 312 | + ], |
| 313 | + "source": [ |
| 314 | + "# Create multiple embeddings\n", |
| 315 | + "texts = [\n", |
| 316 | + " \"What is machine learning?\",\n", |
| 317 | + " \"How do neural networks work?\",\n", |
| 318 | + " \"What is deep learning?\"\n", |
| 319 | + "]\n", |
| 320 | + "embeddings = [vectorizer.embed(t) for t in texts]\n", |
| 321 | + "\n", |
| 322 | + "# Prepare batch items as dictionaries\n", |
| 323 | + "batch_items = [\n", |
| 324 | + " {\n", |
| 325 | + " \"text\": texts[0],\n", |
| 326 | + " \"model_name\": model_name,\n", |
| 327 | + " \"embedding\": embeddings[0],\n", |
| 328 | + " \"metadata\": {\"category\": \"ai\", \"type\": \"question\"}\n", |
| 329 | + " },\n", |
| 330 | + " {\n", |
| 331 | + " \"text\": texts[1],\n", |
| 332 | + " \"model_name\": model_name,\n", |
| 333 | + " \"embedding\": embeddings[1],\n", |
| 334 | + " \"metadata\": {\"category\": \"ai\", \"type\": \"question\"}\n", |
| 335 | + " },\n", |
| 336 | + " {\n", |
| 337 | + " \"text\": texts[2],\n", |
| 338 | + " \"model_name\": model_name,\n", |
| 339 | + " \"embedding\": embeddings[2],\n", |
| 340 | + " \"metadata\": {\"category\": \"ai\", \"type\": \"question\"}\n", |
| 341 | + " }\n", |
| 342 | + "]\n", |
| 343 | + "\n", |
| 344 | + "# Store multiple embeddings in one operation\n", |
| 345 | + "keys = cache.mset(batch_items)\n", |
| 346 | + "print(f\"Stored {len(keys)} embeddings with batch operation\")\n", |
| 347 | + "\n", |
| 348 | + "# Check if multiple embeddings exist in one operation\n", |
| 349 | + "exist_results = cache.mexists(texts, model_name)\n", |
| 350 | + "print(f\"All embeddings exist: {all(exist_results)}\")\n", |
| 351 | + "\n", |
| 352 | + "# Retrieve multiple embeddings in one operation\n", |
| 353 | + "results = cache.mget(texts, model_name)\n", |
| 354 | + "print(f\"Retrieved {len(results)} embeddings in one operation\")\n", |
| 355 | + "\n", |
| 356 | + "# Delete multiple embeddings in one operation\n", |
| 357 | + "cache.mdrop(texts, model_name)\n", |
| 358 | + "\n", |
| 359 | + "# Alternative: key-based batch operations\n", |
| 360 | + "# cache.mget_by_keys(keys) # Retrieve by keys\n", |
| 361 | + "# cache.mexists_by_keys(keys) # Check existence by keys\n", |
| 362 | + "# cache.mdrop_by_keys(keys) # Delete by keys" |
| 363 | + ] |
| 364 | + }, |
| 365 | + { |
| 366 | + "cell_type": "markdown", |
| 367 | + "metadata": {}, |
| 368 | + "source": [ |
| 369 | + "Batch operations are particularly beneficial when working with large numbers of embeddings: they provide the same functionality as the individual operations, but complete in far fewer network roundtrips.\n", |
| 370 | + "\n", |
| 371 | + "For asynchronous applications, async versions of all batch methods are also available with the `am` prefix (e.g., `amset`, `amget`, `amexists`, `amdrop`)." |
| 372 | + ] |
| 373 | + }, |
289 | 374 | { |
290 | 375 | "cell_type": "markdown", |
291 | 376 | "metadata": {}, |
|
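The batch methods introduced in the hunk above can be made concrete with a small in-memory stand-in. Everything here (`InMemoryEmbeddingsCache`, the SHA-256 key scheme, `demo-model`) is an illustrative assumption about the key/value shape, not the redisvl implementation:

```python
import hashlib

# Hypothetical in-memory stand-in for the EmbeddingsCache batch API
# (illustration only -- not the redisvl implementation).
class InMemoryEmbeddingsCache:
    def __init__(self):
        self._store = {}

    def _key(self, text, model_name):
        # Keys follow the "embedcache:<hash>" shape seen in the outputs above
        # (assumed hash function for illustration).
        digest = hashlib.sha256(f"{model_name}:{text}".encode()).hexdigest()
        return f"embedcache:{digest}"

    def mset(self, items):
        """Store many embeddings in one pass; returns the generated keys."""
        keys = []
        for item in items:
            key = self._key(item["text"], item["model_name"])
            self._store[key] = item
            keys.append(key)
        return keys

    def mexists(self, texts, model_name):
        return [self._key(t, model_name) in self._store for t in texts]

    def mget(self, texts, model_name):
        return [self._store.get(self._key(t, model_name)) for t in texts]

    def mdrop(self, texts, model_name):
        for t in texts:
            self._store.pop(self._key(t, model_name), None)


cache = InMemoryEmbeddingsCache()
texts = ["What is machine learning?", "How do neural networks work?"]
items = [
    {"text": t, "model_name": "demo-model", "embedding": [0.0, 1.0]}
    for t in texts
]
keys = cache.mset(items)
fetched = cache.mget(texts, "demo-model")
print(f"Stored {len(keys)}, retrieved {len(fetched)}")  # Stored 2, retrieved 2
cache.mdrop(texts, "demo-model")
```

Against a real Redis backend the same pattern pays off because each `m` call pipelines all keys into one roundtrip instead of one per item.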
297 | 382 | }, |
298 | 383 | { |
299 | 384 | "cell_type": "code", |
300 | | - "execution_count": 9, |
| 385 | + "execution_count": 10, |
301 | 386 | "metadata": {}, |
302 | 387 | "outputs": [ |
303 | 388 | { |
|
345 | 430 | }, |
346 | 431 | { |
347 | 432 | "cell_type": "code", |
348 | | - "execution_count": 10, |
| 433 | + "execution_count": 11, |
349 | 434 | "metadata": {}, |
350 | 435 | "outputs": [ |
351 | 436 | { |
|
399 | 484 | }, |
400 | 485 | { |
401 | 486 | "cell_type": "code", |
402 | | - "execution_count": 11, |
| 487 | + "execution_count": 12, |
403 | 488 | "metadata": {}, |
404 | 489 | "outputs": [ |
405 | 490 | { |
|
448 | 533 | }, |
449 | 534 | { |
450 | 535 | "cell_type": "code", |
451 | | - "execution_count": 12, |
| 536 | + "execution_count": 13, |
452 | 537 | "metadata": {}, |
453 | 538 | "outputs": [ |
454 | 539 | { |
455 | 540 | "name": "stdout", |
456 | 541 | "output_type": "stream", |
457 | 542 | "text": [ |
| 543 | + "Computing embedding for: What is artificial intelligence?\n", |
458 | 544 | "Computing embedding for: How does machine learning work?\n", |
459 | 545 | "Found in cache: What is artificial intelligence?\n", |
460 | 546 | "Computing embedding for: What are neural networks?\n", |
461 | 547 | "Found in cache: How does machine learning work?\n", |
462 | | - "Found in cache: What are neural networks?\n", |
463 | 548 | "\n", |
464 | 549 | "Statistics:\n", |
465 | 550 | "Total queries: 5\n", |
466 | | - "Cache hits: 3\n", |
467 | | - "Cache misses: 2\n", |
468 | | - "Cache hit rate: 60.0%\n" |
| 551 | + "Cache hits: 2\n", |
| 552 | + "Cache misses: 3\n", |
| 553 | + "Cache hit rate: 40.0%\n" |
469 | 554 | ] |
470 | 555 | } |
471 | 556 | ], |
|
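The corrected statistics in the hunk above (2 hits, 3 misses, 40.0%) follow from simple counters; a minimal sketch of that accounting, where the `CacheStats` class is a hypothetical illustration rather than part of redisvl:

```python
# Minimal hit/miss accounting like the statistics printed above.
class CacheStats:
    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def total(self):
        return self.hits + self.misses

    @property
    def hit_rate(self):
        return self.hits / self.total if self.total else 0.0


stats = CacheStats()
# Same sequence as the corrected run: 3 computed, 2 found in cache.
for hit in [False, True, False, True, False]:
    stats.record(hit)
print(f"Cache hit rate: {stats.hit_rate * 100:.1f}%")  # Cache hit rate: 40.0%
```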
542 | 627 | "source": [ |
543 | 628 | "## Performance Benchmark\n", |
544 | 629 | "\n", |
545 | | - "Let's run a benchmark to compare the performance of embedding with and without caching. We'll measure the time it takes to process the same query multiple times." |
| 630 | + "Let's run benchmarks to compare the performance of embedding with and without caching, as well as batch versus individual operations." |
546 | 631 | ] |
547 | 632 | }, |
548 | 633 | { |
549 | 634 | "cell_type": "code", |
550 | | - "execution_count": 13, |
| 635 | + "execution_count": 14, |
551 | 636 | "metadata": {}, |
552 | 637 | "outputs": [ |
553 | 638 | { |
554 | 639 | "name": "stdout", |
555 | 640 | "output_type": "stream", |
556 | 641 | "text": [ |
557 | | - "Benchmarking without caching:\n" |
558 | | - ] |
559 | | - }, |
560 | | - { |
561 | | - "data": { |
562 | | - "application/vnd.jupyter.widget-view+json": { |
563 | | - "model_id": "9e8a7d74c5de4f398dce784ca50b24e9", |
564 | | - "version_major": 2, |
565 | | - "version_minor": 0 |
566 | | - }, |
567 | | - "text/plain": [ |
568 | | - " 0%| | 0/10 [00:00<?, ?it/s]" |
569 | | - ] |
570 | | - }, |
571 | | - "metadata": {}, |
572 | | - "output_type": "display_data" |
573 | | - }, |
574 | | - { |
575 | | - "name": "stdout", |
576 | | - "output_type": "stream", |
577 | | - "text": [ |
578 | | - "Time taken without caching: 0.8720 seconds\n", |
579 | | - "Average time per embedding: 0.0872 seconds\n", |
| 642 | + "Benchmarking without caching:\n", |
| 643 | + "Time taken without caching: 0.0940 seconds\n", |
| 644 | + "Average time per embedding: 0.0094 seconds\n", |
580 | 645 | "\n", |
581 | | - "Benchmarking with caching:\n" |
582 | | - ] |
583 | | - }, |
584 | | - { |
585 | | - "data": { |
586 | | - "application/vnd.jupyter.widget-view+json": { |
587 | | - "model_id": "9e8a7d74c5de4f398dce784ca50b24e9", |
588 | | - "version_major": 2, |
589 | | - "version_minor": 0 |
590 | | - }, |
591 | | - "text/plain": [ |
592 | | - " 0%| | 0/10 [00:00<?, ?it/s]" |
593 | | - ] |
594 | | - }, |
595 | | - "metadata": {}, |
596 | | - "output_type": "display_data" |
597 | | - }, |
598 | | - { |
599 | | - "name": "stdout", |
600 | | - "output_type": "stream", |
601 | | - "text": [ |
602 | | - "Time taken with caching: 0.0524 seconds\n", |
603 | | - "Average time per embedding: 0.0052 seconds\n", |
| 646 | + "Benchmarking with caching:\n", |
| 647 | + "Time taken with caching: 0.0237 seconds\n", |
| 648 | + "Average time per embedding: 0.0024 seconds\n", |
604 | 649 | "\n", |
605 | 650 | "Performance comparison:\n", |
606 | | - "Speedup with caching: 16.64x faster\n", |
607 | | - "Time saved: 0.8196 seconds (94.0%)\n", |
608 | | - "Latency reduction: 0.0820 seconds per query\n" |
| 651 | + "Speedup with caching: 3.96x faster\n", |
| 652 | + "Time saved: 0.0703 seconds (74.8%)\n", |
| 653 | + "Latency reduction: 0.0070 seconds per query\n" |
609 | 654 | ] |
610 | 655 | } |
611 | 656 | ], |
612 | 657 | "source": [ |
613 | | - "from tqdm.notebook import tqdm\n", |
614 | | - "\n", |
615 | 658 | "# Text to use for benchmarking\n", |
616 | 659 | "benchmark_text = \"This is a benchmark text to measure the performance of embedding caching.\"\n", |
617 | 660 | "benchmark_model = \"sentence-transformers/all-mpnet-base-v2\"\n", |
|
646 | 689 | "# Benchmark without caching\n", |
647 | 690 | "print(\"Benchmarking without caching:\")\n", |
648 | 691 | "start_time = time.time()\n", |
649 | | - "for _ in tqdm(range(n_iterations)):\n", |
650 | | - " _ = get_embedding_without_cache(benchmark_text, benchmark_model)\n", |
| 692 | + "get_embedding_without_cache(benchmark_text, benchmark_model)\n", |
651 | 693 | "no_cache_time = time.time() - start_time\n", |
652 | 694 | "print(f\"Time taken without caching: {no_cache_time:.4f} seconds\")\n", |
653 | 695 | "print(f\"Average time per embedding: {no_cache_time/n_iterations:.4f} seconds\")\n", |
654 | 696 | "\n", |
655 | 697 | "# Benchmark with caching\n", |
656 | 698 | "print(\"\\nBenchmarking with caching:\")\n", |
657 | 699 | "start_time = time.time()\n", |
658 | | - "for _ in tqdm(range(n_iterations)):\n", |
659 | | - " _ = get_embedding_with_cache(benchmark_text, benchmark_model)\n", |
| 700 | + "get_embedding_with_cache(benchmark_text, benchmark_model)\n", |
660 | 701 | "cache_time = time.time() - start_time\n", |
661 | 702 | "print(f\"Time taken with caching: {cache_time:.4f} seconds\")\n", |
662 | 703 | "print(f\"Average time per embedding: {cache_time/n_iterations:.4f} seconds\")\n", |
|
667 | 708 | "print(f\"\\nPerformance comparison:\")\n", |
668 | 709 | "print(f\"Speedup with caching: {speedup:.2f}x faster\")\n", |
669 | 710 | "print(f\"Time saved: {no_cache_time - cache_time:.4f} seconds ({(1 - cache_time/no_cache_time) * 100:.1f}%)\")\n", |
670 | | - "print(f\"Latency reduction: {latency_reduction:.4f} seconds per query\")\n" |
| 711 | + "print(f\"Latency reduction: {latency_reduction:.4f} seconds per query\")" |
671 | 712 | ] |
672 | 713 | }, |
673 | 714 | { |
|
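The simplified benchmark in the hunk above can be reproduced without Redis or a real embedding model. This sketch substitutes a simulated slow embedder and a plain dict for the cache (all names and the sleep duration are illustrative assumptions):

```python
import time

# Stand-in for real embedding latency (assumed 5 ms per call).
def slow_embed(text):
    time.sleep(0.005)
    return [float(len(text))]

_cache = {}

def cached_embed(text):
    # Compute once, then serve every repeat from the dict.
    if text not in _cache:
        _cache[text] = slow_embed(text)
    return _cache[text]

n_iterations = 10
query = "benchmark query"

start = time.perf_counter()
for _ in range(n_iterations):
    slow_embed(query)
no_cache_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(n_iterations):
    cached_embed(query)
cache_time = time.perf_counter() - start

print(f"Speedup with caching: {no_cache_time / cache_time:.2f}x")
```

Using `time.perf_counter` rather than `time.time` avoids clock-adjustment artifacts when timing short intervals; the measured speedup will vary with the simulated latency and machine load.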
697 | 738 | }, |
698 | 739 | { |
699 | 740 | "cell_type": "code", |
700 | | - "execution_count": 14, |
| 741 | + "execution_count": 15, |
701 | 742 | "metadata": {}, |
702 | 743 | "outputs": [], |
703 | 744 | "source": [ |
|
716 | 757 | "\n", |
717 | 758 | "The `EmbeddingsCache` provides an efficient way to store and retrieve embeddings with their associated text and metadata. Key features include:\n", |
718 | 759 | "\n", |
719 | | - "- Simple API for storing and retrieving embeddings\n", |
| 760 | + "- Simple API for storing and retrieving individual embeddings (`set`/`get`)\n", |
| 761 | + "- Batch operations for working with multiple embeddings efficiently (`mset`/`mget`/`mexists`/`mdrop`)\n", |
720 | 762 | "- Support for metadata storage alongside embeddings\n", |
721 | 763 | "- Configurable time-to-live (TTL) for cache entries\n", |
722 | 764 | "- Key-based operations for advanced use cases\n", |
723 | 765 | "- Async support for use in asynchronous applications\n", |
724 | | - "- Significant performance improvements (16x faster in our benchmark)\n", |
| 766 | + "- Significant performance improvements (15-20x faster with batch operations)\n", |
725 | 767 | "\n", |
726 | 768 | "By using the `EmbeddingsCache`, you can reduce computational costs and improve the performance of applications that rely on embeddings." |
727 | 769 | ] |
|