Commit 292f517

Merge pull request #261839 from MSFTeegarden/patch-52
Semantic Caching Tutorial
2 parents bb72d8c + f91cae6

File tree

2 files changed: +318 −0 lines changed

articles/azure-cache-for-redis/TOC.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -236,6 +236,8 @@
       href: cache-overview-vector-similarity.md
     - name: Vector similarity search
       href: cache-tutorial-vector-similarity.md
+    - name: Semantic caching search
+      href: cache-tutorial-semantic-cache.md
     - name: ASP.NET
       items:
       - name: Use session state provider
```
Lines changed: 316 additions & 0 deletions

```diff
@@ -0,0 +1,316 @@
```
---
title: 'Tutorial: Use Azure Cache for Redis as a semantic cache'
description: In this tutorial, you learn how to use Azure Cache for Redis as a semantic cache.
author: flang-msft
ms.author: franlanglois
ms.service: cache
ms.topic: tutorial
ms.date: 01/08/2024

#CustomerIntent: As a developer, I want to develop some code using a sample so that I see an example of a semantic cache with an AI-based large language model.
---
# Tutorial: Use Azure Cache for Redis as a semantic cache

In this tutorial, you use Azure Cache for Redis as a semantic cache with an AI-based large language model (LLM). You use Azure OpenAI Service to generate LLM responses to queries and cache those responses using Azure Cache for Redis, delivering faster responses and lowering costs.

Because Azure Cache for Redis offers built-in vector search capability, you can also perform _semantic caching_. You can return cached responses for identical queries and also for queries that are similar in meaning, even if the text isn't the same.

In this tutorial, you learn how to:

> [!div class="checklist"]
>
> - Create an Azure Cache for Redis instance configured for semantic caching
> - Use LangChain and other popular Python libraries.
> - Use Azure OpenAI Service to generate text from AI models and cache results.
> - See the performance gains from using caching with LLMs.

>[!IMPORTANT]
>This tutorial walks you through building a Jupyter Notebook. You can follow this tutorial with a Python code file (.py) and get _similar_ results, but you need to add all of the code blocks in this tutorial to the `.py` file and execute it once to see results. In other words, Jupyter Notebooks provide intermediate results as you execute cells, but a Python code file shows results only after the whole file runs.

>[!IMPORTANT]
>If you would like to follow along in a completed Jupyter notebook instead, [download the Jupyter notebook file named _semanticcache.ipynb_](https://github.com/Azure-Samples/azure-cache-redis-samples/tree/main/tutorial/semantic-cache) and save it into the new _semanticcache_ folder.
## Prerequisites

- An Azure subscription - [Create one for free](https://azure.microsoft.com/free/cognitive-services?azure-portal=true)

- Access granted to Azure OpenAI in the desired Azure subscription. Currently, you must apply for access to Azure OpenAI by completing the form at [https://aka.ms/oai/access](https://aka.ms/oai/access).

- [Python 3.11.6 or later version](https://www.python.org/)

- [Jupyter Notebooks](https://jupyter.org/) (optional)

- An Azure OpenAI resource with the **text-embedding-ada-002 (Version 2)** and **gpt-35-turbo-instruct** models deployed. These models are currently only available in [certain regions](../ai-services/openai/concepts/models.md#model-summary-table-and-region-availability). See the [resource deployment guide](../ai-services/openai/how-to/create-resource.md) for instructions on how to deploy the models.
## Create an Azure Cache for Redis instance

Follow the [Quickstart: Create a Redis Enterprise cache](quickstart-create-redis-enterprise.md) guide. On the **Advanced** page, make sure you add the **RediSearch** module and choose the **Enterprise** cluster policy. All other settings can match the defaults described in the quickstart.

Creating the cache takes a few minutes. You can move on to the next step in the meantime.

:::image type="content" source="media/cache-create/enterprise-tier-basics.png" alt-text="Screenshot showing the Enterprise tier Basics tab filled out.":::
## Set up your development environment

1. Create a folder on your local computer named _semanticcache_ in the location where you typically save your projects.

1. Create a new Python file (_tutorial.py_) or Jupyter notebook (_tutorial.ipynb_) in the folder.

1. Install the required Python packages:

    ```bash
    pip install openai langchain redis tiktoken
    ```
## Create Azure OpenAI models
68+
69+
Make sure you have two models deployed to your Azure OpenAI resource:
70+
71+
- An LLM that provides text responses. We use the **GPT-3.5-turbo-instruct** model for this tutorial.
72+
73+
- An embeddings model that converts queries into vectors to allow them to be compared to past queries. We use the **text-embedding-ada-002 (Version 2)** model for this tutorial.
74+
75+
See [Deploy a model](/azure/ai-services/openai/how-to/create-resource?pivots=web-portal#deploy-a-model) for more detailed instructions. Record the name you chose for each model deployment.
76+
## Import libraries and set up connection information

To successfully make a call against Azure OpenAI, you need an **endpoint** and a **key**. You also need an **endpoint** and a **key** to connect to Azure Cache for Redis.

1. Go to your Azure OpenAI resource in the Azure portal.

1. Locate **Endpoint and Keys** in the **Resource Management** section of your Azure OpenAI resource. Copy your endpoint and access key because you need both for authenticating your API calls. An example endpoint is: `https://docs-test-001.openai.azure.com`. You can use either `KEY1` or `KEY2`.

1. Go to the **Overview** page of your Azure Cache for Redis resource in the Azure portal. Copy your endpoint.

1. Locate **Access keys** in the **Settings** section. Copy your access key. You can use either `Primary` or `Secondary`.

1. Add the following code to a new code cell:

    ```python
    # Code cell 2

    import openai
    import redis
    import os
    import langchain
    from langchain.llms import AzureOpenAI
    from langchain.embeddings import AzureOpenAIEmbeddings
    from langchain.globals import set_llm_cache
    from langchain.cache import RedisSemanticCache
    import time

    AZURE_ENDPOINT = "<your-openai-endpoint>"
    API_KEY = "<your-openai-key>"
    API_VERSION = "2023-05-15"
    LLM_DEPLOYMENT_NAME = "<your-llm-model-name>"
    LLM_MODEL_NAME = "gpt-35-turbo-instruct"
    EMBEDDINGS_DEPLOYMENT_NAME = "<your-embeddings-model-name>"
    EMBEDDINGS_MODEL_NAME = "text-embedding-ada-002"

    REDIS_ENDPOINT = "<your-redis-endpoint>"
    REDIS_PASSWORD = "<your-redis-password>"
    ```

1. Update the values of `API_KEY` and `AZURE_ENDPOINT` with the key and endpoint values from your Azure OpenAI deployment.

1. Set `LLM_DEPLOYMENT_NAME` and `EMBEDDINGS_DEPLOYMENT_NAME` to the names of the two models you deployed in Azure OpenAI Service.

1. Update `REDIS_ENDPOINT` and `REDIS_PASSWORD` with the endpoint and key values from your Azure Cache for Redis instance.

    > [!IMPORTANT]
    > We strongly recommend using environment variables or a secret manager like [Azure Key Vault](/azure/key-vault/general/overview) to pass in the API key, endpoint, and deployment name information. These variables are set in plaintext here for the sake of simplicity.

1. Execute code cell 2.
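As recommended in the important note above, you can pull these values from environment variables instead of hardcoding them. A minimal sketch, assuming hypothetical variable names that you export in your shell before launching the notebook or script:

```python
import os

# Hypothetical environment variable names; export them in your shell first,
# for example: export AZURE_OPENAI_API_KEY="<your-openai-key>"
AZURE_ENDPOINT = os.environ.get("AZURE_OPENAI_ENDPOINT", "")
API_KEY = os.environ.get("AZURE_OPENAI_API_KEY", "")
REDIS_ENDPOINT = os.environ.get("REDIS_ENDPOINT", "")
REDIS_PASSWORD = os.environ.get("REDIS_PASSWORD", "")

# Warn early if any value is missing rather than failing later on an API call
for name, value in [("AZURE_OPENAI_ENDPOINT", AZURE_ENDPOINT),
                    ("AZURE_OPENAI_API_KEY", API_KEY),
                    ("REDIS_ENDPOINT", REDIS_ENDPOINT),
                    ("REDIS_PASSWORD", REDIS_PASSWORD)]:
    if not value:
        print(f"Warning: {name} is not set")
```

With something like this in place, code cell 2 no longer needs the plaintext assignments.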
## Initialize AI models

Next, you initialize the LLM and embeddings models.

1. Add the following code to a new code cell:

    ```python
    # Code cell 3

    llm = AzureOpenAI(
        deployment_name=LLM_DEPLOYMENT_NAME,
        model_name=LLM_MODEL_NAME,
        openai_api_key=API_KEY,
        azure_endpoint=AZURE_ENDPOINT,
        openai_api_version=API_VERSION,
    )
    embeddings = AzureOpenAIEmbeddings(
        azure_deployment=EMBEDDINGS_DEPLOYMENT_NAME,
        model=EMBEDDINGS_MODEL_NAME,
        openai_api_key=API_KEY,
        azure_endpoint=AZURE_ENDPOINT,
        openai_api_version=API_VERSION
    )
    ```

1. Execute code cell 3.
## Set up Redis as a semantic cache

Next, specify Redis as a semantic cache for your LLM.

1. Add the following code to a new code cell:

    ```python
    # Code cell 4

    redis_url = "rediss://:" + REDIS_PASSWORD + "@" + REDIS_ENDPOINT
    set_llm_cache(RedisSemanticCache(redis_url=redis_url, embedding=embeddings, score_threshold=0.05))
    ```

    > [!IMPORTANT]
    > The value of the `score_threshold` parameter determines how similar two queries need to be in order to return a cached result. The lower the number, the more similar the queries need to be.
    > You can experiment with this value to fine-tune it to your application.

1. Execute code cell 4.
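To build intuition for what `score_threshold` compares, here's an illustrative sketch (not part of the tutorial) that computes the cosine distance between two vectors. Semantically close queries produce embeddings that point in nearly the same direction, giving a small distance; the vectors below are made-up stand-ins for real embeddings:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for identical directions, up to 2.0 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Made-up three-dimensional stand-ins for embedding vectors
kittens = [0.8, 0.59, 0.1]
kittens_reworded = [0.79, 0.6, 0.11]   # nearly the same direction -> tiny distance
weather = [0.1, 0.2, 0.95]             # different topic -> large distance

print(cosine_distance(kittens, kittens_reworded))  # small, under 0.05
print(cosine_distance(kittens, weather))           # large, well over 0.05
```

Real embedding vectors have hundreds or thousands of dimensions, but the same idea applies: a lower threshold demands a smaller distance before the cache returns a hit.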
## Query and get responses from the LLM

Finally, query the LLM to get an AI-generated response. If you're using a Jupyter notebook, you can add `%%time` at the top of the cell to output the amount of time taken to execute the code.

1. Add the following code to a new code cell and execute it:

    ```python
    %%time
    # Code cell 5
    response = llm("Please write a poem about cute kittens.")
    print(response)
    ```

    You should see output similar to this:

    ```output
    Fluffy balls of fur,
    With eyes so bright and pure,
    Kittens are a true delight,
    Bringing joy into our sight.

    With tiny paws and playful hearts,
    They chase and pounce, a work of art,
    Their innocence and curiosity,
    Fills our hearts with such serenity.

    Their soft meows and gentle purrs,
    Are like music to our ears,
    They curl up in our laps,
    And take the stress away in a snap.

    Their whiskers twitch, they're always ready,
    To explore and be adventurous and steady,
    With their tails held high,
    They're a sight to make us sigh.

    Their tiny faces, oh so sweet,
    With button noses and paw-sized feet,
    They're the epitome of cuteness,
    ...
    Cute kittens, a true blessing,
    In our hearts, they'll always be reigning.
    CPU times: total: 0 ns
    Wall time: 2.67 s
    ```

    The `Wall time` shows a value of 2.67 seconds. That's how much real-world time it took to query the LLM and for the LLM to generate a response.
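If you're working in a plain `.py` file, the `%%time` cell magic isn't available. One way to time a call with only the standard library is sketched below; `slow_call` here is a hypothetical stand-in for the LLM call:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) measured with a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Hypothetical stand-in for llm("Please write a poem about cute kittens.")
def slow_call():
    time.sleep(0.1)
    return "response"

result, elapsed = timed(slow_call)
print(f"Wall time: {elapsed:.2f} s")
```

Wrapping the real `llm(...)` call the same way lets you compare the first (uncached) and second (cached) query times in a script.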
1. Execute cell 5 again. You should see the exact same output, but with a smaller wall time:

    ```output
    Fluffy balls of fur,
    With eyes so bright and pure,
    Kittens are a true delight,
    Bringing joy into our sight.

    With tiny paws and playful hearts,
    They chase and pounce, a work of art,
    Their innocence and curiosity,
    Fills our hearts with such serenity.

    Their soft meows and gentle purrs,
    Are like music to our ears,
    They curl up in our laps,
    And take the stress away in a snap.

    Their whiskers twitch, they're always ready,
    To explore and be adventurous and steady,
    With their tails held high,
    They're a sight to make us sigh.

    Their tiny faces, oh so sweet,
    With button noses and paw-sized feet,
    They're the epitome of cuteness,
    ...
    Cute kittens, a true blessing,
    In our hearts, they'll always be reigning.
    CPU times: total: 0 ns
    Wall time: 575 ms
    ```

    The wall time shortens by roughly a factor of five, all the way down to 575 milliseconds.

1. Change the query from `Please write a poem about cute kittens` to `Write a poem about cute kittens` and run cell 5 again. You should see the _exact same output_ and a _lower wall time_ than the original query. Even though the query changed, the _semantic meaning_ of the query remained the same, so the same cached output was returned. This is the advantage of semantic caching!
## Change the similarity threshold

1. Try running a similar query with a different meaning, like `Please write a poem about cute puppies`. Notice that the cached result is returned here as well. The semantic meaning of the word `puppies` is close enough to the word `kittens` that the cached result is returned.

1. The similarity threshold can be modified to determine when the semantic cache should return a cached result and when it should return a new output from the LLM. In code cell 4, change `score_threshold` from `0.05` to `0.01`:

    ```python
    # Code cell 4

    redis_url = "rediss://:" + REDIS_PASSWORD + "@" + REDIS_ENDPOINT
    set_llm_cache(RedisSemanticCache(redis_url=redis_url, embedding=embeddings, score_threshold=0.01))
    ```

1. Try the query `Please write a poem about cute puppies` again. You should receive a new output that's specific to puppies:

    ```output
    Oh, little balls of fluff and fur
    With wagging tails and tiny paws
    Puppies, oh puppies, so pure
    The epitome of cuteness, no flaws

    With big round eyes that melt our hearts
    And floppy ears that bounce with glee
    Their playful antics, like works of art
    They bring joy to all they see

    Their soft, warm bodies, so cuddly
    As they curl up in our laps
    Their gentle kisses, so lovingly
    Like tiny, wet, puppy taps

    Their clumsy steps and wobbly walks
    As they explore the world anew
    Their curiosity, like a ticking clock
    Always eager to learn and pursue

    Their little barks and yips so sweet
    Fill our days with endless delight
    Their unconditional love, so complete
    ...
    For they bring us love and laughter, year after year
    Our cute little pups, in every way.
    CPU times: total: 15.6 ms
    Wall time: 4.3 s
    ```

    You likely need to fine-tune the similarity threshold based on your application to ensure that the right sensitivity is used when determining which queries to cache.
[!INCLUDE [cache-delete-resource-group](includes/cache-delete-resource-group.md)]

## Related content

- [Learn more about Azure Cache for Redis](cache-overview.md)
- Learn more about Azure Cache for Redis [vector search capabilities](./cache-overview-vector-similarity.md)
- [Tutorial: Use vector similarity search on Azure Cache for Redis](cache-tutorial-vector-similarity.md)
- [Read how to build an AI-powered app with OpenAI and Redis](https://techcommunity.microsoft.com/t5/azure-developer-community-blog/vector-similarity-search-with-azure-cache-for-redis-enterprise/ba-p/3822059)
- [Build a Q&A app with semantic answers](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna)