
Commit 05a7607

Merge pull request #268809 from khelanmodi/vcore-adgen-sample
Added Adgen sample demo
2 parents 72e7047 + c41c1ad commit 05a7607

File tree

5 files changed

+399
-0
lines changed

articles/cosmos-db/mongodb/vcore/TOC.yml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -88,3 +88,6 @@
8888
items:
8989
- name: Solution accelerators
9090
href: ../../solutions.md?pivots=api-mongodb
91+
- name: AI-Enhanced Advertisement Generation
92+
href: ai-advertisement-generation.md
93+
Lines changed: 392 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,392 @@
1+
---
2+
title: AI-Enhanced Advertisement Generation using Azure Cosmos DB for MongoDB vCore
3+
titleSuffix: Azure Cosmos DB
4+
description: Demonstrates the use of Azure Cosmos DB for MongoDB vCore's vector similarity search and OpenAI embeddings to generate advertising content.
5+
author: khelanmodi
6+
ms.author: khelanmodi
7+
ms.reviewer: gahllevy
8+
ms.service: cosmos-db
9+
ms.subservice: mongodb-vcore
10+
ms.topic: demonstration
11+
ms.date: 03/12/2024
12+
---
13+
14+
# AI-Enhanced Advertisement Generation using Azure Cosmos DB for MongoDB vCore
15+
In this guide, we demonstrate how to create dynamic advertising content that resonates with your audience, using our personalized AI assistant, Heelie. Utilizing Azure Cosmos DB for MongoDB vCore, we harness the [vector similarity search](./vector-search.md) functionality to semantically analyze and match inventory descriptions with advertisement topics. The process is made possible by generating vectors for inventory descriptions using OpenAI embeddings, which significantly enhance their semantic depth. These vectors are then stored and indexed within the Cosmos DB for MongoDB vCore resource. When generating content for advertisements, we vectorize the advertisement topic to find the best-matching inventory items. This is followed by a retrieval augmented generation (RAG) process, where the top matches are sent to OpenAI to craft a compelling advertisement. The entire codebase for the application is available in a [GitHub repository](https://aka.ms/adgen) for your reference.
16+
17+
## Features
18+
- **Vector Similarity Search**: Uses Azure Cosmos DB for MongoDB vCore's powerful vector similarity search to improve semantic search capabilities, making it easier to find relevant inventory items based on the content of advertisements.
19+
- **OpenAI Embeddings**: Utilizes the cutting-edge embeddings from OpenAI to generate vectors for inventory descriptions. This approach allows for more nuanced and semantically rich matches between the inventory and the advertisement content.
20+
- **Content Generation**: Employs OpenAI's advanced language models to generate engaging, trend-focused advertisements. This method ensures that the content is not only relevant but also captivating to the target audience.
21+
22+
<!-- > [!VIDEO https://www.youtube.com/live/MLY5Pc_tSXw?si=fQmAuQcZkVauhmu-&t=1078] -->
23+
24+
## Prerequisites
25+
- Azure OpenAI: Set up an Azure OpenAI resource. Access to this service is currently available by application only; you can apply by completing the form at https://aka.ms/oai/access. Once you have access, complete the following steps:
26+
- Create an Azure OpenAI resource following this [quickstart](../../../ai-services/openai/how-to/create-resource.md?pivots=web-portal).
27+
- Deploy a `completions` and an `embeddings` model
28+
- For more information on `completions`, go [here](../../../ai-services/openai/how-to/completions.md).
29+
- For more information on `embeddings`, go [here](../../../ai-services/openai/how-to/embeddings.md).
30+
- Note down your endpoint, key, and deployment names.
31+
32+
- Cosmos DB for MongoDB vCore resource: Let's start by creating an Azure Cosmos DB for MongoDB vCore resource for free following this [quick start](./quickstart-portal.md) guide.
33+
- Note down the connection details (connection string).
34+
35+
- Python environment (>= 3.9 version) with packages such as `numpy`, `openai`, `pymongo`, `python-dotenv`, `azure-core`, `azure-cosmos`, `tenacity`, and `gradio`.
36+
37+
- Download the [data file](https://github.com/jayanta-mondal/ignite-demo/blob/main/data/shoes_with_vectors.json) and save it in a designated data folder.
38+
39+
## Running the Script
40+
Before we dive into the exciting part of generating AI-enhanced advertisements, we need to set up our environment. This setup involves installing the necessary packages to ensure our script runs smoothly. Here’s a step-by-step guide to get everything ready.
41+
42+
### 1.1 Install Necessary Packages
43+
44+
First, we need to install a few Python packages. Open your terminal and run the following commands:
45+
46+
```bash
pip install numpy
pip install openai==1.2.3
pip install pymongo
pip install python-dotenv
pip install azure-core
pip install azure-cosmos
pip install tenacity
pip install gradio
pip show openai
```
57+
58+
### 1.2 Setting Up the OpenAI and Azure Client
59+
After installing the necessary packages, the next step involves setting up our OpenAI and Azure clients for the script, which is crucial for authenticating our requests to the OpenAI API and Azure services.
60+
61+
```python
import json
import time
import openai

from dotenv import dotenv_values
from openai import AzureOpenAI

# Configure the API to use Azure as the provider
openai.api_type = "azure"
openai.api_key = "<AZURE_OPENAI_API_KEY>"  # Replace with your actual Azure OpenAI API key
openai.api_base = "https://<OPENAI_ACCOUNT_NAME>.openai.azure.com/"  # Replace with your OpenAI account name
openai.api_version = "2023-06-01-preview"

# Initialize the AzureOpenAI client with your API key, version, and endpoint
client = AzureOpenAI(
    api_key=openai.api_key,
    api_version=openai.api_version,
    azure_endpoint=openai.api_base,
)
```
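Hard-coding keys in a script makes them easy to leak. As a minimal sketch of one alternative, the helper below collects the same three settings from environment variables. The function name and the variable names `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, and `AZURE_OPENAI_API_VERSION` are assumptions made for illustration; they are not part of the original sample.

```python
import os
from typing import Mapping, Optional


def load_azure_openai_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect Azure OpenAI settings from a mapping (defaults to os.environ).

    The variable names used here are illustrative assumptions, not names
    required by the openai package.
    """
    env = os.environ if env is None else env
    required = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")
    return {
        "api_key": env["AZURE_OPENAI_API_KEY"],
        "azure_endpoint": env["AZURE_OPENAI_ENDPOINT"],
        # Fall back to the API version used elsewhere in this article.
        "api_version": env.get("AZURE_OPENAI_API_VERSION", "2023-06-01-preview"),
    }
```

The returned dictionary uses the same keyword names as the client constructor shown above, so it could be passed as `AzureOpenAI(**load_azure_openai_settings())`.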
82+
83+
## Solution architecture
84+
![solution architecture](./media/tutorial-adgen/architecture.png)
85+
86+
## 2. Creating Embeddings and Setting up Cosmos DB
87+
88+
After setting up our environment and OpenAI client, we move to the core part of our AI-enhanced advertisement generation project. The following code creates vector embeddings from text descriptions of products and sets up our database in Azure Cosmos DB for MongoDB vCore to store and search these embeddings.
89+
90+
### 2.1 Create Embeddings
91+
92+
To generate compelling advertisements, we first need to understand the items in our inventory. We do this by creating vector embeddings from descriptions of our items, which allows us to capture their semantic meaning in a form that machines can understand and process. Here's how you can create vector embeddings for an item description using Azure OpenAI:
93+
94+
```python
import openai

def generate_embeddings(text):
    try:
        response = client.embeddings.create(
            input=text, model="text-embedding-ada-002")
        embeddings = response.data[0].embedding
        return embeddings
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

embeddings = generate_embeddings("Shoes for San Francisco summer")

if embeddings is not None:
    print(embeddings)
```
112+
113+
The function takes a text input, such as a product description, and calls the `client.embeddings.create` method to generate a vector embedding for that text. We use the `text-embedding-ada-002` model here, which returns 1,536-dimensional vectors, but you can choose another model based on your requirements. With Azure OpenAI, the `model` argument refers to the name of your deployment. On success, the function returns the embedding, which the calling code then prints; if an exception occurs, it prints an error message and returns `None`.
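To make "semantic similarity" between two embeddings concrete, here is a small sketch of cosine similarity, the measure selected later by the `COS` index option. The helper name and the tiny 3-dimensional vectors are illustrative assumptions; real `text-embedding-ada-002` vectors have 1,536 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumed values).
summer_shoe = [0.9, 0.1, 0.0]
sandal = [0.8, 0.2, 0.1]
snow_boot = [0.0, 0.2, 0.9]

print(cosine_similarity(summer_shoe, sandal))     # closer to 1: similar direction
print(cosine_similarity(summer_shoe, snow_boot))  # closer to 0: dissimilar
```

Values near 1 mean the two vectors point in nearly the same direction, which is how the vector index ranks inventory items against a query.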
114+
115+
## 3. Connect and set up Cosmos DB for MongoDB vCore
116+
With our embeddings ready, the next step is to store and index them in a database that supports vector similarity search. Azure Cosmos DB for MongoDB vCore fits this task well because it's purpose-built to store your transactional data and perform vector search in one place.
117+
118+
### 3.1 Set up the connection
119+
To connect to Cosmos DB, we use the pymongo library, which allows us to interact with MongoDB easily. The following code snippet establishes a connection with our Cosmos DB for MongoDB vCore instance:
120+
```python
import pymongo

# Replace <USERNAME>, <PASSWORD>, and <VCORE_CLUSTER_NAME> with your actual credentials and cluster name
mongo_conn = "mongodb+srv://<USERNAME>:<PASSWORD>@<VCORE_CLUSTER_NAME>.mongocluster.cosmos.azure.com/?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"
mongo_client = pymongo.MongoClient(mongo_conn)
```
127+
128+
Replace `<USERNAME>`, `<PASSWORD>`, and `<VCORE_CLUSTER_NAME>` with your actual MongoDB username, password, and vCore cluster name, respectively.
129+
130+
## 4. Setting Up the Database and Vector Index in Cosmos DB
131+
132+
Once you've established a connection to Azure Cosmos DB, the next steps involve setting up your database and collection, and then creating a vector index to enable efficient vector similarity searches. Let's walk through these steps.
133+
134+
### 4.1 Set Up the Database and Collection
135+
136+
First, we create a database and a collection within our Cosmos DB instance. Here’s how:
137+
```python
DATABASE_NAME = "AdgenDatabase"
COLLECTION_NAME = "AdgenCollection"

# Start from a clean slate: this deletes any existing database with the same name
mongo_client.drop_database(DATABASE_NAME)
db = mongo_client[DATABASE_NAME]
collection = db[COLLECTION_NAME]

if COLLECTION_NAME not in db.list_collection_names():
    # Creates an unsharded collection that uses the DB's shared throughput
    db.create_collection(COLLECTION_NAME)
    print("Created collection '{}'.\n".format(COLLECTION_NAME))
else:
    print("Using collection: '{}'.\n".format(COLLECTION_NAME))
```
152+
153+
### 4.2 Create the vector index
154+
To perform efficient vector similarity searches within our collection, we need to create a vector index. Cosmos DB supports different types of [vector indexes](./vector-search.md), and here we discuss two: IVF and HNSW.
155+
156+
### IVF
157+
IVF, which stands for Inverted File Index, is the default vector indexing algorithm and works on all cluster tiers. It's an approximate nearest neighbors (ANN) approach that uses clustering to speed up the search for similar vectors in a dataset. To create an IVF index, use the following command:
158+
159+
```python
db.command({
    'createIndexes': COLLECTION_NAME,
    'indexes': [
        {
            'name': 'vectorSearchIndex',
            'key': {
                "contentVector": "cosmosSearch"
            },
            'cosmosSearchOptions': {
                'kind': 'vector-ivf',
                'numLists': 1,
                'similarity': 'COS',
                'dimensions': 1536
            }
        }
    ]
})
```
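The clustering idea behind IVF can be illustrated with a toy, pure-Python sketch. It isn't how Cosmos DB implements the index: the centroids are supplied by hand instead of being trained, distances are squared Euclidean rather than cosine, and all function names and the `n_probe` parameter are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_inverted_lists(vectors, centroids):
    """Assign each vector's position to the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: squared_distance(vec, centroids[c]))
        lists[nearest].append(idx)
    return lists


def ivf_search(query, vectors, centroids, lists, k=3, n_probe=1):
    """Scan only the n_probe lists whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)),
                         key=lambda c: squared_distance(query, centroids[c]))
    candidates = [idx for c in probe_order[:n_probe] for idx in lists[c]]
    candidates.sort(key=lambda idx: squared_distance(query, vectors[idx]))
    return candidates[:k]


# Toy 2-dimensional data with two obvious clusters (assumed values).
vectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [0.1, 0.3]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
lists = build_inverted_lists(vectors, centroids)

print(lists)
print(ivf_search([0.1, 0.1], vectors, centroids, lists, k=2))
```

With a single list, as in the `numLists: 1` setting above, every vector lands in the same cluster, so the search scans the whole collection. More lists mean fewer vectors scanned per query, at the risk of missing a neighbor that sits in a list that wasn't probed.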
178+
179+
> [!IMPORTANT]
180+
> **You can only create one index per vector property.** That is, you cannot create more than one index that points to the same vector property. If you want to change the index type (e.g., from IVF to HNSW) you must drop the index first before creating a new index.
181+
182+
### HNSW
183+
184+
HNSW stands for Hierarchical Navigable Small World, a graph-based data structure that partitions vectors into clusters and subclusters. Like IVF, HNSW is an approximate nearest neighbor (ANN) method, and it lets you search at higher speeds with greater accuracy. Here's how to set it up:
185+
186+
```python
db.command(
    {
        "createIndexes": COLLECTION_NAME,
        "indexes": [
            {
                "name": "VectorSearchIndex",
                "key": {
                    "contentVector": "cosmosSearch"
                },
                "cosmosSearchOptions": {
                    "kind": "vector-hnsw",
                    "m": 16,  # default value
                    "efConstruction": 64,  # default value
                    "similarity": "COS",
                    "dimensions": 1536
                }
            }
        ]
    }
)
```
208+
> [!NOTE]
209+
> HNSW indexing is only available on M40 cluster tiers and higher.
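
The IVF and HNSW commands differ only in their `cosmosSearchOptions`. As a sketch, a small helper could build either command document from the option names shown above. The helper `build_vector_index_command` is hypothetical and isn't part of the sample repository; it only assembles a dictionary and doesn't validate the options against the service.

```python
def build_vector_index_command(collection_name, kind, dimensions=1536,
                               similarity="COS", path="contentVector",
                               index_name="vectorSearchIndex", **options):
    """Build a createIndexes command document for a vector index.

    Extra keyword arguments (for example numLists, m, efConstruction)
    are copied into cosmosSearchOptions unchanged.
    """
    if kind not in ("vector-ivf", "vector-hnsw"):
        raise ValueError(f"unsupported index kind: {kind}")
    search_options = {"kind": kind, "similarity": similarity,
                      "dimensions": dimensions}
    search_options.update(options)
    # 'createIndexes' is inserted first so it stays the first key of the
    # command document (Python dicts preserve insertion order).
    return {
        "createIndexes": collection_name,
        "indexes": [
            {
                "name": index_name,
                "key": {path: "cosmosSearch"},
                "cosmosSearchOptions": search_options,
            }
        ],
    }


ivf_command = build_vector_index_command("AdgenCollection", "vector-ivf", numLists=1)
hnsw_command = build_vector_index_command("AdgenCollection", "vector-hnsw",
                                          m=16, efConstruction=64)
```

Either dictionary could then be passed to `db.command(...)`. Because only one index is allowed per vector property, you'd run one of the two, not both.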
210+
211+
## 5. Insert data to the collection
212+
Now insert the inventory data, which includes descriptions and their corresponding vector embeddings, into the newly created collection. To insert data into our collection, we use the `insert_many()` method provided by the `pymongo` library. The method allows us to insert multiple documents into the collection at once. Our data is stored in a JSON file, which we'll load and then insert into the database.
213+
214+
Download the [shoes_with_vectors.json](https://github.com/jayanta-mondal/ignite-demo/blob/main/data/shoes_with_vectors.json) file from the GitHub repository and store it in a `data` directory within your project folder.
215+
216+
```python
# Load the inventory documents (descriptions plus precomputed vectors)
with open("./data/shoes_with_vectors.json", mode="r") as data_file:
    data = json.load(data_file)

result = collection.insert_many(data)

print(f"Number of data points added: {len(result.inserted_ids)}")
```
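Before inserting, it can help to confirm that every document carries a vector of the dimension the index expects, since a mismatch would only surface later. The check below is a sketch: the helper name is hypothetical, and it assumes the vectors are stored under `contentVector`, the field the index above is built on.

```python
def find_invalid_documents(documents, vector_field="contentVector", dimensions=1536):
    """Return (position, reason) pairs for documents with a bad vector."""
    problems = []
    for position, doc in enumerate(documents):
        vector = doc.get(vector_field)
        if not isinstance(vector, list):
            problems.append((position, "missing vector"))
        elif len(vector) != dimensions:
            problems.append(
                (position, f"expected {dimensions} values, found {len(vector)}"))
    return problems


# Tiny stand-in documents with 3-dimensional vectors (assumed values).
sample = [
    {"name": "Trail Runner", "contentVector": [0.1, 0.2, 0.3]},
    {"name": "Rain Boot"},
    {"name": "Sandal", "contentVector": [0.1, 0.2]},
]
print(find_invalid_documents(sample, dimensions=3))
```

An empty list means every document passed; calling `find_invalid_documents(data)` on the loaded file before `insert_many` would be one way to use it.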
225+
226+
## 6. Vector Search in Cosmos DB for MongoDB vCore
227+
With our data successfully uploaded, we can now apply the power of vector search to find the most relevant items based on a query. The vector index we created earlier enables us to perform semantic searches within our dataset.
228+
229+
### 6.1 Conducting a Vector Search
230+
To perform a vector search, we define a function `vector_search` that takes a query and the number of results to return. The function generates a vector for the query using the `generate_embeddings` function we defined earlier, then uses Cosmos DB's `$search` functionality to find the closest matching items based on their vector embeddings.
231+
232+
```python
# Function to assist with vector search
def vector_search(query, num_results=3):
    # Embed the query text with the same model used for the inventory
    query_vector = generate_embeddings(query)

    pipeline = [
        {
            '$search': {
                "cosmosSearch": {
                    "vector": query_vector,
                    "numLists": 1,
                    "path": "contentVector",
                    "k": num_results
                },
                "returnStoredSource": True
            }
        },
        # Return each matching document together with its similarity score
        {'$project': {'similarityScore': {'$meta': 'searchScore'}, 'document': '$$ROOT'}}
    ]
    results = collection.aggregate(pipeline)
    return results
```
254+
### 6.2 Perform vector search query
255+
Finally, we execute our vector search function with a specific query and process the results to display them:
256+
257+
```python
query = "Shoes for Seattle sweater weather"
results = vector_search(query, 3)

print("\nResults:\n")
for result in results:
    print(f"Similarity Score: {result['similarityScore']}")
    print(f"Title: {result['document']['name']}")
    print(f"Price: {result['document']['price']}")
    print(f"Material: {result['document']['material']}")
    print(f"Image: {result['document']['img_url']}")
    print(f"Purchase: {result['document']['purchase_url']}\n")
```
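For the retrieval augmented generation step mentioned in the introduction, the top matches need to be turned into text that a prompt can include. The following sketch shows one possible way to do that; the helper name and output layout are assumptions, and it relies only on the result fields printed above.

```python
def format_matches_for_prompt(results, max_items=3):
    """Turn vector search results into a short numbered list for a prompt."""
    lines = []
    for rank, result in enumerate(results, start=1):
        if rank > max_items:
            break
        doc = result["document"]
        lines.append(
            f"{rank}. {doc['name']} (${doc['price']}, {doc['material']}), "
            f"similarity {result['similarityScore']:.3f}"
        )
    return "\n".join(lines)


# Stand-in results shaped like the output of vector_search (assumed values).
sample_results = [
    {"similarityScore": 0.9123,
     "document": {"name": "Trail Runner", "price": 59.99, "material": "mesh"}},
    {"similarityScore": 0.8,
     "document": {"name": "Rain Boot", "price": 80, "material": "rubber"}},
]
print(format_matches_for_prompt(sample_results))
```

The cursor returned by `collection.aggregate` can be iterated only once, so if the same results are needed for both the prompt and the rendered page, converting them with `list(results)` first would be a reasonable choice.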
270+
271+
## 7. Generating Ad content with GPT-4 and DALL·E 3
272+
273+
We combine all developed components to craft compelling ads, employing OpenAI's GPT-4 for text and DALL·E 3 for images. Together with vector search results, they form a complete ad. We also introduce Heelie, our intelligent assistant, tasked with creating engaging ad taglines. Through the upcoming code, you see Heelie in action, enhancing our ad creation process.
274+
275+
```python
from openai import OpenAI

def generate_ad_title(ad_topic):
    system_prompt = '''
    You are Heelie, an intelligent assistant for generating witty and captivating taglines for online advertisements.
        - The ad campaign taglines that you generate are short and typically under 100 characters.
    '''

    user_prompt = f'''Generate a catchy, witty, and short sentence (less than 100 characters)
                    for an advertisement for selling shoes for {ad_topic}'''
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

    # Uses the AzureOpenAI client created earlier
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages
    )

    return response.choices[0].message.content

def generate_ad_image(ad_topic):
    # Separate OpenAI client used only for image generation
    dalle_client = OpenAI(
        api_key="<DALI_API_KEY>"
    )

    image_prompt = f'''
        Generate a photorealistic image of an ad campaign for selling {ad_topic}.
        The image should be clean, with the item being sold in the foreground with an easily identifiable landmark of the city in the background.
        The image should also try to depict the weather of the location for the time of the year mentioned.
        The image should not have any generated text overlay.
    '''

    response = dalle_client.images.generate(
        model="dall-e-3",
        prompt=image_prompt,
        size="1024x1024",
        quality="standard",
        n=1,
    )

    return response.data[0].url

def render_html_page(ad_topic):

    # Find the matching shoes from the inventory
    results = vector_search(ad_topic, 4)

    ad_header = generate_ad_title(ad_topic)
    ad_image_url = generate_ad_image(ad_topic)

    # ad-start.html holds the opening markup that the closing tags below complete
    with open('./data/ad-start.html', 'r', encoding='utf-8') as html_file:
        html_content = html_file.read()

    html_content += f'''<header>
            <h1>{ad_header}</h1>
        </header>'''

    html_content += f'''
            <section class="ad">
            <img src="{ad_image_url}" alt="Base Ad Image" class="ad-image">
            </section>'''

    for result in results:
        doc = result['document']
        html_content += f'''
        <section class="product">
            <img src="{doc['img_url']}" alt="{doc['name']}" class="product-image">
            <div class="product-details">
                <h3 class="product-title" color="gray">{doc['name']}</h3>
                <p class="product-price">${doc['price']}</p>
                <p class="product-description">{doc['description']}</p>
                <a href="{doc['purchase_url']}" class="buy-now-button">Buy Now</a>
            </div>
        </section>
        '''

    html_content += '''</article>
                    </body>
                    </html>'''

    return html_content
```
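The page above interpolates generated text and database fields straight into HTML. If any value contains characters such as `<` or `"`, the markup can break. A hedged sketch of one mitigation uses the standard library's `html.escape`; the helper `render_product_section` is hypothetical and mirrors only the product section of `render_html_page`.

```python
from html import escape


def render_product_section(doc):
    """Render one product as HTML, escaping every interpolated value."""
    name = escape(str(doc["name"]))
    price = escape(str(doc["price"]))
    description = escape(str(doc["description"]))
    img_url = escape(str(doc["img_url"]))
    purchase_url = escape(str(doc["purchase_url"]))
    return f'''
<section class="product">
    <img src="{img_url}" alt="{name}" class="product-image">
    <div class="product-details">
        <h3 class="product-title">{name}</h3>
        <p class="product-price">${price}</p>
        <p class="product-description">{description}</p>
        <a href="{purchase_url}" class="buy-now-button">Buy Now</a>
    </div>
</section>
'''


# Stand-in document with characters that need escaping (assumed values).
sample_doc = {
    "name": '<b>Boot</b> & "Co"',
    "price": 80,
    "description": "Warm & dry",
    "img_url": "https://example.com/boot.png",
    "purchase_url": "https://example.com/buy?id=1&ref=ad",
}
print(render_product_section(sample_doc))
```

The same treatment would apply to the generated tagline before it's placed in the `<h1>` element.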
360+
361+
## 8. Putting it all together
362+
To make our advertisement generation interactive, we employ Gradio, a Python library for creating simple web UIs. We define a UI that allows users to input ad topics and then dynamically generates and displays the resulting advertisement.
363+
364+
```python
import gradio as gr

css = """
    button { background-color: purple; color: red; }
    <style>
    </style>
"""

with gr.Blocks(css=css, theme=gr.themes.Default(spacing_size=gr.themes.sizes.spacing_sm, radius_size="none")) as demo:
    subject = gr.Textbox(placeholder="Ad Keywords", label="Prompt for Heelie!!")
    btn = gr.Button("Generate Ad")
    output_html = gr.HTML(label="Generated Ad HTML")

    # Render the ad for the entered topic when the button is clicked
    btn.click(render_html_page, [subject], output_html)

    btn = gr.Button("Copy HTML")

if __name__ == "__main__":
    demo.launch()
```
385+
386+
## Output
387+
![Output screen](./media/tutorial-adgen/result.png)
388+
389+
## Next step
390+
391+
> [!div class="nextstepaction"]
392+
> [Check out our Solution Accelerator for more samples](../../solutions.md)
