| 1 | +--- |
| 2 | +title: AI-Enhanced Advertisement Generation using Azure Cosmos DB for MongoDB vCore |
| 3 | +titleSuffix: Azure Cosmos DB |
| 4 | +description: Demonstrates the use of Azure Cosmos DB for MongoDB vCore's vector similarity search and OpenAI embeddings to generate advertising content. |
| 5 | +author: khelanmodi |
| 6 | +ms.author: khelanmodi |
| 7 | +ms.reviewer: gahllevy |
| 8 | +ms.service: cosmos-db |
| 9 | +ms.subservice: mongodb-vcore |
| 10 | +ms.topic: demonstration |
| 11 | +ms.date: 03/12/2024 |
| 12 | +--- |
| 13 | + |
| 14 | +# AI-Enhanced Advertisement Generation using Azure Cosmos DB for MongoDB vCore |
| 15 | +In this guide, we demonstrate how to create dynamic advertising content that resonates with your audience, using our personalized AI assistant, Heelie. Utilizing Azure Cosmos DB for MongoDB vCore, we harness the [vector similarity search](./vector-search.md) functionality to semantically analyze and match inventory descriptions with advertisement topics. The process is made possible by generating vectors for inventory descriptions using OpenAI embeddings, which significantly enhance their semantic depth. These vectors are then stored and indexed within the Cosmos DB for MongoDB vCore resource. When generating content for advertisements, we vectorize the advertisement topic to find the best-matching inventory items. This is followed by a retrieval augmented generation (RAG) process, where the top matches are sent to OpenAI to craft a compelling advertisement. The entire codebase for the application is available in a [GitHub repository](https://aka.ms/adgen) for your reference. |
| 16 | + |
| 17 | +## Features |
| 18 | +- **Vector Similarity Search**: Uses Azure Cosmos DB for MongoDB vCore's powerful vector similarity search to improve semantic search capabilities, making it easier to find relevant inventory items based on the content of advertisements. |
| 19 | +- **OpenAI Embeddings**: Utilizes the cutting-edge embeddings from OpenAI to generate vectors for inventory descriptions. This approach allows for more nuanced and semantically rich matches between the inventory and the advertisement content. |
| 20 | +- **Content Generation**: Employs OpenAI's advanced language models to generate engaging, trend-focused advertisements. This method ensures that the content is not only relevant but also captivating to the target audience. |
| 21 | + |
| 22 | +<!-- > [!VIDEO https://www.youtube.com/live/MLY5Pc_tSXw?si=fQmAuQcZkVauhmu-&t=1078] --> |
| 23 | + |
| 24 | +## Prerequisites |
| 25 | +- Azure OpenAI: Set up an Azure OpenAI resource. Access to this service is currently available by application only; you can apply by completing the form at https://aka.ms/oai/access. Once you have access, complete the following steps: |
| 26 | + - Create an Azure OpenAI resource following this [quickstart](../../../ai-services/openai/how-to/create-resource.md?pivots=web-portal). |
| 27 | + - Deploy a `completions` and an `embeddings` model |
| 28 | + - For more information on `completions`, go [here](../../../ai-services/openai/how-to/completions.md). |
| 29 | + - For more information on `embeddings`, go [here](../../../ai-services/openai/how-to/embeddings.md). |
| 30 | + - Note down your endpoint, key, and deployment names. |
| 31 | + |
| 32 | +- Cosmos DB for MongoDB vCore resource: Let's start by creating an Azure Cosmos DB for MongoDB vCore resource for free following this [quick start](./quickstart-portal.md) guide. |
| 33 | + - Note down the connection details (connection string). |
| 34 | + |
| 35 | +- Python environment (>= 3.9 version) with packages such as `numpy`, `openai`, `pymongo`, `python-dotenv`, `azure-core`, `azure-cosmos`, `tenacity`, and `gradio`. |
| 36 | + |
| 37 | +- Download the [data file](https://github.com/jayanta-mondal/ignite-demo/blob/main/data/shoes_with_vectors.json) and save it in a designated data folder. |
| 38 | + |
| 39 | +## Running the Script |
| 40 | +Before we dive into the exciting part of generating AI-enhanced advertisements, we need to set up our environment. This setup involves installing the necessary packages to ensure our script runs smoothly. Here’s a step-by-step guide to get everything ready. |
| 41 | + |
| 42 | +### 1.1 Install Necessary Packages |
| 43 | + |
| 44 | +Firstly, we need to install a few Python packages. Open your terminal and run the following commands: |
| 45 | + |
| 46 | +```bash |
| 47 | + pip install numpy |
| 48 | + pip install openai==1.2.3 |
| 49 | + pip install pymongo |
| 50 | + pip install python-dotenv |
| 51 | + pip install azure-core |
| 52 | + pip install azure-cosmos |
| 53 | + pip install tenacity |
| 54 | + pip install gradio |
| 55 | + pip show openai |
| 56 | +``` |
| 57 | + |
| 58 | +### 1.2 Setting Up the OpenAI and Azure Client |
| 59 | +After installing the necessary packages, the next step involves setting up our OpenAI and Azure clients for the script, which is crucial for authenticating our requests to the OpenAI API and Azure services. |
| 60 | + |
| 61 | +```python |
| 62 | +import json |
| 63 | +import time |
| 64 | +import openai |
| 65 | + |
| 66 | +from dotenv import dotenv_values |
| 67 | +from openai import AzureOpenAI |
| 68 | + |
| 69 | +# Configure the API to use Azure as the provider |
| 70 | +openai.api_type = "azure" |
| 71 | +openai.api_key = "<AZURE_OPENAI_API_KEY>" # Replace with your actual Azure OpenAI API key |
| 72 | +openai.api_base = "https://<OPENAI_ACCOUNT_NAME>.openai.azure.com/" # Replace with your OpenAI account name |
| 73 | +openai.api_version = "2023-06-01-preview" |
| 74 | + |
| 75 | +# Initialize the AzureOpenAI client with your API key, version, and endpoint |
| 76 | +client = AzureOpenAI( |
| 77 | +    api_key=openai.api_key, |
| 78 | +    api_version=openai.api_version, |
| 79 | +    azure_endpoint=openai.api_base |
| 80 | +) |
| 81 | +``` |
| 82 | + |
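The setup above hardcodes the key and endpoint in the script. As a minimal sketch of an alternative, the settings could be read from environment variables instead. The variable names `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, and `AZURE_OPENAI_API_VERSION`, and the helper `load_azure_openai_settings`, are assumptions made for illustration and are not part of the sample repository.

```python
import os

def load_azure_openai_settings(env=os.environ):
    """Collect Azure OpenAI settings from environment variables.

    The variable names used here are illustrative assumptions.
    """
    settings = {
        "api_key": env.get("AZURE_OPENAI_API_KEY"),
        "azure_endpoint": env.get("AZURE_OPENAI_ENDPOINT"),
        # Fall back to the API version used elsewhere in this guide
        "api_version": env.get("AZURE_OPENAI_API_VERSION", "2023-06-01-preview"),
    }
    # Fail early with a clear message instead of at the first API call
    missing = [name for name, value in settings.items() if not value]
    if missing:
        raise ValueError(f"Missing Azure OpenAI settings: {', '.join(missing)}")
    return settings

# Possible usage with the client from this section:
# client = AzureOpenAI(**load_azure_openai_settings())
```

Keeping secrets out of the source file makes it easier to share the script without exposing credentials.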
| 83 | +## Solution architecture |
| 84 | + |
| 85 | + |
| 86 | +## 2. Creating Embeddings and Setting up Cosmos DB |
| 87 | + |
| 88 | +After setting up our environment and OpenAI client, we move to the core part of our AI-enhanced advertisement generation project. The following code creates vector embeddings from text descriptions of products and sets up our database in Azure Cosmos DB for MongoDB vCore to store and search these embeddings. |
| 89 | + |
| 90 | +### 2.1 Create Embeddings |
| 91 | + |
| 92 | +To generate compelling advertisements, we first need to understand the items in our inventory. We do this by creating vector embeddings from descriptions of our items, which allows us to capture their semantic meaning in a form that machines can understand and process. Here's how you can create vector embeddings for an item description using Azure OpenAI: |
| 93 | + |
| 94 | +```python |
| 95 | +import openai |
| 96 | + |
| 97 | +def generate_embeddings(text): |
| 98 | +    try: |
| 99 | +        response = client.embeddings.create( |
| 100 | +            input=text, model="text-embedding-ada-002") |
| 101 | +        embeddings = response.data[0].embedding |
| 102 | +        return embeddings |
| 103 | +    except Exception as e: |
| 104 | +        print(f"An error occurred: {e}") |
| 105 | +        return None |
| 106 | + |
| 107 | +embeddings = generate_embeddings("Shoes for San Francisco summer") |
| 108 | + |
| 109 | +if embeddings is not None: |
| 110 | +    print(embeddings) |
| 111 | +``` |
| 112 | + |
| 113 | +The function takes a text input, such as a product description, and uses the `client.embeddings.create` method from the OpenAI API to generate a vector embedding for that text. We're using the `text-embedding-ada-002` model here, but you can choose other models based on your requirements. With Azure OpenAI, the `model` argument is the name of your embeddings deployment, so replace it if your deployment is named differently. If the call succeeds, the function returns the embedding, which the example then prints; otherwise, it prints an error message and returns `None`. |
| 114 | + |
| 115 | +## 3. Connect and set up Cosmos DB for MongoDB vCore |
| 116 | +With our embeddings ready, the next step is to store and index them in a database that supports vector similarity search. Azure Cosmos DB for MongoDB vCore is a good fit for this task because it's purpose-built to store your transactional data and perform vector search all in one place. |
| 117 | + |
| 118 | +### 3.1 Set up the connection |
| 119 | +To connect to Cosmos DB, we use the pymongo library, which allows us to interact with MongoDB easily. The following code snippet establishes a connection with our Cosmos DB for MongoDB vCore instance: |
| 120 | +```python |
| 121 | +import pymongo |
| 122 | + |
| 123 | +# Replace <USERNAME>, <PASSWORD>, and <VCORE_CLUSTER_NAME> with your actual credentials and cluster name |
| 124 | +mongo_conn = "mongodb+srv://<USERNAME>:<PASSWORD>@<VCORE_CLUSTER_NAME>.mongocluster.cosmos.azure.com/?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000" |
| 125 | +mongo_client = pymongo.MongoClient(mongo_conn) |
| 126 | +``` |
| 127 | + |
| 128 | +Replace `<USERNAME>`, `<PASSWORD>`, and `<VCORE_CLUSTER_NAME>` with your actual MongoDB username, password, and vCore cluster name, respectively. |
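Passwords that contain characters such as `@`, `:`, or `/` need to be percent-encoded before they are placed in a MongoDB connection string. The following is a minimal sketch of how the string above could be assembled safely; the helper name `build_vcore_connection_string` and the sample credentials are hypothetical.

```python
from urllib.parse import quote_plus

def build_vcore_connection_string(username, password, cluster_name):
    """Assemble the vCore connection string with percent-encoded credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return (
        f"mongodb+srv://{user}:{pwd}@{cluster_name}.mongocluster.cosmos.azure.com/"
        "?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000"
    )

# Hypothetical credentials, for illustration only
mongo_conn = build_vcore_connection_string("ad user", "p@ss:w/rd", "mycluster")
print(mongo_conn)
```

The resulting string can be passed to `pymongo.MongoClient` exactly as in the snippet above.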
| 129 | + |
| 130 | +## 4. Setting Up the Database and Vector Index in Cosmos DB |
| 131 | + |
| 132 | +Once you've established a connection to Azure Cosmos DB, the next steps involve setting up your database and collection, and then creating a vector index to enable efficient vector similarity searches. Let's walk through these steps. |
| 133 | + |
| 134 | +### 4.1 Set Up the Database and Collection |
| 135 | + |
| 136 | +First, we create a database and a collection within our Cosmos DB instance. Here’s how: |
| 137 | +```python |
| 138 | +DATABASE_NAME = "AdgenDatabase" |
| 139 | +COLLECTION_NAME = "AdgenCollection" |
| 140 | + |
| 141 | +mongo_client.drop_database(DATABASE_NAME) |
| 142 | +db = mongo_client[DATABASE_NAME] |
| 143 | +collection = db[COLLECTION_NAME] |
| 144 | + |
| 145 | +if COLLECTION_NAME not in db.list_collection_names(): |
| 146 | +    # Creates an unsharded collection that uses the database's shared throughput |
| 147 | +    db.create_collection(COLLECTION_NAME) |
| 148 | +    print("Created collection '{}'.\n".format(COLLECTION_NAME)) |
| 149 | +else: |
| 150 | +    print("Using collection: '{}'.\n".format(COLLECTION_NAME)) |
| 151 | +``` |
| 152 | + |
| 153 | +### 4.2 Create the vector index |
| 154 | +To perform efficient vector similarity searches within our collection, we need to create a vector index. Cosmos DB supports different types of [vector indexes](./vector-search.md), and here we discuss two: IVF and HNSW. |
| 155 | + |
| 156 | +### IVF |
| 157 | +IVF stands for Inverted File Index. It's the default vector indexing algorithm and works on all cluster tiers. It's an approximate nearest neighbors (ANN) approach that uses clustering to speed up the search for similar vectors in a dataset. To create an IVF index, use the following command: |
| 158 | + |
| 159 | +```python |
| 160 | +db.command({ |
| 161 | +    'createIndexes': COLLECTION_NAME, |
| 162 | +    'indexes': [ |
| 163 | +        { |
| 164 | +            'name': 'vectorSearchIndex', |
| 165 | +            'key': { |
| 166 | +                "contentVector": "cosmosSearch" |
| 167 | +            }, |
| 168 | +            'cosmosSearchOptions': { |
| 169 | +                'kind': 'vector-ivf', |
| 170 | +                'numLists': 1, |
| 171 | +                'similarity': 'COS', |
| 172 | +                'dimensions': 1536 |
| 173 | +            } |
| 174 | +        } |
| 175 | +    ] |
| 176 | +}) |
| 177 | +``` |
| 178 | + |
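To make the clustering idea behind IVF more concrete, here is a toy sketch in plain Python. It is not how Cosmos DB implements the index: the centroids are fixed by hand (a real IVF index learns them from the data), the vectors are two-dimensional, and the names `build_ivf_lists`, `ivf_search`, and `num_probes` are hypothetical. It only illustrates that a query is compared against the vectors in the closest list or lists rather than against the whole dataset.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity, the measure selected by 'similarity': 'COS'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def build_ivf_lists(vectors, centroids):
    """Assign each vector index to the list of its most similar centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        best = max(range(len(centroids)),
                   key=lambda c: cosine_similarity(vec, centroids[c]))
        lists[best].append(idx)
    return lists

def ivf_search(query, vectors, centroids, lists, k=1, num_probes=1):
    """Scan only the num_probes lists whose centroids are closest to the query."""
    ranked = sorted(range(len(centroids)),
                    key=lambda c: cosine_similarity(query, centroids[c]),
                    reverse=True)
    candidates = [idx for c in ranked[:num_probes] for idx in lists[c]]
    candidates.sort(key=lambda idx: cosine_similarity(query, vectors[idx]),
                    reverse=True)
    return candidates[:k]

# Toy data: two obvious clusters and hand-picked centroids (an assumption)
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
centroids = [[1.0, 0.0], [0.0, 1.0]]
lists = build_ivf_lists(vectors, centroids)
nearest = ivf_search([0.8, 0.3], vectors, centroids, lists, k=2)
print(lists, nearest)
```

With a single list, as in the `numLists: 1` setting used in this guide, every vector lands in the same list, so the search compares the query against all stored vectors. That is reasonable for a small demo dataset; larger datasets generally benefit from more lists.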
| 179 | +> [!IMPORTANT] |
| 180 | +> **You can only create one index per vector property.** That is, you cannot create more than one index that points to the same vector property. If you want to change the index type (e.g., from IVF to HNSW) you must drop the index first before creating a new index. |
| 181 | + |
| 182 | +### HNSW |
| 183 | + |
| 184 | +HNSW stands for Hierarchical Navigable Small World, a graph-based data structure that partitions vectors into clusters and subclusters. Like IVF, it's an approximate nearest neighbor (ANN) method, but it typically delivers faster searches with greater accuracy. Here's how to set it up: |
| 185 | + |
| 186 | +```python |
| 187 | +db.command( |
| 188 | +{ |
| 189 | +    "createIndexes": COLLECTION_NAME, |
| 190 | +    "indexes": [ |
| 191 | +        { |
| 192 | +            "name": "VectorSearchIndex", |
| 193 | +            "key": { |
| 194 | +                "contentVector": "cosmosSearch" |
| 195 | +            }, |
| 196 | +            "cosmosSearchOptions": { |
| 197 | +                "kind": "vector-hnsw", |
| 198 | +                "m": 16, # default value |
| 199 | +                "efConstruction": 64, # default value |
| 200 | +                "similarity": "COS", |
| 201 | +                "dimensions": 1536 |
| 202 | +            } |
| 203 | +        } |
| 204 | +    ] |
| 205 | +} |
| 206 | +) |
| 207 | +``` |
| 208 | +> [!NOTE] |
| 209 | +> HNSW indexing is only available on M40 cluster tiers and higher. |
| 210 | + |
| 211 | +## 5. Insert data into the collection |
| 212 | +Now insert the inventory data, which includes descriptions and their corresponding vector embeddings, into the newly created collection. To insert data into our collection, we use the `insert_many()` method provided by the `pymongo` library. The method allows us to insert multiple documents into the collection at once. Our data is stored in a JSON file, which we'll load and then insert into the database. |
| 213 | + |
| 214 | +Download the [shoes_with_vectors.json](https://github.com/jayanta-mondal/ignite-demo/blob/main/data/shoes_with_vectors.json) file from the GitHub repository and store it in a `data` directory within your project folder. |
| 215 | + |
| 216 | +```python |
| 217 | +# Load the inventory documents (descriptions plus precomputed vectors) |
| 218 | +with open("./data/shoes_with_vectors.json", mode="r", encoding="utf-8") as data_file: |
| 219 | +    data = json.load(data_file) |
| 220 | + |
| 221 | +result = collection.insert_many(data) |
| 222 | + |
| 223 | +print(f"Number of data points added: {len(result.inserted_ids)}") |
| 224 | +``` |
| 225 | + |
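Because the vector index is defined with `dimensions: 1536`, it can be worth checking the loaded documents before inserting them. The following is a minimal sketch of such a check; the helper `find_invalid_documents` is hypothetical, and it assumes each document stores its vector as a list under `contentVector`, the field used in the index definition.

```python
EXPECTED_DIMENSIONS = 1536  # matches 'dimensions' in the index definition

def find_invalid_documents(documents, vector_field="contentVector",
                           dimensions=EXPECTED_DIMENSIONS):
    """Return the positions of documents whose vector is missing or mis-sized."""
    invalid = []
    for position, doc in enumerate(documents):
        vector = doc.get(vector_field)
        if not isinstance(vector, list) or len(vector) != dimensions:
            invalid.append(position)
    return invalid

# Small made-up sample: one valid document, one short vector, one with no vector
sample = [
    {"name": "ok", "contentVector": [0.0] * 1536},
    {"name": "short", "contentVector": [0.0] * 3},
    {"name": "missing"},
]
print(find_invalid_documents(sample))
```

Running the same function over `data` before calling `insert_many()` would surface malformed records early.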
| 226 | +## 6. Vector Search in Cosmos DB for MongoDB vCore |
| 227 | +With our data successfully uploaded, we can now apply the power of vector search to find the most relevant items based on a query. The vector index we created earlier enables us to perform semantic searches within our dataset. |
| 228 | + |
| 229 | +### 6.1 Conducting a Vector Search |
| 230 | +To perform a vector search, we define a function `vector_search` that takes a query and the number of results to return. The function generates a vector for the query using the `generate_embeddings` function we defined earlier, then uses Cosmos DB's `$search` functionality to find the closest matching items based on their vector embeddings. |
| 231 | + |
| 232 | +```python |
| 233 | +# Function to assist with vector search |
| 234 | +def vector_search(query, num_results=3): |
| 235 | + |
| 236 | +    # Embed the query with the same model used for the inventory vectors |
| 237 | +    query_vector = generate_embeddings(query) |
| 238 | + |
| 239 | +    pipeline = [ |
| 240 | +        { |
| 241 | +            '$search': { |
| 242 | +                "cosmosSearch": { |
| 243 | +                    "vector": query_vector, |
| 244 | +                    "numLists": 1, |
| 245 | +                    "path": "contentVector", |
| 246 | +                    "k": num_results |
| 247 | +                }, |
| 248 | +                "returnStoredSource": True }}, |
| 249 | +        {'$project': { 'similarityScore': { '$meta': 'searchScore' }, 'document' : '$$ROOT' } } |
| 250 | +    ] |
| 251 | +    results = collection.aggregate(pipeline) |
| 252 | +    return results |
| 253 | +``` |
| 254 | +### 6.2 Perform vector search query |
| 255 | +Finally, we execute our vector search function with a specific query and process the results to display them: |
| 256 | + |
| 257 | +```python |
| 258 | +query = "Shoes for Seattle sweater weather" |
| 259 | +results = vector_search(query, 3) |
| 260 | + |
| 261 | +print("\nResults:\n") |
| 262 | +for result in results: |
| 263 | +    print(f"Similarity Score: {result['similarityScore']}") |
| 264 | +    print(f"Title: {result['document']['name']}") |
| 265 | +    print(f"Price: {result['document']['price']}") |
| 266 | +    print(f"Material: {result['document']['material']}") |
| 267 | +    print(f"Image: {result['document']['img_url']}") |
| 268 | +    print(f"Purchase: {result['document']['purchase_url']}\n") |
| 269 | +``` |
| 270 | + |
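Each result produced by the `$project` stage has two keys: `similarityScore` and `document`. Since `collection.aggregate` returns a cursor that is read once, it can help to copy the fields you need into plain dictionaries before reusing them. The following is a minimal sketch; the helper `summarize_results` and the sample values are hypothetical, while the field names come from the examples above.

```python
def summarize_results(results, fields=("name", "price")):
    """Flatten search results into small dictionaries that can be reused."""
    summaries = []
    for result in results:
        document = result.get("document", {})
        summary = {"similarityScore": result.get("similarityScore")}
        for field in fields:
            # Missing fields become None instead of raising KeyError
            summary[field] = document.get(field)
        summaries.append(summary)
    return summaries

# Made-up results in the shape produced by the $project stage
sample_results = [
    {"similarityScore": 0.91, "document": {"name": "Trail Runner", "price": 89.0}},
    {"similarityScore": 0.87, "document": {"name": "Rain Boot"}},
]
print(summarize_results(sample_results))
```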
| 271 | +## 7. Generating Ad content with GPT-4 and DALL·E 3 |
| 272 | + |
| 273 | +We combine all developed components to craft compelling ads, employing OpenAI's GPT-4 for text and DALL·E 3 for images. Together with vector search results, they form a complete ad. We also introduce Heelie, our intelligent assistant, tasked with creating engaging ad taglines. Through the upcoming code, you see Heelie in action, enhancing our ad creation process. |
| 274 | + |
| 275 | +```python |
| 276 | +from openai import OpenAI |
| 277 | + |
| 278 | +def generate_ad_title(ad_topic): |
| 279 | +    system_prompt = ''' |
| 280 | +    You are Heelie, an intelligent assistant for generating witty and captivating taglines for online advertisements. |
| 281 | +        - The ad campaign taglines that you generate are short and typically under 100 characters. |
| 282 | +    ''' |
| 283 | + |
| 284 | +    user_prompt = f'''Generate a catchy, witty, and short sentence (less than 100 characters) |
| 285 | +                    for an advertisement for selling shoes for {ad_topic}''' |
| 286 | +    messages=[ |
| 287 | +        {"role": "system", "content": system_prompt}, |
| 288 | +        {"role": "user", "content": user_prompt}, |
| 289 | +    ] |
| 290 | + |
| 291 | +    response = client.chat.completions.create( |
| 292 | +        model="gpt-4",  # With Azure OpenAI, use the name of your chat model deployment |
| 293 | +        messages=messages |
| 294 | +    ) |
| 295 | + |
| 296 | +    return response.choices[0].message.content |
| 297 | + |
| 298 | +def generate_ad_image(ad_topic): |
| 299 | +    daliClient = OpenAI( |
| 300 | +        api_key="<DALI_API_KEY>" |
| 301 | +    ) |
| 302 | + |
| 303 | +    image_prompt = f''' |
| 304 | +        Generate a photorealistic image of an ad campaign for selling {ad_topic}. |
| 305 | +        The image should be clean, with the item being sold in the foreground with an easily identifiable landmark of the city in the background. |
| 306 | +        The image should also try to depict the weather of the location for the time of the year mentioned. |
| 307 | +        The image should not have any generated text overlay. |
| 308 | +    ''' |
| 309 | + |
| 310 | +    response = daliClient.images.generate( |
| 311 | +        model="dall-e-3", |
| 312 | +        prompt= image_prompt, |
| 313 | +        size="1024x1024", |
| 314 | +        quality="standard", |
| 315 | +        n=1, |
| 316 | +    ) |
| 317 | + |
| 318 | +    return response.data[0].url |
| 319 | + |
| 320 | +def render_html_page(ad_topic): |
| 321 | + |
| 322 | +    # Find the matching shoes from the inventory |
| 323 | +    results = vector_search(ad_topic, 4) |
| 324 | + |
| 325 | +    ad_header = generate_ad_title(ad_topic) |
| 326 | +    ad_image_url = generate_ad_image(ad_topic) |
| 327 | + |
| 328 | + |
| 329 | +    with open('./data/ad-start.html', 'r', encoding='utf-8') as html_file: |
| 330 | +        html_content = html_file.read() |
| 331 | + |
| 332 | +    html_content += f'''<header> |
| 333 | +            <h1>{ad_header}</h1> |
| 334 | +        </header>''' |
| 335 | + |
| 336 | +    html_content += f''' |
| 337 | +            <section class="ad"> |
| 338 | +            <img src="{ad_image_url}" alt="Base Ad Image" class="ad-image"> |
| 339 | +            </section>''' |
| 340 | + |
| 341 | +    for result in results: |
| 342 | +        html_content += f''' |
| 343 | +        <section class="product"> |
| 344 | +            <img src="{result['document']['img_url']}" alt="{result['document']['name']}" class="product-image"> |
| 345 | +            <div class="product-details"> |
| 346 | +                <h3 class="product-title" color="gray">{result['document']['name']}</h3> |
| 347 | +                <p class="product-price">{"$"+str(result['document']['price'])}</p> |
| 348 | +                <p class="product-description">{result['document']['description']}</p> |
| 349 | +                <a href="{result['document']['purchase_url']}" class="buy-now-button">Buy Now</a> |
| 350 | +            </div> |
| 351 | +        </section> |
| 352 | +        ''' |
| 353 | + |
| 354 | +    html_content += '''</article> |
| 355 | +                    </body> |
| 356 | +                    </html>''' |
| 357 | + |
| 358 | +    return html_content |
| 359 | +``` |
| 360 | + |
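The page above interpolates database values directly into HTML. If a product name or description ever contains characters such as `<`, `&`, or quotes, the markup can break. As a minimal sketch of one way to guard against that, the values could be passed through `html.escape` from the standard library first. The helper `render_product_section` and the sample document are hypothetical; the field names follow the inventory data used in this guide.

```python
from html import escape

def render_product_section(document):
    """Build one product <section>, escaping every value taken from the database."""
    name = escape(str(document["name"]))
    price = escape(str(document["price"]))
    description = escape(str(document["description"]))
    img_url = escape(str(document["img_url"]))
    purchase_url = escape(str(document["purchase_url"]))
    return f'''
    <section class="product">
        <img src="{img_url}" alt="{name}" class="product-image">
        <div class="product-details">
            <h3 class="product-title">{name}</h3>
            <p class="product-price">${price}</p>
            <p class="product-description">{description}</p>
            <a href="{purchase_url}" class="buy-now-button">Buy Now</a>
        </div>
    </section>
    '''

# Made-up document containing characters that would otherwise break the markup
sample_document = {
    "name": "Rain <b>Boot</b>",
    "price": 59.99,
    "description": "Warm & dry",
    "img_url": "https://example.com/boot.png",
    "purchase_url": "https://example.com/buy?id=1&ref=ad",
}
section_html = render_product_section(sample_document)
print(section_html)
```

The loop in `render_html_page` could call a helper like this for each result instead of building the section inline.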
| 361 | +## 8. Putting it all together |
| 362 | +To make our advertisement generation interactive, we employ Gradio, a Python library for creating simple web UIs. We define a UI that allows users to input ad topics and then dynamically generates and displays the resulting advertisement. |
| 363 | + |
| 364 | +```python |
| 365 | +import gradio as gr |
| 366 | + |
| 367 | +css = """ |
| 368 | +    button { background-color: purple; color: red; } |
| 371 | +""" |
| 372 | + |
| 373 | +with gr.Blocks(css=css, theme=gr.themes.Default(spacing_size=gr.themes.sizes.spacing_sm, radius_size="none")) as demo: |
| 374 | +    subject = gr.Textbox(placeholder="Ad Keywords", label="Prompt for Heelie!!") |
| 375 | +    btn = gr.Button("Generate Ad") |
| 376 | +    output_html = gr.HTML(label="Generated Ad HTML") |
| 377 | + |
| 378 | +    # Generate the ad page from the entered topic and show it in the HTML pane |
| 379 | +    btn.click(render_html_page, [subject], output_html) |
| 380 | + |
| 381 | +    btn = gr.Button("Copy HTML") |
| 382 | + |
| 383 | +if __name__ == "__main__": |
| 384 | +    demo.launch() |
| 384 | +``` |
| 385 | + |
| 386 | +## Output |
| 387 | + |
| 388 | + |
| 389 | +## Next step |
| 390 | + |
| 391 | +> [!div class="nextstepaction"] |
| 392 | +> [Check out our Solution Accelerator for more samples](../../solutions.md) |