
Commit 4d0cae5

Add version two updates (#1)
1 parent 19fe1f8 commit 4d0cae5

File tree

17 files changed (+689 -1025 lines)


.github/docs/images/image-01.png

656 KB

README.md

Lines changed: 18 additions & 13 deletions
@@ -59,7 +59,7 @@ To run this project, you need to configure the following environment variables.
 - `AZURE_SUBSCRIPTION_ID`: The Azure subscription ID to use for the deployment. For example, `00000000-0000-0000-0000-000000000000`.
 - `AZURE_RESOURCE_GROUP_NAME`: The name of the resource group to use for the deployment. For example, `my-resource-group`.
 - `AZURE_OPENAI_API_BASE`: The base URL for the Azure OpenAI API. For example, `https://my-resource.openai.azure.com/`.
-- `AZURE_OPENAI_API_VERSION`: The version of the Azure OpenAI API. You must set this to `2023-09-01-preview`.
+- `AZURE_OPENAI_API_VERSION`: The version of the Azure OpenAI API. You must set this to `2023-12-01-preview`.
 - `AZURE_OPENAI_API_TYPE`: The type of the Azure OpenAI API. You must set this to `azure`.
 - `AZURE_OPENAI_CHAT_DEPLOYMENT`: The name of the Azure OpenAI deployment to use for chat. For example, `gpt-35-turbo-16k-0613`.
 - `AZURE_OPENAI_CHAT_MODEL`: The name of the Azure OpenAI model to use for chat. For example, `gpt-35-turbo-16k`.
@@ -80,15 +80,20 @@ To run this project, you need to configure the following environment variables.
 
 ### Configure the Azure AI Search service
 
-To run this project, you need to configure the Azure AI Search service. You can do this using the Azure portal or the Azure CLI. This will populate Azure AI Search with a data source, an index, an indexer, and a skillset.
+To run this project, you need to configure the Azure AI Search service. You can do this using the Azure portal or the Azure CLI. This will populate Azure AI Search with a data source, an index, an indexer, and a skillset.
 
-All templates are provided in the `src/search/templates` folder and values for the variables, for example `{{ AZURE_OPENAI_API_BASE }}` are populated based on the environment variables.
+All templates are provided in the `src/search/templates/product-info` folder and values for the variables, for example `{{ AZURE_OPENAI_API_BASE }}` are populated based on the environment variables.
 
-To create these artifacts to configure the Azure AI Search service, you can use the following command:
+To create these artifacts to configure the Azure AI Search service, you can run the notebook `src/01-populate-index.ipynb`.
 
-```bash
-python -m ./src/search/main.py --search_templates_dir ./src/search/templates/
-```
+### Query the Azure AI Search service
+
+This notebook illustrates two approaches to query the Azure AI Search service:
+
+1. Using a custom client implementing the retrieval-augmented generation (RAG) pattern.
+2. Using the Azure OpenAI REST API.
+
+To query the Azure AI Search service, you can run the notebook `src/02-query-index.ipynb`.
 
 ### Run streamlit app
 
@@ -106,13 +111,13 @@ If you want to deploy this app to Azure, you can containerise it using the `Dock
 
 ## Resources
 
-- [Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/)
-- [Azure AI Search](https://learn.microsoft.com/en-us/azure/search/)
+- [Azure OpenAI](https://learn.microsoft.com/azure/ai-services/openai/)
+- [Azure AI Search](https://learn.microsoft.com/azure/search/)
 - [Streamlit](https://streamlit.io/)
-- [Azure OpenAI Service REST API reference](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference)
-- [Securely use Azure OpenAI on your data](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/use-your-data-securely)
-- [Introduction to prompt engineering](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/prompt-engineering)
-- [Prompt engineering techniques](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering?pivots=programming-language-chat-completions)
+- [Azure OpenAI Service REST API reference](https://learn.microsoft.com/azure/ai-services/openai/reference)
+- [Securely use Azure OpenAI on your data](https://learn.microsoft.com/azure/ai-services/openai/how-to/use-your-data-securely)
+- [Introduction to prompt engineering](https://learn.microsoft.com/azure/ai-services/openai/concepts/prompt-engineering)
+- [Prompt engineering techniques](https://learn.microsoft.com/azure/ai-services/openai/concepts/advanced-prompt-engineering?pivots=programming-language-chat-completions)
 
 ## License
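
For reference on the template mechanism the README describes: placeholders such as `{{ AZURE_OPENAI_API_BASE }}` are Jinja2 syntax (Jinja2 is pinned in `environment/requirements.txt`), so populating a template amounts to rendering it with the `AZURE_*` environment variables. The snippet below is an illustrative sketch only, not code from this commit; it assumes the templates are plain Jinja2 files under the folder named above, and the project's own `SearchClient` wrapper is not shown here.

```python
import json
import os

import dotenv
from jinja2 import Template

# Load the environment variables described in the README.
dotenv.load_dotenv(".env")

# Collect every AZURE_* variable, mirroring the cell in 01-populate-index.ipynb.
template_variables = {
    key: value for key, value in os.environ.items() if key.startswith("AZURE")
}

# Render one of the provided templates, e.g. the index definition.
template_path = os.path.join("src", "search", "templates", "product-info", "index.json")
with open(template_path, encoding="utf-8") as template_file:
    rendered = Template(template_file.read()).render(**template_variables)

# The rendered JSON is what ultimately gets sent to Azure AI Search.
index_definition = json.loads(rendered)
print(index_definition.get("name"))
```

The notebooks delegate this rendering to the project's `SearchClient` wrapper; the sketch just makes the mechanism explicit.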

environment/requirements.txt

Lines changed: 2 additions & 1 deletion
@@ -5,4 +5,5 @@ ipykernel==6.29.2
 Jinja2==3.1.3
 requests==2.31.0
 streamlit-extras==0.4.0
-azure-identity==1.15.0
+azure-identity==1.15.0
+nltk==3.8.1

notebooks/01-populate-index.ipynb

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
+{
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Populate Azure AI Search Index"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 1,
+"metadata": {},
+"outputs": [],
+"source": [
+"import os\n",
+"import dotenv\n",
+"import sys\n",
+"\n",
+"dotenv.load_dotenv(\".env\")\n",
+"sys.path.append(os.path.join(os.getcwd(), \"..\", \"src\"))"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Approach 1: Pull-based\n",
+"\n",
+"The pull model uses indexers connecting to a supported data source, automatically uploading the data into your index. This is the recommended approach for data sources that are frequently updated."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 2,
+"metadata": {},
+"outputs": [],
+"source": [
+"from search.utilities import SearchClient\n",
+"\n",
+"# Create search client\n",
+"search_client = SearchClient(\n",
+" search_endpoint=os.environ[\"AZURE_AI_SEARCH_ENDPOINT\"],\n",
+")"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 3,
+"metadata": {},
+"outputs": [],
+"source": [
+"# Generate list of variables to be used in templates\n",
+"template_variables = {\n",
+" key: value for key, value in os.environ.items() if key.startswith((\"AZURE\"))\n",
+"}\n",
+"\n",
+"# Define template paths\n",
+"base_path = os.path.join(os.getcwd(), \"..\", \"src\", \"search\", \"templates\")\n",
+"datasource_template_path = os.path.join(base_path, \"product-info\", \"datasource.json\")\n",
+"index_template_path = os.path.join(base_path, \"product-info\", \"index.json\")\n",
+"skillset_template_path = os.path.join(base_path, \"product-info\", \"skillset.json\")\n",
+"indexer_template_path = os.path.join(base_path, \"product-info\", \"indexer.json\")\n",
+"\n",
+"# List of search assets\n",
+"assets = [\n",
+" {\n",
+" \"type\": \"indexes\",\n",
+" \"name\": os.environ[\"AZURE_AI_SEARCH_INDEX_NAME\"],\n",
+" \"template_path\": index_template_path,\n",
+" \"template_variables\": template_variables,\n",
+" },\n",
+" {\n",
+" \"type\": \"datasources\",\n",
+" \"name\": os.environ[\"AZURE_AI_SEARCH_DATASOURCE_NAME\"],\n",
+" \"template_path\": datasource_template_path,\n",
+" \"template_variables\": template_variables,\n",
+" },\n",
+" {\n",
+" \"type\": \"skillsets\",\n",
+" \"name\": os.environ[\"AZURE_AI_SEARCH_SKILLSET_NAME\"],\n",
+" \"template_path\": skillset_template_path,\n",
+" \"template_variables\": template_variables,\n",
+" },\n",
+" {\n",
+" \"type\": \"indexers\",\n",
+" \"name\": os.environ[\"AZURE_AI_SEARCH_INDEXER_NAME\"],\n",
+" \"template_path\": indexer_template_path,\n",
+" \"template_variables\": template_variables,\n",
+" },\n",
+"]\n",
+"\n",
+"# Load search asset templates\n",
+"search_client.load_search_management_asset_templates(assets)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 4,
+"metadata": {},
+"outputs": [],
+"source": [
+"# Create the index\n",
+"index_response = search_client.create_search_management_asset(asset_type=\"indexes\")\n",
+"\n",
+"# Create the data source\n",
+"datasource_response = search_client.create_search_management_asset(asset_type=\"datasources\")\n",
+"\n",
+"# Create skillset to enhance the indexer\n",
+"skillset_response = search_client.create_search_management_asset(asset_type=\"skillsets\")\n",
+"\n",
+"# Create the indexer\n",
+"indexer_response = search_client.create_search_management_asset(asset_type=\"indexers\")"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 5,
+"metadata": {},
+"outputs": [],
+"source": [
+"# Run the indexer\n",
+"indexer_run_response = search_client.run_indexer()"
+]
+},
+{
+"cell_type": "code",
+"execution_count": 9,
+"metadata": {},
+"outputs": [],
+"source": [
+"# Run the indexer with reset\n",
+"indexer_run_reset_response = search_client.run_indexer(reset_flag=True)"
+]
+}
+],
+"metadata": {
+"kernelspec": {
+"display_name": "base",
+"language": "python",
+"name": "python3"
+},
+"language_info": {
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"file_extension": ".py",
+"mimetype": "text/x-python",
+"name": "python",
+"nbconvert_exporter": "python",
+"pygments_lexer": "ipython3",
+"version": "3.11.4"
+}
+},
+"nbformat": 4,
+"nbformat_minor": 2
+}
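
The `SearchClient` used in this notebook comes from `src/search/utilities.py`, which is not part of this diff, so its internals are unknown here. As a hedged sketch, the create and run operations it exposes line up with the documented Azure AI Search REST API, which can be called directly with `requests` and `azure-identity` (both pinned in `environment/requirements.txt`); the API version and the use of `DefaultAzureCredential` are assumptions, not details taken from this commit.

```python
import os

import requests
from azure.identity import DefaultAzureCredential
from jinja2 import Template

endpoint = os.environ["AZURE_AI_SEARCH_ENDPOINT"]
api_version = "2023-11-01"  # assumed; use the REST API version your service supports

# Render the index template, as in the earlier sketch.
template_path = os.path.join("src", "search", "templates", "product-info", "index.json")
variables = {key: value for key, value in os.environ.items() if key.startswith("AZURE")}
with open(template_path, encoding="utf-8") as template_file:
    index_definition = Template(template_file.read()).render(**variables)

# azure-identity can issue a bearer token for Azure AI Search when RBAC is enabled.
token = DefaultAzureCredential().get_token("https://search.azure.com/.default").token
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

# Create or update the index from the rendered definition.
index_name = os.environ["AZURE_AI_SEARCH_INDEX_NAME"]
requests.put(
    f"{endpoint}/indexes/{index_name}?api-version={api_version}",
    headers=headers,
    data=index_definition,
).raise_for_status()

# Reset and run the indexer, roughly matching run_indexer(reset_flag=True) above.
indexer_name = os.environ["AZURE_AI_SEARCH_INDEXER_NAME"]
requests.post(f"{endpoint}/indexers/{indexer_name}/reset?api-version={api_version}", headers=headers).raise_for_status()
requests.post(f"{endpoint}/indexers/{indexer_name}/run?api-version={api_version}", headers=headers).raise_for_status()
```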

notebooks/rag-orchestrator.ipynb renamed to notebooks/02-llm-queries.ipynb

Lines changed: 36 additions & 65 deletions
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# Generate model response"
+"# LLM Queries with Knowledge Base Integration"
 ]
 },
 {
@@ -25,7 +25,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Custom RAG Queries"
+"### Approach 1: Custom Client\n",
+"\n",
+"This approach will use the `RetrievalAugmentedGenerationClient` class defined in `src/rag/utilities.py`. This will NOT require a Microsoft managed private endpoint for private access."
 ]
 },
 {
@@ -34,15 +36,16 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from orchestration.utilities import OrchestrationClient\n",
+"from rag.utilities import RetrievalAugmentedGenerationClient\n",
 "\n",
 "# Create orchestration client\n",
-"orchestration_client = OrchestrationClient(\n",
+"rag_client = RetrievalAugmentedGenerationClient(\n",
 " open_ai_endpoint=os.getenv(\"AZURE_OPENAI_API_BASE\"),\n",
 " open_ai_chat_deployment=os.getenv(\"AZURE_OPENAI_CHAT_DEPLOYMENT\"),\n",
 " open_ai_embedding_deployment=os.getenv(\"AZURE_OPENAI_EMBEDDING_DEPLOYMENT\"),\n",
 " search_endpoint=os.getenv(\"AZURE_AI_SEARCH_ENDPOINT\"),\n",
 " search_index_name=os.getenv(\"AZURE_AI_SEARCH_INDEX_NAME\"),\n",
+" system_prompt_configuration_file=\"../src/rag/configuration.yaml\"\n",
 ")"
 ]
 },
@@ -52,15 +55,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# Generate chat response from initial user query\n",
-"chat_history = {\n",
-" \"messages\": [\n",
-" {\"role\": \"user\", \"content\": \"Which tent is the most waterproof?\"},\n",
-" ]\n",
-"}\n",
-"\n",
-"chat_history = orchestration_client.generate_chat_response(chat_history)\n",
-"print(chat_history[\"messages\"][-1][\"content\"])"
+"message_history = []\n",
+"message_history = rag_client.get_answer(\"Which tent is the most waterproof?\", message_history=message_history)"
 ]
 },
 {
@@ -69,22 +65,38 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# Generate chat response from follow-up user query\n",
-"chat_history[\"messages\"].append(\n",
-" {\"role\": \"user\", \"content\": \"Tell me more about the Alpine Explorer Tent?\"}\n",
-")\n",
-"\n",
-"chat_history = orchestration_client.generate_chat_response(chat_history)\n",
-"print(chat_history[\"messages\"][-1][\"content\"])"
+"for message in message_history:\n",
+" content = message['content'].split(\"Sources:\")[0].strip()\n",
+" print(f\"{message['role'].title()}: {content}\\n\")"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"message_history = rag_client.get_answer(\"Tell me more about the Alpine Explorer Tent?\", message_history=message_history)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"for message in message_history:\n",
+" content = message['content'].split(\"Sources:\")[0].strip()\n",
+" print(f\"{message['role'].title()}: {content}\\n\")"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Azure OpenAI Service REST API\n",
+"### Approach 2: Azure OpenAI Service REST API\n",
 "\n",
-"Note: this will require public access on Azure AI Search or a Microsft managed private endpoint for private access."
+"This will require public access on Azure AI Search or a Microsoft managed private endpoint for private access."
 ]
 },
 {
@@ -134,48 +146,7 @@
 " json=request_payload,\n",
 ")\n",
 "\n",
-"print(response.json())"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"text = \"Based on the information provided, both the Alpine Explorer Tent and the TrailMaster X4 Tent are waterproof. The Alpine Explorer Tent has a rainfly with a waterproof rating of 3000mm [product_info_8.md], while the TrailMaster X4 Tent has a rainfly with a waterproof rating of 2000mm [product_info_1.md]. Therefore, both tents offer reliable protection against rain and moisture.\"\n",
-"\n",
-"text"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"import re\n",
-"\n",
-"text = (\n",
-" \"This document refers to [product_info_1.md] and [another_file.md]. More text here.\"\n",
-")\n",
-"\n",
-"def replace_references(text: str) -> str:\n",
-" # Regex to match references in the format [*.md]\n",
-" regex = r\"\\[([^\\]]*.md)\\]\"\n",
-"\n",
-" # Replace matched references with modified references (appending \":blue\")\n",
-" modified_text = re.sub(regex, r\"*:blue[\\1]*\", text)\n",
-"\n",
-" return modified_text\n",
-"\n",
-"# Regex to match references in the format [*.md]\n",
-"regex = r\"\\[([^\\]]*.md)\\]\"\n",
-"\n",
-"# Replace matched references with modified references (appending \":blue\")\n",
-"modified_text = re.sub(regex, r\"*:blue[\\1]*\", text)\n",
-"\n",
-"print(modified_text)"
+"print(response.json()[\"choices\"][0][\"message\"][\"content\"])"
 ]
 },
 {
@@ -202,7 +173,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.11.7"
+"version": "3.11.4"
 }
 },
 "nbformat": 4,

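Neither `src/rag/utilities.py` nor the full `request_payload` used in Approach 2 is visible in this diff, so the snippet below is only a minimal sketch of the retrieval-augmented generation pattern the notebook names: fetch candidate documents from Azure AI Search, then ground an Azure OpenAI chat completion on them. The endpoints are the documented REST APIs; the `content` field name, the API versions, and the use of `DefaultAzureCredential` are assumptions rather than details taken from this commit.

```python
import os

import requests
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
search_endpoint = os.environ["AZURE_AI_SEARCH_ENDPOINT"]
index_name = os.environ["AZURE_AI_SEARCH_INDEX_NAME"]
openai_endpoint = os.environ["AZURE_OPENAI_API_BASE"]  # e.g. https://my-resource.openai.azure.com/
chat_deployment = os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"]
api_version = os.environ.get("AZURE_OPENAI_API_VERSION", "2023-12-01-preview")

question = "Which tent is the most waterproof?"

# 1. Retrieve candidate documents with a keyword query (the real client may use
#    vector or hybrid retrieval; "content" is an assumed field name).
search_token = credential.get_token("https://search.azure.com/.default").token
search_response = requests.post(
    f"{search_endpoint}/indexes/{index_name}/docs/search?api-version=2023-11-01",
    headers={"Authorization": f"Bearer {search_token}"},
    json={"search": question, "top": 3},
)
sources = "\n".join(doc.get("content", "") for doc in search_response.json()["value"])

# 2. Ask the chat deployment to answer using only the retrieved sources.
openai_token = credential.get_token("https://cognitiveservices.azure.com/.default").token
chat_response = requests.post(
    f"{openai_endpoint}openai/deployments/{chat_deployment}/chat/completions?api-version={api_version}",
    headers={"Authorization": f"Bearer {openai_token}"},
    json={
        "messages": [
            {"role": "system", "content": f"Answer using only these sources:\n{sources}"},
            {"role": "user", "content": question},
        ]
    },
)
print(chat_response.json()["choices"][0]["message"]["content"])
```

Approach 2 in the notebook instead sends a `request_payload` directly to the Azure OpenAI REST API, which appears to correspond to the "use your data" capability linked from the README.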