Merge branch 'main' into flask-cmd

pamelafox · pamelafox · commit eab811b8ec51 · 2023-07-06T09:31:38.000-07:00
diff --git a/README.md b/README.md
@@ -27,14 +27,15 @@ The repo includes sample data so it's ready to try end to end. In this sample ap
 ### Prerequisites
 
 #### To Run Locally
-- [Azure Developer CLI](https://aka.ms/azure-dev/install)
-- [Python 3+](https://www.python.org/downloads/)
-    - **Important**: Python and the pip package manager must be in the path in Windows for the setup scripts to work.
-    - **Important**: Ensure you can run `python --version` from console. On Ubuntu, you might need to run `sudo apt install python-is-python3` to link `python` to `python3`.    
-- [Node.js](https://nodejs.org/en/download/)
-- [Git](https://git-scm.com/downloads)
-- [Powershell 7+ (pwsh)](https://github.com/powershell/powershell) - For Windows users only.
-   - **Important**: Ensure you can run `pwsh.exe` from a PowerShell command. If this fails, you likely need to upgrade PowerShell.
+
+* [Azure Developer CLI](https://aka.ms/azure-dev/install)
+* [Python 3+](https://www.python.org/downloads/)
+  * **Important**: Python and the pip package manager must be in the path in Windows for the setup scripts to work.
+  * **Important**: Ensure you can run `python --version` from console. On Ubuntu, you might need to run `sudo apt install python-is-python3` to link `python` to `python3`.
+* [Node.js](https://nodejs.org/en/download/)
+* [Git](https://git-scm.com/downloads)
+* [Powershell 7+ (pwsh)](https://github.com/powershell/powershell) - For Windows users only.
+  * **Important**: Ensure you can run `pwsh.exe` from a PowerShell command. If this fails, you likely need to upgrade PowerShell.
 
 >NOTE: Your Azure Account must have `Microsoft.Authorization/roleAssignments/write` permissions, such as [User Access Administrator](https://learn.microsoft.com/azure/role-based-access-control/built-in-roles#user-access-administrator) or [Owner](https://learn.microsoft.com/azure/role-based-access-control/built-in-roles#owner).  
 
@@ -54,21 +55,21 @@ You can run this repo virtually by using GitHub Codespaces or VS Code Remote Con
 1. Run `azd init -t azure-search-openai-demo`
     * note that this command will initialize a git repository and you do not need to clone this repository
 
-#### Starting from scratch:
+#### Starting from scratch
 
 Execute the following command, if you don't have any pre-existing Azure services and want to start from a fresh deployment.
 
 1. Run `azd up` - This will provision Azure resources and deploy this sample to those resources, including building the search index based on the files found in the `./data` folder.
-    * For the target location, the regions that currently support the models used in this sample are **East US** or **South Central US**. For an up-to-date list of regions and models, check [here](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/models)
+    * For the target location, the regions that currently support the models used in this sample are **East US**, **France Central**, **South Central US**, **UK South**, and **West Europe**. For an up-to-date list of regions and models, check [here](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/concepts/models)
 1. After the application has been successfully deployed you will see a URL printed to the console.  Click that URL to interact with the application in your browser.  
 
 It will look like the following:
 
 !['Output from running azd up'](assets/endpoint.png)
-    
+
 > NOTE: It may take a minute for the application to be fully deployed. If you see a "Python Developer" welcome screen, then wait a minute and refresh the page.
 
-#### Use existing resources:
+#### Use existing resources
 
 1. Run `azd env set AZURE_OPENAI_SERVICE {Name of existing OpenAI service}`
 1. Run `azd env set AZURE_OPENAI_RESOURCE_GROUP {Name of existing resource group that OpenAI service is provisioned to}`
@@ -78,10 +79,12 @@ It will look like the following:
 
 > NOTE: You can also use existing Search and Storage Accounts.  See `./infra/main.parameters.json` for list of environment variables to pass to `azd env set` to configure those existing resources.
 
-#### Deploying or re-deploying a local clone of the repo:
+#### Deploying or re-deploying a local clone of the repo
+
 * Simply run `azd up`
 
-#### Running locally:
+#### Running locally
+
 1. Run `azd login`
 2. Change dir to `app`
 3. Run `./start.ps1` or `./start.sh` or run the "VS Code Task: Start App" to start the project locally.
@@ -101,6 +104,7 @@ Run the following if you want to give someone else access to completely deployed
 * Running locally: navigate to 127.0.0.1:5000
 
 Once in the web app:
+
 * Try different topics in chat or Q&A context. For chat, try follow up questions, clarifications, ask to simplify or elaborate on answer, etc.
 * Explore citations and sources
 * Click on "settings" to try different options, tweak prompts, etc.
@@ -112,6 +116,7 @@ Once in the web app:
 * [Azure OpenAI Service](https://learn.microsoft.com/azure/cognitive-services/openai/overview)
 
 ### Note
+
 >Note: The PDF documents used in this demo contain information generated using a language model (Azure OpenAI Service). The information contained in these documents is only for demonstration purposes and does not reflect the opinions or beliefs of Microsoft. Microsoft makes no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the information contained in this document. All rights reserved to Microsoft.
 
 ### FAQ
@@ -122,6 +127,6 @@ Once in the web app:
 
 ### Troubleshooting
 
-If you see this error while running `azd deploy`: `read /tmp/azd1992237260/backend_env/lib64: is a directory`, then delete the `./app/backend/backend_env folder` and re-run the `azd deploy` command.  This issue is being tracked here: https://github.com/Azure/azure-dev/issues/1237
+If you see this error while running `azd deploy`: `read /tmp/azd1992237260/backend_env/lib64: is a directory`, then delete the `./app/backend/backend_env folder` and re-run the `azd deploy` command.  This issue is being tracked here: <https://github.com/Azure/azure-dev/issues/1237>
 
-If the web app fails to deploy and you receive a '404 Not Found' message in your browser, run 'azd deploy'. 
+If the web app fails to deploy and you receive a '404 Not Found' message in your browser, run `azd deploy`.
diff --git a/app/backend/app.py b/app/backend/app.py
@@ -77,7 +77,7 @@ def static_file(path):
 @app.route("/content/<path>")
 def content_file(path):
     blob = blob_container.get_blob_client(path).download_blob()
-    if not blob.properties or "content_settings" not in blob.properties:
+    if not blob.properties or not blob.properties.has_key("content_settings"):
         abort(404)
     mime_type = blob.properties["content_settings"]["content_type"]
     if mime_type == "application/octet-stream":
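
Review note: the switch from `in` to `has_key` in the hunk above makes sense if `blob.properties` is a mapping-style object that defines `__getitem__` and `has_key` but not `__contains__` or `__iter__`. That is an assumption; the diff does not say so. The sketch below uses a hypothetical stand-in class (`PropsLike`, not the Azure SDK type) to show how the `in` operator can fail on such an object while `has_key` works.

```python
class PropsLike:
    """Hypothetical stand-in for a properties object; NOT the Azure SDK class."""

    def __init__(self, **kwargs):
        # Store every keyword argument as an instance attribute.
        self.__dict__.update(kwargs)

    def __getitem__(self, key):
        # Dictionary-style read access backed by instance attributes.
        return self.__dict__[key]

    def has_key(self, key):
        # Explicit membership helper.
        return key in self.__dict__


props = PropsLike(content_settings={"content_type": "application/pdf"})

# The explicit helper answers the membership question directly.
found = props.has_key("content_settings")

# With no __contains__ and no __iter__, `in` falls back to calling
# __getitem__(0), __getitem__(1), ... and the first call raises KeyError here.
try:
    "content_settings" in props
    in_operator_failed = False
except KeyError:
    in_operator_failed = True
```

If the real properties type behaves like this stand-in, the old `"content_settings" not in blob.properties` check would raise instead of returning a boolean, which is consistent with the change in this hunk.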
diff --git a/app/backend/approaches/chatreadretrieveread.py b/app/backend/approaches/chatreadretrieveread.py
@@ -6,15 +6,18 @@
 from approaches.approach import Approach
 from text import nonewlines
 
-# Simple retrieve-then-read implementation, using the Cognitive Search and OpenAI APIs directly. It first retrieves
-# top documents from search, then constructs a prompt with them, and then uses OpenAI to generate an completion 
-# (answer) with that prompt.
 class ChatReadRetrieveReadApproach(Approach):
+    """
+    Simple retrieve-then-read implementation, using the Cognitive Search and OpenAI APIs directly. It first retrieves
+    top documents from search, then constructs a prompt with them, and then uses OpenAI to generate an completion
+    (answer) with that prompt.
+    """
+
     prompt_prefix = """<|im_start|>system
 Assistant helps the company employees with their healthcare plan questions, and questions about the employee handbook. Be brief in your answers.
 Answer ONLY with the facts listed in the list of sources below. If there isn't enough information below, say you don't know. Do not generate answers that don't use the sources below. If asking a clarifying question to the user would help, ask the question.
 For tabular information return it as an html table. Do not return markdown format.
-Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. Use square brakets to reference the source, e.g. [info1.txt]. Don't combine sources, list each source separately, e.g. [info1.txt][info2.pdf].
+Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. Use square brackets to reference the source, e.g. [info1.txt]. Don't combine sources, list each source separately, e.g. [info1.txt][info2.pdf].
 {follow_up_questions_prompt}
 {injected_prompt}
 Sources:
@@ -112,5 +115,5 @@ def get_chat_history_as_text(self, history: Sequence[dict[str, str]], include_la
         for h in reversed(history if include_last_turn else history[:-1]):
             history_text = """<|im_start|>user""" + "\n" + h["user"] + "\n" + """<|im_end|>""" + "\n" + """<|im_start|>assistant""" + "\n" + (h.get("bot", "") + """<|im_end|>""" if h.get("bot") else "") + "\n" + history_text
             if len(history_text) > approx_max_tokens*4:
-                break
-        return history_text
+                break    
+        return history_text
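
Review note: the context lines above build the chat history newest-turn-first and stop once the text exceeds `approx_max_tokens*4` characters, which suggests a rough budget of four characters per token. The sketch below is a simplified illustration of that loop, not the repository's function: the name `history_as_text`, the plain `user:`/`bot:` labels, and the default budget are assumptions made for readability.

```python
from typing import Sequence


def history_as_text(
    history: Sequence[dict],
    include_last_turn: bool = True,
    approx_max_tokens: int = 1000,  # assumed default, for illustration only
) -> str:
    history_text = ""
    # Walk from the newest turn to the oldest so recent context is kept first.
    for h in reversed(history if include_last_turn else history[:-1]):
        turn = "user: " + h["user"] + "\n" + "bot: " + h.get("bot", "") + "\n"
        # Prepend so the final text stays in chronological order.
        history_text = turn + history_text
        # The check runs after the turn is added, so the result may
        # overshoot the character budget by up to one turn.
        if len(history_text) > approx_max_tokens * 4:
            break
    return history_text


turns = [
    {"user": "What does my plan cover?", "bot": "It covers X."},
    {"user": "And dental?", "bot": "Yes."},
]
short = history_as_text(turns, approx_max_tokens=1)
full = history_as_text(turns, approx_max_tokens=1000)
```

With a tiny budget only the newest turn survives; with a generous one both turns appear in their original order.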
diff --git a/app/backend/approaches/readretrieveread.py b/app/backend/approaches/readretrieveread.py
@@ -6,18 +6,22 @@
 from langchain.callbacks.manager import CallbackManager, Callbacks
 from langchain.chains import LLMChain
 from langchain.agents import Tool, ZeroShotAgent, AgentExecutor
-from langchain.llms.openai import AzureOpenAI
 from langchainadapters import HtmlCallbackHandler
 from text import nonewlines
 from lookuptool import CsvLookupTool
 from typing import Any
 
-# Attempt to answer questions by iteratively evaluating the question to see what information is missing, and once all information
-# is present then formulate an answer. Each iteration consists of two parts: first use GPT to see if we need more information, 
-# second if more data is needed use the requested "tool" to retrieve it. The last call to GPT answers the actual question.
-# This is inspired by the MKRL paper[1] and applied here using the implementation in Langchain.
-# [1] E. Karpas, et al. arXiv:2205.00445
 class ReadRetrieveReadApproach(Approach):
+    """
+    Attempt to answer questions by iteratively evaluating the question to see what information is missing, and once all information
+    is present then formulate an answer. Each iteration consists of two parts:
+     1. use GPT to see if we need more information
+     2. if more data is needed, use the requested "tool" to retrieve it.
+    The last call to GPT answers the actual question.
+    This is inspired by the MKRL paper[1] and applied here using the implementation in Langchain.
+
+    [1] E. Karpas, et al. arXiv:2205.00445
+    """
 
     template_prefix = \
 "You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. " \
diff --git a/app/backend/approaches/retrievethenread.py b/app/backend/approaches/retrievethenread.py
@@ -5,10 +5,13 @@
 from text import nonewlines
 from typing import Any
 
-# Simple retrieve-then-read implementation, using the Cognitive Search and OpenAI APIs directly. It first retrieves
-# top documents from search, then constructs a prompt with them, and then uses OpenAI to generate an completion 
-# (answer) with that prompt.
+
 class RetrieveThenReadApproach(Approach):
+    """
+    Simple retrieve-then-read implementation, using the Cognitive Search and OpenAI APIs directly. It first retrieves
+    top documents from search, then constructs a prompt with them, and then uses OpenAI to generate an completion
+    (answer) with that prompt.
+    """
 
     template = \
 "You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. " + \
diff --git a/app/backend/requirements.txt b/app/backend/requirements.txt
@@ -1,4 +1,4 @@
-azure-identity==1.13.0b3
+azure-identity==1.13.0
 Flask==2.2.5
 langchain==0.0.187
 openai==0.26.4
diff --git a/app/start.sh b/app/start.sh
@@ -17,7 +17,7 @@ if [ $? -ne 0 ]; then
 fi
 
 echo 'Creating python virtual environment "backend/backend_env"'
-python -m venv backend/backend_env
+python3 -m venv backend/backend_env
 
 echo ""
 echo "Restoring backend python packages"
diff --git a/infra/core/ai/cognitiveservices.bicep b/infra/core/ai/cognitiveservices.bicep
@@ -30,9 +30,9 @@ resource deployment 'Microsoft.CognitiveServices/accounts/deployments@2023-05-01
     model: deployment.model
     raiPolicyName: contains(deployment, 'raiPolicyName') ? deployment.raiPolicyName : null
   }
-  sku: {
+  sku: contains(deployment, 'sku') ? deployment.sku : {
     name: 'Standard'
-    capacity: deployment.capacity
+    capacity: 20
   }
 }]
 
diff --git a/infra/main.bicep b/infra/main.bicep
@@ -37,10 +37,10 @@ param formRecognizerResourceGroupLocation string = location
 
 param formRecognizerSkuName string = 'S0'
 
-param gptDeploymentName string = ''
+param gptDeploymentName string // Set in main.parameters.json
 param gptDeploymentCapacity int = 30
 param gptModelName string = 'text-davinci-003'
-param chatGptDeploymentName string = ''
+param chatGptDeploymentName string // Set in main.parameters.json
 param chatGptDeploymentCapacity int = 30
 param chatGptModelName string = 'gpt-35-turbo'
 
@@ -50,8 +50,6 @@ param principalId string = ''
 var abbrs = loadJsonContent('abbreviations.json')
 var resourceToken = toLower(uniqueString(subscription().id, environmentName, location))
 var tags = { 'azd-env-name': environmentName }
-var gptDeployment = empty(gptDeploymentName) ? 'davinci' : gptDeploymentName
-var chatGptDeployment = empty(chatGptDeploymentName) ? 'chat' : chatGptDeploymentName
 
 // Organize resources in a resource group
 resource resourceGroup 'Microsoft.Resources/resourceGroups@2021-04-01' = {
@@ -111,8 +109,8 @@ module backend 'core/host/appservice.bicep' = {
       AZURE_OPENAI_SERVICE: openAi.outputs.name
       AZURE_SEARCH_INDEX: searchIndexName
       AZURE_SEARCH_SERVICE: searchService.outputs.name
-      AZURE_OPENAI_GPT_DEPLOYMENT: gptDeployment
-      AZURE_OPENAI_CHATGPT_DEPLOYMENT: chatGptDeployment
+      AZURE_OPENAI_GPT_DEPLOYMENT: gptDeploymentName
+      AZURE_OPENAI_CHATGPT_DEPLOYMENT: chatGptDeploymentName
     }
   }
 }
@@ -129,22 +127,28 @@ module openAi 'core/ai/cognitiveservices.bicep' = {
     }
     deployments: [
       {
-        name: gptDeployment
+        name: gptDeploymentName
         model: {
           format: 'OpenAI'
           name: gptModelName
           version: '1'
         }
-        capacity: gptDeploymentCapacity
+        sku: {
+          name: 'Standard'
+          capacity: gptDeploymentCapacity
+        }
       }
       {
-        name: chatGptDeployment
+        name: chatGptDeploymentName
         model: {
           format: 'OpenAI'
           name: chatGptModelName
           version: '0301'
         }
-        capacity: chatGptDeploymentCapacity
+        sku: {
+          name: 'Standard'
+          capacity: chatGptDeploymentCapacity
+        }
       }
     ]
   }
@@ -315,8 +319,8 @@ output AZURE_RESOURCE_GROUP string = resourceGroup.name
 
 output AZURE_OPENAI_SERVICE string = openAi.outputs.name
 output AZURE_OPENAI_RESOURCE_GROUP string = openAiResourceGroup.name
-output AZURE_OPENAI_GPT_DEPLOYMENT string = gptDeployment
-output AZURE_OPENAI_CHATGPT_DEPLOYMENT string = chatGptDeployment
+output AZURE_OPENAI_GPT_DEPLOYMENT string = gptDeploymentName
+output AZURE_OPENAI_CHATGPT_DEPLOYMENT string = chatGptDeploymentName
 
 output AZURE_FORMRECOGNIZER_SERVICE string = formRecognizer.outputs.name
 output AZURE_FORMRECOGNIZER_RESOURCE_GROUP string = formRecognizerResourceGroup.name
diff --git a/infra/main.parameters.json b/infra/main.parameters.json
@@ -45,10 +45,10 @@
       "value": "${AZURE_STORAGE_RESOURCE_GROUP}"
     },
     "chatGptDeploymentName": {
-      "value": "${AZURE_OPENAI_CHATGPT_DEPLOYMENT}"
+      "value": "${AZURE_OPENAI_CHATGPT_DEPLOYMENT=chat}"
     },
     "gptDeploymentName": {
-      "value": "${AZURE_OPENAI_GPT_DEPLOYMENT}"
+      "value": "${AZURE_OPENAI_GPT_DEPLOYMENT=davinci}"
     }
   }
 }
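
Review note: this hunk moves the fallback deployment names (`chat`, `davinci`) out of `main.bicep` and into the `${NAME=default}` placeholders in the parameters file. The sketch below illustrates how such a placeholder could resolve. It is not the `azd` implementation: the `expand` helper and its regular expression are hypothetical, and it assumes the default applies only when the variable is unset.

```python
import re

# Matches ${NAME} and ${NAME=default}; illustrative pattern only.
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")


def expand(template: str, env: dict) -> str:
    """Replace placeholders with values from env, else the inline default."""

    def _substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            # Assumption: a variable that is set wins, even if it is empty.
            return env[name]
        return default if default is not None else ""

    return _PLACEHOLDER.sub(_substitute, template)


unset = expand("${AZURE_OPENAI_GPT_DEPLOYMENT=davinci}", {})
overridden = expand(
    "${AZURE_OPENAI_CHATGPT_DEPLOYMENT=chat}",
    {"AZURE_OPENAI_CHATGPT_DEPLOYMENT": "my-chat"},
)
no_default = expand("${AZURE_STORAGE_RESOURCE_GROUP}", {})
```

One difference worth checking during review: the removed Bicep variables used `empty(...)`, so an empty string also fell back to the default. Whether the new placeholder syntax treats an empty value the same way is not visible in this diff.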
diff --git a/notebooks/requirements.txt b/notebooks/requirements.txt
@@ -1,4 +1,4 @@
-azure-identity==1.12.0
+azure-identity==1.13.0
 langchain==0.0.187
 openai==0.26.4
 azure-search-documents==11.4.0b3
diff --git a/scripts/prepdocs.sh b/scripts/prepdocs.sh
@@ -12,7 +12,7 @@ $(azd env get-values)
 EOF
 
 echo 'Creating python virtual environment "scripts/.venv"'
-python -m venv scripts/.venv
+python3 -m venv scripts/.venv
 
 echo 'Installing dependencies from "requirements.txt" into virtual environment'
 ./scripts/.venv/bin/python -m pip install -r scripts/requirements.txt
diff --git a/scripts/requirements.txt b/scripts/requirements.txt
@@ -1,5 +1,5 @@
 pypdf==3.5.0
-azure-identity==1.13.0b4
+azure-identity==1.13.0
 azure-search-documents==11.4.0b3
 azure-ai-formrecognizer==3.2.1
 azure-storage-blob==12.14.1

`53`	`53`	`}`
`54`	`54`	`}`