Add section about productionizing (#577)

pamelafox · web-flow · commit 9cdbc1c543d0 · 2023-09-01T12:21:34.000-07:00
* Remove defaults for getenv

* Remove print

* missing output

* readme section

* Update README with productionizing tips

* Add networking section

* Review feedback from comments
diff --git a/README.md b/README.md
@@ -19,6 +19,7 @@
   - [Enabling authentication](#enabling-authentication)
 - [Using the app](#using-the-app)
 - [Running locally](#running-locally)
+- [Productionizing](#productionizing)
 - [Resources](#resources)
   - [Note](#note)
   - [FAQ](#faq)
@@ -47,7 +48,7 @@ The repo includes sample data so it's ready to try end to end. In this sample ap
 
 > **IMPORTANT:** In order to deploy and run this example, you'll need an **Azure subscription with access enabled for the Azure OpenAI service**. You can request access [here](https://aka.ms/oaiapply). You can also visit [here](https://azure.microsoft.com/free/cognitive-search/) to get some free Azure credits to get you started.
 
-## Azure deployment 
+## Azure deployment
 
 ### Cost estimation
 
@@ -166,7 +167,7 @@ To then limit access to a specific set of users or groups, you can follow the st
 
 ## Running locally
 
-You can only run locally **after** having successfully run the `azd up` command.
+You can only run locally **after** having successfully run the `azd up` command. If you haven't yet, follow the steps in [Azure deployment](#azure-deployment) above.
 
 1. Run `azd auth login`
 2. Change dir to `app`
@@ -183,6 +184,44 @@ Once in the web app:
 * Explore citations and sources
 * Click on "settings" to try different options, tweak prompts, etc.
 
+## Productionizing
+
+This sample is designed to be a starting point for your own production application,
+but you should thoroughly review its security and performance before deploying
+to production. Here are some things to consider:
+
+* **OpenAI Capacity**: The default TPM (tokens per minute) is set to 30K. That is equivalent
+  to approximately 30 conversations per minute (assuming 1K tokens per user message/response).
+  You can increase the capacity by changing the `chatGptDeploymentCapacity` and `embeddingDeploymentCapacity`
+  parameters in `infra/main.bicep` to your account's maximum capacity.
+  You can also view the Quotas tab in [Azure OpenAI Studio](https://oai.azure.com/)
+  to see how much capacity you have.
+* **Azure Storage**: The default storage account uses the `Standard_LRS` SKU.
+  To improve your resiliency, we recommend using `Standard_ZRS` for production deployments,
+  which you can specify using the `sku` property under the `storage` module in `infra/main.bicep`.
+* **Azure Cognitive Search**: The default search service uses the `Standard` SKU
+  with the free semantic search option, which gives you 1000 free queries a month.
+  If you expect your app to receive more than 1000 questions a month, you should either change `semanticSearch`
+  to "standard" or disable semantic search entirely in the `/app/backend/approaches` files.
+  If you see errors about search service capacity being exceeded, you may find it helpful to increase
+  the number of replicas by changing `replicaCount` in `infra/core/search/search-services.bicep`
+  or by manually scaling the service in the Azure portal.
+* **Azure App Service**: The default app service plan uses the `Basic` SKU with 1 CPU core and 1.75 GB RAM.
+  We recommend using a Premium-tier SKU, starting with 1 CPU core.
+  You can use autoscale rules or scheduled scaling rules,
+  and adjust the minimum and maximum instance counts based on load.
+* **Authentication**: By default, the deployed app is publicly accessible.
+  We recommend restricting access to authenticated users.
+  See [Enabling authentication](#enabling-authentication) above for how to enable authentication.
+* **Networking**: We recommend deploying inside a Virtual Network. If the app is only for
+  internal enterprise use, use a private DNS zone. Also consider using Azure API Management (APIM)
+  for firewalls and other forms of protection.
+  For more details, read [Azure OpenAI Landing Zone reference architecture](https://techcommunity.microsoft.com/t5/azure-architecture-blog/azure-openai-landing-zone-reference-architecture/ba-p/3882102).
+* **Load testing**: We recommend running a load test for your expected number of users.
+  You can use the [locust tool](https://docs.locust.io/) with the `locustfile.py` in this sample
+  or set up a load test with Azure Load Testing.
+
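The throughput figure in the **OpenAI Capacity** bullet is plain division of the TPM quota by the tokens spent per turn. The Python sketch below reproduces that arithmetic and inverts it to size a deployment for a target load. The helper names and the 20% headroom default are hypothetical choices for illustration, and the flat 1K tokens per turn is the same simplifying assumption the bullet makes; real turns may be larger once retrieved sources and chat history are added to the prompt.

```python
def conversations_per_minute(tpm: int, tokens_per_turn: int = 1_000) -> int:
    """Estimate how many message/response turns fit in a TPM (tokens per minute) quota.

    Hypothetical helper: assumes every turn costs the same number of tokens.
    """
    if tokens_per_turn <= 0:
        raise ValueError("tokens_per_turn must be positive")
    return tpm // tokens_per_turn


def required_tpm(turns_per_minute: int, tokens_per_turn: int = 1_000,
                 headroom_pct: int = 20) -> int:
    """Estimate the TPM needed for a target load plus a safety margin.

    The 20% default headroom is an illustrative assumption, not an Azure guideline.
    """
    base = turns_per_minute * tokens_per_turn
    # Integer ceiling division keeps the result exact (no float rounding).
    return (base * (100 + headroom_pct) + 99) // 100


# Default 30K TPM deployment at ~1K tokens per turn:
print(conversations_per_minute(30_000))  # 30
# TPM needed for 100 turns per minute with 20% headroom:
print(required_tpm(100))  # 120000
```

Treat the result as a starting point and confirm it against the Quotas tab and a load test.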
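To decide between the free semantic search option and the "standard" one, it helps to estimate monthly query volume. This rough Python sketch assumes one semantic query per user question, which may not hold if an approach issues several searches per question; the helper names are hypothetical and the user and question counts are illustrative placeholders, not measurements.

```python
# Free semantic search option quota, as stated in the README.
FREE_SEMANTIC_QUERIES_PER_MONTH = 1_000


def estimated_monthly_queries(daily_users: int, questions_per_user_per_day: int,
                              queries_per_question: int = 1,
                              days_per_month: int = 30) -> int:
    """Hypothetical helper: rough count of semantic queries in a month."""
    return daily_users * questions_per_user_per_day * queries_per_question * days_per_month


def exceeds_free_tier(monthly_queries: int) -> bool:
    """True if the estimate is above the free semantic search quota."""
    return monthly_queries > FREE_SEMANTIC_QUERIES_PER_MONTH


# Illustrative traffic only: 20 users asking 5 questions a day.
queries = estimated_monthly_queries(daily_users=20, questions_per_user_per_day=5)
print(queries, exceeds_free_tier(queries))  # 3000 True
# Questions per day that fit in the free quota over a 30-day month:
print(FREE_SEMANTIC_QUERIES_PER_MONTH // 30)  # 33
```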
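Before running a load test, you can predict whether the simulated user count should exceed your TPM quota. This sketch applies the interactive response time law for closed-loop tests, throughput ~ users / (response time + think time), which matches how locust users wait between tasks. The `LoadTestPlan` class is hypothetical, and the response time, think time, and tokens-per-request values are assumptions to replace with your own measurements.

```python
from dataclasses import dataclass


@dataclass
class LoadTestPlan:
    """Hypothetical closed-loop load test, e.g. N locust users looping over a task."""
    users: int                     # concurrent simulated users
    avg_response_s: float          # average time for the app to answer one question
    avg_think_s: float             # average wait between a user's requests
    tokens_per_request: int = 1_000

    def requests_per_minute(self) -> float:
        # Interactive response time law: X = N / (R + Z), scaled to per-minute.
        return 60 * self.users / (self.avg_response_s + self.avg_think_s)

    def tokens_per_minute(self) -> float:
        return self.requests_per_minute() * self.tokens_per_request

    def fits_quota(self, tpm_quota: int) -> bool:
        return self.tokens_per_minute() <= tpm_quota


# Assumed figures: 50 users, 5 s responses, 15 s average think time.
plan = LoadTestPlan(users=50, avg_response_s=5.0, avg_think_s=15.0)
print(plan.requests_per_minute())  # 150.0
print(plan.fits_quota(30_000))     # False
```

With locust's `between(min_wait, max_wait)`, think time is drawn uniformly, so its average is the midpoint of the two values. The estimate only holds while the app is not throttled; once the deployment starts returning rate-limit errors, response times change and the numbers measured by the load test take precedence.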
+
 ## Resources
 
 * [Revolutionize your Enterprise Data with ChatGPT: Next-gen Apps w/ Azure OpenAI and Cognitive Search](https://aka.ms/entgptsearchblog)