@@ -47,7 +48,7 @@ The repo includes sample data so it's ready to try end to end. In this sample ap
 
 > **IMPORTANT:** In order to deploy and run this example, you'll need an **Azure subscription with access enabled for the Azure OpenAI service**. You can request access [here](https://aka.ms/oaiapply). You can also visit [here](https://azure.microsoft.com/free/cognitive-search/) to get some free Azure credits to get you started.
 
-## Azure deployment
+## Azure deployment
 
 ### Cost estimation
 
@@ -166,7 +167,7 @@ To then limit access to a specific set of users or groups, you can follow the st
 
 ## Running locally
 
-You can only run locally **after** having successfully run the `azd up` command.
+You can only run locally **after** having successfully run the `azd up` command. If you haven't yet, follow the steps in [Azure deployment](#azure-deployment) above.
 
 1. Run `azd auth login`
 2. Change dir to `app`
@@ -183,6 +184,44 @@ Once in the web app:
 * Explore citations and sources
 * Click on "settings" to try different options, tweak prompts, etc.
 
+## Productionizing
+
+This sample is designed to be a starting point for your own production application,
+but you should do a thorough review of the security and performance before deploying
+to production. Here are some things to consider:
+
+* **OpenAI Capacity**: The default TPM (tokens per minute) is set to 30K. That is equivalent
+to approximately 30 conversations per minute (assuming 1K per user message/response).
+You can increase the capacity by changing the `chatGptDeploymentCapacity` and `embeddingDeploymentCapacity`
+parameters in `infra/main.bicep` to your account's maximum capacity.
+You can also view the Quotas tab in [Azure OpenAI studio](https://oai.azure.com/)
+to understand how much capacity you have.
+* **Azure Storage**: The default storage account uses the `Standard_LRS` SKU.
+To improve your resiliency, we recommend using `Standard_ZRS` for production deployments,
+which you can specify using the `sku` property under the `storage` module in `infra/main.bicep`.
+* **Azure Cognitive Search**: The default search service uses the `Standard` SKU
+with the free semantic search option, which gives you 1000 free queries a month.
+Assuming your app will experience more than 1000 questions, you should either change `semanticSearch`
+to "standard" or disable semantic search entirely in the `/app/backend/approaches` files.
+If you see errors about search service capacity being exceeded, you may find it helpful to increase
+the number of replicas by changing `replicaCount` in `infra/core/search/search-services.bicep`
+or manually scaling it from the Azure Portal.
+* **Azure App Service**: The default app service plan uses the `Basic` SKU with 1 CPU core and 1.75 GB RAM.
+We recommend using a Premium level SKU, starting with 1 CPU core.
+You can use auto-scaling rules or scheduled scaling rules,
+and scale up the maximum/minimum based on load.
+* **Authentication**: By default, the deployed app is publicly accessible.
+We recommend restricting access to authenticated users.
+See [Enabling authentication](#enabling-authentication) above for how to enable authentication.
+* **Networking**: We recommend deploying inside a Virtual Network. If the app is only for
+internal enterprise use, use a private DNS zone. Also consider using Azure API Management (APIM)
+for firewalls and other forms of protection.
+For more details, read [Azure OpenAI Landing Zone reference architecture](https://techcommunity.microsoft.com/t5/azure-architecture-blog/azure-openai-landing-zone-reference-architecture/ba-p/3882102).
+* **Loadtesting**: We recommend running a loadtest for your expected number of users.
+You can use the [locust tool](https://docs.locust.io/) with the `locustfile.py` in this sample
+or set up a loadtest with Azure Load Testing.
+
+
 ## Resources
 
 * [Revolutionize your Enterprise Data with ChatGPT: Next-gen Apps w/ Azure OpenAI and Cognitive Search](https://aka.ms/entgptsearchblog)
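
The capacity figures in the Productionizing notes lend themselves to a quick back-of-envelope check. The sketch below is not part of the repo: the function names are hypothetical, and it assumes, as the README text does, roughly 1K tokens per user message/response and one semantic search query per question.

```python
"""Back-of-envelope sizing for the defaults quoted in the Productionizing notes."""

DEFAULT_TPM = 30_000           # default tokens per minute quoted in the README
TOKENS_PER_EXCHANGE = 1_000    # README assumption: ~1K tokens per message/response
FREE_SEMANTIC_QUERIES = 1_000  # free semantic search queries per month, per the README


def conversations_per_minute(tpm: int, tokens_per_exchange: int = TOKENS_PER_EXCHANGE) -> int:
    """Whole message/response exchanges per minute that fit in a TPM budget."""
    if tokens_per_exchange <= 0:
        raise ValueError("tokens_per_exchange must be positive")
    return tpm // tokens_per_exchange


def required_tpm(target_per_minute: int, tokens_per_exchange: int = TOKENS_PER_EXCHANGE) -> int:
    """TPM budget needed to sustain a target number of exchanges per minute."""
    return target_per_minute * tokens_per_exchange


def exceeds_free_semantic_tier(questions_per_month: int) -> bool:
    """True if monthly traffic outgrows the free semantic search allowance.

    Assumes one semantic query per question (an illustrative simplification).
    """
    return questions_per_month > FREE_SEMANTIC_QUERIES


if __name__ == "__main__":
    print(conversations_per_minute(DEFAULT_TPM))  # 30, matching the README's estimate
    print(required_tpm(120))                      # 120000
    print(exceeds_free_semantic_tier(5_000))      # True
```

Real traffic rarely averages exactly 1K tokens per exchange, so treat these numbers as a starting point and confirm them with a load test before changing `chatGptDeploymentCapacity` or the `semanticSearch` setting.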