
Commit 457c4d2

Make it possible to deploy on free App Service and free Search Service tier (#1166)
* Draft of freer deployment
* Update readme
* Finish docs
* Clean up prepdocs
* Set semanticSearch accordingly
* Making the UI honor semantic search being turned off
* Use azd env vars for vectors, PDF
* Tests
* Honor vectors and search in deployed version
* Fix dependency issues with keyvault, provide details for free account deployment
* Preamble re free accts
* Adding environment variable for semantic ranker level, default to free
* Fix conditional deployment issue
1 parent 62c5ae8 commit 457c4d2

16 files changed: +419 -116 lines changed

README.md

Lines changed: 2 additions & 1 deletion
@@ -95,7 +95,8 @@ However, you can try the [Azure pricing calculator](https://azure.com/e/8ffbe5b1
9595
- Azure Blob Storage: Standard tier with ZRS (Zone-redundant storage). Pricing per storage and read operations. [Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/)
9696
- Azure Monitor: Pay-as-you-go tier. Costs based on data ingested. [Pricing](https://azure.microsoft.com/pricing/details/monitor/)
9797

98-
To reduce costs, you can switch to free SKUs for Azure App Service and Azure AI Document Intelligence by changing the parameters file under the `infra` folder. There are some limits to consider; for example, the free Azure AI Document Intelligence resource only analyzes the first 2 pages of each document. You can also reduce costs associated with the Azure AI Document Intelligence by reducing the number of documents in the `data` folder, or by removing the postprovision hook in `azure.yaml` that runs the `prepdocs.py` script.
98+
To reduce costs, you can switch to free SKUs for various services, but those SKUs have limitations.
99+
See this guide on [deploying with minimal costs](docs/deploy_lowcost.md) for more details.
99100

100101
⚠️ To avoid unnecessary costs, remember to take down your app if it's no longer in use,
101102
either by deleting the resource group in the Portal or running `azd down`.

app/backend/app.py

Lines changed: 33 additions & 13 deletions
@@ -5,8 +5,10 @@
55
import mimetypes
66
import os
77
from pathlib import Path
8-
from typing import Any, AsyncGenerator, Dict, cast
8+
from typing import Any, AsyncGenerator, Dict, Union, cast
99

10+
from azure.core.credentials import AzureKeyCredential
11+
from azure.core.credentials_async import AsyncTokenCredential
1012
from azure.core.exceptions import ResourceNotFoundError
1113
from azure.identity.aio import DefaultAzureCredential, get_bearer_token_provider
1214
from azure.keyvault.secrets.aio import SecretClient
@@ -46,6 +48,8 @@
4648
CONFIG_GPT4V_DEPLOYED,
4749
CONFIG_OPENAI_CLIENT,
4850
CONFIG_SEARCH_CLIENT,
51+
CONFIG_SEMANTIC_RANKER_DEPLOYED,
52+
CONFIG_VECTOR_SEARCH_ENABLED,
4953
)
5054
from core.authentication import AuthenticationHelper
5155
from decorators import authenticated, authenticated_path
@@ -192,7 +196,13 @@ def auth_setup():
192196

193197
@bp.route("/config", methods=["GET"])
194198
def config():
195-
return jsonify({"showGPT4VOptions": current_app.config[CONFIG_GPT4V_DEPLOYED]})
199+
return jsonify(
200+
{
201+
"showGPT4VOptions": current_app.config[CONFIG_GPT4V_DEPLOYED],
202+
"showSemanticRankerOption": current_app.config[CONFIG_SEMANTIC_RANKER_DEPLOYED],
203+
"showVectorOption": current_app.config[CONFIG_VECTOR_SEARCH_ENABLED],
204+
}
205+
)
196206

197207

198208
@bp.before_app_serving
@@ -202,6 +212,7 @@ async def setup_clients():
202212
AZURE_STORAGE_CONTAINER = os.environ["AZURE_STORAGE_CONTAINER"]
203213
AZURE_SEARCH_SERVICE = os.environ["AZURE_SEARCH_SERVICE"]
204214
AZURE_SEARCH_INDEX = os.environ["AZURE_SEARCH_INDEX"]
215+
SEARCH_SECRET_NAME = os.getenv("SEARCH_SECRET_NAME")
205216
VISION_SECRET_NAME = os.getenv("VISION_SECRET_NAME")
206217
AZURE_KEY_VAULT_NAME = os.getenv("AZURE_KEY_VAULT_NAME")
207218
# Shared by all OpenAI deployments
@@ -232,6 +243,7 @@ async def setup_clients():
232243

233244
AZURE_SEARCH_QUERY_LANGUAGE = os.getenv("AZURE_SEARCH_QUERY_LANGUAGE", "en-us")
234245
AZURE_SEARCH_QUERY_SPELLER = os.getenv("AZURE_SEARCH_QUERY_SPELLER", "lexicon")
246+
AZURE_SEARCH_SEMANTIC_RANKER = os.getenv("AZURE_SEARCH_SEMANTIC_RANKER", "free").lower()
235247

236248
USE_GPT4V = os.getenv("USE_GPT4V", "").lower() == "true"
237249

@@ -241,16 +253,31 @@ async def setup_clients():
241253
# If you encounter a blocking error during a DefaultAzureCredential resolution, you can exclude the problematic credential by using a parameter (ex. exclude_shared_token_cache_credential=True)
242254
azure_credential = DefaultAzureCredential(exclude_shared_token_cache_credential=True)
243255

256+
# Fetch any necessary secrets from Key Vault
257+
vision_key = None
258+
search_key = None
259+
if AZURE_KEY_VAULT_NAME and (VISION_SECRET_NAME or SEARCH_SECRET_NAME):
260+
key_vault_client = SecretClient(
261+
vault_url=f"https://{AZURE_KEY_VAULT_NAME}.vault.azure.net", credential=azure_credential
262+
)
263+
vision_key = (await key_vault_client.get_secret(VISION_SECRET_NAME)).value if VISION_SECRET_NAME else None
264+
search_key = (await key_vault_client.get_secret(SEARCH_SECRET_NAME)).value if SEARCH_SECRET_NAME else None
265+
await key_vault_client.close()
266+
244267
# Set up clients for AI Search and Storage
268+
search_credential: Union[AsyncTokenCredential, AzureKeyCredential] = (
269+
AzureKeyCredential(search_key) if search_key else azure_credential
270+
)
245271
search_client = SearchClient(
246272
endpoint=f"https://{AZURE_SEARCH_SERVICE}.search.windows.net",
247273
index_name=AZURE_SEARCH_INDEX,
248-
credential=azure_credential,
274+
credential=search_credential,
249275
)
250276
search_index_client = SearchIndexClient(
251277
endpoint=f"https://{AZURE_SEARCH_SERVICE}.search.windows.net",
252-
credential=azure_credential,
278+
credential=search_credential,
253279
)
280+
254281
blob_client = BlobServiceClient(
255282
account_url=f"https://{AZURE_STORAGE_ACCOUNT}.blob.core.windows.net", credential=azure_credential
256283
)
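
The secret handling in the hunk above can be sketched in isolation. This is a minimal sketch under stated assumptions, not the project's code: `FakeSecretClient`, `fetch_secrets`, and `choose_search_credential` are hypothetical names, and the stub stands in for the async Key Vault `SecretClient` so the example runs without Azure. It also assumes each secret should be requested only when its name is configured.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class FakeSecret:
    value: str


class FakeSecretClient:
    """Hypothetical stand-in for the async Key Vault SecretClient."""

    def __init__(self, secrets: Dict[str, str]) -> None:
        self._secrets = secrets
        self.closed = False

    async def get_secret(self, name: str) -> FakeSecret:
        return FakeSecret(self._secrets[name])

    async def close(self) -> None:
        self.closed = True


async def fetch_secrets(
    client: FakeSecretClient,
    vision_secret_name: Optional[str],
    search_secret_name: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Fetch only the secrets whose names are configured, then close the client."""
    vision_key = None
    search_key = None
    try:
        if vision_secret_name:
            vision_key = (await client.get_secret(vision_secret_name)).value
        if search_secret_name:
            search_key = (await client.get_secret(search_secret_name)).value
    finally:
        await client.close()
    return vision_key, search_key


def choose_search_credential(search_key: Optional[str], token_credential: object) -> object:
    """Prefer the API key when one was fetched; otherwise use the token credential.

    The tuple is a placeholder; the real code wraps the key in AzureKeyCredential.
    """
    return ("key", search_key) if search_key else token_credential


client = FakeSecretClient({"search-key": "abc123"})
vision_key, search_key = asyncio.run(fetch_secrets(client, None, "search-key"))
credential = choose_search_credential(search_key, "managed-identity")
```

With only the search secret configured, the vision key stays `None` and the search client would be built with the key-based credential; with no search key, the same function falls back to the token credential.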
@@ -267,15 +294,6 @@ async def setup_clients():
267294
require_access_control=AZURE_ENFORCE_ACCESS_CONTROL,
268295
)
269296

270-
vision_key = None
271-
if VISION_SECRET_NAME and AZURE_KEY_VAULT_NAME: # Cognitive vision keys are stored in keyvault
272-
key_vault_client = SecretClient(
273-
vault_url=f"https://{AZURE_KEY_VAULT_NAME}.vault.azure.net", credential=azure_credential
274-
)
275-
vision_secret = await key_vault_client.get_secret(VISION_SECRET_NAME)
276-
vision_key = vision_secret.value
277-
await key_vault_client.close()
278-
279297
# Used by the OpenAI SDK
280298
openai_client: AsyncOpenAI
281299

@@ -301,6 +319,8 @@ async def setup_clients():
301319
current_app.config[CONFIG_AUTH_CLIENT] = auth_helper
302320

303321
current_app.config[CONFIG_GPT4V_DEPLOYED] = bool(USE_GPT4V)
322+
current_app.config[CONFIG_SEMANTIC_RANKER_DEPLOYED] = AZURE_SEARCH_SEMANTIC_RANKER != "disabled"
323+
current_app.config[CONFIG_VECTOR_SEARCH_ENABLED] = os.getenv("USE_VECTORS", "").lower() != "false"
304324

305325
# Various approaches to integrate GPT and external knowledge, most applications will use a single one of these patterns
306326
# or some derivative, here we include several for exploration purposes
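
The way the `/config` flags follow from the environment variables in this file can be illustrated with a small standalone function. This is a sketch only: `feature_flags` is a hypothetical helper, and it takes a plain mapping instead of reading `os.environ` so it can be exercised without a running app. The variable names and default values mirror the ones shown in the diff.

```python
from typing import Dict, Mapping


def feature_flags(env: Mapping[str, str]) -> Dict[str, bool]:
    """Derive the /config payload flags from environment-style settings."""
    # The semantic ranker level defaults to "free"; only "disabled" hides the option.
    semantic_ranker = env.get("AZURE_SEARCH_SEMANTIC_RANKER", "free").lower()
    return {
        "showGPT4VOptions": env.get("USE_GPT4V", "").lower() == "true",
        "showSemanticRankerOption": semantic_ranker != "disabled",
        # Vector search stays enabled unless USE_VECTORS is explicitly "false".
        "showVectorOption": env.get("USE_VECTORS", "").lower() != "false",
    }


defaults = feature_flags({})
low_cost = feature_flags({"AZURE_SEARCH_SEMANTIC_RANKER": "disabled", "USE_VECTORS": "false"})
```

Note the asymmetry: GPT-4V is opt-in (it must be set to `true`), while the semantic ranker and vector options are opt-out.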

app/backend/config.py

Lines changed: 2 additions & 0 deletions
@@ -7,5 +7,7 @@
77
CONFIG_BLOB_CONTAINER_CLIENT = "blob_container_client"
88
CONFIG_AUTH_CLIENT = "auth_client"
99
CONFIG_GPT4V_DEPLOYED = "gpt4v_deployed"
10+
CONFIG_SEMANTIC_RANKER_DEPLOYED = "semantic_ranker_deployed"
11+
CONFIG_VECTOR_SEARCH_ENABLED = "vector_search_enabled"
1012
CONFIG_SEARCH_CLIENT = "search_client"
1113
CONFIG_OPENAI_CLIENT = "openai_client"

app/frontend/src/api/models.ts

Lines changed: 2 additions & 0 deletions
@@ -80,4 +80,6 @@ export type ChatAppRequest = {
8080

8181
export type Config = {
8282
showGPT4VOptions: boolean;
83+
showSemanticRankerOption: boolean;
84+
showVectorOption: boolean;
8385
};

app/frontend/src/pages/chat/Chat.tsx

Lines changed: 24 additions & 11 deletions
@@ -59,12 +59,20 @@ const Chat = () => {
5959
const [answers, setAnswers] = useState<[user: string, response: ChatAppResponse][]>([]);
6060
const [streamedAnswers, setStreamedAnswers] = useState<[user: string, response: ChatAppResponse][]>([]);
6161
const [showGPT4VOptions, setShowGPT4VOptions] = useState<boolean>(false);
62+
const [showSemanticRankerOption, setShowSemanticRankerOption] = useState<boolean>(false);
63+
const [showVectorOption, setShowVectorOption] = useState<boolean>(false);
6264

6365
const getConfig = async () => {
6466
const token = client ? await getToken(client) : undefined;
6567

6668
configApi(token).then(config => {
6769
setShowGPT4VOptions(config.showGPT4VOptions);
70+
setUseSemanticRanker(config.showSemanticRankerOption);
71+
setShowSemanticRankerOption(config.showSemanticRankerOption);
72+
setShowVectorOption(config.showVectorOption);
73+
if (!config.showVectorOption) {
74+
setRetrievalMode(RetrievalMode.Text);
75+
}
6876
});
6977
};
7078

@@ -374,12 +382,15 @@ const Chat = () => {
374382
onChange={onRetrieveCountChange}
375383
/>
376384
<TextField className={styles.chatSettingsSeparator} label="Exclude category" onChange={onExcludeCategoryChanged} />
377-
<Checkbox
378-
className={styles.chatSettingsSeparator}
379-
checked={useSemanticRanker}
380-
label="Use semantic ranker for retrieval"
381-
onChange={onUseSemanticRankerChange}
382-
/>
385+
386+
{showSemanticRankerOption && (
387+
<Checkbox
388+
className={styles.chatSettingsSeparator}
389+
checked={useSemanticRanker}
390+
label="Use semantic ranker for retrieval"
391+
onChange={onUseSemanticRankerChange}
392+
/>
393+
)}
383394
<Checkbox
384395
className={styles.chatSettingsSeparator}
385396
checked={useSemanticCaptions}
@@ -405,11 +416,13 @@ const Chat = () => {
405416
/>
406417
)}
407418

408-
<VectorSettings
409-
showImageOptions={useGPT4V && showGPT4VOptions}
410-
updateVectorFields={(options: VectorFieldOptions[]) => setVectorFieldList(options)}
411-
updateRetrievalMode={(retrievalMode: RetrievalMode) => setRetrievalMode(retrievalMode)}
412-
/>
419+
{showVectorOption && (
420+
<VectorSettings
421+
showImageOptions={useGPT4V && showGPT4VOptions}
422+
updateVectorFields={(options: VectorFieldOptions[]) => setVectorFieldList(options)}
423+
updateRetrievalMode={(retrievalMode: RetrievalMode) => setRetrievalMode(retrievalMode)}
424+
/>
425+
)}
413426

414427
{useLogin && (
415428
<Checkbox

app/frontend/src/pages/oneshot/OneShot.tsx

Lines changed: 25 additions & 11 deletions
@@ -33,6 +33,8 @@ export function Component(): JSX.Element {
3333
const [useOidSecurityFilter, setUseOidSecurityFilter] = useState<boolean>(false);
3434
const [useGroupsSecurityFilter, setUseGroupsSecurityFilter] = useState<boolean>(false);
3535
const [showGPT4VOptions, setShowGPT4VOptions] = useState<boolean>(false);
36+
const [showSemanticRankerOption, setShowSemanticRankerOption] = useState<boolean>(false);
37+
const [showVectorOption, setShowVectorOption] = useState<boolean>(false);
3638

3739
const lastQuestionRef = useRef<string>("");
3840

@@ -50,6 +52,12 @@ export function Component(): JSX.Element {
5052

5153
configApi(token).then(config => {
5254
setShowGPT4VOptions(config.showGPT4VOptions);
55+
setUseSemanticRanker(config.showSemanticRankerOption);
56+
setShowSemanticRankerOption(config.showSemanticRankerOption);
57+
setShowVectorOption(config.showVectorOption);
58+
if (!config.showVectorOption) {
59+
setRetrievalMode(RetrievalMode.Text);
60+
}
5361
});
5462
};
5563

@@ -237,12 +245,16 @@ export function Component(): JSX.Element {
237245
onChange={onRetrieveCountChange}
238246
/>
239247
<TextField className={styles.oneshotSettingsSeparator} label="Exclude category" onChange={onExcludeCategoryChanged} />
240-
<Checkbox
241-
className={styles.oneshotSettingsSeparator}
242-
checked={useSemanticRanker}
243-
label="Use semantic ranker for retrieval"
244-
onChange={onUseSemanticRankerChange}
245-
/>
248+
249+
{showSemanticRankerOption && (
250+
<Checkbox
251+
className={styles.oneshotSettingsSeparator}
252+
checked={useSemanticRanker}
253+
label="Use semantic ranker for retrieval"
254+
onChange={onUseSemanticRankerChange}
255+
/>
256+
)}
257+
246258
<Checkbox
247259
className={styles.oneshotSettingsSeparator}
248260
checked={useSemanticCaptions}
@@ -262,11 +274,13 @@ export function Component(): JSX.Element {
262274
/>
263275
)}
264276

265-
<VectorSettings
266-
showImageOptions={useGPT4V && showGPT4VOptions}
267-
updateVectorFields={(options: VectorFieldOptions[]) => setVectorFieldList(options)}
268-
updateRetrievalMode={(retrievalMode: RetrievalMode) => setRetrievalMode(retrievalMode)}
269-
/>
277+
{showVectorOption && (
278+
<VectorSettings
279+
showImageOptions={useGPT4V && showGPT4VOptions}
280+
updateVectorFields={(options: VectorFieldOptions[]) => setVectorFieldList(options)}
281+
updateRetrievalMode={(retrievalMode: RetrievalMode) => setRetrievalMode(retrievalMode)}
282+
/>
283+
)}
270284

271285
{useLogin && (
272286
<Checkbox

docs/deploy_lowcost.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
1+
# Deploying with minimal costs
2+
3+
This AI RAG chat application is designed to be easily deployed using the Azure Developer CLI, which provisions the infrastructure according to the Bicep files in the `infra` folder. Those files describe each of the Azure resources needed and configure their SKU (pricing tier) and other parameters. Many Azure services offer a free tier, but the infrastructure files in this project do *not* default to the free tier, as that tier often has limitations.
4+
5+
However, if your goal is to minimize costs while prototyping your application, follow the steps below _before_ deploying the application.
6+
7+
1. Use the free tier of App Service:
8+
9+
```shell
10+
azd env set AZURE_APP_SERVICE_SKU F1
11+
```
12+
13+
Limitation: You are only allowed a certain number of free App Service instances per region. If you have exceeded your limit in a region, you will get an error during the provisioning stage. If that happens, you can run `azd down`, then `azd env new` to create a new environment with a new region.
14+
15+
2. Use the free tier of Azure AI Search:
16+
17+
```shell
18+
azd env set AZURE_SEARCH_SERVICE_SKU free
19+
```
20+
21+
Limitations:
22+
1. You are only allowed one free search service across all regions.
23+
If you have one already, either delete that service or follow instructions to
24+
reuse your [existing search service](../README.md#existing-azure-ai-search-resource).
25+
2. The free tier does not support semantic ranker, so the app UI will no longer display
26+
the option to use the semantic ranker. Note that this will generally result in [decreased search relevance](https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-search-outperforming-vector-search-with-hybrid/ba-p/3929167).
27+
3. The free tier does not support Managed Identity (keyless API access),
28+
so the Bicep will use Azure Key Vault to securely store the key instead.
29+
30+
3. Use the free tier of Azure Document Intelligence (used in analyzing PDFs):
31+
32+
```shell
33+
azd env set AZURE_FORMRECOGNIZER_SKU F0
34+
```
35+
36+
Limitation: The free tier will only scan the first two pages of each PDF.
37+
In our sample documents, those first two pages are just title pages,
38+
so you won't be able to get answers from the documents.
39+
You can either use your own documents that are only two pages long,
40+
or you can use a local Python package for PDF parsing by setting:
41+
42+
```shell
43+
azd env set USE_LOCAL_PDF_PARSER true
44+
```
45+
46+
4. Turn off Azure Monitor (Application Insights):
47+
48+
```shell
49+
azd env set AZURE_USE_APPLICATION_INSIGHTS false
50+
```
51+
52+
Application Insights is already quite inexpensive, so turning it off may not be worth the small savings,
53+
but it is an option for those who want to minimize costs.
54+
55+
5. Disable vector search:
56+
57+
```shell
58+
azd env set USE_VECTORS false
59+
```
60+
61+
By default, the application computes vector embeddings for documents during the data ingestion phase,
62+
and then computes a vector embedding for user questions asked in the application.
63+
Those computations require an embedding model, which incurs costs per token used. The costs are fairly low,
64+
so the benefits of vector search would typically outweigh the costs, but it is possible to disable vector support.
65+
If you do so, the application will fall back to a keyword search, which is less accurate.
66+
67+
6. Once you've made the desired customizations, follow the steps [to run `azd up`](../README.md#deploying-from-scratch).
68+
69+
## Deploying from an Azure free account
70+
71+
There are additional limitations for Azure free accounts (as opposed to "Pay-as-you-go" accounts which have billing enabled).
72+
73+
As of January 2024, Azure free accounts cannot sign up for Azure OpenAI access.
74+
You can instead sign up for an openai.com account. Follow these [directions to specify your OpenAI host and key](../README.md#openaicom-openai).
75+
76+
## Reducing costs locally
77+
78+
To save costs for local development, you could use a local OpenAI-compatible model.
79+
Follow the steps in the [local development guide](localdev.md#using-a-local-openai-compatible-api).
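
The settings from the guide above can be collected in one place and turned into the corresponding `azd env set` commands. This is a convenience sketch, not part of the commit: `LOW_COST_SETTINGS` and `azd_env_commands` are hypothetical names, while the variable names and values are the ones listed in the guide. It only builds the command strings and does not run them.

```python
from typing import List, Mapping

# Values taken from the low-cost deployment guide; apply only the ones you want.
LOW_COST_SETTINGS = {
    "AZURE_APP_SERVICE_SKU": "F1",
    "AZURE_SEARCH_SERVICE_SKU": "free",
    "AZURE_FORMRECOGNIZER_SKU": "F0",
    "USE_LOCAL_PDF_PARSER": "true",
    "AZURE_USE_APPLICATION_INSIGHTS": "false",
    "USE_VECTORS": "false",
}


def azd_env_commands(settings: Mapping[str, str]) -> List[str]:
    """Build one `azd env set NAME VALUE` command string per setting."""
    return [f"azd env set {name} {value}" for name, value in settings.items()]


commands = azd_env_commands(LOW_COST_SETTINGS)
for command in commands:
    print(command)
```

Each setting is optional and carries the limitation described in its step, so in practice you would pick a subset of this mapping before running `azd up`.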

infra/core/search/search-services.bicep

Lines changed: 2 additions & 1 deletion
@@ -40,7 +40,8 @@ resource search 'Microsoft.Search/searchServices@2021-04-01-preview' = {
4040
name: name
4141
location: location
4242
tags: tags
43-
identity: {
43+
// The free tier does not support managed identity
44+
identity: (sku.name == 'free') ? null : {
4445
type: 'SystemAssigned'
4546
}
4647
properties: {
