fix: Add deployment options and VM credentials configuration to LOCAL_DEPLOYMENT.md

Pavan-Microsoft · Pavan-Microsoft · commit 291224b0f049 · 2025-09-22T22:46:01.000+05:30
diff --git a/docs/LOCAL_DEPLOYMENT.md b/docs/LOCAL_DEPLOYMENT.md
@@ -32,6 +32,45 @@ azd env set USE_KEY_VAULT false
 
 Also please refer to the section on [setting up RBAC auth](#authenticate-using-rbac).
 
+## Deployment Options & Steps
+
+### Sandbox or WAF Aligned Deployment Options
+
+The [`infra`](../infra) folder of the Chat With Your Data Solution Accelerator contains the [`main.bicep`](../infra/main.bicep) Bicep script, which defines all Azure infrastructure components for this solution.
+
+By default, the `azd up` command uses the [`main.parameters.json`](../infra/main.parameters.json) file to deploy the solution. This file is pre-configured for a **sandbox environment** — ideal for development and proof-of-concept scenarios, with minimal security and cost controls for rapid iteration.
+
+For **production deployments**, the repository also provides [`main.waf.parameters.json`](../infra/main.waf.parameters.json), which applies a [Well-Architected Framework (WAF) aligned](https://learn.microsoft.com/en-us/azure/well-architected/) configuration. This option enables additional Azure best practices for reliability, security, cost optimization, operational excellence, and performance efficiency, such as:
+
+  - Enhanced network security (e.g., Network protection with private endpoints)
+  - Stricter access controls and managed identities
+  - Logging, monitoring, and diagnostics enabled by default
+  - Resource tagging and cost management recommendations
+
+**How to choose your deployment configuration:**
+
+* Use the default `main.parameters.json` file for a **sandbox/dev environment**
+* For a **WAF-aligned, production-ready deployment**, copy the contents of `main.waf.parameters.json` into `main.parameters.json` before running `azd up`
+
+---
+
+### VM Credentials Configuration
+
+By default, the solution sets the VM administrator username and password from environment variables.
+
+To set your own VM credentials before deployment, use:
+
+```sh
+azd env set AZURE_ENV_VM_ADMIN_USERNAME <your-username>
+azd env set AZURE_ENV_VM_ADMIN_PASSWORD <your-password>
+```
+
+> [!TIP]
+> Always review and adjust parameter values (such as region, capacity, security settings and log analytics workspace configuration) to match your organization's requirements before deploying. For production, ensure you have sufficient quota and follow the principle of least privilege for all identities and role assignments.
+
+> [!IMPORTANT]
+> The WAF-aligned configuration is under active development. More Azure Well-Architected recommendations will be added in future updates.
+
 ## Detailed Development Container setup instructions
 
 The solution contains a [development container](https://code.visualstudio.com/docs/remote/containers) with all the required tooling to develop and deploy the accelerator. To deploy the Chat With Your Data accelerator using the provided development container you will also need:
@@ -175,56 +214,90 @@ Execute the above [shell command](#L81) to run the function locally. You may nee
 
 | App Setting | Value | Note |
 | --- | --- | ------------- |
-|AZURE_SEARCH_SERVICE||The URL of your Azure AI Search resource. e.g. https://<search-service>.search.windows.net|
-|AZURE_SEARCH_INDEX||The name of your Azure AI Search Index|
-|AZURE_SEARCH_KEY||An **admin key** for your Azure AI Search resource|
-|AZURE_SEARCH_USE_SEMANTIC_SEARCH|False|Whether or not to use semantic search|
-|AZURE_SEARCH_SEMANTIC_SEARCH_CONFIG|default|The name of the semantic search configuration to use if using semantic search.|
-|AZURE_SEARCH_TOP_K|5|The number of documents to retrieve from Azure AI Search.|
-|AZURE_SEARCH_ENABLE_IN_DOMAIN|True|Limits responses to only queries relating to your data.|
+|ADVANCED_IMAGE_PROCESSING_MAX_IMAGES | 1 | The maximum number of images to pass to the vision model in a single request|
+|APPLICATIONINSIGHTS_CONNECTION_STRING||The Application Insights connection string to store the application logs|
+|APP_ENV | Prod | Application Environment (Prod, Dev, etc.)|
+|AZURE_AUTH_TYPE | keys | The default is to use API keys. Change the value to 'rbac' to authenticate using Role Based Access Control. For more information refer to section [Authenticate using RBAC](#authenticate-using-rbac)|
+|AZURE_BLOB_ACCOUNT_KEY||The key of the Azure Blob Storage for storing the original documents to be processed|
+|AZURE_BLOB_ACCOUNT_NAME||The name of the Azure Blob Storage for storing the original documents to be processed|
+|AZURE_BLOB_CONTAINER_NAME||The name of the Container in the Azure Blob Storage for storing the original documents to be processed|
+|AZURE_CLIENT_ID | | Client ID for Azure authentication (required for LangChain AzureSearch vector store)|
+|AZURE_COMPUTER_VISION_ENDPOINT | | The endpoint of the Azure Computer Vision service (if useAdvancedImageProcessing=true)|
+|AZURE_COMPUTER_VISION_VECTORIZE_IMAGE_API_VERSION | 2024-02-01 | The API version for Azure Computer Vision Vectorize Image|
+|AZURE_COMPUTER_VISION_VECTORIZE_IMAGE_MODEL_VERSION | 2023-04-15 | The model version for Azure Computer Vision Vectorize Image|
+|AZURE_CONTENT_SAFETY_ENDPOINT | | The endpoint of the Azure AI Content Safety service|
+|AZURE_CONTENT_SAFETY_KEY | | The key of the Azure AI Content Safety service|
+|AZURE_COSMOSDB_ACCOUNT_NAME | | The name of the Azure Cosmos DB account (when using CosmosDB)|
+|AZURE_COSMOSDB_CONVERSATIONS_CONTAINER_NAME | | The name of the Azure Cosmos DB conversations container (when using CosmosDB)|
+|AZURE_COSMOSDB_DATABASE_NAME | | The name of the Azure Cosmos DB database (when using CosmosDB)|
+|AZURE_COSMOSDB_ENABLE_FEEDBACK | true | Whether to enable feedback functionality in Cosmos DB|
+|AZURE_FORM_RECOGNIZER_ENDPOINT||The name of the Azure Form Recognizer for extracting the text from the documents|
+|AZURE_FORM_RECOGNIZER_KEY||The key of the Azure Form Recognizer for extracting the text from the documents|
+|AZURE_KEY_VAULT_ENDPOINT | | The endpoint of the Azure Key Vault for storing secrets|
+|AZURE_OPENAI_API_KEY||One of the API keys of your Azure OpenAI resource|
+|AZURE_OPENAI_API_VERSION|2024-02-01|API version when using Azure OpenAI on your data|
+|AZURE_OPENAI_EMBEDDING_MODEL|text-embedding-ada-002|The name of your Azure OpenAI embeddings model deployment|
+|AZURE_OPENAI_EMBEDDING_MODEL_NAME|text-embedding-ada-002|The name of the embeddings model (can be found in Azure AI Foundry)|
+|AZURE_OPENAI_EMBEDDING_MODEL_VERSION|2|The version of the embeddings model to use (can be found in Azure AI Foundry)|
+|AZURE_OPENAI_MAX_TOKENS|1000|The maximum number of tokens allowed for the generated answer.|
+|AZURE_OPENAI_MODEL||The name of your model deployment|
+|AZURE_OPENAI_MODEL_NAME|gpt-4.1|The name of the model|
+|AZURE_OPENAI_MODEL_VERSION|2024-05-13|The version of the model to use|
+|AZURE_OPENAI_RESOURCE||the name of your Azure OpenAI resource|
+|AZURE_OPENAI_STOP_SEQUENCE||Up to 4 sequences where the API will stop generating further tokens. Represent these as a string joined with "|", e.g. `"stop1|stop2|stop3"`|
+|AZURE_OPENAI_STREAM | true | Whether or not to stream responses from Azure OpenAI|
+|AZURE_OPENAI_SYSTEM_MESSAGE|You are an AI assistant that helps people find information.|A brief description of the role and tone the model should use|
+|AZURE_OPENAI_TEMPERATURE|0|What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. A value of 0 is recommended when using your data.|
+|AZURE_OPENAI_TOP_P|1.0|An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. We recommend setting this to 1.0 when using your data.|
+|AZURE_POSTGRESQL_DATABASE_NAME | postgres | The name of the Azure PostgreSQL database (when using PostgreSQL)|
+|AZURE_POSTGRESQL_HOST_NAME | | The hostname of the Azure PostgreSQL server (when using PostgreSQL)|
+|AZURE_POSTGRESQL_USER | | The username for Azure PostgreSQL authentication (when using PostgreSQL)|
+|AZURE_SEARCH_CHUNK_COLUMN | chunk | Field from your Azure AI Search index that contains chunk information|
 |AZURE_SEARCH_CONTENT_COLUMN||List of fields in your Azure AI Search index that contains the text content of your documents to use when formulating a bot response. Represent these as a string joined with "|", e.g. `"product_description|product_manual"`|
 |AZURE_SEARCH_CONTENT_VECTOR_COLUMN||Field from your Azure AI Search index for storing the content's Vector embeddings|
+|AZURE_SEARCH_CONVERSATIONS_LOG_INDEX | conversations | The name of the Azure AI Search conversations log index|
+|AZURE_SEARCH_DATASOURCE_NAME | | The name of the Azure AI Search datasource|
 |AZURE_SEARCH_DIMENSIONS|1536| Azure OpenAI Embeddings dimensions. 1536 for `text-embedding-ada-002`. A full list of dimensions can be found [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#embeddings-models). |
+|AZURE_SEARCH_ENABLE_IN_DOMAIN|True|Limits responses to only queries relating to your data.|
 |AZURE_SEARCH_FIELDS_ID|id|`AZURE_SEARCH_FIELDS_ID`: Field from your Azure AI Search index that gives a unique idenitfier of the document chunk. `id` if you don't have a specific requirement.|
+|AZURE_SEARCH_FIELDS_METADATA|metadata|Field from your Azure AI Search index that contains metadata for the document. `metadata` if you don't have a specific requirement.|
+|AZURE_SEARCH_FIELDS_TAG|tag|Field from your Azure AI Search index that contains tags for the document. `tag` if you don't have a specific requirement.|
 |AZURE_SEARCH_FILENAME_COLUMN||`AZURE_SEARCH_FILENAME_COLUMN`: Field from your Azure AI Search index that gives a unique idenitfier of the source of your data to display in the UI.|
-|AZURE_SEARCH_TITLE_COLUMN||Field from your Azure AI Search index that gives a relevant title or header for your data content to display in the UI.|
+|AZURE_SEARCH_FILTER||Filter to apply to search queries.|
+|AZURE_SEARCH_INDEX||The name of your Azure AI Search Index|
+|AZURE_SEARCH_INDEXER_NAME | | The name of the Azure AI Search indexer|
+|AZURE_SEARCH_INDEX_IS_PRECHUNKED | false | Whether the search index is prechunked|
+|AZURE_SEARCH_KEY||An **admin key** for your Azure AI Search resource|
+|AZURE_SEARCH_LAYOUT_TEXT_COLUMN|layoutText|Field from your Azure AI Search index that contains the layout-aware text content of your documents. `layoutText` if you don't have a specific requirement.|
+|AZURE_SEARCH_OFFSET_COLUMN | offset | Field from your Azure AI Search index that contains offset information|
+|AZURE_SEARCH_SEMANTIC_SEARCH_CONFIG|default|The name of the semantic search configuration to use if using semantic search.|
+|AZURE_SEARCH_SERVICE||The URL of your Azure AI Search resource. e.g. https://<search-service>.search.windows.net|
 |AZURE_SEARCH_SOURCE_COLUMN|source|Field from your Azure AI Search index that identifies the source of your data. `source` if you don't have a specific requirement.|
 |AZURE_SEARCH_TEXT_COLUMN|text|Field from your Azure AI Search index that contains the main text content of your documents. `text` if you don't have a specific requirement.|
-|AZURE_SEARCH_LAYOUT_TEXT_COLUMN|layoutText|Field from your Azure AI Search index that contains the layout-aware text content of your documents. `layoutText` if you don't have a specific requirement.|
+|AZURE_SEARCH_TITLE_COLUMN||Field from your Azure AI Search index that gives a relevant title or header for your data content to display in the UI.|
+|AZURE_SEARCH_TOP_K|5|The number of documents to retrieve from Azure AI Search.|
 |AZURE_SEARCH_URL_COLUMN||Field from your Azure AI Search index that contains a URL for the document, e.g. an Azure Blob Storage URI. This value is not currently used.|
-|AZURE_SEARCH_FIELDS_TAG|tag|Field from your Azure AI Search index that contains tags for the document. `tag` if you don't have a specific requirement.|
-|AZURE_SEARCH_FIELDS_METADATA|metadata|Field from your Azure AI Search index that contains metadata for the document. `metadata` if you don't have a specific requirement.|
-|AZURE_SEARCH_FILTER||Filter to apply to search queries.|
 |AZURE_SEARCH_USE_INTEGRATED_VECTORIZATION ||Whether to use [Integrated Vectorization](https://learn.microsoft.com/en-us/azure/search/vector-search-integrated-vectorization)|
-|AZURE_OPENAI_RESOURCE||the name of your Azure OpenAI resource|
-|AZURE_OPENAI_MODEL||The name of your model deployment|
-|AZURE_OPENAI_MODEL_NAME|gpt-4.1|The name of the model|
-|AZURE_OPENAI_MODEL_VERSION|2024-05-13|The version of the model to use|
-|AZURE_OPENAI_API_KEY||One of the API keys of your Azure OpenAI resource|
-|AZURE_OPENAI_EMBEDDING_MODEL|text-embedding-ada-002|The name of your Azure OpenAI embeddings model deployment|
-|AZURE_OPENAI_EMBEDDING_MODEL_NAME|text-embedding-ada-002|The name of the embeddings model (can be found in Azure AI Foundry)|
-|AZURE_OPENAI_EMBEDDING_MODEL_VERSION|2|The version of the embeddings model to use (can be found in Azure AI Foundry)|
-|AZURE_OPENAI_TEMPERATURE|0|What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. A value of 0 is recommended when using your data.|
-|AZURE_OPENAI_TOP_P|1.0|An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. We recommend setting this to 1.0 when using your data.|
-|AZURE_OPENAI_MAX_TOKENS|1000|The maximum number of tokens allowed for the generated answer.|
-|AZURE_OPENAI_STOP_SEQUENCE||Up to 4 sequences where the API will stop generating further tokens. Represent these as a string joined with "|", e.g. `"stop1|stop2|stop3"`|
-|AZURE_OPENAI_SYSTEM_MESSAGE|You are an AI assistant that helps people find information.|A brief description of the role and tone the model should use|
-|AZURE_OPENAI_API_VERSION|2024-02-01|API version when using Azure OpenAI on your data|
+|AZURE_SEARCH_USE_SEMANTIC_SEARCH|False|Whether or not to use semantic search|
+|AZURE_SPEECH_RECOGNIZER_LANGUAGES | en-US,fr-FR,de-DE,it-IT | Comma-separated list of languages to recognize from speech input|
+|AZURE_SPEECH_REGION_ENDPOINT | | The regional endpoint of the Azure Speech service|
+|AZURE_SPEECH_SERVICE_KEY | | The key of the Azure Speech service|
+|AZURE_SPEECH_SERVICE_NAME | | The name of the Azure Speech service|
+|AZURE_SPEECH_SERVICE_REGION | | The region (location) of the Azure Speech service|
 |AzureWebJobsStorage||The connection string to the Azure Blob Storage for the Azure Functions Batch processing|
 |BACKEND_URL||The URL for the Backend Batch Azure Function. Use http://localhost:7071 for local execution|
+|CONVERSATION_FLOW | custom | Chat conversation type: custom or byod (Bring Your Own Data)|
+|DATABASE_TYPE | PostgreSQL | The type of database to deploy (cosmos or postgres)|
 |DOCUMENT_PROCESSING_QUEUE_NAME|doc-processing|The name of the Azure Queue to handle the Batch processing|
-|AZURE_BLOB_ACCOUNT_NAME||The name of the Azure Blob Storage for storing the original documents to be processed|
-|AZURE_BLOB_ACCOUNT_KEY||The key of the Azure Blob Storage for storing the original documents to be processed|
-|AZURE_BLOB_CONTAINER_NAME||The name of the Container in the Azure Blob Storage for storing the original documents to be processed|
-|AZURE_FORM_RECOGNIZER_ENDPOINT||The name of the Azure Form Recognizer for extracting the text from the documents|
-|AZURE_FORM_RECOGNIZER_KEY||The key of the Azure Form Recognizer for extracting the text from the documents|
-|APPLICATIONINSIGHTS_CONNECTION_STRING||The Application Insights connection string to store the application logs|
+|FUNCTION_KEY | | The function key for accessing the backend Azure Function|
+|LOGLEVEL | INFO | The log level for application logging (CRITICAL, ERROR, WARN, INFO, DEBUG)|
+|MANAGED_IDENTITY_CLIENT_ID | | The client ID of the user-assigned managed identity|
+|MANAGED_IDENTITY_RESOURCE_ID | | The resource ID of the user-assigned managed identity|
+|OPEN_AI_FUNCTIONS_SYSTEM_PROMPT | | System prompt for OpenAI functions orchestration|
 |ORCHESTRATION_STRATEGY | openai_function | Orchestration strategy. Use Azure OpenAI Functions (openai_function), Semantic Kernel (semantic_kernel),  LangChain (langchain) or Prompt Flow (prompt_flow) for messages orchestration. If you are using a new model version 0613 select any strategy, if you are using a 0314 model version select "langchain". Note that both `openai_function` and `semantic_kernel` use OpenAI function calling. Prompt Flow option is still in development and does not support RBAC or integrated vectorization as of yet.|
-|AZURE_CONTENT_SAFETY_ENDPOINT | | The endpoint of the Azure AI Content Safety service |
-|AZURE_CONTENT_SAFETY_KEY | | The key of the Azure AI Content Safety service|
-|AZURE_SPEECH_SERVICE_KEY | | The key of the Azure Speech service|
-|AZURE_SPEECH_SERVICE_REGION | | The region (location) of the Azure Speech service|
-|AZURE_AUTH_TYPE | keys | The default is to use API keys. Change the value to 'rbac' to authenticate using Role Based Access Control. For more information refer to section [Authenticate using RBAC](#authenticate-using-rbac)
+|SEMANTIC_KERNEL_SYSTEM_PROMPT | | System prompt used by the Semantic Kernel orchestration|
+|USE_ADVANCED_IMAGE_PROCESSING | false | Whether to enable the use of a vision LLM and Computer Vision for embedding images|
+|USE_KEY_VAULT | true | Whether to use Azure Key Vault for storing secrets|
 
 ## Bicep