fix: Refactor data processing with manual scripts to resolve deployment script SFI issue #687

Pavan-Microsoft · 2025-12-12T10:44:48Z

Purpose

This pull request introduces several infrastructure and documentation updates to improve deployment clarity, environment setup, and configuration flexibility. The most significant changes include the removal of the Key Vault module and associated deployment scripts from the Bicep template, expanded documentation for data processing scripts, and updates to Python version requirements. Additionally, new outputs and parameters have been added to the Bicep template to support integration with AI and storage services.

Infrastructure changes:

Removed the Key Vault module and its configuration from infra/main.bicep, simplifying the deployment and secret management approach.
Removed deployment scripts for uploading demo data, creating search indexes, and creating SQL users/roles from infra/main.bicep, delegating these tasks to manual or external script execution.
Added new Bicep parameters and outputs for Content Understanding API version, backend user managed identity, storage account/container names, and AI Foundry resource IDs to facilitate integration and downstream automation. [1] [2] [3] [4]
Updated the DNS zone index and removed unused variables for clarity in infra/main.bicep. [1] [2]

Documentation and setup improvements:

Updated DeploymentGuide.md and CustomizeData.md to instruct users to run the new process_sample_data.sh and process_custom_data.sh scripts for data processing, with detailed parameter instructions. [1] [2]
Clarified Python version requirements to explicitly support Python 3.9 through 3.11 in all relevant documentation. [1] [2]
Enhanced post-deployment instructions in azure.yaml to guide users on processing sample data via Bash scripts.
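
The documented Python range could be checked at the top of a script with a guard like the one below. This is a minimal sketch only: the function name `is_supported_python` and the two constants are hypothetical and are not taken from the repository.

```python
import sys

# Supported range as stated in the documentation: Python 3.9 through 3.11.
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 11)


def is_supported_python(version=None):
    """Return True if the (major, minor) part of the version is in the documented range.

    `version` is any sequence whose first two items are major and minor;
    it defaults to the running interpreter's sys.version_info.
    """
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED
```

A script could call `is_supported_python()` before doing any work and exit with a clear message when it returns `False`, rather than failing later on an incompatible dependency.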

Dev environment updates:

Added the mssql-odbc-driver feature to the devcontainer configuration for improved SQL Server connectivity (initially version 17; a later commit in this PR, 2a8f324, updates it to version 18).
Updated setup_env.sh to reference the new data processing scripts and ensure correct permissions.
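
Commit 5d86e4e in this PR refactors SQL connection handling to use an Azure CLI token. A common way to pass such a token through the Microsoft ODBC driver is sketched below. It is an illustration under assumptions, not the repository's code: the helper names `pack_access_token` and `build_connection_string` are hypothetical, and the default driver name follows the later commits that move to version 18.

```python
import struct

# Pre-connect attribute defined by the Microsoft ODBC driver for passing an access token.
SQL_COPT_SS_ACCESS_TOKEN = 1256


def pack_access_token(token: str) -> bytes:
    """Encode a token as a 4-byte little-endian length followed by UTF-16-LE bytes."""
    token_bytes = token.encode("utf-16-le")
    return struct.pack(f"<I{len(token_bytes)}s", len(token_bytes), token_bytes)


def build_connection_string(server: str, database: str,
                            driver: str = "ODBC Driver 18 for SQL Server") -> str:
    """Build a token-friendly connection string (no user name or password)."""
    return f"Driver={{{driver}}};Server=tcp:{server},1433;Database={database};Encrypt=yes;"


# Usage (requires pyodbc and azure-identity, so it is left as a comment here):
#   from azure.identity import AzureCliCredential
#   import pyodbc
#   token = AzureCliCredential().get_token("https://database.windows.net/.default").token
#   conn = pyodbc.connect(
#       build_connection_string(server, database),
#       attrs_before={SQL_COPT_SS_ACCESS_TOKEN: pack_access_token(token)},
#   )
```

Keeping the packing step in its own function makes it testable without a database or an installed driver.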

Does this introduce a breaking change?

Yes
No

Golden Path Validation

I have tested the primary workflows (the "golden path") to ensure they function correctly without errors.

Deployment Validation

I have validated the deployment process successfully and all services are running as expected with this change.

- Introduced `04_cu_process_custom_data.py` for processing custom data and integrating with Azure services.
- Removed obsolete `azure_credential_utils.py` as its functionality is now integrated elsewhere.
- Updated `content_understanding_client.py` to improve error handling.
- Created `process_custom_data_scripts.sh` for streamlined script execution and dependency management.
- Enhanced `process_data_scripts.sh` to include additional parameters and improved error handling.
- Refactored `run_create_index_scripts.sh` to support Azure authentication and role assignment.
- Deleted `run_create_index_scripts_manual.sh` as its functionality is now covered in the updated script.
- Adjusted `run_process_data_scripts.sh` to reference the new Bicep file for custom data processing.

…lt dependencies and streamline parameter handling

- Removed Key Vault related parameters and configurations from Bicep templates.
- Updated Python scripts to accept command line arguments for necessary endpoints and models instead of retrieving them from Key Vault.
- Modified shell scripts to pass new parameters to Python scripts for improved flexibility and clarity.
- Cleaned up unused variables and consolidated logic for better maintainability.
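
The move from Key Vault lookups to command line arguments could look roughly like the sketch below. Every flag name here (`--search-endpoint`, `--openai-endpoint`, `--embedding-model`, `--sql-server`) is a hypothetical placeholder chosen for illustration; the actual arguments used by the repository's scripts are not shown in this PR description.

```python
import argparse


def build_parser():
    """Build a parser for values that would otherwise be read from Key Vault."""
    parser = argparse.ArgumentParser(description="Data processing (illustrative sketch)")
    # All flag names below are assumptions made for this example.
    parser.add_argument("--search-endpoint", required=True, help="search service endpoint URL")
    parser.add_argument("--openai-endpoint", required=True, help="model endpoint URL")
    parser.add_argument("--embedding-model", required=True, help="embedding model deployment name")
    parser.add_argument("--sql-server", required=True, help="fully qualified SQL Server name")
    return parser


def parse_args(argv=None):
    """Parse argv (defaults to sys.argv[1:]) into a namespace of endpoints and models."""
    return build_parser().parse_args(argv)
```

A calling shell script would then pass each value explicitly, which keeps the Python script free of any secret-store dependency and makes missing values fail fast with an argparse error.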

…essing data

- Introduced a new script `process_custom_data.sh` to manage public network access for Azure resources and execute data processing.
- Implemented functions to enable and restore public access for Storage Account, AI Foundry, CU Foundry, and SQL Server.
- Added error handling and logging for network access changes.
- Refactored existing `process_sample_data.sh` to remove deployment output retrieval logic, now handled in `process_custom_data.sh`.
- Removed SQL table creation logic from `run_create_index_scripts.sh` to streamline the process.
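
The enable-then-restore pattern described above is implemented in Bash in this PR. The sketch below restates the same idea in Python purely for illustration: the name `temporary_public_access` and the injected `get_state` / `set_state` callables are hypothetical stand-ins for whatever commands the real script runs against each resource.

```python
from contextlib import contextmanager


@contextmanager
def temporary_public_access(resources, get_state, set_state):
    """Enable public network access for each resource, then restore the original state.

    get_state(name) returns the current setting (for example "Enabled" or "Disabled").
    set_state(name, value) applies a setting. Both are supplied by the caller.
    """
    original = {}
    try:
        for name in resources:
            state = get_state(name)
            original[name] = state
            if state != "Enabled":
                set_state(name, "Enabled")
        yield original
    finally:
        # Restore in reverse order, touching only resources that were changed.
        for name in reversed(list(original)):
            if original[name] != "Enabled":
                set_state(name, original[name])
```

Because the restore step sits in a `finally` block, the original settings are put back even when the data processing inside the `with` body raises an error.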

github-actions · 2025-12-18T14:36:22Z

🎉 This PR is included in version 3.17.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Pavan-Microsoft added 21 commits December 10, 2025 15:50

feat: Add optimized SQL insert script generation for DataFrame processing

2113a0e

refactor: Clean up code formatting and remove unused AUDIO_DIRECTORY variable

1b9e5b9

Refactor run_create_index_scripts.sh: Enhance Azure authentication, role assignments, and error handling; remove run_process_data_scripts.sh

df45bfc

feat: Add script to assign SQL roles to Azure Managed Identities; remove obsolete PowerShell script

ac75442

feat: Add virtual environment activation messages for Windows and Unix/Linux

47ebc0f

feat: Update virtual environment path and activation handling for cross-platform support

cc2abf5

feat: Enhance script to support dynamic Python command detection and improve virtual environment handling

a9fe3dd

feat: Add informative messages for running sample data processing commands in Azure YAML and update SQL output directory path in Python script

98f06b0

feat: Refactor SQL connection handling to use Azure CLI token and improve error handling in bash script for enabling public access

5d86e4e

refactor: Simplify user creation logic in assign_sql_roles function

b96eab7

feat: Update SQL Server parameter to use fully qualified domain name in data processing script

466ec86

fix: Correct variable naming for Azure AI API version in data processing script

06cd33c

feat: Add Content Understanding analyzer creation scripts to process_custom_data.sh

45b70f3

feat: Update Azure OpenAI API version to 2024-12-01-preview in content processing script

cf0ca27

feat: Update scripts and documentation for processing custom and sample data with new parameters

81995a5

feat: Update scripts and documentation to support new parameters for Azure services and Content Understanding API

00669d7

feat: Update devcontainer configuration and setup script to include MSSQL ODBC driver and correct script permissions

f1b0781

Refactor code structure for improved readability and maintainability

2fedc88

Pavan-Microsoft requested review from Avijit-Microsoft, Prajwal-Microsoft, Roopan-Microsoft, Vinay-Microsoft, aniaroramsft, brittneek, nchandhi and toherman-msft as code owners December 12, 2025 10:44

Pavan-Microsoft marked this pull request as draft December 12, 2025 11:04

Pavan-Microsoft added 8 commits December 12, 2025 16:35

feat: Add Python virtual environment setup and requirements installation to process_custom_data.sh

96fabfa

feat: Add dependency management for cognitive service deployments and improve SQL Server public access feedback in scripts

a604f09

feat: Refactor SQL Server and Cognitive Services modules to enhance private endpoint management and remove secrets export configuration

7e72f30

feat: Remove unused private DNS zone for vaultcore from main.bicep

697b6ce

feat: Remove secrets export configuration from AI services module

a683a56

feat: Update AzureCliCredential to include process timeout for improved stability

04e541a

Merge branch 'dev' of https://github.com/microsoft/Conversation-Knowledge-Mining-Solution-Accelerator into pk-km-sampledata-manual

da41343

feat: Remove tags parameter from main.parameters.json and main.waf.parameters.json

6465691

Pavan-Microsoft marked this pull request as ready for review December 15, 2025 08:15

Pavan-Microsoft requested a review from dgp10801 as a code owner December 15, 2025 08:15

Pavan-Microsoft marked this pull request as draft December 15, 2025 09:31

Pavan-Microsoft added 4 commits December 15, 2025 15:08

feat: Enhance install script for ODBC Driver installation on Debian/Ubuntu

7e51dd3

Revert "feat: Enhance install script for ODBC Driver installation on Debian/Ubuntu" This reverts commit 7e51dd3.

b25b0f4

feat: Update ODBC Driver references to version 18 across documentation and code

782d633

feat: Remove branch parameter from azd init command in install script

86ffcf0

Pavan-Microsoft marked this pull request as ready for review December 15, 2025 10:44

Pavan-Microsoft added 2 commits December 15, 2025 16:16

feat: Update mssql-odbc-driver version to 18 in devcontainer configuration

2a8f324

fix: Remove unnecessary blank lines in get_db_connection function for cleaner code

cf1acb3

Avijit-Microsoft approved these changes Dec 15, 2025

Avijit-Microsoft merged commit ea9960e into dev Dec 15, 2025
5 checks passed

github-actions bot added the released label Dec 18, 2025
