@Pavan-Microsoft

Purpose

This pull request introduces several improvements and refactors to the data processing and deployment scripts, with a focus on making the data processing pipeline more robust, asynchronous, and modular. The most significant changes are refactoring the conversation processing script to generate embeddings asynchronously, introducing agent creation and invocation for topic mining and mapping, and correcting the parameter order in the deployment documentation.

Key changes include:

Data Processing Pipeline Improvements

  • Refactored 03_cu_process_data_text.py to use asynchronous embedding generation with EmbeddingsClient and to process files and insert data into the database and Azure Search using async operations, improving performance and reliability. [1] [2] [3] [4] [5]
  • Added async agent creation for topic mining and mapping using AIProjectClient, and replaced the synchronous GPT-4 topic mining call with an asynchronous agent-based approach, enhancing modularity and maintainability.
  • Introduced new command-line argument --solution_name to allow dynamic naming of agents, ensuring agent names are unique per solution deployment. [1] [2]
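The asynchronous embedding step described above can be sketched with plain `asyncio`. This is a minimal illustration, not the code in `03_cu_process_data_text.py`: the `embed` callable is a hypothetical stand-in for the `EmbeddingsClient` call (whose exact signature is not shown here), and `max_concurrency` is an assumed knob for bounding the number of requests in flight.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

# Hypothetical signature for an async embedding call; in the real script this role
# is played by EmbeddingsClient, whose exact API is not shown in this PR description.
EmbedFn = Callable[[str], Awaitable[List[float]]]


async def embed_all(
    texts: Sequence[str], embed: EmbedFn, max_concurrency: int = 4
) -> List[List[float]]:
    """Embed every text concurrently, with at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(text: str) -> List[float]:
        # The semaphore caps concurrent requests (e.g. to stay under a rate limit).
        async with semaphore:
            return await embed(text)

    # gather preserves input order, so result[i] corresponds to texts[i].
    return list(await asyncio.gather(*(embed_one(t) for t in texts)))


async def fake_embed(text: str) -> List[float]:
    # Stand-in embedding: yields to the event loop, then returns a tiny vector.
    await asyncio.sleep(0)
    return [float(len(text)), float(text.count(" "))]


if __name__ == "__main__":
    vectors = asyncio.run(embed_all(["hello world", "hi"], fake_embed))
    print(vectors)  # [[11.0, 1.0], [2.0, 0.0]]
```

Bounding concurrency with a semaphore keeps the speedup of overlapping network calls while avoiding a burst of simultaneous requests against the embedding endpoint.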

Dependency and Import Refactoring

  • Cleaned up imports in 00_create_sample_data_files.py and 03_cu_process_data_text.py for clarity and removed unused or redundant imports. [1] [2]
  • Moved the import of Azure Search clients in 00_create_sample_data_files.py to the top of the file and removed duplicate imports at the end.

Output and Logging Enhancements

  • Standardized and clarified print/log statements for better readability and debugging, and removed timestamped filenames in favor of consistent output naming. [1] [2] [3] [4] [5]

Documentation and Deployment Script Fixes

  • Corrected the order of parameters in deployment and customization documentation to match the expected script arguments, preventing deployment issues and confusion. [1] [2]

Minor Quality-of-Life Improvements

  • Improved exception handling by catching specific exceptions and providing clearer error messages during SQL and embedding operations. [1] [2]
  • Added spacing and formatting fixes in Python scripts for consistency. [1] [2] [3]
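Catching a specific exception rather than using a bare `except` can be sketched as follows. This uses the standard-library `sqlite3` module purely as a stand-in for the project's SQL database driver, and the `docs` table and `insert_rows` helper are hypothetical: a duplicate-key error on one row is logged with a clear message, while any other database error still propagates.

```python
import logging
import sqlite3
from typing import Iterable, Tuple

logger = logging.getLogger(__name__)


def insert_rows(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]) -> int:
    """Insert (id, content) rows into a hypothetical `docs` table; return the count inserted.

    Only sqlite3.IntegrityError (e.g. a duplicate primary key) is caught per row,
    so one bad row is skipped with a clear log message instead of aborting the batch.
    Other errors (missing table, connection problems) propagate to the caller.
    """
    inserted = 0
    for row_id, content in rows:
        try:
            conn.execute("INSERT INTO docs (id, content) VALUES (?, ?)", (row_id, content))
            inserted += 1
        except sqlite3.IntegrityError as exc:
            logger.warning("Skipping row %r: %s", row_id, exc)
    conn.commit()
    return inserted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, content TEXT)")
    print(insert_rows(conn, [("a", "x"), ("a", "dup"), ("b", "y")]))  # 2
```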

These changes collectively modernize the data ingestion and processing workflow, make the codebase more maintainable, and ensure smoother deployment and operation in Azure environments.

Does this introduce a breaking change?

  • Yes
  • No

Golden Path Validation

  • I have tested the primary workflows (the "golden path") to ensure they function correctly without errors.

Deployment Validation

  • I have validated the deployment process successfully and all services are running as expected with this change.

What to Check

Deployment and sample/custom data processing

- Updated 00_create_sample_data_files.py to improve CSV and JSON export functions, ensuring better error handling and code readability.
- Modified 01_create_search_index.py to include additional whitespace for consistency.
- Enhanced 03_cu_process_data_text.py by implementing asynchronous processing for embeddings and agent creation, improving performance and scalability.
- Updated 04_cu_process_custom_data.py to streamline the search index creation process and improve error handling.
- Adjusted requirements.txt to include new agent framework dependencies and ensure compatibility.
- Enhanced process_sample_data.sh and run_create_index_scripts.sh to support the new solution_name parameter for better configuration management.
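On the Python side, a `--solution_name` flag that yields per-deployment agent names might look like the sketch below. Only the flag name comes from this PR; `required=True`, the `agent_name` helper, and the `"<role>-<solution_name>"` naming scheme are assumptions for illustration.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Process conversation data (sketch).")
    # --solution_name is the flag named in this PR; making it required is an assumption.
    parser.add_argument(
        "--solution_name",
        required=True,
        help="Deployment-specific suffix used to make agent names unique",
    )
    return parser.parse_args(argv)


def agent_name(role: str, solution_name: str) -> str:
    # Hypothetical naming scheme: the PR only states that names are unique per solution.
    return f"{role}-{solution_name}"


if __name__ == "__main__":
    args = parse_args(["--solution_name", "km1234"])
    print(agent_name("topic-mining-agent", args.solution_name))   # topic-mining-agent-km1234
    print(agent_name("topic-mapping-agent", args.solution_name))  # topic-mapping-agent-km1234
```

The shell scripts would then forward the value, for example `python 03_cu_process_data_text.py --solution_name "$solution_name"`, though the exact invocation in `process_sample_data.sh` may differ.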
@Pavan-Microsoft marked this pull request as ready for review December 30, 2025 05:28
@Avijit-Microsoft merged commit b050197 into km-agentframework-v2 Dec 30, 2025
4 checks passed
@Harmanpreet-Microsoft deleted the psl-pk-afv2-deploy branch January 2, 2026 10:41
