refactor: Conversation Processing with Agent-Based Topic Mining #710
Purpose
This pull request introduces several improvements and refactors to the data processing and deployment scripts, with a focus on making the data processing pipeline more robust, asynchronous, and modular. The most significant changes involve refactoring the conversation processing script to use asynchronous operations for embedding generation, introducing agent creation and invocation for topic mining and mapping, and correcting parameter orders in deployment documentation.
Key changes include:
Data Processing Pipeline Improvements
- Refactored `03_cu_process_data_text.py` to use asynchronous embedding generation with `EmbeddingsClient` and to process files and insert data into the database and Azure Search using async operations, improving performance and reliability. [1] [2] [3] [4] [5]
- Introduced agent creation and invocation via `AIProjectClient`, replacing the synchronous GPT-4 topic mining call with an asynchronous agent-based approach, enhancing modularity and maintainability (see the sketch after this list).
- Agent names now incorporate `--solution_name`, allowing dynamic naming of agents and ensuring agent names are unique per solution deployment. [1] [2]
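To illustrate the shape of the refactor described above, here is a minimal sketch of the two pieces involved: asynchronous embedding generation via `EmbeddingsClient` and a topic-mining agent created through `AIProjectClient` whose name is derived from `--solution_name`. This is not code from the PR; the endpoints, API key, model deployment name, agent name prefix, instructions prompt, and the exact agents method names (`threads` / `messages` / `runs`) are assumptions based on the preview Azure AI SDKs and may differ from the SDK versions used here.

```python
# Hypothetical sketch only; names and SDK method shapes are assumptions, not the PR's code.
import asyncio

from azure.core.credentials import AzureKeyCredential
from azure.identity.aio import DefaultAzureCredential
from azure.ai.inference.aio import EmbeddingsClient
from azure.ai.projects.aio import AIProjectClient


async def embed_texts(endpoint: str, api_key: str, texts: list[str]) -> list[list[float]]:
    """Generate embeddings asynchronously instead of one blocking call at a time."""
    async with EmbeddingsClient(endpoint=endpoint, credential=AzureKeyCredential(api_key)) as client:
        result = await client.embed(input=texts)
        return [item.embedding for item in result.data]


async def mine_topics(project_endpoint: str, solution_name: str, conversation: str) -> None:
    """Create a per-solution topic-mining agent and run a conversation through it."""
    async with DefaultAzureCredential() as credential:
        async with AIProjectClient(endpoint=project_endpoint, credential=credential) as project:
            agent = await project.agents.create_agent(
                model="gpt-4o",  # placeholder model deployment name
                name=f"topic-mining-agent-{solution_name}",  # unique per solution deployment
                instructions="Extract the key topics discussed in the conversation.",
            )
            thread = await project.agents.threads.create()
            await project.agents.messages.create(
                thread_id=thread.id, role="user", content=conversation
            )
            run = await project.agents.runs.create_and_process(
                thread_id=thread.id, agent_id=agent.id
            )
            # The mined topics would be read back from the thread's messages here;
            # the exact retrieval call varies between SDK preview versions.
            print(f"Topic-mining run finished with status: {run.status}")


if __name__ == "__main__":
    # Example invocation; the endpoint, solution name, and transcript are placeholders.
    asyncio.run(mine_topics("https://<project-endpoint>", "mysolution", "Agent: Hello ..."))
```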
Dependency and Import Refactoring
- Reorganized imports in `00_create_sample_data_files.py` and `03_cu_process_data_text.py` for clarity and removed unused or redundant imports. [1] [2]
- Moved imports in `00_create_sample_data_files.py` to the top of the file and removed the duplicate imports at the end.

Output and Logging Enhancements
Documentation and Deployment Script Fixes
Minor Quality-of-Life Improvements
These changes collectively modernize the data ingestion and processing workflow, make the codebase more maintainable, and ensure smoother deployment and operation in Azure environments.
Does this introduce a breaking change?
Golden Path Validation
Deployment Validation
What to Check
Deployment and sample/custom data processing