refactor: Conversation Processing with Agent-Based Topic Mining #710
Purpose
This pull request introduces several improvements and refactors to the data processing and deployment scripts, with a focus on making the data processing pipeline more robust, asynchronous, and modular. The most significant changes involve refactoring the conversation processing script to use asynchronous operations for embedding generation, introducing agent creation and invocation for topic mining and mapping, and correcting parameter orders in deployment documentation.
Key changes include:
Data Processing Pipeline Improvements
- Refactored `03_cu_process_data_text.py` to use asynchronous embedding generation with `EmbeddingsClient` and to process files and insert data into the database and Azure Search using async operations, improving performance and reliability. [1] [2] [3] [4] [5]
- Introduced agent creation and invocation via `AIProjectClient`, replacing the synchronous GPT-4 topic mining call with an asynchronous agent-based approach, enhancing modularity and maintainability (see the sketch after this list).
- Agent names now incorporate `--solution_name`, allowing dynamic naming of agents and ensuring agent names are unique per solution deployment. [1] [2]
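To illustrate the shape of the refactor described above, here is a minimal sketch of the two pieces involved: asynchronous embedding generation via `EmbeddingsClient` and a topic-mining agent created through `AIProjectClient` whose name is derived from `--solution_name`. This is not code from the PR; the endpoints, API key, model deployment name, agent name prefix, instructions prompt, and the exact agents method names (`threads` / `messages` / `runs`) are assumptions based on the preview Azure AI SDKs and may differ from the SDK versions used here.

```python
# Hypothetical sketch only; names and SDK method shapes are assumptions, not the PR's code.
import asyncio

from azure.core.credentials import AzureKeyCredential
from azure.identity.aio import DefaultAzureCredential
from azure.ai.inference.aio import EmbeddingsClient
from azure.ai.projects.aio import AIProjectClient


async def embed_texts(endpoint: str, api_key: str, texts: list[str]) -> list[list[float]]:
    """Generate embeddings asynchronously instead of one blocking call at a time."""
    async with EmbeddingsClient(endpoint=endpoint, credential=AzureKeyCredential(api_key)) as client:
        result = await client.embed(input=texts)
        return [item.embedding for item in result.data]


async def mine_topics(project_endpoint: str, solution_name: str, conversation: str) -> None:
    """Create a per-solution topic-mining agent and run a conversation through it."""
    async with DefaultAzureCredential() as credential:
        async with AIProjectClient(endpoint=project_endpoint, credential=credential) as project:
            agent = await project.agents.create_agent(
                model="gpt-4o",  # placeholder model deployment name
                name=f"topic-mining-agent-{solution_name}",  # unique per solution deployment
                instructions="Extract the key topics discussed in the conversation.",
            )
            thread = await project.agents.threads.create()
            await project.agents.messages.create(
                thread_id=thread.id, role="user", content=conversation
            )
            run = await project.agents.runs.create_and_process(
                thread_id=thread.id, agent_id=agent.id
            )
            # The mined topics would be read back from the thread's messages here;
            # the exact retrieval call varies between SDK preview versions.
            print(f"Topic-mining run finished with status: {run.status}")


if __name__ == "__main__":
    # Example invocation; the endpoint, solution name, and transcript are placeholders.
    asyncio.run(mine_topics("https://<project-endpoint>", "mysolution", "Agent: Hello ..."))
```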
Dependency and Import Refactoring
- Reorganized imports in `00_create_sample_data_files.py` and `03_cu_process_data_text.py` for clarity and removed unused or redundant imports. [1] [2]
- Moved imports in `00_create_sample_data_files.py` to the top of the file and removed the duplicate imports at the end.

Output and Logging Enhancements
Documentation and Deployment Script Fixes
Minor Quality-of-Life Improvements
These changes collectively modernize the data ingestion and processing workflow, make the codebase more maintainable, and ensure smoother deployment and operation in Azure environments.
Does this introduce a breaking change?
Golden Path Validation
Deployment Validation
What to Check
Deployment and sample/custom data processing