…ndling

- Added a new function `_sync_model_catalog` in `run.py` to regenerate the model catalog from the inference server artifact, improving model management.
- Updated `ContainersView` to include model status from the catalog in the response data structure.
- Introduced a mapping from backend model types to frontend constants, allowing better integration and display of model types in the UI.
- Refactored frontend components to use the new model type and status information, clarifying model deployment status for users.
- Enhanced the `FirstStepForm` to group models by status, providing a clearer overview of model compatibility and deployment readiness.
- Introduced `display_model_type` on the `ModelImpl` class for better representation of model types.
- Updated `ContainersView` to include `display_model_type` in the response data structure.
- Enhanced model synchronization logic to incorporate `display_model_type` from the inference server.
- Refactored frontend components to group models by `display_model_type`, improving clarity in model selection and compatibility visualization.
- Updated the `SelectionSteps` interface to include `display_model_type` for consistent data handling across components.
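The grouping the frontend performs can be illustrated with a minimal sketch. The function name and the fallback bucket `"Other"` are assumptions; only the `display_model_type` field comes from the changes above.

```python
from collections import defaultdict

def group_by_display_type(models):
    """Group model entries by display_model_type for the selection UI.

    Sketch only: entries without the field fall into an assumed "Other" bucket.
    """
    groups = defaultdict(list)
    for model in models:
        groups[model.get("display_model_type", "Other")].append(model["name"])
    return dict(groups)
```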
- Added a `device_id` parameter to `wait_for_frontend_and_open_browser` and updated related functions to support device-specific model deployment.
- Introduced a new `deployment_store.py` for thread-safe JSON-file storage of model deployment records, replacing the previous Django ORM model.
- Updated the `run_container` function to handle device-specific deployments and to maintain a pending record for in-progress deployments.
- Enhanced `DeployView` to accept a `device_id` in model deployment requests, improving flexibility in deployment configurations.
- Refactored Docker-related configuration in the `docker-compose` files for readability and maintainability.
- Added a new `VoicePipelineView` for voice processing workflows, integrating STT, LLM, and TTS functionality.
- Updated model type configurations with new TTS and VLM types.
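A thread-safe JSON-file store of the kind `deployment_store.py` introduces can be sketched as a lock around each read-modify-write cycle. Class name, method names, and record shape here are all assumptions, not the actual implementation:

```python
import json
import threading
from pathlib import Path

class DeploymentStore:
    """Minimal sketch of a thread-safe JSON-file store for deployment records."""

    def __init__(self, path):
        self._path = Path(path)
        self._lock = threading.Lock()

    def _read(self):
        # Treat a missing file as an empty store.
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text())

    def upsert(self, deploy_id, record):
        # Hold the lock across the full read-modify-write so concurrent
        # deployments cannot clobber each other's records.
        with self._lock:
            data = self._read()
            data[deploy_id] = record
            self._path.write_text(json.dumps(data, indent=2))

    def get(self, deploy_id):
        with self._lock:
            return self._read().get(deploy_id)
```

A pending record would be written via `upsert` when a deployment starts and overwritten with the final status when the container comes up.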
- Improved container matching logic in `update_deploy_cache` to prioritize exact name matches for model implementations, with a fallback to the longest substring match.
- Added a `messages_to_prompt` function to convert chat messages into a plain-text prompt for model requests.
- Implemented `get_model_name_from_container` to query the vLLM API for the exact model name loaded in a container, enhancing model identification.
- Updated `InferenceView` and `AgentView` to use the new model-name retrieval and message-formatting functions, improving data handling for model requests.
- Refactored service route determination in `map_service_route` to consider model capabilities, ensuring appropriate routing for chat and completion models.
- Enhanced error handling in streaming functions to log HTTP errors more effectively, improving debugging.
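The matching and formatting described above can be sketched as follows. These are hedged illustrations under assumed signatures: the real `update_deploy_cache` matching and `messages_to_prompt` formatting may differ in details such as the prompt layout.

```python
def match_container(container_names, model_name):
    """Prefer an exact name match; otherwise fall back to the longest
    substring match, or None when nothing overlaps (sketch, not the
    actual update_deploy_cache logic)."""
    if model_name in container_names:
        return model_name
    candidates = [
        c for c in container_names
        if model_name in c or c in model_name
    ]
    return max(candidates, key=len, default=None)

def messages_to_prompt(messages):
    """Flatten chat messages into a plain-text prompt, one role-prefixed
    line per message (format is an assumption)."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)
```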
- Changed the path for mounting workflow logs in the Docker Compose file to point to the new artifacts directory, ensuring proper access to deployment logs for the inference server.
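The compose change amounts to remapping the log volume to the artifacts directory. A sketch of the shape of that mapping, with illustrative service name and paths rather than the actual file:

```yaml
services:
  inference-server:
    volumes:
      # Workflow logs now live under the artifacts directory
      - ./artifacts/workflow_logs:/app/workflow_logs
```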
- Introduced new backend endpoints for unified device-state retrieval and device reset operations.
- Updated `SystemResourceService` with methods for extracting device state and telemetry data, improving the accuracy of device status reporting.
- Refactored frontend components to use the new device-state context, adding real-time device status updates and improved error handling.
- Implemented a reset dialog in the frontend to manage device resets, giving users clear feedback during the reset process.
- Updated routing to include the new device-state and reset paths, integrating with the existing API structure.
- Enhanced error handling and logging throughout the device management flow for better debugging and user experience.
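The unified device-state payload can be pictured as a merge of telemetry with the active deployment records. Every field name and the health threshold below are assumptions for illustration, not the actual `SystemResourceService` output:

```python
def build_device_state(telemetry, deployments):
    """Assemble a unified device-state payload (sketch; fields assumed)."""
    return {
        "telemetry": {
            "temperature_c": telemetry.get("temperature_c"),
            "memory_used_mb": telemetry.get("memory_used_mb"),
        },
        # Surface only deployments that are still present on the device.
        "deployments": [d for d in deployments if d.get("status") != "removed"],
        # Illustrative health heuristic, not a real threshold from the service.
        "healthy": telemetry.get("temperature_c", 0) < 90,
    }
```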
- Introduced a new JSON file, `models_from_inference_server.json`, containing detailed configurations for 60 models: names, types, device configurations, inference engines, and environment variables.
- Each model entry includes metadata such as version, Docker image, service routes, and parameter counts, enhancing model management within the inference server.
- This addition supports integration and deployment of various model types, including chat, speech recognition, and image generation.
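An entry in that file presumably looks something like the following. This is an illustrative shape assembled from the fields listed above; the exact key names and values in `models_from_inference_server.json` may differ.

```json
{
  "name": "example-chat-model",
  "type": "chat",
  "version": "1.0",
  "docker_image": "inference-server/example:latest",
  "service_route": "/v1/chat/completions",
  "param_count": "8B",
  "device_configurations": ["gpu"],
  "inference_engine": "vllm",
  "env_vars": { "HF_TOKEN": "<set at deploy time>" }
}
```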
- Added a `.gitignore` exception for the new `models_from_inference_server.json` file so it is tracked by Git, keeping model configurations versioned within the inference server repository.