Anirud/add all inf server model types#636

Draft
anirudTT wants to merge 8 commits into dev from anirud/add-all-inf-server-model-types

Conversation

@anirudTT
Collaborator

No description provided.

…ndling

- Added a new function `_sync_model_catalog` in `run.py` to regenerate the model catalog from the inference server artifact, improving model management.
- Updated `ContainersView` to include model status from the catalog, enhancing the response data structure.
- Introduced a mapping from backend model types to frontend constants, allowing for better integration and display of model types in the UI.
- Refactored frontend components to utilize the new model type and status information, improving user experience and clarity in model deployment status.
- Enhanced the `FirstStepForm` to group models by status, providing a clearer overview of model compatibility and deployment readiness.
- Introduced `display_model_type` to the `ModelImpl` class for better representation of model types.
- Updated `ContainersView` to include `display_model_type` in the response data structure.
- Enhanced model synchronization logic to incorporate `display_model_type` from the inference server.
- Refactored frontend components to group models by `display_model_type`, improving clarity in model selection and compatibility visualization.
- Updated `SelectionSteps` interface to include `display_model_type` for consistent data handling across components.
- Added `device_id` parameter to `wait_for_frontend_and_open_browser` and updated related functions to support device-specific model deployment.
- Introduced a new `deployment_store.py` for thread-safe JSON file storage of model deployment records, replacing the previous Django ORM model.
- Updated `run_container` function to handle device-specific deployments and maintain a pending record for in-progress deployments.
- Enhanced `DeployView` to accept `device_id` for model deployment requests, improving flexibility in deployment configurations.
- Refactored Docker-related configurations in `docker-compose` files for better readability and maintainability.
- Added new `VoicePipelineView` for handling voice processing workflows, integrating STT, LLM, and TTS functionalities.
- Updated model type configurations to include new types for TTS and VLM, enhancing model management capabilities.
- Improved container matching logic in `update_deploy_cache` to prioritize exact name matches for model implementations, with a fallback to longest substring matches.
- Added `messages_to_prompt` function to convert chat messages into a plain text prompt for model requests.
- Implemented `get_model_name_from_container` to query the vLLM API for the exact model name loaded in a container, enhancing model identification.
- Updated `InferenceView` and `AgentView` to utilize the new model name retrieval and message formatting functions, improving data handling for model requests.
- Refactored service route determination in `map_service_route` to consider model capabilities, ensuring appropriate routing for chat and completion models.
- Enhanced error handling in streaming functions to log HTTP errors more effectively, improving debugging capabilities.
- Changed the path for mounting workflow logs in the Docker Compose file to point to the new artifacts directory, ensuring proper access to deployment logs for the inference server.
- Introduced new endpoints for device state and reset functionality in the backend, allowing for unified device state retrieval and reset operations.
- Updated `SystemResourceService` to include methods for extracting device state and telemetry data, improving the accuracy of device status reporting.
- Refactored frontend components to utilize the new device state context, enhancing the user interface with real-time device status updates and improved error handling.
- Implemented a reset dialog in the frontend to manage device resets, providing users with clear feedback during the reset process.
- Updated routing to include new device state and reset paths, ensuring seamless integration with existing API structures.
- Enhanced error handling and logging throughout the device management process for better debugging and user experience.
- Introduced a new JSON file `models_from_inference_server.json` containing detailed configurations for 60 models, including their names, types, device configurations, inference engines, and environment variables.
- Each model entry includes metadata such as version, docker image, service routes, and parameter counts, enhancing the model management capabilities within the inference server.
- This addition supports improved integration and deployment of various model types, including chat, speech recognition, and image generation.
- Added an exception for the new JSON file `models_from_inference_server.json` to ensure it is tracked by Git, facilitating better management of model configurations within the inference server.
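The grouping of models by status and `display_model_type` described above can be sketched as a small helper. This is a minimal illustration, not the actual frontend code (which is TypeScript); the `display_model_type` key name comes from the PR, while `group_models` and the `"unknown"` fallback are assumptions.

```python
from collections import defaultdict

def group_models(models, key="display_model_type"):
    """Group model records by a metadata key (illustrative sketch).

    Models missing the key land in an "unknown" bucket so the UI
    can still render them rather than dropping them silently.
    """
    groups = defaultdict(list)
    for model in models:
        groups[model.get(key, "unknown")].append(model)
    return dict(groups)
```

A UI can then iterate over the groups in a fixed order to render one section per model type.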
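The thread-safe JSON deployment store that replaces the Django ORM model might look roughly like the following. This is a hypothetical sketch, not the actual `deployment_store.py`: the class name, record shape, and method names are assumptions; only the idea (JSON file + lock + atomic replace) comes from the PR description.

```python
import json
import os
import tempfile
import threading

class DeploymentStore:
    """Illustrative thread-safe JSON-file store for deployment records."""

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()

    def _load(self):
        if not os.path.exists(self._path):
            return {}
        with open(self._path) as f:
            return json.load(f)

    def _save(self, data):
        # Write to a temp file, then atomically replace, so concurrent
        # readers never observe a half-written JSON document.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self._path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
        os.replace(tmp, self._path)

    def upsert(self, container_id, record):
        with self._lock:
            data = self._load()
            data[container_id] = record
            self._save(data)

    def get(self, container_id):
        with self._lock:
            return self._load().get(container_id)
```

The in-process `threading.Lock` serializes writers within one server process; cross-process safety would need file locking on top of this.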
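The container matching rule in `update_deploy_cache` (exact name match first, then longest substring match) can be expressed compactly. The function name and data shapes below are illustrative assumptions, not the real implementation:

```python
def match_model_impl(container_name, model_impls):
    """Pick the best model implementation for a container (sketch).

    model_impls maps model names to implementations. Prefer an exact
    name match; otherwise fall back to the longest model name that
    appears as a substring of the container name.
    """
    if container_name in model_impls:
        return model_impls[container_name]
    candidates = [name for name in model_impls if name in container_name]
    if not candidates:
        return None
    return model_impls[max(candidates, key=len)]
```

Preferring the longest substring avoids a short name like `llama-3.1-8b` shadowing a more specific `llama-3.1-8b-instruct`.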
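A `messages_to_prompt` helper such as the one mentioned above typically flattens OpenAI-style chat messages into role-labelled lines. This is a minimal sketch under that assumption; the exact prompt template used by the PR is not shown in the description:

```python
def messages_to_prompt(messages):
    """Flatten chat messages into a plain-text prompt (sketch).

    Assumes OpenAI-style dicts with "role" and "content" keys and
    appends a trailing "Assistant:" cue for the model to complete.
    """
    lines = []
    for msg in messages:
        role = msg.get("role", "user").capitalize()
        lines.append(f"{role}: {msg.get('content', '')}")
    lines.append("Assistant:")
    return "\n".join(lines)
```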
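Querying vLLM for the model loaded in a container, as `get_model_name_from_container` does, can use vLLM's OpenAI-compatible `GET /v1/models` endpoint, which returns a list whose entries carry the model name in their `id` field. The function names and the single-model assumption below are illustrative:

```python
import json
from urllib.request import urlopen

def parse_model_name(payload):
    """Extract the first model id from an OpenAI-style /v1/models response."""
    data = json.loads(payload) if isinstance(payload, str) else payload
    models = data.get("data", [])
    return models[0]["id"] if models else None

def get_model_name_from_container(base_url, timeout=5):
    # vLLM serves its model list at the OpenAI-compatible /v1/models route;
    # a container running one model returns a single entry.
    with urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return parse_model_name(resp.read().decode())
```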
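Capability-based routing like the refactored `map_service_route` could be sketched as a lookup over declared capabilities. The capability names and route strings here are assumptions chosen to match common OpenAI-compatible endpoints, not the PR's actual mapping:

```python
def map_service_route(model):
    """Choose an API route from a model's declared capabilities (sketch)."""
    caps = set(model.get("capabilities", []))
    if "chat" in caps:
        return "/v1/chat/completions"
    if "completion" in caps:
        return "/v1/completions"
    if "speech-to-text" in caps:
        return "/v1/audio/transcriptions"
    # Default to plain completions for models with no declared capability.
    return "/v1/completions"
```

Checking `chat` before `completion` means chat-capable models always get the chat route, even if they also support raw completions.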
@anirudTT anirudTT marked this pull request as draft February 26, 2026 18:43
@rfatimaTT rfatimaTT marked this pull request as ready for review March 3, 2026 19:27
@anirudTT anirudTT marked this pull request as draft March 10, 2026 20:28
