
Anirud/muti chip demos #667

Open
anirudTT wants to merge 15 commits into dev from anirud/muti-chip-demos
Conversation

anirudTT (Collaborator) commented Mar 2, 2026

No description provided.

…ndling

- Added a new function `_sync_model_catalog` in `run.py` to regenerate the model catalog from the inference server artifact, improving model management.
- Updated `ContainersView` to include model status from the catalog, enhancing the response data structure.
- Introduced a mapping for backend model types to frontend constants, allowing for better integration and display of model types in the UI.
- Refactored frontend components to utilize the new model type and status information, improving user experience and clarity in model deployment status.
- Enhanced the `FirstStepForm` to group models by status, providing a clearer overview of model compatibility and deployment readiness.
- Introduced `display_model_type` to the `ModelImpl` class for better representation of model types.
- Updated `ContainersView` to include `display_model_type` in the response data structure.
- Enhanced model synchronization logic to incorporate `display_model_type` from the inference server.
- Refactored frontend components to group models by `display_model_type`, improving clarity in model selection and compatibility visualization.
- Updated `SelectionSteps` interface to include `display_model_type` for consistent data handling across components.
- Added `device_id` parameter to `wait_for_frontend_and_open_browser` and updated related functions to support device-specific model deployment.
- Introduced a new `deployment_store.py` for thread-safe JSON file storage of model deployment records, replacing the previous Django ORM model.
- Updated `run_container` function to handle device-specific deployments and maintain a pending record for in-progress deployments.
- Enhanced `DeployView` to accept `device_id` for model deployment requests, improving flexibility in deployment configurations.
- Refactored Docker-related configurations in `docker-compose` files for better readability and maintainability.
- Added new `VoicePipelineView` for handling voice processing workflows, integrating STT, LLM, and TTS functionalities.
- Updated model type configurations to include new types for TTS and VLM, enhancing model management capabilities.
- Improved container matching logic in `update_deploy_cache` to prioritize exact name matches for model implementations, with a fallback to longest substring matches.
- Added `messages_to_prompt` function to convert chat messages into a plain text prompt for model requests.
- Implemented `get_model_name_from_container` to query the vLLM API for the exact model name loaded in a container, enhancing model identification.
- Updated `InferenceView` and `AgentView` to utilize the new model name retrieval and message formatting functions, improving data handling for model requests.
- Refactored service route determination in `map_service_route` to consider model capabilities, ensuring appropriate routing for chat and completion models.
- Enhanced error handling in streaming functions to log HTTP errors more effectively, improving debugging capabilities.
- Changed the path for mounting workflow logs in the Docker Compose file to point to the new artifacts directory, ensuring proper access to deployment logs for the inference server.
- Introduced new endpoints for device state and reset functionality in the backend, allowing for unified device state retrieval and reset operations.
- Updated `SystemResourceService` to include methods for extracting device state and telemetry data, improving the accuracy of device status reporting.
- Refactored frontend components to utilize the new device state context, enhancing the user interface with real-time device status updates and improved error handling.
- Implemented a reset dialog in the frontend to manage device resets, providing users with clear feedback during the reset process.
- Updated routing to include new device state and reset paths, ensuring seamless integration with existing API structures.
- Enhanced error handling and logging throughout the device management process for better debugging and user experience.
- Introduced a new JSON file `models_from_inference_server.json` containing detailed configurations for 60 models, including their names, types, device configurations, inference engines, and environment variables.
- Each model entry includes metadata such as version, docker image, service routes, and parameter counts, enhancing the model management capabilities within the inference server.
- This addition supports improved integration and deployment of various model types, including chat, speech recognition, and image generation.
- Added an exception for the new JSON file `models_from_inference_server.json` to ensure it is tracked by Git, facilitating better management of model configurations within the inference server.
- Introduced a new `ChipSlotAllocator` class to manage automatic chip slot allocation based on current deployments and model requirements.
- Added a `ChipStatusView` API endpoint to retrieve current chip slot occupancy status.
- Implemented a new `ChipConfigStep` component in the frontend for users to select chip configurations during model deployment.
- Enhanced existing views and components to support chip allocation logic, including error handling for multi-chip conflicts.
- Updated routing and state management to accommodate the new chip allocation features, improving overall deployment flexibility and user experience.
- Added unique service port assignment per chip slot in the `run_container` function to prevent port conflicts during multi-chip deployments.
- Updated the `ModelDeployment` record creation to reflect the actual service port used, improving deployment accuracy.
- Integrated `device_id` into various frontend components, including models table and deployed card, to display chip slot information.
- Modified polling intervals for chip status updates in the frontend to every 7 minutes, optimizing performance and reducing unnecessary API calls.
- Enhanced the `ChipConfigStep` and `ModelsDeployedCard` components to visualize chip status and support multi-chip board configurations, improving user experience.
- Implemented a cleanup function for stale 'starting' records in the health monitor to prevent blocking chip slots.
- Enhanced the frontend to display chip slot status, including occupied details and available slots, improving user experience during model deployment.
- Refactored deployment logic to ensure accurate slot management and error handling for ongoing deployments.
- Added app/.env-old to the .gitignore file to prevent tracking of old environment configuration files, ensuring cleaner version control and reducing clutter in the repository.
- Updated `setup_tt_inference_server` to include a `--pull-branch` option for re-downloading artifacts from the configured branch.
- Added a new `OpenAIAudioSpeechView` to support OpenAI-compatible audio/speech requests, allowing for both enqueue-style and direct audio retrieval.
- Implemented fallback logic in `TtsInferenceView` to retry with the `/v1/audio/speech` endpoint when the initial enqueue request returns a 404 error.
- Enhanced health check responses to include a "starting" status for models that are still loading.
- Updated frontend components to integrate TTS functionality, including a new TTS inference API call and navigation links for TTS features.
- Added tests for fallback behavior in TTS inference views to ensure reliability during model loading scenarios.
- Updated the TTSDemo component to include a download feature for generated audio files.
- Implemented keyboard shortcuts for generating audio and adding new lines in the text area.
- Enhanced the layout with a new Card component and improved animations using framer-motion.
- Improved accessibility and user experience with updated labels and instructions for text input and model selection.
- Refactored the overall structure for better readability and maintainability.
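The thread-safe JSON deployment store that replaces the Django ORM model could look roughly like the sketch below. Class name, file layout, and record shape are assumptions, not the PR's actual code; the point is serializing read-modify-write cycles behind a lock.

```python
import json
import threading
from pathlib import Path


class DeploymentStore:
    """Hypothetical sketch of a thread-safe JSON-file store for deployment records."""

    def __init__(self, path):
        self._path = Path(path)
        self._lock = threading.Lock()

    def _load(self):
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text())

    def save(self, deploy_id, record):
        # Hold the lock across the whole read-modify-write so concurrent
        # threads cannot clobber each other's records.
        with self._lock:
            data = self._load()
            data[deploy_id] = record
            self._path.write_text(json.dumps(data, indent=2))

    def get(self, deploy_id):
        with self._lock:
            return self._load().get(deploy_id)
```

A process-level lock like this only guards threads within one process; cross-process safety would additionally need file locking.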
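The container-matching change in `update_deploy_cache` (exact name match first, longest-substring fallback) can be sketched as follows; the function name and inputs are illustrative assumptions.

```python
def match_model_impl(container_name, model_names):
    """Prefer an exact name match; otherwise fall back to the longest
    model name that appears as a substring of the container name."""
    if container_name in model_names:
        return container_name
    candidates = [m for m in model_names if m in container_name]
    # Longest substring wins so "llama-3-70b" beats "llama-3".
    return max(candidates, key=len) if candidates else None
```

Preferring the longest substring avoids a shorter model name shadowing a more specific variant.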
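A `messages_to_prompt` helper like the one described could flatten OpenAI-style chat messages into plain text roughly as below; the exact delimiter format ("role: content" lines plus a trailing assistant cue) is an assumption.

```python
def messages_to_prompt(messages):
    """Flatten chat messages into a plain-text prompt for completion-style models."""
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    lines.append("assistant:")  # cue the model to produce the next turn
    return "\n".join(lines)
```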
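`get_model_name_from_container` presumably hits the container's OpenAI-compatible `GET /v1/models` endpoint, which vLLM serves with a `{"data": [{"id": ...}]}` body. A stdlib-only sketch (helper names are assumptions):

```python
import json
from urllib.request import urlopen


def extract_model_id(payload):
    """Pull the first model id out of a /v1/models response body."""
    data = payload.get("data", [])
    return data[0]["id"] if data else None


def get_model_name_from_container(base_url, timeout=5):
    """Ask the container which model is actually loaded."""
    with urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return extract_model_id(json.load(resp))
```

Querying the server rather than trusting deploy-time metadata means the name always reflects what the container really loaded.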
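The capability-based routing in `map_service_route` could reduce to something like this simplified sketch (the capability strings and fallthrough order are assumptions):

```python
def map_service_route(capabilities):
    """Route chat-capable models to the chat endpoint, completion-only
    models to the completions endpoint."""
    if "chat" in capabilities:
        return "/v1/chat/completions"
    if "completion" in capabilities:
        return "/v1/completions"
    return None  # non-text models are routed elsewhere
```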
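The `ChipSlotAllocator` plus per-slot service ports might combine along these lines; slot count, base port, and method names are illustrative assumptions rather than the PR's actual API.

```python
BASE_SERVICE_PORT = 8001  # hypothetical base; each chip slot gets its own port


class ChipSlotAllocator:
    """Sketch: track slot occupancy and hand out non-conflicting ports."""

    def __init__(self, total_slots=4):
        self.total_slots = total_slots
        self.occupied = {}  # slot index -> deployment id

    def allocate(self, deploy_id, slots_needed=1):
        free = [s for s in range(self.total_slots) if s not in self.occupied]
        if len(free) < slots_needed:
            raise RuntimeError("not enough free chip slots")
        chosen = free[:slots_needed]
        for s in chosen:
            self.occupied[s] = deploy_id
        return chosen

    def service_port(self, slot):
        # Unique port per slot is what prevents conflicts in
        # multi-chip deployments.
        return BASE_SERVICE_PORT + slot

    def release(self, deploy_id):
        for s in [s for s, d in self.occupied.items() if d == deploy_id]:
            del self.occupied[s]
```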
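The health monitor's cleanup of stale "starting" records could work like this sketch; the timeout value and record fields are assumptions.

```python
import time

STARTING_TIMEOUT_S = 600  # hypothetical: 'starting' older than 10 min is stale


def clean_stale_starting(records, now=None):
    """Drop stale 'starting' records so they stop blocking chip slots."""
    now = time.time() if now is None else now
    return {
        dep_id: r for dep_id, r in records.items()
        if not (r["status"] == "starting"
                and now - r["created_at"] > STARTING_TIMEOUT_S)
    }
```

Without this, a crashed deployment that never left "starting" would hold its slot forever.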
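The `TtsInferenceView` 404 fallback can be expressed with an injected HTTP helper; the enqueue route name is a placeholder assumption, while `/v1/audio/speech` comes from the PR description.

```python
def tts_with_fallback(post, payload):
    """Try the enqueue-style route first; on 404 (model still loading the
    queue endpoint, or endpoint absent), retry via /v1/audio/speech.
    `post` is any callable returning (status, body)."""
    status, body = post("/tts/enqueue", payload)  # hypothetical route name
    if status == 404:
        status, body = post("/v1/audio/speech", payload)
    return status, body
```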
…stamp

- Introduced manual compatibility overrides for specific model names to ensure they are always shown as compatible, even when device configurations do not match.
- Updated the `models_from_inference_server.json` file to reflect the new generation timestamp, ensuring accurate tracking of model configurations.
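The manual compatibility overrides could amount to a small allow-set check ahead of the normal device-match test; the set contents and function signature below are illustrative assumptions.

```python
# Hypothetical names: models forced to display as compatible regardless
# of device configuration.
COMPATIBILITY_OVERRIDES = {"example-always-compatible-model"}


def is_compatible(model_name, model_devices, current_device):
    """Overridden models always show as compatible; others must list the
    current device in their configuration."""
    if model_name in COMPATIBILITY_OVERRIDES:
        return True
    return current_device in model_devices
```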