
Merge mlx #454

Open
filipinascimento wants to merge 6 commits into resemble-ai:master from filipinascimento:merge_mlx


@filipinascimento

This pull request introduces significant improvements for device compatibility (especially for Apple Silicon/MPS), simplifies dependency management, and enhances robustness in model loading and audio processing. It also adds example code for both TTS and voice conversion and cleans up some code for clarity and reliability.

Device compatibility and model loading improvements:

  • Updated device detection logic throughout the codebase to better support Apple Silicon (M1/M2/M3/M4) using "mps" and to gracefully fall back to CPU if CUDA or MPS are unavailable (example_for_mac.py, gradio_tts_app.py, gradio_vc_app.py, src/chatterbox/tts.py, src/chatterbox/vc.py). [1] [2] [3] [4] [5] [6]
  • Improved model loading by ensuring correct use of map_location for both PyTorch and safetensors, preventing device mismatch errors when loading checkpoints on different hardware (src/chatterbox/tts.py, src/chatterbox/vc.py). [1] [2] [3]
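
As a rough illustration of the fallback behaviour described above, the selection logic might look like the following sketch. The function names `pick_default_device` and `resolve_device` are hypothetical and are not taken from the PR, and the CUDA-before-MPS preference order is an assumption. The availability flags are passed in as plain booleans so the logic can be read and tested without PyTorch installed.

```python
def pick_default_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick an accelerator if one exists, otherwise use the CPU.

    The CUDA-first ordering is an assumption made for this sketch.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Return the requested device, or a fallback when its backend is missing."""
    if requested == "cuda" and not cuda_available:
        # Try MPS (Apple Silicon) before giving up on acceleration.
        return "mps" if mps_available else "cpu"
    if requested == "mps" and not mps_available:
        return "cuda" if cuda_available else "cpu"
    # "cpu" and any available accelerator pass through unchanged.
    return requested
```

With PyTorch installed, the two flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. The resulting string can then be passed as `map_location` to `torch.load`, or as the `device` argument of `safetensors.torch.load_file`, so that checkpoint tensors land on a device that actually exists.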

Dependency and codebase simplification:

  • Relaxed version constraints in pyproject.toml for major dependencies (such as numpy, torch, torchaudio, etc.) to improve compatibility and make installation easier.
  • Removed unused or unnecessary imports and replaced custom or external linear layers with standard nn.Linear in the transformer modules, simplifying the code and reducing dependency on external packages (src/chatterbox/models/s3gen/matcha/transformer.py). [1] [2] [3]

Audio and data processing robustness:

  • Ensured all audio tensors are explicitly converted to float to prevent type errors during processing in both the tokenizer and voice encoder modules (src/chatterbox/models/s3tokenizer/s3tokenizer.py, src/chatterbox/models/voice_encoder/voice_encoder.py). [1] [2] [3]

Examples and documentation:

  • Added a comprehensive example_for_mac.py script demonstrating TTS (with default and custom voices) and voice conversion, with clear instructions for Mac users.
  • Minor import cleanup in example_tts.py for consistency.

Logging and minor fixes:

  • Added logger warnings alongside print statements for better error tracking in tokenizer and loudness normalization routines (src/chatterbox/tts_turbo.py). [1] [2]
  • Corrected punctuation normalization logic for better text preprocessing (src/chatterbox/tts.py).
  • Fixed logic for handling HuggingFace token usage in model downloads (src/chatterbox/tts_turbo.py).
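
The pattern of logging a warning next to an existing print can be sketched with the standard library alone. The helper name `warn_user` is hypothetical and is not part of the PR; it only shows one way to emit each message once to stdout and once to a logger.

```python
import logging

logger = logging.getLogger(__name__)


def warn_user(message: str) -> None:
    """Show a message on stdout and also record it as a logger warning."""
    print(message)            # visible in interactive and notebook use
    logger.warning(message)   # picked up by whatever handlers the app configures
```

Routing both outputs through a single helper keeps the two messages identical and avoids repeating the same string at every call site.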

sinjab-ctrl and others added 6 commits June 1, 2025 09:18
- Add MPS device detection and support for Apple Silicon Macs (M1/M2/M3/M4)
- Update model loading to properly handle MPS devices with map_location
- Modify TTS and VC classes to support MPS backend
- Update example files and Gradio apps to use MPS when available
- Add comprehensive Mac example with TTS and VC features
- Tested on M4 Pro Mac mini with macOS Sequoia 15.5

This change enables native Metal acceleration for Chatterbox on Apple Silicon Macs, improving performance and reducing memory usage compared to CPU-only operation.

Tested on:
- Hardware: Mac mini (2024) with Apple M4 Pro
- Memory: 64 GB
- OS: macOS Sequoia 15.5

- Add MPS device support with proper availability checks
- Switch to safetensors format for model loading
- Improve device handling for CUDA/CPU/MPS
- Add helpful error messages for device availability
- Update dependencies in pyproject.toml
…loat in S3Tokenizer, VoiceEncoder, and ChatterboxVC
Copilot AI review requested due to automatic review settings January 31, 2026 22:13

Copilot AI left a comment


Pull request overview

This PR enhances device compatibility for Apple Silicon (MPS), improves model loading robustness across different hardware platforms, relaxes dependency constraints, simplifies the transformer implementation, and adds comprehensive example code for Mac users. The changes address device detection, safetensors loading with proper map_location handling, audio tensor type conversion, and provide better error handling through logging.

Changes:

  • Added MPS device detection with CUDA fallback logic in TTS/VC classes and Gradio apps
  • Improved model loading by properly handling map_location for both PyTorch and safetensors formats
  • Relaxed version constraints in dependencies and removed LoRACompatibleLinear in favor of standard nn.Linear
  • Added explicit float conversion for audio tensors and enhanced logging in error paths
  • Created comprehensive example_for_mac.py with TTS and voice conversion demonstrations

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/chatterbox/vc.py | Enhanced device detection with CUDA fallback and fixed map_location handling for safetensors |
| src/chatterbox/tts.py | Added device compatibility checks and updated map_location for model loading; attempted punctuation normalization fix |
| src/chatterbox/tts_turbo.py | Added duplicate logger warnings alongside print statements; fixed HuggingFace token handling |
| src/chatterbox/models/voice_encoder/voice_encoder.py | Added explicit float conversion for mel spectrograms |
| src/chatterbox/models/s3tokenizer/s3tokenizer.py | Ensured audio tensors are converted to float during preprocessing |
| src/chatterbox/models/s3gen/matcha/transformer.py | Replaced LoRACompatibleLinear with standard nn.Linear |
| pyproject.toml | Relaxed version constraints and added peft dependency |
| gradio_vc_app.py | Updated device detection to prioritize MPS over CUDA |
| gradio_tts_app.py | Enhanced device detection and added conditional CUDA seed setting |
| example_tts.py | Minor import reordering for consistency |
| example_for_mac.py | Complete rewrite with three comprehensive examples for TTS and voice conversion |


Comment on lines +47 to +50
("\"", "\""),
("\"", "\""),
("'", "'"),
("'", "'"),

Copilot AI Jan 31, 2026


The punctuation normalization is replacing curly quotes with themselves instead of normalizing them to straight quotes. Lines 47-48 replace curly double quotes (U+201C and U+201D) with themselves, and lines 49-50 replace curly single quotes (U+2018 and U+2019) with themselves. This defeats the purpose of normalization. These should be replaced with straight quotes: " and ' respectively.

Suggested change
- ("\"", "\""),
- ("\"", "\""),
- ("'", "'"),
- ("'", "'"),
+ ("“", "\""),
+ ("”", "\""),
+ ("‘", "'"),
+ ("’", "'"),
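
To make the intended behaviour concrete, here is a minimal, self-contained sketch of quote normalization. The names `QUOTE_REPLACEMENTS` and `normalize_quotes` are hypothetical and do not come from src/chatterbox/tts.py; the curly quotes are written as Unicode escapes so they cannot be silently converted by an editor or a copy-paste step.

```python
# Each pair maps a curly quote to its straight ASCII equivalent.
QUOTE_REPLACEMENTS = [
    ("\u201c", '"'),  # left double quotation mark
    ("\u201d", '"'),  # right double quotation mark
    ("\u2018", "'"),  # left single quotation mark
    ("\u2019", "'"),  # right single quotation mark
]


def normalize_quotes(text: str) -> str:
    """Replace curly quotes with straight quotes, leaving other text untouched."""
    for old, new in QUOTE_REPLACEMENTS:
        text = text.replace(old, new)
    return text
```

A pair whose two elements are identical is a no-op under `str.replace`, which is why mapping a character to itself performs no normalization.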

"torch",
"torchaudio",
"transformers",
"peft",

Copilot AI Jan 31, 2026


The peft dependency was added to pyproject.toml but there are no imports or usage of peft anywhere in the codebase. This adds an unnecessary dependency. Consider removing it unless it's being used indirectly by other dependencies.
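
One way to check a claim like this is to scan the Python sources for direct imports of the package. The sketch below uses only the standard-library `ast` module; the function name `imports_package` is hypothetical. It detects direct `import` and `from ... import` statements only, so it would not reveal a package that is required indirectly by another dependency.

```python
import ast


def imports_package(source: str, package: str) -> bool:
    """Return True if the source code directly imports the given top-level package."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Handles "import peft" and "import peft.tuners as t".
            if any(alias.name.split(".")[0] == package for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import, which refers to a local module.
            if node.level == 0 and node.module and node.module.split(".")[0] == package:
                return True
    return False
```

To cover a whole tree, the function could be applied to the text of every file returned by `pathlib.Path("src").rglob("*.py")`.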


2 participants