Merge mlx by filipinascimento · Pull Request #454 · resemble-ai/chatterbox

filipinascimento · 2026-01-31T22:13:04Z

This pull request introduces significant improvements for device compatibility (especially for Apple Silicon/MPS), simplifies dependency management, and enhances robustness in model loading and audio processing. It also adds example code for both TTS and voice conversion and cleans up some code for clarity and reliability.

Device compatibility and model loading improvements:

Updated device detection logic throughout the codebase to better support Apple Silicon (M1/M2/M3/M4) using "mps" and to gracefully fall back to CPU if CUDA or MPS are unavailable (example_for_mac.py, gradio_tts_app.py, gradio_vc_app.py, src/chatterbox/tts.py, src/chatterbox/vc.py).
Improved model loading by ensuring correct use of map_location for both PyTorch and safetensors, preventing device mismatch errors when loading checkpoints on different hardware (src/chatterbox/tts.py, src/chatterbox/vc.py).
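For readers who want a concrete picture of the fallback behavior described above, here is a minimal sketch. It is not the PR's code: the function names pick_device and detect_device are hypothetical, and the preference order (CUDA, then MPS, then CPU) is an assumption. The file summary for gradio_vc_app.py suggests that at least one entry point prefers MPS instead.

```python
from typing import Optional


def pick_device(cuda_ok: bool, mps_ok: bool, requested: Optional[str] = None) -> str:
    """Return the device string to use, falling back to CPU when needed."""
    available = {"cuda": cuda_ok, "mps": mps_ok, "cpu": True}
    if requested is not None:
        # Compare on the base name so that "cuda:0" is treated like "cuda".
        base = requested.split(":")[0]
        # Honor an explicit request only if that backend is usable.
        return requested if available.get(base, False) else "cpu"
    # No explicit request: prefer an accelerator, then CPU (assumed order).
    for name in ("cuda", "mps"):
        if available[name]:
            return name
    return "cpu"


def detect_device(requested: Optional[str] = None) -> str:
    """Query PyTorch for backend availability and choose a device."""
    import torch  # imported lazily so pick_device stays usable without torch

    cuda_ok = torch.cuda.is_available()
    # Guard the attribute lookup in case a build lacks the MPS backend module.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(cuda_ok, mps_ok, requested)
```

Keeping the decision in a pure function makes the fallback rules easy to test on a machine that has neither CUDA nor MPS.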

Dependency and codebase simplification:

Relaxed version constraints in pyproject.toml for major dependencies (such as numpy, torch, torchaudio, etc.) to improve compatibility and make installation easier.
Removed unused or unnecessary imports and replaced custom or external linear layers with standard nn.Linear in the transformer modules, simplifying the code and reducing dependency on external packages (src/chatterbox/models/s3gen/matcha/transformer.py).

Audio and data processing robustness:

Ensured all audio tensors are explicitly converted to float to prevent type errors during processing in both the tokenizer and voice encoder modules (src/chatterbox/models/s3tokenizer/s3tokenizer.py, src/chatterbox/models/voice_encoder/voice_encoder.py).
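The kind of conversion described here might look like the sketch below. The helper name to_float32 is hypothetical, and the exact call sites in the tokenizer and voice encoder are not shown in this page. One plausible motivation, stated as an assumption, is that audio loaded through NumPy often arrives as float64, which does not match float32 model weights and is not supported by the MPS backend.

```python
def to_float32(audio):
    """Return `audio` as a float32 torch.Tensor."""
    import torch  # lazy import: PyTorch is needed only when the function runs

    # Accepts tensors, NumPy arrays, or nested Python lists.
    tensor = torch.as_tensor(audio)
    # .float() converts to float32; it returns the same tensor if the
    # dtype is already float32, so repeated calls are cheap.
    return tensor.float()
```

Note that this only changes the dtype. Integer PCM samples are not rescaled to the [-1, 1] range, so any such scaling would have to happen separately.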

Examples and documentation:

Added a comprehensive example_for_mac.py script demonstrating TTS (with default and custom voices) and voice conversion, with clear instructions for Mac users.
Minor import cleanup in example_tts.py for consistency.

Logging and minor fixes:

Added logger warnings alongside print statements for better error tracking in tokenizer and loudness normalization routines (src/chatterbox/tts_turbo.py).
Corrected punctuation normalization logic for better text preprocessing (src/chatterbox/tts.py).
Fixed logic for handling HuggingFace token usage in model downloads (src/chatterbox/tts_turbo.py).
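As an illustration of the logging item above, the following sketch emits a warning through both print and the standard logging module. The helper names (warn, check_token_length) and the message text are hypothetical and are not taken from tts_turbo.py.

```python
import logging

logger = logging.getLogger(__name__)


def warn(message: str) -> str:
    """Emit a warning to both stdout and the logging system."""
    print(message)           # visible in interactive runs
    logger.warning(message)  # routed to whatever handlers the app configures
    return message


def check_token_length(num_tokens: int, max_tokens: int) -> bool:
    """Return True if the input fits; warn (without raising) when it does not."""
    if num_tokens > max_tokens:
        warn(f"Input has {num_tokens} tokens, exceeding the limit of {max_tokens}")
        return False
    return True
```

Emitting through both channels means the message can appear twice on a console when a stream handler is attached, which is the duplication the review summary points out for this file.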

- Add MPS device detection and support for Apple Silicon Macs (M1/M2/M3/M4)
- Update model loading to properly handle MPS devices with map_location
- Modify TTS and VC classes to support MPS backend
- Update example files and Gradio apps to use MPS when available
- Add comprehensive Mac example with TTS and VC features
- Tested on M4 Pro Mac mini with macOS Sequoia 15.5

This change enables native Metal acceleration for Chatterbox on Apple Silicon Macs, improving performance and reducing memory usage compared to CPU-only operation.

Tested on:
- Hardware: Mac mini (2024) with Apple M4 Pro
- Memory: 64 GB
- OS: macOS Sequoia 15.5

- Add MPS device support with proper availability checks
- Switch to safetensors format for model loading
- Improve device handling for CUDA/CPU/MPS
- Add helpful error messages for device availability
- Update dependencies in pyproject.toml

…o remove the need for HF token.

…loat in S3Tokenizer, VoiceEncoder, and ChatterboxVC

…ard classes

Copilot

Pull request overview

This PR enhances device compatibility for Apple Silicon (MPS), improves model loading robustness across different hardware platforms, relaxes dependency constraints, simplifies the transformer implementation, and adds comprehensive example code for Mac users. The changes address device detection, safetensors loading with proper map_location handling, audio tensor type conversion, and provide better error handling through logging.

Changes:

Added MPS device detection with CUDA fallback logic in TTS/VC classes and Gradio apps
Improved model loading by properly handling map_location for both PyTorch and safetensors formats
Relaxed version constraints in dependencies and removed LoRACompatibleLinear in favor of standard nn.Linear
Added explicit float conversion for audio tensors and enhanced logging in error paths
Created comprehensive example_for_mac.py with TTS and voice conversion demonstrations
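The map_location handling mentioned in the list above can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: checkpoint_format and load_state_dict are hypothetical helpers, and the choice to load safetensors files on the CPU first and then move the tensors is one conservative option among several.

```python
from pathlib import Path


def checkpoint_format(path) -> str:
    """Classify a checkpoint by its file extension."""
    return "safetensors" if Path(path).suffix == ".safetensors" else "torch"


def load_state_dict(path, device: str = "cpu"):
    """Load a state dict so that its tensors end up on `device`."""
    import torch  # lazy imports keep checkpoint_format usable without torch

    if checkpoint_format(path) == "safetensors":
        from safetensors.torch import load_file

        # Load onto the CPU first, then move each tensor to the target device.
        state = load_file(str(path), device="cpu")
        return {name: tensor.to(device) for name, tensor in state.items()}
    # map_location remaps storages that were saved on another device
    # (for example CUDA) so the file can be opened on CPU or MPS machines.
    # weights_only=True (PyTorch 1.13+) restricts unpickling to tensors
    # and plain containers.
    return torch.load(str(path), map_location=torch.device(device), weights_only=True)
```

A caller would typically pass the device string chosen at startup, for example load_state_dict("model.safetensors", device="mps"), where the file name is a placeholder.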

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/chatterbox/vc.py	Enhanced device detection with CUDA fallback and fixed map_location handling for safetensors
src/chatterbox/tts.py	Added device compatibility checks and updated map_location for model loading; attempted punctuation normalization fix
src/chatterbox/tts_turbo.py	Added duplicate logger warnings alongside print statements; fixed HuggingFace token handling
src/chatterbox/models/voice_encoder/voice_encoder.py	Added explicit float conversion for mel spectrograms
src/chatterbox/models/s3tokenizer/s3tokenizer.py	Ensured audio tensors are converted to float during preprocessing
src/chatterbox/models/s3gen/matcha/transformer.py	Replaced LoRACompatibleLinear with standard nn.Linear
pyproject.toml	Relaxed version constraints and added peft dependency
gradio_vc_app.py	Updated device detection to prioritize MPS over CUDA
gradio_tts_app.py	Enhanced device detection and added conditional CUDA seed setting
example_tts.py	Minor import reordering for consistency
example_for_mac.py	Complete rewrite with three comprehensive examples for TTS and voice conversion
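The "conditional CUDA seed setting" noted for gradio_tts_app.py could look roughly like this. The function body is an assumption, since the diff for that file is not reproduced on this page. It seeds Python's random module and, when PyTorch is installed, seeds PyTorch and touches the CUDA generators only if CUDA is available.

```python
import random


def set_seed(seed: int) -> None:
    """Seed the available random number generators."""
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # nothing more to seed without PyTorch
    torch.manual_seed(seed)
    # Only seed the CUDA generators when a CUDA device is present, so the
    # same code path also runs on MPS-only and CPU-only machines.
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
```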

Copilot · 2026-01-31T22:19:46Z

src/chatterbox/tts.py

+        ("\"", "\""),
+        ("\"", "\""),
+        ("'", "'"),
+        ("'", "'"),


The punctuation normalization is replacing curly quotes with themselves instead of normalizing them to straight quotes. Lines 47-48 replace curly double quotes (U+201C and U+201D) with themselves, and lines 49-50 replace curly single quotes (U+2018 and U+2019) with themselves. This defeats the purpose of normalization. These should be replaced with straight quotes: " and ' respectively.

Suggested change

-        ("\"", "\""),
-        ("\"", "\""),
-        ("'", "'"),
-        ("'", "'"),
+        ("“", "\""),
+        ("”", "\""),
+        ("‘", "'"),
+        ("’", "'"),
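To make the intended behavior of the suggestion concrete, here is a small self-contained sketch of quote normalization. The names QUOTE_REPLACEMENTS and normalize_quotes are hypothetical and the surrounding function in src/chatterbox/tts.py is not shown here. The curly quotes are written as Unicode escapes so the mapping cannot be silently flattened by an editor or a copy-paste step.

```python
# Each pair is (character to find, ASCII replacement).
QUOTE_REPLACEMENTS = [
    ("\u201c", '"'),  # left double quotation mark
    ("\u201d", '"'),  # right double quotation mark
    ("\u2018", "'"),  # left single quotation mark
    ("\u2019", "'"),  # right single quotation mark
]


def normalize_quotes(text: str) -> str:
    """Replace curly quotes with their straight ASCII equivalents."""
    for old, new in QUOTE_REPLACEMENTS:
        text = text.replace(old, new)
    return text
```

A pair whose two elements are identical, as in the lines flagged by the review, leaves the text unchanged, which is why the mapping has no effect until the left-hand side holds the curly character.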

Copilot · 2026-01-31T22:19:47Z

pyproject.toml

+    "torch",
+    "torchaudio",
+    "transformers",
+    "peft",


The peft dependency was added to pyproject.toml but there are no imports or usage of peft anywhere in the codebase. This adds an unnecessary dependency. Consider removing it unless it's being used indirectly by other dependencies.
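One way to check the claim that nothing imports peft is a static scan of the source tree, sketched below with the standard ast module. The helper names are hypothetical. A scan like this only finds direct import statements, so it cannot rule out dynamic imports or use by another dependency, which is the caveat the review comment itself raises.

```python
import ast
from pathlib import Path


def imported_modules(source: str) -> set:
    """Return the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 excludes relative imports such as "from . import x".
            names.add(node.module.split(".")[0])
    return names


def files_importing(package: str, root: str) -> list:
    """List the .py files under `root` that import `package` directly."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            source = path.read_text(encoding="utf-8")
            if package in imported_modules(source):
                hits.append(str(path))
        except (SyntaxError, UnicodeDecodeError, ValueError):
            continue  # skip files that cannot be read or parsed
    return hits
```

For example, files_importing("peft", "src") returning an empty list would be consistent with the review's observation, assuming the package sources live under src.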

sinjab-ctrl and others added 6 commits June 1, 2025 09:18

Add logging for tokenizer length warning and norm_loudness errors als…

7286d1b

…o remove the need for HF token.

Merge metal-mps-support into merge_mlx

3f141c1

Refactor dependencies in pyproject.toml and ensure tensor types are f…

9c8c94b

…loat in S3Tokenizer, VoiceEncoder, and ChatterboxVC

Replace LoRACompatibleLinear with nn.Linear in SnakeBeta and FeedForw…

daecf07

…ard classes

Copilot AI review requested due to automatic review settings January 31, 2026 22:13

Copilot started reviewing on behalf of filipinascimento January 31, 2026 22:13 View session

Copilot AI reviewed Jan 31, 2026

Merge mlx #454
filipinascimento wants to merge 6 commits into resemble-ai:master from filipinascimento:merge_mlx
