Add contiguity check to try_as_slice() and contiguous() function#310
Open
ymote wants to merge 19 commits intooxiglade:mainfrom
Open
Add contiguity check to try_as_slice() and contiguous() function#310ymote wants to merge 19 commits intooxiglade:mainfrom
ymote wants to merge 19 commits intooxiglade:mainfrom
Conversation
release: 0.25.2
Major changes: - Rename mlx-lm to mlx-rs-lm to avoid confusion with official Python package - Update mlx-c submodule from v0.3.0 to v0.4.1 (MLX v0.30.1) - Fix 128-token performance threshold bug from MLX v0.29.1 Performance optimizations: - Add async evaluation pipelining in Generate iterator (main speedup) - Fix KVCache dtype bug: use input dtype instead of hardcoded f32 - Add periodic mlx_clear_cache() every 256 tokens API changes for MLX v0.30.1: - Update SDPA to use single Array mask instead of VectorArray - Add optional int/dtype helpers for quantization functions - Patch Metal 3.2 for macOS Tahoe beta compatibility - Disable NAX feature to avoid __isPlatformVersionAtLeast link error Results: Rust now achieves ~42.3 tok/s vs Python's ~42.9 tok/s (1.4% gap) Previously: ~40.2 tok/s vs ~42.9 tok/s (6.8% gap)
- Remove unused 'y' field from GenerateState::Decode (was cloned but never used) - Add explicit eval() on first token to match Python behavior Results: Rust now ~2% faster than Python (42.2 vs 41.4 tok/s)
The Rust nn::Rope implementation was reshaping input from [B, H, L, D] to [B*H, L, D] before calling fast::rope. This caused the underlying MLX rope kernel to only process the first "sequence" correctly when L=1 (decode phase), producing zeros for all other attention heads. Python's nn.RoPE does not reshape and passes input directly to mx.fast.rope. This fix aligns Rust behavior with Python. This resolves the Mixtral generation producing garbled output during autoregressive decoding.
Two bugs fixed: 1. Remove duplicate eval() in try_item() (mlx-rs/src/array/mod.rs) - try_item() was calling eval() twice due to copy-paste error - This caused 15% overhead on every .item() call 2. Fix StreamOrDevice::default() to use task-local stream (mlx-rs/src/stream.rs) - StreamOrDevice::default() was using Stream::new() which ignores task-local streams - Changed to use Stream::task_local_or_default() - This enables proper async pipelining with with_new_default_stream() Performance impact (Mixtral-8x7B-4bit): - Before: ~35.6 tok/s - After: ~45.1 tok/s - Improvement: ~27% Rust now achieves parity with Python mlx-lm (~45.6 tok/s, <1% gap).
Add complete voice cloning pipeline without Python dependencies: - Voice cloning API with zero-shot and few-shot modes - Text processing: Chinese G2P (pypinyin), English G2P (CMU dict) - Mixed Chinese/English text support with language segmentation - Models: BERT, HuBERT, T2S (text-to-semantic), VITS vocoder - Audio: WAV I/O, mel spectrogram, resampling - Architecture documentation with SSML roadmap Performance: ~27x faster than Python for mixed text synthesis Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Implement complete Paraformer-large (220M) model port: - SAN-M encoder (50 layers with FSMN memory) - CIF predictor for non-autoregressive length prediction - Bidirectional decoder (16 layers) - Add MelFrontend for FunASR-compatible feature extraction: - 80-bin mel spectrogram with pre-emphasis (0.97) - LFR stacking (7 frames, stride 6) - CMVN normalization with robust multi-line parser - Deprecate duplicate frontend in audio.rs - Achieve 18.9x real-time on Apple Silicon (RTF 0.053) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix Mistral example to use pre-quantized models (41% faster) - Add proper async pipelining to Mistral Generate iterator - Add GLM4-MoE benchmark example with async pipelining - Add comprehensive performance comparison documentation Benchmark results vs Python mlx-lm: - Qwen3-30B-A3B-4bit (MoE): +0.5% (98.3 vs 97.8 tok/s) - GLM-4.5-Air-3bit (MoE): +5.8% (45.3 vs 42.8 tok/s) - Mixtral-8x7B-4bit (MoE): -3.5% (44.5 vs 46.1 tok/s) - Mistral-7B-4bit: -11% (74.2 vs 83.5 tok/s) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move Mistral from examples/mistral to a proper mistral-mlx crate that shares infrastructure with other -mlx crates via mlx-lm-core. - Add mistral-mlx crate with pre-quantized model support - Add async pipelining in Generate iterator - Add benchmark example achieving 82.8 tok/s (matches Python's 83.5) - Add InvalidConfig error variant to mlx-lm-core Performance improvement: 11% gap -> 0.8% gap vs Python mlx-lm Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Updated benchmark results showing Mistral-7B-4bit achieving 82.8 tok/s with the new mistral-mlx crate using mlx-lm-core shared infrastructure, reducing the gap from -38% to just -0.8% vs Python mlx-lm. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement G2PW (Grapheme-to-Phoneme for Written Chinese) using ONNX Runtime for context-aware polyphonic character pronunciation. Key changes: - Add g2pw.rs module with ONNX Runtime integration - Use BERT-based model for context-aware disambiguation - Integrate G2PW into chinese_g2p() preprocessing pipeline - Add lingua-rs for ML-based language detection The polyphonic character "行" is now correctly pronounced based on context: - 几行代码 → háng (lines of code) - 银行存款 → háng (bank) - 行走江湖 → xíng (walk) - 举行会议 → xíng (hold/conduct) Dependencies: - ort (ONNX Runtime) for model inference - lingua for language detection Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, unknown English words like "Rashbass" were spelled out letter by letter (R-A-S-H-B-A-S-S), producing unintelligible speech. Now uses rule-based grapheme-to-phoneme conversion that handles: - Common digraphs: ch, sh, th, ph, ng, etc. - Double consonants: ss, tt, ll, etc. - Vowel patterns: ai, ea, ee, oo, ou, etc. - Magic-e pattern: a_e, i_e, o_e - Context-dependent consonants: c before e/i/y → S Example: "Rashbass" → R AE1 SH B AE1 S (sounds like "RASH-BASS") Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable CoreML execution provider for the G2PW ONNX model, allowing it to run on Apple Silicon GPU and Neural Engine (ANE) instead of CPU. Changes: - Add `coreml` feature to ort dependency - Configure CoreML with ComputeUnits::All (GPU + ANE + CPU) - Add model cache directory for faster subsequent loads - Use NeuralNetwork format for better compatibility This significantly improves G2PW inference performance on Apple Silicon. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add number_to_english_phonemes() for converting numbers like "2050" to spoken English phonemes (e.g., "twenty fifty") - Add number_to_english_word() for number-to-word conversion - Update english_g2p() to handle digits by accumulating and converting them - Update segment_by_language() to include digits in English segments This fixes an issue where English numbers in mixed Chinese-English text (e.g., "The World in 2050") were being silently dropped during TTS. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Created MODEL_USER_GUIDE.md covering all available models: - LLM models: Mistral, Mixtral, Qwen3, Qwen3 MoE, GLM-4, GLM-4.5 MoE - Image generation: FLUX.2 Klein, Qwen-Image - Speech/Audio: FunASR Paraformer (ASR), GPT-SoVITS (TTS) Includes usage examples, recommended models, performance tips, and API usage patterns for each model. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace custom number_to_english_word() with num2en crate, which is the Rust equivalent of Python's inflect library used by g2p_en. This ensures number pronunciation matches Python exactly: - 2050 → "two thousand fifty" (was "twenty fifty") - 2001 → "two thousand one" (was "twenty one") - 123 → "one hundred twenty-three" Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Text normalization fixes for TTS:
1. Strip leading/trailing punctuation in voice_clone.rs
- Trailing commas caused BERT feature misalignment
- Beginning of audio was being skipped (e.g., "从区域上" missed)
- Now strips: , . ! ? , 。 ! ? 、 ; :
2. Improved segment_by_language() in preprocessor.rs
- Digits are now context-dependent:
- "126.4亿斤" stays together as Chinese
- "Room 404" stays together as English
- Looks ahead to determine if number is followed by CJK
3. Decimal and percentage conversion already in place:
- "163.6" → "一百六十三点六"
- "70%" → "百分之七十"
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Performance comparison between Rust MLX and Python MPS: - Rust is 2x faster at synthesis (4.9s vs 9.6s for 91 chars) - Rust runs at 4x realtime, Python at 2.3x realtime - Model loading 79x faster in Rust (49ms vs 3.9s) Updated TTS skill with: - Performance benchmark results and commands - Voice options (doubao, luoxiang) - New fixes: boundary punctuation, number segmentation, num2en Added voice_clone.rs example with multi-voice support. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
BREAKING CHANGE: try_as_slice() now returns NotContiguous error for non-contiguous arrays (e.g., after index() or transpose_axes()). Previously, as_slice() would return raw memory ignoring strides, causing silent data corruption when the array was not contiguous. This was a significant source of bugs when working with indexed or transposed arrays. Changes: - Add is_contiguous() method to Array to check memory layout - Add NotContiguous error variant to AsSliceError - Update try_as_slice() to check contiguity before returning slice - Add contiguous() function to make non-contiguous arrays contiguous Migration guide: If your code used as_slice() on indexed/transposed arrays, it was likely producing incorrect results. To fix: 1. Call contiguous() first: let c = ops::contiguous(&arr)?; let slice = c.as_slice::<f32>(); 2. Or use reshape to force contiguity: let n = arr.size() as i32; let c = arr.reshape(&[n])?.reshape(arr.shape())?; let slice = c.as_slice::<f32>(); Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Collaborator
|
@ymote this seems to do more than just what the title suggests |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
is_contiguous()method toArrayto check if the array has contiguous memory layoutNotContiguouserror variant toAsSliceErrortry_as_slice()to check contiguity before returning slice (BREAKING CHANGE)contiguous()function toopsmodule to make non-contiguous arrays contiguousMotivation
Previously,
as_slice()would return raw memory ignoring strides, causing silent data corruption when the array was not contiguous. This was a significant source of bugs when working with indexed or transposed arrays.For example:
The slice would contain elements in the wrong order because
transpose()creates a strided view, butas_slice()ignored the strides.Changes
Array::is_contiguous()Checks if the array is contiguous (row-major/C-style) by verifying strides match expected values.
AsSliceError::NotContiguousNew error variant returned when calling
try_as_slice()on a non-contiguous array.try_as_slice()updateNow checks contiguity after evaluation and returns
NotContiguouserror if the array is not contiguous.ops::contiguous()New function that returns a contiguous copy of the array if needed.
Migration Guide
If your code used
as_slice()on indexed/transposed arrays, it was likely producing incorrect results. To fix:Test Plan
is_contiguous()with various array shapestry_as_slice()returningNotContiguouserrorcontiguous()function🤖 Generated with Claude Code