| **Voxtral** | Mistral's speech model | Multiple | [mlx-community/Voxtral-Mini-3B-2507-bf16](https://huggingface.co/mlx-community/Voxtral-Mini-3B-2507-bf16) |
| **Voxtral Realtime** | Mistral's 4B streaming STT | Multiple | [4bit](https://huggingface.co/mlx-community/Voxtral-Mini-4B-Realtime-2602-4bit), [fp16](https://huggingface.co/mlx-community/Voxtral-Mini-4B-Realtime-2602-fp16) |
| **VibeVoice-ASR** | Microsoft's 9B ASR with diarization & timestamps | Multiple | [mlx-community/VibeVoice-ASR-bf16](https://huggingface.co/mlx-community/VibeVoice-ASR-bf16) |
| **Moonshine** | Useful Sensors' lightweight ASR | EN | [README](mlx_audio/stt/models/moonshine/README.md) |

### Voice Activity Detection / Speaker Diarization (VAD)
# Moonshine

MLX implementation of Useful Sensors' Moonshine, a lightweight ASR model that processes raw audio through a learned convolutional frontend rather than mel spectrograms.

## Available Models

| Model | Parameters | Description |
|-------|------------|-------------|
| [UsefulSensors/moonshine-tiny](https://huggingface.co/UsefulSensors/moonshine-tiny) | 27M | Smallest variant |
| [UsefulSensors/moonshine-base](https://huggingface.co/UsefulSensors/moonshine-base) | 61M | Larger and more accurate |

## Python Usage

```python
from mlx_audio.stt import load

model = load("UsefulSensors/moonshine-tiny")

result = model.generate("audio.wav")
print(result.text)
```

## Architecture

- 3-layer convolutional frontend (strides 64, 3, 2) with GroupNorm
- Transformer encoder with RoPE (6 layers tiny, 8 layers base)
- Transformer decoder with cross-attention and SwiGLU (6 layers tiny, 8 layers base)
- Byte-level BPE tokenizer (32k vocab)
- 16 kHz raw audio input (no mel spectrogram)
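The conv strides above imply an overall downsampling factor of 64 × 3 × 2 = 384, so one second of 16 kHz audio yields roughly 41.7 encoder frames. A minimal plain-Python sketch of that relationship (stride-only approximation; real conv layers also shift the count slightly depending on kernel sizes and padding, which are not listed here):

```python
# Approximate encoder frame count from the conv frontend strides.
# Strides (64, 3, 2) come from the architecture list above; ignoring
# kernel-size/padding edge effects, each conv divides the sequence
# length by its stride, for a combined factor of 64 * 3 * 2 = 384.

SAMPLE_RATE = 16_000
STRIDES = (64, 3, 2)

def encoder_frames(num_samples: int) -> int:
    """Rough number of encoder frames for a raw-audio input."""
    length = num_samples
    for stride in STRIDES:
        length //= stride  # stride-only approximation
    return length

# 10 seconds of 16 kHz audio -> about 416 encoder frames (~41.7 fps).
print(encoder_frames(10 * SAMPLE_RATE))
```

This frame rate is why the model skips mel spectrograms: the strided convs already reduce raw samples to a transformer-friendly sequence length.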