tts : add SNAC decoder architecture support for Orpheus TTS #318
Summary
This PR adds foundational architecture support for the SNAC (Multi-Scale Neural Audio Codec) decoder, as a first step toward enabling Orpheus TTS models in llama.cpp. This addresses issue #208.
Note: This PR contains only the architecture infrastructure and does not include model loading, forward pass implementation, or TTS tool integration. It cannot run SNAC models yet but provides the foundation for those components.
Changes
Architecture Registration
- Added the `LLM_ARCH_SNAC_DEC` architecture enum and registered the "snac-dec" name in `src/llama-arch.h` and `src/llama-arch.cpp`
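`llama-arch.cpp` also carries the tensor-name mappings, and the review notes further down point out that block indices are written as `%d` on the C++ side and `{bid}` on the Python side. The sketch below shows one way to check that a pair of templates expands to the same names. The template strings are hypothetical placeholders, not the names used in this PR.

```python
# Hypothetical name templates (assumption): the real SNAC tensor names
# live in llama-arch.cpp and the Python conversion code.
PY_TEMPLATE = "dec.blk.{bid}.conv1"  # Python side: str.format placeholder
C_TEMPLATE = "dec.blk.%d.conv1"      # C++ side: printf-style placeholder


def py_name(bid: int) -> str:
    """Expand the Python-style template for one block index."""
    return PY_TEMPLATE.format(bid=bid)


def c_name(bid: int) -> str:
    """Expand the printf-style template; Python's % operator handles %d
    the same way for non-negative integers."""
    return C_TEMPLATE % bid


def templates_agree(py_tmpl: str, c_tmpl: str, n_blocks: int = 4) -> bool:
    """Return True if both templates yield identical names for every block."""
    return all(py_tmpl.format(bid=i) == c_tmpl % i for i in range(n_blocks))
```

A check like `templates_agree(PY_TEMPLATE, C_TEMPLATE)` could run over every per-block tensor to catch a name that was changed on one side only.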
Tensor Definitions (27 new tensor types)
Decoder tensors:
- `SNAC_DEC_CONV_IN`, `SNAC_DEC_CONV_OUT`
- `SNAC_DEC_ATTN_NORM`, `SNAC_DEC_ATTN_Q/K/V/OUT`
- `SNAC_DEC_BLK_CONV_UP`, `SNAC_DEC_BLK_CONV1/2/3`, `SNAC_DEC_BLK_SNAKE_ALPHA`
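For context on `SNAC_DEC_BLK_SNAKE_ALPHA`: the Snake activation is usually written as snake(x) = x + (1/alpha) * sin^2(alpha * x), with one learnable alpha per channel. The sketch below is a plain-Python reference of that formula, not the ggml implementation (the forward pass is not part of this PR). It assumes alpha is stored directly rather than in log space, and the small `eps` guard is also an assumption.

```python
import math


def snake(x: float, alpha: float, eps: float = 1e-9) -> float:
    """Snake activation for one sample: x + sin^2(alpha * x) / alpha.

    eps guards against division by zero (an assumption in this sketch).
    """
    return x + math.sin(alpha * x) ** 2 / (alpha + eps)


def snake_channels(x, alpha):
    """Apply Snake per channel.

    x:     list of channels, each a list of float samples
    alpha: one alpha value per channel
    """
    return [[snake(v, a) for v in channel] for channel, a in zip(x, alpha)]
```

A reference like this could be useful later for comparing the output of the ggml graph against known values.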
Vector quantizer tensors (4 levels):
- `SNAC_VQ_IN_PROJ`, `SNAC_VQ_OUT_PROJ`
- `SNAC_VQ_CODEBOOK`
Encoder tensors (included for completeness, not needed for TTS inference):
- Tensors with the `SNAC_ENC_*` prefix
Model Conversion
Implemented `SnacDecModel` class in `convert_hf_to_gguf.py`:
- Weight-norm parameter handling (`_g` or `_v` suffixes)
- Hyperparameters: `codebook_size`, `decoder_rates`, `latent_dim`, `decoder_dim`
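As background for the `_g` / `_v` handling: PyTorch-style weight normalization stores each weight as a magnitude `g` and a direction `v`, with w = g * v / ||v|| and the norm taken per output channel by default. The PR does not state whether the converter fuses the two or writes them out separately, so the sketch below only illustrates the fusing option. The function name and the flattened list layout are assumptions made for the example.

```python
import math


def fuse_weight_norm(g, v):
    """Fuse weight-norm parameters into plain weights.

    g: list of per-output-channel magnitudes
    v: list of rows, one per output channel, each the flattened
       direction vector for that channel
    Returns rows of w = g * v / ||v||.
    """
    fused = []
    for gain, row in zip(g, v):
        norm = math.sqrt(sum(x * x for x in row))  # L2 norm of this channel
        fused.append([gain * x / norm for x in row])
    return fused
```

After fusing, each output row has L2 norm equal to its `g` value, which gives a cheap sanity check on converted tensors.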
Documentation
Added `docs/SNAC_IMPLEMENTATION.md`.
Review Focus Areas
The `SnacDecModel` class is missing a `@ModelBase.register()` decorator. Without it, the conversion class won't be invoked. The correct HuggingFace architecture name to register still needs to be determined.
Other items to review:
- Whether the handling of `_g` and `_v` suffixes is correct for SNAC's weight norm implementation
- `llama-arch.cpp` mappings: note the use of `%d` for block indices vs `{bid}` in Python
Testing Status
❌ Not tested with actual models yet - this is infrastructure-only
To test after merging:
Next Steps
Remaining work tracked in `docs/SNAC_IMPLEMENTATION.md`:
- Add the `@ModelBase.register()` decorator to `SnacDecModel`
- Model loading in `llama-model.cpp`
- Forward pass in `llama.cpp` (convolutions, Snake activation, attention)
References
Link to Devin run: https://app.devin.ai/sessions/f86c58111acb4011894cbaad18a50e62
Requested by: Jake Cosme (@jakexcosme)