devin-ai-integration[bot]

Summary

This PR adds foundational architecture support for the SNAC (Multi-Scale Neural Audio Codec) decoder, a prerequisite for running Orpheus TTS models in llama.cpp. This addresses issue #208.

Note: This PR contains only the architecture infrastructure and does not include model loading, forward pass implementation, or TTS tool integration. It cannot run SNAC models yet but provides the foundation for those components.

Changes

Architecture Registration

  • Added LLM_ARCH_SNAC_DEC architecture enum and registered "snac-dec" name
  • Added to src/llama-arch.h and src/llama-arch.cpp

Tensor Definitions (27 new tensor types)

Decoder tensors:

  • Input/output convolutions: SNAC_DEC_CONV_IN, SNAC_DEC_CONV_OUT
  • Optional attention: SNAC_DEC_ATTN_NORM, SNAC_DEC_ATTN_Q/K/V/OUT
  • Decoder blocks (4 blocks): SNAC_DEC_BLK_CONV_UP, SNAC_DEC_BLK_CONV1/2/3, SNAC_DEC_BLK_SNAKE_ALPHA

Vector quantizer tensors (4 levels):

  • Projections: SNAC_VQ_IN_PROJ, SNAC_VQ_OUT_PROJ
  • Codebooks: SNAC_VQ_CODEBOOK

Encoder tensors (included for completeness, not needed for TTS inference):

  • Similar structure to decoder with SNAC_ENC_* prefix

Model Conversion

Implemented SnacDecModel class in convert_hf_to_gguf.py:

  • Skips weight normalization parameters (tensors ending in _g or _v)
  • Extracts hyperparameters from config: codebook_size, decoder_rates, latent_dim, decoder_dim
  • Sets vocab to none (audio codec, not text model)
  • Marks as non-causal attention
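
The skip rule in the first bullet can be sketched as below. This is a minimal illustration, not the code in the PR: the tensor names are hypothetical, and `fuse_weight_norm` is included only to show what the `_g`/`_v` pair encodes under PyTorch's classic `weight_norm` (`w = g * v / ||v||`, one norm per output channel). If a checkpoint stores only `_g` and `_v` and no fused weight, dropping both would lose the weight, which is why the skip rule is flagged for review.

```python
import math

def is_weight_norm_param(name: str) -> bool:
    """True for tensor names the PR description says the converter skips."""
    return name.endswith("_g") or name.endswith("_v")

def fuse_weight_norm(g, v):
    """Rebuild w = g * v / ||v|| for each output channel.

    g: one magnitude per output channel.
    v: one flattened direction vector per output channel.
    """
    fused = []
    for g_c, v_c in zip(g, v):
        norm = math.sqrt(sum(x * x for x in v_c))
        fused.append([g_c * x / norm for x in v_c])
    return fused

# Hypothetical tensor names, for illustration only.
names = [
    "decoder.model.0.weight_g",
    "decoder.model.0.weight_v",
    "decoder.model.0.bias",
]
kept = [n for n in names if not is_weight_norm_param(n)]

# One output channel with magnitude 2 and direction (3, 4), whose norm is 5.
fused = fuse_weight_norm([2.0], [[3.0, 4.0]])
```

Here `kept` contains only the bias, and `fused` is the single row `(1.2, 1.6)` up to floating-point rounding.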

Documentation

Added docs/SNAC_IMPLEMENTATION.md with:

  • Architecture overview and component descriptions
  • Tensor naming conventions
  • Model conversion instructions
  • TODO list for remaining implementation work
  • Snake activation implementation notes
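
For orientation, the Snake activation is commonly written as `snake(x) = x + (1/alpha) * sin^2(alpha * x)` with a learned `alpha` per channel. The sketch below assumes that form and a small `eps` guard against division by zero; whether docs/SNAC_IMPLEMENTATION.md uses the same constant is not stated here, and the function names are illustrative.

```python
import math

def snake(x: float, alpha: float, eps: float = 1e-9) -> float:
    """Snake activation for one sample: x + sin^2(alpha * x) / alpha."""
    return x + math.sin(alpha * x) ** 2 / (alpha + eps)

def snake_channels(signal, alphas):
    """Apply Snake to a [channels][time] signal with one alpha per channel."""
    return [[snake(s, a) for s in channel] for channel, a in zip(signal, alphas)]

# Two channels of three samples each, with a different alpha per channel.
out = snake_channels(
    [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]],
    [1.0, 2.0],
)
```

Because `alpha` is per channel, an implementation would need the `SNAC_DEC_BLK_SNAKE_ALPHA` tensor broadcast along the time axis.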

Review Focus Areas

⚠️ Critical: The SnacDecModel class is missing a @ModelBase.register() decorator. Without it, the conversion script will never select this class. The correct HuggingFace architecture name to register still needs to be determined.
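
To illustrate why the missing decorator matters, here is a toy decorator-based registry. It is not the actual `ModelBase` from convert_hf_to_gguf.py; `ToyModelBase`, `from_architecture`, and the architecture string `"HypotheticalSnacArch"` are all made up for the example. The point is only that a class which is never registered cannot be found by name.

```python
class ToyModelBase:
    """Toy stand-in for a converter base class with a name-to-class registry."""

    _registry = {}

    @classmethod
    def register(cls, *names):
        """Decorator that maps each architecture name to the decorated class."""
        def wrapper(model_cls):
            for name in names:
                cls._registry[name] = model_cls
            return model_cls
        return wrapper

    @classmethod
    def from_architecture(cls, arch):
        """Look up the converter class for an architecture name."""
        try:
            return cls._registry[arch]
        except KeyError:
            raise NotImplementedError(f"Architecture {arch!r} not supported") from None

@ToyModelBase.register("HypotheticalSnacArch")  # hypothetical architecture name
class RegisteredSnac(ToyModelBase):
    pass

class UnregisteredSnac(ToyModelBase):  # no decorator, so it is never looked up
    pass

found = ToyModelBase.from_architecture("HypotheticalSnacArch")
```

In this toy, `found` is `RegisteredSnac`, while `UnregisteredSnac` exists as a class but is unreachable through the registry.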

Other items to review:

  1. Tensor naming conventions: Check the names against actual SNAC model checkpoints from HuggingFace (this has not been tested yet)
  2. Weight normalization handling: Verify that skipping _g and _v suffixes is correct for SNAC's weight norm implementation
  3. Encoder tensors: Should these be included given they're not needed for TTS inference?
  4. Hyperparameter defaults: Verify that the defaults match the standard SNAC configs (the current values are taken from the 24 kHz model)
  5. Tensor mappings in C++: Review the llama-arch.cpp mappings - note the use of %d for block indices vs {bid} in Python
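
On item 5, the two placeholder styles should expand to identical names for every block index. A quick way to check that is sketched below; the template strings are hypothetical stand-ins, not the names used in the PR, and the block count of 4 comes from the decoder description above.

```python
def format_c_style(template: str, bid: int) -> str:
    """Expand a printf-style template, as the C++ side does with %d."""
    return template % bid

def format_py_style(template: str, bid: int) -> str:
    """Expand a str.format-style template, as the Python side does with {bid}."""
    return template.format(bid=bid)

# Hypothetical tensor name templates, for illustration only.
C_TEMPLATE = "dec.blk.%d.conv1"
PY_TEMPLATE = "dec.blk.{bid}.conv1"

N_BLOCKS = 4  # the description lists 4 decoder blocks

mismatches = [
    bid
    for bid in range(N_BLOCKS)
    if format_c_style(C_TEMPLATE, bid) != format_py_style(PY_TEMPLATE, bid)
]
```

An empty `mismatches` list means both sides agree; running the same comparison over the real template pairs would catch a typo in either table.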

Testing Status

Not tested with actual models yet - this is infrastructure-only

To test after merging:

# Download SNAC model
git clone https://huggingface.co/hubertsiuzdak/snac_24khz

# Convert to GGUF (will fail until the @ModelBase.register() decorator is added)
python convert_hf_to_gguf.py snac_24khz --outfile snac-24khz-f16.gguf --outtype f16

Next Steps

Remaining work tracked in docs/SNAC_IMPLEMENTATION.md:

  1. Add @ModelBase.register() decorator to SnacDecModel
  2. Implement model loading in llama-model.cpp
  3. Implement forward pass in llama.cpp (convolutions, Snake activation, attention)
  4. Integrate with TTS tool
  5. Test with Orpheus TTS models

References


Link to Devin run: https://app.devin.ai/sessions/f86c58111acb4011894cbaad18a50e62
Requested by: Jake Cosme ([email protected]) (@jakexcosme)

- Add LLM_ARCH_SNAC_DEC architecture enum and name mapping
- Define 27 SNAC-specific tensor types for decoder and quantizer
- Add tensor name mappings in llama-arch.cpp
- Add SNAC_DEC to gguf constants with tensor enums and mappings
- Implement SnacDecModel class for model conversion
- Add comprehensive SNAC implementation documentation

This provides the foundational architecture support for SNAC audio codec.
Remaining work includes model loading, forward pass, and TTS tool integration.

Addresses issue #208

Co-Authored-By: Jake Cosme <[email protected]>
github-actions bot added the "documentation" (Improvements or additions to documentation) and "python" labels on Oct 22, 2025
SNAC decoder doesn't use RoPE (it's an audio codec), so add it to the
LLAMA_ROPE_TYPE_NONE case alongside WAVTOKENIZER_DEC.

Co-Authored-By: Jake Cosme <[email protected]>