MovLab2 commented Sep 28, 2025

Kokoro TTS Integration Improvements

Fixed Issues:

  • ✅ Torch 2.6 compatibility for Kokoro text-to-speech
  • ✅ JoyVASA model loading fixed by passing weights_only=False
  • ✅ Hubert attention configuration conflict with SDPA resolved
  • ✅ Motion extractor output processing fixed
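Context for the weights_only fix: torch 2.6 changed the default of `torch.load(..., weights_only=...)` from False to True, so checkpoints that pickle arbitrary Python objects stop loading unless the flag is passed explicitly. A minimal sketch of such a loader, assuming the JoyVASA checkpoint is a trusted local file; `load_trusted_checkpoint` and the version helpers are hypothetical names, not code from this PR.

```python
import re


def torch_version_tuple(version: str) -> tuple:
    """Return (major, minor) from strings such as '2.6.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised torch version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def weights_only_defaults_to_true(version: str) -> bool:
    """torch.load defaults to weights_only=True starting with torch 2.6."""
    return torch_version_tuple(version) >= (2, 6)


def load_trusted_checkpoint(path: str, map_location: str = "cpu"):
    """Load a full pickled checkpoint; only use on files you trust."""
    import torch  # lazy import so the helpers above work without torch

    # Passing weights_only=False explicitly restores the pre-2.6 behaviour,
    # which unpickles arbitrary objects stored in the checkpoint.
    return torch.load(path, map_location=map_location, weights_only=False)
```

Passing the flag unconditionally is harmless on older releases that already accept `weights_only` (it was added in torch 1.13), so the version check is only useful for logging.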

New Features:

  • 📥 Added Kokoro model download helper script
  • 🎯 Improved error handling and compatibility
  • 🔧 Ready-to-use TTS integration
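The download helper script itself is not shown in this description. A hypothetical equivalent, assuming the weights come from the `hexgrad/Kokoro-82M` Hugging Face repository and that voice packs are stored as `voices/*.pt` (an assumption about the repo layout), could use `huggingface_hub.snapshot_download` and then count the voice files against the 54 voices reported under Testing. `download_kokoro` and `count_voices` are illustrative names only.

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]
EXPECTED_VOICES = 54  # voice count reported in this PR


def download_kokoro(local_dir: PathLike, repo_id: str = "hexgrad/Kokoro-82M") -> Path:
    """Fetch model weights, config and voice packs (hypothetical helper)."""
    from huggingface_hub import snapshot_download  # lazy import

    path = snapshot_download(
        repo_id=repo_id,
        local_dir=str(local_dir),
        allow_patterns=["*.json", "*.pth", "voices/*.pt"],  # assumed layout
    )
    return Path(path)


def count_voices(model_dir: PathLike) -> int:
    """Count voice packs, assuming one .pt file per voice under voices/."""
    voices = Path(model_dir) / "voices"
    if not voices.is_dir():
        return 0
    return sum(1 for p in voices.glob("*.pt") if p.is_file())
```

Usage: `count_voices(download_kokoro("models/kokoro")) == EXPECTED_VOICES` gives a quick sanity check that the download completed.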

Testing:

  • Verified with 54 voice models
  • Full audio-to-animation pipeline working

Your Name added 3 commits September 28, 2025 01:30
Key fixes:
- JoyVASA pipeline: Add weights_only=False for torch 2.6 compatibility
- Hubert model: Resolve attention configuration conflict with SDPA
- Motion extractor: Restore working version with proper dimensions
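The commit does not say how the SDPA conflict was resolved. One common approach, shown here as an assumption rather than the PR's actual change, is to request eager attention when loading the model through `transformers`. `load_hubert_eager` is a hypothetical helper, and the `attn_implementation` argument requires a reasonably recent `transformers` release.

```python
# Force the eager attention path instead of scaled_dot_product_attention (SDPA)
HUBERT_LOAD_KWARGS = {"attn_implementation": "eager"}


def load_hubert_eager(name_or_path: str, device: str = "cpu"):
    """Load a HuBERT encoder with eager attention (hypothetical helper)."""
    from transformers import HubertModel  # lazy import

    model = HubertModel.from_pretrained(name_or_path, **HUBERT_LOAD_KWARGS)
    return model.to(device).eval()  # inference only, no dropout
```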

Enables full text-to-video pipeline:
- Text -> Kokoro audio -> Hubert features -> Mouth animation -> Final video
- Supports 54 voices via Kokoro-82M model
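The audio half of that pipeline involves a sample-rate change: Kokoro-82M produces 24 kHz audio, while HuBERT expects 16 kHz input. A sketch of the length bookkeeping, assuming the default HuBERT-base convolutional feature encoder (7 layers, total stride 320 samples, receptive field 400 samples); the helper names are hypothetical, and a real resampler may differ from the computed length by a sample.

```python
KOKORO_SAMPLE_RATE = 24_000  # Kokoro-82M output rate
HUBERT_SAMPLE_RATE = 16_000  # HuBERT input rate

# (kernel, stride) of each layer in the HuBERT-base convolutional encoder
CONV_LAYERS = [(10, 5)] + [(3, 2)] * 4 + [(2, 2)] * 2


def resampled_length(num_samples: int, src_rate: int, dst_rate: int) -> int:
    """Approximate sample count after resampling (floor of the exact ratio)."""
    return num_samples * dst_rate // src_rate


def hubert_frames(num_samples_16k: int) -> int:
    """Feature frames produced by the conv encoder for a 16 kHz waveform."""
    length = num_samples_16k
    for kernel, stride in CONV_LAYERS:
        if length < kernel:
            return 0  # input too short for this layer's kernel
        length = (length - kernel) // stride + 1
    return length


def kokoro_to_hubert_frames(num_kokoro_samples: int) -> int:
    """Frames HuBERT would produce for a clip of Kokoro output."""
    n16 = resampled_length(num_kokoro_samples, KOKORO_SAMPLE_RATE, HUBERT_SAMPLE_RATE)
    return hubert_frames(n16)
```

One second of Kokoro output (24,000 samples) therefore yields 49 HuBERT frames, i.e. roughly 50 Hz. How those frames map onto video frames depends on the animation frame rate, which this PR does not state.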

Tested and working with torch 2.6.0+cu118

Core fixes:
- Fixed pasteback for realtime mode (removed 'not realtime' condition)
- Added proper mask creation for realtime pasteback operations
- Auto-resize large source images (1968x1968 -> 512x512) for performance
- Fixed pipe.run() parameter order (image, img_src, src_info)
- Safe display fallbacks and performance optimizations
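A minimal sketch of the auto-resize step, assuming the longest side is capped at 512 px with the aspect ratio preserved (the PR only mentions the 1968x1968 -> 512x512 case). `fit_within` and `downscale_source` are hypothetical names, and Pillow is assumed for the actual resampling.

```python
def fit_within(width: int, height: int, max_side: int = 512) -> tuple:
    """Scale (width, height) so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, leave untouched
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def downscale_source(path: str, max_side: int = 512):
    """Open a source portrait and downscale it if needed (hypothetical helper)."""
    from PIL import Image  # lazy import

    img = Image.open(path).convert("RGB")
    size = fit_within(*img.size, max_side=max_side)
    if size != img.size:
        img = img.resize(size, Image.Resampling.LANCZOS)
    return img
```

Images that are already small pass through unchanged, so only oversized sources pay the resampling cost.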

Config improvements:
- Optimized realtime settings in trt_infer.yaml for better performance
- Enabled pasteback, stitching, and relative motion by default

Result: the output now shows the complete animated portrait instead of only the cropped face, with much better realtime performance in both the camera and Gradio interfaces.