Add Raven-avatar support to SillyTavern

It would be particularly interesting to introduce *Raven-avatar* as a replacement for the discontinued *Talkinghead* mode of the *Character expressions* extension in *SillyTavern*. Essentially, *Raven-avatar* **is** an upgraded version of *Talkinghead*, with many new features.

Since *SillyTavern-Extras* (which provided *Talkinghead*) was discontinued, the *Talkinghead* feature has been removed from the *SillyTavern* frontend.

In https://github.com/SillyTavern/SillyTavern/issues/4034, it was decided that if the new avatar is added to *SillyTavern*, this should be done as a new extension.

- Minimally, the ST extension needs:
  - A canvas to render the video stream in, aligned to the bottom edge of the window (since the avatar is a [cowboy shot](https://danbooru.donmai.us/wiki_pages/cowboy_shot))
    - Get the image size from the video stream, since it depends on the upscaler settings.
  - Start, pause, resume, stop
    - Like the final version of *Talkinghead*, pause when the ST window is minimized, and resume when it is restored, to save GPU resources while the avatar is not visible.
  - Character emotion driver.
    - Trigger when more LLM output appears. Call the server's `classify` API to detect the AI character's emotion from recent LLM output (maybe the last few sentences). Then, via the `avatar` API, set that emotion on the avatar.
    - The old *Talkinghead* used to auto-update its emotion whenever `classify` was called, but in *Raven-avatar*, these have been separated, for feature orthogonality. Also, we now support multiple avatar sessions simultaneously, so it's better to be explicit about which one to update.
  - Support for ST's `/emote` command (set the character's emotion manually)
  - GUI to adjust lipsync time offset (to allow the user to adjust AV sync as needed)
  - A settings GUI to select the character, the postprocessor settings, and the emotion templates.
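As a rough sketch of the emotion driver described above (in Python for brevity; the extension itself would be JS): take the last few sentences of the LLM output, classify them, and set the winning emotion explicitly on one avatar session. The callables `classify` and `set_emotion` are hypothetical stand-ins for the server's `classify` and `avatar` APIs, and the session ID, the score-dictionary shape, and the naive sentence splitter are all assumptions made for illustration.

```python
import re
from typing import Callable, Mapping


def last_sentences(text: str, n: int = 3) -> str:
    """Return the last `n` sentences of `text`, split naively after '.', '!' or '?'."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return " ".join(sentences[-n:])


def update_emotion(
    llm_output: str,
    session_id: str,
    classify: Callable[[str], Mapping[str, float]],  # hypothetical: text -> {emotion: score}
    set_emotion: Callable[[str, str], None],         # hypothetical: (session_id, emotion)
    n: int = 3,
) -> str:
    """Classify recent LLM output and set the top emotion on one avatar session.

    Assumes `classify` returns a non-empty mapping of emotion name to score.
    """
    scores = classify(last_sentences(llm_output, n))
    emotion = max(scores, key=scores.get)
    # Classification and avatar update are separate steps, so the caller
    # chooses explicitly which session gets updated.
    set_emotion(session_id, emotion)
    return emotion
```

Passing the session explicitly mirrors the separation of `classify` from the avatar update mentioned above. The same function could also back the `/emote` command by skipping the classification step.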
 
- For cel blending and animefx, the extension also needs to automatically send the character's extra cels.

  Since a JS client can't scan for additional files (like the Python client does), maybe we need to pack the character and its extra cels into a single zip file. This needs some support on the server side to receive zipped character packages, but Python has zip libraries available, so this should be easy. Even when the avatar is used with the Python client, a zip file would be a cleaner way to distribute ready-made characters.

  For bonus points, we could optionally allow including the postprocessor settings and the emotion templates in the character package, too.

  I think all the rest (actually configuring the postprocessor and customizing the emotion templates) can be done using `raven-avatar-settings-editor` and `raven-avatar-pose-editor`, so we don't need to replicate those in JS.
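To illustrate the zipped character package idea, here is a minimal sketch using Python's standard `zipfile` module. The archive layout (`character.png`, `cels/<name>.png`, and optional `postprocessor.json` and `emotions.json`) is purely an assumption for illustration, not an existing Raven format, and the function names are hypothetical.

```python
import io
import json
import zipfile
from typing import Optional


def pack_character(
    base_image: bytes,
    extra_cels: dict[str, bytes],
    postprocessor: Optional[dict] = None,
    emotions: Optional[dict] = None,
) -> bytes:
    """Pack a character, its extra cels, and optional settings into a zip (in memory)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("character.png", base_image)  # hypothetical layout
        for name, data in sorted(extra_cels.items()):
            zf.writestr(f"cels/{name}.png", data)
        if postprocessor is not None:
            zf.writestr("postprocessor.json", json.dumps(postprocessor))
        if emotions is not None:
            zf.writestr("emotions.json", json.dumps(emotions))
    return buf.getvalue()


def unpack_character(package: bytes) -> dict:
    """Read a character package back; what a server-side receiver might do."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        names = set(zf.namelist())
        if "character.png" not in names:
            raise ValueError("package has no character.png")
        prefix, suffix = "cels/", ".png"
        cels = {
            n[len(prefix):-len(suffix)]: zf.read(n)
            for n in names
            if n.startswith(prefix) and n.endswith(suffix)
        }

        def read_json(name: str) -> Optional[dict]:
            return json.loads(zf.read(name)) if name in names else None

        return {
            "base_image": zf.read("character.png"),
            "extra_cels": cels,
            "postprocessor": read_json("postprocessor.json"),
            "emotions": read_json("emotions.json"),
        }
```

Reading members directly into memory, instead of extracting to disk, sidesteps path issues with untrusted archive member names. A JS client would only need to upload the zip bytes as-is.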

- The TTS feature of the *SillyTavern* frontend needs to co-operate with the avatar's lipsync mode.

  Lipsync requires timestamped phoneme data. Currently, the avatar can only get that from the [Kokoro-82M](https://github.com/hexgrad/kokoro) TTS that is also served by *Raven-server*.

  The lipsync driver that actually controls the avatar's mouth is implemented on the client side (in [`raven.client.tts`](https://github.com/Technologicat/raven/blob/main/raven/client/tts.py)), because that is where the audio playback occurs.

  I'm open to the idea of extending lipsync to support other TTS implementations, such as the newer, smaller and faster [Kitten-TTS](https://github.com/KittenML/KittenTTS) (but note https://github.com/KittenML/KittenTTS/issues/14), or to various RVC models. We could even extend this to support phoneme detection from bare speech audio (so that lipsync could work with arbitrary voice audio sources, not just TTS). But I don't have the development resources to do any of that, either.

  Long story short: when the avatar is enabled, *SillyTavern* should call the new `tts_speak_lipsynced` (in Raven's JS bindings, discussed above) instead of calling its TTS service normally. (It doesn't matter if the avatar is paused.) For a Python example, see how [`raven.avatar.settings_editor_app`](https://github.com/Technologicat/raven/blob/main/raven/avatar/settings_editor/app.py) uses `tts_speak_lipsynced`.
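To make the role of the timestamped phoneme data and the user-adjustable lipsync time offset more concrete, here is a small sketch of a lookup a lipsync driver could perform during audio playback. It does not reflect how `raven.client.tts` is actually implemented; the `TimedPhoneme` format, the function name, and the sign convention of the offset are assumptions for illustration.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class TimedPhoneme(NamedTuple):
    """Hypothetical timestamped phoneme; times are seconds from the start of the audio."""
    phoneme: str
    start: float
    end: float


def active_phoneme(
    timeline: Sequence[TimedPhoneme],
    playback_time: float,
    offset: float = 0.0,
) -> Optional[str]:
    """Return the phoneme to show at `playback_time`, or None during silence.

    `timeline` must be sorted by start time. A positive `offset` (assumed
    convention) makes the mouth run ahead of the audio; a negative one delays it.
    """
    t = playback_time + offset
    starts = [p.start for p in timeline]
    i = bisect_right(starts, t) - 1  # last phoneme that starts at or before t
    if i >= 0 and t < timeline[i].end:
        return timeline[i].phoneme
    return None
```

A driver would call this once per rendered frame with the current audio playback position and map the result to a mouth shape. The offset setting then only shifts the lookup time, so it can be adjusted while audio is playing.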
