Add Raven-avatar support to SillyTavern #4

@Technologicat

It would be particularly interesting to introduce Raven-avatar to replace the discontinued Talkinghead mode of the Character expressions extension in SillyTavern. Basically, Raven-avatar is an upgraded version of that, with many new features.

Since SillyTavern-Extras (which provided Talkinghead) was discontinued, the Talkinghead feature has been removed from the SillyTavern frontend.

In SillyTavern/SillyTavern#4034, it was decided that if the new avatar is added to SillyTavern, this should be done as a new extension.

  • Minimally, the ST extension needs:

    • A canvas to render the video stream in, aligned to the bottom edge of the window (since the avatar is a cowboy shot)
      • Get the image size from the video stream; it depends on the upscaler settings.
    • Start, pause, resume, stop
      • Like the final version of Talkinghead, pause when the ST window is minimized, and resume when no longer minimized, to save GPU when the avatar is not visible.
    • Character emotion driver.
      • Trigger when more LLM output appears. Call the server's classify API to detect the AI character's emotion from recent LLM output (maybe the last few sentences). Then, via the avatar API, set that emotion on the avatar.
      • The old Talkinghead used to auto-update its emotion whenever classify was called, but in Raven-avatar, these have been separated, for feature orthogonality. Also, we now support multiple avatar sessions simultaneously, so it's better to be explicit about which one to update.
    • Support for ST's /emote command (set the character's emotion manually)
    • GUI to adjust lipsync time offset (to allow the user to adjust AV sync as needed)
    • A settings GUI to select the character, the postprocessor settings, and the emotion templates.
  • For cel blending and animefx, the extension also needs to automatically send the character's extra cels.

    Since a JS client can't scan for additional files (like the Python client does), maybe we need to pack the character and its extra cels into a single zip file. This needs some support on the server side to receive zipped character packages, but Python has zip libraries available, so this should be easy. Even when the avatar is used with the Python client, a zip file would be a cleaner way to distribute ready-made characters.

    For bonus points, we could optionally allow the character package to also include the postprocessor settings and the emotion templates.

    I think all the rest (actually configuring the postprocessor and customizing the emotion templates) can be done using raven-avatar-settings-editor and raven-avatar-pose-editor, so we don't need to replicate those in JS.
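As a rough sketch of the server-side half of the zip idea, something like the following could unpack an uploaded character package in memory, using only Python's standard `zipfile` module. Everything about the layout here is an assumption made up for illustration (`character.png` as the main image, any other `.png` as an extra cel, an optional `animator.json` for postprocessor settings, optional `emotions/*.json` templates), as is the `CharacterPackage` type; this issue does not define a package format.

```python
import io
import zipfile
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical package layout, NOT an existing Raven format.
MAIN_IMAGE_NAME = "character.png"   # main character image
SETTINGS_NAME = "animator.json"     # optional postprocessor settings
EMOTIONS_PREFIX = "emotions/"       # optional emotion templates


@dataclass
class CharacterPackage:
    """In-memory contents of one (hypothetical) character package."""
    main_image: bytes
    extra_cels: Dict[str, bytes] = field(default_factory=dict)
    animator_settings: Optional[bytes] = None
    emotion_templates: Dict[str, bytes] = field(default_factory=dict)


def unpack_character_package(data: bytes) -> CharacterPackage:
    """Sort the entries of a zipped character package into categories.

    Entries are read into memory and never extracted to disk.
    Files that match none of the known categories are ignored.
    """
    main_image = None
    extra_cels: Dict[str, bytes] = {}
    settings = None
    emotions: Dict[str, bytes] = {}

    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            name = info.filename
            payload = zf.read(info)
            if name == MAIN_IMAGE_NAME:
                main_image = payload
            elif name == SETTINGS_NAME:
                settings = payload
            elif name.startswith(EMOTIONS_PREFIX) and name.endswith(".json"):
                emotions[name[len(EMOTIONS_PREFIX):]] = payload
            elif name.endswith(".png"):
                extra_cels[name] = payload

    if main_image is None:
        raise ValueError(f"package has no {MAIN_IMAGE_NAME}")
    return CharacterPackage(main_image, extra_cels, settings, emotions)
```

Reading the entries into memory instead of extracting them means no file is ever written under an archive-supplied name, which matters if packages come from an untrusted client. The JS side would then only need to upload one blob per character.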

  • The TTS feature of the SillyTavern frontend needs to co-operate with the avatar's lipsync mode.

    Lipsync requires timestamped phoneme data. Currently, the avatar can only get that from the Kokoro-82M TTS that is also served by Raven-server.

    The lipsync driver that actually controls the avatar's mouth is implemented on the client side (in raven.client.tts), because that is where the audio playback occurs.

    I'm open to the idea of extending lipsync to support other TTS implementations, such as the newer, smaller and faster Kitten-TTS (but note the timestamps issue, KittenML/KittenTTS#14), or various RVC models. We could even extend this to support phoneme detection from bare speech audio (so that lipsync could work with arbitrary voice audio sources, not just TTS). But I don't have the development resources to do any of that, either.

    Long story short: when the avatar is enabled, SillyTavern should call the new tts_speak_lipsynced (in Raven's JS bindings, discussed above) instead of calling its TTS service normally. (It doesn't matter if the avatar is paused.) For a Python example, see how raven.avatar.settings_editor_app uses tts_speak_lipsynced.
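For the character emotion driver in the list above, the control flow could look roughly like this. The sketch is in Python for brevity, although the extension itself would be JS. `classify` and `set_emotion` are hypothetical callables standing in for the server's classify API and the avatar API; their signatures, the score-mapping return format, and the naive sentence splitter are all assumptions, not Raven's actual interface.

```python
import re
from typing import Callable, Dict, Optional

# Naive sentence boundary: whitespace that follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def last_sentences(text: str, n: int = 3) -> str:
    """Return the last n sentences of text, joined by single spaces."""
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    return " ".join(sentences[-n:])


class EmotionDriver:
    """Classify recent LLM output and push the result to one avatar session.

    classify(text) is assumed to return {emotion_name: score}.
    set_emotion(session_id, emotion_name) is assumed to update the avatar.
    """

    def __init__(
        self,
        classify: Callable[[str], Dict[str, float]],
        set_emotion: Callable[[str, str], None],
        session_id: str,
        n_sentences: int = 3,
    ) -> None:
        self.classify = classify
        self.set_emotion = set_emotion
        self.session_id = session_id
        self.n_sentences = n_sentences
        self.current: Optional[str] = None

    def on_llm_output(self, text_so_far: str) -> Optional[str]:
        """Call whenever more LLM output appears; returns the active emotion."""
        snippet = last_sentences(text_so_far, self.n_sentences)
        if not snippet:
            return self.current
        scores = self.classify(snippet)
        if not scores:
            return self.current
        emotion = max(scores, key=lambda name: scores[name])
        # Classification and avatar update are separate, explicit steps,
        # and the update names the session it applies to.
        if emotion != self.current:
            self.set_emotion(self.session_id, emotion)
            self.current = emotion
        return self.current
```

Only sending an update when the detected emotion changes keeps traffic to the avatar API low while text is streaming. A manual override (such as ST's /emote) could call `set_emotion` directly with the same session ID.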
