Update start_stt.sh to match start_tts.sh #149

Merged
vvolhejn merged 2 commits into kyutai-labs:main from tleyden:patch-1 on Nov 19, 2025

@tleyden
Contributor

tleyden commented Oct 30, 2025

Checklist

  • Read CONTRIBUTING.md, and accept the CLA by including the provided snippet. We will not accept PR without this.

I, @tleyden, confirm that I have read and understood the terms of the CLA of Kyutai-labs, as outlined in the repository's CONTRIBUTING.md, and I agree to be bound by these terms. The full CLA is provided as follows:

I, @tleyden, hereby grant to Kyutai-labs a perpetual, worldwide, non-exclusive, royalty-free, irrevocable license to use, modify, distribute, and sublicense my Contributions. I understand and accept that Contributions are limited to modifications, improvements, or changes to the project’s source code submitted via pull requests. I accept that Kyutai-labs has full discretion to review, accept, reject, or request changes to any Contributions I submit, and that submitting a pull request does not guarantee its inclusion in the project. By submitting a Contribution, I grant Kyutai-labs a perpetual, worldwide license to use, modify, reproduce, distribute, and create derivative works based on my Contributions. I also agree to assign all patent rights for any inventions or improvements that arise from my Contributions, giving the Kyutai-labs full rights to file for and enforce patents. I understand that the Kyutai-labs may commercialize, relicense, or exploit the project and my Contributions without further notice or obligation to me. I confirm that my Contributions are original and that I have the legal right to grant this license. If my Contributions include third-party materials, I will ensure that I have the necessary permissions and will disclose this information. I accept that once my Contributions are integrated, they may be altered or removed at the Kyutai-labs’s discretion. I acknowledge that I am making these Contributions voluntarily and will not receive any compensation. Furthermore, I understand that all Contributions, including mine, are provided on an "as-is" basis, with no warranties. By submitting a pull request, I agree to be bound by these terms.

PR Description

While running this on runpod in a dockerless manner, I hit this error:

root@f6aeecd7dbf8:/workspace/unmute# ./dockerless/start_stt.sh
++ dirname ./dockerless/start_stt.sh
+ cd ./dockerless/..
+ export 'CXXFLAGS=-include cstdint'
+ CXXFLAGS='-include cstdint'
+ cargo install --features cuda moshi-server@0.6.4
     Ignored package `moshi-server v0.6.4` is already installed, use --force to override
+ moshi-server worker --config services/moshi-server/configs/stt.toml --port 8090
moshi-server: error while loading shared libraries: libpython3.12.so.1.0: cannot open shared object file: No such file or directory

I noticed that start_stt.sh was missing the steps that start_tts.sh uses to set up and source the Python environment, so this PR copies them in.

It fixed the issue for me, and the STT server now seems to be working:

+ cargo install --features cuda moshi-server@0.6.4
     Ignored package `moshi-server v0.6.4` is already installed, use --force to override
+ moshi-server worker --config services/moshi-server/configs/stt.toml --port 8090
model.safetensors [00:00:08] [████████████████████] 1.84 GiB/1.84 GiB 255.91 MiB/s (0s)
..kenizer_en_fr_audio_8000.model [00:00:00] [████████████████████] 117.56 KiB/117.56 KiB - (0s)
..torch-e351c8d8@125.safetensors [00:00:01] [████████████████████] 366.83 MiB/366.83 MiB 246.60 MiB/s (0s)
2025-10-30T20:15:36.822655Z  INFO moshi_server: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/moshi-server-0.6.4/src/main.rs:475: build_info=BuildInfo { build_timestamp: "2025-10-30T20:09:58.993193949Z", build_date: "2025-10-30", git_branch: "VERGEN_IDEMPOTENT_OUTPUT", git_timestamp: "VERGEN_IDEMPOTENT_OUTPUT", git_date: "VERGEN_IDEMPOTENT_OUTPUT", git_hash: "VERGEN_IDEMPOTENT_OUTPUT", git_describe: "VERGEN_IDEMPOTENT_OUTPUT", rustc_host_triple: "x86_64-unknown-linux-gnu", rustc_version: "1.91.0", cargo_target_triple: "x86_64-unknown-linux-gnu" }
2025-10-30T20:15:36.822700Z  INFO moshi_server: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/moshi-server-0.6.4/src/main.rs:549: starting worker num_workers=13
2025-10-30T20:15:44.599273Z  INFO moshi_server: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/moshi-server-0.6.4/src/main.rs:576: listening on http://0.0.0.0:8090
2025-10-30T20:15:44.599647Z  INFO moshi_server::batched_asr: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/moshi-server-0.6.4/src/batched_asr.rs:226: warming-up the asr
2025-10-30T20:15:49.577701Z  INFO moshi_server::batched_asr: /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/moshi-server-0.6.4/src/batched_asr.rs:228: starting asr loop 1

@vvolhejn
Collaborator

Thanks! The reason the text-to-speech needs Python is that the actual model is called from Python and the Rust code is a thin wrapper, but for the speech-to-text it's all Rust. But we still need libpython because it's the same executable, so if you don't have a Python installation at all, it'll fail. Maybe just add a note about this, perhaps:

# We need libpython because the TTS uses a Python component. STT and TTS have the same executable, so we need
# to have libpython even if we don't end up using it. For simplicity, we use the same code as for TTS, even though
# you don't need to install any of these Python packages if you're only using the STT.
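
One way to confirm that the binary itself depends on libpython, whether or not the STT path ever calls into Python, is to ask the dynamic loader. The sketch below assumes a Linux host with `ldd` available; `check_libpython` is a hypothetical helper name, not something from the repository.

```shell
# Show the libpython dependency (if any) of a binary found on PATH.
# When the library cannot be resolved, ldd prints a line such as
# "libpython3.12.so.1.0 => not found".
check_libpython() {
  _bin="$(command -v "$1")" || { echo "$1: not found on PATH" >&2; return 2; }
  ldd "$_bin" | grep -i 'libpython' || echo "$1: no libpython dependency listed"
}
```

Running `check_libpython moshi-server` before and after activating the virtualenv should show whether the loader can resolve the library in each case.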

@tleyden
Contributor Author

tleyden commented Oct 31, 2025

Maybe Moshi is looking for a particular version of Python? It looks like it expects 3.12, given the missing library: libpython3.12.so.1.0

The default Python installed on the machine seems to be 3.11.

root@f14635edad9f:/workspace/unmute# uname -a
Linux f14635edad9f 6.8.0-71-generic #71-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 22 16:52:38 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
root@f14635edad9f:/workspace/unmute# python
Python 3.11.10 (main, Sep  7 2024, 18:35:41) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
root@f14635edad9f:/workspace/unmute#
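
The mismatch described here can be checked mechanically. The following is a minimal sketch, assuming the library name from the loader error is available as a string and that `python3` is on PATH; both function names are hypothetical.

```shell
# Extract the major.minor version from a soname such as "libpython3.12.so.1.0".
# Prints nothing if the name does not look like a libpython soname.
soname_to_python_version() {
  printf '%s\n' "$1" | sed -n 's/^libpython\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Print the major.minor version of the interpreter that python3 resolves to.
current_python_version() {
  python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")'
}
```

Comparing `soname_to_python_version libpython3.12.so.1.0` with `current_python_version` shows whether the interpreter on PATH matches the one the binary was linked against.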

@vvolhejn
Collaborator

vvolhejn commented Nov 3, 2025

I don't understand the Rust codebase that deeply but I don't think it's hardcoded. Searching for "3.12" doesn't reveal anything Rust-related. Could it be that you built the moshi-server binary with a different Python environment than you were then using to run it? The Python environment used is determined at build time, not at run time, and if you don't manually uninstall (I think cargo uninstall moshi-server should do it), it will stick to the Python env from build time.

So for example, if you first run this command with a virtualenv, you'll get a binary that assumes that version of Python, and then if you try to run it without that virtualenv, it'll fail.

I'm happy to merge this fix though, it doesn't hurt and it seems like it can make things more reliable in some cases.
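
If the binary was built against a different Python than the one used at run time, rebuilding it inside the intended virtualenv should realign the two. The sketch below is one possible way to do that, not the project's documented procedure: the `--features cuda` flag and the 0.6.4 version are copied from the log in this thread, `rebuild_moshi_server` is a hypothetical name, and the guard relies on the `VIRTUAL_ENV` variable that venv activation scripts set.

```shell
# Reinstall moshi-server so that it is built against the active virtualenv.
rebuild_moshi_server() {
  # Refuse to build outside a virtualenv so build time and run time agree.
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "no virtualenv active; run 'source .venv/bin/activate' first" >&2
    return 1
  fi
  # Remove the old binary; ignore the error if it was not installed.
  cargo uninstall moshi-server || true
  cargo install --features cuda moshi-server@0.6.4
}
```

Uninstalling first matters because, as the log above shows, `cargo install` skips a package that is already installed unless `--force` is given.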

@vvolhejn vvolhejn mentioned this pull request Nov 3, 2025
1 task
@tleyden
Contributor Author

tleyden commented Nov 3, 2025

> So for example, if you first run this command with a virtualenv, you'll get a binary that assumes that version of Python, and then if you try to run it without that virtualenv, it'll fail.

That's probably what happened. I may have been doing some things inside the virtualenv and some things outside of it, which caused a misalignment of dependencies.

I do think this change might make the behavior more deterministic, since it forces everything to happen within the context of the virtualenv.

@vvolhejn
Collaborator

vvolhejn commented Nov 3, 2025

Makes sense. Could you just add the note I mentioned above to clarify it's needed because of the TTS and not the STT?

Thanks for all the PRs!

@tleyden
Contributor Author

tleyden commented Nov 3, 2025

No problem! Excited to get it running, it's such an incredibly powerful tool.

Yes, I'll update the comment and remove anything that's not needed (though I'll need to retest it first).

I think this might be the only line needed:

source .venv/bin/activate

@vvolhejn
Collaborator

vvolhejn commented Nov 3, 2025

I think you'll need at least uv venv before that to make sure the venv exists, and I'm not sure how it behaves when a pyproject.toml is not present - but I guess you'll see :)
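
The two steps discussed here (create the venv if needed, then activate it) could be combined along the following lines. This is only a sketch under stated assumptions, not the code that was merged: `ensure_venv` is a hypothetical name, it assumes `uv` is installed and the current directory is the repository root, and it skips `uv venv` when an activation script already exists.

```shell
# Create .venv with uv only if it is missing, then activate it.
ensure_venv() {
  [ -f .venv/bin/activate ] || uv venv || return 1
  # "." is the portable spelling of bash's "source".
  . .venv/bin/activate
}
```

A start script could call `ensure_venv` just before launching `moshi-server worker`, so that the loader finds the same libpython the binary was built against.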

Updated comments for clarity on Python dependencies and environment setup.
@tleyden
Contributor Author

tleyden commented Nov 17, 2025

@vvolhejn Can you take another look? I removed the unneeded changes and updated the comment in this commit:

eb0c802

I also verified on a running server.

@vvolhejn
Collaborator

Thank you!

@vvolhejn vvolhejn merged commit 6ef7869 into kyutai-labs:main Nov 19, 2025
2 checks passed