
Add kokoro support #94

Merged
daavoo merged 7 commits into main from text-to-speech-model
Jan 17, 2025

Conversation

@daavoo
Contributor

@daavoo daavoo commented Jan 16, 2025

@daavoo daavoo requested a review from a team January 16, 2025 12:16
@daavoo daavoo marked this pull request as ready for review January 16, 2025 12:16
@stefanfrench
Contributor

@daavoo When testing locally, I get ModuleNotFoundError: No module named 'phonemizer'

Do we need to update the dependencies?
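As a rough illustration, a missing dependency like this can be detected up front by probing for the module without importing it. The sketch below relies only on the standard library's `importlib.util.find_spec`; the helper name `missing_modules` is hypothetical, and the list passed to it contains only the module named in the error above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "phonemizer" is the module reported missing in this thread.
    missing = missing_modules(["phonemizer"])
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All required modules found.")
```

This only covers importable Python modules; system-level packages would need a separate check.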

@Kostis-S-Z
Contributor

Kostis-S-Z commented Jan 16, 2025

Awesome stuff! I don't have many comments on the code.

As Stefan mentioned, there is the issue of updating the dependencies, especially how to handle the espeak-ng dependency.

And then there is another, bigger question: how "fully" do we want to integrate kokoro? Or, to put it better: are we fully moving away from Oute and over to kokoro? I am asking because of the following issues:

  1. What should we do regarding dependencies? Should we make outetts optional? Should we completely remove outetts support, as we did with parler and bark?

  2. What should we do with testing? Should we add kokoro to test_model_loaders and test_text_to_speech?

  3. Should we update all the docs references, specifically in step-by-step to reference kokoro instead of oute?

  4. I don't think it makes sense for me to meticulously review the inference/kokoro/ files, as I am hoping we will soon replace them with the pip package 🤞

@daavoo daavoo force-pushed the text-to-speech-model branch from 7dd5199 to 5145045 Compare January 16, 2025 14:58
@Kostis-S-Z
Contributor

Also, kokoro seems to pad each audio clip with at least 1 second of silence, so the final complete podcast has a bit too much silence between speakers. I would update the call here like this:

        audio_np = stack_audio_segments(
            st.session_state.audio, speech_model.sample_rate, silence_pad=0.0
        )
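To make the effect of `silence_pad=0.0` concrete, here is a minimal sketch of how stacking with a silence pad might behave. It is not the project's `stack_audio_segments`: the function name `stack_with_silence` is hypothetical, it uses plain Python lists instead of arrays, and appending the pad after every segment (including the last) is an assumption.

```python
def stack_with_silence(segments, sample_rate, silence_pad=1.0):
    """Concatenate audio segments, adding `silence_pad` seconds of zeros after each.

    Hypothetical stand-in for illustration; the real helper may differ.
    """
    # Number of zero-valued samples that make up the requested pause.
    silence = [0.0] * int(sample_rate * silence_pad)
    stacked = []
    for segment in segments:
        stacked.extend(segment)
        stacked.extend(silence)
    return stacked


if __name__ == "__main__":
    clips = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
    # With a pad, each clip is followed by sample_rate * silence_pad zeros.
    print(len(stack_with_silence(clips, sample_rate=4, silence_pad=0.5)))
    # With silence_pad=0.0, the clips are joined back to back.
    print(len(stack_with_silence(clips, sample_rate=4, silence_pad=0.0)))
```

Under these assumptions, setting the pad to zero leaves only whatever silence the model itself already adds to each clip.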

@daavoo
Contributor Author

daavoo commented Jan 16, 2025

What I am thinking is:

  • We integrate kokoro only for demo purposes (until a stable version is released in pypi)
    We handle dependencies either in the notebook or the Dockerfile used for the HF Spaces.
    We don't update README, docs or tests until pypi release.
    We do update references in the demo notebook and app.
  • We keep outetts the default for the rest of the cases.

WDYT @stefanfrench @Kostis-S-Z

@stefanfrench
Contributor

> What I am thinking is:
>
>   • We integrate kokoro only for demo purposes (until a stable version is released in pypi)
>     We handle dependencies either in the notebook or the Dockerfile used for the HF Spaces.
>     We don't update README, docs or tests until pypi release.
>     We do update references in the demo notebook and app.
>   • We keep outetts the default for the rest of the cases.
>
> WDYT @stefanfrench @Kostis-S-Z

Okay, I'm comfortable with that.

@Kostis-S-Z
Contributor

Kostis-S-Z commented Jan 17, 2025

> What I am thinking is:
>
> * We integrate `kokoro` only for demo purposes (until a stable version is released in pypi)
>   We handle dependencies either in the notebook or the Dockerfile used for the HF Spaces.
>   We don't update README, docs or tests until pypi release.
>   We do update references in the demo notebook and app.
>
> * We keep `outetts` the default for the rest of the cases.
>
> WDYT @stefanfrench @Kostis-S-Z

Works for me! So you need to revert the changes in demo/app.py, example_data/config.yaml, src/document_to_podcast/cli.py, src/document_to_podcast/config.py and maybe also push this and then merge?

@daavoo daavoo force-pushed the text-to-speech-model branch from 5145045 to 44849c8 Compare January 17, 2025 11:06
@daavoo
Contributor Author

daavoo commented Jan 17, 2025

> Works for me! So you need to revert the changes in demo/app.py, example_data/config.yaml, src/document_to_podcast/cli.py, src/document_to_podcast/config.py and maybe also push this and then merge?

demo/app.py is used in the HF Space, so it needs to use kokoro there. The rest is done.

@daavoo daavoo self-assigned this Jan 17, 2025
Contributor

@stefanfrench stefanfrench left a comment


Thanks for making the changes.

Tested on Colab, local app, and CLI - works as expected. Approved.

@daavoo daavoo merged commit 3e265b1 into main Jan 17, 2025
4 checks passed
@daavoo daavoo deleted the text-to-speech-model branch January 17, 2025 14:42