@@ -4,6 +4,22 @@ Transform slides and speaker notes into video.
44
55[ ![ Demo video] ( https://transformrs.github.io/trv/demo.png )] ( https://transformrs.github.io/trv/demo.mp4 )
66
7+ ## Features
8+
9+ - 🔒 Fully offline generation of audio via the Kokoro text-to-speech model.
10+ - 🛠️ Version control friendly - store your video source in git.
11+ - 🚀 Caching of audio files to avoid redundant API calls.
12+ - 🚀 Caching of video files for quick re-builds.
13+ - 🚀 A development mode with a built-in web server for fast feedback.
14+ - 🌐 Support for multiple languages and voices.
15+ - 🚀 Small file sizes for easy sharing and hosting.
16+
17+ ## Installation
18+
19+ ``` raw
20+ $ cargo install trv
21+ ```
22+
723## Usage
824
925This tool is designed to work with [ Typst] ( https://github.com/typst/typst ) presentations.
@@ -29,7 +45,15 @@ To create a video, create a Typst presentation with speaker notes (we show only
2945]
3046```
3147
32- Next, run the following command:
48+ Next, we can work on the video with the following command:
49+
50+ ``` raw
51+ $ trv watch examples/first.typ
52+ ```
53+
54+ This will start a local web server that will automatically update the video as you make changes to the presentation.
55+
56+ Once everything looks good, we can build the final video with the following command:
3357
3458``` raw
3559$ trv build examples/first.typ
@@ -63,10 +87,10 @@ $ trv --input=presentation.typ
6387
6488
6589To create a video without an API key or an internet connection, you can self-host [ Kokoros] ( https://github.com/lucasjinreal/Kokoros ) .
66- See the [ Offline section] ( #offline ) for more information.
90+ See the [ Kokoros section] ( #kokoros ) for more information.
6791Or for a state-of-the-art model with voice cloning capabilities, see the [ Zyphra Zonos section] ( #zyphra-zonos ) .
6892
69- ## Offline
93+ ## Kokoros
7094
7195To use Kokoros locally, the easiest way is to use the Docker image.
7296
@@ -82,13 +106,26 @@ $ docker run -it --rm -p 3000:3000 kokoros openai
82106
83107Then, you can use the Docker image as the provider:
84108
109+ ``` typ
110+ #import "@preview/polylux:0.4.0": *
111+
112+ // --- trv config:
113+ // provider = "openai-compatible(localhost:3000)"
114+ // model = "tts-1"
115+ // voice = "af_sky"
116+ // audio_format = "wav"
117+ // ---
118+
119+ ...
120+ ```
121+
85122``` raw
86- $ trv --input= presentation.typ --provider=openai-compatible(localhost:3000)
123+ $ trv build presentation.typ
87124```
88125
89- ## Via Google
126+ ## Google
90127
91- Google has some high-quality voices available via their API:
128+ My favourite text-to-speech engine is the one from Google.
92129
93130``` raw
94131$ export GOOGLE_KEY="<YOUR KEY>"
@@ -98,11 +135,6 @@ $ trv build examples/google.typ
98135
99136[ ![ Google demo video] ( https://transformrs.github.io/trv/google.png )] ( https://transformrs.github.io/trv/google.mp4 )
100137
101- See the [ Google section] ( #google ) for more information about the Google API.
102-
103- Google, meanwhile, has the best text-to-speech engine that I've found as part of Gemini 2.0 Flash Experimental.
104- However, audio output is not yet available via the API.
105-
106138## Zyphra Zonos
107139
108140To use the Zyphra Zonos model, you need 8 GB of VRAM.
@@ -132,7 +164,7 @@ So in practice, the Kokoro model is probably the better option for now.
132164
133165To create a portrait video, like a YouTube Short, you can set the page to
134166
135- ``` typst
167+ ``` typ
136168#set page(width: 259.2pt, height: 460.8pt)
137169```
138170
@@ -141,24 +173,12 @@ This will automatically create slides with 1080 x 1920 resolution since Typst is
141173Next, ffmpeg will automatically scale the video to a height of 1920p, so in this case the height will not be changed.
142174For landscape videos, it might scale the image down to 1920p.
143175
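As a rough check of the page size above, the following sketch converts the Typst dimensions from points to pixels. It assumes a rendering density of 300 pixels per inch; that value is not stated here and is only inferred from the fact that 259.2pt maps to 1080 px (1 pt is 1/72 inch). The helper name `pt_to_px` is hypothetical and not part of `trv`.

```sh
#!/bin/sh
# Convert a page dimension in points to pixels.
# 1 pt = 1/72 inch, so pixels = pt / 72 * ppi.
# ASSUMPTION: 300 ppi, inferred from 259.2pt -> 1080 px; not confirmed by trv.
pt_to_px() {
    awk -v pt="$1" -v ppi="$2" 'BEGIN { printf "%.0f\n", pt / 72 * ppi }'
}

width_px=$(pt_to_px 259.2 300)
height_px=$(pt_to_px 460.8 300)
echo "${width_px} x ${height_px}"
```

The same helper can be used to pick a page size for another target resolution, for example by trying different point values until the printed pixel dimensions match.
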
144- ## About Audio
145-
146- Audio is generated using the [ transformrs] ( https://github.com/transformrs/transformrs ) crate.
147- It supports multiple providers, including DeepInfra, OpenAI, and Google.
148-
149- So ` trv ` should also work with providers other than DeepInfra.
150- However, during testing, I got the best results with Kokoros or DeepInfra for the lowest price.
176+ ## Subtitles
151177
152- For example, OpenAI text-to-speech requires any video to contain a "clear disclosure" that the voice they are hearing is AI-generated.
178+ To add subtitles to the video, you can use OpenAI's [ ` whisper ` ] ( https://github.com/openai/whisper ) :
153179
154- ## Installation
155-
156- ``` sh
157- cargo install trv
180+ ``` raw
181+ $ whisper _out/out.mp4 -f srt --model small --language=en
158182```
159183
160- Or with [ ` cargo binstall ` ] ( https://github.com/cargo-bins/cargo-binstall ) :
161-
162- ``` sh
163- cargo binstall trv
164- ```
184+ This will create an ` out.srt ` file with the subtitles.