Commit f35de72

Merge pull request #23 from grycap/dev-vicente-tts
2 parents 9391669 + d04f3a9 commit f35de72

12 files changed, +681 -0 lines changed

crates/kokoro-tts/README.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# Kokoro TTS Service for OSCAR

This service provides the configuration needed to deploy a Text-to-Speech (TTS) service using the [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) model. It runs asynchronously and on the CPU only, so it can be deployed on infrastructures without a GPU, on both AMD64 and ARM64 architectures.

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it offers quality comparable to larger models while being significantly faster and more cost-effective. Its weights are Apache-licensed, so it can be deployed in any environment.

To run the service, provide a .json file that contains both the message to be processed and the execution parameters. Because every parameter travels with the request, the same deployment can serve different voices, languages, and output formats. The input file must have the following structure:

```json
{
  "model": "af_bella",
  "language": "en-gb",
  "message": "This is an audio sample generated using the kokoro-tts service.",
  "config": {
    "speed": 1.0,
    "volume": 3.1,
    "output": "wav"
  }
}
```

Description of the input fields:

* model: Voice identifier (see the available [voices](https://huggingface.co/onnx-community/Kokoro-82M-v1.0-ONNX/tree/main/voices)).
* language: Language of the text to be processed (see [lang_code](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md)).
* message: Text to be converted to speech.
* speed: Speech speed (0.5 to 2).
* volume: Output audio volume level.
* output: Output audio file format (for example, "mp3", "wav", or "flac").
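
The input structure described above can be checked before a file is uploaded. The following is a minimal sketch, not the service's own validation: the function name `validate_request` and the exact set of checks are assumptions made for illustration, and only the field names and the 0.5 to 2 speed range come from the README.

```python
import json

# Assumption: these field names mirror the sample input in the README; the
# checks are illustrative and are not taken from the service's code.
REQUIRED_TOP_LEVEL = ("model", "language", "message", "config")
REQUIRED_CONFIG = ("speed", "volume", "output")


def validate_request(raw: str) -> list[str]:
    """Return the problems found in a Kokoro TTS request (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems = [f"missing field: {key}" for key in REQUIRED_TOP_LEVEL if key not in data]

    config = data.get("config", {})
    if not isinstance(config, dict):
        return problems + ["config must be a JSON object"]
    problems += [f"missing field: config.{key}" for key in REQUIRED_CONFIG if key not in config]

    speed = config.get("speed")
    if speed is not None:
        # The README documents a speed range of 0.5 to 2.
        if isinstance(speed, bool) or not isinstance(speed, (int, float)):
            problems.append("config.speed must be a number")
        elif not 0.5 <= speed <= 2:
            problems.append("config.speed must be between 0.5 and 2")
    return problems


sample = """
{
  "model": "af_bella",
  "language": "en-gb",
  "message": "This is an audio sample generated using the kokoro-tts service.",
  "config": {"speed": 1.0, "volume": 3.1, "output": "wav"}
}
"""
print(validate_request(sample))  # []
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a request at once.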

crates/kokoro-tts/fdl.yml

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
functions:
  oscar:
  - oscar-cluster:
      name: kokoro-tts
      memory: 3Gi
      cpu: '2.0'
      image: ghcr.io/grycap/kokoro-tts:latest
      script: script.sh
      log_level: CRITICAL
      input:
      - storage_provider: minio.default
        path: kokoro-tts/input
      output:
      - storage_provider: minio.default
        path: kokoro-tts/output

crates/kokoro-tts/icon.png

1.32 MB

crates/kokoro-tts/input.json

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
{
  "model": "af_bella",
  "language": "en-gb",
  "message": "This is an audio sample generated using the kokoro-tts service.",
  "config": {
    "speed": 1.0,
    "volume": 3.1,
    "output": "wav"
  }
}

crates/kokoro-tts/ro-crate-metadata.json

Lines changed: 276 additions & 0 deletions
@@ -0,0 +1,276 @@
{
  "@context": [
    "https://w3id.org/ro/crate/1.1/context"
  ],
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {
        "@id": "https://w3id.org/ro/crate/1.1"
      },
      "about": {
        "@id": "./"
      }
    },
    {
      "@id": "./",
      "@type": [
        "Dataset",
        "Service",
        "SoftwareApplication"
      ],
      "datePublished": "2026-02-13",
      "url": "https://github.com/grycap/oscar-hub/tree/main/crates/kokoro-tts",
      "name": "Kokoro TTS Service",
      "description": "The Kokoro TTS service is a speech synthesis solution that transforms text into multilingual audio and is optimized for CPU architectures (AMD64/ARM64).",
      "license": {
        "@id": "https://www.apache.org/licenses/LICENSE-2.0"
      },
      "applicationCategory": "OSCAR Service",
      "memoryRequirements": "3 GiB",
      "processorRequirements": [
        "2 vCPU",
        "0 GPU"
      ],
      "serviceType": "asynchronous",
      "isBasedOn": [
        {
          "@id": "https://huggingface.co/hexgrad/Kokoro-82M"
        }
      ],
      "author": {
        "@id": "https://orcid.org/0000-0002-7335-3849"
      },
      "subjectOf": [
        {
          "@id": "#acceptance-test-async"
        }
      ],
      "hasPart": [
        {
          "@id": "fdl.yml"
        },
        {
          "@id": "script.sh"
        },
        {
          "@id": "icon.png"
        },
        {
          "@id": "input.json"
        },
        {
          "@id": "#expected-audio"
        }
      ]
    },
    {
      "@id": "fdl.yml",
      "@type": [
        "File",
        "SoftwareSourceCode"
      ],
      "name": "OSCAR Service Definition",
      "url": "https://raw.githubusercontent.com/grycap/oscar-hub/refs/heads/main/crates/kokoro-tts/fdl.yml",
      "encodingFormat": "text/yaml"
    },
    {
      "@id": "script.sh",
      "@type": [
        "File",
        "SoftwareSourceCode"
      ],
      "name": "OSCAR Service Script",
      "url": "https://raw.githubusercontent.com/grycap/oscar-hub/refs/heads/main/crates/kokoro-tts/script.sh",
      "encodingFormat": "text/x-shellscript"
    },
    {
      "@id": "icon.png",
      "@type": [
        "File",
        "ImageObject"
      ],
      "name": "OSCAR Service Icon",
      "url": "https://raw.githubusercontent.com/grycap/oscar-hub/refs/heads/main/crates/kokoro-tts/icon.png",
      "encodingFormat": "image/png"
    },
    {
      "@id": "input.json",
      "@type": [
        "File"
      ],
      "name": "Sample configuration file",
      "description": "Sample .json file containing the message to be processed and the configuration used to run the kokoro-tts model.",
      "url": "https://raw.githubusercontent.com/grycap/oscar-hub/refs/heads/main/crates/kokoro-tts/input.json",
      "encodingFormat": "application/json"
    },
    {
      "@id": "#acceptance-test-async",
      "@type": "HowTo",
      "name": "Asynchronous kokoro-tts acceptance test",
      "description": "Upload the sample file, wait for the audio file to be created, and then download the file in the specified format.",
      "tool": [
        {
          "@id": "#tool-oscar-cli"
        }
      ],
      "supply": [
        {
          "@id": "#supply-sample-json"
        }
      ],
      "step": [
        {
          "@id": "#step-async-put"
        },
        {
          "@id": "#step-async-wait"
        },
        {
          "@id": "#step-async-get"
        }
      ]
    },
    {
      "@id": "#tool-oscar-cli",
      "@type": "HowToTool",
      "name": "OSCAR CLI",
      "item": {
        "@id": "#oscar-cli"
      }
    },
    {
      "@id": "#supply-sample-json",
      "@type": "HowToSupply",
      "name": "Sample JSON file",
      "item": {
        "@id": "input.json"
      }
    },
    {
      "@id": "#expected-audio",
      "@type": [
        "File"
      ],
      "name": "Expected audio output",
      "description": "Audio file produced by the service that contains the message with the specified characteristics.",
      "encodingFormat": "audio/wav"
    },
    {
      "@id": "#step-async-put",
      "@type": "HowToStep",
      "position": 1,
      "text": "Upload the sample configuration file to the service's storage.",
      "potentialAction": {
        "@id": "#action-async-put"
      }
    },
    {
      "@id": "#step-async-wait",
      "@type": "HowToStep",
      "position": 2,
      "text": "Wait for the service to produce the audio output.",
      "timeRequired": "PT60S"
    },
    {
      "@id": "#step-async-get",
      "@type": "HowToStep",
      "position": 3,
      "text": "Download the latest output file and confirm that it is in the expected format.",
      "potentialAction": {
        "@id": "#action-async-get"
      }
    },
    {
      "@id": "#action-async-put",
      "@type": "TransferAction",
      "name": "service put-file",
      "object": {
        "@id": "input.json"
      },
      "target": {
        "@id": "#entry-async-put"
      },
      "additionalProperty": [
        {
          "@id": "#command-template-async-put"
        }
      ]
    },
    {
      "@id": "#entry-async-put",
      "@type": "EntryPoint",
      "actionApplication": {
        "@id": "#oscar-cli"
      }
    },
    {
      "@id": "#command-template-async-put",
      "@type": "PropertyValue",
      "propertyID": "commandTemplate",
      "value": "oscar-cli service put-file kokoro-tts {source}"
    },
    {
      "@id": "#action-async-get",
      "@type": "TransferAction",
      "name": "service get-file",
      "target": {
        "@id": "#entry-async-get"
      },
      "result": {
        "@id": "#expected-audio"
      },
      "additionalProperty": [
        {
          "@id": "#command-template-async-get"
        }
      ]
    },
    {
      "@id": "#entry-async-get",
      "@type": "EntryPoint",
      "actionApplication": {
        "@id": "#oscar-cli"
      }
    },
    {
      "@id": "#command-template-async-get",
      "@type": "PropertyValue",
      "propertyID": "commandTemplate",
      "value": "oscar-cli service get-file kokoro-tts --download-latest-into {destination}"
    },
    {
      "@id": "#oscar-cli",
      "@type": "SoftwareApplication",
      "name": "OSCAR CLI",
      "url": "https://github.com/grycap/oscar-cli"
    },
    {
      "@id": "https://orcid.org/0000-0002-7335-3849",
      "@type": "Person",
      "affiliation": {
        "@id": "UPV"
      },
      "name": "Vicente Rodriguez"
    },
    {
      "@id": "https://www.apache.org/licenses/LICENSE-2.0",
      "@type": "CreativeWork",
      "name": "Apache License 2.0",
      "identifier": "SPDX:Apache-2.0"
    },
    {
      "@id": "https://huggingface.co/hexgrad/Kokoro-82M",
      "@type": "SoftwareApplication",
      "name": "Kokoro-82M",
      "description": "Kokoro is an open-weight TTS model with 82 million parameters."
    },
    {
      "@id": "UPV",
      "@type": "Organization",
      "name": "Universitat Politècnica de València",
      "url": "https://www.upv.es"
    }
  ]
}
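
The metadata file above links its entities through `{"@id": ...}` references, so a reference whose target is missing from `@graph` is easy to overlook in review. The sketch below shows one way such a check could be written. The helper names `iter_refs` and `dangling_refs` and the tiny example crate are hypothetical, and treating any identifier that contains `://` as external is a simplifying assumption rather than an RO-Crate rule.

```python
def iter_refs(value):
    """Yield the target of every nested {"@id": ...} reference."""
    if isinstance(value, dict):
        if set(value) == {"@id"}:
            yield value["@id"]
        else:
            for child in value.values():
                yield from iter_refs(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_refs(item)


def dangling_refs(crate: dict) -> set[str]:
    """Return local references that no entity in @graph defines."""
    graph = crate["@graph"]
    defined = {entity["@id"] for entity in graph}
    referenced = set()
    for entity in graph:
        for key, value in entity.items():
            if key != "@id":  # the entity's own identifier is not a reference
                referenced.update(iter_refs(value))
    # Assumption: absolute URLs may point outside the crate, so skip them.
    local = {ref for ref in referenced if "://" not in ref}
    return local - defined


# Hypothetical three-entity crate with one reference that resolves nowhere.
crate = {
    "@graph": [
        {
            "@id": "./",
            "@type": "Dataset",
            "hasPart": [{"@id": "input.json"}],
            "license": {"@id": "https://www.apache.org/licenses/LICENSE-2.0"},
        },
        {"@id": "input.json", "@type": "File"},
        {
            "@id": "#action-get",
            "@type": "TransferAction",
            "result": {"@id": "#missing-entity"},
        },
    ]
}
print(dangling_refs(crate))  # {'#missing-entity'}
```

To check a real file, load it with `json.load` and pass the resulting dictionary to `dangling_refs`; an empty set means every local reference resolves.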

crates/kokoro-tts/script.sh

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
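#!/bin/bash

echo "--- Initiating Kokoro TTS Processing ---"
# Abort on the first failing command.
set -e
# Derive the output base name from the input file name, dropping the .json extension.
# INPUT_FILE_PATH and TMP_OUTPUT_DIR are provided by OSCAR.
FILENAME_BASE=$(basename "$INPUT_FILE_PATH" .json)
OUTPUT_BASE="$TMP_OUTPUT_DIR/${FILENAME_BASE}"

# Run the synthesis on the input file and write the result under OUTPUT_BASE.
python3 /app/kokoro_factory.py "$INPUT_FILE_PATH" "$OUTPUT_BASE"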
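
The path handling in the script can be mirrored in Python to see which output base a given input file produces. This is a sketch for plain file paths only (no trailing slash). The function name `output_base` and the example directories are hypothetical. `OUTPUT_BASE` carries no file extension, and how `kokoro_factory.py` appends one is not shown in this commit.

```python
import posixpath


def output_base(input_file_path: str, tmp_output_dir: str) -> str:
    """Rebuild OUTPUT_BASE the way script.sh does, for a plain file path."""
    name = posixpath.basename(input_file_path)
    # basename(1) drops the suffix only when something is left in front of it.
    if name.endswith(".json") and name != ".json":
        name = name[: -len(".json")]
    return f"{tmp_output_dir}/{name}"


# Hypothetical paths standing in for INPUT_FILE_PATH and TMP_OUTPUT_DIR.
print(output_base("/tmp/in/input.json", "/tmp/out"))  # /tmp/out/input
print(output_base("/tmp/in/notes.txt", "/tmp/out"))   # /tmp/out/notes.txt
```

An input whose name does not end in `.json` keeps its full name, which matches how `basename` treats a suffix that does not match.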

crates/vosk-stt/README.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
# Vosk STT Service for OSCAR

[Vosk](https://alphacephei.com/vosk/) is a fully offline, open-source speech-to-text (STT) toolkit. It uses lightweight models that respond quickly even on modest hardware, copes well with background noise and technical terms, and supports multiple languages.

The Vosk STT service processes an audio file and returns the transcription as a text file. It is currently built on [models](https://alphacephei.com/vosk/models) for English and Spanish audio. The service cannot detect the speaker's language, so the language is selected by the input file name: **audio.wav** for Spanish audio and **audio_en.wav** for English audio. It supports any input audio file format. The output is a .txt file containing the text generated by the speech recognition process.
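
The file-name convention above can be expressed as a small helper. This is only a sketch of the rule as the README states it, not code from the service: the function name `language_for` is hypothetical, and reading the convention as "any name ending in `_en` is English, everything else is Spanish" is an assumption that generalizes the two names given.

```python
from pathlib import PurePosixPath


def language_for(filename: str) -> str:
    """Pick the recognition language from the input file name."""
    stem = PurePosixPath(filename).stem
    # Assumption: a stem ending in "_en" means English; anything else is Spanish.
    return "en" if stem.endswith("_en") else "es"


print(language_for("audio.wav"))     # es
print(language_for("audio_en.wav"))  # en
```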

crates/vosk-stt/fdl.yml

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
functions:
  oscar:
  - oscar-cluster:
      name: vosk-stt
      image: ghcr.io/grycap/vosk-stt:v1.0
      memory: 3Gi
      cpu: '2.0'
      script: script.sh
      log_level: CRITICAL
      input:
      - storage_provider: minio.default
        path: vosk-stt/input
      output:
      - storage_provider: minio.default
        path: vosk-stt/output

crates/vosk-stt/icon.png

1.49 MB

crates/vosk-stt/input_en.wav

184 KB
Binary file not shown.
