 1. **Always review the auto-translated content.** While LLM-based translation performs well in most cases, it's strongly recommended to have a native speaker verify the accuracy and cultural appropriateness of the translated text.

@@ -234,5 +234,5 @@ More details can be found in {doc}`LLMs <LLMs>`.
 1. Delete the original voice files in the `assets/` folder.
 2. Regenerate the audio using a TTS voice that matches the target language.

-See {doc}`text2voice <text2voice>` for details on how to configure voices and view the list of supported options.
+See {doc}`Text-to-Voice <text2voice>` for details on how to configure voices and view the list of supported options.
-`psyflow` supports **text-to-speech (TTS)** conversion to enhance accessibility and standardize instruction delivery across different languages. This would:
-
-**Why it matters**
-- Improves accessibility — especially for children, elderly, or low-literacy participants.
-- Ensures consistent voice delivery across localized versions.
-- Avoids the hassle of recording human voiceovers for each translation.
-
-**How It Works**
+`Psyflow` supports **text-to-speech (TTS)** conversion to enhance accessibility and standardize instruction delivery across different languages.

-PsyFlow uses Microsoft's `edge-tts`, a cloud-based TTS API that converts text to audio (MP3). Voice files are:
+**Why it matters**: Using text-to-speech improves accessibility—especially for children, elderly individuals, or participants with low literacy. It ensures consistent voice delivery across different language versions and eliminates the need to record human voiceovers for each translation. Moreover, by using standardized synthetic voices, it reduces variability introduced by different experimenters (主试), helping to maintain consistency across sessions and sites.

-- Stored in the `assets/` folder.
-- Automatically skipped if already generated (unless `overwrite=True`).
-- Registered into `StimBank` as new `Sound` stimuli ready for playback.
+**How It Works**: `Psyflow` uses Microsoft's `edge-tts`, a cloud-based TTS API that converts text to audio (MP3). The generated voice files are stored in the `assets/` folder, automatically skipped if they already exist (unless `overwrite=True` is specified), and registered into the `StimBank` as new `Sound` stimuli ready for playback.

-> ⚠️ **Note**: An internet connection is required for TTS generation. Offline tools exist but produce lower-quality audio.
+> **Note**: An internet connection is required for TTS generation. Offline tools exist but produce lower-quality audio.
+This will create audio files like `instruction_text_voice.mp3` in `assets/`.
+The resulting voices will be registered as `instruction_text_voice`, `good_bye_voice` in `StimBank`.
+

-- This will create audio files like `instruction_text_voice.mp3` in `assets/`.
-- The resulting voices will be registered as `instruction_text_voice`, `good_bye_voice` in `StimBank`.
+>If you plan to use voice output, make sure to delete any previously generated audio files in the `assets/` folder before generating new ones. Additionally, choose a TTS voice that matches the language of the text to ensure natural and accurate pronunciation. By default, "zh-CN-XiaoxiaoNeural" is used.

-If you plan to use voice:
-1. Delete any previously generated audio in `assets/` before regenerating.
-2. Choose a TTS voice that matches the language of the text.

 ---

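Reviewer note: the skip-unless-`overwrite=True` behavior described in this hunk can be pictured with the minimal sketch below. It is an illustration only, not psyflow's actual implementation. `ensure_voice_file` and the `synthesize` callback are hypothetical names; the callback stands in for the real `edge-tts` request so the sketch runs offline. The `<label>_voice.mp3` naming follows the `instruction_text_voice.mp3` example in the docs.

```python
from pathlib import Path
from typing import Callable


def ensure_voice_file(
    label: str,
    text: str,
    synthesize: Callable[[str, Path], None],  # hypothetical stand-in for the TTS call
    assets_dir: Path = Path("assets"),
    overwrite: bool = False,
) -> Path:
    """Return the path of `<label>_voice.mp3`, generating it only when needed."""
    assets_dir.mkdir(parents=True, exist_ok=True)
    out_path = assets_dir / f"{label}_voice.mp3"
    # Reuse an existing file unless the caller explicitly asks to regenerate it.
    if out_path.exists() and not overwrite:
        return out_path
    synthesize(text, out_path)
    return out_path
```

Under this reading, leaving `overwrite=True` in a task script triggers a fresh synthesis request on every run, which is presumably why the docs advise using it with care.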
@@ -50,10 +37,8 @@ stim_bank.add_voice(
 voice="ja-JP-NanamiNeural"
 )
 ```
+The result will be registered as `welcome_voice` and available like any other stimulus.

-- The result will be registered as `welcome_voice` and available like any other stimulus.
-
----

 ### Voice Selection

@@ -63,7 +48,7 @@ Use the built-in helper to explore available voices:
 from psyflow.tts_utils import list_supported_voices
@@ -84,12 +69,16 @@ Alternatively, you can check the list of supported voices [here](https://gist.gi

 - **Placeholder Limitation**: The TTS engine does **not** support dynamic text with placeholders such as `{duration}` or `{block_num}`. If your text includes placeholders, it will not be converted as expected — the synthesis may fail or result in unnatural output.

-- **Overwrite**: Use `overwrite=True` to regenerate voice files even if they exist. However, be careful with this option, as it assumes you need to regenerate the voice every time you run the task ⚠️.
-- **Voice Mismatch**: Always match the voice language to the text language to avoid unnatural pronunciation.
+- **Internet Connection Required**: TTS generation relies on Microsoft’s cloud service and requires a stable internet connection. If you're offline or behind a restrictive network (e.g., with proxy issues), voice generation will fail.
+
+- **Overwrite**: Use `overwrite=True` to regenerate voice files even if they exist. However, be careful with this option, as it assumes you need to regenerate the voice every time you run the task ⚠️.
+
+- **Voice Mismatch**: Always match the voice language to the text language to avoid unnatural pronunciation. By default, "zh-CN-XiaoxiaoNeural" is used.

 - **Preview Your Audio**: You can test output files manually in the `assets/` folder before running full experiments.
 If a file is empty or not playable, it may cause the task to fail at runtime — try deleting and regenerating the voice file.
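
Reviewer note: the Placeholder Limitation and Preview Your Audio caveats could be checked before a session with a small pre-flight script. The sketch below is an assumption-laden illustration and not part of psyflow: `find_placeholders` and `find_empty_audio` are hypothetical helpers, the regex only recognizes simple `{name}` fields, and a zero-byte check catches empty files but not files that are non-empty yet unplayable.

```python
import re
from pathlib import Path

# Matches simple str.format-style fields such as {duration} or {block_num}.
PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}")


def find_placeholders(text: str) -> list[str]:
    """Return the placeholder fields found in text, in order of appearance."""
    return PLACEHOLDER.findall(text)


def find_empty_audio(assets_dir: Path) -> list[Path]:
    """Return the MP3 files in assets_dir that are zero bytes long."""
    return sorted(p for p in assets_dir.glob("*.mp3") if p.stat().st_size == 0)
```

Text for which `find_placeholders` returns a non-empty list would be a candidate to keep as on-screen text only, and any path returned by `find_empty_audio` would be a candidate for deletion and regeneration.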
 <h2>3. Important Notes on Task Localization<a class="headerlink" href="#important-notes-on-task-localization" title="Link to this heading">¶</a></h2>
 <ol class="arabic">
 <li><p><strong>Always review the auto-translated content.</strong> While LLM-based translation performs well in most cases, it’s strongly recommended to have a native speaker verify the accuracy and cultural appropriateness of the translated text.</p></li>
 <li><p><strong>Leverage text-to-speech (TTS) for multilingual audio delivery.</strong><br/>
@@ -473,7 +473,7 @@ <h2>⚠️ Important Notes ⚠️<a class="headerlink" href="#important-notes" t
 <li><p>Delete the original voice files in the <code class="docutils literal notranslate"><span class="pre">assets/</span></code> folder.</p></li>
 <li><p>Regenerate the audio using a TTS voice that matches the target language.</p></li>
 </ol>
-<p>See <a class="reference internal" href="text2voice.html"><span class="doc">text2voice</span></a> for details on how to configure voices and view the list of supported options.</p>
+<p>See <a class="reference internal" href="text2voice.html"><span class="doc">Text-to-Voice</span></a> for details on how to configure voices and view the list of supported options.</p>
 </li>
 </ol>
 </section>
@@ -519,7 +519,7 @@ <h2>⚠️ Important Notes ⚠️<a class="headerlink" href="#important-notes" t
docs/text2voice.html: 14 additions & 37 deletions
@@ -247,43 +247,25 @@
 <article role="main" id="furo-main-content">
 <section id="text-to-voice-conversion">
 <h1>Text-to-Voice Conversion<a class="headerlink" href="#text-to-voice-conversion" title="Link to this heading">¶</a></h1>
-<p><code class="docutils literal notranslate"><span class="pre">psyflow</span></code> supports <strong>text-to-speech (TTS)</strong> conversion to enhance accessibility and standardize instruction delivery across different languages. This would:</p>
-<p><strong>Why it matters</strong></p>
-<ul class="simple">
-<li><p>Improves accessibility — especially for children, elderly, or low-literacy participants.</p></li>
-<li><p>Ensures consistent voice delivery across localized versions.</p></li>
-<li><p>Avoids the hassle of recording human voiceovers for each translation.</p></li>
-</ul>
-<p><strong>How It Works</strong></p>
-<p>PsyFlow uses Microsoft’s <code class="docutils literal notranslate"><span class="pre">edge-tts</span></code>, a cloud-based TTS API that converts text to audio (MP3). Voice files are:</p>
-<ul class="simple">
-<li><p>Stored in the <code class="docutils literal notranslate"><span class="pre">assets/</span></code> folder.</p></li>
-<li><p>Automatically skipped if already generated (unless <code class="docutils literal notranslate"><span class="pre">overwrite=True</span></code>).</p></li>
-<li><p>Registered into <code class="docutils literal notranslate"><span class="pre">StimBank</span></code> as new <code class="docutils literal notranslate"><span class="pre">Sound</span></code> stimuli ready for playback.</p></li>
-</ul>
+<p><code class="docutils literal notranslate"><span class="pre">Psyflow</span></code> supports <strong>text-to-speech (TTS)</strong> conversion to enhance accessibility and standardize instruction delivery across different languages.</p>
+<p><strong>Why it matters</strong>: Using text-to-speech improves accessibility—especially for children, elderly individuals, or participants with low literacy. It ensures consistent voice delivery across different language versions and eliminates the need to record human voiceovers for each translation. Moreover, by using standardized synthetic voices, it reduces variability introduced by different experimenters (主试), helping to maintain consistency across sessions and sites.</p>
+<p><strong>How It Works</strong>: <code class="docutils literal notranslate"><span class="pre">Psyflow</span></code> uses Microsoft’s <code class="docutils literal notranslate"><span class="pre">edge-tts</span></code>, a cloud-based TTS API that converts text to audio (MP3). The generated voice files are stored in the <code class="docutils literal notranslate"><span class="pre">assets/</span></code> folder, automatically skipped if they already exist (unless <code class="docutils literal notranslate"><span class="pre">overwrite=True</span></code> is specified), and registered into the <code class="docutils literal notranslate"><span class="pre">StimBank</span></code> as new <code class="docutils literal notranslate"><span class="pre">Sound</span></code> stimuli ready for playback.</p>
 <blockquote>
-<div><p>⚠️ <strong>Note</strong>: An internet connection is required for TTS generation. Offline tools exist but produce lower-quality audio.</p>
+<div><p><strong>Note</strong>: An internet connection is required for TTS generation. Offline tools exist but produce lower-quality audio.</p>
 </div></blockquote>
-<section id="basic-usage">
-<h2>Basic Usage<a class="headerlink" href="#basic-usage" title="Link to this heading">¶</a></h2>
-<li><p>This will create audio files like <code class="docutils literal notranslate"><span class="pre">instruction_text_voice.mp3</span></code> in <code class="docutils literal notranslate"><span class="pre">assets/</span></code>.</p></li>
-<li><p>The resulting voices will be registered as <code class="docutils literal notranslate"><span class="pre">instruction_text_voice</span></code>, <code class="docutils literal notranslate"><span class="pre">good_bye_voice</span></code> in <code class="docutils literal notranslate"><span class="pre">StimBank</span></code>.</p></li>
-</ul>
-<p>If you plan to use voice:</p>
-<ol class="arabic simple">
-<li><p>Delete any previously generated audio in <code class="docutils literal notranslate"><span class="pre">assets/</span></code> before regenerating.</p></li>
-<li><p>Choose a TTS voice that matches the language of the text.</p></li>
-</ol>
-</section>
+<p>This will create audio files like <code class="docutils literal notranslate"><span class="pre">instruction_text_voice.mp3</span></code> in <code class="docutils literal notranslate"><span class="pre">assets/</span></code>.
+The resulting voices will be registered as <code class="docutils literal notranslate"><span class="pre">instruction_text_voice</span></code>, <code class="docutils literal notranslate"><span class="pre">good_bye_voice</span></code> in <code class="docutils literal notranslate"><span class="pre">StimBank</span></code>.</p>
+<blockquote>
+<div><p>If you plan to use voice output, make sure to delete any previously generated audio files in the <code class="docutils literal notranslate"><span class="pre">assets/</span></code> folder before generating new ones. Additionally, choose a TTS voice that matches the language of the text to ensure natural and accurate pronunciation. By default, “zh-CN-XiaoxiaoNeural” is used.</p>
-<li><p>The result will be registered as <code class="docutils literal notranslate"><span class="pre">welcome_voice</span></code> and available like any other stimulus.</p></li>
-</ul>
+<p>The result will be registered as <code class="docutils literal notranslate"><span class="pre">welcome_voice</span></code> and available like any other stimulus.</p>
 </section>
-<hr class="docutils" />
 <section id="voice-selection">
 <h2>Voice Selection<a class="headerlink" href="#voice-selection" title="Link to this heading">¶</a></h2>
 <p>Use the built-in helper to explore available voices:</p>
@@ -358,8 +337,9 @@ <h2>Voice Selection<a class="headerlink" href="#voice-selection" title="Link to
 <h2>Tips and Caveats<a class="headerlink" href="#tips-and-caveats" title="Link to this heading">¶</a></h2>
 <ul class="simple">
 <li><p><strong>Placeholder Limitation</strong>: The TTS engine does <strong>not</strong> support dynamic text with placeholders such as <code class="docutils literal notranslate"><span class="pre">{duration}</span></code> or <code class="docutils literal notranslate"><span class="pre">{block_num}</span></code>. If your text includes placeholders, it will not be converted as expected — the synthesis may fail or result in unnatural output.</p></li>
+<li><p><strong>Internet Connection Required</strong>: TTS generation relies on Microsoft’s cloud service and requires a stable internet connection. If you’re offline or behind a restrictive network (e.g., with proxy issues), voice generation will fail.</p></li>
 <li><p><strong>Overwrite</strong>: Use <code class="docutils literal notranslate"><span class="pre">overwrite=True</span></code> to regenerate voice files even if they exist. However, be careful with this option, as it assumes you need to regenerate the voice every time you run the task ⚠️.</p></li>
-<li><p><strong>Voice Mismatch</strong>: Always match the voice language to the text language to avoid unnatural pronunciation.</p></li>
+<li><p><strong>Voice Mismatch</strong>: Always match the voice language to the text language to avoid unnatural pronunciation. By default, “zh-CN-XiaoxiaoNeural” is used.</p></li>
 <li><p><strong>Preview Your Audio</strong>: You can test output files manually in the <code class="docutils literal notranslate"><span class="pre">assets/</span></code> folder before running full experiments.<br/>
 If a file is empty or not playable, it may cause the task to fail at runtime — try deleting and regenerating the voice file.</p></li>
 </ul>
@@ -404,10 +384,7 @@ <h2>Tips and Caveats<a class="headerlink" href="#tips-and-caveats" title="Link t