Commit 034537c

committed
tweaks
1 parent d7d804c commit 034537c

10 files changed

+57
-102
lines changed

docs/.doctrees/environment.pickle

-1.28 KB
Binary file not shown.
60 Bytes
Binary file not shown.

docs/.doctrees/text2voice.doctree

-520 Bytes
Binary file not shown.

docs/_sources/localization.md.txt

Lines changed: 2 additions & 2 deletions
@@ -222,7 +222,7 @@ translated_config = client.translate_config(
 ```
 More details can be found in {doc}`LLMs <LLMs>`.

-### ⚠️ Important Notes ⚠️
+### 3. Important Notes on Task Localization

 1. **Always review the auto-translated content.** While LLM-based translation performs well in most cases, it's strongly recommended to have a native speaker verify the accuracy and cultural appropriateness of the translated text.

@@ -234,5 +234,5 @@ More details can be found in {doc}`LLMs <LLMs>`.
 1. Delete the original voice files in the `assets/` folder.
 2. Regenerate the audio using a TTS voice that matches the target language.

-See {doc}`text2voice <text2voice>` for details on how to configure voices and view the list of supported options.
+See {doc}`Text-to-Voice <text2voice>` for details on how to configure voices and view the list of supported options.

docs/_sources/text2voice.md.txt

Lines changed: 17 additions & 28 deletions
@@ -1,41 +1,28 @@
-
 ## Text-to-Voice Conversion

-`psyflow` supports **text-to-speech (TTS)** conversion to enhance accessibility and standardize instruction delivery across different languages. This would:
-
-**Why it matters**
-- Improves accessibility — especially for children, elderly, or low-literacy participants.
-- Ensures consistent voice delivery across localized versions.
-- Avoids the hassle of recording human voiceovers for each translation.
-
-**How It Works**
+`Psyflow` supports **text-to-speech (TTS)** conversion to enhance accessibility and standardize instruction delivery across different languages.

-PsyFlow uses Microsoft's `edge-tts`, a cloud-based TTS API that converts text to audio (MP3). Voice files are:
+**Why it matters**: Using text-to-speech improves accessibility—especially for children, elderly individuals, or participants with low literacy. It ensures consistent voice delivery across different language versions and eliminates the need to record human voiceovers for each translation. Moreover, by using standardized synthetic voices, it reduces variability introduced by different experimenters (主试), helping to maintain consistency across sessions and sites.

-- Stored in the `assets/` folder.
-- Automatically skipped if already generated (unless `overwrite=True`).
-- Registered into `StimBank` as new `Sound` stimuli ready for playback.
+**How It Works**: `Psyflow` uses Microsoft's `edge-tts`, a cloud-based TTS API that converts text to audio (MP3). The generated voice files are stored in the `assets/` folder, automatically skipped if they already exist (unless `overwrite=True` is specified), and registered into the `StimBank` as new `Sound` stimuli ready for playback.

-> ⚠️ **Note**: An internet connection is required for TTS generation. Offline tools exist but produce lower-quality audio.
+> **Note**: An internet connection is required for TTS generation. Offline tools exist but produce lower-quality audio.


-### Basic Usage
-
-#### Convert Existing Text Stimuli to Voice
+### Convert Existing Text Stimuli to Voice

 ```python
 from psyflow import StimBank
 stim_bank = StimBank(config)
 stim_bank.convert_to_voice(keys=["instruction_text", "good_bye"],
 voice="zh-CN-YunyangNeural")
 ```
+This will create audio files like `instruction_text_voice.mp3` in `assets/`.
+The resulting voices will be registered as `instruction_text_voice`, `good_bye_voice` in `StimBank`.
+

-- This will create audio files like `instruction_text_voice.mp3` in `assets/`.
-- The resulting voices will be registered as `instruction_text_voice`, `good_bye_voice` in `StimBank`.
+>If you plan to use voice output, make sure to delete any previously generated audio files in the `assets/` folder before generating new ones. Additionally, choose a TTS voice that matches the language of the text to ensure natural and accurate pronunciation. By default, "zh-CN-XiaoxiaoNeural" is used.

-If you plan to use voice:
-1. Delete any previously generated audio in `assets/` before regenerating.
-2. Choose a TTS voice that matches the language of the text.

 ---

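The "How It Works" paragraph in this hunk says that existing voice files are skipped unless `overwrite=True` is specified. A minimal sketch of that caching rule follows. The names `generate_voice_file` and `fake_synthesize` are hypothetical and are not part of psyflow; the real TTS call is replaced by an injected function so the logic can be read and run offline.

```python
from pathlib import Path
from typing import Callable


def generate_voice_file(
    text: str,
    out_path: Path,
    synthesize: Callable[[str, Path], None],
    overwrite: bool = False,
) -> bool:
    """Create an audio file for `text`, skipping the work if it already exists.

    Returns True if `synthesize` was called, False if the existing file was kept.
    This is an illustrative helper, not psyflow's actual implementation.
    """
    out_path = Path(out_path)
    if out_path.exists() and not overwrite:
        return False  # keep the previously generated file
    out_path.parent.mkdir(parents=True, exist_ok=True)
    synthesize(text, out_path)
    return True


def fake_synthesize(text: str, path: Path) -> None:
    # Stand-in for a real TTS request: writes the text bytes so the file is non-empty.
    path.write_bytes(text.encode("utf-8"))
```

Under this rule, changing the instruction text without deleting the old file (or passing `overwrite=True`) leaves the stale audio in place, which is why the documentation recommends clearing `assets/` before regenerating.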
@@ -50,10 +37,8 @@ stim_bank.add_voice(
 voice="ja-JP-NanamiNeural"
 )
 ```
+The result will be registered as `welcome_voice` and available like any other stimulus.

-- The result will be registered as `welcome_voice` and available like any other stimulus.
-
----

 ### Voice Selection

@@ -63,7 +48,7 @@ Use the built-in helper to explore available voices:
 from psyflow.tts_utils import list_supported_voices

 # Print all voices
-list_supported_voices(filter_lang="ja", human_readable=True)
+list_supported_voices(human_readable=True)

 # Print all Japanese voices
 list_supported_voices(filter_lang="ja", human_readable=True)
@@ -84,12 +69,16 @@ Alternatively, you can check the list of supported voices [here](https://gist.gi

 - **Placeholder Limitation**: The TTS engine does **not** support dynamic text with placeholders such as `{duration}` or `{block_num}`. If your text includes placeholders, it will not be converted as expected — the synthesis may fail or result in unnatural output.

-- **Overwrite**: Use `overwrite=True` to regenerate voice files even if they exist. However, be careful with this option, as it assumes you need to regenerate the voice every time you run the task ⚠️.
-- **Voice Mismatch**: Always match the voice language to the text language to avoid unnatural pronunciation.
+- **Internet Connection Required**: TTS generation relies on Microsoft’s cloud service and requires a stable internet connection. If you're offline or behind a restrictive network (e.g., with proxy issues), voice generation will fail.
+
+- **Overwrite**: Use `overwrite=True` to regenerate voice files even if they exist. However, be careful with this option, as it assumes you need to regenerate the voice every time you run the task ⚠️.
+
+- **Voice Mismatch**: Always match the voice language to the text language to avoid unnatural pronunciation. By default, "zh-CN-XiaoxiaoNeural" is used.

 - **Preview Your Audio**: You can test output files manually in the `assets/` folder before running full experiments.
 If a file is empty or not playable, it may cause the task to fail at runtime — try deleting and regenerating the voice file.




+

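The "Placeholder Limitation" caveat above can be guarded against before any text reaches the TTS engine. The sketch below uses only the standard library's `string.Formatter` to detect `{...}` fields and to refuse text that still has unresolved ones. The function names `find_placeholders` and `resolve_for_tts` are hypothetical helpers for illustration and are not part of psyflow.

```python
from string import Formatter


def find_placeholders(text: str) -> list[str]:
    """Return the names of `{...}` format fields found in `text`, in order."""
    return [field for _, field, _, _ in Formatter().parse(text) if field is not None]


def resolve_for_tts(text: str, **values: object) -> str:
    """Fill in placeholders and raise if any would be left unresolved.

    Illustrative only: the idea is to pass fully resolved, static text to the
    voice-conversion step rather than a template.
    """
    missing = [name for name in find_placeholders(text) if name not in values]
    if missing:
        raise ValueError(f"Unresolved placeholders: {missing}")
    return text.format(**values)
```

A check like `find_placeholders(text)` could be run over each stimulus text before conversion, so that templated instructions are either resolved first or left as on-screen text only.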
docs/localization.html

Lines changed: 4 additions & 4 deletions
@@ -462,8 +462,8 @@ <h2>2. Programmatic Localization via API<a class="headerlink" href="#programmati
 </div>
 <p>More details can be found in <a class="reference internal" href="LLMs.html"><span class="doc">LLMs</span></a>.</p>
 </section>
-<section id="important-notes">
-<h2>⚠️ Important Notes ⚠️<a class="headerlink" href="#important-notes" title="Link to this heading"></a></h2>
+<section id="important-notes-on-task-localization">
+<h2>3. Important Notes on Task Localization<a class="headerlink" href="#important-notes-on-task-localization" title="Link to this heading"></a></h2>
 <ol class="arabic">
 <li><p><strong>Always review the auto-translated content.</strong> While LLM-based translation performs well in most cases, it’s strongly recommended to have a native speaker verify the accuracy and cultural appropriateness of the translated text.</p></li>
 <li><p><strong>Leverage text-to-speech (TTS) for multilingual audio delivery.</strong><br />
@@ -473,7 +473,7 @@ <h2>⚠️ Important Notes ⚠️<a class="headerlink" href="#important-notes" t
 <li><p>Delete the original voice files in the <code class="docutils literal notranslate"><span class="pre">assets/</span></code> folder.</p></li>
 <li><p>Regenerate the audio using a TTS voice that matches the target language.</p></li>
 </ol>
-<p>See <a class="reference internal" href="text2voice.html"><span class="doc">text2voice</span></a> for details on how to configure voices and view the list of supported options.</p>
+<p>See <a class="reference internal" href="text2voice.html"><span class="doc">Text-to-Voice</span></a> for details on how to configure voices and view the list of supported options.</p>
 </li>
 </ol>
 </section>
@@ -519,7 +519,7 @@ <h2>⚠️ Important Notes ⚠️<a class="headerlink" href="#important-notes" t
 <li><a class="reference internal" href="#">Task Localization</a><ul>
 <li><a class="reference internal" href="#manual-adaptation-quick-and-easy">1. Manual Adaptation (Quick and Easy)</a></li>
 <li><a class="reference internal" href="#programmatic-localization-via-api">2. Programmatic Localization via API</a></li>
-<li><a class="reference internal" href="#important-notes">⚠️ Important Notes ⚠️</a></li>
+<li><a class="reference internal" href="#important-notes-on-task-localization">3. Important Notes on Task Localization</a></li>
 </ul>
 </li>
 </ul>

docs/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

docs/text2voice.html

Lines changed: 14 additions & 37 deletions
@@ -247,43 +247,25 @@
 <article role="main" id="furo-main-content">
 <section id="text-to-voice-conversion">
 <h1>Text-to-Voice Conversion<a class="headerlink" href="#text-to-voice-conversion" title="Link to this heading"></a></h1>
-<p><code class="docutils literal notranslate"><span class="pre">psyflow</span></code> supports <strong>text-to-speech (TTS)</strong> conversion to enhance accessibility and standardize instruction delivery across different languages. This would:</p>
-<p><strong>Why it matters</strong></p>
-<ul class="simple">
-<li><p>Improves accessibility — especially for children, elderly, or low-literacy participants.</p></li>
-<li><p>Ensures consistent voice delivery across localized versions.</p></li>
-<li><p>Avoids the hassle of recording human voiceovers for each translation.</p></li>
-</ul>
-<p><strong>How It Works</strong></p>
-<p>PsyFlow uses Microsoft’s <code class="docutils literal notranslate"><span class="pre">edge-tts</span></code>, a cloud-based TTS API that converts text to audio (MP3). Voice files are:</p>
-<ul class="simple">
-<li><p>Stored in the <code class="docutils literal notranslate"><span class="pre">assets/</span></code> folder.</p></li>
-<li><p>Automatically skipped if already generated (unless <code class="docutils literal notranslate"><span class="pre">overwrite=True</span></code>).</p></li>
-<li><p>Registered into <code class="docutils literal notranslate"><span class="pre">StimBank</span></code> as new <code class="docutils literal notranslate"><span class="pre">Sound</span></code> stimuli ready for playback.</p></li>
-</ul>
+<p><code class="docutils literal notranslate"><span class="pre">Psyflow</span></code> supports <strong>text-to-speech (TTS)</strong> conversion to enhance accessibility and standardize instruction delivery across different languages.</p>
+<p><strong>Why it matters</strong>: Using text-to-speech improves accessibility—especially for children, elderly individuals, or participants with low literacy. It ensures consistent voice delivery across different language versions and eliminates the need to record human voiceovers for each translation. Moreover, by using standardized synthetic voices, it reduces variability introduced by different experimenters (主试), helping to maintain consistency across sessions and sites.</p>
+<p><strong>How It Works</strong>: <code class="docutils literal notranslate"><span class="pre">Psyflow</span></code> uses Microsoft’s <code class="docutils literal notranslate"><span class="pre">edge-tts</span></code>, a cloud-based TTS API that converts text to audio (MP3). The generated voice files are stored in the <code class="docutils literal notranslate"><span class="pre">assets/</span></code> folder, automatically skipped if they already exist (unless <code class="docutils literal notranslate"><span class="pre">overwrite=True</span></code> is specified), and registered into the <code class="docutils literal notranslate"><span class="pre">StimBank</span></code> as new <code class="docutils literal notranslate"><span class="pre">Sound</span></code> stimuli ready for playback.</p>
 <blockquote>
-<div><p>⚠️ <strong>Note</strong>: An internet connection is required for TTS generation. Offline tools exist but produce lower-quality audio.</p>
+<div><p><strong>Note</strong>: An internet connection is required for TTS generation. Offline tools exist but produce lower-quality audio.</p>
 </div></blockquote>
-<section id="basic-usage">
-<h2>Basic Usage<a class="headerlink" href="#basic-usage" title="Link to this heading"></a></h2>
 <section id="convert-existing-text-stimuli-to-voice">
-<h3>Convert Existing Text Stimuli to Voice<a class="headerlink" href="#convert-existing-text-stimuli-to-voice" title="Link to this heading"></a></h3>
+<h2>Convert Existing Text Stimuli to Voice<a class="headerlink" href="#convert-existing-text-stimuli-to-voice" title="Link to this heading"></a></h2>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">psyflow</span><span class="w"> </span><span class="kn">import</span> <span class="n">StimBank</span>
 <span class="n">stim_bank</span> <span class="o">=</span> <span class="n">StimBank</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
 <span class="n">stim_bank</span><span class="o">.</span><span class="n">convert_to_voice</span><span class="p">(</span><span class="n">keys</span><span class="o">=</span><span class="p">[</span><span class="s2">&quot;instruction_text&quot;</span><span class="p">,</span> <span class="s2">&quot;good_bye&quot;</span><span class="p">],</span>
 <span class="n">voice</span><span class="o">=</span><span class="s2">&quot;zh-CN-YunyangNeural&quot;</span><span class="p">)</span>
 </pre></div>
 </div>
-<ul class="simple">
-<li><p>This will create audio files like <code class="docutils literal notranslate"><span class="pre">instruction_text_voice.mp3</span></code> in <code class="docutils literal notranslate"><span class="pre">assets/</span></code>.</p></li>
-<li><p>The resulting voices will be registered as <code class="docutils literal notranslate"><span class="pre">instruction_text_voice</span></code>, <code class="docutils literal notranslate"><span class="pre">good_bye_voice</span></code> in <code class="docutils literal notranslate"><span class="pre">StimBank</span></code>.</p></li>
-</ul>
-<p>If you plan to use voice:</p>
-<ol class="arabic simple">
-<li><p>Delete any previously generated audio in <code class="docutils literal notranslate"><span class="pre">assets/</span></code> before regenerating.</p></li>
-<li><p>Choose a TTS voice that matches the language of the text.</p></li>
-</ol>
-</section>
+<p>This will create audio files like <code class="docutils literal notranslate"><span class="pre">instruction_text_voice.mp3</span></code> in <code class="docutils literal notranslate"><span class="pre">assets/</span></code>.
+The resulting voices will be registered as <code class="docutils literal notranslate"><span class="pre">instruction_text_voice</span></code>, <code class="docutils literal notranslate"><span class="pre">good_bye_voice</span></code> in <code class="docutils literal notranslate"><span class="pre">StimBank</span></code>.</p>
+<blockquote>
+<div><p>If you plan to use voice output, make sure to delete any previously generated audio files in the <code class="docutils literal notranslate"><span class="pre">assets/</span></code> folder before generating new ones. Additionally, choose a TTS voice that matches the language of the text to ensure natural and accurate pronunciation. By default, “zh-CN-XiaoxiaoNeural” is used.</p>
+</div></blockquote>
 </section>
 <hr class="docutils" />
 <section id="add-voice-from-custom-text">
@@ -296,18 +278,15 @@ <h2>Add Voice from Custom Text<a class="headerlink" href="#add-voice-from-custom
 <span class="p">)</span>
 </pre></div>
 </div>
-<ul class="simple">
-<li><p>The result will be registered as <code class="docutils literal notranslate"><span class="pre">welcome_voice</span></code> and available like any other stimulus.</p></li>
-</ul>
+<p>The result will be registered as <code class="docutils literal notranslate"><span class="pre">welcome_voice</span></code> and available like any other stimulus.</p>
 </section>
-<hr class="docutils" />
 <section id="voice-selection">
 <h2>Voice Selection<a class="headerlink" href="#voice-selection" title="Link to this heading"></a></h2>
 <p>Use the built-in helper to explore available voices:</p>
 <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">psyflow.tts_utils</span><span class="w"> </span><span class="kn">import</span> <span class="n">list_supported_voices</span>

 <span class="c1"># Print all voices</span>
-<span class="n">list_supported_voices</span><span class="p">(</span><span class="n">filter_lang</span><span class="o">=</span><span class="s2">&quot;ja&quot;</span><span class="p">,</span> <span class="n">human_readable</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
+<span class="n">list_supported_voices</span><span class="p">(</span><span class="n">human_readable</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>

 <span class="c1"># Print all Japanese voices</span>
 <span class="n">list_supported_voices</span><span class="p">(</span><span class="n">filter_lang</span><span class="o">=</span><span class="s2">&quot;ja&quot;</span><span class="p">,</span> <span class="n">human_readable</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
@@ -358,8 +337,9 @@ <h2>Voice Selection<a class="headerlink" href="#voice-selection" title="Link to
 <h2>Tips and Caveats<a class="headerlink" href="#tips-and-caveats" title="Link to this heading"></a></h2>
 <ul class="simple">
 <li><p><strong>Placeholder Limitation</strong>: The TTS engine does <strong>not</strong> support dynamic text with placeholders such as <code class="docutils literal notranslate"><span class="pre">{duration}</span></code> or <code class="docutils literal notranslate"><span class="pre">{block_num}</span></code>. If your text includes placeholders, it will not be converted as expected — the synthesis may fail or result in unnatural output.</p></li>
+<li><p><strong>Internet Connection Required</strong>: TTS generation relies on Microsoft’s cloud service and requires a stable internet connection. If you’re offline or behind a restrictive network (e.g., with proxy issues), voice generation will fail.</p></li>
 <li><p><strong>Overwrite</strong>: Use <code class="docutils literal notranslate"><span class="pre">overwrite=True</span></code> to regenerate voice files even if they exist. However, be careful with this option, as it assumes you need to regenerate the voice every time you run the task ⚠️.</p></li>
-<li><p><strong>Voice Mismatch</strong>: Always match the voice language to the text language to avoid unnatural pronunciation.</p></li>
+<li><p><strong>Voice Mismatch</strong>: Always match the voice language to the text language to avoid unnatural pronunciation. By default, “zh-CN-XiaoxiaoNeural” is used.</p></li>
 <li><p><strong>Preview Your Audio</strong>: You can test output files manually in the <code class="docutils literal notranslate"><span class="pre">assets/</span></code> folder before running full experiments.<br />
 If a file is empty or not playable, it may cause the task to fail at runtime — try deleting and regenerating the voice file.</p></li>
 </ul>
@@ -404,10 +384,7 @@ <h2>Tips and Caveats<a class="headerlink" href="#tips-and-caveats" title="Link t
 <div class="toc-tree">
 <ul>
 <li><a class="reference internal" href="#">Text-to-Voice Conversion</a><ul>
-<li><a class="reference internal" href="#basic-usage">Basic Usage</a><ul>
 <li><a class="reference internal" href="#convert-existing-text-stimuli-to-voice">Convert Existing Text Stimuli to Voice</a></li>
-</ul>
-</li>
 <li><a class="reference internal" href="#add-voice-from-custom-text">Add Voice from Custom Text</a></li>
 <li><a class="reference internal" href="#voice-selection">Voice Selection</a></li>
 <li><a class="reference internal" href="#tips-and-caveats">Tips and Caveats</a></li>
