Commit b057c4a

minor fixes
1 parent 6ef8d14 commit b057c4a

1 file changed: +10 -10 lines changed

Bolbosh/index.html

Lines changed: 10 additions & 10 deletions
@@ -3,7 +3,7 @@
 <head>
 <meta charset="utf-8">
 <meta name="viewport" content="width=device-width, initial-scale=1">
-<title>Bolbosh: A Multi-Speaker Text-to-Speech System for Kashmiri</title>
+<title>Bolbosh: Script-Aware Flow Matching for Kashmiri Text-to-Speech</title>
 
 <!-- Fonts -->
 <link href="https://fonts.googleapis.com/css2?family=Google+Sans:wght@400;500;700&display=swap" rel="stylesheet">
@@ -40,7 +40,7 @@
 <div class="hero-body">
 <div class="container is-max-desktop">
 <h1 class="title is-1 publication-title">
-<span class="bolbosh-text">Bolbosh</span>: A Multi-Speaker Text-to-Speech System for Kashmiri
+<span class="bolbosh-text">Bolbosh</span>: Script-Aware Flow Matching for Kashmiri Text-to-Speech
 </h1>
 
 <div class="is-size-5 publication-authors">
@@ -115,10 +115,9 @@ <h1 class="title is-1 publication-title">
 <h2 class="title is-3 section-title">Abstract</h2>
 <div class="content has-text-justified">
 <p>
-<b>Bolbosh</b> is a multi-speaker text-to-speech (TTS) system for the Kashmiri language (Persio-Arabic script). Adapting the Matcha-TTS framework—a non-autoregressive, conditional flow matching (CFM) based model—Bolbosh synthesizes natural-sounding Kashmiri speech from text. To the best of our knowledge, this is among the first neural TTS systems for Kashmiri, a low-resource language spoken in the Kashmir region.
-</p>
-<p>
-Our system is trained on 424 speakers by combining the Rasa dataset and IndicVoices corpus. We employ transfer learning from a pre-trained English multi-speaker model (VCTK) and utilize character-level text processing with a custom Kashmiri text normalizer—eliminating the need for a phonemizer. Using an ODE-based inference procedure, Bolbosh enables fast, high-quality synthesis in as few as 10 steps.
+Kashmiri is spoken by around 7 million people but remains critically underserved in speech technology, despite its official status and rich linguistic heritage. The lack of robust Text-to-Speech (TTS) systems limits digital accessibility and inclusive human-computer interaction for native speakers. In this work, we present the first dedicated, open-source neural TTS system designed for Kashmiri. We show that zero-shot multilingual baselines trained for Indic languages fail to produce intelligible speech, achieving a Mean Opinion Score (MOS) of only 1.86, largely due to inadequate modeling of Perso-Arabic diacritics and language-specific phonotactics.
+To address these limitations, we propose <b>Bolbosh</b>, a supervised cross-lingual adaptation strategy based on Optimal Transport Conditional Flow Matching (OT-CFM) within the Matcha-TTS framework. This enables stable alignment under limited paired data. We further introduce a three-stage acoustic enhancement pipeline consisting of dereverberation, silence trimming, and loudness normalization to unify heterogeneous speech sources and stabilize alignment learning. The model's vocabulary is expanded to explicitly encode Kashmiri graphemes, preserving fine-grained vowel distinctions.
+Our system achieves a MOS of 3.63 and a Mel-Cepstral Distortion (MCD) of 3.73, substantially outperforming multilingual baselines and establishing a new benchmark for Kashmiri speech synthesis. Our results demonstrate that script-aware and supervised flow-based adaptation are critical for low-resource TTS in diacritic-sensitive languages.
 </p>
 </div>
 </div>
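
Note on the MCD figure in the revised abstract: the commit does not show how the reported Mel-Cepstral Distortion was computed. The sketch below uses the commonly cited definition, assuming the reference and synthesized mel-cepstral frames are already time-aligned (for example by dynamic time warping) and that the 0th (energy) coefficient is excluded. The function name `mel_cepstral_distortion` is hypothetical and is not taken from the Bolbosh repository.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned mel-cepstral frames.

    Assumptions (not confirmed by the commit): frames are already aligned,
    each frame is a sequence of coefficients, and coefficient 0 is skipped.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("need two non-empty, equally long frame sequences")

    # Constant factor from the usual definition: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D
        sq_dist = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(sq_dist)
    return total / len(ref_frames)
```

Lower values indicate a synthesized spectrum closer to the reference; identical inputs give 0 dB.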
@@ -131,7 +130,7 @@ <h2 class="title is-3 section-title">Abstract</h2>
 <h2 class="title is-3 has-text-centered section-title">Audio Samples</h2>
 <div class="content has-text-justified mb-5">
 <p>
-These audio samples were synthesized by <b>Bolbosh</b> using our two high-quality Rasa speakers. Generation was performed using a Conditional Flow Matching (CFM) decoder in 10 ODE steps and synthesized into waveforms via a HiFi-GAN vocoder.
+These audio samples were synthesized by <b>Bolbosh</b> using our two high-quality Rasa speakers. Generation was performed using a Conditional Flow Matching (CFM) decoder and synthesized into waveforms via a HiFi-GAN vocoder.
 </p>
 </div>
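
Note on the removed "10 ODE steps" wording: a CFM decoder generates output by integrating a learned vector field from noise (t = 0) to data (t = 1), and the step count is an inference-time setting. The sketch below illustrates that idea with a fixed-step Euler solver. It is an assumption-laden illustration, not the Bolbosh inference code: `euler_sample` and `toy_field` are hypothetical names, and the constant toy field stands in for a trained decoder.

```python
def euler_sample(vector_field, x0, n_steps=10):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with Euler steps."""
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    x = list(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        v = vector_field(x, t)
        # One explicit Euler update: x <- x + dt * v(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for a trained decoder (hypothetical): a constant field that
# carries the starting noise along a straight line to a fixed target.
noise = [0.5, -1.0, 2.0]
target = [1.0, 0.0, -1.0]


def toy_field(x, t):
    return [b - a for a, b in zip(noise, target)]


result = euler_sample(toy_field, noise, n_steps=10)
```

With a real decoder, fewer steps trade accuracy for speed, which is why a page may choose not to pin a specific step count.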

@@ -452,11 +451,12 @@ <h4 class="title is-5">Synthesis Engine</h4>
 <section class="section" id="BibTeX">
 <div class="container is-max-desktop content">
 <h2 class="title section-title">BibTeX</h2>
-<pre><code>@misc{bolbosh2026,
-title={Bolbosh: A Multi-Speaker Text-to-Speech System for Kashmiri},
+<pre><code>@inproceedings{ashraf2026bolbosh,
+title={Bolbosh: Script-Aware Flow Matching for Kashmiri Text-to-Speech},
 author={Ashraf, Tajamul and Zargar, Burhaan Rasheed and Muizz, Saeed Abdul and Mushtaq, Ifrah and Mehdi, Nazima and Gillani, Iqra Altaf and Kak, Aadil Amin and Bashir, Janibul},
+booktitle={Proceedings of Interspeech},
 year={2026},
-howpublished={\url{https://github.com/gaash-lab/Bolbosh}},
+url={https://github.com/gaash-lab/Bolbosh}
 }</code></pre>
 </div>
 </section>
