Serbo-Croatian #16

gxbag · 2022-09-21T21:50:20Z

A very small question: I noticed that you went with three separate languages for Serbo-Croatian (Bosnian, Croatian, and Serbian).

In linguistic contexts (Wiktionary, for example), the Serbo-Croatian language, sometimes known as BCS (Bosnian/Croatian/Serbian), is not broken into its standard varieties, because they are all based on the same subdialect (Eastern Herzegovinian) of the same dialect (Shtokavian) of the same language (Serbo-Croatian).

Why did you split the language into Bosnian, Croatian, and Serbian, when that unnecessarily weakens the translation quality?

I would be interested in higher-quality translations from Bosnian/Croatian/Serbian speech into English.

jongwook · 2022-09-22T03:53:03Z

Maintainer

This was an oversight in our data processing that occurred when we imported language labels from the VoxLingua107 dataset, which had three separate labels for the language. The WER scores on Fleurs for the three language labels range from 16% to 29%, and we're curious how usable Whisper is for Serbo-Croatian and what kinds of errors it makes.
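For context, WER (word error rate) is the word-level edit distance between a model's transcript and a reference transcript, divided by the number of reference words: WER = (S + D + I) / N, counting substitutions, deletions, and insertions. Below is a minimal sketch of that definition, assuming plain lowercasing and whitespace tokenization; `word_error_rate` is a hypothetical helper, and Whisper's published evaluation applies its own text normalizer first, so numbers from this sketch will not reproduce the reported Fleurs scores.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: only lowercases and splits on whitespace; no punctuation
    stripping or other normalization is applied.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring "Well, I hope we didn't bother you." against "I hope we didn't disappoint you." gives 2/6 ≈ 0.33 (one insertion, one substitution). Neither sentence is a ground-truth reference here; the pair only illustrates the arithmetic.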

gxbag · 2022-09-22T12:56:36Z

Author

I did some informal testing of translation on YouTube videos, interviews, and movie clips, since I'm sometimes sent clips in Serbo-Croatian, which I don't really understand myself. I tested with --model small and --model medium and found that auto-detection always picks Croatian. If I switch the language to Serbian, the translation is word for word the same, except that it sometimes changes "a bit" to "a little" and "between us" to "among us".

So I didn't notice the translation getting worse when switching to a lower-resource language, as I had first expected.
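The same comparison can be scripted from Python. Below is a minimal sketch, assuming the openai-whisper package (`whisper.load_model`, `model.detect_language`, and `model.transcribe` with `language` and `task` options, plus the `n_mels` argument of `log_mel_spectrogram` found in recent versions). The helper names are hypothetical, and so is the pooled "sh" label: Whisper has no Serbo-Croatian token, so pooling the `bs`/`hr`/`sr` probabilities only changes what is reported, while decoding still uses a single language token.

```python
from typing import Dict, Tuple

# Whisper language codes for the three standard varieties
# (as listed in whisper.tokenizer.LANGUAGES).
BCS_CODES = ("bs", "hr", "sr")


def top_language_merged(probs: Dict[str, float], merged_code: str = "sh") -> Tuple[str, float]:
    """Pick the most likely language after pooling the bs/hr/sr probabilities.

    `merged_code` is a hypothetical label for the pooled bucket ("sh" is the
    deprecated ISO 639-1 code for Serbo-Croatian); Whisper has no such token.
    """
    merged = {code: p for code, p in probs.items() if code not in BCS_CODES}
    merged[merged_code] = sum(probs.get(code, 0.0) for code in BCS_CODES)
    best = max(merged, key=merged.get)
    return best, merged[best]


def translate_as_each_variety(path: str, model_name: str = "medium") -> Dict[str, str]:
    """Hypothetical helper: print pooled auto-detection, then translate with hr and sr forced."""
    import whisper  # openai-whisper, imported lazily so the function above works without it

    model = whisper.load_model(model_name)
    # Language detection only looks at the first 30 seconds of audio.
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    print("pooled detection:", top_language_merged(probs))
    return {
        code: model.transcribe(path, task="translate", language=code)["text"]
        for code in ("hr", "sr")
    }
```

On the command line, forcing a variety corresponds to the `--language` option, for example `whisper clip.mp3 --model medium --task translate --language Serbian`.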

Here's a translation with some differences:

Source: https://www.youtube.com/watch?v=B8agRjMdijM

Croatian:

[00:03.800 --> 00:05.560] 8,000 euros, right?
[00:05.560 --> 00:07.960] I hope we didn't disappoint you.
[00:07.960 --> 00:09.760] It's only for the first three months.
[00:09.760 --> 00:12.360] After that, there are increases.
[00:12.360 --> 00:17.040] Ten percent quarterly bonuses, annual, half annual, monthly.
[00:17.040 --> 00:21.080] And for seniors, we have special daily bonuses.
[00:21.080 --> 00:25.800] Namely, whoever shows up before 11 a.m. in the morning, gets 200 euros.
[00:25.800 --> 00:30.160] That's the same if you want to have a party after 5 a.m.
[00:30.160 --> 00:32.520] That's the basic part. What do you mean?
[00:32.520 --> 00:35.080] But seniors don't do petcom at all.
[00:35.080 --> 00:36.840] Wow.
[00:36.840 --> 00:42.920] Every other Wednesday, we organize a special, what we call, Senior Day,
[00:42.920 --> 00:46.680] when you have the opportunity to do all the work you've been busy with
[00:46.680 --> 00:50.880] for the past two weeks, which just annoys you or you don't want to do it,
[00:50.880 --> 00:53.920] to your younger colleagues.
[00:53.920 --> 00:57.680] I'm sorry, I didn't hear you. Go on.
[00:57.680 --> 01:03.600] We also give shares of the company, cryptocurrencies worth 5,000 euros,
[01:03.600 --> 01:05.280] according to your choice.
[01:05.280 --> 01:09.440] Private pensions worth 50,000 dinars a month.
[01:09.440 --> 01:13.280] Life insurance is the biggest package, with a special policy
[01:13.280 --> 01:17.720] that includes the risk of cancer due to the growth of dandruff on the cheeks.
[01:17.720 --> 01:22.600] We have a house assistant.
[01:22.600 --> 01:26.760] A personal dog walker.
[01:26.760 --> 01:29.680] A massage on Monday, Wednesday, Friday.
[01:29.680 --> 01:33.640] Team Buildings, All Inclusive, Sony Pet,
[01:33.640 --> 01:38.200] cards for all shows, movies and concerts in the city.
[01:38.200 --> 01:41.000] Two electric trotinets.
[01:41.000 --> 01:43.280] And a three week flight.
[01:43.280 --> 01:49.200] Basically, you put your finger on the card and...
[01:49.200 --> 01:53.200] And?
[01:53.200 --> 01:56.200] A fit pass.
[01:56.200 --> 02:01.200] A fit pass? No one uses it alive.
[02:01.200 --> 02:05.200] You pay me a nice six pack and I look like Saša Kovačević.
[02:05.200 --> 02:08.200] I'm a senior, I don't have time to train. That's for the director.
[02:08.200 --> 02:12.200] I'm sorry, I'm deeply sorry. This must be an old document.
[02:12.200 --> 02:16.200] Just a moment. Here it is.
[02:16.200 --> 02:19.200] I have something that you'll surely like.
[02:19.200 --> 02:24.200] Don't worry, I was free to investigate your interests on social media
[02:24.200 --> 02:28.200] and I found out that you're a big fan of charity.
[02:28.200 --> 02:31.200] From the company, for your birthday,
[02:31.200 --> 02:35.200] three times a year, with a professional instructor,
[02:35.200 --> 02:39.200] you get a jump in the pool full of chocolate.
[02:39.200 --> 02:42.200] In the previous company, I jumped four times a month.
[02:42.200 --> 02:44.200] What is it? What do you offer me?
[02:44.200 --> 02:48.200] I'm sorry, I'm a senior, I just don't have time.
[02:48.200 --> 02:52.200] Just a second.
[02:52.200 --> 02:55.200] Hello?
[02:55.200 --> 02:59.200] We have a big tray, you can sit on it.
[02:59.200 --> 03:02.200] We can put the chocolate pool on it too.
[03:02.200 --> 03:05.200] Put it in the chocolate, it's fantastic.
[03:05.200 --> 03:07.200] It's a great opportunity.
[03:07.200 --> 03:09.200] We can bring the pool full of chocolate.
[03:09.200 --> 03:12.200] Yes, yes, it sounds good.
[03:12.200 --> 03:15.200] It's not a bad offer, yes.
[03:15.200 --> 03:17.200] I'll come to you to talk.
[03:17.200 --> 03:20.200] Just send the driver to me, I'll send you the address.
[03:20.200 --> 03:23.200] Please, just a second, stop.
[03:23.200 --> 03:25.200] Yes, yes, my hologram works instead of me.
[03:25.200 --> 03:27.200] I don't do anything.
[03:27.200 --> 03:31.200] There's no need for that building, I can drink myself.
[03:31.200 --> 03:35.200] Yes, yes, come on, come on.
[03:35.200 --> 03:39.200] Here you go, man.
[04:05.200 --> 04:10.200] Thank you for watching.

Serbian:

[00:03.800 --> 00:05.560] 8,000 euros, right?
[00:05.560 --> 00:07.960] Well, I hope we didn't bother you.
[00:07.960 --> 00:09.760] That's just for the first three months.
[00:09.760 --> 00:12.360] After that, there are increases.
[00:12.360 --> 00:17.040] Ten percent quarterly bonuses, annual, half a year, monthly.
[00:17.040 --> 00:21.080] And for seniors, we have special daily bonuses.
[00:21.080 --> 00:25.760] Namely, whoever shows up before 11 a.m. in the morning, gets 200 euros.
[00:25.760 --> 00:30.120] That's the same if you want to have a party after 5 a.m.
[00:30.120 --> 00:32.480] That's the basic part. What do you mean?
[00:32.480 --> 00:35.040] But seniors don't do petcom at all here.
[00:35.040 --> 00:36.760] Wow.
[00:36.760 --> 00:42.880] Every other Wednesday, we organize a special, what we call, Senior Day,
[00:42.880 --> 00:46.640] when you have the opportunity to do all the work that you were busy with
[00:46.640 --> 00:50.800] in the past two weeks, which simply annoys you or you don't want to do it,
[00:50.800 --> 00:53.840] you can call in a younger colleague.
[00:53.840 --> 00:57.600] I'm sorry, I didn't hear you. Go on.
[00:57.600 --> 01:03.520] We also give shares of the company, cryptocurrencies worth 5,000 euros,
[01:03.520 --> 01:05.200] according to your choice.
[01:05.200 --> 01:09.360] Private pensions worth 50,000 dinars a month.
[01:09.360 --> 01:13.200] Life insurance is the biggest package, with a special policy
[01:13.200 --> 01:18.800] that includes the risk of cancer due to the spread of the virus on the joints.
[01:18.800 --> 01:32.000] Home assistant, personal dog walker, massage on Monday, Wednesday, Friday,
[01:32.000 --> 01:41.200] team building, all inclusive, sony 5, cards for all shows, films and concerts in the city,
[01:41.200 --> 01:52.160] two electric trotinets, and a three week flight, basically you put your finger on the card, and...
[01:52.160 --> 01:53.520] And?
[01:53.520 --> 01:55.840] A fit pass.
[01:59.440 --> 02:01.840] A fit pass, no one uses it alive.
[02:01.840 --> 02:05.280] You pay me a nice six pack and I look like Saša Kovačević.
[02:05.280 --> 02:07.600] I'm a senior, I don't have time to train, that's for the director.
[02:07.600 --> 02:14.640] I'm sorry, I'm deeply sorry, this must be some old document, just a moment,
[02:14.640 --> 02:19.280] and here it is, I have something that you will surely like.
[02:19.280 --> 02:24.800] You won't mind, I was free to explore your interests on social networks
[02:24.800 --> 02:28.400] and I found out that you are a big fan of charity.
[02:28.400 --> 02:33.280] From the company for your birthday, so three times a year,
[02:33.280 --> 02:39.600] with a professional instructor, you get a jump into a basin full of chocolate.
[02:39.600 --> 02:42.720] In the previous company, I jumped four times a month.
[02:42.720 --> 02:44.160] What's wrong with you, what do you offer me?
[02:44.160 --> 02:49.280] You don't understand, I'm a senior, I just don't have time, so...
[02:49.280 --> 02:51.360] Sorry, just a second.
[02:53.120 --> 02:54.640] Hello?
[02:54.640 --> 02:59.520] No, we have a big tray, you sit on it,
[02:59.520 --> 03:03.360] and we can also transfer the basin full of chocolate into the chocolate,
[03:03.360 --> 03:06.560] that's great, fantastic, wonderful, wonderful opportunity,
[03:06.560 --> 03:09.040] and we can bring the basin full of chocolate.
[03:09.040 --> 03:11.760] Yes, yes, it sounds OK.
[03:11.760 --> 03:14.080] It sounds OK, yes.
[03:14.080 --> 03:15.840] It's not a bad offer, yes.
[03:15.840 --> 03:19.280] I'll come to you to talk, just send the driver to me,
[03:19.280 --> 03:20.560] I'll send you the address.
[03:20.560 --> 03:23.520] Please, just a second, wait.
[03:23.520 --> 03:28.160] Yes, yes, my hologram works instead of me, I don't do anything, yes.
[03:28.160 --> 03:32.240] There's no need for a team building, I can drink on my own, yes, yes.
[03:32.240 --> 03:34.240] Come on, come on.
[03:34.240 --> 03:55.200] Here you go, man.

There are differences, but they are rather minor. The Serbian output is perhaps better, and the original audio is also in the Serbian variety.
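To put a number on "minor", matching segments from the two runs can be diffed word by word. Below is a minimal sketch using Python's standard `difflib`; `word_diff` is a hypothetical helper, its ratio is difflib's similarity score (2 × matching words / total words) rather than a standard translation-quality metric, and pairing segments only makes sense where both runs cut the audio at the same timestamps, which is not the case everywhere above (the Serbian run merges several segments).

```python
import difflib
from typing import List, Tuple


def word_diff(a: str, b: str) -> Tuple[float, List[str]]:
    """Compare two translations word by word.

    Returns difflib's similarity ratio (2 * matching words / total words)
    and a readable list of the spans that differ.
    """
    words_a, words_b = a.split(), b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            changes.append(f"{tag}: {' '.join(words_a[i1:i2])!r} -> {' '.join(words_b[j1:j2])!r}")
    return matcher.ratio(), changes


# Segment [00:07.960 --> 00:09.760] from the Croatian and Serbian runs.
ratio, changes = word_diff(
    "It's only for the first three months.",
    "That's just for the first three months.",
)
print(f"{ratio:.2f}", changes)
```

For this pair the ratio is 10/14 ≈ 0.71, with a single replacement of "It's only" by "That's just".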

DedaDev · 2022-10-27T05:12:38Z

I like this proposal. Please merge those three for better performance!

samrussell · 2023-09-17T21:14:46Z

gxbag Sep 17, 2023
Author

That statement of yours is unwarrantedly bold.

This is not the place to discuss it in depth, but there is a simple metric for assessing the mutual intelligibility of languages, that is, for deciding where to draw language borders. Closely related, mutually intelligible languages such as Danish and Norwegian share 85% or more of their most commonly used words. Serbian, Croatian, and Bosnian share 100% of their most commonly used words. During the Yugoslav war crimes trials, the language was called BCS, and at universities you can still study Serbo-Croatian today (also called BCS, or BCMS to include the Montenegrin variety). At the trials, the parties at first pretended not to understand each other and had interpreters who "translated" by repeating exactly what had been said, until it became clear, amid cheers and laughter, that this was a mockery.
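The vocabulary-overlap idea can be approximated in a few lines. Below is a minimal sketch, assuming each corpus is already a list of normalized word tokens; `top_n_overlap` is a hypothetical helper. Raw frequency overlap is only a rough stand-in for lexicostatistics proper, which compares curated word lists such as the Swadesh list and judges cognates rather than identical spellings, so this sketch neither confirms nor refutes the 85% threshold stated above. Serbian text written in Cyrillic would also need transliteration to Latin script before comparison.

```python
from collections import Counter
from typing import Sequence


def top_n_overlap(tokens_a: Sequence[str], tokens_b: Sequence[str], n: int = 100) -> float:
    """Fraction of corpus A's n most frequent words that are also among corpus B's n most frequent words.

    Assumes both inputs are already tokenized and normalized (same case, same
    script); only identical spellings count as shared.
    """
    top_a = {word for word, _ in Counter(tokens_a).most_common(n)}
    top_b = {word for word, _ in Counter(tokens_b).most_common(n)}
    if not top_a:
        raise ValueError("tokens_a must not be empty")
    return len(top_a & top_b) / len(top_a)
```

The measure is asymmetric (it is normalized by corpus A's list), so comparing both directions is a reasonable sanity check.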

There is no need to discuss the specifics further. If German (Austria) and German (Germany) are not offered as separately selectable languages, the obvious choice is to not introduce separate selections for Bosnian, Croatian, and Serbian either.

German (Austria) will have words like Jänner and heuer which are not used in German (Germany).

Serbo-Croatian (Croatia) will have words like the month names which are not used in Serbia.

That does not justify a separation.

gxbag · 2023-10-30T07:00:01Z

Author

I've found that the language selection is not really important after all. I've selected Norwegian for a Japanese audio file, and it still translated it correctly.

Pictor13 Feb 17, 2025

German doesn't work, unfortunately.

Sabering1 · 2025-02-15T19:29:55Z

Any updates?
