Replies: 4 comments
-
Small correction if we are talking about the complete audio duration.
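The snippet that accompanied this reply did not survive, so here is a minimal sketch of what such a correction could look like. It assumes `transcription["segments"]` is a list of dicts with `start` and `end` in seconds, as in the question's snippet; the helper name `total_duration` is hypothetical. It measures from the first segment's start to the last segment's end rather than just the first segment.

```python
def total_duration(segments):
    """Return seconds from the first segment's start to the last segment's end."""
    if not segments:
        return 0.0
    return segments[-1]["end"] - segments[0]["start"]


# Hypothetical segment list shaped like transcription["segments"]
example = [
    {"start": 0.0, "end": 7.0},
    {"start": 7.0, "end": 13.0},
]
print(total_duration(example))
```

This only covers the span that was transcribed, so it may still differ from the real length of the file.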
-
In practice, the Whisper segments do not seem to exactly match the actual audio duration. A different option would be to use ffmpeg directly for this purpose.
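The example from this reply is missing, so the following is only a sketch of one way to do it. It uses `ffprobe`, the probing tool that ships with ffmpeg, and assumes it is available on `PATH`. The helper names `build_ffprobe_command`, `parse_duration`, and `audio_duration` are hypothetical.

```python
import subprocess


def build_ffprobe_command(path):
    """Build an ffprobe call that prints only the container duration in seconds."""
    return [
        "ffprobe",
        "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def parse_duration(output):
    """Convert ffprobe's text output (a number followed by a newline) to a float."""
    return float(output.strip())


def audio_duration(path):
    """Run ffprobe on `path` and return its duration in seconds."""
    result = subprocess.run(
        build_ffprobe_command(path),
        capture_output=True,
        text=True,
        check=True,  # raise if ffprobe exits with an error
    )
    return parse_duration(result.stdout)
```

Calling `audio_duration("audio.wav")` (a hypothetical file name) would then return the length reported by the container, independent of anything Whisper produced.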
-
The Lines 22 to 49 in f680570 But this will of course be slower than a more direct method like what @glangford suggested.
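The code referenced in this reply is not visible here, so what follows is an assumption about the general idea rather than a reconstruction: decode the audio into samples and divide the sample count by the sample rate. Whisper's `whisper.load_audio` returns a mono waveform resampled to 16 kHz, which is where the default below comes from. The helper name `duration_from_samples` is hypothetical.

```python
WHISPER_SAMPLE_RATE = 16000  # Whisper decodes audio to 16 kHz mono


def duration_from_samples(num_samples, sample_rate=WHISPER_SAMPLE_RATE):
    """Duration in seconds of a waveform with `num_samples` samples."""
    return num_samples / sample_rate


# With openai-whisper installed, the sample count could come from:
#   import whisper
#   audio = whisper.load_audio("audio.wav")  # hypothetical file name
#   print(duration_from_samples(len(audio)))
print(duration_from_samples(208000))
```

Decoding the whole file just to count samples is the extra cost that makes this slower than asking ffmpeg for the duration.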
-
In fact, whisper has produced the right duration details in Whisper shows the following details:
# sample: 13 seconds
[00:00:00,000 --> 00:00:07,000] 项目的物业 富华行物业 管理长安俱乐部
[00:00:07,000 --> 00:00:37,000] 长安俱乐部是十大俱乐部之首 物业费是24块钱一瓶
# sample: 10 seconds
[00:00.000 --> 00:02.760] 项目配有2000坪的会所
[00:02.760 --> 00:06.160] 里面游泳、健身、疗养与一体
[00:06.160 --> 00:34.160] 还有,还设有会客厅、思想、宴厅等
# sample: 21 seconds
[00:00.000 --> 00:04.100] 项目200米范围内配套有王府井三圈
[00:04.100 --> 00:05.900] 在北京独一无二
[00:05.900 --> 00:08.600] 王府中环APM
[00:08.600 --> 00:10.300] 东营IM88
[00:10.300 --> 00:11.800] 东方新天地
[00:11.800 --> 00:14.000] 王府井百货大楼
[00:14.000 --> 00:15.600] 金宝会
[00:15.600 --> 00:18.400] 哈姆雷斯玩具店
[00:18.400 --> 00:20.800] 汇聚一线高端品牌
[00:20.800 --> 00:30.800] 可足以满足日常的购物休闲等生活
As you can see, the end time of the last segment is wrong in every sample... How does the VAD function work?
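One possible workaround, which nobody in the thread proposed and which does not explain the underlying behaviour, is to cap segment times at a duration obtained some other way. The sketch below assumes segments are dicts with `start` and `end` in seconds and that `audio_duration` is already known; the helper name `clamp_segments` is hypothetical.

```python
def clamp_segments(segments, audio_duration):
    """Return copies of `segments` with times capped at `audio_duration`."""
    clamped = []
    for seg in segments:
        fixed = dict(seg)  # copy so the caller's segments are not modified
        fixed["start"] = min(seg["start"], audio_duration)
        fixed["end"] = min(seg["end"], audio_duration)
        clamped.append(fixed)
    return clamped


# Values taken from the 13-second sample above
segments = [
    {"start": 0.0, "end": 7.0},
    {"start": 7.0, "end": 37.0},
]
print(clamp_segments(segments, 13.0)[-1]["end"])
```

This keeps the reported timestamps inside the file but says nothing about where the speech in the last segment really ends.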
-
Hi,
I am trying to check whether I can get the audio file duration from any of Whisper's methods.
This seems closer to audio duration:
transcription["segments"][0]["end"]-transcription["segments"][0]["start"]
What would be the recommended way to do it?