Replies: 2 comments
You cannot do this in Whisper itself. However, you can post-process its output with a tool such as spaCy to identify sentence breaks, named entities, and so on. For example, it is possible to post-process a word-level timestamped .json file to create an .srt that follows sentence boundaries and breaks lines at grammatically chosen points.
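A minimal sketch of that kind of post-processing, using only the Python standard library. It assumes the word-level output has already been flattened into a list of dicts with `word`, `start`, and `end` keys; the names `srt_time`, `words_to_srt`, and `example_words` are hypothetical. The sentence test here is a naive check for terminal punctuation, standing in for a real segmenter such as spaCy's, so it will break on abbreviations.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def naive_sentence_end(word: str) -> bool:
    """Placeholder boundary test; swap in a proper segmenter here."""
    return word.rstrip().endswith((".", "?", "!"))


def words_to_srt(words, is_sentence_end=naive_sentence_end) -> str:
    """Group timestamped words into one SRT cue per sentence."""
    cues, current = [], []
    for w in words:
        current.append(w)
        if is_sentence_end(w["word"]):
            cues.append(current)
            current = []
    if current:  # trailing words with no closing punctuation
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w["word"].strip() for w in cue)
        start, end = srt_time(cue[0]["start"]), srt_time(cue[-1]["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + "\n"


# Assumed input shape (illustrative data, not real Whisper output).
example_words = [
    {"word": " Hello", "start": 0.0, "end": 0.4},
    {"word": " there.", "start": 0.4, "end": 0.9},
    {"word": " How", "start": 1.2, "end": 1.4},
    {"word": " are", "start": 1.4, "end": 1.6},
    {"word": " you?", "start": 1.6, "end": 2.05},
]

print(words_to_srt(example_words))
```

Each cue takes its start time from the first word of the sentence and its end time from the last, so the subtitle timing follows the sentence instead of Whisper's own segment boundaries. Passing a different `is_sentence_end` callable is where a smarter boundary detector would plug in.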
Thanks! My current approach:
Hello,
Does Whisper have any explicit or implicit understanding of NER?
Should it be able to label named entities?
I am asking because one of the issues I am finding is that, if you split sentences on ". ", punctuated named entities such as "U.S. " or "John W. Doe" pose a problem.
Of course, this issue can be addressed a posteriori, but one could save considerable resources if we had them annotated a priori.
Best,
Ed
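To illustrate the a posteriori fix mentioned in the question, here is a rough sketch of a splitter that does not break on punctuated names. It is a hand-rolled heuristic, not something Whisper provides: the `ABBREVIATIONS` set and the `split_sentences` function are hypothetical and would need extending for real transcripts. A known limitation is that a sentence that actually ends in a listed abbreviation (for example "...in the U.S.") is not split there.

```python
import re

# Assumed, deliberately small list of abbreviations that should not end a sentence.
ABBREVIATIONS = {"u.s.", "u.k.", "mr.", "mrs.", "ms.", "dr.", "prof.", "e.g.", "i.e."}

# A single capital letter followed by a period, e.g. the "W." in "John W. Doe".
INITIAL = re.compile(r"^[A-Z]\.$")


def split_sentences(text: str) -> list[str]:
    """Split on terminal punctuation, skipping abbreviations and initials."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if not token.endswith((".", "?", "!")):
            continue
        if token.lower() in ABBREVIATIONS or INITIAL.match(token):
            continue  # punctuated name or abbreviation, not a boundary
        sentences.append(" ".join(current))
        current = []
    if current:  # leftover text without closing punctuation
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("John W. Doe moved to the U.S. in 1999. He stayed."))
```

A statistical or rule-based segmenter from an NLP library would likely be more robust than this list-based check, at the cost of an extra dependency and processing pass.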