Replies: 1 comment
-
Regarding cases 4 and 6: I think "transcription" means you should write down whatever you hear. If you heard "Avenue", then the transcript shouldn't be "Ave."
-
I realize that high-quality ground-truth transcriptions are critical for fine-tuning Whisper models. A crucial question arises: should manual English transcriptions follow normalized text? To address this, I have broken it down into the specific aspects below. Comments and suggestions on each of them would be greatly appreciated!
1. Regarding punctuation, such as commas, periods, and hyphens: should it be retained in the transcriptions?
2. In terms of capitalization, should the case mirror the original written form? This includes capitalizing words at the start of sentences and proper nouns.
3. When transcribing numbers, should they be spelled out in words, like "twenty-two", or written as digits, as in "22"? Similarly, should an expression like "nine one one" be transcribed as spoken or as "911"?
4. For abbreviations, should the full form be written out, for instance "Avenue" instead of "Ave." when the speaker mentions an address?
5. For acronyms written as a single word but pronounced as a sequence of individual letters, should the transcription use the written form ("UPS") or the spoken letters ("U P S")?
6. If a speaker breaks off a word midway, should the transcription include the fragment of the incomplete word?
7. Should the transcriptions account for ambient noise, crosstalk, or other unintelligible sounds in the recording, and label them, e.g. as [noise]?
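To make the question more concrete, here is a minimal sketch of what I mean by "normalized text". `simple_normalize` is a hypothetical helper written only for illustration; it is not Whisper's own normalizer and it is deliberately crude (it would also split contractions such as "don't"). It suggests that punctuation, capitalization, and bracketed noise tags can be made irrelevant by a simple normalizer, while numerals and abbreviations stay different unless the normalizer handles them explicitly.

```python
import re
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def simple_normalize(text: str) -> str:
    """Toy normalizer (an assumption for illustration, not Whisper's)."""
    text = text.lower()
    # Drop bracketed annotations such as [noise].
    text = re.sub(r"\[[^\]]*\]", " ", text)
    # Replace punctuation with spaces.
    text = text.translate(_PUNCT_TO_SPACE)
    # Collapse runs of whitespace.
    return " ".join(text.split())


verbatim = "Call nine one one, [noise] then go to Fifth Avenue."
plain = "call nine one one then go to fifth avenue"

# Punctuation, case, and the noise tag disappear after normalization.
print(simple_normalize(verbatim) == simple_normalize(plain))

# Numerals and abbreviations do not: these pairs remain different.
print(simple_normalize("911") == simple_normalize("nine one one"))
print(simple_normalize("Fifth Ave.") == simple_normalize("Fifth Avenue"))
```

If the training and evaluation pipeline applies a normalizer along these lines, the punctuation and capitalization choices would mostly affect what the fine-tuned model outputs rather than the measured error rate, whereas the numeral and abbreviation choices would affect both.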