articles/ai-services/speech-service/concepts/audio-concepts.md
4 additions & 4 deletions
@@ -19,10 +19,10 @@ The Speech service accepts and provides audio in multiple formats, and the area
Speech is inherently analog, which is approximated by converting it to a digital signal by sampling. The number of times it's sampled per second is the sampling rate, and how accurate each sample is defined by the bit-depth.

### Sample Rate
- How many audio samples there are per second. A higher sampling rate will more accurately reproduce higher frequencies such as music. Humans can typically hear between 20 Hz and 20 kHz but most sensitive up to 5 kHz. The sample rate needs to be twice the highest frequency so for human speech a 16-kHz sampling rate is normally adequate, but a higher sampling rate can provide a higher quality although larger files. The default for both STT and TTS is 16 kHz, however 48 kHz is recommended for audio books. Some source audio is in 8 kHz, especially when coming from legacy telecom systems, which will result in degraded results.
+ How many audio samples there are per second. A higher sampling rate will more accurately reproduce higher frequencies such as music. Humans can typically hear between 20 Hz and 20 kHz but most sensitive up to 5 kHz. The sample rate needs to be twice the highest frequency so for human speech a 16kHz sampling rate is normally adequate, but a higher sampling rate can provide a higher quality although larger files. The default for both STT and TTS is 16 kHz, however 48 kHz is recommended for audio books. Some source audio is in 8 kHz, especially when coming from legacy telecom systems, which will result in degraded results.
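
The "twice the highest frequency" rule in the paragraph above can be checked with a little arithmetic. The following is a minimal sketch; the function names are hypothetical and are not part of the Speech service or SDK.

```python
def nyquist_frequency_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2.0


def min_sample_rate_hz(highest_frequency_hz: float) -> float:
    """Lowest sample rate that can represent the given frequency (twice it)."""
    return 2.0 * highest_frequency_hz


# Rates mentioned in the text: legacy telephony, the STT/TTS default, audio books.
for rate in (8_000, 16_000, 48_000):
    limit = nyquist_frequency_hz(rate)
    print(f"{rate} Hz sampling represents frequencies up to {limit:.0f} Hz")
```

Under this rule, 16 kHz sampling covers content up to 8 kHz, which includes the 5 kHz range the text describes as most sensitive, while 8 kHz sampling stops at 4 kHz.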

### Bit-depth
- Uncompressed audio samples are each represented by many bits that define its accuracy or resolution. For human speech 13 bits are needed, which is rounded up to a 16-bit sample. A higher bit-depth would be needed for professional audio or music. Legacy telephony systems often use 8 bits with compression, but it isn't ideal.
+ Uncompressed audio samples are each represented by many bits that define its accuracy or resolution. For human speech 13 bits are needed, which is rounded up to a 16bit sample. A higher bit-depth would be needed for professional audio or music. Legacy telephony systems often use 8 bits with compression, but it isn't ideal.
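
To make the bit-depth trade-off concrete, the sketch below computes how many distinct values a sample can take, the theoretical dynamic range of linear PCM (about 6 dB per bit), and the resulting uncompressed size. The helper names are hypothetical, and the size figure assumes raw PCM with no container header.

```python
import math


def quantization_levels(bit_depth: int) -> int:
    """Number of distinct values a sample of this bit-depth can take."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of linear PCM at this bit-depth."""
    return 20 * math.log10(2 ** bit_depth)


def pcm_bytes(sample_rate_hz: int, bit_depth: int, channels: int, seconds: int) -> int:
    """Size of raw uncompressed PCM audio (assumes no file header)."""
    return sample_rate_hz * bit_depth * channels * seconds // 8


for bits in (8, 13, 16):
    print(bits, quantization_levels(bits), round(dynamic_range_db(bits), 1))

# One minute of mono audio at the 16 kHz / 16-bit default.
print(pcm_bytes(16_000, 16, 1, 60))  # 1920000
```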

### Channels
The speech service typically expects and provides a mono stream. The behavior of stereo and multi-channel files is API specific, for example the REST STT will split a stereo file and generate a result for each channel. TTS is mono only.
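
As an illustration of what splitting a stereo file into per-channel mono streams involves, here is a minimal sketch. It assumes 16-bit PCM samples interleaved as left, right, left, right (the layout used by WAV PCM); `split_stereo` is a hypothetical helper and does not describe how the REST STT API is implemented.

```python
from array import array


def split_stereo(interleaved: array) -> tuple[array, array]:
    """Split interleaved left/right 16-bit samples into two mono arrays."""
    if len(interleaved) % 2 != 0:
        raise ValueError("stereo data must contain an even number of samples")
    left = interleaved[0::2]   # samples at even positions
    right = interleaved[1::2]  # samples at odd positions
    return left, right


# Three stereo frames: (1, -1), (2, -2), (3, -3)
stereo = array("h", [1, -1, 2, -2, 3, -3])
left, right = split_stereo(stereo)
print(left.tolist(), right.tolist())  # [1, 2, 3] [-1, -2, -3]
```

Each resulting array could then be treated as its own mono stream, which matches the one-result-per-channel behavior described above.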
@@ -42,7 +42,7 @@ Lossy algorithms might enable greater compression resulting in smaller files or
MP3 was designed for music rather than speech.
AMR and AMR-WB were designed to efficiently compress speech for mobile phones, and won't work as well representing music or noise.

- A-Law and Mu-Law are older algorithms that compress each sample by itself, and converts a 16-bit sample to 8 bit using a logarithmic quantization technique. It should only be used to support legacy systems.
+ A-Law and Mu-Law are older algorithms that compress each sample by itself, and converts a 16bit sample to 8 bit using a logarithmic quantization technique. It should only be used to support legacy systems.
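
The logarithmic quantization mentioned above can be sketched with the continuous textbook mu-law formula (mu = 255). This is an illustration only: real telephony codecs use the segmented encoding and bit layout defined in ITU-T G.711, which this sketch does not reproduce, and the function names are hypothetical.

```python
import math

MU = 255  # companding constant for mu-law


def mu_law_encode(sample: int) -> int:
    """Map a signed 16-bit sample to an 8-bit code in the range 0..255."""
    x = max(-1.0, min(1.0, sample / 32768.0))
    # Logarithmic compression: quiet samples get finer steps than loud ones.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255))


def mu_law_decode(code: int) -> int:
    """Approximately invert mu_law_encode; the round trip is lossy."""
    y = code / 255 * 2.0 - 1.0
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return int(round(x * 32767))


for sample in (100, 20_000):
    restored = mu_law_decode(mu_law_encode(sample))
    print(sample, restored, abs(restored - sample))
```

The round-trip error is small for quiet samples and larger for loud ones, which is the intended effect of spending the 8 bits logarithmically rather than linearly.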

### Lossless compressed audio
@@ -51,4 +51,4 @@ Lossless compression allows you to recreate the original uncompressed file. The
The most common lossless compression is FLAC.

## Next steps
- [Use the Speech SDK for audio processing](audio-processing-speech-sdk.md)
+ [Use the Speech SDK for audio processing](../audio-processing-speech-sdk.md)