Commit 718dca0

committed
per PM feedback
1 parent bb12bb3 commit 718dca0


1 file changed: +4 -4 lines changed

articles/ai-services/speech-service/concepts/audio-concepts.md

Lines changed: 4 additions & 4 deletions
@@ -19,10 +19,10 @@ The Speech service accepts and provides audio in multiple formats, and the area
Speech is inherently analog, which is approximated by converting it to a digital signal by sampling. The number of times it's sampled per second is the sampling rate, and how accurate each sample is defined by the bit-depth.

### Sample Rate
-How many audio samples there are per second. A higher sampling rate will more accurately reproduce higher frequencies such as music. Humans can typically hear between 20 Hz and 20 kHz but most sensitive up to 5 kHz. The sample rate needs to be twice the highest frequency so for human speech a 16-kHz sampling rate is normally adequate, but a higher sampling rate can provide a higher quality although larger files. The default for both STT and TTS is 16 kHz, however 48 kHz is recommended for audio books. Some source audio is in 8 kHz, especially when coming from legacy telecom systems, which will result in degraded results.
+How many audio samples there are per second. A higher sampling rate will more accurately reproduce higher frequencies such as music. Humans can typically hear between 20 Hz and 20 kHz but most sensitive up to 5 kHz. The sample rate needs to be twice the highest frequency so for human speech a 16 kHz sampling rate is normally adequate, but a higher sampling rate can provide a higher quality although larger files. The default for both STT and TTS is 16 kHz, however 48 kHz is recommended for audio books. Some source audio is in 8 kHz, especially when coming from legacy telecom systems, which will result in degraded results.

### Bit-depth
-Uncompressed audio samples are each represented by many bits that define its accuracy or resolution. For human speech 13 bits are needed, which is rounded up to a 16-bit sample. A higher bit-depth would be needed for professional audio or music. Legacy telephony systems often use 8 bits with compression, but it isn't ideal.
+Uncompressed audio samples are each represented by many bits that define its accuracy or resolution. For human speech 13 bits are needed, which is rounded up to a 16 bit sample. A higher bit-depth would be needed for professional audio or music. Legacy telephony systems often use 8 bits with compression, but it isn't ideal.

### Channels
The speech service typically expects and provides a mono stream. The behavior of stereo and multi-channel files is API specific, for example the REST STT will split a stereo file and generate a result for each channel. TTS is mono only.
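Reviewer-side illustration (not part of the commit): the Sample Rate, Bit-depth, and Channels paragraphs in this hunk imply some simple arithmetic for uncompressed PCM audio. The sketch below works it out in Python. The function names are hypothetical helpers chosen for this note, and it assumes the bit-depth is a whole number of bytes.

```python
def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


def pcm_bytes_per_second(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Data rate of uncompressed PCM audio.

    Assumes bit_depth is a multiple of 8 (for example 8 or 16).
    """
    if bit_depth % 8 != 0:
        raise ValueError("this sketch only handles whole-byte bit depths")
    return sample_rate_hz * (bit_depth // 8) * channels


# 16 kHz sampling captures content up to 8 kHz.
print(nyquist_limit_hz(16_000))          # 8000.0

# 16 kHz, 16-bit, mono: 32,000 bytes per second.
print(pcm_bytes_per_second(16_000, 16))  # 32000

# 48 kHz, 16-bit, mono: three times the data for the same duration.
print(pcm_bytes_per_second(48_000, 16))  # 96000
```

This matches the trade-off the text describes: a higher sampling rate can carry higher frequencies, at the cost of proportionally larger files.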
@@ -42,7 +42,7 @@ Lossy algorithms might enable greater compression resulting in smaller files or
MP3 was designed for music rather than speech.
AMR and AMR-WB were designed to efficiently compress speech for mobile phones, and won't work as well representing music or noise.

-A-Law and Mu-Law are older algorithms that compress each sample by itself, and converts a 16-bit sample to 8 bit using a logarithmic quantization technique. It should only be used to support legacy systems.
+A-Law and Mu-Law are older algorithms that compress each sample by itself, and converts a 16 bit sample to 8 bit using a logarithmic quantization technique. It should only be used to support legacy systems.
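Reviewer-side illustration (not part of the commit): the "logarithmic quantization" mentioned in the line above can be sketched with the textbook continuous Mu-Law curve, using mu = 255. This is an assumption-laden approximation, not the bit-exact segmented encoding that telephony codecs use, and the function names are hypothetical.

```python
import math

MU = 255  # companding constant commonly used for Mu-Law


def mulaw_encode(sample: int) -> int:
    """Map one signed 16-bit PCM sample to an 8-bit code in 0..255."""
    # Normalize to the range [-1.0, 1.0).
    x = max(-32768, min(32767, sample)) / 32768.0
    # Logarithmic compression: small amplitudes get finer resolution.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    # Quantize the compressed value to 8 bits.
    return int(round((y + 1) / 2 * 255))


def mulaw_decode(code: int) -> int:
    """Approximate inverse of mulaw_encode, returning a 16-bit sample."""
    y = code / 255 * 2 - 1
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return int(round(x * 32767))


for s in (0, 100, 1000, 20000):
    code = mulaw_encode(s)
    print(s, code, mulaw_decode(code))
```

Each sample is processed independently, as the text says, and the round trip is lossy: quiet samples come back nearly unchanged, while loud samples land on much coarser steps.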

### Lossless compressed audio

@@ -51,4 +51,4 @@ Lossless compression allows you to recreate the original uncompressed file. The
The most common lossless compression is FLAC.

## Next steps
-[Use the Speech SDK for audio processing](audio-processing-speech-sdk.md)
+[Use the Speech SDK for audio processing](../audio-processing-speech-sdk.md)
