Small Azure Functions HTTP endpoint for Azure Speech-to-Text Batch Transcription. Used as part of the video-annotator project.
Workflow:
- Provide a Blob SAS URL to an audio file (`media_url`)
- The function submits a batch job and returns a `job_url`
- Call again with `job_url` to get status; when `Succeeded`, it returns normalized transcript output.
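Because one endpoint serves both calls, the function dispatches on which key the JSON body carries. A minimal sketch of that dispatch logic (hypothetical helper name, not the actual code in `TranscribeHttp/__init__.py`):

```python
def pick_action(body: dict) -> str:
    """Decide which workflow branch a request body triggers.

    A body with "job_url" is a status/result poll; one with
    "media_url" submits a new batch transcription job.
    """
    if body.get("job_url"):
        return "status"   # poll the existing transcription job
    if body.get("media_url"):
        return "submit"   # create a new batch transcription job
    return "error"        # neither key present -> 400 response

# pick_action({"media_url": "https://..."}) -> "submit"
# pick_action({"job_url": "https://..."})   -> "status"
```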
- `TranscribeHttp/__init__.py` — HTTP function (submit + status/result)
- `function.json` — HTTP trigger bindings
- `shared/speech_batch.py` — Speech + Blob helpers (submit job, list files, normalize output)
- `make_sas.py` — generate a SAS URL for a blob (for testing)
- `test_transcribe.py` — local test driver
- `local.settings.json` — local dev settings (do not commit secrets)
- `requirements.txt`, `host.json`
- Python 3.11+
- Azure Functions Core Tools v4
- Azure Speech resource (`SPEECH_KEY`, region or endpoint)
- Azure Storage account with a container that holds your audio (defaults assume `speech-input`)
Install deps:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Local config is read from `local.settings.json`:
- `SPEECH_KEY`
- `SPEECH_REGION` (optional if `SPEECH_ENDPOINT` is set)
- `SPEECH_ENDPOINT` (e.g. `https://eastus.api.cognitive.microsoft.com/`)
- `SPEECH_API_VERSION` (e.g. `2025-10-15`)
- Storage vars if you’re using the helper scripts: `AZURE_STORAGE_ACCOUNT`, `AZURE_STORAGE_KEY`, `INPUT_CONTAINER` (default `speech-input`), `OUTPUT_CONTAINER` (default `speech-output`)
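For reference, a `local.settings.json` shape covering the variables above (values are placeholders, not real keys; `IsEncrypted`/`Values`/`FUNCTIONS_WORKER_RUNTIME` follow the standard Azure Functions local-settings layout):

```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "SPEECH_KEY": "<your-speech-key>",
    "SPEECH_REGION": "eastus",
    "SPEECH_ENDPOINT": "https://eastus.api.cognitive.microsoft.com/",
    "SPEECH_API_VERSION": "2025-10-15",
    "AZURE_STORAGE_ACCOUNT": "<account>",
    "AZURE_STORAGE_KEY": "<key>",
    "INPUT_CONTAINER": "speech-input",
    "OUTPUT_CONTAINER": "speech-output"
  }
}
```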
Start the Functions host:
```bash
func start
```

If you already uploaded an audio file to Blob Storage, run:

```bash
python make_sas.py
```

It prints a SAS URL you can use as `media_url`.
```bash
curl -sS -X POST "http://localhost:7071/api/TranscribeHttp" \
  -H "Content-Type: application/json" \
  -d '{
    "media_url": "https://<account>.blob.core.windows.net/<container>/<file>.m4a?<sas>",
    "locale": "en-US",
    "display_name": "measles_short"
  }' | jq
```

Response (202):

```json
{ "job_url": "https://.../speechtotext/transcriptions/<id>?api-version=..." }
```

To check status and fetch the result, call again with the `job_url`:

```bash
curl -sS -X POST "http://localhost:7071/api/TranscribeHttp" \
  -H "Content-Type: application/json" \
  -d '{ "job_url": "PASTE_JOB_URL_HERE" }' | jq
```

Possible responses:
- Running: `{ "status": "Running", "job_url": "..." }`
- Failed: `{ "status": "Failed", "job": { ... } }`
- Succeeded: `{ "status": "Succeeded", "result": { "utterances": [...], "words": [...] } }`

Notes:
- Batch jobs may sit in `Running` due to queueing; poll until `Succeeded`/`Failed`.
- Normalization returns `utterances` and (if present) `words` with timestamps in milliseconds.
- The shared submit payload uses `channels: [0]` to avoid duplicated results from stereo audio.
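Raw batch output reports offsets in ticks (100 ns units), so normalization divides by 10,000 to get milliseconds. A rough sketch of that conversion, assuming the usual batch-output field names (`recognizedPhrases`, `offsetInTicks`, `durationInTicks`, `nBest`) rather than the exact code in `shared/speech_batch.py`:

```python
TICKS_PER_MS = 10_000  # Azure Speech reports offsets in 100-ns ticks

def normalize(raw: dict) -> dict:
    """Flatten batch-transcription JSON into utterances with ms timestamps."""
    utterances = []
    for phrase in raw.get("recognizedPhrases", []):
        best = (phrase.get("nBest") or [{}])[0]  # take the top hypothesis
        utterances.append({
            "text": best.get("display", ""),
            "start_ms": phrase.get("offsetInTicks", 0) // TICKS_PER_MS,
            "duration_ms": phrase.get("durationInTicks", 0) // TICKS_PER_MS,
        })
    return {"utterances": utterances}

sample = {
    "recognizedPhrases": [
        {"offsetInTicks": 500_000, "durationInTicks": 12_300_000,
         "nBest": [{"display": "Hello world."}]}
    ]
}
print(normalize(sample))
# {'utterances': [{'text': 'Hello world.', 'start_ms': 50, 'duration_ms': 1230}]}
```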
Set the same configuration values as app settings in your Azure Function App, then publish:
```bash
func azure functionapp publish <APP_NAME> --python
```