@jonathan343
Contributor

This PR adds a new client for the Amazon Transcribe Streaming Service.

Summary

Add support for the Amazon Transcribe Streaming service by adding the latest Smithy model and code-generating a new client package (aws-sdk-transcribe-streaming).

Changes

Testing

  • Change into the transcribe directory: cd clients/aws-sdk-transcribe-streaming
  • Create a virtual environment: uv venv -p 3.14
  • Activate the virtual environment: source .venv/bin/activate
  • Install the transcribe client and sounddevice: uv pip install . sounddevice
  • Copy AWS creds to my environment (I'm using the EnvironmentCredentialsResolver)
  • Run the following example:
import asyncio
import sounddevice

from smithy_aws_core.identity import EnvironmentCredentialsResolver
from smithy_core.aio.interfaces.eventstream import EventPublisher, EventReceiver
from aws_sdk_transcribe_streaming.client import (
    TranscribeStreamingClient,
    StartStreamTranscriptionInput,
)
from aws_sdk_transcribe_streaming.models import (
    AudioStreamAudioEvent,
    AudioEvent,
    TranscriptEvent,
    AudioStream,
    TranscriptResultStream,
)
from aws_sdk_transcribe_streaming.config import Config

AWS_REGION = "us-west-2"
ENDPOINT_URI = f"https://transcribestreaming.{AWS_REGION}.amazonaws.com"


async def mic_stream():
    # This function wraps the raw input stream from the microphone forwarding
    # the blocks to an asyncio.Queue.
    loop = asyncio.get_running_loop()
    input_queue = asyncio.Queue()

    def callback(indata, frame_count, time_info, status):
        loop.call_soon_threadsafe(input_queue.put_nowait, (bytes(indata), status))

    # Be sure to use the correct parameters for the audio stream that matches
    # the audio formats described for the source language you'll be using:
    # https://docs.aws.amazon.com/transcribe/latest/dg/streaming.html
    stream = sounddevice.RawInputStream(
        channels=1,
        samplerate=16000,
        callback=callback,
        blocksize=1024 * 2,
        dtype="int16",
    )
    # Initiate the audio stream and asynchronously yield the audio chunks
    # as they become available.
    with stream:
        while True:
            indata, status = await input_queue.get()
            yield indata, status


class TranscriptResultStreamHandler:
    def __init__(self, stream: EventReceiver[TranscriptResultStream]):
        self.stream = stream

    async def handle_events(self):
        """Process generic incoming events from Amazon Transcribe
        and delegate to appropriate sub-handlers.
        """
        async for event in self.stream:
            if isinstance(event.value, TranscriptEvent):
                await self.handle_transcript_event(event.value)

    async def handle_transcript_event(self, event: TranscriptEvent):
        # This handler can be implemented to handle transcriptions as needed.
        # Here's an example to get started.
        results = event.transcript.results
        for result in results:
            for alt in result.alternatives:
                print(alt.transcript)


async def write_chunks(audio_stream: EventPublisher[AudioStream]):
    # This connects the raw audio chunks generator coming from the microphone
    # and passes them along to the transcription stream.
    async for chunk, _ in mic_stream():
        await audio_stream.send(
            AudioStreamAudioEvent(value=AudioEvent(audio_chunk=chunk))
        )


async def main():
    client = TranscribeStreamingClient(
        config=Config(
            endpoint_uri=ENDPOINT_URI,
            region=AWS_REGION,
            aws_credentials_identity_resolver=EnvironmentCredentialsResolver(),
        )
    )

    stream = await client.start_stream_transcription(
        input=StartStreamTranscriptionInput(
            language_code="en-US", media_sample_rate_hertz=16000, media_encoding="pcm"
        )
    )
    _, output_stream = await stream.await_output()

    handler = TranscriptResultStreamHandler(output_stream)
    print("Start talking to see transcription!")
    print("===================================")
    await asyncio.gather(write_chunks(stream.input_stream), handler.handle_events())
    await stream.close()


if __name__ == "__main__":
    asyncio.run(main())
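
The bridge that mic_stream() builds between the audio callback thread and the event loop can be tried without a microphone. The following is a minimal sketch of that pattern only, not part of the PR: `fake_mic_stream`, `collect`, and the `None` end-of-input sentinel are hypothetical names introduced for illustration, and a plain worker thread stands in for the sounddevice callback thread.

```python
import asyncio
import threading


async def fake_mic_stream(chunks):
    # Hypothetical stand-in for mic_stream(): a worker thread plays the role
    # of the sounddevice callback thread.
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    def producer():
        for chunk in chunks:
            # asyncio.Queue is not thread-safe, so schedule the put on the loop.
            loop.call_soon_threadsafe(queue.put_nowait, chunk)
        # None is a sentinel (an assumption of this sketch) marking end of input.
        loop.call_soon_threadsafe(queue.put_nowait, None)

    threading.Thread(target=producer, daemon=True).start()
    while (chunk := await queue.get()) is not None:
        yield chunk


async def collect(chunks):
    # Drain the async generator into a list so the result is easy to inspect.
    return [chunk async for chunk in fake_mic_stream(chunks)]


received = asyncio.run(collect([b"a", b"b", b"c"]))
print(received)
```

Unlike the microphone generator above, this sketch terminates, because the sentinel ends the loop; the PR example keeps yielding until it is interrupted.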

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@jonathan343 requested a review from a team as a code owner on November 3, 2025 21:15
Contributor

Was this intentional? I wouldn't expect this to be a package.

Contributor Author

This is a bug in the code generator. It will be addressed in a separate PR! I'll update this PR to remove all __init__.py files under the docs directory once the fix is up.
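
Until the generator fix lands, the stray files can be listed with a short script. This is a minimal sketch under stated assumptions: `find_stray_init_files` is a hypothetical helper, and the directory layout is invented to exercise it; the real docs tree in the generated client may differ.

```python
import tempfile
from pathlib import Path


def find_stray_init_files(docs_dir: Path) -> list[Path]:
    # Return every __init__.py below docs_dir, sorted for stable output.
    return sorted(docs_dir.rglob("__init__.py"))


# Build a throwaway tree to exercise the helper; the layout is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    (docs / "client").mkdir(parents=True)
    (docs / "__init__.py").touch()
    (docs / "client" / "__init__.py").touch()
    (docs / "index.md").touch()
    stray = [p.relative_to(docs).as_posix() for p in find_stray_init_files(docs)]

print(stray)
```

Pointing the helper at the client's real docs directory would list the files to delete once the fix is up.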

"""
Amazon Transcribe streaming offers four main types of real-time transcription: **Standard**, **Medical**, **Call Analytics**, and **Health Scribe**.
* **Standard transcriptions** are the most common option. Refer to for details.
Contributor

Refer to for details.

Refer to what? Is this a source docs issue or are we missing something in the generator?

Contributor Author

This isn't an issue with the code generator. The issue is in the service model documentation:

https://github.com/aws/api-models-aws/blob/main/models/transcribe-streaming/service/2017-10-26/transcribe-streaming-2017-10-26.json#L3717

* **Call Analytics transcriptions** are designed for use with call center audio on two different channels; if you're looking for insight into customer service calls, use this option. Refer to for details.
* **HealthScribe transcriptions** are designed to automatically create clinical notes from patient-clinician conversations using generative AI. Refer to [here] for details.
Contributor

Refer to [here] for details.

Hmm, this isn't great either. It looks like this is an issue in the upstream docs. We should make sure the service team is aware.

Contributor Author

Yea, same as I mentioned above. I'll reach out to them. Thanks for catching this!
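
Both broken references in this thread follow the same shape, so they could be flagged mechanically when reporting to the service team. A minimal sketch, assuming the only patterns of interest are "Refer to for details" and "Refer to [text] for details" with no link target; `DANGLING_REFERENCE` and `find_dangling_references` are hypothetical names, and the third sample line is invented as a healthy control.

```python
import re

# Matches "Refer to for details" (link text missing entirely) and
# "Refer to [here] for details" (bracketed text with no link target).
DANGLING_REFERENCE = re.compile(r"Refer to (?:\[[^\]]*\](?!\()\s*)?for details")


def find_dangling_references(doc: str) -> list[str]:
    # Return the stripped lines that contain a reference with no target.
    return [
        line.strip()
        for line in doc.splitlines()
        if DANGLING_REFERENCE.search(line)
    ]


sample = "\n".join(
    [
        "* **Standard transcriptions** are the most common option. Refer to for details.",
        "* **HealthScribe transcriptions** create clinical notes. Refer to [here] for details.",
        "* Hypothetical healthy line. Refer to [the guide](https://example.com) for details.",
    ]
)
dangling = find_dangling_references(sample)
print(dangling)
```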

:param plugins: A list of callables that modify the configuration dynamically.
Changes made by these plugins only apply for the duration of the operation
execution and will not affect any other operation invocations.
Contributor

This looks like an issue in the docstring generator for operations? We're getting an unnecessary newline here but not in the top level class docstring.

Contributor Author

I'll check to see if these doc issues go away after https://github.com/smithy-lang/smithy-python/pull/592/files

* ``sample-rate``
For more information on streaming with Amazon Transcribe, see `Transcribing streaming audio <https://docs.aws.amazon.com/transcribe/latest/dg/streaming.html>`_
.
Contributor

This newline also seems potentially wrong?
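
A line that holds nothing but a period, as in the snippet above, is easy to detect in generated docstrings. The following is a minimal sketch of such a check, not the generator's own logic: `find_orphan_punctuation` is a hypothetical helper, and treating `.,;:` as the only closing punctuation is an assumption.

```python
def find_orphan_punctuation(docstring: str) -> list[int]:
    # Return 1-based line numbers whose only content is closing punctuation.
    return [
        number
        for number, line in enumerate(docstring.splitlines(), start=1)
        if line.strip() and all(ch in ".,;:" for ch in line.strip())
    ]


sample = (
    "* ``sample-rate``\n"
    "For more information on streaming with Amazon Transcribe, see "
    "`Transcribing streaming audio "
    "<https://docs.aws.amazon.com/transcribe/latest/dg/streaming.html>`_\n"
    ".\n"
)
orphans = find_orphan_punctuation(sample)
print(orphans)
```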


@dataclass(init=False)
class Config:
    """Configuration for Transcribe."""
Contributor

What'd we use to generate this? It's strange we've named the client correctly (TranscribeStreamingClient) but this is Transcribe which would be a different client. It seems like we potentially have a mismatch or Transcribe themselves may not be making the distinction we expect. Is this the signing name?

Contributor Author

Yea, this is using the name of the service type in the smithy model. The bedrock Config object also has a weird docstring.

I just put up a PR to fix this. The description has the before/after docstrings for bedrock runtime and transcribe streaming.
smithy-lang/smithy-python#595

Comment on lines +19 to +32
class RequestTestHTTPClient:
    """An asynchronous HTTP client solely for testing purposes."""

    def __init__(self, *, client_config: HTTPClientConfiguration | None = None):
        self._client_config = client_config

    async def send(
        self,
        request: HTTPRequest,
        *,
        request_config: HTTPRequestConfiguration | None = None,
    ) -> HTTPResponse:
        # Raise the exception with the request object to bypass actual request handling
        raise TestHttpServiceError(request)
Contributor

Did we generate this? If not, we should be using the fixtures from @alexgromero's PR.

@alexgromero Nov 3, 2025

Yeah this is generated. I asked Jordon about this file and whether it was intentional to generate this in codegen/protocol-test and in the generated clients. He mentioned "It's intentional in that any service can have protocol tests. But the actual file could be omitted if there’s no trait"
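
For readers unfamiliar with the generated test client, the idea behind it can be shown in isolation: the client raises an exception that carries the request, so a test can inspect what would have been sent without any network call. This is a minimal sketch of that pattern with stand-in types; `FakeRequest`, `CapturedRequestError`, `CapturingHTTPClient`, `capture`, and the `/example` path are all hypothetical and are not the generated classes.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeRequest:
    # Hypothetical stand-in for the HTTPRequest type used by the generated client.
    method: str
    path: str


class CapturedRequestError(Exception):
    # Plays the role of TestHttpServiceError: it carries the request out.
    def __init__(self, request: FakeRequest):
        super().__init__("request captured before sending")
        self.request = request


class CapturingHTTPClient:
    async def send(self, request: FakeRequest) -> None:
        # Never touches the network; surfaces the request to the caller instead.
        raise CapturedRequestError(request)


async def capture(client: CapturingHTTPClient, request: FakeRequest) -> FakeRequest:
    # Send the request and recover it from the exception for inspection.
    try:
        await client.send(request)
    except CapturedRequestError as err:
        return err.request
    raise AssertionError("send() should always raise")


captured = asyncio.run(capture(CapturingHTTPClient(), FakeRequest("POST", "/example")))
print(captured)
```

A protocol test would then assert on the captured request's method, path, headers, and body rather than on a response.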
