[livekit-plugins-aws] No transcripts generated when End-to-End Encryption (E2EE) is enabled on the client #5231

@cldsime


Bug Description

When End-to-End Encryption (E2EE) is enabled on the LiveKit client, the server-side STT agent receives audio frames but produces no transcripts. The agent successfully connects to the room, subscribes to the audio track, and streams audio to Amazon Transcribe (status 200), yet no interim or final transcript events are ever returned. The audio frames reaching the agent appear to be encrypted, so Amazon Transcribe receives unintelligible data and cannot detect any speech. Disabling E2EE on the same setup immediately restores normal transcription.

Expected Behavior

When E2EE is enabled, the agent SDK should either provide a mechanism for server-side agents to participate in the E2EE key exchange and decrypt audio tracks before processing, or at minimum detect that incoming audio is encrypted and surface a clear warning or error rather than silently forwarding encrypted bytes to the STT service with no output.

Reproduction Steps

1. Start a local LiveKit server with livekit-server --dev.
2. Launch an STT agent that uses livekit-plugins-aws with python3 test_auto_lang.py dev.
3. Generate a room token and connect via https://meet.livekit.io/?tab=custom using ws://localhost:7880. Enable the E2EE toggle in the client settings before joining.
4. Once connected, speak into the microphone for at least 10–15 seconds. The agent logs show registered worker, Subscribed to audio track, and incrementing Processed X audio frames messages, confirming that audio is flowing, but no [Interim], [FINAL], or [Speech started] transcript events appear.
5. Disconnect, disable E2EE, reconnect with a new token, and speak again. Transcripts appear immediately, confirming that E2EE is the cause.
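
The symptom in these steps (audio frames keep arriving, but no transcript events) can be caught with a small watchdog in the agent script. The sketch below is standard-library Python only and is not part of livekit-agents or livekit-plugins-aws. The class name TranscriptWatchdog, its methods, and the 10-second threshold are all hypothetical; the agent would need to call on_audio_frame and on_transcript from its own frame loop and transcript handler.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TranscriptWatchdog:
    """Flags the 'frames flowing, no transcripts' symptom (hypothetical helper)."""

    timeout_s: float = 10.0                      # assumed threshold, tune as needed
    clock: Callable[[], float] = time.monotonic  # injectable so it can be tested
    frames_since_transcript: int = 0
    _window_start: Optional[float] = None

    def on_audio_frame(self) -> None:
        # Start timing at the first frame after startup or after the last transcript.
        if self._window_start is None:
            self._window_start = self.clock()
        self.frames_since_transcript += 1

    def on_transcript(self) -> None:
        # Any interim or final transcript resets the window.
        self.frames_since_transcript = 0
        self._window_start = None

    def check(self) -> Optional[str]:
        """Return a warning if audio has flowed for too long without a transcript."""
        if self._window_start is None:
            return None
        elapsed = self.clock() - self._window_start
        if elapsed < self.timeout_s:
            return None
        return (
            f"Processed {self.frames_since_transcript} audio frames in "
            f"{elapsed:.0f}s with no transcript events; "
            "check whether the track is E2EE-encrypted"
        )
```

Calling check() periodically and logging any returned message would turn the silent failure into a visible one. It cannot distinguish encrypted audio from a participant who is simply not speaking, so it is a debugging aid rather than a detector.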

Operating System

macOS Tahoe

Models Used

Amazon Transcribe

Package Versions

Package	Version
Python	3.14.2
livekit-server	1.9.11
livekit-agents	1.4.4
livekit-plugins-aws	1.4.4
livekit (rtc)	1.1.2
livekit-api	1.1.0
livekit-protocol	1.1.2
aws-sdk-transcribe-streaming	0.4.0
smithy-aws-core	0

Session/Room/Call IDs

No response

Proposed Solution

The LiveKit agent SDK should provide a mechanism for server-side agents to participate in the E2EE key exchange so they can decrypt audio tracks before forwarding them to external STT services like Amazon Transcribe. If full E2EE participation is not feasible due to architectural constraints, an alternative approach would be to support a "trusted agent" mode where the server provisions the shared encryption key to registered agents, allowing them to decrypt media server-side while maintaining E2EE between all other participants. At a minimum, the SDK should detect when incoming audio tracks are E2EE-encrypted and emit a clear warning log such as "Audio track is E2EE-encrypted; STT transcription will not work without decryption" rather than silently processing encrypted bytes that produce no output, which makes debugging extremely difficult.
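
To make the "at a minimum" option concrete, here is a rough sketch of the guard I have in mind. It is standard-library Python only and does not call any real LiveKit API. The EncryptionType enum is a hypothetical stand-in for however the SDK reports a track's encryption state, and should_forward_to_stt, its parameters, and the logger name are names I made up for illustration.

```python
import logging
from enum import Enum

logger = logging.getLogger("stt-agent")  # hypothetical logger name


class EncryptionType(Enum):
    """Stand-in for the SDK's view of a track's encryption state (assumed)."""

    NONE = "none"
    GCM = "gcm"
    CUSTOM = "custom"


def should_forward_to_stt(
    track_sid: str, encryption: EncryptionType, has_key: bool
) -> bool:
    """Return True if audio from this track can be sent to the STT service."""
    # Unencrypted tracks, or tracks the agent holds a key for, are usable.
    if encryption is EncryptionType.NONE or has_key:
        return True
    # Otherwise warn once at subscription time instead of failing silently.
    logger.warning(
        "Audio track %s is E2EE-encrypted (%s); "
        "STT transcription will not work without decryption",
        track_sid,
        encryption.value,
    )
    return False
```

A check like this would run when the agent subscribes to a track, before any audio is streamed to Amazon Transcribe, so the failure shows up in the logs at connection time.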

Additional Context

No response

Screenshots and Recordings

No response

Labels

bug