Agent audio captured as user input (echo/feedback loop) on Firefox Android #640

@andypmw

Description

Summary

When using Conversation.startSession() on Firefox for Android, the AI agent's audio output is fed back into the microphone input stream, causing the agent's voice to be recognized as user speech. This creates a feedback loop where the agent essentially "talks to itself."

The issue does not occur on Chrome (which has built-in echo cancellation).

Environment

  • @elevenlabs/client: ^0.15.0
  • React: ^19.1.0
  • Browser: Firefox 149.0 (Android)
  • OS: Android 15 (security patch March 1, 2026)
  • Device: Motorola G45 5G

Steps to Reproduce

  1. Open the app on Firefox for Android.
  2. Start a conversation session via Conversation.startSession().
  3. The agent speaks its first message.
  4. The agent's audio is picked up by the microphone and interpreted as user input.
  5. The agent responds to its own speech, creating a feedback loop.

Expected Behavior

The agent's audio output should not be captured by the microphone input. Echo cancellation should prevent the agent's voice from being treated as user speech.

Actual Behavior

The agent's audio stream is fed back into the user's microphone input, causing the agent to interpret its own speech as user input. The conversation becomes unusable.

Additional Context

  • Chrome: Works correctly — no echo or feedback loop.
  • ElevenLabs Dashboard preview (Firefox Android): Only a slight echo; the conversation remains functional and does not exhibit the feedback loop.
  • SDK via WebSocket (Firefox Android): Severe feedback loop as described above.

This suggests the issue is specific to how the SDK handles audio routing/echo cancellation, rather than a browser-level limitation. The Dashboard preview appears to handle this case better, even on Firefox.

ElevenLabs support confirmed they have seen this issue more often at the SDK level and directed us to report here.
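One way to narrow this down further (a diagnostic sketch, not part of the SDK; `summarizeAudioSettings` and `checkEchoCancellation` are helper names introduced here) is to open a microphone stream with the standard Web APIs and inspect which constraints Firefox actually granted via `MediaStreamTrack.getSettings()`:

```javascript
// Diagnostic sketch: open our own stream purely to see which audio
// constraints Firefox grants; the SDK's internal stream is not inspected.
function summarizeAudioSettings(settings) {
  // Pure helper so the result is easy to log or assert on.
  return {
    echoCancellation: settings.echoCancellation === true,
    noiseSuppression: settings.noiseSuppression === true,
  };
}

async function checkEchoCancellation() {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { echoCancellation: true, noiseSuppression: true },
  });
  // getSettings() reports the values the browser actually applied,
  // which may differ from what was requested.
  const results = stream
    .getAudioTracks()
    .map((track) => summarizeAudioSettings(track.getSettings()));
  stream.getTracks().forEach((t) => t.stop());
  return results;
}
```

If `echoCancellation` comes back `false` here on Firefox for Android, the feedback loop would be consistent with the browser not applying AEC to the track the SDK captures.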

Code

import { Conversation } from '@elevenlabs/client';

const config = {
  signedUrl: '<signed-url>',
  overrides: {
    agent: {
      prompt: { prompt: '<prompt>' },
      firstMessage: '<first-message>',
      language: '<language>',
    },
    // On mobile, we override STT settings
    stt: {
      model: 'scribe_v2',
      vadSensitivity: 'low',
    },
  },
};

const conversation = await Conversation.startSession({
  ...config,
  onConnect: ({ conversationId }) => { /* ... */ },
  onError: (error) => { /* ... */ },
  onModeChange: ({ mode }) => { /* ... */ },
  onMessage: ({ source, message }) => { /* ... */ },
});
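A possible mitigation we have not been able to verify end to end (a sketch only; whether `@elevenlabs/client` can consume a pre-acquired `MediaStream` is an open question, and `buildAudioConstraints` is a hypothetical helper, not an SDK API) would be to request the microphone with explicit echo-cancellation constraints before starting the session:

```javascript
// Hypothetical mitigation sketch using only standard Web APIs: ask the
// browser for an echo-cancelled microphone track up front.
function buildAudioConstraints() {
  return {
    audio: {
      echoCancellation: true, // request acoustic echo cancellation (AEC)
      noiseSuppression: true,
      autoGainControl: true,
    },
  };
}

// Browser-only (will not run under Node):
// const micStream = await navigator.mediaDevices.getUserMedia(buildAudioConstraints());
// micStream.getAudioTracks()[0].getSettings(); // verify what Firefox granted
```

Even if the SDK cannot take an external stream, requesting the permission with these constraints first may influence which processing mode the browser applies to subsequent captures; that behavior is browser-dependent and untested on our side.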

Question

  • Is there a way to configure audio routing to prevent the output from being captured by the input?

ElevenLabs Audio Evidence

Attached are ElevenLabs conversation audio recordings, both captured with the exact same Firefox build on the same Android 15 device.

This one was recorded using the ElevenLabs Dashboard AI Agent Preview:
firefox-ElevenlabsDashboard-Normal.mp3

And this one using the ElevenLabs WebSocket SDK:
firefox-WebsocketSDK-EchoFeedback.mp3
