Text-Only Chat Mode: Agent Disconnects When Microphone Track Not Published #7

@devqmr

Summary

When using text-only chat mode (agent-level or runtime override), the WebRTC connection disconnects immediately after the agent sends its first response if no microphone audio track is published by the client.

Environment

  • SDK Version: 0.3.0 (or latest)
  • Flutter Version: 3.x
  • Platform: Android (tested on emulator sdk gphone64 arm64)
  • Agent Configuration: "Enable chat mode" enabled in Advanced settings

Agent Configuration

The agent has the following settings enabled in the ElevenLabs dashboard:

  • Advanced > Automatic Speech Recognition: "Enable chat mode" = ON
  • Security > Overrides: "Text only" override enabled

Expected Behavior

When using text-only chat mode:

  1. No microphone permission should be required
  2. Connection should remain stable for text-based conversation
  3. Messages should be exchanged via the data channel without audio tracks

Actual Behavior

The connection disconnects immediately after the agent sends its first text response, with the agent participant leaving the LiveKit room.

Detailed Investigation

Attempt 1: Runtime Override with textOnly: true

Code:

await client.startSession(
  agentId: 'agent_id',
  overrides: ConversationOverrides(
    conversation: ConversationSettingsOverrides(textOnly: true),
  ),
);

Result: Connection disconnects mid-response. The server receives the override but still initializes audio infrastructure. When disposing the audio track, the connection fails.

Logs:

[TextChat] 📑 Status: connected
[TextChat] 🐛 Debug: {conversation_initiation_metadata_event: {conversation_id: conv_xxx, agent_output_audio_format: pcm_48000, user_input_audio_format: pcm_48000}, type: conversation_initiation_metadata}
[TextChat] 📝 Agent text part [start]: ""
[TextChat] 📝 Agent text part [delta]: "greeting message..."
trackDispose() track is null
[TextChat] ❌ Disconnected: agent

Observation: Even with textOnly: true override, the metadata still shows agent_output_audio_format: pcm_48000, indicating the server still sets up audio infrastructure.
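
To make this check repeatable across attempts, the metadata event can be inspected programmatically. The following is a minimal sketch, assuming the decoded message has the same shape as the debug log above; advertisesAudio is a hypothetical helper written for this report, not part of the SDK.

```dart
/// Hypothetical helper (not an SDK API): returns true when the
/// conversation_initiation_metadata message advertises audio formats.
/// The message shape is assumed from the debug log in this report.
bool advertisesAudio(Map<String, dynamic> message) {
  if (message['type'] != 'conversation_initiation_metadata') return false;
  final meta = message['conversation_initiation_metadata_event'];
  if (meta is! Map) return false;
  // Either field being present suggests audio was negotiated.
  return meta['agent_output_audio_format'] != null ||
      meta['user_input_audio_format'] != null;
}

void main() {
  // Sample payload copied from the log output above.
  final sample = <String, dynamic>{
    'type': 'conversation_initiation_metadata',
    'conversation_initiation_metadata_event': {
      'conversation_id': 'conv_xxx',
      'agent_output_audio_format': 'pcm_48000',
      'user_input_audio_format': 'pcm_48000',
    },
  };
  print(advertisesAudio(sample)); // true
}
```

With the payload from Attempt 1 this returns true, which matches the observation that audio formats are still present in the metadata when textOnly: true is sent.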

Attempt 2: Agent-Level Chat Mode Only (No Runtime Override)

Removed the runtime textOnly: true override to avoid potential conflicts, relying solely on the agent-level "Enable chat mode" setting.

Code:

await client.startSession(
  agentId: 'agent_id',
  // No overrides - using agent-level chat mode
);

Result:

  • With microphone enabled: Connection works perfectly. Ping/pong keepalive functions, messages exchange successfully.
  • With microphone disabled (skipMicrophone): Connection disconnects after first agent response.

Attempt 3: Skip Microphone Setup

Modified the SDK to add a skipMicrophone parameter that prevents setMicrophoneEnabled(true) from being called in LiveKitManager.connect().

Code:

// In LiveKitManager.connect()
if (!textOnly) {
  await _room!.localParticipant?.setMicrophoneEnabled(
    true,
    audioCaptureOptions: const AudioCaptureOptions(...),
  );
}

Result: Same disconnection issue. The server expects an audio track to be published.

Root Cause Analysis

Based on the investigation:

  1. The ElevenLabs server expects audio tracks even when chat mode is enabled at the agent level
  2. The trackDispose() track is null error occurs when the server tries to handle/dispose audio infrastructure but no track exists
  3. The ParticipantDisconnectedEvent fires because the agent participant leaves the room after the track disposal failure
  4. The connection only remains stable when the client publishes a microphone audio track

Working vs Non-Working Scenarios

Scenario                | Microphone Track | textOnly Override | Result
Voice mode              | Published        | None              | ✅ Works
Chat mode (agent-level) | Published        | None              | ✅ Works
Chat mode (agent-level) | NOT Published    | None              | ❌ Disconnects
Chat mode (runtime)     | NOT Published    | textOnly: true    | ❌ Disconnects
Chat mode (both)        | NOT Published    | textOnly: true    | ❌ Disconnects

Request

For true text-only chat mode without microphone permission requirements, could the server be updated to:

  1. Not require audio track publication when chat mode is enabled
  2. Handle the absence of client audio track gracefully
  3. Use only the data channel for text-based communication

Workaround (Current)

The only working solution is to enable the microphone (which triggers permission request) even when using text-only chat. This defeats the purpose of chat mode for applications that want to avoid microphone permissions entirely.

Reproduction Steps

  1. Create an agent with "Enable chat mode" in Advanced settings
  2. Connect using the Flutter SDK without enabling microphone:

     // Modified SDK to skip microphone
     await _liveKitManager.connect(wsUrl, token, textOnly: true);
     // Where textOnly skips setMicrophoneEnabled(true)

  3. Observe that connection disconnects after agent's first response

Additional Context

  • Ping/pong keepalive mechanism works correctly
  • Data channel messages are received successfully
  • The disconnection is triggered by ParticipantDisconnectedEvent with agent identity
  • The issue occurs consistently across multiple connection attempts
