Is Audio Input available and working for ChatGoogleGenerativeAI? #8285

PaperBoardOfficial · 2025-05-30T12:23:18Z


Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

import base64

from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

# Audio-capable Gemini model; the API key is read from the GOOGLE_API_KEY environment variable.
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

# Ensure you have an audio file named 'example_audio.mp3' or provide the correct path.
audio_file_path = "example_audio.mp3"
audio_mime_type = "audio/mpeg"

# Read the raw bytes and base64-encode them for inline transport.
with open(audio_file_path, "rb") as audio_file:
    encoded_audio = base64.b64encode(audio_file.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Transcribe the audio."},
        {
            "type": "media",
            "data": encoded_audio,  # Use base64 string directly
            "mime_type": audio_mime_type,  # Python uses snake_case here
        },
    ]
)
response = llm.invoke([message])
print(f"Response for audio: {response.content}")

Description

Does the LangChain Gemini chat model (ChatGoogleGenerativeAI) support audio input? The Python version (code attached above) does, but I couldn't find any documentation for the TypeScript version. When I ported the Python code to TypeScript and ran it, I got this error: Invalid media content.
I guess LangChain JS doesn't support audio input.

System Info

OS: Ubuntu 24.04

Answered by PaperBoardOfficial

May 30, 2025


@PaperBoardOfficial · 2025-05-30T12:25:57Z

dosubot[bot]
bot May 30, 2025

Hi @PaperBoardOfficial! I'm Dosu and I’m helping the langchainjs team.

Audio input is technically supported in LangChain JS for ChatGoogleGenerativeAI, but only if you are using a Gemini model that is explicitly multimodal-capable. The codebase includes logic to handle audio blocks (with base64 data or file URIs and a mimeType), but this support is gated by the model's capabilities—if the model does not support audio, you will get an error like "This model does not support audio" or "Invalid media content" if the input format is incorrect. There is no official documentation or example for audio input in the JS docs, and most examples focus on images or text, not audio.

To use audio input, you must ensure:

You are using a Gemini model that supports audio (not all do).
Your audio block is formatted with the correct properties: { type: "media", mimeType: "audio/mpeg", data: "<base64 string>" } for inline data, or { mimeType: "...", fileUri: "..." } for file URIs.
The message content structure matches what the JS implementation expects, which may differ from the Python version.

If you still encounter "Invalid media content," double-check that your model is multimodal and that your input object matches the expected structure. There are no known issues or discussions confirming end-to-end audio input working in production, and the lack of documentation or examples suggests this feature is not widely used or tested in JS yet. For reference, see the relevant code handling audio blocks and error messages here.

If your use case is critical and you cannot get audio input working, you may need to use the Python version or wait for more robust support and documentation in JS.
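
A quick way to check a block against the shapes listed above before sending it is a small validator. The sketch below is only an illustration: `findAudioBlockProblems` is a hypothetical helper, not part of LangChain JS, and the rules it applies (type "media", a camelCase mimeType, and exactly one of data or fileUri) are assumptions taken from this thread rather than from the library's source.

```typescript
// Hypothetical helper (not a LangChain JS API): lists likely problems in an
// audio content block, based on the block shapes described in this thread.
function findAudioBlockProblems(block: Record<string, unknown>): string[] {
  const problems: string[] = [];

  // Assumption: media blocks are tagged with type "media".
  if (block["type"] !== "media") {
    problems.push('type should be "media"');
  }

  // The Python integration uses snake_case; the JS one expects camelCase.
  if ("mime_type" in block) {
    problems.push("rename mime_type to mimeType");
  }

  const mimeType = block["mimeType"];
  if (typeof mimeType !== "string") {
    problems.push("mimeType is missing or not a string");
  } else if (mimeType.indexOf("audio/") !== 0) {
    problems.push("mimeType does not look like an audio type");
  }

  // Inline blocks carry base64 in data; file blocks carry fileUri.
  const hasData = typeof block["data"] === "string";
  const hasFileUri = typeof block["fileUri"] === "string";
  if (hasData === hasFileUri) {
    problems.push("provide exactly one of data or fileUri");
  }

  return problems;
}

// A block copied from the Python docs is flagged; the camelCase one passes.
console.log(findAudioBlockProblems({ type: "media", mime_type: "audio/mpeg", data: "AAAA" }));
console.log(findAudioBlockProblems({ type: "media", mimeType: "audio/mpeg", data: "AAAA" }));
```

An empty array means the block matches the assumed shape; it does not guarantee that the chosen model accepts audio.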

To reply, just mention my name @dosu.


2 replies

PaperBoardOfficial May 30, 2025
Author

@dosu I used gemini-2.0-flash, which supports audio input. Could you provide an example TypeScript code snippet showing how to pass audio to ChatGoogleGenerativeAI?

dosubot[bot] bot May 30, 2025

Here’s an example TypeScript code snippet for sending audio input to ChatGoogleGenerativeAI using the gemini-2.0-flash model in LangChain JS. This approach matches the expected structure in the codebase: the audio must be base64-encoded, and the content block must use the correct property names (mimeType and data) for inline audio data.

import { ChatGoogleGenerativeAI } from "@langchain/google-genai";
import { HumanMessage } from "@langchain/core/messages";
import * as fs from "fs";

// Read and encode your audio file as base64
const audioFilePath = "example_audio.mp3";
const audioMimeType = "audio/mpeg";
const audioBuffer = fs.readFileSync(audioFilePath);
const encodedAudio = audioBuffer.toString("base64");

// Construct the message content
const message = new HumanMessage({
  content: [
    { type: "text", text: "Transcribe the audio." },
    {
      type: "media",
      mimeType: audioMimeType,
      data: encodedAudio,
    },
  ],
});

// Instantiate the model (ensure your API key is set in the environment or passed here)
const llm = new ChatGoogleGenerativeAI({
  model: "gemini-2.0-flash",
  // apiKey: "YOUR_GOOGLE_API_KEY", // Optional if set in env
});

(async () => {
  const response = await llm.invoke([message]);
  console.log("Response for audio:", response.content);
})();

Key points:

The audio block must use mimeType and data (not mime_type or other variants).
The model name must be "gemini-2.0-flash" (or another multimodal Gemini model).
The message content array can include both text and media blocks.

This structure aligns with how LangChain JS processes audio input for Gemini models, as seen in the codebase’s content block conversion utilities and message handling logic. If you still see an "Invalid media content" error, double-check the property names and ensure the model is set to a multimodal Gemini version.
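
If the block was copied from the Python documentation, the rename can also be done mechanically. The following is a minimal sketch; the type names and `toJsMediaBlock` are hypothetical and only illustrate the mime_type to mimeType rename discussed in this thread.

```typescript
// Hypothetical types for illustration; not exported by LangChain.
type PythonStyleMediaBlock = { type: string; data: string; mime_type: string };
type JsStyleMediaBlock = { type: string; data: string; mimeType: string };

// Convert a block written with the Python key (mime_type) into the
// camelCase form (mimeType) that worked with LangChain JS in this thread.
function toJsMediaBlock(block: PythonStyleMediaBlock): JsStyleMediaBlock {
  const { mime_type, ...rest } = block;
  return { ...rest, mimeType: mime_type };
}

const converted = toJsMediaBlock({
  type: "media",
  data: "<base64 string>",
  mime_type: "audio/mpeg",
});
console.log(converted);
```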


PaperBoardOfficial · 2025-05-30T12:50:51Z

PaperBoardOfficial
May 30, 2025
Author

The issue was that I was using mime_type, but the correct property name is mimeType. The discrepancy arose because I had copied the code from the Python documentation. It would help if the TypeScript documentation included example code. This code worked for me:

import { HumanMessage } from "@langchain/core/messages";

// audioBlob is a Blob holding the recorded audio; llm is a ChatGoogleGenerativeAI instance.
const audioBuffer = await audioBlob.arrayBuffer();
// Convert the raw bytes to a binary string, then base64-encode it.
const base64Audio = btoa(
  String.fromCharCode(...new Uint8Array(audioBuffer))
);

const transcriptionMessage = new HumanMessage({
  content: [
    {
      type: "text",
      text: "Transcribe this audio. Return only the transcribed text without any additional formatting or explanations.",
    },
    // Note the camelCase mimeType (the Python API uses mime_type).
    { type: "media", data: base64Audio, mimeType: "audio/wav" },
  ],
});

const transcriptionResponse = await llm.invoke([transcriptionMessage]);
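
One caveat with the snippet above: spreading an entire Uint8Array into String.fromCharCode passes every byte as a separate argument, which can throw a RangeError in some JavaScript engines when the recording is large. A minimal sketch of a chunked alternative follows; `arrayBufferToBase64` is a hypothetical helper name and the 0x8000 chunk size is an arbitrary assumption, not a documented limit.

```typescript
// Hypothetical helper: base64-encode an ArrayBuffer in fixed-size chunks so
// that no single String.fromCharCode call receives too many arguments.
function arrayBufferToBase64(buffer: ArrayBufferLike, chunkSize = 0x8000): string {
  const bytes = new Uint8Array(buffer);
  let binary = "";
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray creates a view over the same memory, so nothing is copied.
    const chunk = bytes.subarray(offset, offset + chunkSize);
    // The cast lets apply() accept the typed array as an argument list.
    binary += String.fromCharCode.apply(null, chunk as unknown as number[]);
  }
  // btoa is available in browsers and in recent Node.js versions.
  return btoa(binary);
}

// Small check with the bytes of "hello".
console.log(arrayBufferToBase64(new Uint8Array([104, 101, 108, 108, 111]).buffer));
```

With this helper, the encoding step becomes `const base64Audio = arrayBufferToBase64(audioBuffer);` and the rest of the message construction stays the same.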

0 replies
