Is audio input available and working for ChatGoogleGenerativeAI? #8285
Example Code

import base64
from langchain_core.messages import HumanMessage

# Ensure you have an audio file named 'example_audio.mp3' or provide the correct path.
audio_file_path = "example_audio.mp3"
audio_mime_type = "audio/mpeg"

with open(audio_file_path, "rb") as audio_file:
    encoded_audio = base64.b64encode(audio_file.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Transcribe the audio."},
        {
            "type": "media",
            "data": encoded_audio,  # Use base64 string directly
            "mime_type": audio_mime_type,
        },
    ]
)

# `llm` is assumed to be a ChatGoogleGenerativeAI instance created beforehand.
response = llm.invoke([message])
print(f"Response for audio: {response.content}")

Description

Does the LangChain Gemini chat model (ChatGoogleGenerativeAI) support audio input? The Python version (code attached above) does support it, but I couldn't find any documentation for the TypeScript version. I also tried running a TypeScript port of the Python code and got this error: Invalid media content.

System Info

OS: Ubuntu 24.04
Replies: 2 comments 2 replies
Hi @PaperBoardOfficial! I'm Dosu and I’m helping the langchainjs team.

Audio input is technically supported in LangChain JS for ChatGoogleGenerativeAI, but only if you are using a Gemini model that is explicitly multimodal-capable. The codebase includes logic to handle audio blocks (base64 data or file URIs, plus a mimeType), but this support is gated by the model's capabilities: if the model does not support audio, you will get an error like "This model does not support audio", and if the input format is incorrect, you will get "Invalid media content". There is no official documentation or example for audio input in the JS docs; most examples focus on images or text.

To use audio input, you must ensure that your model is multimodal and that your input object matches the expected structure. If you still encounter "Invalid media content," double-check both.

There are no known issues or discussions confirming end-to-end audio input working in production, and the lack of documentation or examples suggests this feature is not widely used or tested in JS yet. For reference, see the relevant code handling audio blocks and error messages here. If your use case is critical and you cannot get audio input working, you may need to use the Python version or wait for more robust support and documentation in JS.
The issue was that I was using mime_type, but the correct key is mimeType. The discrepancy came from copying the code from the Python documentation. It would help if the TypeScript documentation included example code. After switching the key to mimeType, the code worked.
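A minimal sketch of what the corrected TypeScript content might look like, assuming the `{ type: "media", mimeType, data }` block shape described in this thread. This is not the code originally posted in this reply: `buildAudioContent` and the two block types are hypothetical names introduced for illustration, and the commented-out usage assumes an `llm` instance of ChatGoogleGenerativeAI created elsewhere.

```typescript
// Content block shapes used in this sketch (assumed from this thread, not
// copied from the LangChain type definitions).
type TextBlock = { type: "text"; text: string };
type MediaBlock = { type: "media"; mimeType: string; data: string };

// Hypothetical helper: pairs a text prompt with base64-encoded audio.
// Note the camelCase `mimeType` key; the Python-style `mime_type` key is
// what led to "Invalid media content" in the question.
function buildAudioContent(
  prompt: string,
  base64Audio: string,
  mimeType: string
): [TextBlock, MediaBlock] {
  return [
    { type: "text", text: prompt },
    { type: "media", mimeType, data: base64Audio },
  ];
}

// Usage (commented out: needs @langchain/core, @langchain/google-genai,
// an API key, and an audio file on disk):
//
// import { readFileSync } from "node:fs";
// import { HumanMessage } from "@langchain/core/messages";
// const encoded = readFileSync("example_audio.mp3").toString("base64");
// const message = new HumanMessage({
//   content: buildAudioContent("Transcribe the audio.", encoded, "audio/mpeg"),
// });
// const response = await llm.invoke([message]);
```

Keeping the block construction in one small function makes the key names easy to check in one place when porting examples from the Python docs.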