
Feature request: automatic speech recognition (ASR) #4548

@rektide


Feature description

Automatic speech recognition would be lovely to see in Burn! There are amazing models galore out there now, and it would be great to get some of them running, particularly in a streaming fashion.

Feature motivation

I would really like to be able to talk to my computer and get work done via speech-to-text.

(Optional) Suggest a Solution

There are plenty of decent options to port. Here are some interesting ones that have caught my eye:

(And many more)

Streaming, timestamps (Qwen calls its timestamp model a "forced aligner"), and diarization would all be wonderful. Kimi-Audio notably seems to be very feature-rich.

Notes

There are already some very good community projects for speech-to-text: https://github.com/tracel-ai/models?tab=readme-ov-file#community-contributions

Thanks to laggui for the mention in the discussion I opened about this feature request: #4376 (comment)

For reference, here is the vLLM ticket where they added real-time support over WebSockets, in case it's useful: vllm-project/vllm#33187

Metadata

Assignees: No one assigned
Labels: feature (The feature request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
