BUG: Transcription returns only the first ~600 characters (the beginning of the audio), ignoring the rest. It doesn't matter if it's 30 seconds or 53 minutes. #31

@GrigK

Izwi Bug Report Info

System

  • OS: Ubuntu 24.04.4 LTS
  • Kernel: Linux 6.8.0-100-generic
  • CPU: 12 cores
  • RAM: 31 GB (13 GB available)
  • GPU: AMD Radeon RX 470/480/570/580/590 (Ellesmere)
  • No NVIDIA GPU - running on CPU only

Version

  • Izwi: v0.1.0-alpha-11
  • CLI & Server

Steps to Reproduce

  1. Start server: ./izwi-server
  2. Transcribe any audio file (tested with 30 sec, 10 min, 53 min files)
  3. Command: izwi transcribe <file> --language ru -f json
  4. Result: the same short text (~600 chars) is returned regardless of input file length
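
The steps above can be scripted to make the symptom easier to confirm. This is a minimal sketch, not part of Izwi: it runs the exact command from step 3 on several files and compares transcript lengths. The helper names, the 50-character tolerance, and the assumption that the JSON output has a top-level `text` field are all hypothetical; if that field is absent, the raw output length is used instead.

```python
import json
import subprocess
import sys
from typing import Callable, Dict, Iterable


def run_izwi(path: str) -> str:
    """Run the command from the report and return its raw stdout."""
    result = subprocess.run(
        ["izwi", "transcribe", path, "--language", "ru", "-f", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def extract_text(raw: str) -> str:
    """Return the transcript text from the CLI output.

    Assumption: the JSON output is an object with a top-level "text" field.
    If it is not, fall back to the raw output so lengths stay comparable.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return raw.strip()
    if isinstance(data, dict) and isinstance(data.get("text"), str):
        return data["text"]
    return raw.strip()


def transcript_lengths(
    paths: Iterable[str], runner: Callable[[str], str] = run_izwi
) -> Dict[str, int]:
    """Map each audio file to the character count of its transcript."""
    return {path: len(extract_text(runner(path))) for path in paths}


def looks_truncated(lengths: Dict[str, int], tolerance: int = 50) -> bool:
    """True when at least two files yield nearly identical transcript lengths."""
    values = list(lengths.values())
    return len(values) >= 2 and max(values) - min(values) <= tolerance


if __name__ == "__main__" and len(sys.argv) > 1:
    lengths = transcript_lengths(sys.argv[1:])
    for path, count in lengths.items():
        print(f"{path}: {count} chars")
    print("Suspected truncation:", looks_truncated(lengths))
```

Passing a 30-second file and a 53-minute file as arguments should be enough: near-identical character counts for both would match the behavior described in this report.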

Expected Behavior

Full transcription of entire audio file.

Actual Behavior

Only the first ~600 characters of the transcript are returned, covering the beginning of the audio; the rest is ignored. Same result for:

  • 30 second audio
  • 10 minute audio
  • 53 minute audio

Logs

The server runs fine and transcription completes, but the returned text is truncated.

Tested Models

  • Qwen3-ASR-0.6B
  • Qwen3-ASR-1.7B

Both return the same truncated output.

Additional Notes

  • Model files download successfully (~4.7GB for 1.7B)
  • Server doesn't crash on alpha-11 (unlike earlier tests)
  • CPU inference works (RTF ~0.9 for 30s audio)
  • The output text always corresponds to roughly the first 10-15 seconds of the audio
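
The RTF figure above suggests a quick sanity check. Assuming RTF means processing time divided by audio duration (the usual definition) and that the ~0.9 measured on 30 s audio would hold for longer files (not verified here), the sketch below estimates how long a full transcription should take. A 53-minute file finishing far faster than the estimate would hint that most of the audio never reaches the model. The function name is hypothetical.

```python
def expected_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Processing time implied by a real-time factor (processing time / audio duration)."""
    return audio_seconds * rtf


RTF = 0.9  # from the report, measured on 30 s audio; assumed constant for longer files

for label, seconds in [("30 s", 30), ("10 min", 10 * 60), ("53 min", 53 * 60)]:
    estimate = expected_processing_seconds(seconds, RTF)
    print(f"{label}: about {estimate:.0f} s of processing expected")
```

Under these assumptions the 53-minute file would need roughly 48 minutes of CPU inference, so comparing that with the observed wall-clock time is a cheap way to narrow down where the truncation happens.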
