diarizer: libtorchcodec can't find ffmpeg shared libs after Forky base image bump #7181

@mdmohsin7

Description

The diarizer service (backend/diarizer/) emits a UserWarning at pod startup since the Debian 13 → 14 (Forky) base image migration in #7157 (deployed 2026-05-04 02:38 UTC):

torchcodec is not installed correctly so built-in audio decoding will fail.
…
[start of libtorchcodec loading traceback]
FFmpeg version 7: libavutil.so.59: cannot open shared object file: No such file or directory
FFmpeg version 6: libavutil.so.58: cannot open shared object file: No such file or directory
FFmpeg version 5: libavutil.so.57: cannot open shared object file: No such file or directory
FFmpeg version 4: libavutil.so.56: cannot open shared object file: No such file or directory
[end of libtorchcodec loading traceback].

pyannote/audio/core/io.py:47 emits this warning when it tries to pre-load libtorchcodec. The dynamic linker can't find any of the four libavutil sonames (one per FFmpeg major version, 4 through 7) that torchcodec accepts.
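The loader failure can be reproduced inside the pod without importing pyannote or torchcodec. The following is a minimal sketch, assuming a Python interpreter is available in the container; the probe_sonames helper is a hypothetical name, and the soname list is copied from the traceback above rather than from torchcodec's source.

```python
import ctypes

# The four libavutil sonames listed in the traceback (FFmpeg 7, 6, 5, 4).
CANDIDATE_SONAMES = {
    7: "libavutil.so.59",
    6: "libavutil.so.58",
    5: "libavutil.so.57",
    4: "libavutil.so.56",
}


def probe_sonames(sonames=CANDIDATE_SONAMES):
    """Try to dlopen each soname; map FFmpeg major -> error text, or None on success."""
    results = {}
    for major, soname in sonames.items():
        try:
            ctypes.CDLL(soname)  # uses the same dynamic-linker search as a normal load
            results[major] = None
        except OSError as exc:
            results[major] = str(exc)
    return results


if __name__ == "__main__":
    for major, err in probe_sonames().items():
        status = "OK" if err is None else f"FAILED: {err}"
        print(f"FFmpeg version {major}: {status}")
```

If all four lines report a failure, the problem is at the linker level and not specific to torchcodec; if any one loads, the warning should be looked for elsewhere in the import chain.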

Evidence

Suspected cause

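The Dockerfile sets ENV LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64", which lists only NVIDIA paths. Debian's apt-shipped ffmpeg installs libavutil.so.59 under /usr/lib/x86_64-linux-gnu/, which the linker normally finds through the default /etc/ld.so.conf.d/ config and the ld.so cache. LD_LIBRARY_PATH is searched before those defaults rather than instead of them, so the NVIDIA-only value should not hide that directory on its own; something in the Forky + CUDA 13.2 install layering may be shadowing the default lookup, or the libraries may be missing.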
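One way to narrow this down is to look at what the linker cache actually contains. The sketch below parses the output of ldconfig -p and lists any libavutil entries, which helps separate "library present but not found" from "library absent or installed under a different soname". The function names are hypothetical, and it assumes ldconfig is on PATH in the container (on slim images it may only be reachable as /sbin/ldconfig).

```python
import os
import subprocess


def parse_ldconfig_cache(text, prefix="libavutil.so."):
    """Return {soname: resolved path} for cache entries whose soname starts with prefix."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Entry lines look like: "libfoo.so.1 (libc6,x86-64) => /path/libfoo.so.1"
        if "=>" not in line or not line.startswith(prefix):
            continue
        left, path = line.split("=>", 1)
        soname = left.split()[0]
        entries.setdefault(soname, path.strip())  # keep the first match, as the linker would
    return entries


def read_ldconfig_cache():
    """Run `ldconfig -p` and return its stdout (assumes ldconfig is on PATH)."""
    return subprocess.run(
        ["ldconfig", "-p"], capture_output=True, text=True, check=True
    ).stdout


if __name__ == "__main__":
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
    found = parse_ldconfig_cache(read_ldconfig_cache())
    if not found:
        print("no libavutil entries in the linker cache")
    for soname, path in sorted(found.items()):
        print(f"{soname} => {path}")
```

An empty result points toward a missing package or a stale cache; an entry with a soname outside the four in the traceback would point toward a version mismatch rather than a path problem.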

Why this isn't urgent (p3)

  • The warning is non-fatal: the diarizer's POST /v2/embedding and POST /v1/diarization endpoints continue to return HTTP 200 with normal latency.
  • The diarizer's audio paths use preloaded torch tensors (the warning's recommended fallback: "use audio preloaded in-memory as a {'waveform': (channel, time) torch.Tensor, 'sample_rate': int} dictionary"), not torchcodec's file-decoding API.
  • Verified after the deploy of #7178 (diarizer: sync 10 dep pins from backend (no langchain stack); the PYPI sync): 2000/2000 responses returned HTTP 200 over a 35-min sample.

Possible fixes (cheapest first)

  1. Append /usr/lib/x86_64-linux-gnu to LD_LIBRARY_PATH in backend/diarizer/Dockerfile. Lets the linker find the apt-installed ffmpeg libs alongside the CUDA libs.
  2. Verify ldconfig runs after the CUDA install so the cache picks up both ffmpeg and CUDA paths.
  3. If the libraries are still missing, install the ffmpeg runtime library packages explicitly (apt-get install libavutil59 libavcodec61, or the equivalent package names on Forky).
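For fix 1, the intended LD_LIBRARY_PATH value can be computed rather than hand-edited, which avoids duplicate or empty entries. This is a small sketch with a hypothetical helper name; it only builds the string and does not change how an already-running process resolves libraries.

```python
def append_library_dir(current, new_dir):
    """Append new_dir to a colon-separated search path unless it is already listed."""
    parts = [p for p in current.split(":") if p]  # drop empty segments
    if new_dir not in parts:
        parts.append(new_dir)
    return ":".join(parts)


if __name__ == "__main__":
    # Current value from the Dockerfile quoted in this issue.
    current = "/usr/local/nvidia/lib:/usr/local/nvidia/lib64"
    print(append_library_dir(current, "/usr/lib/x86_64-linux-gnu"))
```

Appending rather than prepending keeps the NVIDIA directories first in the search order, so the CUDA library resolution stays as it is today.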

Scope note

Tracking against the OS bump (#7157), not against #7178 (PYPI sync) — the warning predates the PYPI deploy and is unaffected by it.

cc @thainguyensunya


Labels: backend (Backend Task (python)), p3 (Priority: Backlog (score <14))
