Workaround: torchcodec ABI incompatibility with PyTorch 2.9.1+cu130 #36

@bobsummerwill

Summary

pyannote.audio 4.0.1 loads audio through torchcodec's AudioDecoder, but torchcodec has a C++ ABI incompatibility with PyTorch 2.9.1+cu130 that causes undefined symbol errors at import time. This issue documents the soundfile-based workaround implemented in our install script.

Error

When running whisperx with diarization enabled:

ImportError: /path/to/venv-nvidia/lib/python3.12/site-packages/torchcodec/_torchcodec.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Because the import fails, AudioDecoder is never defined, so the problem surfaces as NameError: name 'AudioDecoder' is not defined when pyannote.audio attempts to use torchcodec.
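To confirm which package is failing before touching pyannote.audio, the import can be probed on its own. The following is a minimal sketch, not part of the install script: `probe_import` is a hypothetical helper name, and it assumes that importing the top-level `torchcodec` package is enough to trigger loading of the native extension.

```python
import importlib


def probe_import(module_name):
    """Try to import a module and report the outcome.

    Returns (True, "") on success, or (False, "<ErrorType>: <message>")
    if the import raises. A broad except is used on purpose: a broken
    native extension may raise ImportError, OSError, or RuntimeError.
    """
    try:
        importlib.import_module(module_name)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""
```

Calling `probe_import("torchcodec")` in the affected venv should return the undefined symbol message directly, without going through whisperx or pyannote.audio.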

Root Cause

  • torchcodec has a C++ ABI incompatibility with PyTorch 2.9.1+cu130
  • torchaudio 2.9.x uses torchcodec internally (the backend parameter is ignored)
  • pyannote.audio 4.0.1 imports AudioDecoder from torchcodec directly

Related Upstream Issues

Version Constraints

  • pyannote.audio 4.0.2+ pins torch==2.8.0, so 4.0.1 is required for PyTorch 2.9.1 compatibility
  • PyTorch 2.9.1+cu130 is required for other project dependencies
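Since the workaround only applies to one specific version combination, it can help to record what is actually installed. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is an assumption for illustration, and the distribution names passed in the usage note are assumed to match what pip installed.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in the current environment.
            versions[name] = None
    return versions
```

For example, `installed_versions(["torch", "torchaudio", "torchcodec", "pyannote.audio"])` run inside the venv gives a quick view of whether the constraints above are met.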

Workaround

The install script (Step 10d) patches pyannote/audio/core/io.py to use a soundfile-based fallback decoder when torchcodec fails to load:

import subprocess

import numpy as np
import soundfile as sf
import torch

# AudioSamples and AudioStreamMetadata are used below but are not defined in
# this excerpt; the patched io.py must provide them.


class SoundfileFallbackDecoder:
    """Fallback AudioDecoder using soundfile when torchcodec is unavailable."""

    def __init__(self, source):
        self._source = source
        self._waveform = None  # tensor of shape (channels, frames)
        self._sample_rate = None
        self._loaded = False

    def _ensure_loaded(self):
        """Decode the source once and cache the waveform and sample rate."""
        if self._loaded:
            return
        try:
            # Primary: soundfile (works with WAV, FLAC, etc.)
            data, sr = sf.read(self._source, dtype='float32')
            waveform = torch.from_numpy(data)
            if waveform.ndim == 1:
                # Mono: (frames,) -> (1, frames)
                waveform = waveform.unsqueeze(0)
            else:
                # Multi-channel: (frames, channels) -> (channels, frames)
                waveform = waveform.T
            self._waveform = waveform
            self._sample_rate = sr
        except Exception as sf_error:
            # Fallback: ffmpeg for MP3 and other formats.
            # Note: this path always resamples to 16 kHz mono.
            cmd = ['ffmpeg', '-i', str(self._source), '-f', 'f32le',
                   '-acodec', 'pcm_f32le', '-ar', '16000', '-ac', '1',
                   '-loglevel', 'error', '-']
            result = subprocess.run(cmd, capture_output=True)
            if result.returncode != 0:
                stderr = result.stderr.decode(errors='replace').strip()
                raise RuntimeError(
                    f"soundfile failed: {sf_error}; ffmpeg failed: {stderr}")
            # Copy because np.frombuffer returns a read-only view of the bytes.
            samples = np.frombuffer(result.stdout, dtype=np.float32).copy()
            self._waveform = torch.from_numpy(samples).unsqueeze(0)
            self._sample_rate = 16000
        self._loaded = True

    @property
    def metadata(self) -> AudioStreamMetadata:
        self._ensure_loaded()
        num_channels, num_frames = self._waveform.shape
        return AudioStreamMetadata(
            sample_rate=self._sample_rate,
            duration_seconds_from_header=num_frames / self._sample_rate,
            num_frames=num_frames, num_channels=num_channels)

    def get_all_samples(self) -> AudioSamples:
        self._ensure_loaded()
        return AudioSamples(data=self._waveform, sample_rate=self._sample_rate)

    def get_samples_played_in_range(self, start: float, end: float) -> AudioSamples:
        self._ensure_loaded()
        # Convert seconds to sample indices and clamp to the decoded waveform.
        start_sample = max(0, int(start * self._sample_rate))
        end_sample = min(self._waveform.shape[1], int(end * self._sample_rate))
        return AudioSamples(data=self._waveform[:, start_sample:end_sample], sample_rate=self._sample_rate)
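The range lookup in get_samples_played_in_range is simple index arithmetic: seconds are multiplied by the sample rate, truncated to an integer, and clamped to the bounds of the decoded waveform. The standalone sketch below mirrors that calculation without torch; `sample_range` is a hypothetical name used only for illustration.

```python
def sample_range(start, end, sample_rate, num_frames):
    """Convert a [start, end) window in seconds to clamped sample indices."""
    # int() truncates toward zero, matching the decoder code above.
    start_sample = max(0, int(start * sample_rate))
    end_sample = min(num_frames, int(end * sample_rate))
    return start_sample, end_sample


# A 10-second clip at 16 kHz has 160000 frames.
print(sample_range(1.0, 2.5, 16000, 160000))    # (16000, 40000)
print(sample_range(-1.0, 20.0, 16000, 160000))  # (0, 160000)
```

Requests that extend past either end of the file are therefore trimmed rather than rejected.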

The patch is applied automatically during install_packages_and_venv.sh execution.
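The exact text of the patch is not reproduced in this issue, but the selection logic it describes (use torchcodec when it loads, otherwise the fallback class) can be sketched as follows. `resolve_decoder` is a hypothetical helper, and the default module path `torchcodec.decoders` is stated here as an assumption about where AudioDecoder lives.

```python
import importlib


def resolve_decoder(fallback, module_name="torchcodec.decoders", attr="AudioDecoder"):
    """Return the real decoder class if it can be imported, else the fallback.

    Any exception during import is treated as "unavailable", because a
    broken native extension does not always raise a plain ImportError.
    """
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr)
    except Exception:
        return fallback
```

With this shape, something like `AudioDecoder = resolve_decoder(SoundfileFallbackDecoder)` would bind the name in either case, which avoids the NameError described in the Error section.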

Testing

Successfully processed a 90-minute audio file:

  • WhisperX local transcription: ~5.5 minutes
  • Diarization (speaker identification): working
  • All 10 AI post-processors: working

Resolution

This workaround can be removed once:

  1. torchcodec releases a version compatible with PyTorch 2.9.x, OR
  2. pyannote.audio adds its own fallback mechanism, OR
  3. We downgrade to PyTorch 2.8.0 (not desired)

Monitor the upstream torchcodec issues for resolution.
