Workaround: torchcodec ABI incompatibility with PyTorch 2.9.1+cu130

## Summary

pyannote.audio 4.0.1 uses torchcodec's `AudioDecoder` for audio loading, but torchcodec has a C++ ABI incompatibility with PyTorch 2.9.1+cu130 that causes `undefined symbol` errors when its native library is loaded. This issue documents the soundfile-based workaround implemented in our install script.

## Error

When running whisperx with diarization enabled:
```
ImportError: /path/to/venv-nvidia/lib/python3.12/site-packages/torchcodec/_torchcodec.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
```

Because the torchcodec import fails, the name `AudioDecoder` is never bound, so the failure surfaces as `NameError: name 'AudioDecoder' is not defined` when pyannote.audio attempts to use torchcodec.
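
To check whether the failure comes from torchcodec itself rather than from pyannote.audio, the import can be attempted in isolation. The following is a minimal sketch; `check_import` is a hypothetical helper name and is not part of any of the libraries involved.

```python
import importlib


def check_import(module_name: str) -> tuple[bool, str]:
    """Try to import a module and return (ok, detail).

    Hypothetical diagnostic helper. The except clause is deliberately broad:
    a broken native extension may surface as different exception types.
    """
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # report whatever the loader raised
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    ok, detail = check_import("torchcodec.decoders")
    print(f"torchcodec.decoders importable: {ok} ({detail})")
```

If the import fails with an `undefined symbol` message like the one above, the problem lies in the torchcodec binary and not in the audio file or the pyannote.audio pipeline configuration.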

## Root Cause

- **torchcodec** has a C++ ABI incompatibility with PyTorch 2.9.1+cu130
- **torchaudio 2.9.x** uses torchcodec internally (the `backend` parameter is ignored)
- **pyannote.audio 4.0.1** imports `AudioDecoder` from torchcodec directly

## Related Upstream Issues

- meta-pytorch/torchcodec#912
- meta-pytorch/torchcodec#1006
- meta-pytorch/torchcodec#998
- meta-pytorch/torchcodec#995

## Version Constraints

- **pyannote.audio 4.0.2+** pins `torch==2.8.0`, so 4.0.1 is required for PyTorch 2.9.1 compatibility
- **PyTorch 2.9.1+cu130** is required for other project dependencies
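
Since the problem depends on the exact combination of package versions, it can help to print what is actually installed in the virtual environment. The sketch below uses only the standard library; `installed_versions` and `PACKAGES` are illustrative names, and the distribution names are assumed to match those published on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Distribution names assumed to match the PyPI package names.
PACKAGES = ("torch", "torchaudio", "torchcodec", "pyannote.audio")


def installed_versions(names=PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:16} {ver or 'not installed'}")
```

Reading versions from package metadata avoids importing the packages, so the check still works when torchcodec's native library cannot be loaded.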

## Workaround

The install script (Step 10d) patches `pyannote/audio/core/io.py` to use a soundfile-based fallback decoder when torchcodec fails to load:

```python
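import subprocess

import numpy as np
import soundfile as sf
import torch

# AudioSamples and AudioStreamMetadata are torchcodec types, so they cannot be
# imported when torchcodec fails to load; they are assumed to be defined
# elsewhere in the patched module.


class SoundfileFallbackDecoder:
    """Fallback AudioDecoder using soundfile when torchcodec is unavailable."""

    def __init__(self, source):
        self._source = source
        self._waveform = None
        self._sample_rate = None
        self._loaded = False

    def _ensure_loaded(self):
        """Decode the source once and cache it as a (channels, frames) tensor."""
        if self._loaded:
            return
        try:
            # Primary: soundfile (works with WAV, FLAC, etc.)
            data, sr = sf.read(self._source, dtype='float32')
            waveform = torch.from_numpy(data)
            if waveform.ndim == 1:
                # Mono: (frames,) -> (1, frames)
                waveform = waveform.unsqueeze(0)
            else:
                # Multi-channel: (frames, channels) -> (channels, frames)
                waveform = waveform.T
            self._waveform = waveform
            self._sample_rate = sr
        except Exception as sf_error:
            # Fallback: ffmpeg for MP3 and other formats.
            # Note: this path always resamples to 16 kHz mono.
            cmd = ['ffmpeg', '-i', str(self._source), '-f', 'f32le',
                   '-acodec', 'pcm_f32le', '-ar', '16000', '-ac', '1',
                   '-loglevel', 'error', '-']
            result = subprocess.run(cmd, capture_output=True)
            if result.returncode != 0:
                ffmpeg_error = result.stderr.decode(errors='replace').strip()
                raise RuntimeError(
                    f"soundfile failed: {sf_error}; ffmpeg failed: {ffmpeg_error}"
                ) from sf_error
            # np.frombuffer returns a read-only view; copy it so torch
            # receives a writable array.
            samples = np.frombuffer(result.stdout, dtype=np.float32).copy()
            self._waveform = torch.from_numpy(samples).unsqueeze(0)
            self._sample_rate = 16000
        self._loaded = True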

    @property
    def metadata(self) -> AudioStreamMetadata:
        self._ensure_loaded()
        num_channels, num_frames = self._waveform.shape
        return AudioStreamMetadata(
            sample_rate=self._sample_rate,
            duration_seconds_from_header=num_frames / self._sample_rate,
            num_frames=num_frames, num_channels=num_channels)

    def get_all_samples(self) -> AudioSamples:
        self._ensure_loaded()
        return AudioSamples(data=self._waveform, sample_rate=self._sample_rate)

    def get_samples_played_in_range(self, start: float, end: float) -> AudioSamples:
        self._ensure_loaded()
        start_sample = max(0, int(start * self._sample_rate))
        end_sample = min(self._waveform.shape[1], int(end * self._sample_rate))
        return AudioSamples(data=self._waveform[:, start_sample:end_sample], sample_rate=self._sample_rate)
```

The patch is applied automatically during `install_packages_and_venv.sh` execution.
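
The decoder above returns `AudioSamples` and `AudioStreamMetadata` objects, which normally come from torchcodec and are therefore unavailable when its import fails. One way to fill that gap is a pair of lightweight stand-ins that expose only the fields the fallback decoder sets. The sketch below is an assumption about how such stand-ins could look, not the definitions used by the install script or by torchcodec; `seconds_to_sample_range` is a hypothetical helper that isolates the clamping arithmetic from `get_samples_played_in_range`.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class AudioSamples:
    """Hypothetical stand-in holding only what the fallback decoder populates."""
    data: Any          # waveform with shape (num_channels, num_frames)
    sample_rate: int


@dataclass
class AudioStreamMetadata:
    """Hypothetical stand-in for the stream metadata fields used above."""
    sample_rate: int
    duration_seconds_from_header: float
    num_frames: int
    num_channels: int


def seconds_to_sample_range(
    start: float, end: float, sample_rate: int, num_frames: int
) -> Tuple[int, int]:
    """Clamp a time window in seconds to valid sample indices."""
    start_sample = max(0, int(start * sample_rate))
    end_sample = min(num_frames, int(end * sample_rate))
    return start_sample, end_sample
```

For a 2-second mono file at 16 kHz (32000 frames), a request for the window from 0.5 s to 10 s is clamped to samples 8000 through 32000, so out-of-range requests return a shorter slice instead of raising.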

## Testing

Successfully processed a 90-minute audio file:
- WhisperX local transcription: ~5.5 minutes
- Diarization (speaker identification): working
- All 10 AI post-processors: working

## Resolution

This workaround can be removed once:
1. torchcodec releases a version compatible with PyTorch 2.9.x, OR
2. pyannote.audio adds its own fallback mechanism, OR
3. We downgrade to PyTorch 2.8.0 (not desired)

Monitor the upstream torchcodec issues for resolution.
