Summary
pyannote.audio 4.0.1 loads audio through torchcodec's AudioDecoder, but torchcodec has a C++ ABI incompatibility with PyTorch 2.9.1+cu130, which causes undefined symbol errors when its native library is loaded. This issue documents the soundfile-based workaround implemented in our install script.
Error
When running whisperx with diarization enabled:
ImportError: /path/to/venv-nvidia/lib/python3.12/site-packages/torchcodec/_torchcodec.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
This manifests as NameError: name 'AudioDecoder' is not defined when pyannote.audio attempts to use torchcodec.
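To confirm the failure in a given environment, one option is to import the package directly and inspect the exception. The helper below is a hypothetical sketch (`probe_import` is not part of the install script). It catches `Exception` broadly, on the assumption that a broken native library may surface as `ImportError`, `OSError`, or `RuntimeError` depending on how it is loaded.

```python
import importlib
from typing import Optional, Tuple


def probe_import(module_name: str) -> Tuple[bool, Optional[str]]:
    """Try to import a module; return (ok, error description or None)."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, OSError, RuntimeError, ...
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


if __name__ == "__main__":
    ok, err = probe_import("torchcodec")
    print("torchcodec loads" if ok else f"torchcodec failed: {err}")
```

If the reported error mentions an undefined symbol, the environment is likely affected by the incompatibility described here.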
Root Cause
- torchcodec has a C++ ABI incompatibility with PyTorch 2.9.1+cu130
- torchaudio 2.9.x uses torchcodec internally (the backend parameter is ignored)
- pyannote.audio 4.0.1 imports AudioDecoder from torchcodec directly
Related Upstream Issues
- RuntimeError: Could not load libtorchcodec when torchcodec being installed along with torch 2.9 RC meta-pytorch/torchcodec#912
- v8.1 - torch 2.9.0+cu130 issue with details meta-pytorch/torchcodec#1006
- possible (2.9.0+cu130) issue meta-pytorch/torchcodec#998
- Improve UX by letting the user know that their torch and torchcodec versions are incompatible meta-pytorch/torchcodec#995
Version Constraints
- pyannote.audio 4.0.2+ pins torch==2.8.0, so 4.0.1 is required for PyTorch 2.9.1 compatibility
- PyTorch 2.9.1+cu130 is required for other project dependencies
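Because the problem depends on the exact combination of installed versions, it can help to print them before debugging. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, and the list of distribution names is an assumption about what is relevant in this environment.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = installed_versions(
        ["torch", "torchaudio", "torchcodec", "pyannote.audio"]
    )
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```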
Workaround
The install script (Step 10d) patches pyannote/audio/core/io.py to use a soundfile-based fallback decoder when torchcodec fails to load:
class SoundfileFallbackDecoder:
    """Fallback AudioDecoder using soundfile when torchcodec is unavailable."""

    def __init__(self, source):
        self._source = source
        self._waveform = None
        self._sample_rate = None
        self._loaded = False

    def _ensure_loaded(self):
        if not self._loaded:
            import numpy as np
            try:
                # Primary: soundfile (works with WAV, FLAC, etc.)
                data, sr = sf.read(self._source, dtype='float32')
                waveform = torch.from_numpy(data)
                if waveform.ndim == 1:
                    waveform = waveform.unsqueeze(0)
                else:
                    waveform = waveform.T
                self._waveform = waveform
                self._sample_rate = sr
            except Exception as sf_error:
                # Fallback: ffmpeg for MP3 and other formats
                import subprocess
                cmd = ['ffmpeg', '-i', str(self._source), '-f', 'f32le',
                       '-acodec', 'pcm_f32le', '-ar', '16000', '-ac', '1',
                       '-loglevel', 'error', '-']
                result = subprocess.run(cmd, capture_output=True)
                if result.returncode != 0:
                    raise RuntimeError(f"soundfile failed: {sf_error}; ffmpeg failed")
                samples = np.frombuffer(result.stdout, dtype=np.float32)
                self._waveform = torch.from_numpy(samples).unsqueeze(0)
                self._sample_rate = 16000
            self._loaded = True

    @property
    def metadata(self) -> AudioStreamMetadata:
        self._ensure_loaded()
        num_channels, num_frames = self._waveform.shape
        return AudioStreamMetadata(
            sample_rate=self._sample_rate,
            duration_seconds_from_header=num_frames / self._sample_rate,
            num_frames=num_frames, num_channels=num_channels)

    def get_all_samples(self) -> AudioSamples:
        self._ensure_loaded()
        return AudioSamples(data=self._waveform, sample_rate=self._sample_rate)

    def get_samples_played_in_range(self, start: float, end: float) -> AudioSamples:
        self._ensure_loaded()
        start_sample = max(0, int(start * self._sample_rate))
        end_sample = min(self._waveform.shape[1], int(end * self._sample_rate))
        return AudioSamples(data=self._waveform[:, start_sample:end_sample], sample_rate=self._sample_rate)

The patch is applied automatically during install_packages_and_venv.sh execution.
Testing
Successfully processed a 90-minute audio file:
- WhisperX local transcription: ~5.5 minutes
- Diarization (speaker identification): working
- All 10 AI post-processors: working
Resolution
This workaround can be removed once:
- torchcodec releases a version compatible with PyTorch 2.9.x, OR
- pyannote.audio adds its own fallback mechanism, OR
- We downgrade to PyTorch 2.8.0 (not desired)
Monitor the upstream torchcodec issues for resolution.