# ivrit

A Python package providing wrappers around ivrit.ai's capabilities.
```bash
pip install ivrit
```
## Audio Transcription

The `ivrit` package provides audio transcription functionality using multiple engines.
### Quick Start

```python
import ivrit

# Transcribe a local audio file
model = ivrit.load_model(engine="faster-whisper", model="ivrit-ai/whisper-large-v3-turbo-ct2")
result = model.transcribe(path="audio.mp3")

# With a custom device
model = ivrit.load_model(engine="faster-whisper", model="ivrit-ai/whisper-large-v3-turbo-ct2", device="cpu")
result = model.transcribe(path="audio.mp3")
print(result["text"])

# Transcribe audio from a URL
model = ivrit.load_model(engine="faster-whisper", model="ivrit-ai/whisper-large-v3-turbo-ct2")
result = model.transcribe(url="https://example.com/audio.mp3")
print(result["text"])

# Get results as a stream (generator)
model = ivrit.load_model(engine="faster-whisper", model="base")
for segment in model.transcribe(path="audio.mp3", stream=True, verbose=True):
    print(f"{segment.start:.2f}s - {segment.end:.2f}s: {segment.text}")

# Or use the model directly
model = ivrit.FasterWhisperModel(model="base")
for segment in model.transcribe(path="audio.mp3", stream=True):
    print(f"{segment.start:.2f}s - {segment.end:.2f}s: {segment.text}")

# Access word-level timing
for segment in model.transcribe(path="audio.mp3", stream=True):
    print(f"Segment: {segment.text}")
    for word in segment.extra_data.get('words', []):
        print(f"  {word['start']:.2f}s - {word['end']:.2f}s: '{word['word']}'")
```
### Async Transcription (RunPod)

For RunPod models, you can use async transcription for better performance:
```python
import asyncio
from ivrit.audio import load_model

async def transcribe_async():
    # Load a RunPod model
    model = load_model(
        engine="runpod",
        model="large-v3-turbo",
        api_key="your-api-key",
        endpoint_id="your-endpoint-id"
    )

    # Stream results asynchronously
    async for segment in model.transcribe_async(path="audio.mp3", language="he"):
        print(f"{segment.start:.2f}s - {segment.end:.2f}s: {segment.text}")

# Run the async function
asyncio.run(transcribe_async())
```
Note: Async transcription is only available for RunPod models. The sync `transcribe()` method uses the original sync implementation.
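If your code may receive either a RunPod model or a local engine model, plain feature detection keeps it engine-agnostic. A minimal sketch (the `transcribe_any` helper is hypothetical, not part of the package):

```python
async def transcribe_any(model, path):
    # Hypothetical helper: RunPod models expose transcribe_async(),
    # while other engines only provide the sync transcribe().
    if hasattr(model, "transcribe_async"):
        async for segment in model.transcribe_async(path=path):
            print(segment.text)
    else:
        for segment in model.transcribe(path=path, stream=True):
            print(segment.text)
```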
## API Reference

### `load_model()`

Load a transcription model for the specified engine and model.

**Parameters:**
- `engine` (`str`): Transcription engine to use. Options: `"faster-whisper"`, `"stable-ts"`
- `model` (`str`): Model name for the selected engine
- `device` (`str`, optional): Device to use for inference. Default: `"auto"`. Options: `"auto"`, `"cpu"`, `"cuda"`, `"cuda:0"`, etc.
- `model_path` (`str`, optional): Custom path to the model (for faster-whisper)

**Returns:** a `TranscriptionModel` object that can be used for transcription

**Raises:**
- `ValueError`: If the engine is not supported
- `ImportError`: If required dependencies are not installed
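The documented exceptions can be caught to fail with a friendlier setup message. A minimal sketch (the error messages are illustrative, not produced by the package):

```python
import ivrit

try:
    model = ivrit.load_model(
        engine="faster-whisper",
        model="ivrit-ai/whisper-large-v3-turbo-ct2",
        device="cuda",  # or "auto", "cpu", "cuda:0", ...
    )
except ImportError:
    # Raised when the engine's dependencies are missing
    raise SystemExit("Engine dependencies missing: pip install ivrit[faster-whisper]")
except ValueError as err:
    # Raised when the engine name is not supported
    raise SystemExit(f"Unsupported engine: {err}")
```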
### `transcribe()` / `transcribe_async()`

Transcribe audio using the loaded model.

**Parameters:**
- `path` (`str`, optional): Path to the audio file to transcribe
- `url` (`str`, optional): URL to download and transcribe
- `blob` (`str`, optional): Base64-encoded blob data to transcribe
- `language` (`str`, optional): Language code for transcription (e.g., `'he'` for Hebrew, `'en'` for English)
- `stream` (`bool`, optional): Whether to return results as a generator (`True`) or a full result (`False`). Only for `transcribe()`
- `diarize` (`bool`, optional): Whether to enable speaker diarization
- `verbose` (`bool`, optional): Whether to enable verbose output
- `**kwargs`: Additional keyword arguments for the transcription model

**Returns:**
- `transcribe()`: If `stream=True`, a generator yielding transcription segments; if `stream=False`, the complete transcription result as a dictionary
- `transcribe_async()`: An `AsyncGenerator` yielding transcription segments

**Raises:**
- `ValueError`: If multiple input sources are provided, or none is provided
- `FileNotFoundError`: If the specified path doesn't exist
- `Exception`: For other transcription errors

Note: `transcribe_async()` is only available for RunPod models and always returns an `AsyncGenerator`.
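Exactly one input source must be supplied per call. For example, the `blob` variant takes base64-encoded audio bytes; a sketch, assuming `blob` accepts a base64 string as documented above:

```python
import base64
import ivrit

model = ivrit.load_model(engine="faster-whisper", model="base")

# path, url, and blob are mutually exclusive; passing two raises ValueError.
result = model.transcribe(path="audio.mp3", language="he")

# The blob variant takes base64-encoded audio bytes.
with open("audio.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("ascii")
result = model.transcribe(blob=audio_b64, language="he")
```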
## Architecture

The `ivrit` package uses an object-oriented design with a base `TranscriptionModel` class and specific implementations for each transcription engine.

- `TranscriptionModel`: Abstract base class for all transcription models
- `FasterWhisperModel`: Implementation for the Faster Whisper engine
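In practice this means every loaded model shares the same `transcribe()` interface regardless of engine. For instance (whether `TranscriptionModel` is exported at the top level is an assumption):

```python
import ivrit

model = ivrit.load_model(engine="faster-whisper", model="base")

# load_model() returns a concrete engine implementation of the shared base class.
assert isinstance(model, ivrit.FasterWhisperModel)
assert isinstance(model, ivrit.TranscriptionModel)  # assumed top-level export
```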
```python
# Step 1: Load the model
model = ivrit.load_model(engine="faster-whisper", model="base")

# Step 2: Transcribe audio
result = model.transcribe(path="audio.mp3")
```

```python
# Create the model directly
model = ivrit.FasterWhisperModel(model="base")

# Use the model
result = model.transcribe(path="audio.mp3")
```
For multiple transcriptions, load the model once and reuse it:

```python
# Load the model once
model = ivrit.load_model(engine="faster-whisper", model="base")

# Use it for multiple transcriptions
result1 = model.transcribe(path="audio1.mp3")
result2 = model.transcribe(path="audio2.mp3")
result3 = model.transcribe(path="audio3.mp3")
```
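The same pattern extends naturally to a batch of files; since `result["text"]` holds the transcript (as shown in the quick start), one loop covers the whole batch:

```python
import ivrit

model = ivrit.load_model(engine="faster-whisper", model="base")

# The model is loaded once; each call only pays for inference.
files = ["audio1.mp3", "audio2.mp3", "audio3.mp3"]
transcripts = {path: model.transcribe(path=path)["text"] for path in files}
```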
## Installation Options

```bash
# Base installation
pip install ivrit

# With the Faster Whisper engine
pip install ivrit[faster-whisper]
```
## Supported Engines

### Faster Whisper

Fast and accurate speech recognition using the Faster Whisper model.

**Model Class:** `FasterWhisperModel`

**Available Models:** `base`, `small`, `medium`, `large`, `large-v2`, `large-v3`

**Features:**
- Word-level timing information
- Language detection with confidence scores
- Support for custom devices (CPU, CUDA, etc.)
- Support for custom model paths
- Streaming transcription

**Dependencies:** `faster-whisper>=1.1.1`
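Because `transcribe()` forwards extra keyword arguments to the underlying engine, native faster-whisper options such as `beam_size` should pass straight through; a sketch based on that assumption about the documented `**kwargs` parameter:

```python
import ivrit

model = ivrit.load_model(engine="faster-whisper", model="large-v3")

# beam_size is a native faster-whisper option, assumed to be
# forwarded to the engine via **kwargs.
result = model.transcribe(path="audio.mp3", language="he", beam_size=5)
```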
### Stable-TS

Stable and reliable transcription using Stable-TS models.

**Status:** Not yet implemented
## Development

```bash
git clone <repository-url>
cd ivrit
pip install -e ".[dev]"

# Run the tests
pytest

# Format the code
black .
isort .
```
## License

MIT License - see the LICENSE file for details.