A production-ready Python module for acoustic validation of German phoneme pronunciation, implementing the Research Brief specification for L2 German pronunciation assessment. It confirms whether a German phoneme was pronounced correctly by a second-language learner, using only acoustic evidence from the audio signal.
- Python 3.8 or higher
- PyTorch 2.0+ (install via conda recommended for better compatibility)
If you have the source code locally:

```bash
cd german-phoneme-validator
pip install -e .
```

This installs the package in editable mode, so changes to the source code are immediately available.
Install directly from the GitHub repository:

```bash
pip install git+https://github.com/SergejKurtasch/german-phoneme-validator.git
```

If the package is published to PyPI:

```bash
pip install german-phoneme-validator
```

If a conda package is available:

```bash
conda install -c conda-forge german-phoneme-validator
```

Or if using a custom channel:

```bash
conda install -c your-channel german-phoneme-validator
```

Note: For best compatibility, install PyTorch via conda:

```bash
conda install pytorch torchaudio -c pytorch
```

For advanced formant extraction features:

```bash
pip install -e ".[optional]"
```
- Model download: Trained models are automatically downloaded from Hugging Face Hub on first use. An internet connection is required for the initial download. Models are cached locally for subsequent use.
- Local development: If you have a local `artifacts/` directory, it will be used instead of downloading from Hugging Face Hub. This allows offline development and testing.
- Dependencies only: If you only want to install the dependencies without the package itself:

  ```bash
  pip install -r requirements.txt
  ```
Recommended: Install as package (pip/conda)
After installing the package (see Installation section above), you can import and use it directly:
```python
from german_phoneme_validator import validate_phoneme
import numpy as np

# Using a numpy array (3 seconds of audio at 16 kHz)
audio_array = np.random.randn(3 * 16000).astype(np.float32)

result = validate_phoneme(
    audio=audio_array,
    phoneme="/b/",
    position_ms=1500.0,
    expected_phoneme="/b/",
)

print(f"Correct: {result['is_correct']}")
print(f"Confidence: {result['confidence']:.2%}")
print(f"Explanation: {result['explanation']}")
```

Note: Models are automatically downloaded from Hugging Face Hub on first use. You don't need to specify `artifacts_dir` unless you have local models for development.
Using a file path:

```python
from german_phoneme_validator import validate_phoneme

result = validate_phoneme(
    audio="path/to/audio.wav",
    phoneme="/p/",
    position_ms=1200.0,
    expected_phoneme="/b/",
)
```

Using the `PhonemeValidator` class directly:

```python
from german_phoneme_validator import PhonemeValidator

validator = PhonemeValidator()
available_pairs = validator.get_available_pairs()
print(f"Available pairs: {available_pairs}")

result = validator.validate_phoneme(
    audio="audio.wav",
    phoneme="/b/",
    position_ms=1500.0,
    expected_phoneme="/b/",
)
```

`validate_phoneme` — the main function for phoneme validation.
Parameters:

- `audio`: Path to a WAV file (str/Path) or a numpy array (16 kHz, mono)
- `phoneme`: Target phoneme in IPA notation (e.g., `/b/` or `b`)
- `position_ms`: Timestamp in milliseconds where the phoneme occurs
- `expected_phoneme`: (Optional) Expected correct phoneme
- `artifacts_dir`: (Optional) Path to artifacts directory
Returns:

```python
{
    'is_correct': bool,    # True / False / None (error)
    'confidence': float,   # 0.0 to 1.0
    'features': dict,      # Extracted acoustic features
    'explanation': str     # Human-readable explanation
}
```

Audio Format:

- WAV file or numpy array
- 16 kHz sample rate (auto-resampled)
- Mono channel (auto-converted)
- 3-5 seconds recommended
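Resampling and mono conversion happen automatically, but it can help to see what that preparation amounts to. Below is a minimal sketch using only NumPy; the helper name `prepare_audio` and the linear-interpolation resampling are illustrative, not part of the package:

```python
import numpy as np

TARGET_SR = 16000  # sample rate the validator expects


def prepare_audio(samples, sr):
    """Convert an (n,) or (n, channels) array at any rate to 16 kHz mono float32."""
    audio = np.asarray(samples, dtype=np.float32)
    if audio.ndim == 2:  # stereo/multichannel -> mono by channel averaging
        audio = audio.mean(axis=1)
    if sr != TARGET_SR:  # naive linear resampling via np.interp
        duration = audio.shape[0] / sr
        n_out = int(round(duration * TARGET_SR))
        t_in = np.linspace(0.0, duration, audio.shape[0], endpoint=False)
        t_out = np.linspace(0.0, duration, n_out, endpoint=False)
        audio = np.interp(t_out, t_in, audio).astype(np.float32)
    return audio
```

For production use, a proper polyphase resampler (e.g. from `torchaudio` or `scipy.signal`) is preferable to linear interpolation; the point here is only the expected shape, rate, and dtype.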
Phoneme Notation:

- IPA notation with or without slashes: `/b/`, `b`, `/p/`, `p`
- Case-insensitive
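That normalization can be sketched as a small helper (hypothetical, not the package's actual implementation):

```python
def normalize_phoneme(phoneme: str) -> str:
    """Strip surrounding slashes/brackets and whitespace, then lowercase.

    Accepts notation like "/b/", "b", "/P/", or " p " and reduces it
    to a bare lowercase symbol, mirroring the case-insensitive,
    bracket-optional input described above.
    """
    return phoneme.strip().strip("/[]").lower()
```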
Output:

- `is_correct`: True (correct), False (incorrect), None (error)
- `confidence`: Model confidence (0.0-1.0)
- `features`: Dictionary of acoustic features (MFCC, formants, VOT, etc.)
- `explanation`: Human-readable result description
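Given those field semantics, a small helper can turn the result dictionary into a one-line summary, including the error case where `is_correct` is `None`. The `summarize` function below is a hypothetical convenience, not part of the package:

```python
def summarize(result: dict) -> str:
    """Render a validation result as a short pass/fail/error label."""
    if result["is_correct"] is None:
        # None signals an error; the details live in 'explanation'
        return f"error: {result['explanation']}"
    verdict = "correct" if result["is_correct"] else "incorrect"
    return f"{verdict} (confidence {result['confidence']:.0%})"
```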
The system supports 22 phoneme pairs, including:

- Plosives: `b-p`, `d-t`, `g-k`, `kʰ-g`, `tʰ-d`
- Fricatives: `s-ʃ`, `ç-ʃ`, `ç-x`, `z-s`, `ts-s`, `x-k`
- Vowels: `a-ɛ`, `aː-a`, `aɪ̯-aː`, `aʊ̯-aː`, `eː-ɛ`, `iː-ɪ`, `uː-ʊ`, `oː-ɔ`, `ə-ɛ`
- Others: `ŋ-n`, `ʁ-ɐ`
Use `validator.get_available_pairs()` to see the pairs available in your installation.
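Assuming the pair strings take the hyphenated form shown above (e.g. `b-p`), an illustrative check for whether a given contrast is covered, in either order, might look like:

```python
def pair_supported(available_pairs, phoneme, expected):
    """Return True if the phoneme/expected contrast appears in the
    available pairs, regardless of which side comes first."""
    contrast = {phoneme, expected}
    return any(set(pair.split("-")) == contrast for pair in available_pairs)
```

This lets a caller fail fast (or pick a different exercise) before invoking the validator on an unsupported pair.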
- Clone the repository:

  ```bash
  git clone https://github.com/SergejKurtasch/german-phoneme-validator.git
  cd german-phoneme-validator
  ```

- Install dependencies:

  ```bash
  pip install -e .   # or: pip install -r requirements.txt
  ```

- Verify installation:

  ```python
  from german_phoneme_validator import validate_phoneme
  print("Installation successful!")
  ```

- Automated environment preparation:

  ```bash
  ./setup_env.sh
  ```
This helper script creates a `.venv` virtual environment, upgrades `pip`, installs `setuptools<58` (required because `googleads==3.8.0`, a dependency of `parselmouth`, still uses the legacy `use_2to3` flag), and then installs everything from `requirements.txt`.

Optional dependencies (`parselmouth`, `webrtcvad`, `pandas`, `tqdm`, `torchaudio`) remain in `requirements.txt` for convenience, but you can omit them if you only need the core validator. Run `pip install -r requirements.txt` without the `setup_env.sh` helper if you prefer manual control.
Note: Models are automatically downloaded from Hugging Face Hub on first use. The module will automatically detect available phoneme pairs. Currently, 22 phoneme pairs are supported. Models are cached locally after first download, so subsequent runs don't require internet access (unless checking for updates).
- SETUP.md - Detailed setup instructions
- PROJECT_STRUCTURE.md - Project structure and components
- TECHNICAL_REPORT.md - Technical documentation and methodology
- example_usage.py - Complete usage examples
- INSTRUCTIONS_HF_UPLOAD.md - Instructions for uploading models to Hugging Face Hub (for maintainers)
The function handles errors gracefully:
- File not found → `is_correct=None` with error message
- Invalid audio format → Error description in `explanation`
- Position out of bounds → Error message
- Unsupported phoneme pair → List of available pairs
- Model loading errors → Error description
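Some of these failure modes can also be caught cheaply on the client side before calling the validator. The `precheck` helper below is a hypothetical sketch of such guards, mirroring the file-not-found and position-out-of-bounds cases:

```python
from pathlib import Path


def precheck(audio_path, position_ms, duration_ms):
    """Collect client-side input problems before invoking the validator.

    Returns a list of error strings; an empty list means the basic
    checks passed (the validator may still report other errors).
    """
    errors = []
    if not Path(audio_path).is_file():
        errors.append(f"file not found: {audio_path}")
    if not 0 <= position_ms <= duration_ms:
        errors.append(
            f"position {position_ms} ms outside 0..{duration_ms} ms"
        )
    return errors
```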
- Models loaded lazily and cached in memory
- Optimized feature extraction for numpy arrays
- Automatic audio resampling
- First call slower due to model loading
MIT License - see LICENSE file for details.
This module is part of the German Speech Recognition project.