A comprehensive web-based system for evaluating speech recognition model performance and managing speech datasets.
- Upload and process evaluation JSON files containing transcription results
- Compare Character Error Rate (CER) metrics between different models
- Visualize performance metrics with interactive charts
- View detailed transcription results in a tabulated format
- Support comparisons across multiple models
- Manage and organize speech datasets
- View dataset statistics and information
- Process raw and processed audio files
- Track dataset versions and modifications
- Backend: FastAPI (Python)
- Frontend: Bootstrap 5, Chart.js
- Database: SQLite (for temporary data storage)
- File Processing: JSON and audio file handling
- Internationalization: Multi-language support (English/Korean)
- Clone the repository:
git clone [repository-url]
cd speech-recognition-evaluation
- Create and activate a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
.\venv\Scripts\activate # Windows
- Install dependencies:
pip install -r requirements.txt
- Start the server:
uvicorn app.main:app --reload
- Access the web interface:
http://localhost:8000
- Navigate to the evaluation page:
http://localhost:8000/evaluation
- Upload your evaluation JSON file with the following structure:
{
  "transcriptions": {
    "model-name": [
      {
        "audio_filepath": "path/to/audio.wav",
        "pred_sentence": "predicted transcription"
      }
    ]
  },
  "CER metric": {
    "model-name": 0.123
  }
}
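Before uploading, it can help to check a file against the structure above. The following is a minimal sketch, not the application's own validation logic: the function name `summarize_evaluation` is hypothetical, and it assumes every model listed under `transcriptions` should also have an entry under `CER metric`.

```python
import json
from typing import Any


def summarize_evaluation(payload: dict[str, Any]) -> list[tuple[str, float, int]]:
    """Check the expected keys and return (model, CER, transcription count),
    sorted from lowest to highest CER.

    Hypothetical helper for illustration; the real service may validate differently.
    """
    transcriptions = payload.get("transcriptions")
    cer_scores = payload.get("CER metric")
    if not isinstance(transcriptions, dict) or not isinstance(cer_scores, dict):
        raise ValueError('expected top-level "transcriptions" and "CER metric" objects')

    rows = []
    for model, items in transcriptions.items():
        # Assumption: each model with transcriptions also reports a CER value.
        if model not in cer_scores:
            raise ValueError(f"no CER value for model {model!r}")
        for item in items:
            missing = {"audio_filepath", "pred_sentence"} - set(item)
            if missing:
                raise ValueError(f"{model!r} entry is missing {sorted(missing)}")
        rows.append((model, float(cer_scores[model]), len(items)))

    # Lower CER is better, so the best model comes first.
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    sample = json.loads(
        """
        {
          "transcriptions": {
            "model-name": [
              {"audio_filepath": "path/to/audio.wav",
               "pred_sentence": "predicted transcription"}
            ]
          },
          "CER metric": {"model-name": 0.123}
        }
        """
    )
    for model, cer, count in summarize_evaluation(sample):
        print(f"{model}: CER={cer:.3f} over {count} transcription(s)")
```

To check a file on disk, load it with `json.load` and pass the resulting dictionary to the function.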
app/
├── main.py                    # FastAPI application entry point
├── config.py                  # Configuration settings
├── models/
│   ├── database.py            # Database models
│   └── evaluation.py          # Evaluation data models
├── routers/
│   ├── evaluation.py          # Evaluation routes
│   ├── dataset.py             # Dataset management routes
│   └── main_routes.py         # Main page routes
├── services/
│   └── evaluation_service.py  # Evaluation processing logic
├── templates/                 # HTML templates
└── utils/
    └── i18n.py                # Internationalization support
- Model Comparison: Compare multiple speech recognition models side by side
- Metric Visualization: Interactive charts showing CER metrics
- Detailed Results: View complete transcription results in a searchable table
- Error Analysis: Identify and analyze transcription errors
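For readers unfamiliar with the metric, Character Error Rate is conventionally the character-level edit distance between a reference transcript and a prediction, divided by the reference length. The sketch below illustrates that conventional definition only. The function names are hypothetical, and this project may compute CER differently (for example, with text normalization applied first).

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, prediction: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, prediction) / len(reference)


if __name__ == "__main__":
    # One substituted character out of five.
    print(character_error_rate("안녕하세요", "안녕하세오"))
```

Because the distance is counted per Unicode code point, the same function applies to English and Korean text. CER can exceed 1.0 when the prediction is much longer than the reference.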
- Clean and intuitive web interface
- Responsive design for various screen sizes
- Interactive data visualization
- Drag-and-drop file upload support
- English (Default)
- Korean (한국어)
Language can be switched using the selector in the navigation bar.
Contributions are welcome! Please feel free to submit a Pull Request.