This project is a speech-to-text and file retrieval system using SenseVoice for audio transcription and Elasticsearch for file indexing and searching. Users can upload audio files, transcribe them to text, and search for file names in a specified directory based on the transcription results.
- Audio Transcription: Leverages SenseVoice for high-quality speech-to-text conversion.
- File Indexing: Automatically indexes files in a specified folder using Elasticsearch.
- File Searching: Retrieves matching file names based on transcription results.
- User Interface: A simple and intuitive web interface built with Gradio.
Clone the repository:
git clone https://github.com/yourusername/speech-to-file-retrieval.git
cd speech-to-file-retrieval
Install required Python packages:
pip install -r requirements.txt
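Ensure an Elasticsearch instance is running. If using Docker:
docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" elasticsearch:8.0.0
Elasticsearch 8.x enables security (TLS and authentication) by default. Setting `xpack.security.enabled=false` allows a plain, unauthenticated connection to http://localhost:9200; use it only for local development.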
- `DOCUMENT_FOLDER`: Path to the folder containing files to be indexed. Set to `/app/documents` in the code; modify it as needed.
- Model Directory: The default model directory is `iic/SenseVoiceSmall`. Update the `model_dir` path in the code if necessary.
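As a minimal sketch, assuming you would rather override these values without editing the code, the two settings could be read from environment variables with the defaults above as fallbacks. The environment variable names, the `Settings` class, and `load_settings` are hypothetical and not part of the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two configurable paths."""

    document_folder: str
    model_dir: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from environment variables, falling back to the README defaults.

    The variable names DOCUMENT_FOLDER and MODEL_DIR are assumptions; the
    project itself sets these values directly in the code.
    """
    env = os.environ if env is None else env
    return Settings(
        document_folder=env.get("DOCUMENT_FOLDER", "/app/documents"),
        model_dir=env.get("MODEL_DIR", "iic/SenseVoiceSmall"),
    )


settings = load_settings({"DOCUMENT_FOLDER": "/data/docs"})
print(settings.document_folder)  # /data/docs
print(settings.model_dir)        # iic/SenseVoiceSmall
```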
On startup, the system automatically indexes all files in the specified folder (`DOCUMENT_FOLDER`). To reindex the files manually, click the Index Files button on the web interface.
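Indexing can be pictured as turning every file in `DOCUMENT_FOLDER` into one Elasticsearch document. The sketch below only builds those documents; the index name `files`, the `file_name` and `path` fields, and the function name `iter_index_actions` are assumptions, not the project's actual schema.

```python
import os
from typing import Any, Dict, Iterator


def iter_index_actions(folder: str, index: str = "files") -> Iterator[Dict[str, Any]]:
    """Yield one bulk-index action per file found under `folder` (recursively).

    The index name and the `file_name` / `path` fields are hypothetical.
    """
    for root, _dirs, names in os.walk(folder):
        for name in sorted(names):
            path = os.path.join(root, name)
            yield {
                "_index": index,
                # Using the full path as the document ID makes reindexing idempotent:
                # the same file overwrites its previous entry instead of duplicating it.
                "_id": path,
                "_source": {"file_name": name, "path": path},
            }
```

With the official Python client, the generator could be passed to `elasticsearch.helpers.bulk(es, iter_index_actions(folder))`. Because the path serves as `_id`, reindexing overwrites existing entries; files deleted from the folder would still need separate cleanup.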
- Upload an audio file.
- The system transcribes the audio into text using SenseVoice.
- It searches for file names in the indexed directory matching the transcription result.
- The top 5 matching files are displayed for download.
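The search step could be expressed as a single Elasticsearch `match` query on the file-name field, limited to five hits. This is a sketch assuming a text field named `file_name` (hypothetical); the project's real query may differ. Fuzzy matching is one possible choice here, since a transcription can misspell a word that appears in a file name.

```python
from typing import Any, Dict


def build_filename_query(transcript: str, top_k: int = 5) -> Dict[str, Any]:
    """Build a search body that matches file names against a transcript.

    Assumes file names are indexed in a text field called `file_name`
    (hypothetical). `fuzziness: "AUTO"` tolerates small transcription errors.
    """
    return {
        "query": {
            "match": {
                "file_name": {"query": transcript.strip(), "fuzziness": "AUTO"}
            }
        },
        "size": top_k,  # only the top results are shown to the user
    }


body = build_filename_query("quarterly report ")
print(body["size"])  # 5
```

With the 8.x Python client, the body could be sent as `es.search(index="files", query=body["query"], size=body["size"])`, where the index name is again an assumption.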
Run the application locally:
python app.py
Access the interface at http://localhost:7860.
Build and run the application using Docker:
- Build the Docker image:
docker build -t speech-to-file-retrieval .
- Run the Docker container:
docker run -d -p 7860:7860 speech-to-file-retrieval
Access the interface at http://localhost:7860.
The application includes the following core functionalities:
- Automatically indexes files from the `DOCUMENT_FOLDER` into Elasticsearch.
- Processes audio files using the SenseVoice model and converts them to text.
- Searches file names based on the transcribed text and retrieves matching results.
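SenseVoice's raw output usually begins with special tags for language, emotion, and audio event, such as `<|en|><|NEUTRAL|><|Speech|><|woitn|>`. If the transcript is used directly as a search query, those tags should be removed first. The following is a minimal sketch, assuming tags always take the `<|...|>` form; `clean_transcript` is a hypothetical helper. FunASR also ships `rich_transcription_postprocess` in `funasr.utils.postprocess_utils`, but it is designed for display output and may keep emotion and event markers as symbols.

```python
import re

# Matches SenseVoice-style tags such as <|en|>, <|NEUTRAL|>, <|Speech|>, <|woitn|>.
_TAG_RE = re.compile(r"<\|[^|]*\|>")


def clean_transcript(raw: str) -> str:
    """Remove <|...|> tags and collapse whitespace so the text can serve as a query."""
    without_tags = _TAG_RE.sub(" ", raw)
    return " ".join(without_tags.split())


print(clean_transcript("<|en|><|NEUTRAL|><|Speech|><|woitn|>project plan  draft"))
# project plan draft
```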
- Place files in the `DOCUMENT_FOLDER` directory.
- Upload an audio file through the web interface.
- The system transcribes the audio and searches for matching file names.
- Matching results are displayed as downloadable links.
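The whole flow, from uploaded audio to at most five downloadable results, can be sketched with the model and the search backend passed in as plain functions. `retrieve_files`, `transcribe`, `search`, and the fake stand-ins are hypothetical names used only for illustration; injecting them keeps the sketch runnable without a GPU or an Elasticsearch instance.

```python
from typing import Callable, List


def retrieve_files(
    audio_path: str,
    transcribe: Callable[[str], str],
    search: Callable[[str, int], List[str]],
    top_k: int = 5,
) -> List[str]:
    """Transcribe `audio_path`, then return at most `top_k` matching file names."""
    text = transcribe(audio_path).strip()
    if not text:
        # Nothing was recognized, so there is nothing to search for.
        return []
    # Slice defensively in case the search hook returns more than requested.
    return search(text, top_k)[:top_k]


# Stand-ins for SenseVoice and Elasticsearch, for illustration only.
FILES = ["budget_2024.xlsx", "meeting notes.docx", "budget draft.pdf"]


def fake_transcribe(path: str) -> str:
    return "budget"


def fake_search(query: str, k: int) -> List[str]:
    return [name for name in FILES if query in name][:k]


print(retrieve_files("query.wav", fake_transcribe, fake_search))
# ['budget_2024.xlsx', 'budget draft.pdf']
```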
- Gradio: User interface for file uploads and results display.
- Elasticsearch: Indexing and searching files.
- SenseVoice: Speech-to-text transcription model.
- Python: Core application logic.
- A GPU is recommended for faster transcription. If a GPU is unavailable, set `device="cpu"` in the code.
- Ensure Elasticsearch is properly configured and accessible before starting the application.
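Instead of editing the device string by hand, it could be chosen at startup. This is a minimal sketch, assuming PyTorch is the backend (as it is for SenseVoice via FunASR) and that `cuda:0` is the desired GPU; `pick_device` is a hypothetical helper.

```python
def pick_device() -> str:
    """Return "cuda:0" when PyTorch reports an available GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The result could then be passed to the model loader, for example `AutoModel(model=model_dir, device=pick_device())`, if the project loads SenseVoice through FunASR's `AutoModel` as the SenseVoice examples do.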
Feel free to fork this repository and contribute. Submit a pull request with improvements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.