Audio to Video AI Editor is a web-based editor built with Vanilla JS and Python Flask. It allows you to upload an audio file of yourself speaking, and automatically generates image or video clips that correspond to your audio. This eliminates the need to manually search for and insert images into your videos.
This project is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). Modifications are only allowed if contributed directly back to the original repository via a Pull Request. Commercial use is strictly prohibited.
1. When an audio file is uploaded, a Whisper model transcribes the audio with timestamps using Linto's Whisper library.
2. The transcription is then sent to an LLM (default: gemma3:12b; see Configuration below), which generates image/video description queries.
Examples:
- Image: "1960s Bus Boycott, Black and White"
- Video: "Historical footage of MLK"
3. Using these descriptions, the server fetches images from the internet via DuckDuckGo and displays them to the user.
4. The entire process is automatic; only the audio upload is required.
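The data flow of the steps above can be sketched roughly as follows. The segment shape and the clip structure here are illustrative assumptions, not the project's actual server code; in the real pipeline, whisper-timestamped produces the segments and the LLM fills in each query.

```python
# Illustrative sketch of the transcribe -> query -> clip pipeline.
# Segment and clip shapes are simplified stand-ins for the real code.

def segments_to_clips(segments, clips_per_sentence=1):
    """Turn timestamped transcript segments into clip entries,
    evenly splitting each segment into clips_per_sentence clips."""
    clips = []
    for seg in segments:
        step = (seg["end"] - seg["start"]) / clips_per_sentence
        for i in range(clips_per_sentence):
            clips.append({
                "start": seg["start"] + i * step,
                "text": seg["text"],
                "query": None,  # filled in by the LLM in the real pipeline
            })
    return clips

segments = [
    {"start": 0.0, "end": 3.2, "text": "Rosa Parks refused to give up her seat."},
    {"start": 3.2, "end": 6.0, "text": "The Montgomery Bus Boycott began."},
]
clips = segments_to_clips(segments)
print(len(clips))  # one clip per segment when clips_per_sentence is 1
```

With CLIPS_PER_SENTENCE raised above 1 (see Configuration), each sentence would simply be split into more, shorter clips.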
1. Clone the repository:
git clone https://github.com/lucabae/video-editor
cd video-editor
2. Set up a virtual environment (recommended):
It is highly recommended to use Python 3.9 for compatibility.
python3.9 -m venv venv
Linux/macOS: source venv/bin/activate
Windows: venv\Scripts\activate
3. Install backend dependencies:
pip install -r backend/requirements.txt
4. Run the backend server:
cd backend
python server.py
The Flask server will start, usually at http://127.0.0.1:8000.
5. Open the frontend:
Open frontend/index.html in your browser to start using the editor.
1. Select an audio file in the top-left corner.
2. Click "Transcribe audio and generate images" and wait (this may take some time).
3. Edit the generated clips using the editor.
4. Click "Export" when finished (this may also take some time).
5. The final video will be saved to backend/result.mp4.
After clips are generated, you can freely modify your video.
- Video Effects: Apply effects to individual clips or all clips.
- Add New Clip: Create clips with custom image descriptions.
- Change Clip Start Time: Adjust start time in seconds:milliseconds format.
- Show Other Image Options: Browse alternative images and apply the one you like.
- Generate New Image: Enter a new search query to get another image.
- Enter Image URL: Replace the current clip image with a custom link.
- Go to Clip in Audio: Play the audio corresponding to a clip.
- Enter YouTube Link: Download and trim a YouTube video based on timestamps; duration matches until the next clip.
- Search YouTube Video: Opens a YouTube search with the LLM-generated description for more accurate results.
- Remove Clip
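The seconds:milliseconds start-time field can be parsed with a small helper like the one below. This is a hypothetical sketch, assuming a value such as "12:500" means 12.5 seconds; the editor's actual parsing code may differ.

```python
def parse_start_time(value: str) -> float:
    """Parse a 'seconds:milliseconds' string (e.g. '12:500') into seconds.

    Hypothetical helper; the editor's real parsing logic may differ.
    """
    seconds, millis = value.split(":")
    return int(seconds) + int(millis) / 1000.0

print(parse_start_time("12:500"))  # 12.5
```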
Configuration file: backend/config.json
Available options:
- PROJECT_FPS (default: 30)
- OLLAMA_MODEL (default: gemma3:12b) – Handles all text processing; downgrading is not recommended.
- WHISPER_MODEL (default: base) – See the Whisper documentation for available models and languages.
- WHISPER_LANGUAGE (default: "es")
- CLIPS_PER_SENTENCE (default: 1)
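A backend/config.json using the defaults listed above might look like this. The exact layout of the real file may differ; this fragment only illustrates the options described in this section.

```json
{
  "PROJECT_FPS": 30,
  "OLLAMA_MODEL": "gemma3:12b",
  "WHISPER_MODEL": "base",
  "WHISPER_LANGUAGE": "es",
  "CLIPS_PER_SENTENCE": 1
}
```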
Contributions are highly appreciated. You can help with:
- Adding new MoviePy effects
- Improving documentation and overall clarity
- Design enhancements for the web UI
- Any relevant code modifications