lucabae/audio2video-editor

AUDIO TO VIDEO AI EDITOR

Audio to Video AI Editor is a web-based editor built with Vanilla JS and Python Flask. It allows you to upload an audio file of yourself speaking, and automatically generates image or video clips that correspond to your audio. This eliminates the need to manually search for and insert images into your videos.

LICENSE

This project is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). Modifications are only allowed if contributed directly back to the original repository via a Pull Request. Commercial use is strictly prohibited.

HOW IT WORKS

1. When an audio file is uploaded, a Whisper model transcribes it with timestamps, using Linto's Whisper library.
2. The transcription is sent to an LLM (default: gemma3:12b; see CONFIGURATION), which turns it into short search queries describing images or videos. Examples:
- Image: "1960s Bus Boycott, Black and White"
- Video: "Historical footage of MLK"
3. The server searches DuckDuckGo with these queries and displays the resulting images to the user.
4. The whole process is automatic: the only input required is the audio upload.
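
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes transcript segments shaped like Whisper's output (dicts with "start", "end" and "text"), and the names plan_clips and placeholder_query are hypothetical. The LLM call is replaced by a trivial stand-in so the sketch runs without Ollama installed.

```python
from typing import Callable, Dict, List


def plan_clips(segments: List[Dict], make_query: Callable[[str], str]) -> List[Dict]:
    """Turn timestamped transcript segments into clip entries with search queries."""
    clips = []
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip segments with no spoken text
        clips.append({
            "start": float(seg["start"]),  # seconds into the audio
            "text": text,
            "query": make_query(text),     # step 2: description used for the image search
        })
    return clips


def placeholder_query(text: str) -> str:
    """Hypothetical stand-in for the LLM call: keep the first five words."""
    return " ".join(text.split()[:5])


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 3.2, "text": " The bus boycott began in 1955."},
        {"start": 3.2, "end": 4.0, "text": "  "},
        {"start": 4.0, "end": 7.5, "text": " Dr. King spoke to the crowd."},
    ]
    for clip in plan_clips(segments, placeholder_query):
        print(clip["start"], clip["query"])
```

Passing the query generator in as a function keeps the transcript handling independent of whichever model produces the descriptions.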

INSTALLATION

  1. Clone the repository:
    git clone https://github.com/lucabae/video-editor
    cd video-editor
  2. Set up a virtual environment (recommended). Python 3.9 is highly recommended for compatibility.
    python3.9 -m venv venv
    Linux/macOS: source venv/bin/activate
    Windows: venv\Scripts\activate
  3. Install backend dependencies:
    pip install -r backend/requirements.txt
  4. Run the backend server:
    cd backend
    python server.py
    The Flask server will start, usually at http://127.0.0.1:8000
  5. Open the frontend:
    Open frontend/index.html in your browser to start using the editor.
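
If the editor does not respond, it can help to confirm that the backend from step 4 is reachable before opening the frontend. The check below is a small sketch using only the standard library. It assumes the default address mentioned above; the function name server_is_up is hypothetical, and because this README does not say which routes the server exposes, any HTTP response (including an error status) is treated as "up".

```python
import http.client
import urllib.error
import urllib.request


def server_is_up(url: str = "http://127.0.0.1:8000", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server answered, just with an error status
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        return False  # refused, timed out, unreachable, or not speaking HTTP


if __name__ == "__main__":
    print("backend reachable:", server_is_up())
```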

HOW TO USE

1. Select an audio file in the top-left corner.
2. Click "Transcribe audio and generate images" and wait (this may take some time).
3. Edit the generated clips using the editor.
4. Click "Export" when finished (this may also take some time).
5. The final video will be saved in: backend/result.mp4.

EDITOR FEATURES

After clips are generated, you can freely modify your video.

Available Features

  • Video Effects: Apply effects to individual clips or all clips.
  • Add New Clip: Create clips with custom image descriptions.
  • Change Clip Start Time: Adjust start time in seconds:milliseconds format.
  • Show Other Image Options: Browse alternative images and apply the one you like.
  • Generate New Image: Enter a new search query to get another image.
  • Enter Image URL: Replace the current clip image with a custom link.
  • Go to Clip in Audio: Play the audio corresponding to a clip.
  • Enter YouTube Link: Download a YouTube video and trim it based on timestamps; the trimmed video lasts until the next clip starts.
  • Search YouTube Video: Opens a YouTube search with the LLM-generated description for more accurate results.
  • Remove Clip
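
Two of the features above involve simple timing arithmetic: the seconds:milliseconds start-time format and the rule that a clip lasts until the next one starts. The sketch below shows one way to express both. It is an illustration under assumptions, not the editor's implementation: the names parse_start_time and clip_durations are hypothetical, "12:500" is assumed to mean 12 seconds and 500 milliseconds, and the last clip is assumed to run to the end of the audio.

```python
from typing import List


def parse_start_time(text: str) -> float:
    """Convert 'seconds:milliseconds' (e.g. '12:500') to seconds as a float."""
    seconds_part, millis_part = text.strip().split(":")
    seconds, millis = int(seconds_part), int(millis_part)
    if seconds < 0 or not 0 <= millis < 1000:
        raise ValueError(f"invalid start time: {text!r}")
    return seconds + millis / 1000


def clip_durations(starts: List[float], audio_length: float) -> List[float]:
    """Each clip lasts until the next clip starts; the last runs to the end of the audio."""
    ordered = sorted(starts)
    ends = ordered[1:] + [audio_length]
    return [end - start for start, end in zip(ordered, ends)]


if __name__ == "__main__":
    starts = [parse_start_time(t) for t in ("0:000", "2:500", "6:000")]
    print(clip_durations(starts, audio_length=10.0))
```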

CONFIGURATION

Configuration file: backend/config.json
Available options:

  • PROJECT_FPS (default: 30)
  • OLLAMA_MODEL (default: gemma3:12b) – Handles all text processing; switching to a smaller model is not recommended.
  • WHISPER_MODEL (default: base) – See the Whisper documentation for the available models and languages.
  • WHISPER_LANGUAGE (default: "es") – Language code of the audio; "es" is Spanish.
  • CLIPS_PER_SENTENCE (default: 1)
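
As an illustration of how these options fit together, the sketch below reads backend/config.json and falls back to the defaults listed above for any key the file omits. How the server actually loads its configuration is not described here, so treat this as an assumption; load_config and DEFAULTS are hypothetical names.

```python
import json
from pathlib import Path

# Defaults as documented in this section.
DEFAULTS = {
    "PROJECT_FPS": 30,
    "OLLAMA_MODEL": "gemma3:12b",
    "WHISPER_MODEL": "base",
    "WHISPER_LANGUAGE": "es",
    "CLIPS_PER_SENTENCE": 1,
}


def load_config(path: str = "backend/config.json") -> dict:
    """Merge the values in the JSON file over the documented defaults."""
    config = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        config.update(json.loads(file.read_text(encoding="utf-8")))
    return config


if __name__ == "__main__":
    for key, value in load_config().items():
        print(f"{key} = {value!r}")
```

For English audio, for example, a config.json containing only {"WHISPER_LANGUAGE": "en"} would leave the other four options at their defaults under this scheme.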

CONTRIBUTIONS

Contributions are highly appreciated. You can help with:

  • Adding new MoviePy effects
  • Improving documentation and overall clarity
  • Design enhancements for the web UI
  • Any relevant code modifications
