Automated Slicing: a project that automatically slices recorded livestream videos using ASR + an LLM Agent.
- One day, while watching a livestream, on a whim you start daydreaming. You imagine becoming one of those "super slicers" who make video clips for the streamer or VTuber you adore. What should you do?
- You have the intention, but you don't have the time to edit. What's more, you haven't even learned how to edit. What should you do?
- You think it over, gather your courage, obtain the recorded broadcast, and prepare to edit. Then you realize this is only the starting point.🙂
- Screening clips, aligning the timeline ⌚️, coming up with a fitting title 🙋, it's all too complicated. 😩 And if there's a day when you didn't watch the stream and have to sit through the entire recording before slicing, what should you do? 😭
- You say your love is strong and you're truly determined, but it still takes a lot of manual work. So, is there a simple yet powerful way to slice videos?
- Yes, there is, bro.
- Take a look at "auto-slicing". You take care of the recorded broadcast, and it takes care of the slicing.
- It is still at a fairly basic stage and awaits multiple rounds of optimization.
Prerequisites:
- GPU memory (VRAM) >= 8 GB. I haven't tried with less.
- RAM >= 32 GB.
Python version: in theory, Python 3.11 - 3.12 should work. There were some problems with 3.13 earlier, and it is unclear whether they have been fixed.
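As a quick sanity check before installing dependencies, a small helper like the following can verify the interpreter version. This is an illustrative sketch, not part of the project; the 3.11-3.12 range comes from the note above.

```python
import sys

def version_supported(version=None):
    """Return True if the Python version is in the recommended 3.11-3.12 range."""
    major, minor = (version or sys.version_info)[:2]
    # 3.13 had known issues at the time of writing, so it is excluded here.
    return (major, minor) in ((3, 11), (3, 12))

if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} "
          f"supported: {version_supported()}")
```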
- It is recommended to install with uv:

```shell
uv pip install -r requirements.txt
```

- Or install directly with pip:

```shell
pip install -r requirements.txt
```

Note: If you hit a pillow-related error during installation, you can install the system libraries it builds against:

```shell
sudo apt-get install -y libjpeg-dev zlib1g-dev
```

The video editing part depends on zakahan/vedit-mcp, which in turn depends on ffmpeg, so please make sure ffmpeg is configured.
```shell
# Ubuntu
sudo apt update
sudo apt install ffmpeg
```

- For audio analysis, including speech recognition and speech-break (VAD) detection, please refer to:
- https://github.com/FunAudioLLM/SenseVoice
- modelscope: iic/SenseVoiceSmall, corresponding to SENSE_VOICE_MODEL_PATH
- modelscope: iic/vad_fsmmn, corresponding to SENSE_VOICE_VAD_MODEL_PATH
Note: This part currently only supports local inference, and the API method may be supported in the future.
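Before running anything, it can save debugging time to confirm that the external dependencies above are actually in place. The sketch below is illustrative and not part of the project; it checks that ffmpeg is on PATH and that the two model weight directories (corresponding to the SENSE_VOICE_MODEL_PATH and SENSE_VOICE_VAD_MODEL_PATH settings) exist.

```python
import os
import shutil

def preflight(sense_voice_path, vad_model_path):
    """Report whether ffmpeg and the local model weight directories are present."""
    checks = {
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        "SenseVoice weights": os.path.isdir(sense_voice_path),
        "VAD weights": os.path.isdir(vad_model_path),
    }
    for name, ok in checks.items():
        print(f"[{'ok' if ok else 'MISSING'}] {name}")
    # True only if every dependency was found.
    return all(checks.values())
```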
```shell
cd auto-slicing/src
cp .env.example .env
```

Edit the .env file and adjust the configuration according to your actual setup.
Note: Currently, this script uses the API of the Volcano Ark Platform, so both API_BASE and API_KEY are from this platform.
- OPENAI_API_BASE: currently the api-base of the Volcano Ark Platform.
- OPENAI_API_KEY: it is recommended to supply this via an environment variable to prevent leakage; of course, it can also be configured directly here.
- OPENAI_MODEL and OPENAI_MODEL_THINKING: model names; adjust them according to your actual situation.
- SENSE_VOICE_LOCAL_MODEL_PATH: change this to the path of the SenseVoice model weights you downloaded.
- SENSE_VOICE_LOCAL_VAD_MODEL_PATH: change this to the path of the VAD model weights you downloaded.
- KB_BASE_PATH: the base path for slice processing; all files are resolved relative to this path.
Note: Absolute paths are recommended for the above paths.
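For reference, a .env file of simple KEY=VALUE lines can be loaded with nothing but the standard library. This sketch assumes that plain format; the actual project may use a dotenv library instead, and the function name here is illustrative.

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines into os.environ, skipping comments and blanks.

    Existing environment variables win (setdefault), so secrets like
    OPENAI_API_KEY can stay out of the file and come from the shell instead.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```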
(Choose either 2.1 or 2.2)
Please modify the query section in src/main.py according to your requirements.
Note: raw_video must be a path relative to KB_BASE_PATH. This design reduces the chance of path errors when the large model is invoked.
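To illustrate why keeping raw_video relative to KB_BASE_PATH helps, a guard like the following (illustrative, not from the project) can verify that a supplied path still resolves to somewhere inside the base directory:

```python
import os

def resolve_in_kb(kb_base_path, raw_video):
    """Resolve raw_video relative to kb_base_path and refuse paths that escape it."""
    base = os.path.abspath(kb_base_path)
    full = os.path.abspath(os.path.join(base, raw_video))
    # commonpath catches tricks like "../outside.mp4" escaping the base dir.
    if os.path.commonpath([base, full]) != base:
        raise ValueError(f"raw_video escapes KB_BASE_PATH: {raw_video}")
    return full
```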
```shell
cd src
python main.py
```

Or use the startup script:

```shell
bash start_up.sh
```

Then you can access the Web UI.
The overall architecture diagram is as follows.
For the specific implementation, you can directly refer to the src/processor part of the code. This is the entry point of each module, and the overall idea is very clear in the diagram.
- Add prompt switching to support opening and closing titles.
- Implement support for the ASR API to break away from local inference limitations.
- Expand vedit-mcp. Currently it only supports basic editing functions, and more are needed.
- Add the function of adding subtitles.
- Add the API calling method for speech recognition.
- Add the function of generating covers. First, create a simple version.
- Consider support for song live streams.
- Consider using speaker separation to support scenarios where the audio is not a single voice, such as game livestreams and watch-along streams.
- 2025-05-18: could not resolve a bug in streamlit's file_uploader, so switched to a gradio implementation.
- 2025-05-08, implemented a simple webui interface using streamlit.
