
Commit 4afda49

feat: auto slice videos (#151)
* feat: auto slice videos fix #150
* refactor: add copyright
* docs: update readme
* refactor: adjust icons
* style: adjust style
* style: substitute icons
1 parent bcb0aeb commit 4afda49

File tree

11 files changed (+220 / -20 lines)


README.md

Lines changed: 22 additions & 4 deletions
@@ -4,12 +4,18 @@
   <img src="assets/headerLight.svg" alt="BILIVE" />
 </picture>

-*7 x 24 hours of unattended recording, danmaku rendering, subtitle recognition, and automatic uploading. Start the project and anyone can be a stream recorder.*
+*7 x 24 hours of unattended recording, danmaku rendering, subtitle recognition, automatic slicing, and automatic uploading. Start the project and anyone can be a stream recorder.*

 [:page_facing_up: Documentation](https://timerring.github.io/bilive/) |
 [:gear: Installation](#quick-start) |
 [:thinking: Reporting Issues](https://github.com/timerring/bilive/issues/new/choose)

+Supported models
+
+<img src="assets/openai.svg" alt="OpenAI whisper" width="60" height="60" />
+<img src="assets/zhipu-color.svg" alt="Zhipu GLM-4V-PLUS" width="60" height="60" />
+<img src="assets/gemini-brand-color.svg" alt="Google Gemini 1.5 Pro" width="60" height="60" />
+
 </div>

 ## 1. Introduction
@@ -29,6 +35,7 @@
 - **Automatic danmaku rendering**: automatically converts the xml into an ass danmaku file and renders it into the video to produce a **danmaku version** of the video, which is uploaded automatically.
 - **Very low hardware requirements**: no GPU is needed; the most basic single-core CPU with minimal RAM can handle the entire recording, danmaku rendering, and uploading workflow. There is no minimum configuration requirement, and a computer or server from ten years ago still works!
 - **( :tada: NEW) Automatic subtitle rendering** (this feature requires an Nvidia GPU): uses OpenAI's open-source [`whisper`](https://github.com/openai/whisper) model to automatically recognize the speech in the video and render it into the video as subtitles.
+- **( :tada: NEW) Automatic slicing and uploading**: finds high-energy segments by computing danmaku density and slices them, then uses the multimodal video-understanding model [`GLM-4V-PLUS`](https://bigmodel.cn/dev/api/normal-model/glm-4) to automatically generate an engaging title and description for the slice, which is uploaded automatically.

 The project architecture and workflow are as follows:

@@ -46,8 +53,13 @@ graph TD
     ifDanmaku -->|有弹幕| DanmakuFactory[DanmakuFactory]
     ifDanmaku -->|无弹幕| ffmpeg1[ffmpeg]
     DanmakuFactory[DanmakuFactory] --根据分辨率转换弹幕--> ffmpeg1[ffmpeg]
+    ffmpeg1[ffmpeg] --渲染弹幕及字幕 --> Video[视频文件]
+    Video[视频文件] --计算弹幕密度并切片--> GLM[多模态视频理解模型]
+    GLM[多模态视频理解模型] --生成切片信息--> slice[视频切片]
     end
-    ffmpeg1[ffmpeg] --渲染弹幕及字幕 --> uploadQueue[(上传队列)]
+
+    slice[视频切片] --> uploadQueue[(上传队列)]
+    Video[视频文件] --> uploadQueue[(上传队列)]

     User((用户))--upload-->startUpload(启动视频上传进程)
     startUpload(启动视频上传进程) <--扫描队列并上传视频--> uploadQueue[(上传队列)]
@@ -110,15 +122,21 @@ pip install -r requirements.txt
 ./setPath.sh && source ~/.bashrc
 ```

-#### 3. Configure the whisper model
+#### 3. Configure the whisper and GLM-4V-PLUS models

+##### 3.1 The whisper model
 The project uses the [`small`](https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt) model by default; download the required file and place it in the `src/subtitle/models` folder.

 > [!TIP]
 > Using a model of this size requires a GPU with more than 2.7 GB of VRAM; otherwise, use a model with a different parameter count.
 > + For more models, see the [whisper models](https://timerring.github.io/bilive/models.html) section.
 > + For how to switch models, see the [switching models](https://timerring.github.io/bilive/models.html#更换模型方法) section.

+##### 3.2 The GLM-4V-PLUS model
+
+> This feature is disabled by default. To enable it, set the `AUTO_SLICE` parameter in `src/config.py` to `True`.
+
+In `src/config.py`, `SLICE_DURATION` sets the slice length in seconds (keeping it under one minute is recommended). The automatic slicing feature relies on Zhipu's [`GLM-4V-PLUS`](https://bigmodel.cn/dev/api/normal-model/glm-4) model, so please [register an account](https://www.bigmodel.cn/invite?icode=shBtZUfNE6FfdMH1R6NybGczbXFgPRGIalpycrEwJ28%3D), apply for an API Key, and fill it into the corresponding `Your_API_KEY` field in `src/config.py`.

 #### 4. biliup-rs login

@@ -176,7 +194,7 @@ logs # log folder
 ```

 ### Installation (no-GPU version)
-The no-GPU setup is largely the same as above, and step 3 can be skipped. Note that completing the following settings **before** running step 5 ensures that the video danmaku is rendered entirely on the CPU.
+The no-GPU setup is largely the same as above, and the whisper part of step 3 can be skipped. Note that completing the following settings **before** running step 5 ensures that the video danmaku is rendered entirely on the CPU.

 1. Set the `GPU_EXIST` parameter in `src/config.py` to `False`. (If it is not set to `False`, CPU inference will be used, which is not recommended; you can try it depending on your hardware.)
 2. Set `MODEL_TYPE` to `merge` or `append`.
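For quick reference, all of the settings the installation steps above point at live in `src/config.py` (its full diff appears at the bottom of this commit). Below is a minimal sketch of an auto-slice-enabled configuration together with the two options the no-GPU install notes mention; the option names come from this commit, the values are purely illustrative:

```python
# src/config.py -- sketch only, illustrative values
AUTO_SLICE = True        # enable automatic slicing (ships disabled)
SLICE_DURATION = 30      # slice length in seconds; the README advises keeping it under a minute
Your_API_KEY = "xxxx"    # Zhipu GLM-4V-PLUS API key obtained from bigmodel.cn

GPU_EXIST = False        # no-GPU install: render danmaku entirely on the CPU
MODEL_TYPE = "merge"     # or "append", per the no-GPU installation notes
```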

assets/gemini-brand-color.svg

Lines changed: 1 addition & 0 deletions

assets/openai.svg

Lines changed: 1 addition & 0 deletions

assets/zhipu-color.svg

Lines changed: 1 addition & 0 deletions

src/autoslice/__init__.py

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Copyright (c) 2024 bilive.
+
+import sys
+import os
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

src/autoslice/calculate_density.py

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+# Copyright (c) 2024 bilive.
+
+import re
+from collections import defaultdict
+from src.config import SLICE_DURATION
+
+def parse_time(time_str):
+    """Convert ASS time format to seconds with milliseconds."""
+    h, m, s = time_str.split(':')
+    s, ms = s.split('.')
+    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 100
+
+def format_time(seconds):
+    """Format seconds to hh:mm:ss.xx."""
+    h = int(seconds // 3600)
+    m = int((seconds % 3600) // 60)
+    s = int(seconds % 60)
+    ms = int((seconds - int(seconds)) * 100)
+    return f"{h:02}:{m:02}:{s:02}.{ms:02}"
+
+def extract_dialogues(file_path):
+    """Extract dialogue start times from the ASS file."""
+    dialogues = []
+    with open(file_path, 'r', encoding='utf-8') as file:
+        for line in file:
+            if line.startswith('Dialogue:'):
+                parts = line.split(',')
+                start_time = parse_time(parts[1].strip())
+                dialogues.append(start_time)
+    return dialogues
+
+def calculate_density(dialogues, window_size=SLICE_DURATION):
+    """Calculate the maximum density of dialogues in a given window size."""
+    time_counts = defaultdict(int)
+    for time in dialogues:
+        time_counts[time] += 1
+
+    max_density = 0
+    max_start_time = 0
+
+    # Use a sliding window to calculate density
+    sorted_times = sorted(time_counts.keys())
+    for i in range(len(sorted_times)):
+        start_time = sorted_times[i]
+        end_time = start_time + window_size
+        current_density = sum(count for time, count in time_counts.items() if start_time <= time < end_time)
+        if current_density > max_density:
+            max_density = current_density
+            max_start_time = start_time
+
+    return max_start_time, max_density
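Taken together, these helpers locate the densest danmaku window of a rendered recording. A small usage sketch, assuming a hypothetical `.ass` path and that the project root is on `sys.path` so `src.config` resolves:

```python
from src.autoslice.calculate_density import extract_dialogues, calculate_density, format_time

ass_path = "recording.ass"                     # hypothetical danmaku subtitle file
dialogues = extract_dialogues(ass_path)        # start time (in seconds) of every Dialogue line
start, density = calculate_density(dialogues)  # densest SLICE_DURATION-second window
print(f"densest window starts at {format_time(start)} with {density} danmakus")
```

Since `calculate_density` compares every distinct start time against the full timestamp map, the search is quadratic in the number of distinct timestamps, which is acceptable for a single recording's danmaku file.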

src/autoslice/slice_video.py

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+# Copyright (c) 2024 bilive.
+
+import subprocess
+from src.autoslice.calculate_density import extract_dialogues, calculate_density, format_time
+from src.config import Your_API_KEY, SLICE_DURATION
+import base64
+from zhipuai import ZhipuAI
+
+def zhipu_glm_4v_plus_generate_title(video_path, artist):
+    with open(video_path, 'rb') as video_file:
+        video_base = base64.b64encode(video_file.read()).decode('utf-8')
+
+    client = ZhipuAI(api_key=Your_API_KEY)
+    response = client.chat.completions.create(
+        model="glm-4v-plus",
+        messages=[
+            {
+                "role": "user",
+                "content": [
+                    {
+                        "type": "video_url",
+                        "video_url": {
+                            "url" : video_base
+                        }
+                    },
+                    {
+                        "type": "text",
+                        "text": f"视频是{artist}的直播的切片,请根据该视频中的内容及弹幕信息,为这段视频起一个调皮并且吸引眼球的标题,注意标题中如果有“主播”请替换成{artist}。"
+                    }
+                ]
+            }
+        ]
+    )
+    return response.choices[0].message.content.replace("《", "").replace("》", "")
+
+# https://stackoverflow.com/questions/64849478/cant-insert-stream-metadata-into-mp4
+def inject_metadata(video_path, generate_title, output_path):
+    """Inject the generated title into the output container's metadata using ffmpeg."""
+    command = [
+        'ffmpeg',
+        '-i', video_path,
+        '-metadata:g', f'generate={generate_title}',
+        '-c:v', 'copy',
+        '-c:a', 'copy',
+        output_path
+    ]
+    subprocess.run(command)
+
+def slice_video(video_path, start_time, output_path, duration=f'00:00:{SLICE_DURATION}'):
+    """Slice the video using ffmpeg."""
+    command = [
+        'ffmpeg',
+        '-ss', format_time(start_time),
+        '-i', video_path,
+        '-t', duration,
+        '-map_metadata', '-1',
+        '-c:v', 'copy',
+        '-c:a', 'copy',
+        output_path
+    ]
+    subprocess.run(command)
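A sketch of how these functions chain together, mirroring the integration in `src/burn/only_render.py` below; the paths and streamer name are hypothetical, and a valid `Your_API_KEY` must already be set in `src/config.py`:

```python
from src.autoslice.calculate_density import extract_dialogues, calculate_density
from src.autoslice.slice_video import slice_video, zhipu_glm_4v_plus_generate_title, inject_metadata

ass_path = "recording.ass"      # hypothetical danmaku file of the recording
video_path = "recording.mp4"    # hypothetical rendered video

start, _ = calculate_density(extract_dialogues(ass_path))
slice_video(video_path, start, "recording_slice.mp4")                             # stream-copy the densest window
title = zhipu_glm_4v_plus_generate_title("recording_slice.mp4", "somestreamer")   # ask GLM-4V-PLUS for a title
inject_metadata("recording_slice.mp4", title, "recording_slice.flv")              # remux with the generated title
```

Note that `slice_video` drops the source metadata (`-map_metadata -1`) and copies both streams without re-encoding, while `inject_metadata` remuxes into a new container so the generated title survives, following the Stack Overflow workaround linked in the source.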

src/burn/only_render.py

Lines changed: 21 additions & 2 deletions
@@ -3,10 +3,13 @@
 import argparse
 import os
 import subprocess
-from src.config import GPU_EXIST, SRC_DIR, MODEL_TYPE
+from src.config import GPU_EXIST, SRC_DIR, MODEL_TYPE, AUTO_SLICE, SLICE_DURATION
 from src.burn.generate_danmakus import get_resolution, process_danmakus
 from src.burn.generate_subtitles import generate_subtitles
 from src.burn.render_video import render_video
+from src.autoslice.slice_video import slice_video, inject_metadata, zhipu_glm_4v_plus_generate_title
+from src.autoslice.calculate_density import extract_dialogues, calculate_density, format_time
+from src.upload.extract_video_info import get_video_info
 import queue
 import threading
 import time
@@ -52,7 +55,20 @@ def render_video_only(video_path):
     render_video(original_video_path, format_video_path, subtitle_font_size, subtitle_margin_v)
     print("complete danamku burning and wait for uploading!", flush=True)

-    # # Delete relative files
+    if AUTO_SLICE:
+        title, artist, date = get_video_info(format_video_path)
+        slice_video_path = format_video_path[:-4] + '_slice.mp4'
+        dialogues = extract_dialogues(ass_path)
+        max_start_time, max_density = calculate_density(dialogues)
+        formatted_time = format_time(max_start_time)
+        print(f"The {SLICE_DURATION}-second window with the highest density starts at {formatted_time} with {max_density} danmakus.", flush=True)
+        slice_video(format_video_path, max_start_time, slice_video_path)
+        glm_title = zhipu_glm_4v_plus_generate_title(slice_video_path, artist)
+        slice_video_flv_path = slice_video_path[:-4] + '.flv'
+        inject_metadata(slice_video_path, glm_title, slice_video_flv_path)
+        os.remove(slice_video_path)
+
+    # Delete relative files
     for remove_path in [original_video_path, xml_path, ass_path, srt_path, jsonl_path]:
         if os.path.exists(remove_path):
             os.remove(remove_path)
@@ -63,6 +79,9 @@ def render_video_only(video_path):

     with open(f"{SRC_DIR}/upload/uploadVideoQueue.txt", "a") as file:
         file.write(f"{format_video_path}\n")
+        if AUTO_SLICE:
+            print("complete slice video and wait for uploading!", flush=True)
+            file.write(f"{slice_video_flv_path}\n")

 class VideoRenderQueue:
     def __init__(self):

src/config.py

Lines changed: 4 additions & 1 deletion
@@ -10,7 +10,10 @@
 # Can be pipeline, append, merge
 MODEL_TYPE = "pipeline"
 Inference_Model = "small"
-
+AUTO_SLICE = False
+SLICE_DURATION = 30
+# Apply for your own GLM-4v-Plus API key at https://www.bigmodel.cn/invite?icode=shBtZUfNE6FfdMH1R6NybGczbXFgPRGIalpycrEwJ28%3D
+Your_API_KEY = ""
 # ============================ Basic configuration ============================
 SRC_DIR = str(Path(os.path.abspath(__file__)).parent)
 BILIVE_DIR = str(Path(SRC_DIR).parent)
