feat: cover generation (#261)

timerring · web-flow · commit df8de7a77ed5 · 2025-04-05T16:41:24.000+08:00
* feat: cover generation
* docs: update docs
* docs: update icons
diff --git a/README.md b/README.md
@@ -16,6 +16,8 @@
   <img src="assets/zhipu-color.svg" alt="Zhipu GLM-4V-PLUS" width="60" height="60" />
   <img src="assets/gemini-brand-color.svg" alt="Google Gemini 1.5 Pro" width="60" height="60" />
   <img src="assets/qwen-color.svg" alt="Qwen-2.5-72B-Instruct" width="60" height="60" />
+  <img src="assets/minimax-color.svg" alt="Minimax" width="20" height="60" />
+  <img src="assets/minimax-text.svg" alt="Minimax" width="60" height="60" />
 
 </div>
 
@@ -41,6 +43,8 @@
   - `Qwen-2.5-72B-Instruct`
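 - **( :tada: NEW) Persistent login / video download / video upload (multi-part submissions supported)**: [bilitool](https://github.com/timerring/bilitool) is now open source. It provides persistent login, video and danmaku download (including multi-part videos), video upload (with multi-part submission), submission status queries, detailed information queries, and more. It installs with a single pip command and can be used from the command line or called as an API.
 - **( :tada: NEW) Automatic looping multi-platform live streaming**: [looplive](https://github.com/timerring/looplive) is now open source. It is a fully automatic 24/7 tool that **streams in a loop to multiple platforms simultaneously**.
+- **( :tada: NEW) Automatic style-transferred video covers**: uses an image-to-image multimodal model to capture a video screenshot automatically and upload the style-transferred result as the video cover.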
+  - `Minimax image-01`
 
 The project architecture and workflow are as follows:
 
@@ -144,11 +148,11 @@ pip install -r requirements.txt
 
 ##### 3.1.1 Using the API
 
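-Set the `ASR_METHOD` parameter in `src/config.py` to `api`, then set `WHISPER_API_KEY` to your [API Key](https://console.groq.com/keys). This project uses the `whisper-large-v3-turbo` model on groq's free tier, which limits uploads to 40 MB (about half an hour), so if you use API-based recognition, set the recording segment length to 30 minutes. The free tier is also rate-limited to 7200 seconds / 20 requests per hour and 28800 seconds / 2000 requests per day. If you need more, you can upgrade to the dev tier; see the [groq website](https://console.groq.com/docs/rate-limits) for details.
+Set the `ASR_METHOD` parameter in `settings.toml` to `api`, then set `WHISPER_API_KEY` to your [API Key](https://console.groq.com/keys). This project uses the `whisper-large-v3-turbo` model on groq's free tier, which limits uploads to 40 MB (about half an hour), so if you use API-based recognition, set the recording segment length to 30 minutes. The free tier is also rate-limited to 7200 seconds / 20 requests per hour and 28800 seconds / 2000 requests per day. If you need more, you can upgrade to the dev tier; see the [groq website](https://console.groq.com/docs/rate-limits) for details.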
 
 ##### 3.1.2 Using local deployment (requires an NVIDIA GPU)
 
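-Set the `ASR_METHOD` parameter in `src/config.py` to `deploy`, then download the required model file and place it in the `src/subtitle/models` folder.
+Set the `ASR_METHOD` parameter in `settings.toml` to `deploy`, then download the required model file and place it in the `src/subtitle/models` folder.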
 
 The project uses the [`small`](https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt) model by default. Click the link to download the file and place it in the `src/subtitle/models` folder.
 
@@ -160,7 +164,7 @@ pip install -r requirements.txt
 
 ##### 3.2 MLLM models
 
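-The MLLM model is mainly used to generate titles for clips produced by automatic slicing. This feature is disabled by default; to enable it, set the `AUTO_SLICE` parameter in `src/config.py` to `True`. The other settings are:
+The MLLM model is mainly used to generate titles for clips produced by automatic slicing. This feature is disabled by default; to enable it, set the `AUTO_SLICE` parameter in `settings.toml` to `True`. The other settings are:
 - `SLICE_DURATION` sets the clip duration in seconds (more than 60 seconds is not recommended).
 - `SLICE_NUM` sets the number of clips.
 - `SLICE_OVERLAP` sets the overlap between clips. Slicing uses a sliding-window method; see [auto-slice-video](https://github.com/timerring/auto-slice-video) for details.
@@ -169,21 +173,27 @@ The MLLM model is mainly used to generate titles for clips produced by automatic slicing. This feature is disabled by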
 
 ##### 3.2.1 GLM-4V-PLUS model
 
-> To use the GLM-4V-PLUS model, set the `MLLM_MODEL` parameter in `src/config.py` to `zhipu`.
+> To use the GLM-4V-PLUS model, set the `MLLM_MODEL` parameter in `settings.toml` to `zhipu`.
 
-The project's automatic slicing feature uses Zhipu's [`GLM-4V-PLUS`](https://bigmodel.cn/dev/api/normal-model/glm-4) model. [Register an account](https://www.bigmodel.cn/invite?icode=shBtZUfNE6FfdMH1R6NybGczbXFgPRGIalpycrEwJ28%3D) and apply for an API key, then enter it as `ZHIPU_API_KEY` in `src/config.py`.
+The project's automatic slicing feature uses Zhipu's [`GLM-4V-PLUS`](https://bigmodel.cn/dev/api/normal-model/glm-4) model. [Register an account](https://www.bigmodel.cn/invite?icode=shBtZUfNE6FfdMH1R6NybGczbXFgPRGIalpycrEwJ28%3D) and apply for an API key, then enter it as `ZHIPU_API_KEY` in `settings.toml`.
 
 ##### 3.2.2 Gemini model
 
-> To use the Gemini-2.0-flash model, set the `MLLM_MODEL` parameter in `src/config.py` to `gemini`.
+> To use the Gemini-2.0-flash model, set the `MLLM_MODEL` parameter in `settings.toml` to `gemini`.
 
-The project's automatic slicing feature uses the Gemini-2.0-flash model. [Register an account](https://aistudio.google.com/app/apikey) and apply for an API key, then enter it as `GEMINI_API_KEY` in `src/config.py`.
+The project's automatic slicing feature uses the Gemini-2.0-flash model. [Register an account](https://aistudio.google.com/app/apikey) and apply for an API key, then enter it as `GEMINI_API_KEY` in `settings.toml`.
 
 ##### 3.2.3 Qwen model
 
-> To use the Qwen-2.5-72B-Instruct model, set the `MLLM_MODEL` parameter in `src/config.py` to `qwen`.
+> To use the Qwen-2.5-72B-Instruct model, set the `MLLM_MODEL` parameter in `settings.toml` to `qwen`.
 
-The project's automatic slicing feature uses the Qwen-2.5-72B-Instruct model. [Register an account](https://bailian.console.aliyun.com/?apiKey=1) and apply for an API key, then enter it as `QWEN_API_KEY` in `src/config.py`.
+The project's automatic slicing feature uses the Qwen-2.5-72B-Instruct model. [Register an account](https://bailian.console.aliyun.com/?apiKey=1) and apply for an API key, then enter it as `QWEN_API_KEY` in `settings.toml`.
+
+##### 3.2.4 Minimax model
+
+> To use the Minimax model, set `generate_cover` to `true` and `image_gen_model` to `minimax` in the `[cover]` section of `settings.toml`.
+
+The project's cover generation feature uses the Minimax `image-01` model. [Register an account](https://www.minimax.chat/) and apply for an API key, then enter it as `minimax_api_key` in `settings.toml`.
 
 #### 4. bilitool login
 
@@ -248,7 +258,7 @@ logs # log folder
 #### 8. Configure upload parameters
 
 > [!TIP]
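-> The default upload parameters are listed below; everything inside [] is replaced automatically. You can customize these settings in `src/config.py`. The placeholder keywords are `{artist}`, `{date}`, `{title}`, and `{source_link}`, which you can combine or omit to build your own templates:
+> The default upload parameters are listed below; everything inside [] is replaced automatically. You can customize these settings in `settings.toml`. The placeholder keywords are `{artist}`, `{date}`, `{title}`, and `{source_link}`, which you can combine or omit to build your own templates:
 > + The title template is `{artist}直播回放-{date}-{title}`, which renders as "【弹幕+字幕】[XXX]直播回放-[date]-[live room title]". In English, the template reads roughly "{artist} live replay-{date}-{title}", and the rendered prefix means "danmaku + subtitles". You can modify it.
 > + The description template is `{artist}直播，直播间地址：{source_link} 内容仅供娱乐，直播中主播的言论、观点和行为均由主播本人负责，不代表录播员的观点或立场。`, which renders as "【弹幕+字幕】[XXX]直播，直播间地址：[https://live.bilibili.com/XXX] 内容仅供娱乐，直播中主播的言论、观点和行为均由主播本人负责，不代表录播员的观点或立场。". In English, the template reads roughly: "{artist} live stream, room address: {source_link}. The content is for entertainment only; the streamer's statements, opinions, and actions during the stream are the streamer's own responsibility and do not represent the views or position of the recorder." You can modify it.
 > + The default tags are trending search terms scraped automatically from Bilibili's search suggestions based on the streamer's name.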
diff --git a/assets/minimax-color.svg b/assets/minimax-color.svg
@@ -0,0 +1 @@
+<svg height="1em" style="flex:none;line-height:1" viewBox="0 0 24 24" width="1em" xmlns="http://www.w3.org/2000/svg"><title>Minimax</title><defs><linearGradient id="lobe-icons-minimax-fill" x1="0%" x2="100.182%" y1="50.057%" y2="50.057%"><stop offset="0%" stop-color="#E2167E"></stop><stop offset="100%" stop-color="#FE603C"></stop></linearGradient></defs><path d="M16.278 2c1.156 0 2.093.927 2.093 2.07v12.501a.74.74 0 00.744.709.74.74 0 00.743-.709V9.099a2.06 2.06 0 012.071-2.049A2.06 2.06 0 0124 9.1v6.561a.649.649 0 01-.652.645.649.649 0 01-.653-.645V9.1a.762.762 0 00-.766-.758.762.762 0 00-.766.758v7.472a2.037 2.037 0 01-2.048 2.026 2.037 2.037 0 01-2.048-2.026v-12.5a.785.785 0 00-.788-.753.785.785 0 00-.789.752l-.001 15.904A2.037 2.037 0 0113.441 22a2.037 2.037 0 01-2.048-2.026V18.04c0-.356.292-.645.652-.645.36 0 .652.289.652.645v1.934c0 .263.142.506.372.638.23.131.514.131.744 0a.734.734 0 00.372-.638V4.07c0-1.143.937-2.07 2.093-2.07zm-5.674 0c1.156 0 2.093.927 2.093 2.07v11.523a.648.648 0 01-.652.645.648.648 0 01-.652-.645V4.07a.785.785 0 00-.789-.78.785.785 0 00-.789.78v14.013a2.06 2.06 0 01-2.07 2.048 2.06 2.06 0 01-2.071-2.048V9.1a.762.762 0 00-.766-.758.762.762 0 00-.766.758v3.8a2.06 2.06 0 01-2.071 2.049A2.06 2.06 0 010 12.9v-1.378c0-.357.292-.646.652-.646.36 0 .653.29.653.646V12.9c0 .418.343.757.766.757s.766-.339.766-.757V9.099a2.06 2.06 0 012.07-2.048 2.06 2.06 0 012.071 2.048v8.984c0 .419.343.758.767.758.423 0 .766-.339.766-.758V4.07c0-1.143.937-2.07 2.093-2.07z" fill="url(#lobe-icons-minimax-fill)" fill-rule="nonzero"></path></svg>
diff --git a/assets/minimax-text.svg b/assets/minimax-text.svg
@@ -0,0 +1 @@
+<svg fill="currentColor" fill-rule="evenodd" height="1em" style="flex:none;line-height:1" viewBox="0 0 114 24" xmlns="http://www.w3.org/2000/svg"><title>Minimax</title><path d="M2 22V2h4.23l4.41 6.828L14.93 2h3.988v20l-3.987-.06V9.25l-3.263 4.714H9.674L6.23 9.25V22H2zM26.471 2h-3.867v20h3.867V2zm3.565 0h4.049l7.492 12.387V2h3.988v20h-3.988L34.085 9.734V21.94h-4.049V2zm23.082 0h-4.109v20h4.109V2zm3.504 0v20h4.23V9.25l3.444 4.714h1.994l3.263-4.713V21.94l3.988.06V2h-3.988l-4.29 6.828L60.852 2h-4.23zm19.457 20l6.344-20h5.076l6.404 20h-4.471l-.89-3.021h-7.139L80.49 22h-4.411zm6.369-6.405h5.078l-2.505-8.338-2.573 8.338zM111.97 2h-4.774l-3.619 6.082L99.885 2h-4.592l5.961 9.985L95.294 22h4.591l3.698-6.113 3.613 6.053h4.774l-6.025-9.956L111.97 2z"></path></svg>
diff --git a/settings.toml b/settings.toml
@@ -38,6 +38,11 @@ zhipu_api_key = "" # Apply for your own GLM-4v-Plus API key at https://www.bigmo
 gemini_api_key = "" # Apply for your own Gemini API key at https://aistudio.google.com/app/apikey
 qwen_api_key = "" # Apply for your own Qwen API key at https://bailian.console.aliyun.com/?apiKey=1
 
+[cover]
+generate_cover = false # whether to generate cover
+image_gen_model = "minimax" # the image generation model; currently only "minimax" is supported
+minimax_api_key = "" # Apply for your own Minimax API key at https://platform.minimaxi.com/user-center/basic-information/interface-key
+
 # blrec Settings
 [[tasks]]
 room_id = 173551
diff --git a/src/config.py b/src/config.py
@@ -71,3 +71,7 @@ def get_interface_config():
 ZHIPU_API_KEY = config.get('slice', {}).get('zhipu_api_key')
 GEMINI_API_KEY = config.get('slice', {}).get('gemini_api_key')
 QWEN_API_KEY = config.get('slice', {}).get('qwen_api_key')
+
+GENERATE_COVER = config.get('cover', {}).get('generate_cover')
+IMAGE_GEN_MODEL = config.get('cover', {}).get('image_gen_model')
+MINIMAX_API_KEY = config.get('cover', {}).get('minimax_api_key')
diff --git a/src/cover/__init__.py b/src/cover/__init__.py
diff --git a/src/cover/cover_generator.py b/src/cover/cover_generator.py
@@ -0,0 +1,59 @@
+from functools import wraps
+from src.log.logger import upload_log
+from src.config import IMAGE_GEN_MODEL
+import subprocess
+
+def cut_cover_use_ffmpeg(video_path):
+    """Extract a cover frame from the video using ffmpeg.
+    Args:
+        video_path: str, path to the video file
+    Returns:
+        str: path to the extracted cover image, or None if ffmpeg fails
+    """
+    upload_log.info("begin to generate cover")
+    cover_path = video_path[:-4] + ".jpg"
+    ffmpeg_command = [
+        'ffmpeg', '-y', '-i', video_path, '-t', '1', '-r', '1', cover_path
+    ]
+    try:
+        result = subprocess.run(ffmpeg_command, check=True, capture_output=True, text=True)
+        upload_log.debug(f"FFmpeg output: {result.stdout}")
+        if result.stderr:
+            upload_log.debug(f"FFmpeg debug: {result.stderr}")
+        return cover_path
+    except subprocess.CalledProcessError as e:
+        upload_log.error(f"Error: {e.stderr}")
+        return None
+
+
+def cover_generator(model_type):
+    """Decorator to select cover generation function based on model type
+    Args:
+        model_type: str, type of model to use
+    Returns:
+        function: wrapped cover generation function
+    """
+    def decorator(func):
+        def wrapper(video_path):
+            cover_path = cut_cover_use_ffmpeg(video_path)
+            if cover_path is None:
+                upload_log.error("Failed to generate cover using ffmpeg")
+                return None
+            if model_type == "minimax":
+                from .image_model_sdk.minimax_sdk import minimax_generate_cover
+                return minimax_generate_cover(cover_path)
+            else:
+                upload_log.error(f"Unsupported model type: {model_type}")
+                return None
+        return wrapper
+    return decorator
+
+@cover_generator(IMAGE_GEN_MODEL)
+def generate_cover(video_path):
+    """Generate cover for video
+    Args:
+        video_path: str, path to the video file
+    Returns:
+        str: path to the generated cover image, or None on failure
+    """
+    pass  # The actual implementation is handled by the decorator
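The decorator-based dispatch in `cover_generator.py` can be illustrated with a self-contained sketch. It departs from the diff in two labelled ways: it looks the backend up in a dictionary instead of an `if`/`else` chain, and it applies `functools.wraps` so the wrapper keeps the decorated function's name. `fake_extract_frame`, `fake_minimax`, and `BACKENDS` are hypothetical stand-ins so the example runs without ffmpeg or network access.

```python
from functools import wraps


def fake_extract_frame(video_path):
    """Stand-in for the ffmpeg step: derive a cover path without running ffmpeg."""
    return video_path.rsplit(".", 1)[0] + ".jpg"


def fake_minimax(cover_path):
    """Stand-in for the Minimax backend: pretend a stylized PNG was produced."""
    return cover_path.replace(".jpg", "_styled.png")


# Hypothetical registry mapping model names to backend functions.
BACKENDS = {"minimax": fake_minimax}


def cover_generator(model_type):
    def decorator(func):
        @wraps(func)  # keep func.__name__ and the docstring on the wrapper
        def wrapper(video_path):
            cover_path = fake_extract_frame(video_path)
            backend = BACKENDS.get(model_type)
            if backend is None:
                return None  # unsupported model type
            return backend(cover_path)
        return wrapper
    return decorator


@cover_generator("minimax")
def generate_cover(video_path):
    """Body is never executed; the decorator supplies the behaviour."""


@cover_generator("unknown")
def generate_cover_unknown(video_path):
    """Same placeholder, bound to an unsupported model name."""
```

Calling `generate_cover("clip.flv")` goes through the frame step and then the selected backend, while an unknown model name yields `None`. Preserving `__name__` matters wherever a caller logs the function name.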
diff --git a/src/cover/image_model_sdk/minimax_sdk.py b/src/cover/image_model_sdk/minimax_sdk.py
@@ -0,0 +1,53 @@
+import requests
+import json
+import base64
+import os
+import time
+from src.config import MINIMAX_API_KEY
+
+
+def minimax_generate_cover(your_file_path):
+    """Generate a cover image using the Minimax API
+    Args:
+        your_file_path: str, path to the image file
+    Returns:
+        str: local path of the downloaded cover image, or None on failure
+    """
+    cover_name = time.strftime("%Y%m%d%H%M%S") + ".png"
+    temp_cover_path = os.path.join(os.path.dirname(your_file_path), cover_name)
+
+    with open(your_file_path, "rb") as image_file:
+        data = base64.b64encode(image_file.read()).decode('utf-8')
+
+    payload = json.dumps({
+        "model": "image-01",
+        "prompt": "这是一个视频截图，请生成其对应的吉卜力风格的图片",
+        "subject_reference": [
+            {
+                "type": "character",
+                "image_file": f"data:image/jpeg;base64,{data}"
+            }
+        ],
+        "n": 2
+    })
+    headers = {
+        'Authorization': f'Bearer {MINIMAX_API_KEY}',
+        'Content-Type': 'application/json'
+    }
+
+    url = "https://api.minimax.chat/v1/image_generation"
+    response = requests.request("POST", url, headers=headers, data=payload).json()
+    if response['base_resp']['status_code'] == 0:
+        image_url = response['data']['image_urls'][0]
+        img_data = requests.get(image_url).content
+        with open(temp_cover_path, 'wb') as handler:
+            handler.write(img_data)
+        os.remove(your_file_path)
+        return temp_cover_path
+    else:
+        print(f"Minimax cover generation failed: {response['base_resp']}")
+        return None
+
+if __name__ == "__main__":
+    your_file_path = ""
+    print(minimax_generate_cover(your_file_path))
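The request body built in `minimax_generate_cover` can be isolated into a pure function, which makes the base64 data-URI encoding easy to check without calling the API. This is a sketch only: the field names are copied from the diff above and have not been checked against the Minimax documentation, and the helper name `build_cover_payload` and its default of `n=1` are assumptions.

```python
import base64
import json


def build_cover_payload(image_bytes: bytes, prompt: str, n: int = 1) -> str:
    """Build the JSON body for an image-to-image request (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({
        "model": "image-01",
        "prompt": prompt,
        "subject_reference": [
            {
                "type": "character",
                # The screenshot travels inline as a base64 data URI.
                "image_file": f"data:image/jpeg;base64,{encoded}",
            }
        ],
        # Number of images requested; the diff requests 2 and uses only the first.
        "n": n,
    })


# Round-trip check with three placeholder bytes standing in for a JPEG file.
body = json.loads(build_cover_payload(b"\xff\xd8\xff", "demo prompt"))
data_uri = body["subject_reference"][0]["image_file"]
```

Separating payload construction from the HTTP call also leaves the network step small enough to wrap in error handling later.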
diff --git a/src/log/retry.py b/src/log/retry.py
@@ -29,7 +29,7 @@ def run(self, func, *args, **kwargs) -> Tuple[bool, Any]:
                     status = (True,return_value)
                     break
             except Exception as e:
-                scan_log.error(f"Exceptions in trial {i+1}/{self.max_retry} : {e}")
+                scan_log.error(f"Exception in function {func.__name__}, trial {i+1}/{self.max_retry}: {e}")
                 sleep(self.interval)
 
         return status
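For context, a minimal retry helper with the same shape as the fragment above (a `run` method returning `Tuple[bool, Any]` and a `decorator` method) might look like the sketch below. Only the parts visible in the hunk come from the source. The success condition, the body of `decorator`, and the use of `print` instead of `scan_log` are assumptions.

```python
from functools import wraps
from time import sleep
from typing import Any, Tuple


class Retry:
    def __init__(self, max_retry: int = 3, interval: float = 5):
        self.max_retry = max_retry
        self.interval = interval

    def run(self, func, *args, **kwargs) -> Tuple[bool, Any]:
        status = (False, None)
        for i in range(self.max_retry):
            try:
                return_value = func(*args, **kwargs)
                # Assumption: any call that does not raise counts as success.
                status = (True, return_value)
                break
            except Exception as e:
                # getattr guards callables without __name__ (e.g. functools.partial).
                name = getattr(func, "__name__", repr(func))
                print(f"Exception in function {name}, trial {i+1}/{self.max_retry}: {e}")
                sleep(self.interval)
        return status

    def decorator(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return self.run(func, *args, **kwargs)
        return wrapper


calls = {"count": 0}


@Retry(max_retry=3, interval=0).decorator
def flaky():
    """Fail twice, then succeed, to exercise the retry loop."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = flaky()
```

Including the function name in the log line makes it possible to tell which decorated function is failing when several share the same logger.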
diff --git a/src/upload/upload.py b/src/upload/upload.py
@@ -3,26 +3,31 @@
 import subprocess
 import os
 import sys
-from src.config import SRC_DIR, BILIVE_DIR, RESERVE_FOR_FIXING, UPLOAD_LINE
+from src.config import SRC_DIR, BILIVE_DIR, RESERVE_FOR_FIXING, UPLOAD_LINE, GENERATE_COVER
 from datetime import datetime
 from src.upload.generate_upload_data import generate_video_data, generate_slice_data
 from src.upload.extract_video_info import generate_title
-from src.log.logger import upload_log
+from src.log.logger import upload_log, scan_log
 import time
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from db.conn import get_single_upload_queue, delete_upload_queue, update_upload_queue_lock, get_single_lock_queue
 from .bilitool.bilitool import UploadController, FeedController, LoginController
 from src.log.retry import Retry
+from src.cover.cover_generator import generate_cover
 
 @Retry(max_retry = 3, interval = 5).decorator
 def upload_video(upload_path):
     try:
         if upload_path.endswith('.flv'):
             copyright, title, tid, tag = generate_slice_data(upload_path)
-            yaml, desc, source, cover, dynamic = ("",) * 5
+            if GENERATE_COVER:
+                cover = generate_cover(upload_path)
+            else:
+                cover = ""
+            yaml, desc, source, dynamic = ("",) * 4
             if title is None:
-                upload_log.error("Fail to upload slice video, the files will be reserved.")
-                update_upload_queue_lock(upload_path, 0)
+                upload_log.error("Failed to upload slice video; the files will be locked.")
+                update_upload_queue_lock(upload_path, 1)
                 return False
         else:
             copyright, title, desc, tid, tag, source, cover, dynamic = generate_video_data(upload_path)
@@ -31,16 +36,17 @@ def upload_video(upload_path):
         if result == True:
             upload_log.info("Upload successfully, then delete the video")
             os.remove(upload_path)
+            if cover:
+                os.remove(cover)
             delete_upload_queue(upload_path)
             return True
         else:
-            upload_log.error("Fail to upload, the files will be reserved.")
-            update_upload_queue_lock(upload_path, 0)
+            upload_log.error("Failed to upload; the files will be locked.")
+            update_upload_queue_lock(upload_path, 1)
             return False
-    
-    except subprocess.CalledProcessError as e:
-        upload_log.error(f"The upload_video called failed, the files will be reserved. error: {e}")
-        update_upload_queue_lock(upload_path, 0)
+    except Exception as e:
+        upload_log.error(f"upload_video failed; the files will be locked. Error: {e}")
+        update_upload_queue_lock(upload_path, 1)
         return False
 
 @Retry(max_retry = 3, interval = 5).decorator
@@ -54,13 +60,13 @@ def append_upload(upload_path, bv_result):
             delete_upload_queue(upload_path)
             return True
         else:
-            upload_log.error("Fail to append, the files will be reserved.")
-            update_upload_queue_lock(upload_path, 0)
+            upload_log.error("Failed to append; the files will be locked.")
+            update_upload_queue_lock(upload_path, 1)
             return False
     
-    except subprocess.CalledProcessError as e:
-        upload_log.error(f"The append_upload called failed, the files will be reserved. error: {e}")
-        update_upload_queue_lock(upload_path, 0)
+    except Exception as e:
+        upload_log.error(f"append_upload failed; the files will be locked. Error: {e}")
+        update_upload_queue_lock(upload_path, 1)
         return False
 
 def video_gate(video_path):