Commit aa4e6e1

suluyan and gemini-code-assist[bot] authored
feat: update README section on outputs and retry logic (#857)
Co-authored-by: suluyan <suluyan.sly@alibaba-inc.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
1 parent a3fcd90 commit aa4e6e1

2 files changed

+164
-69
lines changed

projects/singularity_cinema/README.md

Lines changed: 84 additions & 36 deletions
@@ -177,44 +177,92 @@ ms-agent run --project singularity_cinema \
177177

178178
### 5) Output and Failure Retry
179179

180-
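- The run takes about 20 minutes.
181-
- The generated video is written to `output_video/` under the command execution directory (controlled by the `--output_dir` option) as final_video.mp4
182-
- If the run fails (timeout/interruption/missing files), you can simply rerun the command: the system reads the execution info in `output_video` and resumes from the breakpoint
183-
- To regenerate everything from scratch: rename or delete the output_video directory
184-
- When deleting input files, you can delete only those belonging to a particular storyboard segment; rerunning then executes only that segment.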
180+
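- **Estimated time**: The full pipeline takes about **20 minutes** (depending on machine performance and model call speed).
181+
- **Output location**: By default, the video and intermediate artifacts are generated in `output_video/` under the **directory where you run the command** (can be changed via the `--output_dir` parameter).
182+
- Final video file: `output_video/final_video.mp4`
183+
- **Failure retry / resume from breakpoint**: If the run fails (e.g., timeout, interruption, missing files), you can **rerun the same command** directly. The system reads the intermediate results already generated in `output_video/` and continues from the breakpoint.
184+
- **Regenerate everything**: Delete or rename the `output_video/` directory, then run again.
185+
- **Redo only a specific segment/step**: Delete the files you want regenerated, along with the downstream files that depend on them (for example, after deleting a segment's render output, rerunning will only re-render that segment).
186+
- Common practice: delete the files for the target segment plus `final_video.mp4`, which triggers regeneration of only the necessary parts.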
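
The "regenerate everything" option above can be sketched as a small helper that renames the output directory instead of deleting it, so earlier results stay recoverable. This is a minimal sketch, not part of the project: the function name `archive_output_dir` and the `.bak-<timestamp>` naming are assumptions made for illustration; only the `output_video` default comes from the README.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def archive_output_dir(output_dir: str = "output_video") -> Optional[Path]:
    """Rename the output directory so the next run starts from scratch.

    Hypothetical helper: returns the new path, or None if the directory
    does not exist (in which case there is nothing to archive).
    """
    src = Path(output_dir)
    if not src.is_dir():
        return None
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.bak-{stamp}")
    # Avoid clobbering an archive created within the same second.
    suffix = 1
    while dst.exists():
        dst = src.with_name(f"{src.name}.bak-{stamp}-{suffix}")
        suffix += 1
    src.rename(dst)
    return dst
```

Calling `archive_output_dir()` before rerunning the command leaves no `output_video/` directory behind, which per the bullets above should cause a full regeneration.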
185187

186188
---
187-
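## Technical Workflow
188-
1. Generate a base script from the user requirements
189-
- Input: user requirements; may read user-specified files
190-
- Output: script file script.txt, original request file topic.txt, short-video title file title.txt
191-
2. Split the script into storyboard segments
192-
- Input: topic.txt, script.txt
193-
- Output: segments.txt, a list of segments describing the narration, background image generation requirements, and foreground manim animation requirements
194-
3. Generate audio narration for the segments
195-
- Input: segments.txt
196-
- Output: a list of audio/audio_N.mp3 files, where N is the segment index starting from 1, plus audio_info.txt in the root directory, which contains the audio durations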
197-
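4. Generate remotion animation code based on the audio duration
198-
- Input: segments.txt, audio_info.txt
199-
- Output: a list of manim code files remotion_code/segment_N.py, where N is the segment index starting from 1
200-
5. Fix the remotion code
201-
- Input: remotion_code/segment_N.py (N is the segment index starting from 1), pre-error file code_fix/code_fix_N.txt
202-
- Output: updated remotion_code/segment_N.py files
203-
6. Render the remotion code
204-
- Input: remotion_code/segment_N.py
205-
- Output: a list of remotion_render/scene_N folders; if segments.txt includes remotion requirements for a given step, the corresponding folder contains a remotion.mov file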
206-
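7. Generate text-to-image prompts
207-
- Input: segments.txt
208-
- Output: illustration_prompts/segment_N.txt, where N is the segment index starting from 1
209-
8. Text-to-image generation
210-
- Input: the list of illustration_prompts/segment_N.txt files
211-
- Output: a list of images/illustration_N.png files, where N is the segment index starting from 1
212-
9. Generate the background: a solid-color image with the short-video title and slogans
213-
- Input: title.txt
214-
- Output: background.jpg
215-
10. Compose the final video
216-
- Input: all files from the previous steps. This step can run for a long time without logs; it does not consume tokens.
217-
- Output: final_video.mp4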
189+
190+
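## Execution Pipeline and Effect Tuning
191+
192+
If you are not satisfied with the result of a step, you can trigger regeneration by **deleting that step's output files** (and all subsequent files that depend on them).
193+
The complete workflow and the corresponding code entry points are in `projects/singularity_cinema/workflow.yaml`. The steps are described below in order, with their inputs, outputs, and scope (all under `output_video/` by default).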
194+
195+
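1. **Generate the base script**
196+
- Input: user requirements (may include user-specified files)
197+
- Output:
198+
- `script.txt`: main script content
199+
- `topic.txt`: original request/topic
200+
- `title.txt`: short-video title
201+
- Code: `generate_script/agent.py`
202+
203+
2. **Split the script and design storyboards**
204+
- Input: `topic.txt`, `script.txt`
205+
- Output: `segments.txt` (segment list: each segment includes narration, background image requirements, foreground animation requirements, etc.)
206+
- Code: `segment/agent.py`
207+
208+
3. **Generate voice-over audio for each segment**
209+
- Input: `segments.txt`
210+
- Output:
211+
- `audio/segment_N.mp3`: voice-over for segment N (N starts from 1)
212+
- `audio_info.txt`: audio durations and related info (used later to align the animation)
213+
- Code: `generate_audio/agent.py`
214+
- Scope: by default, every segment has voice-over
215+
- Exception: when `use_text2video=true` and `use_video_soundtrack=true`, and the segment is designed as **text-to-video** in the storyboard, the video's original soundtrack is used and no separate voice-over is added.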
216+
217+
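4. **Generate text-to-image prompts**
218+
- Input: `segments.txt`
219+
- Output:
220+
- `illustration_prompts/segment_N.txt`: background image prompt for segment N
221+
- If the segment needs foreground images: `illustration_prompts/segment_N_foreground_K.txt` (prompt for the K-th foreground image of segment N)
222+
- Code: `generate_illustration_prompts/agent.py`
223+
- Scope: describes the image content required for each segment
224+
225+
5. **Generate images from prompts (text-to-image)**
226+
- Input: prompt files such as `illustration_prompts/segment_N.txt`
227+
- Output: `images/illustration_N.png` (and possibly foreground images)
228+
- Code: `generate_images/agent.py`
229+
- Scope: background/foreground image assets for each segment
230+
231+
6. **Generate Remotion animation code based on voice-over duration**
232+
- Input: `segments.txt`, `audio_info.txt`
233+
- Output: `remotion_code/SegmentN.tsx` (one per segment)
234+
- Code: `generate_animation/agent.py`
235+
- Scope: animation implementation code for each segment (duration aligned to the audio)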
236+
237+
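7. **Render Remotion and auto-fix code (if needed)**
238+
- Input: `remotion_code/SegmentN.tsx`
239+
- Output:
240+
- Updated `remotion_code/SegmentN.tsx`
241+
- Render result: `remotion_render/scene_N/SceneN.mov`
242+
- Code: `render_animation/agent.py`
243+
244+
8. **Generate a unified background image (title and slogan)**
245+
- Input: `title.txt`
246+
- Output: `background.jpg`
247+
- Code: `create_background/agent.py`
248+
- Scope: top-left title/background element of the video (shared by all segments)
249+
250+
9. **Compose the final video**
251+
- Input: all artifacts above (audio, rendered videos, background image, etc.)
252+
- Output: `final_video.mp4`
253+
- Note: this stage may show a **long period with no logs**, which is normal; it typically does not consume tokens.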
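
The voice-over exception in step 3 can be read as a simple predicate. The sketch below only restates that rule; it is not the project's code. The flag names `use_text2video` and `use_video_soundtrack` come from the README, while the function name and the `segment_is_text2video` argument are hypothetical.

```python
def needs_voice_over(use_text2video: bool,
                     use_video_soundtrack: bool,
                     segment_is_text2video: bool) -> bool:
    """Return True if a segment should get generated voice-over.

    Per the README, a segment keeps the video's original soundtrack only
    when both flags are enabled AND the segment itself was designed as
    text-to-video. In every other case voice-over is generated.
    """
    uses_video_soundtrack = (
        use_text2video and use_video_soundtrack and segment_is_text2video
    )
    return not uses_video_soundtrack
```

Note that all three conditions must hold for the exception to apply, so enabling both flags does not by itself remove voice-over from ordinary (non text-to-video) segments.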
254+
255+
256+
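### Example: Redo only the animation of Segment 1
257+
258+
If you are not satisfied with the animation of segment 1, delete the following files under `output_video/` and rerun the command:
259+
260+
- `remotion_code/Segment1.tsx` (animation code for segment 1)
261+
- `remotion_render/scene_1/Scene1.mov` (the result rendered from that code)
262+
- `final_video.mp4` (the final composition depends on the render result, so it must be recomposed)
263+
264+
After rerunning, the system redoes only the steps related to these files and reuses the other intermediate artifacts that were not deleted.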
265+
218266
---
219267

220268
## Tunable Parameters (Overview)

projects/singularity_cinema/README_EN.md

Lines changed: 80 additions & 33 deletions
@@ -179,48 +179,95 @@ ms-agent run --project singularity_cinema \
179179

180180
---
181181

182+
182183
### 5) Output and Failure Retry
183184

184-
- The run typically takes about 20 minutes.
185-
- The generated video is output to `output_video/` under your command execution directory (controlled by `--output_dir`) as `final_video.mp4`.
186-
- If the run fails (timeout/interruption/missing files), you can rerun the command directly: the system will read execution info in `output_video` and resume from the breakpoint.
187-
- To regenerate from scratch: rename/delete the `output_video` directory.
188-
- To rerun only part of a storyboard: delete only the corresponding files for that segment; rerunning will execute only those segments.
185+
- **Estimated time**: The full pipeline takes about **20 minutes** (depends on machine performance and model/API speed).
186+
- **Output location**: By default, the video and all intermediate artifacts are generated in `output_video/` under the **directory where you run the command** (can be changed via `--output_dir`).
187+
- Final video file: `output_video/final_video.mp4`
188+
- **Failure retry / resume from checkpoint**: If the run fails (e.g., timeout, interruption, missing files), you can **rerun the exact same command**. The system will read existing intermediate results in `output_video/` and continue from where it stopped.
189+
- **Regenerate everything**: Delete or rename the `output_video/` directory, then run again.
190+
- **Redo only a specific scene/step**: Delete the files you want to regenerate, **and any downstream files that depend on them** (for example, if you delete a scene’s render output, rerunning will only re-render that scene).
191+
- Common practice: delete the files for the target scene plus `final_video.mp4`, which triggers regeneration of only the necessary parts.
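
Because resuming relies on the intermediate results present in `output_video/`, it can help to see which expected files are missing before rerunning. The following is a rough sketch that only inspects the file system; it does not reproduce the project's actual resume logic. The file names are the ones listed in this README's pipeline description, and `missing_artifacts` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# Artifacts shared by the whole run (names as listed in the README).
SHARED = [
    "script.txt", "topic.txt", "title.txt", "segments.txt",
    "audio_info.txt", "background.jpg", "final_video.mp4",
]

# Artifacts produced once per segment; {n} is the 1-based segment index.
PER_SEGMENT = [
    "audio/segment_{n}.mp3",
    "illustration_prompts/segment_{n}.txt",
    "images/illustration_{n}.png",
    "remotion_code/Segment{n}.tsx",
    "remotion_render/scene_{n}/Scene{n}.mov",
]


def missing_artifacts(output_dir: str, num_segments: int) -> List[str]:
    """List expected files (relative paths) that do not exist yet."""
    root = Path(output_dir)
    expected = list(SHARED)
    for n in range(1, num_segments + 1):
        expected.extend(pattern.format(n=n) for pattern in PER_SEGMENT)
    return [rel for rel in expected if not (root / rel).exists()]
```

A missing file is not always an error: for example, a segment that uses the video's original soundtrack may legitimately have no `audio/segment_N.mp3`, so treat the result as a hint rather than a verdict.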
189192

190193
---
191194

192-
## Technical Workflow
195+
## Execution Pipeline and Effect Tuning
196+
197+
If you are not satisfied with the result of a certain step, you can trigger regeneration by **deleting the output files of that step** (and all subsequent files that depend on them).
198+
The complete workflow and code entry points are defined in: `projects/singularity_cinema/workflow.yaml`. Below is each step in order, including inputs, outputs, and scope (all under `output_video/` by default).
199+
200+
1. **Generate the base script**
201+
- Input: user requirements (may include user-provided files)
202+
- Output:
203+
- `script.txt`: main script content
204+
- `topic.txt`: original request/topic
205+
- `title.txt`: short-video title
206+
- Code: `generate_script/agent.py`
193207

194-
1. Generate a base script from user requirements
195-
- Input: user requirements; may read a user-specified file
196-
- Output: script file `script.txt`, original request file `topic.txt`, short-video title file `title.txt`
197-
2. Split the script into storyboard segments
208+
2. **Split the script and design storyboards**
198209
- Input: `topic.txt`, `script.txt`
199-
- Output: `segments.txt`, a list of segments describing narration, background image generation requirements, and foreground Manim animation requirements
200-
3. Generate audio narration for each segment
210+
- Output: `segments.txt` (segment list: each segment includes narration, background image requirements, foreground animation requirements, etc.)
211+
- Code: `segment/agent.py`
212+
213+
3. **Generate voice-over audio for each segment**
201214
- Input: `segments.txt`
202-
- Output: `audio/audio_N.mp3` list (N starts from 1), plus `audio_info.txt` in the root directory containing audio durations
203-
4. Generate Remotion animation code based on audio duration
204-
- Input: `segments.txt`, `audio_info.txt`
205-
- Output: Manim code files `remotion_code/segment_N.py` (N starts from 1)
206-
5. Fix Remotion code
207-
- Input: `remotion_code/segment_N.py` (N starts from 1), pre-error file `code_fix/code_fix_N.txt`
208-
- Output: updated `remotion_code/segment_N.py`
209-
6. Render Remotion code
210-
- Input: `remotion_code/segment_N.py`
211-
- Output: `remotion_render/scene_N` folder list; if a segment includes Remotion requirements in `segments.txt`, the corresponding folder will contain `remotion.mov`
212-
7. Generate text-to-image prompts
215+
- Output:
216+
- `audio/segment_N.mp3`: voice-over for segment N (N starts from 1)
217+
- `audio_info.txt`: audio duration and other info (used later for animation alignment)
218+
- Code: `generate_audio/agent.py`
219+
- Scope: by default, every segment has voice-over
220+
- Exception: when `use_text2video=true` and `use_video_soundtrack=true`, and the segment is marked as **text-to-video** in the storyboard design, the system will use the video’s original soundtrack instead of generating separate voice-over.
221+
222+
4. **Generate prompts for text-to-image**
213223
- Input: `segments.txt`
214-
- Output: `illustration_prompts/segment_N.txt` (N starts from 1)
215-
8. Text-to-image generation
216-
- Input: list of `illustration_prompts/segment_N.txt`
217-
- Output: list of `images/illustration_N.png` (N starts from 1)
218-
9. Generate a background image (solid color) with the short-video title and slogans
219-
- Input: `title.txt`
220-
- Output: `background.jpg`
221-
10. Compose the final video
222-
- Input: all files from previous steps. This step may take a long time with no logs and does not consume tokens.
223-
- Output: `final_video.mp4`
224+
- Output:
225+
- `illustration_prompts/segment_N.txt`: background image prompt for segment N
226+
- If foreground images are needed: `illustration_prompts/segment_N_foreground_K.txt` (prompt for the K-th foreground image of segment N)
227+
- Code: `generate_illustration_prompts/agent.py`
228+
- Scope: describes the image content required for each segment
229+
230+
5. **Generate images from prompts (text-to-image)**
231+
- Input: prompt files such as `illustration_prompts/segment_N.txt`
232+
- Output: `images/illustration_N.png` (and possibly foreground images)
233+
- Code: `generate_images/agent.py`
234+
- Scope: background/foreground visual assets for each segment
235+
236+
6. **Generate Remotion animation code based on voice-over duration**
237+
- Input: `segments.txt`, `audio_info.txt`
238+
- Output: `remotion_code/SegmentN.tsx` (one per segment)
239+
- Code: `generate_animation/agent.py`
240+
- Scope: animation implementation code for each segment (duration aligned to audio)
241+
242+
7. **Render Remotion and auto-fix code (if needed)**
243+
- Input: `remotion_code/SegmentN.tsx`
244+
- Output:
245+
- Updated `remotion_code/SegmentN.tsx`
246+
- Render result: `remotion_render/scene_N/SceneN.mov`
247+
- Code: `render_animation/agent.py`
248+
249+
8. **Generate a unified background image (title and slogan)**
250+
- Input: `title.txt`
251+
- Output: `background.jpg`
252+
- Code: `create_background/agent.py`
253+
- Scope: top-left title/background element shared by all segments
254+
255+
9. **Compose the final video**
256+
- Input: all artifacts above (audio, rendered videos, background image, etc.)
257+
- Output: `final_video.mp4`
258+
- Note: this stage may have a **long period with no logs**, which is normal; it typically does not consume tokens.
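
The rule "delete a step's outputs and everything downstream of them" amounts to a reachability query over the step dependencies. The sketch below encodes one reading of the inputs and outputs listed above; the step labels and the `DOWNSTREAM` map are assumptions for illustration, and the authoritative definition remains `workflow.yaml`.

```python
from collections import deque
from typing import Dict, List

# Step order as described in the README (labels are hypothetical).
PIPELINE_ORDER = [
    "script", "segments", "audio", "illustration_prompts", "images",
    "animation_code", "render", "background", "final_video",
]

# step -> steps that consume its outputs (assumed from the listed inputs).
DOWNSTREAM: Dict[str, List[str]] = {
    "script": ["segments", "background"],      # title.txt feeds step 8
    "segments": ["audio", "illustration_prompts", "animation_code"],
    "audio": ["animation_code", "final_video"],  # audio_info.txt feeds step 6
    "illustration_prompts": ["images"],
    "images": ["final_video"],
    "animation_code": ["render"],
    "render": ["final_video"],
    "background": ["final_video"],
    "final_video": [],
}


def steps_to_redo(step: str) -> List[str]:
    """Return `step` plus every step that depends on it, in pipeline order."""
    if step not in DOWNSTREAM:
        raise KeyError(f"unknown step: {step}")
    seen = {step}
    queue = deque([step])
    while queue:  # breadth-first walk over downstream steps
        current = queue.popleft()
        for nxt in DOWNSTREAM[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return [s for s in PIPELINE_ORDER if s in seen]
```

For instance, `steps_to_redo("animation_code")` yields the animation code, render, and final video steps, which matches the three files a single-segment animation redo requires.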
259+
260+
---
261+
262+
### Example: Redo only the animation of Segment 1
263+
264+
If you’re not satisfied with the animation of segment 1, delete the following files under `output_video/` and rerun the command:
265+
266+
- `remotion_code/Segment1.tsx` (segment 1 animation code)
267+
- `remotion_render/scene_1/Scene1.mov` (rendered output from that code)
268+
- `final_video.mp4` (final composition depends on the render result, so it must be recomposed)
269+
270+
After rerunning, the system will only redo the steps related to these files and reuse the other intermediate artifacts that were not deleted.
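
The deletion in this example can be scripted for any segment index. This is a minimal sketch under the assumption that the three paths above generalize by replacing `1` with `N`; the helper names are hypothetical, and the default dry-run mode only reports what would be removed.

```python
from pathlib import Path
from typing import List


def animation_files_for_segment(n: int) -> List[str]:
    """Relative paths to delete when redoing the animation of segment n."""
    return [
        f"remotion_code/Segment{n}.tsx",            # animation code
        f"remotion_render/scene_{n}/Scene{n}.mov",  # rendered output
        "final_video.mp4",                          # must be recomposed
    ]


def delete_for_redo(output_dir: str, n: int, dry_run: bool = True) -> List[str]:
    """Delete (or, in dry-run mode, just list) the existing files for segment n."""
    root = Path(output_dir)
    affected = []
    for rel in animation_files_for_segment(n):
        path = root / rel
        if path.exists():
            if not dry_run:
                path.unlink()
            affected.append(rel)
    return affected
```

Running `delete_for_redo("output_video", 1)` first shows which files exist; passing `dry_run=False` removes them, after which rerunning the original command should regenerate only those artifacts.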
224271

225272
---
226273
