feat: update README section on outputs and retry logic (#857)

suluyana · suluyan · gemini-code-assist[bot] · web-flow · commit aa4e6e1cc261 · 2026-02-05T15:04:45.000+08:00
Co-authored-by: suluyan <suluyan.sly@alibaba-inc.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
diff --git a/projects/singularity_cinema/README.md b/projects/singularity_cinema/README.md
@@ -177,44 +177,92 @@ ms-agent run --project singularity_cinema \
 
 ### 5）输出与失败重试
 
-- 运行持续约20min左右。
-- 生成视频输出在 命令执行目录/`output_video/`（由配置项 `--output_dir` 控制）final_video.mp4
-- 如果运行失败（超时/中断/文件缺失），可直接重新运行命令：系统会读取 `output_video` 中的执行信息从断点继续
-  - 若希望完全重新生成：重命名/删除 output_video 目录
-  - 删除输入文件可以仅删除某个分镜的部分，这样重新执行也仅执行对应分镜的。
+- **预计耗时**：全流程运行约 **20 分钟**（与机器性能、模型调用速度有关）。
+- **输出位置**：视频与中间产物默认生成在**命令执行目录**下的 `output_video/`（可通过参数 `--output_dir` 修改）。
+  - 最终视频文件：`output_video/final_video.mp4`
+- **失败重试 / 断点续跑**：若运行失败（如超时、中断、文件缺失等），可直接**重新执行同一命令**。系统会读取 `output_video/` 中已生成的中间结果，并从断点继续。
+  - **完全重新生成**：删除或重命名 `output_video/` 目录后再运行。
+  - **只重做某个分镜/某一步**：删除你希望重生成的对应文件，以及其后续依赖生成的文件（例如删除某个分镜的渲染结果后，再运行会只重跑该分镜渲染效果）。
+    - 常见做法：删除目标分镜相关文件 + 最后的 `final_video.mp4`，即可触发仅重生成必要部分。
 
 ---
-## 技术原理流程
-1. 根据用户需求生成基本台本
-   - 输入：用户需求，可能读取用户指定的文件
-   - 输出：台本文件script.txt，原始需求文件topic.txt，短视频名称文件title.txt
-2. 根据台本切分分镜设计
-   - 输入：topic.txt, script.txt
-   - 输出：segments.txt，描述旁白、背景图片生成要求、前景manim动画要求的分镜列表
-3. 生成分镜的音频讲解
-   - 输入：segments.txt
-   - 输出：audio/audio_N.mp3列表，N为segment序号从1开始，以及根目录audio_info.txt，包含audio时长
-4. 根据语音时长生成remotion动画代码
-   - 输入：segments.txt，audio_info.txt
-   - 输出：manim代码文件列表 remotion_code/segment_N.py，N为segment序号从1开始
-5. 修复remotion代码
-   - 输入：remotion_code/segment_N.py N为segment序号从1开始，code_fix/code_fix_N.txt 预错误文件
-   - 输出：更新的remotion_code/segment_N.py文件
-6. 渲染remotion代码
-   - 输入：remotion_code/segment_N.py
-   - 输出：remotion_render/scene_N文件夹列表，如果segments.txt中对某个步骤包含了remotion要求，则对应文件夹中会有remotion.mov文件
-7. 生成文生图提示词
-   - 输入：segments.txt
-   - 输出：illustration_prompts/segment_N.txt，N为segment序号从1开始
-8. 文生图
-   - 输入：illustration_prompts/segment_N.txt列表
-   - 输出：images/illustration_N.png列表，N为segment序号从1开始
-9. 生成背景，为纯色带有短视频title和slogans的图片
-    - 输入：title.txt
-    - 输出：background.jpg
-0拼合整体视频
-    - 输入：前序所有的文件信息。这一步会有较长无日志耗时，这一阶段不消耗token。
-    - 输出：final_video.mp4
+
+## 运行流程与效果调试
+
+当某一步效果不满意时，你可以通过**删除该步骤的输出文件**（以及所有依赖它的后续文件）来触发重新生成。
+完整流程与对应代码入口见：`projects/singularity_cinema/workflow.yaml`。下面按顺序说明各步骤的输入、输出与作用范围（均默认在 `output_video/` 下）。
+
+1. **生成基础台本**
+   - 输入：用户需求（可能包含用户指定的文件）
+   - 输出：
+     - `script.txt`：台本正文
+     - `topic.txt`：原始需求/主题
+     - `title.txt`：短视频标题
+   - 代码：`generate_script/agent.py`
+
+2. **台本切分与分镜设计**
+   - 输入：`topic.txt`、`script.txt`
+   - 输出：`segments.txt`（分镜列表：每个分镜包含旁白、背景图需求、前景动画需求等）
+   - 代码：`segment/agent.py`
+
+3. **生成分镜配音（音频）**
+   - 输入：`segments.txt`
+   - 输出：
+     - `audio/segment_N.mp3`：第 N 个分镜的配音（N 从 1 开始）
+     - `audio_info.txt`：音频时长等信息（用于后续对齐动画）
+   - 代码：`generate_audio/agent.py`
+   - 作用范围：默认每个分镜都有配音
+     - 例外：当 `use_text2video=true` 且 `use_video_soundtrack=true`，且该分镜在台本设计中为**文生视频**时，将使用视频原声，不再额外使用配音。
+
+4. **生成文生图提示词（Prompt）**
+   - 输入：`segments.txt`
+   - 输出：
+     - `illustration_prompts/segment_N.txt`：第 N 个分镜的背景图提示词
+     - 若该分镜需要前景图：`illustration_prompts/segment_N_foreground_K.txt`（第 N 个分镜的第 K 张前景图提示词）
+   - 代码：`generate_illustration_prompts/agent.py`
+   - 作用范围：描述每个分镜所需图像内容
+
+5. **文生图生成图片**
+   - 输入：`illustration_prompts/segment_N.txt` 等提示词文件
+   - 输出：`images/illustration_N.png`（以及可能的前景图）
+   - 代码：`generate_images/agent.py`
+   - 作用范围：各分镜背景图/前景图素材
+
+6. **根据配音时长生成 Remotion 动画代码**
+   - 输入：`segments.txt`、`audio_info.txt`
+   - 输出：`remotion_code/SegmentN.tsx`（每个分镜一份）
+   - 代码：`generate_animation/agent.py`
+   - 作用范围：每个分镜的动画实现代码（时长与音频对齐）
+
+7. **渲染 Remotion 并自动修复代码（如有）**
+   - 输入：`remotion_code/SegmentN.tsx`
+   - 输出：
+     - 更新后的 `remotion_code/SegmentN.tsx`
+     - 渲染结果：`remotion_render/scene_N/SceneN.mov`
+   - 代码：`render_animation/agent.py`
+
+8. **生成统一背景图（标题与口号）**
+   - 输入：`title.txt`
+   - 输出：`background.jpg`
+   - 代码：`create_background/agent.py`
+   - 作用范围：视频左上角标题/背景元素（所有分镜共用）
+
+9. **合成最终视频**
+   - 输入：上述所有产物（音频、渲染视频、背景图等）
+   - 输出：`final_video.mp4`
+   - 说明：该阶段可能出现**较长时间无日志**，属于正常现象；通常不消耗 token。
+
+
+### 示例：只重做第 1 个分镜的动画效果
+
+如果你对第 1 个分镜动画不满意，可在 `output_video/` 中删除以下文件后重新运行命令：
+
+- `remotion_code/Segment1.tsx`（第 1 镜动画代码）
+- `remotion_render/scene_1/Scene1.mov`（由该代码渲染出的结果）
+- `final_video.mp4`（最终合成依赖渲染结果，需要重新合成）
+
+重新执行后，系统会仅重跑与这些文件相关的步骤，并复用其它未删除的中间产物。
+
 ---
 
 ## 可调参数（概览）
diff --git a/projects/singularity_cinema/README_EN.md b/projects/singularity_cinema/README_EN.md
@@ -179,48 +179,95 @@ ms-agent run --project singularity_cinema \
 
 ---
 
+
 ### 5) Output and Failure Retry
 
-- The run typically takes about 20 minutes.
-- The generated video is output to `output_video/` under your command execution directory (controlled by `--output_dir`) as `final_video.mp4`.
-- If the run fails (timeout/interruption/missing files), you can rerun the command directly: the system will read execution info in `output_video` and resume from the breakpoint.
-  - To regenerate from scratch: rename/delete the `output_video` directory.
-  - To rerun only part of a storyboard: delete only the corresponding files for that segment; rerunning will execute only those segments.
+- **Estimated time**: The full pipeline takes about **20 minutes** (depends on machine performance and model/API speed).
+- **Output location**: By default, the video and all intermediate artifacts are generated in `output_video/` under the **directory where you run the command** (can be changed via `--output_dir`).
+  - Final video file: `output_video/final_video.mp4`
+- **Failure retry / resume from checkpoint**: If the run fails (e.g., timeout, interruption, missing files), you can **rerun the exact same command**. The system will read existing intermediate results in `output_video/` and continue from where it stopped.
+  - **Regenerate everything**: Delete or rename the `output_video/` directory, then run again.
+  - **Redo only a specific scene/step**: Delete the files you want to regenerate, **and any downstream files that depend on them** (for example, if you delete a scene’s render output, rerunning will only re-render that scene).
+    - Common practice: delete the files for the target scene plus `final_video.mp4`; rerunning then regenerates only the parts that are needed.
 
 ---
 
-## Technical Workflow
+## Execution Pipeline and Effect Tuning
+
+If you are not satisfied with the result of a certain step, you can trigger regeneration by **deleting the output files of that step** (and all subsequent files that depend on them).
+The complete workflow and code entry points are defined in `projects/singularity_cinema/workflow.yaml`. The steps below are listed in order, each with its inputs, outputs, and scope (all paths are under `output_video/` by default).
+
+1. **Generate the base script**
+   - Input: user requirements (may include user-provided files)
+   - Output:
+     - `script.txt`: main script content
+     - `topic.txt`: original request/topic
+     - `title.txt`: short-video title
+   - Code: `generate_script/agent.py`
 
-1. Generate a base script from user requirements
-   - Input: user requirements; may read a user-specified file
-   - Output: script file `script.txt`, original request file `topic.txt`, short-video title file `title.txt`
-2. Split the script into storyboard segments
+2. **Split the script and design storyboards**
    - Input: `topic.txt`, `script.txt`
-   - Output: `segments.txt`, a list of segments describing narration, background image generation requirements, and foreground Manim animation requirements
-3. Generate audio narration for each segment
+   - Output: `segments.txt` (segment list: each segment includes narration, background image requirements, foreground animation requirements, etc.)
+   - Code: `segment/agent.py`
+
+3. **Generate voice-over audio for each segment**
    - Input: `segments.txt`
-   - Output: `audio/audio_N.mp3` list (N starts from 1), plus `audio_info.txt` in the root directory containing audio durations
-4. Generate Remotion animation code based on audio duration
-   - Input: `segments.txt`, `audio_info.txt`
-   - Output: Manim code files `remotion_code/segment_N.py` (N starts from 1)
-5. Fix Remotion code
-   - Input: `remotion_code/segment_N.py` (N starts from 1), pre-error file `code_fix/code_fix_N.txt`
-   - Output: updated `remotion_code/segment_N.py`
-6. Render Remotion code
-   - Input: `remotion_code/segment_N.py`
-   - Output: `remotion_render/scene_N` folder list; if a segment includes Remotion requirements in `segments.txt`, the corresponding folder will contain `remotion.mov`
-7. Generate text-to-image prompts
+   - Output:
+     - `audio/segment_N.mp3`: voice-over for segment N (N starts from 1)
+     - `audio_info.txt`: audio duration and other info (used later for animation alignment)
+   - Code: `generate_audio/agent.py`
+   - Scope: by default, every segment has voice-over
+     - Exception: when `use_text2video=true` and `use_video_soundtrack=true`, and the segment is marked as **text-to-video** in the storyboard design, the system will use the video’s original soundtrack instead of generating separate voice-over.
+
+4. **Generate prompts for text-to-image**
    - Input: `segments.txt`
-   - Output: `illustration_prompts/segment_N.txt` (N starts from 1)
-8. Text-to-image generation
-   - Input: list of `illustration_prompts/segment_N.txt`
-   - Output: list of `images/illustration_N.png` (N starts from 1)
-9. Generate a background image (solid color) with the short-video title and slogans
-    - Input: `title.txt`
-    - Output: `background.jpg`
-10. Compose the final video
-    - Input: all files from previous steps. This step may take a long time with no logs and does not consume tokens.
-    - Output: `final_video.mp4`
+   - Output:
+     - `illustration_prompts/segment_N.txt`: background image prompt for segment N
+     - If foreground images are needed: `illustration_prompts/segment_N_foreground_K.txt` (prompt for the K-th foreground image of segment N)
+   - Code: `generate_illustration_prompts/agent.py`
+   - Scope: describes the image content required for each segment
+
+5. **Generate images from prompts (text-to-image)**
+   - Input: prompt files such as `illustration_prompts/segment_N.txt`
+   - Output: `images/illustration_N.png` (and possibly foreground images)
+   - Code: `generate_images/agent.py`
+   - Scope: background/foreground visual assets for each segment
+
+6. **Generate Remotion animation code based on voice-over duration**
+   - Input: `segments.txt`, `audio_info.txt`
+   - Output: `remotion_code/SegmentN.tsx` (one per segment)
+   - Code: `generate_animation/agent.py`
+   - Scope: animation implementation code for each segment (duration aligned to audio)
+
+7. **Render Remotion and auto-fix code (if needed)**
+   - Input: `remotion_code/SegmentN.tsx`
+   - Output:
+     - Updated `remotion_code/SegmentN.tsx`
+     - Render result: `remotion_render/scene_N/SceneN.mov`
+   - Code: `render_animation/agent.py`
+
+8. **Generate a unified background image (title and slogan)**
+   - Input: `title.txt`
+   - Output: `background.jpg`
+   - Code: `create_background/agent.py`
+   - Scope: top-left title/background element shared by all segments
+
+9. **Compose the final video**
+   - Input: all artifacts above (audio, rendered videos, background image, etc.)
+   - Output: `final_video.mp4`
+   - Note: this stage may have a **long period with no logs**, which is normal; it typically does not consume tokens.
+
+---
+
+### Example: Redo only the animation of Segment 1
+
+If you’re not satisfied with the animation of segment 1, delete the following files under `output_video/` and rerun the command:
+
+- `remotion_code/Segment1.tsx` (segment 1 animation code)
+- `remotion_render/scene_1/Scene1.mov` (rendered output from that code)
+- `final_video.mp4` (final composition depends on the render result, so it must be recomposed)
+
+After rerunning, the system will only redo the steps related to these files and reuse the other intermediate artifacts that were not deleted.
 
 ---
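
Note on the "Redo only the animation of Segment 1" example in the README_EN.md change: the manual deletion it describes could be scripted roughly as below. This is a minimal sketch, not part of the commit. The helper names `stale_files_for_segment` and `invalidate_segment_animation` are hypothetical, the three paths are copied from the README example, and the sketch assumes the default `output_video/` directory and that the pipeline resumes from missing files as the README states.

```python
from pathlib import Path


def stale_files_for_segment(output_dir: Path, n: int) -> list[Path]:
    """Return the files the README example lists for redoing segment n's animation.

    Paths follow the README example; the helper itself is hypothetical.
    """
    return [
        output_dir / "remotion_code" / f"Segment{n}.tsx",                 # animation code
        output_dir / "remotion_render" / f"scene_{n}" / f"Scene{n}.mov",  # rendered clip
        output_dir / "final_video.mp4",                                   # final composition
    ]


def invalidate_segment_animation(output_dir: Path, n: int) -> list[Path]:
    """Delete the listed files that exist and return the ones actually removed."""
    removed = []
    for path in stale_files_for_segment(output_dir, n):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Calling `invalidate_segment_animation(Path("output_video"), 1)` and then rerunning the same `ms-agent run` command should, per the README, regenerate only the deleted artifacts. Files for other segments are left untouched, and calling the helper a second time removes nothing.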