Commit 21aacfd

Merge branch 'feat/dedup' into 'main'
feat: add per-step state tracking and video date metadata to YT digest

See merge request nwpie/vibe/ai-claude-loop!4
2 parents b7a6605 + 68a643c commit 21aacfd

File tree

3 files changed: +66 additions, -15 deletions
.claude/commands/ai-news-digest-yt.md

Lines changed: 51 additions & 9 deletions
@@ -2,6 +2,14 @@ You are an AI news digest agent specializing in YouTube content from @AIDailyBri

Today's date is {{date}}.

## Step 0: Load State & Identify Incomplete Work

Read `.state/last-digest-yt.json` if it exists. Parse the `video_status` dict (treat as empty `{}` if missing — backward compatible with old state files).

For each entry in `video_status` where `completed` is `false` (or missing), note which steps are already `true` — these will be **skipped** when processing that video later. Incomplete videos are NOT in `posted_video_ids`, so `fetch_recent_videos.py` will re-fetch them automatically.

After each per-video step completes (Steps 2–6), **immediately** save state with that step marked `true` in `video_status`. This ensures progress is preserved if the pipeline crashes mid-run.
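The Step 0 load-and-resume logic can be sketched in Python (a hypothetical helper, not part of this commit; field names follow the schema written in Step 7):

```python
import json
from pathlib import Path

STATE_PATH = Path(".state/last-digest-yt.json")

def load_state() -> dict:
    """Load prior digest state; tolerate a missing file and old schemas."""
    state = json.loads(STATE_PATH.read_text()) if STATE_PATH.exists() else {}
    # Backward compatible: old state files have no video_status dict.
    state.setdefault("video_status", {})
    state.setdefault("posted_video_ids", [])
    return state

def steps_to_skip(state: dict, video_id: str) -> set[str]:
    """Steps already marked true for an incomplete video."""
    entry = state["video_status"].get(video_id, {})
    if entry.get("completed"):
        return set()  # fully completed videos were filtered out by the fetch script
    return {name for name, done in entry.get("steps", {}).items() if done}
```

A video absent from `video_status` yields an empty skip set, so it runs every step from scratch.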

## Step 1: Fetch Recent Videos

Run the fetch script to get videos from the last 24 hours:
@@ -34,6 +42,8 @@ python scripts/yt/get_transcript.py VIDEO_ID

Capture the stdout output as the transcript text. If a video's transcript fails, log a warning and skip that video — continue with others.

After each successful transcript extraction, update `video_status[VIDEO_ID].steps.transcript = true` and save state. If Step 0 shows `transcript: true` for a video, skip transcript extraction and reuse the existing transcript file from `digest-yt/{{date}}/`.

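The recurring "mark step true, then save state" pattern used here and in Steps 2.5–6 might look like this (sketch only; `mark_step` and `save_state` are illustrative names, not committed code):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def mark_step(state: dict, video_id: str, step: str, title: str = "") -> dict:
    """Mark one pipeline step done for a video; caller saves state immediately after."""
    entry = state.setdefault("video_status", {}).setdefault(
        video_id, {"title": title, "steps": {}, "completed": False}
    )
    entry["steps"][step] = True
    entry["last_updated"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return state

def save_state(state: dict, path: str = ".state/last-digest-yt.json") -> None:
    """Persist the full state JSON so a crash mid-run loses at most one step."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(state, indent=2))
```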
## Step 2.5: Download Thumbnails

For each video, download the YouTube thumbnail to the digest directory:
@@ -49,6 +59,8 @@ https://i.ytimg.com/vi/VIDEO_ID/hqdefault.jpg

If a thumbnail download fails, continue without it — the summary will just lack an image.

After each successful thumbnail download, update `video_status[VIDEO_ID].steps.thumbnail = true` and save state. Skip if already `true` from Step 0.

## Step 3: Summarize (Claude does this)

For each video with a successful transcript, YOU (Claude) will:
@@ -58,15 +70,16 @@ For each video with a successful transcript, YOU (Claude) will:
- **English summary**: 2-3 sentences covering the key points
- **繁體中文摘要**: 2-3 sentences in Traditional Chinese covering the same points

-3. Save each summary as markdown to `digest-yt/{{date}}/VIDEO_ID.md` with this format:
+3. Save each summary as markdown to `digest-yt/{{date}}/VIDEO_ID.md` with this format. Convert `upload_date` from YYYYMMDD → YYYY-MM-DD for display. Only include the **Last Modified** line if `modified_date` is present and different from `upload_date`:

```markdown
# Video Title

![Video Title](VIDEO_ID_thumb.jpg)

**Source**: [AI Daily Brief](https://youtube.com/watch?v=VIDEO_ID)
-**Date**: {{date}}
+**Published**: 2026-03-12
+**Last Modified**: 2026-03-13

## English Summary

@@ -79,32 +92,36 @@ For each video with a successful transcript, YOU (Claude) will:

4. Also create two combined digest files. **Each video section MUST include its thumbnail image** (use relative path). If the thumbnail file doesn't exist, omit the image line for that video.

-**`digest-yt/{{date}}/summary_en.md`** — All English summaries combined:
+**`digest-yt/{{date}}/summary_en.md`** — All English summaries combined. Include `*Published: YYYY-MM-DD*` (and `| Modified: YYYY-MM-DD` only when `modified_date` is present) below each video heading:
```markdown
# AI Daily Brief - YouTube Digest {{date}}

## Video Title 1
![Video Title 1](VIDEO_ID_thumb.jpg)
*Published: 2026-03-12 | Modified: 2026-03-13*

2-3 sentence English summary...

## Video Title 2
![Video Title 2](VIDEO_ID_thumb.jpg)
*Published: 2026-03-12*

2-3 sentence English summary...
```

-**`digest-yt/{{date}}/summary_zh-tw.md`** — All zh-TW summaries combined:
+**`digest-yt/{{date}}/summary_zh-tw.md`** — All zh-TW summaries combined, same date format:
```markdown
# AI Daily Brief - YouTube 摘要 {{date}}

## Video Title 1
![Video Title 1](VIDEO_ID_thumb.jpg)
*Published: 2026-03-12 | Modified: 2026-03-13*

繁體中文摘要...

## Video Title 2
![Video Title 2](VIDEO_ID_thumb.jpg)
*Published: 2026-03-12*

繁體中文摘要...
```
@@ -114,6 +131,8 @@ Create the `digest-yt/{{date}}/` directory first:
mkdir -p "digest-yt/{{date}}"
```

After writing summaries for each video, update `video_status[VIDEO_ID].steps.summary = true` and save state. Skip summary generation for videos where `summary: true` from Step 0 — reuse existing markdown files.

## Step 4: Generate HTML and PDF

For each language (en, zh-tw), build HTML then PDF:
@@ -130,6 +149,8 @@ python scripts/yt/build_pdf.py "digest-yt/{{date}}/summary_zh-tw.html" -o "diges

If PDF generation fails, note this and continue — you'll post without PDF links.

After successful HTML+PDF generation, update `video_status[VIDEO_ID].steps.html = true` and `video_status[VIDEO_ID].steps.pdf = true` for all videos, then save state. Skip if already `true` from Step 0.

## Step 5: Upload PDFs to B2

Upload each PDF to Backblaze B2:
@@ -141,6 +162,8 @@ python scripts/yt/upload_b2.py "digest-yt/{{date}}/summary_zh-tw_$(date +%Y%m%d)

Capture the download URLs from stdout. If upload fails, continue without links.

After successful uploads, update `video_status[VIDEO_ID].steps.b2_upload = true` for all videos, then save state. Skip if already `true` from Step 0.

## Step 6: Post to Slack

Build a Slack mrkdwn message and post it. Use this exact format:
@@ -184,9 +207,11 @@ slack_send "$MSG"

If Slack fails, retry once. If it fails again, save to `.state/failed-digest-yt-{{date}}.md`.

-## Step 7: Save State
+After successful Slack post, update `video_status[VIDEO_ID].steps.slack_post = true` and `video_status[VIDEO_ID].completed = true` for each video, then save state.
+
+## Step 7: Final State Save

-Write `.state/last-digest-yt.json`:
+Write `.state/last-digest-yt.json` with the full schema. Only add a video to `posted_video_ids` when its `completed` flag is `true` (all steps including slack_post succeeded):

```json
{
@@ -195,17 +220,33 @@ Write `.state/last-digest-yt.json`:
  "posted_urls": [
    "https://youtube.com/watch?v=id1",
    "https://youtube.com/watch?v=id2"
  ],
  "video_status": {
    "id1": {
      "title": "Video Title Here",
      "steps": {
        "transcript": true,
        "thumbnail": true,
        "summary": true,
        "html": true,
        "pdf": true,
        "b2_upload": true,
        "slack_post": true
      },
      "completed": true,
      "last_updated": "YYYY-MM-DDTHH:MM:SSZ"
    }
  }
}
```
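By the schema above, `completed` is simply the conjunction of the seven step booleans; a minimal sketch (illustrative, not committed code):

```python
REQUIRED_STEPS = ("transcript", "thumbnail", "summary", "html", "pdf", "b2_upload", "slack_post")

def is_completed(entry: dict) -> bool:
    """A video counts as completed only when every pipeline step is marked true."""
    steps = entry.get("steps", {})
    return all(steps.get(s) is True for s in REQUIRED_STEPS)
```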

-Merge with existing state (keep max 30 video IDs from current + previous runs).
+Merge with existing state. Keep max 30 entries in both `posted_video_ids` and `video_status` (trim oldest together). Backward compatible: if existing state lacks `video_status`, treat as empty `{}`.

```bash
mkdir -p .state
```

-**Always save state**, even on partial failure.
+**Always save state**, even on partial failure. Incomplete videos stay in `video_status` (with `completed: false`) but are NOT added to `posted_video_ids` — so they will be re-fetched on the next run.

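A sketch of the merge-and-trim rule (assumptions: dict insertion order reflects age, and incomplete entries survive trimming so they can be retried; this is not the committed implementation):

```python
def merge_state(old: dict, new: dict, cap: int = 30) -> dict:
    """Merge one run's results into prior state, capping history at `cap` entries."""
    # Deduplicate while preserving order: oldest IDs first, newest last.
    ids = list(dict.fromkeys(old.get("posted_video_ids", []) + new.get("posted_video_ids", [])))
    status = {**old.get("video_status", {}), **new.get("video_status", {})}
    ids = ids[-cap:]  # keep only the newest `cap` posted IDs
    # Trim the oldest *completed* status entries down to the cap;
    # never drop incomplete ones, since they drive resume-on-retry.
    excess = len(status) - cap
    if excess > 0:
        for vid in [k for k, v in status.items() if v.get("completed")][:excess]:
            del status[vid]
    return {"posted_video_ids": ids, "video_status": status}
```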
## Error Handling Summary

@@ -215,3 +256,4 @@ mkdir -p .state
- B2 upload fails → post without download links, files stay local in `digest-yt/`
- Slack fails → retry once, then save to `.state/failed-digest-yt-{{date}}.md`
- State always saved even on partial failure
- **Resume on retry**: incomplete videos stay in `video_status` but not in `posted_video_ids` → re-fetched next run → completed steps skipped via `video_status.steps` booleans

CLAUDE.md

Lines changed: 4 additions & 1 deletion
@@ -63,6 +63,9 @@ digest-yt/YYYY-MM-DD/ — Local output (gitignored)
## State Tracking

- `.state/last-digest-news.json` tracks posted URLs for deduplication (news digest)
-- `.state/last-digest-yt.json` tracks posted video IDs for deduplication (YT digest)
+- `.state/last-digest-yt.json` tracks posted video IDs + per-video step completion for deduplication and resume (YT digest)
+  - `posted_video_ids`: videos fully completed (all steps including slack_post)
+  - `video_status`: per-video step booleans (transcript, thumbnail, summary, html, pdf, b2_upload, slack_post) — enables resume on partial failure
+  - Incomplete videos stay in `video_status` but NOT in `posted_video_ids` → re-fetched on retry
- Keep max 30 entries (current + previous digest)
- State directory is gitignored

scripts/yt/fetch_recent_videos.py

Lines changed: 11 additions & 5 deletions
@@ -6,7 +6,7 @@
python scripts/yt/fetch_recent_videos.py [--channel URL] [--hours 24] [--state PATH]

Outputs JSON array to stdout:
-[{"id": "abc123", "title": "Video Title", "upload_date": "20260311"}, ...]
+[{"id": "abc123", "title": "Video Title", "upload_date": "20260311", "modified_date": "20260312"}, ...]
"""

import argparse
@@ -27,7 +27,7 @@ def fetch_channel_videos(channel_url: str, max_items: int = 10) -> list[dict]:
    cmd = [
        "yt-dlp",
        "--playlist-items", f"1:{max_items}",
-        "--print", "%(id)s\t%(title)s\t%(upload_date)s\t%(thumbnail)s",
+        "--print", "%(id)s\t%(title)s\t%(upload_date)s\t%(modified_date)s\t%(thumbnail)s",
        "--skip-download",
        f"{channel_url}/videos",
    ]
@@ -38,14 +38,20 @@ def fetch_channel_videos(channel_url: str, max_items: int = 10) -> list[dict]:

    videos = []
    for line in result.stdout.strip().splitlines():
-        parts = line.split("\t", 3)
+        parts = line.split("\t", 4)
        if len(parts) >= 2:
            vid_id = parts[0]
            upload = parts[2] if len(parts) >= 3 else "NA"
            modified = parts[3] if len(parts) >= 4 else "NA"
            # Treat modified_date as null if same as upload_date or unavailable
            if modified in ("NA", "", upload):
                modified = None
            vid = {
                "id": vid_id,
                "title": parts[1],
-                "upload_date": parts[2] if len(parts) >= 3 else "NA",
-                "thumbnail": parts[3] if len(parts) >= 4 and parts[3] != "NA"
+                "upload_date": upload,
+                "modified_date": modified,
+                "thumbnail": parts[4] if len(parts) >= 5 and parts[4] != "NA"
                    else f"https://i.ytimg.com/vi/{vid_id}/hqdefault.jpg",
            }
            videos.append(vid)
