diff --git a/.claude/skills/gitnexus/gitnexus-cli/SKILL.md b/.claude/skills/gitnexus/gitnexus-cli/SKILL.md new file mode 100644 index 000000000..c9e0af341 --- /dev/null +++ b/.claude/skills/gitnexus/gitnexus-cli/SKILL.md @@ -0,0 +1,82 @@ +--- +name: gitnexus-cli +description: "Use when the user needs to run GitNexus CLI commands like analyze/index a repo, check status, clean the index, generate a wiki, or list indexed repos. Examples: \"Index this repo\", \"Reanalyze the codebase\", \"Generate a wiki\"" +--- + +# GitNexus CLI Commands + +All commands work via `npx` — no global install required. + +## Commands + +### analyze — Build or refresh the index + +```bash +npx gitnexus analyze +``` + +Run from the project root. This parses all source files, builds the knowledge graph, writes it to `.gitnexus/`, and generates CLAUDE.md / AGENTS.md context files. + +| Flag | Effect | +| -------------- | ---------------------------------------------------------------- | +| `--force` | Force full re-index even if up to date | +| `--embeddings` | Enable embedding generation for semantic search (off by default) | + +**When to run:** First time in a project, after major code changes, or when `gitnexus://repo/{name}/context` reports the index is stale. In Claude Code, a PostToolUse hook runs `analyze` automatically after `git commit` and `git merge`, preserving embeddings if previously generated. + +### status — Check index freshness + +```bash +npx gitnexus status +``` + +Shows whether the current repo has a GitNexus index, when it was last updated, and symbol/relationship counts. Use this to check if re-indexing is needed. + +### clean — Delete the index + +```bash +npx gitnexus clean +``` + +Deletes the `.gitnexus/` directory and unregisters the repo from the global registry. Use before re-indexing if the index is corrupt or after removing GitNexus from a project. 
+ +| Flag | Effect | | --------- | ------------------------------------------------- | | `--force` | Skip confirmation prompt | | `--all` | Clean all indexed repos, not just the current one | + +### wiki — Generate documentation from the graph + +```bash +npx gitnexus wiki +``` + +Generates repository documentation from the knowledge graph using an LLM. Requires an API key (saved to `~/.gitnexus/config.json` on first use). + +| Flag | Effect | | ------------------- | ----------------------------------------- | | `--force` | Force full regeneration | | `--model <model>` | LLM model (default: minimax/minimax-m2.5) | | `--base-url <url>` | LLM API base URL | | `--api-key <key>` | LLM API key | | `--concurrency <n>` | Parallel LLM calls (default: 3) | | `--gist` | Publish wiki as a public GitHub Gist | + +### list — Show all indexed repos + +```bash +npx gitnexus list +``` + +Lists all repositories registered in `~/.gitnexus/registry.json`. The MCP `list_repos` tool provides the same information. + +## After Indexing + +1. **Read `gitnexus://repo/{name}/context`** to verify the index loaded +2. Use the other GitNexus skills (`exploring`, `debugging`, `impact-analysis`, `refactoring`) for your task + +## Troubleshooting + +- **"Not inside a git repository"**: Run from a directory inside a git repo +- **Index is stale after re-analyzing**: Restart Claude Code to reload the MCP server +- **Embeddings slow**: Omit `--embeddings` (it's off by default) or set `OPENAI_API_KEY` for faster API-based embeddings diff --git a/.claude/skills/gitnexus/gitnexus-debugging/SKILL.md b/.claude/skills/gitnexus/gitnexus-debugging/SKILL.md new file mode 100644 index 000000000..9510b97ac --- /dev/null +++ b/.claude/skills/gitnexus/gitnexus-debugging/SKILL.md @@ -0,0 +1,89 @@ +--- +name: gitnexus-debugging +description: "Use when the user is debugging a bug, tracing an error, or asking why something fails.
Examples: \"Why is X failing?\", \"Where does this error come from?\", \"Trace this bug\"" +--- + +# Debugging with GitNexus + +## When to Use + +- "Why is this function failing?" +- "Trace where this error comes from" +- "Who calls this method?" +- "This endpoint returns 500" +- Investigating bugs, errors, or unexpected behavior + +## Workflow + +``` +1. gitnexus_query({query: "<error or symptom>"}) → Find related execution flows +2. gitnexus_context({name: "<suspect symbol>"}) → See callers/callees/processes +3. READ gitnexus://repo/{name}/process/{name} → Trace execution flow +4. gitnexus_cypher({query: "MATCH path..."}) → Custom traces if needed +``` + +> If "Index is stale" → run `npx gitnexus analyze` in terminal. + +## Checklist + +``` +- [ ] Understand the symptom (error message, unexpected behavior) +- [ ] gitnexus_query for error text or related code +- [ ] Identify the suspect function from returned processes +- [ ] gitnexus_context to see callers and callees +- [ ] Trace execution flow via process resource if applicable +- [ ] gitnexus_cypher for custom call chain traces if needed +- [ ] Read source files to confirm root cause +``` + +## Debugging Patterns + +| Symptom | GitNexus Approach | | -------------------- | ---------------------------------------------------------- | | Error message | `gitnexus_query` for error text → `context` on throw sites | | Wrong return value | `context` on the function → trace callees for data flow | | Intermittent failure | `context` → look for external calls, async deps | | Performance issue | `context` → find symbols with many callers (hot paths) | | Recent regression | `detect_changes` to see what your changes affect | + +## Tools + +**gitnexus_query** — find code related to the error: + +``` +gitnexus_query({query: "payment validation error"}) +→ Processes: CheckoutFlow, ErrorHandling +→ Symbols: validatePayment, handlePaymentError, PaymentException +``` + +**gitnexus_context** — full context for a suspect: + +``` +gitnexus_context({name: "validatePayment"}) +→ Incoming calls: processCheckout, webhookHandler +→ Outgoing calls: verifyCard, fetchRates (external API!) +→ Processes: CheckoutFlow (step 3/7) +``` + +**gitnexus_cypher** — custom call chain traces: + +```cypher +MATCH path = (a)-[:CodeRelation*1..2 {type: 'CALLS'}]->(b:Function {name: "validatePayment"}) +RETURN [n IN nodes(path) | n.name] AS chain +``` + +## Example: "Payment endpoint returns 500 intermittently" + +``` +1. gitnexus_query({query: "payment error handling"}) + → Processes: CheckoutFlow, ErrorHandling + → Symbols: validatePayment, handlePaymentError + +2. gitnexus_context({name: "validatePayment"}) + → Outgoing calls: verifyCard, fetchRates (external API!) + +3. READ gitnexus://repo/my-app/process/CheckoutFlow + → Step 3: validatePayment → calls fetchRates (external) + +4. Root cause: fetchRates calls external API without proper timeout +``` diff --git a/.claude/skills/gitnexus/gitnexus-exploring/SKILL.md b/.claude/skills/gitnexus/gitnexus-exploring/SKILL.md new file mode 100644 index 000000000..927a4e4b6 --- /dev/null +++ b/.claude/skills/gitnexus/gitnexus-exploring/SKILL.md @@ -0,0 +1,78 @@ +--- +name: gitnexus-exploring +description: "Use when the user asks how code works, wants to understand architecture, trace execution flows, or explore unfamiliar parts of the codebase. Examples: \"How does X work?\", \"What calls this function?\", \"Show me the auth flow\"" +--- + +# Exploring Codebases with GitNexus + +## When to Use + +- "How does authentication work?" +- "What's the project structure?" +- "Show me the main components" +- "Where is the database logic?" +- Understanding code you haven't seen before + +## Workflow + +``` +1. READ gitnexus://repos → Discover indexed repos +2. READ gitnexus://repo/{name}/context → Codebase overview, check staleness +3. gitnexus_query({query: "<concept>"}) → Find related execution flows +4. gitnexus_context({name: "<symbol>"}) → Deep dive on specific symbol +5.
READ gitnexus://repo/{name}/process/{name} → Trace full execution flow +``` + +> If step 2 says "Index is stale" → run `npx gitnexus analyze` in terminal. + +## Checklist + +``` +- [ ] READ gitnexus://repo/{name}/context +- [ ] gitnexus_query for the concept you want to understand +- [ ] Review returned processes (execution flows) +- [ ] gitnexus_context on key symbols for callers/callees +- [ ] READ process resource for full execution traces +- [ ] Read source files for implementation details +``` + +## Resources + +| Resource | What you get | +| --------------------------------------- | ------------------------------------------------------- | +| `gitnexus://repo/{name}/context` | Stats, staleness warning (~150 tokens) | +| `gitnexus://repo/{name}/clusters` | All functional areas with cohesion scores (~300 tokens) | +| `gitnexus://repo/{name}/cluster/{name}` | Area members with file paths (~500 tokens) | +| `gitnexus://repo/{name}/process/{name}` | Step-by-step execution trace (~200 tokens) | + +## Tools + +**gitnexus_query** — find execution flows related to a concept: + +``` +gitnexus_query({query: "payment processing"}) +→ Processes: CheckoutFlow, RefundFlow, WebhookHandler +→ Symbols grouped by flow with file locations +``` + +**gitnexus_context** — 360-degree view of a symbol: + +``` +gitnexus_context({name: "validateUser"}) +→ Incoming calls: loginHandler, apiMiddleware +→ Outgoing calls: checkToken, getUserById +→ Processes: LoginFlow (step 2/5), TokenRefresh (step 1/3) +``` + +## Example: "How does payment processing work?" + +``` +1. READ gitnexus://repo/my-app/context → 918 symbols, 45 processes +2. gitnexus_query({query: "payment processing"}) + → CheckoutFlow: processPayment → validateCard → chargeStripe + → RefundFlow: initiateRefund → calculateRefund → processRefund +3. gitnexus_context({name: "processPayment"}) + → Incoming: checkoutHandler, webhookHandler + → Outgoing: validateCard, chargeStripe, saveTransaction +4. 
Read src/payments/processor.ts for implementation details +``` diff --git a/.claude/skills/gitnexus/gitnexus-guide/SKILL.md b/.claude/skills/gitnexus/gitnexus-guide/SKILL.md new file mode 100644 index 000000000..937ac73d1 --- /dev/null +++ b/.claude/skills/gitnexus/gitnexus-guide/SKILL.md @@ -0,0 +1,64 @@ +--- +name: gitnexus-guide +description: "Use when the user asks about GitNexus itself — available tools, how to query the knowledge graph, MCP resources, graph schema, or workflow reference. Examples: \"What GitNexus tools are available?\", \"How do I use GitNexus?\"" +--- + +# GitNexus Guide + +Quick reference for all GitNexus MCP tools, resources, and the knowledge graph schema. + +## Always Start Here + +For any task involving code understanding, debugging, impact analysis, or refactoring: + +1. **Read `gitnexus://repo/{name}/context`** — codebase overview + check index freshness +2. **Match your task to a skill below** and **read that skill file** +3. **Follow the skill's workflow and checklist** + +> If step 1 warns the index is stale, run `npx gitnexus analyze` in the terminal first. + +## Skills + +| Task | Skill to read | +| -------------------------------------------- | ------------------- | +| Understand architecture / "How does X work?" | `gitnexus-exploring` | +| Blast radius / "What breaks if I change X?" | `gitnexus-impact-analysis` | +| Trace bugs / "Why is X failing?" 
| `gitnexus-debugging` | | Rename / extract / split / refactor | `gitnexus-refactoring` | | Tools, resources, schema reference | `gitnexus-guide` (this file) | | Index, status, clean, wiki CLI commands | `gitnexus-cli` | + +## Tools Reference + +| Tool | What it gives you | | ---------------- | ------------------------------------------------------------------------ | | `query` | Process-grouped code intelligence — execution flows related to a concept | | `context` | 360-degree symbol view — categorized refs, processes it participates in | | `impact` | Symbol blast radius — what breaks at depth 1/2/3 with confidence | | `detect_changes` | Git-diff impact — what your current changes affect | | `rename` | Multi-file coordinated rename with confidence-tagged edits | | `cypher` | Raw graph queries (read `gitnexus://repo/{name}/schema` first) | | `list_repos` | Discover indexed repos | + +## Resources Reference + +Lightweight reads (~100-500 tokens) for navigation: + +| Resource | Content | | ---------------------------------------------- | ----------------------------------------- | | `gitnexus://repo/{name}/context` | Stats, staleness check | | `gitnexus://repo/{name}/clusters` | All functional areas with cohesion scores | | `gitnexus://repo/{name}/cluster/{clusterName}` | Area members | | `gitnexus://repo/{name}/processes` | All execution flows | | `gitnexus://repo/{name}/process/{processName}` | Step-by-step trace | | `gitnexus://repo/{name}/schema` | Graph schema for Cypher | + +## Graph Schema + +**Nodes:** File, Function, Class, Interface, Method, Community, Process +**Edges (via CodeRelation.type):** CALLS, IMPORTS, EXTENDS, IMPLEMENTS, DEFINES, MEMBER_OF, STEP_IN_PROCESS + +```cypher +MATCH (caller)-[:CodeRelation {type: 'CALLS'}]->(f:Function {name: "myFunc"}) +RETURN caller.name, caller.filePath +``` diff --git a/.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md b/.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md new file
mode 100644 index 000000000..e19af280c --- /dev/null +++ b/.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md @@ -0,0 +1,97 @@ +--- +name: gitnexus-impact-analysis +description: "Use when the user wants to know what will break if they change something, or needs safety analysis before editing code. Examples: \"Is it safe to change X?\", \"What depends on this?\", \"What will break?\"" +--- + +# Impact Analysis with GitNexus + +## When to Use + +- "Is it safe to change this function?" +- "What will break if I modify X?" +- "Show me the blast radius" +- "Who uses this code?" +- Before making non-trivial code changes +- Before committing — to understand what your changes affect + +## Workflow + +``` +1. gitnexus_impact({target: "X", direction: "upstream"}) → What depends on this +2. READ gitnexus://repo/{name}/processes → Check affected execution flows +3. gitnexus_detect_changes() → Map current git changes to affected flows +4. Assess risk and report to user +``` + +> If "Index is stale" → run `npx gitnexus analyze` in terminal. 
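+
+If you want to cross-check the d=1 list, the direct dependents can also be pulled with a raw graph query via `gitnexus_cypher`. This is a sketch: it assumes the node and edge properties shown in `gitnexus://repo/{name}/schema`, and `validateUser` is a placeholder target.
+
+```cypher
+MATCH (dep)-[r:CodeRelation]->(t {name: "validateUser"})
+WHERE r.type IN ['CALLS', 'IMPORTS']
+RETURN dep.name AS dependent, dep.filePath AS file, r.type AS relation
+```
+
+Each returned row corresponds to a d=1 (WILL BREAK) entry in the `gitnexus_impact` output.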
+ +## Checklist + +``` +- [ ] gitnexus_impact({target, direction: "upstream"}) to find dependents +- [ ] Review d=1 items first (these WILL BREAK) +- [ ] Check high-confidence (>0.8) dependencies +- [ ] READ processes to check affected execution flows +- [ ] gitnexus_detect_changes() for pre-commit check +- [ ] Assess risk level and report to user +``` + +## Understanding Output + +| Depth | Risk Level | Meaning | +| ----- | ---------------- | ------------------------ | +| d=1 | **WILL BREAK** | Direct callers/importers | +| d=2 | LIKELY AFFECTED | Indirect dependencies | +| d=3 | MAY NEED TESTING | Transitive effects | + +## Risk Assessment + +| Affected | Risk | +| ------------------------------ | -------- | +| <5 symbols, few processes | LOW | +| 5-15 symbols, 2-5 processes | MEDIUM | +| >15 symbols or many processes | HIGH | +| Critical path (auth, payments) | CRITICAL | + +## Tools + +**gitnexus_impact** — the primary tool for symbol blast radius: + +``` +gitnexus_impact({ + target: "validateUser", + direction: "upstream", + minConfidence: 0.8, + maxDepth: 3 +}) + +→ d=1 (WILL BREAK): + - loginHandler (src/auth/login.ts:42) [CALLS, 100%] + - apiMiddleware (src/api/middleware.ts:15) [CALLS, 100%] + +→ d=2 (LIKELY AFFECTED): + - authRouter (src/routes/auth.ts:22) [CALLS, 95%] +``` + +**gitnexus_detect_changes** — git-diff based impact analysis: + +``` +gitnexus_detect_changes({scope: "staged"}) + +→ Changed: 5 symbols in 3 files +→ Affected: LoginFlow, TokenRefresh, APIMiddlewarePipeline +→ Risk: MEDIUM +``` + +## Example: "What breaks if I change validateUser?" + +``` +1. gitnexus_impact({target: "validateUser", direction: "upstream"}) + → d=1: loginHandler, apiMiddleware (WILL BREAK) + → d=2: authRouter, sessionManager (LIKELY AFFECTED) + +2. READ gitnexus://repo/my-app/processes + → LoginFlow and TokenRefresh touch validateUser + +3. 
Risk: 2 direct callers, 2 processes = MEDIUM +``` diff --git a/.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md b/.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md new file mode 100644 index 000000000..f48cc01bd --- /dev/null +++ b/.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md @@ -0,0 +1,121 @@ +--- +name: gitnexus-refactoring +description: "Use when the user wants to rename, extract, split, move, or restructure code safely. Examples: \"Rename this function\", \"Extract this into a module\", \"Refactor this class\", \"Move this to a separate file\"" +--- + +# Refactoring with GitNexus + +## When to Use + +- "Rename this function safely" +- "Extract this into a module" +- "Split this service" +- "Move this to a new file" +- Any task involving renaming, extracting, splitting, or restructuring code + +## Workflow + +``` +1. gitnexus_impact({target: "X", direction: "upstream"}) → Map all dependents +2. gitnexus_query({query: "X"}) → Find execution flows involving X +3. gitnexus_context({name: "X"}) → See all incoming/outgoing refs +4. Plan update order: interfaces → implementations → callers → tests +``` + +> If "Index is stale" → run `npx gitnexus analyze` in terminal. 
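+
+Before planning the update order, it helps to list the execution flows the target participates in, since those are the flows whose tests should run afterwards. A sketch via `gitnexus_cypher`: it assumes the `STEP_IN_PROCESS` edge points from the symbol to its `Process` node (per the schema in `gitnexus-guide`), and `validateUser` is a placeholder.
+
+```cypher
+MATCH (s {name: "validateUser"})-[:CodeRelation {type: 'STEP_IN_PROCESS'}]->(p:Process)
+RETURN p.name AS process
+```
+
+Run the tests covering each returned process once the refactor is applied.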
+ +## Checklists + +### Rename Symbol + +``` +- [ ] gitnexus_rename({symbol_name: "oldName", new_name: "newName", dry_run: true}) — preview all edits +- [ ] Review graph edits (high confidence) and ast_search edits (review carefully) +- [ ] If satisfied: gitnexus_rename({..., dry_run: false}) — apply edits +- [ ] gitnexus_detect_changes() — verify only expected files changed +- [ ] Run tests for affected processes +``` + +### Extract Module + +``` +- [ ] gitnexus_context({name: target}) — see all incoming/outgoing refs +- [ ] gitnexus_impact({target, direction: "upstream"}) — find all external callers +- [ ] Define new module interface +- [ ] Extract code, update imports +- [ ] gitnexus_detect_changes() — verify affected scope +- [ ] Run tests for affected processes +``` + +### Split Function/Service + +``` +- [ ] gitnexus_context({name: target}) — understand all callees +- [ ] Group callees by responsibility +- [ ] gitnexus_impact({target, direction: "upstream"}) — map callers to update +- [ ] Create new functions/services +- [ ] Update callers +- [ ] gitnexus_detect_changes() — verify affected scope +- [ ] Run tests for affected processes +``` + +## Tools + +**gitnexus_rename** — automated multi-file rename: + +``` +gitnexus_rename({symbol_name: "validateUser", new_name: "authenticateUser", dry_run: true}) +→ 12 edits across 8 files +→ 10 graph edits (high confidence), 2 ast_search edits (review) +→ Changes: [{file_path, edits: [{line, old_text, new_text, confidence}]}] +``` + +**gitnexus_impact** — map all dependents first: + +``` +gitnexus_impact({target: "validateUser", direction: "upstream"}) +→ d=1: loginHandler, apiMiddleware, testUtils +→ Affected Processes: LoginFlow, TokenRefresh +``` + +**gitnexus_detect_changes** — verify your changes after refactoring: + +``` +gitnexus_detect_changes({scope: "all"}) +→ Changed: 8 files, 12 symbols +→ Affected processes: LoginFlow, TokenRefresh +→ Risk: MEDIUM +``` + +**gitnexus_cypher** — custom reference queries: + 
+```cypher +MATCH (caller)-[:CodeRelation {type: 'CALLS'}]->(f:Function {name: "validateUser"}) +RETURN caller.name, caller.filePath ORDER BY caller.filePath +``` + +## Risk Rules + +| Risk Factor | Mitigation | +| ------------------- | ----------------------------------------- | +| Many callers (>5) | Use gitnexus_rename for automated updates | +| Cross-area refs | Use detect_changes after to verify scope | +| String/dynamic refs | gitnexus_query to find them | +| External/public API | Version and deprecate properly | + +## Example: Rename `validateUser` to `authenticateUser` + +``` +1. gitnexus_rename({symbol_name: "validateUser", new_name: "authenticateUser", dry_run: true}) + → 12 edits: 10 graph (safe), 2 ast_search (review) + → Files: validator.ts, login.ts, middleware.ts, config.json... + +2. Review ast_search edits (config.json: dynamic reference!) + +3. gitnexus_rename({symbol_name: "validateUser", new_name: "authenticateUser", dry_run: false}) + → Applied 12 edits across 8 files + +4. gitnexus_detect_changes({scope: "all"}) + → Affected: LoginFlow, TokenRefresh + → Risk: MEDIUM — run tests for these flows +``` diff --git a/.gitignore b/.gitignore index cc6bd1884..53e869ab8 100644 --- a/.gitignore +++ b/.gitignore @@ -246,3 +246,5 @@ video_creation/data/envvars.txt config.toml *.exe +.gitnexus +node_modules/ diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 000000000..d0322792a --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,101 @@ + +# GitNexus — Code Intelligence + +This project is indexed by GitNexus as **RedditVideoMakerBot** (311 symbols, 758 relationships, 23 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely. + +> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first. 
+ +## Always Do + +- **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user. +- **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows. +- **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits. +- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance. +- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`. + +## When Debugging + +1. `gitnexus_query({query: "<symptom>"})` — find execution flows related to the issue +2. `gitnexus_context({name: "<suspect symbol>"})` — see all callers, callees, and process participation +3. `READ gitnexus://repo/RedditVideoMakerBot/process/{processName}` — trace the full execution flow step by step +4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed + +## When Refactoring + +- **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`. +- **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code. +- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed. + +## Never Do + +- NEVER edit a function, class, or method without first running `gitnexus_impact` on it.
+- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis. +- NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph. +- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope. + +## Tools Quick Reference + +| Tool | When to use | Command | +|------|-------------|---------| +| `query` | Find code by concept | `gitnexus_query({query: "auth validation"})` | +| `context` | 360-degree view of one symbol | `gitnexus_context({name: "validateUser"})` | +| `impact` | Blast radius before editing | `gitnexus_impact({target: "X", direction: "upstream"})` | +| `detect_changes` | Pre-commit scope check | `gitnexus_detect_changes({scope: "staged"})` | +| `rename` | Safe multi-file rename | `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` | +| `cypher` | Custom graph queries | `gitnexus_cypher({query: "MATCH ..."})` | + +## Impact Risk Levels + +| Depth | Meaning | Action | +|-------|---------|--------| +| d=1 | WILL BREAK — direct callers/importers | MUST update these | +| d=2 | LIKELY AFFECTED — indirect deps | Should test | +| d=3 | MAY NEED TESTING — transitive | Test if critical path | + +## Resources + +| Resource | Use for | +|----------|---------| +| `gitnexus://repo/RedditVideoMakerBot/context` | Codebase overview, check index freshness | +| `gitnexus://repo/RedditVideoMakerBot/clusters` | All functional areas | +| `gitnexus://repo/RedditVideoMakerBot/processes` | All execution flows | +| `gitnexus://repo/RedditVideoMakerBot/process/{name}` | Step-by-step execution trace | + +## Self-Check Before Finishing + +Before completing any code modification task, verify: +1. `gitnexus_impact` was run for all modified symbols +2. No HIGH/CRITICAL risk warnings were ignored +3. `gitnexus_detect_changes()` confirms changes match expected scope +4. 
All d=1 (WILL BREAK) dependents were updated + +## Keeping the Index Fresh + +After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it: + +```bash +npx gitnexus analyze +``` + +If the index previously included embeddings, preserve them by adding `--embeddings`: + +```bash +npx gitnexus analyze --embeddings +``` + +To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). **Running analyze without `--embeddings` will delete any previously generated embeddings.** + +> Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`. + +## CLI + +| Task | Read this skill file | |------|---------------------| | Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` | | Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` | | Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` | | Rename / extract / split / refactor | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` | | Tools, resources, schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` | | Index, status, clean, wiki CLI commands | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` | + + diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 000000000..f7f25c788 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,119 @@ +# RAG-First Protocol (MANDATORY) + +**Before answering ANY question about this codebase or writing ANY code, you MUST query GitNexus first.** This is the highest priority rule — do NOT skip it. + +## Step 0: Always Start Here +1. **User asks about code?** → Run `gitnexus_query({query: "<topic>"})` FIRST to find relevant execution flows and symbols. +2. **User asks to modify code?** → Run `gitnexus_query()` to understand the area, THEN `gitnexus_impact()` on target symbols before editing. +3.
**User asks "how does X work?"** → Run `gitnexus_context({name: "X"})` to get 360-degree view (callers, callees, processes). +4. **User asks to debug?** → Run `gitnexus_query({query: "<symptom>"})` to trace execution flows. + +## Enforcement Rules +- **NEVER use Grep/Glob/Read as your first action** when exploring code. Always query the knowledge graph first via `gitnexus_query` or `gitnexus_context`. +- **NEVER answer questions about code structure, flow, or dependencies from memory alone.** Always verify against the graph. +- **ONLY fall back to Grep/Glob** if GitNexus returns no results for your query. +- **Cite GitNexus results** in your answer (e.g., "According to the knowledge graph, function X is called by Y in process Z"). + +--- + + +# GitNexus — Code Intelligence + +This project is indexed by GitNexus as **RedditVideoMakerBot** (311 symbols, 758 relationships, 23 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely. + +> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first. + +## Always Do + +- **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user. +- **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows. +- **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits. +- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance. +- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`. + +## When Debugging + +1.
`gitnexus_query({query: "<symptom>"})` — find execution flows related to the issue +2. `gitnexus_context({name: "<suspect symbol>"})` — see all callers, callees, and process participation +3. `READ gitnexus://repo/RedditVideoMakerBot/process/{processName}` — trace the full execution flow step by step +4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed + +## When Refactoring + +- **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`. +- **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code. +- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed. + +## Never Do + +- NEVER edit a function, class, or method without first running `gitnexus_impact` on it. +- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis. +- NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph. +- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope.
+
+## Tools Quick Reference
+
+| Tool | When to use | Command |
+|------|-------------|---------|
+| `query` | Find code by concept | `gitnexus_query({query: "auth validation"})` |
+| `context` | 360-degree view of one symbol | `gitnexus_context({name: "validateUser"})` |
+| `impact` | Blast radius before editing | `gitnexus_impact({target: "X", direction: "upstream"})` |
+| `detect_changes` | Pre-commit scope check | `gitnexus_detect_changes({scope: "staged"})` |
+| `rename` | Safe multi-file rename | `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` |
+| `cypher` | Custom graph queries | `gitnexus_cypher({query: "MATCH ..."})` |
+
+## Impact Risk Levels
+
+| Depth | Meaning | Action |
+|-------|---------|--------|
+| d=1 | WILL BREAK — direct callers/importers | MUST update these |
+| d=2 | LIKELY AFFECTED — indirect deps | Should test |
+| d=3 | MAY NEED TESTING — transitive | Test if critical path |
+
+## Resources
+
+| Resource | Use for |
+|----------|---------|
+| `gitnexus://repo/RedditVideoMakerBot/context` | Codebase overview, check index freshness |
+| `gitnexus://repo/RedditVideoMakerBot/clusters` | All functional areas |
+| `gitnexus://repo/RedditVideoMakerBot/processes` | All execution flows |
+| `gitnexus://repo/RedditVideoMakerBot/process/{name}` | Step-by-step execution trace |
+
+## Self-Check Before Finishing
+
+Before completing any code modification task, verify:
+1. `gitnexus_impact` was run for all modified symbols
+2. No HIGH/CRITICAL risk warnings were ignored
+3. `gitnexus_detect_changes()` confirms changes match expected scope
+4. All d=1 (WILL BREAK) dependents were updated
+
+## Keeping the Index Fresh
+
+After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it:
+
+```bash
+npx gitnexus analyze
+```
+
+If the index previously included embeddings, preserve them by adding `--embeddings`:
+
+```bash
+npx gitnexus analyze --embeddings
+```
+
+To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). **Running analyze without `--embeddings` will delete any previously generated embeddings.**
+
+> Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`.
+
+## Related Skills
+
+| Task | Read this skill file |
+|------|---------------------|
+| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` |
+| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` |
+| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` |
+| Rename / extract / split / refactor | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` |
+| Tools, resources, schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` |
+| Index, status, clean, wiki CLI commands | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` |
+
+
diff --git a/main.py b/main.py
index 742fedfd5..ad8d5584a 100755
--- a/main.py
+++ b/main.py
@@ -6,11 +6,12 @@
 from subprocess import Popen
 from typing import Dict, NoReturn
 
-from prawcore import ResponseException
-
-from reddit.subreddit import get_subreddit_threads
+# Reddit pipeline disabled — Threads is the active source.
+# from reddit.subreddit import get_subreddit_threads
+from threads.threads_api import get_threads_posts
 from utils import settings
 from utils.cleanup import cleanup
+from utils.checkpoint import run_step, save_checkpoint, load_checkpoint, clear_checkpoint, print_resume_status
 from utils.console import print_markdown, print_step, print_substep
 from utils.ffmpeg_install import ffmpeg_install
 from utils.id import extract_id
@@ -46,22 +47,60 @@
 reddit_object: Dict[str, str | list]
 
 
-def main(POST_ID=None) -> None:
+def main(POST_URL=None) -> None:
     global reddit_id, reddit_object
-    reddit_object = get_subreddit_threads(POST_ID)
+
+    # Step 1: Fetch Threads post (not wrapped in run_step — thread_id is unknown until fetched)
+    reddit_object = get_threads_posts(POST_URL)
     reddit_id = extract_id(reddit_object)
     print_substep(f"Thread ID is {reddit_id}", style="bold blue")
-    length, number_of_comments = save_text_to_mp3(reddit_object)
+    save_checkpoint(reddit_id, "fetch_thread", {"result": None})
+    print_resume_status(reddit_id)
+
+    # Step 2: Generate TTS audio
+    tts_result = run_step(
+        reddit_id, "generate_tts",
+        save_text_to_mp3, reddit_object,
+    )
+    length, number_of_comments = tts_result[0], tts_result[1]
     length = math.ceil(length)
-    get_screenshots_of_reddit_posts(reddit_object, number_of_comments)
+
+    # Step 3: Take screenshots
+    run_step(
+        reddit_id, "take_screenshots",
+        get_screenshots_of_reddit_posts, reddit_object, number_of_comments,
+    )
+
+    # Step 4: Download background video & audio
     bg_config = {
         "video": get_background_config("video"),
         "audio": get_background_config("audio"),
     }
+    run_step(
+        reddit_id, "download_background",
+        _download_backgrounds, bg_config,
+    )
+
+    # Step 5: Chop background
+    run_step(
+        reddit_id, "chop_background",
+        chop_background, bg_config, length, reddit_object,
+    )
+
+    # Step 6: Make final video
+    run_step(
+        reddit_id, "make_final_video",
+        make_final_video, number_of_comments, length, reddit_object, bg_config,
+    )
+
+    # Pipeline complete — clear checkpoint
+    clear_checkpoint(reddit_id)
+    print_step("Pipeline completed successfully! Checkpoint cleared.")
+
+
+def _download_backgrounds(bg_config):
     download_background_video(bg_config["video"])
     download_background_audio(bg_config["audio"])
-    chop_background(bg_config, length, reddit_object)
-    make_final_video(number_of_comments, length, reddit_object, bg_config)
 
 
 def run_many(times) -> None:
@@ -105,11 +144,15 @@ def shutdown() -> NoReturn:
         )
         sys.exit()
     try:
-        if config["reddit"]["thread"]["post_id"]:
-            for index, post_id in enumerate(config["reddit"]["thread"]["post_id"].split("+")):
+        threads_post_id = (
+            config.get("threads", {}).get("thread", {}).get("post_id", "")
+            if isinstance(config.get("threads", {}), dict) else ""
+        )
+        if threads_post_id:
+            for index, post_id in enumerate(threads_post_id.split("+")):
                 index += 1
                 print_step(
-                    f'on the {index}{("st" if index % 10 == 1 else ("nd" if index % 10 == 2 else ("rd" if index % 10 == 3 else "th")))} post of {len(config["reddit"]["thread"]["post_id"].split("+"))}'
+                    f'on the {index}{("st" if index % 10 == 1 else ("nd" if index % 10 == 2 else ("rd" if index % 10 == 3 else "th")))} post of {len(threads_post_id.split("+"))}'
                 )
                 main(post_id)
                 Popen("cls" if name == "nt" else "clear", shell=True).wait()
@@ -119,10 +162,6 @@ def shutdown() -> NoReturn:
             main()
     except KeyboardInterrupt:
         shutdown()
-    except ResponseException:
-        print_markdown("## Invalid credentials")
-        print_markdown("Please check your credentials in the config.toml file")
-        shutdown()
     except Exception as err:
         config["settings"]["tts"]["tiktok_sessionid"] = "REDACTED"
         config["settings"]["tts"]["elevenlabs_api_key"] = "REDACTED"
diff --git a/reddit/subreddit.py b/reddit/subreddit.py
index daeb439f2..d6216a3cd 100644
--- a/reddit/subreddit.py
+++ b/reddit/subreddit.py
@@ -1,4 +1,5 @@
 import re
+import sys
 
 import praw
 from praw.models import MoreComments
@@ -38,14 +39,16 @@ def get_subreddit_threads(POST_ID: str):
             client_secret=settings.config["reddit"]["creds"]["client_secret"],
             user_agent="Accessing Reddit threads",
             username=username,
-            passkey=passkey,
+            password=passkey,
             check_for_async=False,
         )
    except ResponseException as e:
        if e.response.status_code == 401:
            print("Invalid credentials - please check them in config.toml")
-    except:
-        print("Something went wrong...")
+            sys.exit(1)
+    except Exception as e:
+        print(f"Something went wrong logging into Reddit: {e}")
+        sys.exit(1)
 
     # Ask user for subreddit input
     print_step("Getting subreddit threads...")
diff --git a/threads/__init__.py b/threads/__init__.py
new file mode 100644
index 000000000..e69de29bb
diff --git a/threads/threads_api.py b/threads/threads_api.py
new file mode 100644
index 000000000..c76329a02
--- /dev/null
+++ b/threads/threads_api.py
@@ -0,0 +1,171 @@
+"""Threads API integration for fetching posts and replies.
+
+Uses Meta's Threads Graph API (https://developers.facebook.com/docs/threads).
+Requires an access token with 'threads_basic' and 'threads_read_replies' permissions.
+"""
+
+import json
+import re
+from pathlib import Path
+from typing import Optional
+
+import requests
+
+from utils import settings
+from utils.console import print_step, print_substep
+from utils.voice import sanitize_text
+
+THREADS_API_BASE = "https://graph.threads.net/v1.0"
+VIDEOS_DONE_FILE = "./video_creation/data/videos.json"
+
+
+def _api_get(path: str, access_token: str, params: Optional[dict] = None) -> dict:
+    """Call Threads Graph API GET endpoint."""
+    params = dict(params or {})
+    params["access_token"] = access_token
+    url = f"{THREADS_API_BASE}/{path.lstrip('/')}"
+    response = requests.get(url, params=params, timeout=30)
+    response.raise_for_status()
+    return response.json()
+
+
+def _extract_thread_id(url_or_id: str) -> str:
+    """Extract thread ID from a threads.net URL or return as-is if already an ID."""
+    match = re.search(r"/post/([A-Za-z0-9_-]+)", url_or_id)
+    return match.group(1) if match else url_or_id
+
+
+def _fetch_thread_details(thread_id: str, access_token: str) -> dict:
+    fields = "id,text,username,timestamp,permalink,media_type,is_quote_post"
+    return _api_get(thread_id, access_token, {"fields": fields})
+
+
+def _fetch_replies(thread_id: str, access_token: str, limit: int = 50) -> list[dict]:
+    fields = "id,text,username,timestamp,permalink"
+    try:
+        data = _api_get(
+            f"{thread_id}/replies",
+            access_token,
+            {"fields": fields, "limit": limit, "reverse": "false"},
+        )
+        return data.get("data", [])
+    except requests.HTTPError:
+        return []
+
+
+def _fetch_user_threads(access_token: str, limit: int = 25) -> list[dict]:
+    fields = "id,text,username,timestamp,permalink,media_type"
+    data = _api_get(
+        "me/threads",
+        access_token,
+        {"fields": fields, "limit": limit},
+    )
+    return data.get("data", [])
+
+
+def _is_valid_reply(text: str, min_len: int, max_len: int, blocked_words: list[str]) -> bool:
+    if not text or not text.strip():
+        return False
+    if len(text) < min_len or len(text) > max_len:
+        return False
+    lower = text.lower()
+    if any(w.strip().lower() in lower for w in blocked_words if w.strip()):
+        return False
+    if not sanitize_text(text):
+        return False
+    return True
+
+
+def get_threads_posts(POST_URL: Optional[str] = None) -> dict:
+    """Fetches a Threads post + replies. Returns a dict compatible with the video pipeline.
+
+    Args:
+        POST_URL: Optional specific thread URL or ID. If not provided, picks from
+            the authenticated user's recent threads.
+
+    Returns:
+        A dict with keys: thread_url, thread_title, thread_id, is_nsfw, comments,
+        optionally thread_post (for storymode).
+    """
+    print_step("Fetching Threads post...")
+
+    try:
+        access_token = settings.config["threads"]["creds"]["access_token"]
+    except KeyError:
+        raise RuntimeError(
+            "Missing Threads access_token in config.toml under [threads.creds]"
+        )
+
+    if not access_token:
+        raise RuntimeError("Threads access_token is empty. Set it in config.toml.")
+
+    thread_cfg = settings.config["threads"]["thread"]
+    min_len = int(thread_cfg.get("min_comment_length", 10))
+    max_len = int(thread_cfg.get("max_comment_length", 500))
+    blocked_words = [
+        w for w in str(thread_cfg.get("blocked_words", "")).split(",") if w.strip()
+    ]
+
+    target = POST_URL or thread_cfg.get("post_id") or ""
+
+    if target:
+        thread_id = _extract_thread_id(target)
+        submission = _fetch_thread_details(thread_id, access_token)
+    else:
+        print_substep("No post_id specified. Fetching authenticated user's recent threads.")
+        user_threads = _fetch_user_threads(access_token, limit=25)
+        if not user_threads:
+            raise RuntimeError("No threads found for authenticated user.")
+        submission = user_threads[0]
+        thread_id = submission["id"]
+
+    text = submission.get("text", "")
+    title = (text[:100] + "...") if len(text) > 100 else text
+    permalink = submission.get("permalink", f"https://www.threads.net/@unknown/post/{thread_id}")
+
+    print_substep(f"Thread: {title}", style="bold green")
+    print_substep(f"URL: {permalink}", style="bold blue")
+
+    content = {
+        "thread_url": permalink,
+        "thread_title": title or f"Thread {thread_id}",
+        "thread_id": thread_id,
+        "is_nsfw": False,
+        "comments": [],
+    }
+
+    if settings.config["settings"]["storymode"]:
+        content["thread_post"] = text
+    else:
+        replies = _fetch_replies(thread_id, access_token, limit=50)
+        for reply in replies:
+            body = reply.get("text", "")
+            if not _is_valid_reply(body, min_len, max_len, blocked_words):
+                continue
+            content["comments"].append({
+                "comment_body": body,
+                "comment_url": reply.get("permalink", ""),
+                "comment_id": reply["id"],
+            })
+
+        print_substep(
+            f"Got {len(content['comments'])} valid replies.",
+            style="bold green",
+        )
+
+    if _is_already_done(thread_id) and not target:
+        print_substep("Thread already processed. Fetch skipped.", style="yellow")
+        raise RuntimeError("Thread already processed. Set post_id to force reprocess.")
+
+    return content
+
+
+def _is_already_done(thread_id: str) -> bool:
+    path = Path(VIDEOS_DONE_FILE)
+    if not path.exists():
+        return False
+    try:
+        done = json.loads(path.read_text(encoding="utf-8"))
+    except (json.JSONDecodeError, OSError):
+        return False
+    return any(v.get("id") == thread_id for v in done)
diff --git a/utils/.config.template.toml b/utils/.config.template.toml
index 9b13657a5..da4c651af 100644
--- a/utils/.config.template.toml
+++ b/utils/.config.template.toml
@@ -1,3 +1,12 @@
+[threads.creds]
+access_token = { optional = false, nmin = 10, explanation = "Meta Threads Graph API access token with threads_basic and threads_read_replies permissions", example = "THAAaBbCc123...", oob_error = "Access token too short" }
+
+[threads.thread]
+post_id = { optional = true, default = "", explanation = "Specific Threads post URL or ID. Use '+' to separate multiple IDs. Leave empty to fetch from your own recent threads.", example = "https://www.threads.net/@user/post/ABC123" }
+max_comment_length = { default = 500, optional = false, nmin = 10, nmax = 10000, type = "int", explanation = "Max characters per reply" }
+min_comment_length = { default = 10, optional = true, nmin = 0, nmax = 10000, type = "int", explanation = "Min characters per reply" }
+blocked_words = { optional = true, default = "", type = "str", explanation = "Comma-separated words to exclude from replies", example = "spam, nsfw" }
+
 [reddit.creds]
 client_id = { optional = false, nmin = 12, nmax = 30, explanation = "The ID of your Reddit app of SCRIPT type", example = "fFAGRNJru1FTz70BzhT3Zg", regex = "^[-a-zA-Z0-9._~+/]+=*$", input_error = "The client ID can only contain printable characters.", oob_error = "The ID should be over 12 and under 30 characters, double check your input." }
 client_secret = { optional = false, nmin = 20, nmax = 40, explanation = "The SECRET of your Reddit app of SCRIPT type", example = "fFAGRNJru1FTz70BzhT3Zg", regex = "^[-a-zA-Z0-9._~+/]+=*$", input_error = "The client ID can only contain printable characters.", oob_error = "The secret should be over 20 and under 40 characters, double check your input." }
diff --git a/utils/checkpoint.py b/utils/checkpoint.py
new file mode 100644
index 000000000..87d806784
--- /dev/null
+++ b/utils/checkpoint.py
@@ -0,0 +1,106 @@
+import json
+import time
+from pathlib import Path
+from typing import Any, Optional
+
+from utils.console import print_step, print_substep
+
+CHECKPOINT_DIR = Path("assets/temp")
+
+
+def _checkpoint_path(reddit_id: str) -> Path:
+    return CHECKPOINT_DIR / reddit_id / "checkpoint.json"
+
+
+def save_checkpoint(reddit_id: str, step: str, data: dict[str, Any]) -> None:
+    path = _checkpoint_path(reddit_id)
+    path.parent.mkdir(parents=True, exist_ok=True)
+
+    checkpoint = load_checkpoint(reddit_id) or {}
+    checkpoint["reddit_id"] = reddit_id
+    checkpoint["last_step"] = step
+    checkpoint["updated_at"] = time.time()
+    checkpoint.setdefault("completed_steps", [])
+    if step not in checkpoint["completed_steps"]:
+        checkpoint["completed_steps"].append(step)
+    checkpoint[step] = data
+
+    path.write_text(json.dumps(checkpoint, indent=2, default=str))
+
+
+def load_checkpoint(reddit_id: str) -> Optional[dict]:
+    path = _checkpoint_path(reddit_id)
+    if not path.exists():
+        return None
+    try:
+        return json.loads(path.read_text())
+    except (json.JSONDecodeError, OSError):
+        return None
+
+
+def is_step_done(reddit_id: str, step: str) -> bool:
+    cp = load_checkpoint(reddit_id)
+    if not cp:
+        return False
+    return step in cp.get("completed_steps", [])
+
+
+def get_step_data(reddit_id: str, step: str) -> Optional[dict]:
+    cp = load_checkpoint(reddit_id)
+    if not cp:
+        return None
+    return cp.get(step)
+
+
+def clear_checkpoint(reddit_id: str) -> None:
+    path = _checkpoint_path(reddit_id)
+    if path.exists():
+        path.unlink()
+
+
+def print_resume_status(reddit_id: str) -> None:
+    cp = load_checkpoint(reddit_id)
+    if not cp:
+        return
+    done = cp.get("completed_steps", [])
+    print_substep(f"Resuming from checkpoint. Completed steps: {', '.join(done)}", style="bold yellow")
+
+
+PIPELINE_STEPS = [
+    "fetch_thread",
+    "generate_tts",
+    "take_screenshots",
+    "download_background",
+    "chop_background",
+    "make_final_video",
+]
+
+
+def run_step(reddit_id: str, step: str, func, *args, max_retries: int = 3, **kwargs) -> Any:
+    if is_step_done(reddit_id, step):
+        data = get_step_data(reddit_id, step)
+        print_substep(f"Step '{step}' already done. Skipping.", style="bold green")
+        return data.get("result") if data else None
+
+    last_error = None
+    for attempt in range(1, max_retries + 1):
+        try:
+            if attempt > 1:
+                print_substep(f"Retry {attempt}/{max_retries} for step '{step}'...", style="bold yellow")
+            result = func(*args, **kwargs)
+            save_checkpoint(reddit_id, step, {"result": result})
+            print_substep(f"Step '{step}' completed successfully.", style="bold green")
+            return result
+        except KeyboardInterrupt:
+            raise
+        except Exception as e:
+            last_error = e
+            print_substep(f"Step '{step}' failed (attempt {attempt}/{max_retries}): {e}", style="bold red")
+            if attempt < max_retries:
+                wait = 2 ** attempt
+                print_substep(f"Waiting {wait}s before retry...", style="yellow")
+                time.sleep(wait)
+
+    print_step(f"Step '{step}' failed after {max_retries} attempts.")
+    save_checkpoint(reddit_id, f"{step}_failed", {"error": str(last_error)})
+    raise last_error
diff --git a/video_creation/final_video.py b/video_creation/final_video.py
index c4f3a0b07..a73931e9e 100644
--- a/video_creation/final_video.py
+++ b/video_creation/final_video.py
@@ -1,6 +1,8 @@
 import multiprocessing
 import os
 import re
+import shutil
+import subprocess
 import tempfile
 import textwrap
 import threading
@@ -10,6 +12,21 @@
 from typing import Dict, Final, Tuple
 
 import ffmpeg
+
+
+def _get_video_codec() -> str:
+    try:
+        result = subprocess.run(
+            ["ffmpeg", "-encoders"], capture_output=True, text=True, timeout=10
+        )
+        if "h264_nvenc" in result.stdout:
+            return "h264_nvenc"
+    except (subprocess.TimeoutExpired, FileNotFoundError):
+        pass
+    return "libx264"
+
+
+VIDEO_CODEC = _get_video_codec()
 import translators
 from PIL import Image, ImageDraw, ImageFont
 from rich.console import Console
@@ -93,7 +110,7 @@ def prepare_background(reddit_id: str, W: int, H: int) -> str:
             output_path,
             an=None,
             **{
-                "c:v": "h264_nvenc",
+                "c:v": VIDEO_CODEC,
                 "b:v": "20M",
                 "b:a": "192k",
                 "threads": multiprocessing.cpu_count(),
@@ -438,7 +455,7 @@ def on_update_example(progress) -> None:
             path,
             f="mp4",
             **{
-                "c:v": "h264_nvenc",
+                "c:v": VIDEO_CODEC,
                 "b:v": "20M",
                 "b:a": "192k",
                 "threads": multiprocessing.cpu_count(),
@@ -468,7 +485,7 @@ def on_update_example(progress) -> None:
             path,
             f="mp4",
             **{
-                "c:v": "h264_nvenc",
+                "c:v": VIDEO_CODEC,
                 "b:v": "20M",
                 "b:a": "192k",
                 "threads": multiprocessing.cpu_count(),
diff --git a/video_creation/screenshot_downloader.py b/video_creation/screenshot_downloader.py
index 8dafaf6a0..9164e9635 100644
--- a/video_creation/screenshot_downloader.py
+++ b/video_creation/screenshot_downloader.py
@@ -72,52 +72,64 @@ def get_screenshots_of_reddit_posts(reddit_object: dict, screenshot_num: int):
     with sync_playwright() as p:
         print_substep("Launching Headless Browser...")
 
-        browser = p.chromium.launch(
-            headless=True
-        )  # headless=False will show the browser for debugging purposes
 
         # Device scale factor (or dsf for short) allows us to increase the resolution of the screenshots
         # When the dsf is 1, the width of the screenshot is 600 pixels
         # so we need a dsf such that the width of the screenshot is greater than the final resolution of the video
         dsf = (W // 600) + 1
 
-        context = browser.new_context(
+        # Create a persistent context directory to save browser profile (cookies, login state, etc.)
+        user_data_dir = Path("./assets/browser_profile")
+        user_data_dir.mkdir(parents=True, exist_ok=True)
+
+        context = p.chromium.launch_persistent_context(
+            str(user_data_dir),
+            headless=True,  # headless=False will show the browser for debugging purposes
             locale=lang or "en-CA,en;q=0.9",
             color_scheme="dark",
             viewport=ViewportSize(width=W, height=H),
             device_scale_factor=dsf,
-            user_agent=f"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/{browser.version}.0.0.0 Safari/537.36",
+            user_agent=f"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36",
             extra_http_headers={
                 "Dnt": "1",
                 "Sec-Ch-Ua": '"Not A(Brand";v="8", "Chromium";v="132", "Google Chrome";v="132"',
             },
         )
+
         cookies = json.load(cookie_file)
         cookie_file.close()
 
-        context.add_cookies(cookies)  # load preference cookies
-        # Login to Reddit
-        print_substep("Logging in to Reddit...")
+        # Login to Reddit only if not already logged in
+        print_substep("Checking Reddit login status...")
         page = context.new_page()
-        page.goto("https://www.reddit.com/login", timeout=0)
-        page.set_viewport_size(ViewportSize(width=1920, height=1080))
+        page.goto("https://www.reddit.com", timeout=0)
         page.wait_for_load_state()
-        page.locator(f'input[name="username"]').fill(settings.config["reddit"]["creds"]["username"])
-        page.locator(f'input[name="password"]').fill(settings.config["reddit"]["creds"]["password"])
-        page.get_by_role("button", name="Log In").click()
-        page.wait_for_timeout(5000)
+        # Check if already logged in by looking for profile dropdown
+        is_logged_in = page.locator('button[aria-label*="user menu"]').is_visible() or \
+            page.locator('a[href*="/user/"]').first.is_visible()
 
-        login_error_div = page.locator(".AnimatedForm__errorMessage").first
-        if login_error_div.is_visible():
+        if not is_logged_in:
+            print_substep("Not logged in. Logging in to Reddit...")
+            page.goto("https://www.reddit.com/login", timeout=0)
+            page.set_viewport_size(ViewportSize(width=1920, height=1080))
+            page.wait_for_load_state()
-            print_substep(
-                "Your reddit credentials are incorrect! Please modify them accordingly in the config.toml file.",
-                style="red",
-            )
-            exit()
+            page.locator(f'input[name="username"]').fill(settings.config["reddit"]["creds"]["username"])
+            page.locator(f'input[name="password"]').fill(settings.config["reddit"]["creds"]["password"])
+            page.get_by_role("button", name="Log In").click()
+            page.wait_for_timeout(5000)
+
+            login_error_div = page.locator(".AnimatedForm__errorMessage").first
+            if login_error_div.is_visible():
+                print_substep(
+                    "Your reddit credentials are incorrect! Please modify them accordingly in the config.toml file.",
+                    style="red",
+                )
+                context.close()
+                exit()
         else:
-            pass
+            print_substep("Already logged in! Skipping login...")
 
         page.wait_for_load_state()
 
         # Handle the redesign
@@ -133,16 +145,11 @@ def get_screenshots_of_reddit_posts(reddit_object: dict, screenshot_num: int):
         page.wait_for_load_state()
         page.wait_for_timeout(5000)
 
-        if page.locator(
-            "#t3_12hmbug > div > div._3xX726aBn29LDbsDtzr_6E._1Ap4F5maDtT1E1YuCiaO0r.D3IL3FD0RFy_mkKLPwL4 > div > div > button"
-        ).is_visible():
-            # This means the post is NSFW and requires to click the proceed button.
-
+        nsfw_button = page.locator("button:has-text('Yes'), button:has-text('Continue'), [data-testid='content-gate'] button").first
+        if nsfw_button.is_visible():
             print_substep("Post is NSFW. You are spicy...")
-            page.locator(
-                "#t3_12hmbug > div > div._3xX726aBn29LDbsDtzr_6E._1Ap4F5maDtT1E1YuCiaO0r.D3IL3FD0RFy_mkKLPwL4 > div > div > button"
-            ).click()
-            page.wait_for_load_state()  # Wait for page to fully load
+            nsfw_button.click()
+            page.wait_for_load_state()
 
         # translate code
 
         if page.locator(
@@ -258,7 +265,7 @@ def get_screenshots_of_reddit_posts(reddit_object: dict, screenshot_num: int):
                 print("TimeoutError: Skipping screenshot...")
                 continue
 
-        # close browser instance when we are done using it
-        browser.close()
+        # close browser context when we are done using it
+        context.close()
 
     print_substep("Screenshots downloaded Successfully.", style="bold green")