Skip to content

Commit c37f838

Browse files
VinciGit00claude
andcommitted
skills: add pre-main-push gate
Blocks direct pushes to main: requires a user-supplied ScrapeGraph API key, validates the chosen v2 endpoint against 5 live sites, and only then opens a PR. Enforces the v1→v2 endpoint mapping and forbids logging the key. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b73c530 commit c37f838

1 file changed

Lines changed: 120 additions & 0 deletions

File tree

.claude/skills/pre-main-push.md

Lines changed: 120 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,120 @@
1+
---
2+
name: pre-main-push
3+
description: Use before pushing to main. Blocks direct pushes to main - instead prompts the user for a ScrapeGraph API key, tests the chosen endpoint against 5 different websites, and only then opens a PR. Trigger when the user says "push to main", "merge to main", or invokes /pre-main-push directly.
4+
---
5+
6+
# Pre-main push gate
7+
8+
This skill enforces the rule: **never push directly to `main`**. Before any change lands on `main`, the agent must (1) collect an API key from the user, (2) verify the ScrapeGraph endpoint works against 5 different real websites, and (3) open a pull request instead of pushing.
9+
10+
If any step fails, STOP. Do not push, do not open the PR. Report the failure and wait for the user.
11+
12+
## Step 1 — Ask for the API key (ALWAYS FIRST)
13+
14+
Do this before anything else, even before checking git state. Do NOT proceed without a key.
15+
16+
Ask the user:
17+
18+
> I need your ScrapeGraph API key to validate the endpoint before opening the PR. Please paste it here (it will only be used for this session and will not be committed).
19+
20+
Rules:
21+
- Never read the key from `.env`, shell history, or any file silently — the user must paste it explicitly in this turn.
22+
- Never log, print, echo, or write the key to any file, including commit messages, PR descriptions, or test output.
23+
- Store it only in an environment variable for the current shell invocations: `SGAI_API_KEY=<pasted value>`.
24+
- If the user refuses or doesn't provide one, abort the skill and tell them the push is blocked.
25+
26+
## Step 2 — Pick the endpoint to test
27+
28+
Ask the user which v2 endpoint to validate against. The current v2 surface (per `transition-from-v1-to-v2.mdx` in this repo) is:
29+
30+
- **`scrape`** — raw page fetch with output formats (e.g. `format="markdown"`). Replaces v1 `markdownify`.
31+
- **`extract`** — structured extraction from a URL with a natural-language prompt. Replaces v1 `smartscraper`.
32+
- **`search`** — web search + extraction. Replaces v1 `searchscraper`.
33+
- **`crawl.start`** — multi-page async crawl (poll with `crawl.get`). Replaces v1 `smartcrawler`.
34+
- **`monitor.create`** (and related) — monitoring.
35+
36+
Default: `extract`. If the user names a v1 endpoint (`smartscraper`, `searchscraper`, `markdownify`, `smartcrawler`, `agenticscraper`, `sitemap`), translate it to the v2 equivalent and confirm with them before running.
37+
38+
REST host for v2 is `https://v2-api.scrapegraphai.com/api/<endpoint>` — note this is a **different subdomain** from v1 (`https://api.scrapegraphai.com/v1/...`). Don't mix them.
39+
40+
## Step 3 — Test against 5 different websites
41+
42+
Run the chosen endpoint against these 5 URLs (diverse domains, stable, safe to hit):
43+
44+
1. `https://example.com`
45+
2. `https://news.ycombinator.com`
46+
3. `https://en.wikipedia.org/wiki/Web_scraping`
47+
4. `https://scrapegraphai.com`
48+
5. `https://httpbin.org/html`
49+
50+
Use a short inline script (do NOT commit it — run it from a temp path or stdin). v2 SDK shape for `extract`:
51+
52+
```bash
53+
SGAI_API_KEY="$SGAI_API_KEY" python3 - <<'PY'
54+
import os, sys
55+
from scrapegraph_py import ScrapeGraphAI, ExtractRequest
56+
57+
urls = [
58+
"https://example.com",
59+
"https://news.ycombinator.com",
60+
"https://en.wikipedia.org/wiki/Web_scraping",
61+
"https://scrapegraphai.com",
62+
"https://httpbin.org/html",
63+
]
64+
sgai = ScrapeGraphAI() # reads SGAI_API_KEY from env
65+
failures = []
66+
for u in urls:
67+
try:
68+
res = sgai.extract(ExtractRequest(
69+
url=u,
70+
prompt="Summarize the page in one sentence.",
71+
))
72+
if getattr(res, "status", None) != "success":
73+
raise RuntimeError(f"status={getattr(res, 'status', '?')}")
74+
print(f"OK {u}")
75+
except Exception as e:
76+
print(f"FAIL {u} :: {e}")
77+
failures.append(u)
78+
sys.exit(1 if failures else 0)
79+
PY
80+
```
81+
82+
For other v2 endpoints, swap the method and request class:
83+
84+
| endpoint | method | request class (example) |
85+
|---|---|---|
86+
| `scrape` | `sgai.scrape(ScrapeRequest(url=..., formats=[MarkdownFormatConfig()]))` | `ScrapeRequest`, `MarkdownFormatConfig` |
87+
| `extract` | `sgai.extract(ExtractRequest(url=..., prompt=...))` | `ExtractRequest` |
88+
| `search` | `sgai.search(SearchRequest(query=..., prompt=...))` | `SearchRequest` |
89+
| `crawl.start` | `sgai.crawl.start(CrawlStartRequest(url=..., ...))` then poll `crawl.get` | `CrawlStartRequest` |
90+
91+
Before running: confirm the exact method/request-class names against the installed `scrapegraph_py >= 2.x` version (the v2 SDK). Field names (`url`, `prompt`) replaced v1's (`website_url`, `user_prompt`) — do not mix them. If the user named a v1 endpoint, translate per the mapping in Step 2 and confirm.
92+
93+
Pass criteria: **all 5 must succeed**. If any fails, abort — do NOT open the PR. Show the user which URLs failed and why.
94+
95+
Do not paste the API key into any tool output, commit message, or PR body. Reference it only via `$SGAI_API_KEY`.
96+
97+
## Step 4 — Open a PR (never push to main)
98+
99+
Only if step 3 passed with 5/5:
100+
101+
1. If the current branch is `main`, create a feature branch first:
102+
- `git checkout -b <username>/<type>/<short-description>` (see `gh-pr.md` skill for branch conventions)
103+
- Move the commits off `main` — do NOT push them to `origin/main`.
104+
2. Push the feature branch: `git push -u origin <branch>`
105+
3. Open the PR with `gh pr create`. In the PR body, include a short "Validation" section stating:
106+
> Endpoint `<name>` tested against 5 live websites — all passed. API key supplied by user at PR creation time (not stored).
107+
Do not include the key itself or any response data that might contain it.
108+
109+
Defer to the `creating-pr` skill (`gh-pr.md`) for PR title/body formatting and branch naming conventions.
110+
111+
## Hard stops
112+
113+
Abort the skill and report to the user if any of these are true:
114+
115+
- The user did not provide an API key.
116+
- Any of the 5 test URLs failed.
117+
- The current branch is `main` and there are uncommitted or unpushed changes that would require a force-push to move.
118+
- `gh` is not authenticated (`gh auth status` fails).
119+
120+
Never bypass these checks with `--force`, `--no-verify`, or by pushing directly to `origin/main`.

0 commit comments

Comments
 (0)