|
| 1 | +--- |
| 2 | +name: pre-main-push |
| 3 | +description: Use before pushing to main. Blocks direct pushes to main - instead prompts the user for a ScrapeGraph API key, tests the chosen endpoint against 5 different websites, and only then opens a PR. Trigger when the user says "push to main", "merge to main", or invokes /pre-main-push directly. |
| 4 | +--- |
| 5 | + |
| 6 | +# Pre-main push gate |
| 7 | + |
| 8 | +This skill enforces the rule: **never push directly to `main`**. Before any change lands on `main`, the agent must (1) collect an API key from the user, (2) verify the ScrapeGraph endpoint works against 5 different real websites, and (3) open a pull request instead of pushing. |
| 9 | + |
| 10 | +If any step fails, STOP. Do not push, do not open the PR. Report the failure and wait for the user. |
| 11 | + |
| 12 | +## Step 1 — Ask for the API key (ALWAYS FIRST) |
| 13 | + |
| 14 | +Do this before anything else, even before checking git state. Do NOT proceed without a key. |
| 15 | + |
| 16 | +Ask the user: |
| 17 | + |
| 18 | +> I need your ScrapeGraph API key to validate the endpoint before opening the PR. Please paste it here (it will only be used for this session and will not be committed). |
| 19 | +
|
| 20 | +Rules: |
| 21 | +- Never read the key from `.env`, shell history, or any file silently — the user must paste it explicitly in this turn. |
| 22 | +- Never log, print, echo, or write the key to any file, including commit messages, PR descriptions, or test output. |
| 23 | +- Store it only in an environment variable for the current shell invocations: `SGAI_API_KEY=<pasted value>`. |
| 24 | +- If the user refuses or doesn't provide one, abort the skill and tell them the push is blocked. |
| 25 | + |
| 26 | +## Step 2 — Pick the endpoint to test |
| 27 | + |
| 28 | +Ask the user which v2 endpoint to validate against. The current v2 surface (per `transition-from-v1-to-v2.mdx` in this repo) is: |
| 29 | + |
| 30 | +- **`scrape`** — raw page fetch with output formats (e.g. `format="markdown"`). Replaces v1 `markdownify`. |
| 31 | +- **`extract`** — structured extraction from a URL with a natural-language prompt. Replaces v1 `smartscraper`. |
| 32 | +- **`search`** — web search + extraction. Replaces v1 `searchscraper`. |
| 33 | +- **`crawl.start`** — multi-page async crawl (poll with `crawl.get`). Replaces v1 `smartcrawler`. |
| 34 | +- **`monitor.create`** (and related) — monitoring. |
| 35 | + |
| 36 | +Default: `extract`. If the user names a v1 endpoint (`smartscraper`, `searchscraper`, `markdownify`, `smartcrawler`, `agenticscraper`, `sitemap`), translate it to the v2 equivalent and confirm with them before running. |
| 37 | + |
| 38 | +REST host for v2 is `https://v2-api.scrapegraphai.com/api/<endpoint>` — note this is a **different subdomain** from v1 (`https://api.scrapegraphai.com/v1/...`). Don't mix them. |
| 39 | + |
| 40 | +## Step 3 — Test against 5 different websites |
| 41 | + |
| 42 | +Run the chosen endpoint against these 5 URLs (diverse domains, stable, safe to hit): |
| 43 | + |
| 44 | +1. `https://example.com` |
| 45 | +2. `https://news.ycombinator.com` |
| 46 | +3. `https://en.wikipedia.org/wiki/Web_scraping` |
| 47 | +4. `https://scrapegraphai.com` |
| 48 | +5. `https://httpbin.org/html` |
| 49 | + |
| 50 | +Use a short inline script (do NOT commit it — run it from a temp path or stdin). v2 SDK shape for `extract`: |
| 51 | + |
| 52 | +```bash |
| 53 | +SGAI_API_KEY="$SGAI_API_KEY" python3 - <<'PY' |
| 54 | +import os, sys |
| 55 | +from scrapegraph_py import ScrapeGraphAI, ExtractRequest |
| 56 | +
|
| 57 | +urls = [ |
| 58 | + "https://example.com", |
| 59 | + "https://news.ycombinator.com", |
| 60 | + "https://en.wikipedia.org/wiki/Web_scraping", |
| 61 | + "https://scrapegraphai.com", |
| 62 | + "https://httpbin.org/html", |
| 63 | +] |
| 64 | +sgai = ScrapeGraphAI() # reads SGAI_API_KEY from env |
| 65 | +failures = [] |
| 66 | +for u in urls: |
| 67 | + try: |
| 68 | + res = sgai.extract(ExtractRequest( |
| 69 | + url=u, |
| 70 | + prompt="Summarize the page in one sentence.", |
| 71 | + )) |
| 72 | + if getattr(res, "status", None) != "success": |
| 73 | + raise RuntimeError(f"status={getattr(res, 'status', '?')}") |
| 74 | + print(f"OK {u}") |
| 75 | + except Exception as e: |
| 76 | + print(f"FAIL {u} :: {e}") |
| 77 | + failures.append(u) |
| 78 | +sys.exit(1 if failures else 0) |
| 79 | +PY |
| 80 | +``` |
| 81 | + |
| 82 | +For other v2 endpoints, swap the method and request class: |
| 83 | + |
| 84 | +| endpoint | method | request class (example) | |
| 85 | +|---|---|---| |
| 86 | +| `scrape` | `sgai.scrape(ScrapeRequest(url=..., formats=[MarkdownFormatConfig()]))` | `ScrapeRequest`, `MarkdownFormatConfig` | |
| 87 | +| `extract` | `sgai.extract(ExtractRequest(url=..., prompt=...))` | `ExtractRequest` | |
| 88 | +| `search` | `sgai.search(SearchRequest(query=..., prompt=...))` | `SearchRequest` | |
| 89 | +| `crawl.start` | `sgai.crawl.start(CrawlStartRequest(url=..., ...))` then poll `crawl.get` | `CrawlStartRequest` | |
| 90 | + |
| 91 | +Before running: confirm the exact method/request-class names against the installed `scrapegraph_py >= 2.x` version (the v2 SDK). Field names (`url`, `prompt`) replaced v1's (`website_url`, `user_prompt`) — do not mix them. If the user named a v1 endpoint, translate per the mapping in Step 2 and confirm. |
| 92 | + |
| 93 | +Pass criteria: **all 5 must succeed**. If any fails, abort — do NOT open the PR. Show the user which URLs failed and why. |
| 94 | + |
| 95 | +Do not paste the API key into any tool output, commit message, or PR body. Reference it only via `$SGAI_API_KEY`. |
| 96 | + |
| 97 | +## Step 4 — Open a PR (never push to main) |
| 98 | + |
| 99 | +Only if step 3 passed with 5/5: |
| 100 | + |
| 101 | +1. If the current branch is `main`, create a feature branch first: |
| 102 | + - `git checkout -b <username>/<type>/<short-description>` (see `gh-pr.md` skill for branch conventions) |
| 103 | + - Move the commits off `main` — do NOT push them to `origin/main`. |
| 104 | +2. Push the feature branch: `git push -u origin <branch>` |
| 105 | +3. Open the PR with `gh pr create`. In the PR body, include a short "Validation" section stating: |
| 106 | + > Endpoint `<name>` tested against 5 live websites — all passed. API key supplied by user at PR creation time (not stored). |
| 107 | + Do not include the key itself or any response data that might contain it. |
| 108 | + |
| 109 | +Defer to the `creating-pr` skill (`gh-pr.md`) for PR title/body formatting and branch naming conventions. |
| 110 | + |
| 111 | +## Hard stops |
| 112 | + |
| 113 | +Abort the skill and report to the user if any of these are true: |
| 114 | + |
| 115 | +- The user did not provide an API key. |
| 116 | +- Any of the 5 test URLs failed. |
| 117 | +- The current branch is `main` and there are uncommitted or unpushed changes that would require a force-push to move. |
| 118 | +- `gh` is not authenticated (`gh auth status` fails). |
| 119 | + |
| 120 | +Never bypass these checks with `--force`, `--no-verify`, or by pushing directly to `origin/main`. |
0 commit comments