---
name: pre-main-push
description: Use before pushing to main. Blocks direct pushes to main - instead prompts the user for a ScrapeGraph API key, tests the chosen endpoint against 5 different websites, and only then opens a PR. Trigger when the user says "push to main", "merge to main", or invokes /pre-main-push directly.
---

# Pre-main push gate

This skill enforces the rule: **never push directly to `main`**. Before any change lands on `main`, the agent must (1) collect an API key from the user, (2) verify the ScrapeGraph endpoint works against 5 different real websites, and (3) open a pull request instead of pushing.

If any step fails, STOP. Do not push, do not open the PR. Report the failure and wait for the user.

## Step 1 — Ask for the API key (ALWAYS FIRST)

Do this before anything else, even before checking git state. Do NOT proceed without a key.

Ask the user:

> I need your ScrapeGraph API key to validate the endpoint before opening the PR. Please paste it here (it will only be used for this session and will not be committed).

Rules:
- Never read the key from `.env`, shell history, or any file silently — the user must paste it explicitly in this turn.
- Never log, print, echo, or write the key to any file, including commit messages, PR descriptions, or test output.
- Store it only in an environment variable scoped to the current shell session: `SGAI_API_KEY=<pasted value>`.
- If the user refuses or doesn't provide one, abort the skill and tell them the push is blocked.

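The "paste it here" step can be scripted so the key never hits the terminal or any file. A minimal bash sketch (the function name `read_api_key` is illustrative, not part of any SDK):

```shell
# Read the key from the user without echoing it to the terminal,
# and keep it only in this shell's environment (never on disk).
read_api_key() {
  printf 'Paste your ScrapeGraph API key: ' >&2
  IFS= read -rs SGAI_API_KEY   # -s: no echo; nothing lands in history or files
  printf '\n' >&2
  if [ -z "$SGAI_API_KEY" ]; then
    echo "No key provided -- push to main is blocked." >&2
    return 1
  fi
  export SGAI_API_KEY
}
```

`read -s` is a bash feature, not POSIX sh; run this under bash.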
## Step 2 — Pick the endpoint to test

Ask the user which v2 endpoint to validate against. The current v2 surface (per `transition-from-v1-to-v2.mdx` in this repo) is:

- **`scrape`** — raw page fetch with output formats (e.g. `format="markdown"`). Replaces v1 `markdownify`.
- **`extract`** — structured extraction from a URL with a natural-language prompt. Replaces v1 `smartscraper`.
- **`search`** — web search + extraction. Replaces v1 `searchscraper`.
- **`crawl.start`** — multi-page async crawl (poll with `crawl.get`). Replaces v1 `smartcrawler`.
- **`monitor.create`** (and related) — monitoring.

Default: `extract`. If the user names a v1 endpoint (`smartscraper`, `searchscraper`, `markdownify`, `smartcrawler`, `agenticscraper`, `sitemap`), translate it to the v2 equivalent and confirm with them before running.

The REST host for v2 is `https://v2-api.scrapegraphai.com/api/<endpoint>` — note this is a **different subdomain** from v1 (`https://api.scrapegraphai.com/v1/...`). Don't mix them.

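The v1-to-v2 translation above can be sketched as a small bash helper. Names come from the mapping in this section; `agenticscraper` and `sitemap` have no v2 equivalent listed here, so unknown names pass through unchanged for the user to confirm:

```shell
# Translate a v1 endpoint name to its v2 equivalent (per the Step 2 mapping).
# Names without a listed v2 equivalent fall through unchanged.
v1_to_v2() {
  case "$1" in
    markdownify)   echo "scrape" ;;
    smartscraper)  echo "extract" ;;
    searchscraper) echo "search" ;;
    smartcrawler)  echo "crawl.start" ;;
    *)             echo "$1" ;;
  esac
}
```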
## Step 3 — Test against 5 different websites

Run the chosen endpoint against these 5 URLs (diverse domains, stable, safe to hit):

1. `https://example.com`
2. `https://news.ycombinator.com`
3. `https://en.wikipedia.org/wiki/Web_scraping`
4. `https://scrapegraphai.com`
5. `https://httpbin.org/html`

Use a short inline script (do NOT commit it — run it from a temp path or stdin). v2 SDK shape for `extract`:

```bash
SGAI_API_KEY="$SGAI_API_KEY" python3 - <<'PY'
import sys
from scrapegraph_py import ScrapeGraphAI, ExtractRequest

urls = [
    "https://example.com",
    "https://news.ycombinator.com",
    "https://en.wikipedia.org/wiki/Web_scraping",
    "https://scrapegraphai.com",
    "https://httpbin.org/html",
]
sgai = ScrapeGraphAI()  # reads SGAI_API_KEY from env
failures = []
for u in urls:
    try:
        res = sgai.extract(ExtractRequest(
            url=u,
            prompt="Summarize the page in one sentence.",
        ))
        if getattr(res, "status", None) != "success":
            raise RuntimeError(f"status={getattr(res, 'status', '?')}")
        print(f"OK {u}")
    except Exception as e:
        print(f"FAIL {u} :: {e}")
        failures.append(u)
sys.exit(1 if failures else 0)
PY
```

For other v2 endpoints, swap the method and request class:

| endpoint | call shape (example) | request class(es) |
|---|---|---|
| `scrape` | `sgai.scrape(ScrapeRequest(url=..., formats=[MarkdownFormatConfig()]))` | `ScrapeRequest`, `MarkdownFormatConfig` |
| `extract` | `sgai.extract(ExtractRequest(url=..., prompt=...))` | `ExtractRequest` |
| `search` | `sgai.search(SearchRequest(query=..., prompt=...))` | `SearchRequest` |
| `crawl.start` | `sgai.crawl.start(CrawlStartRequest(url=..., ...))` then poll `crawl.get` | `CrawlStartRequest` |

Before running: confirm the exact method/request-class names against the installed `scrapegraph_py >= 2.x` version (the v2 SDK). Field names (`url`, `prompt`) replaced v1's (`website_url`, `user_prompt`) — do not mix them. If the user named a v1 endpoint, translate per the mapping in Step 2 and confirm.

Pass criteria: **all 5 must succeed**. If any fails, abort — do NOT open the PR. Show the user which URLs failed and why.

Do not paste the API key into any tool output, commit message, or PR body. Reference it only via `$SGAI_API_KEY`.

## Step 4 — Open a PR (never push to main)

Only if step 3 passed with 5/5:

1. If the current branch is `main`, create a feature branch first:
   - `git checkout -b <username>/<type>/<short-description>` (see the `gh-pr.md` skill for branch conventions)
   - Move the commits off `main` — do NOT push them to `origin/main`.
2. Push the feature branch: `git push -u origin <branch>`
3. Open the PR with `gh pr create`. In the PR body, include a short "Validation" section stating:
   > Endpoint `<name>` tested against 5 live websites — all passed. API key supplied by user at PR creation time (not stored).

   Do not include the key itself or any response data that might contain it.
4. After the PR is open, post the per-URL test results as a PR comment with `gh pr comment <number> --body ...`. Include one line per URL with status and `elapsed_ms` (and response key names if useful), e.g.:
   ```
   [extract 1/5] https://example.com -> status=success elapsed_ms=534 keys=['page_title', 'main_heading']
   ```
   Never paste the API key, request headers, raw response bodies, or any field that might echo the key. If the endpoint returns content that could contain user data, summarize (keys only) instead of dumping the payload.

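Steps 1-3 of the flow above can be sketched in bash. This is illustrative, not prescriptive: the branch name and PR title are placeholders, and with `DRY_RUN=1` the helper only prints each command so the sequence can be reviewed before anything touches the remote:

```shell
# Print-or-execute wrapper: with DRY_RUN set, commands are only echoed.
run() {
  echo "+ $*"
  [ -n "${DRY_RUN:-}" ] || "$@"
}

open_pr() {
  branch="$1"                      # e.g. alice/feat/validate-extract
  run git checkout -b "$branch"    # move work off main onto a feature branch
  run git push -u origin "$branch" # push the branch, never origin/main
  run gh pr create --title "Validate extract endpoint" \
    --body "Endpoint tested against 5 live websites -- all passed."
}
```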
Defer to the `creating-pr` skill (`gh-pr.md`) for PR title/body formatting and branch naming conventions.

## Hard stops

Abort the skill and report to the user if any of these are true:

- The user did not provide an API key.
- Any of the 5 test URLs failed.
- The current branch is `main` and there are uncommitted changes, or local commits that would require a force-push to move.
- `gh` is not authenticated (`gh auth status` fails).

Never bypass these checks with `--force`, `--no-verify`, or by pushing directly to `origin/main`.
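The hard stops can be condensed into a single preflight function. A bash sketch (illustrative; the current branch is passed in explicitly, e.g. from `git branch --show-current`, so the gate is easy to see and test):

```shell
# Check the hard stops before doing anything. Reports the first failed
# gate on stderr and returns non-zero so the caller aborts.
preflight() {
  branch="$1"
  if [ -z "${SGAI_API_KEY:-}" ]; then
    echo "BLOCKED: no API key provided" >&2; return 1
  fi
  if [ "$branch" = "main" ]; then
    echo "BLOCKED: on main -- create a feature branch first" >&2; return 1
  fi
  if ! gh auth status >/dev/null 2>&1; then
    echo "BLOCKED: gh is not authenticated" >&2; return 1
  fi
}
```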