New-Grad-Jobs/.coderabbit.yaml at main · harryautomazione/New-Grad-Jobs · GitHub

# .coderabbit.yaml — New Grad Jobs
# Senior-level AI review configuration for a solo-maintained OSS repository.
# Philosophy: CodeRabbit is the Lead Architect. It plans before code is written,
# and enforces architectural intent during review. Correctness is table stakes.
# The target is production-grade systems thinking.
# Docs: https://docs.coderabbit.ai/

language: "en-US"
early_access: true

# ============================================================================
# ISSUE ENRICHMENT — The Lead Architect
# When a contributor opens an issue with a trigger label, CodeRabbit
# researches the full codebase and posts a step-by-step implementation plan.
# ============================================================================
issue_enrichment:
  planning:
    enabled: true
    instructions: |
      ## CRITICAL: Repository Policy Enforcement for AI Agent Prompts

      Every AI agent prompt you generate MUST include the following instructions
      verbatim. PRs that violate these rules will be automatically blocked by CI.

      ### PR Title (Conventional Commits — enforced by CI)
      The PR title MUST follow Conventional Commits format:
        `<type>(<scope>): <description>`
      Examples: `fix(scraper): ...`, `feat(config): ...`, `refactor(tests): ...`
      If the agent names the PR generically (e.g., "Fix dates"), CI will reject it.

      ### PR Description (Linked Issue Enforcer — enforced by CI)
      The PR description MUST include the line `Fixes #<issue_number>` on its own line.
      Without this, the Linked Issue Enforcer check will block the PR from merging.

      ### Changelog (PR Changelog Reminder — enforced by CI)
      The agent MUST update `CHANGELOG.md` by adding a concise entry under `## [Unreleased]`
      in the appropriate section (`### Fixed`, `### Added`, `### Changed`).
      If the required subsection (e.g., `### Fixed`) does not exist under `## [Unreleased]`,
      the agent MUST create it. Do not create a duplicate `## [Unreleased]` header.
      Omitting this will cause the PR Changelog Reminder check to flag it.

      ### Scope Control (Clean Diff Policy)
      The agent MUST NOT:
        - Reformat or change whitespace on lines not directly related to the fix
        - Perform bulk auto-formatting, import reordering, or style cleanups
        - Manually edit `README.md` or `jobs.json` — both are auto-generated and
          protected by CI. If a logic fix causes these files to regenerate differently
          when the scraper runs, that is expected and acceptable. What is rejected is
          direct hand-edits or agent-generated content in those files.
      Only modify the specific lines of logic described in the issue.

      ### Pre-merge Verification
      Before opening the PR, the agent should run (or instruct the contributor to run):
        - `pytest -q -o addopts='' tests/` — all tests must pass
        - `pre-commit run --all-files` — all hooks must pass
      Do not open a PR that would fail these checks.
    auto_planning:
      enabled: true
      labels:
        - "plan-me"

# ============================================================================
# CODE REVIEWS — The Senior Engineer
# Reviews must go beyond syntax. Every plan and review must demonstrate
# production-grade systems thinking: flow control, observability, failure modes.
# ============================================================================
reviews:
  profile: "assertive"
  request_changes_workflow: true
  assess_linked_issues: true

  auto_review:
    enabled: true
    drafts: false
    base_branches:
      - "main"

  # ===== SCAM AND ABUSE DETECTION (checked BEFORE any code review) =====
  # These patterns were seen in PR #102 from Xanoutas (2026-03-05).
  review_instructions: |
    ## ⚠️  SCAM DETECTION — Run before any other check

    Scan the PR title, description, every comment, and every added file for:

    ### 1. Payment Solicitation (CRITICAL — immediate request_changes)
    Flag and request changes if any of the following appear anywhere in the PR:
    - Cryptocurrency wallet addresses:
      - EVM/Ethereum: `0x` followed by 40 hex characters
      - Solana: base58 string 32–44 chars (e.g. `C4PcQjqDW4a5Pvhx5ZFPvAodkGiVG49q8dMvpskqSvuH`)
      - Bitcoin: Base58 strings of roughly 26–35 chars starting with `1` or `3`, or Bech32 strings starting with `bc1` (covers `bc1q` and Taproot `bc1p`)
    - Phrases: "Payment address", "EVM:", "Solana:", "BTC:", "ETH:", "tip jar", "donate", "if applicable"
    Response: "**Payment solicitation is not permitted.** This repository does not pay contributors. Flagging for maintainer (@ambicuity) to close and block."

    ### 2. Empty PRs Claiming to Implement a Fix (MAJOR)
    Flag if ALL of the following are true:
    - PR title contains "fix" or "chore"
    - The ONLY changed file is a `.md` file
    - The markdown file is a copy of the PR description or contains a script that is NOT in the actual diff
    Response: "This PR adds only documentation — the actual fix (config.yml changes, code changes) is missing from the diff."

    ### 3. AI-Generated Stub (MAJOR)
    Flag if:
    - Description says "This PR was generated automatically" but there are no code changes
    - A Python script appears in the PR description but not in any changed file
    - The script uses `requests.get()` directly (violates this repo's architectural constraint)
    Response: "This appears to be an AI-generated stub without a real implementation. See CONTRIBUTING.md for requirements."

  # ===== PATH-SPECIFIC REVIEW DIRECTIVES =====
  path_instructions:

    # ── Core Scraper Engine ──────────────────────────────────────────────────
    - path: "scripts/update_jobs.py"
      instructions: |
        This file is the entire data pipeline. Apply senior-level scrutiny, not just linting.

        ## Correctness (Non-Negotiable)
        1. ALL HTTP calls must use `create_optimized_session()`. Bare `requests.get()` is a hard reject.
        2. No bare `except:` clauses. Catch specific exceptions and log them with context.
        3. `ThreadPoolExecutor` workers must not mutate shared state without locks.
        4. Date parsing must handle: None, NaN, float, empty string, Unix ms timestamps (Lever),
           and "Posted X Days Ago" strings (JobSpy). The following functions are HIGH-RISK ZONES
           for timezone bugs and must use `_as_utc_naive()` + `datetime.now(timezone.utc)`:
           `is_recent_job()`, `format_posted_date()`, `get_iso_date()`, `get_sort_date()`,
           and `save_market_history()`. See issue #96.
        5. Every NEW public function requires a Google-style docstring and full type hints.
           For EXISTING functions being modified, add type hints to any changed or touched
           signatures to incrementally improve the baseline. Full signature refactors of
           unmodified functions are not required and add unnecessary diff noise.

        ## Systems Thinking (Distinguish Good Plans from Great Ones)
        6. CONCURRENCY ≠ RATE LIMITING. If a PR adds a `threading.Semaphore` to limit
           concurrent requests, flag that this controls simultaneous connections, not
           requests-per-second. A semaphore of N=50 can still send 50 requests in 1ms.
           A mature solution requires a token bucket or per-domain executor partition.
        7. Reject hardcoded concurrency limits (e.g., `greenhouse_max=50`) without documented
           rationale. Ask: Is this based on actual API documentation? Per-IP? Per-second?
        8. If a PR adds rate limiting logic, demand observability: the code must log when
           a domain semaphore is contended, when a 429 is received, and how long backoff waited.
        9. Flag any implementation that adds control flow without logging. We must be able
           to diagnose production behavior from GitHub Actions logs alone.
        10. Magic numbers (timeouts, max_workers, page sizes, rate limits) must be extracted
            to named constants at the module level with a comment explaining their origin.

        ## Architecture
        11. If a PR introduces a new abstraction (class, manager, dispatcher), question whether
            it should live in `update_jobs.py` or be extracted to a separate module. The file
            is already ~2000 lines. New architectural patterns with significant scope belong
            in their own file.
        12. Reject any introduction of: PostgreSQL, MongoDB, Redis, React, Airflow, or Celery.
            State lives in git. Orchestration is GitHub Actions.

    # ── Configuration ────────────────────────────────────────────────────────
    - path: "config.yml"
      instructions: |
        All parameters controlling scraper behavior belong here, not in Python.
        1. Reject PRs that hardcode in `update_jobs.py` what should be in `config.yml`.
        2. Validate required top-level keys: `filtering`, `greenhouse`, `lever`, `readme`.
        3. If a PR adds a new config key, confirm it is read and used in `update_jobs.py`.
        4. Reject any secrets or API tokens committed to this file.

    # ── GitHub Actions Workflows ─────────────────────────────────────────────
    - path: ".github/workflows/**"
      instructions: |
        SECURITY-CRITICAL. Apply zero-trust review.
        1. Workflow permissions must be minimally scoped (`contents: read` unless write is justified and documented).
        2. Flag any `pull_request_target` trigger. These workflows run in the context of the base repository, with access to its secrets and a write-capable token, and can be exploited if the workflow checks out untrusted code.
        3. All `actions/` steps should pin to a specific version tag (e.g. `@v7`). Pinning to a full commit SHA is encouraged for high-risk steps but is not required for standard actions already in use. Do NOT flag Dependabot PRs that bump from one version tag to another.
        4. Reject any workflow that echoes or logs GitHub secrets or tokens.
        5. `github-script` steps must only use the minimum required GitHub API methods.
        6. Flag any new workflow triggerable by external contributors without an approval gate or user-allowlist check.
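
        Illustrative sketch only: the workflow name, job name, runner, and action version below are assumptions, not taken from this repository. A workflow that satisfies rules 1 to 3 might look like:

        ```yaml
        name: example-ci                  # hypothetical name
        on:
          pull_request:                   # rule 2: avoids pull_request_target
        permissions:
          contents: read                  # rule 1: minimal scope, no write access
        jobs:
          test:                           # hypothetical job name
            runs-on: ubuntu-latest
            steps:
              - uses: actions/checkout@v4 # rule 3: pinned to a version tag
              # Dependency installation omitted for brevity.
              - run: pytest -q -o addopts='' tests/
        ```

        A workflow that declares broader permissions should document why next to the `permissions` block.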

    # ── Tests ────────────────────────────────────────────────────────────────
    - path: "tests/**"
      instructions: |
        Tests must be deterministic, offline, and meaningful.
        1. No live network calls. All HTTP must be mocked (use `unittest.mock.patch`).
        2. No `datetime.now()` without injection — parameterize the reference date.
        3. Edge cases are mandatory: empty inputs, None, NaN, Unicode strings, very long titles,
           timezone-aware ISO timestamps (e.g., `2024-01-01T10:00:00+05:30`).
        4. Every new function in `update_jobs.py` must have at least one test.
        5. Reject tests that only cover the happy path. A test without an edge case leaves
           the failure for production to find.
        6. Coverage for a new function should be verifiable by reading the test file alone.

    # ── Documentation ────────────────────────────────────────────────────────
    - path: "docs/**"
      instructions: |
        1. All markdown uses proper heading hierarchy (single H1, sequential H2/H3).
        2. Code blocks must specify a language identifier.
        3. No broken internal links.
        4. `README.md` is auto-generated every 5 minutes. Any PR that manually edits it is
           an automatic reject. Programmatic regeneration as a side-effect of a logic fix is acceptable.

    # ── Catch-all ────────────────────────────────────────────────────────────
    - path: "**/*"
      instructions: |
        GLOBAL STANDARDS — applies to every file in every PR.
        0. SCAM CHECK FIRST: Before reviewing code, scan the PR description and all added
           files for crypto wallet addresses (0x..., Solana base58, bc1q...) or phrases like
           "Payment address", "EVM:", "Solana:", "tip jar". If found, request changes
           immediately: "Payment solicitation is not permitted. @ambicuity please close and block."
        1. Every PR must link to an issue via `Fixes #N`. No issue link = no merge.
        2. Reject PRs that manually edit `README.md` or `jobs.json` — both are auto-generated.
           Programmatic changes caused by logic fixes are acceptable.
        3. Reject any new runtime dependencies without documented justification.
        4. Reject PRs that are clearly AI stubs: no tests, no docstrings, script in description
           but NOT in the diff (the contributor added a markdown file instead of actual code).
        5. PRs should be small and focused. Flag PRs changing more than 3 unrelated areas.


# ============================================================================
# CHAT — The On-Demand Advisor
# ============================================================================
chat:
  auto_reply: true

# ============================================================================
# KNOWLEDGE BASE — The Memory
# CodeRabbit learns from its own past reviews to maintain consistency.
# ============================================================================
knowledge_base:
  learnings:
    scope: "auto"
  issues:
    scope: "auto"
  pull_requests:
    scope: "auto"