Add Cursor Bugbot rules from AGENTS.md (#5280)

qgallouedec · web-flow · commit bad33f28fafe · 2026-03-23T09:10:41.000-06:00
diff --git a/.cursor/BUGBOT.md b/.cursor/BUGBOT.md
@@ -0,0 +1,97 @@
+# BUGBOT.md
+
+## Repository-specific guidance
+
+### Main code vs experimental code
+
+The repository is separated into **main code** and **experimental code**.
+
+* **Main code** should remain stable, consistent, and well-tested.
+* **Experimental code** may be less stable and may contain inconsistent patterns or limited testing.
+
+Small non-invasive improvements that make experimental code more consistent with the main codebase are encouraged, but avoid large refactors.
+
+### Paper implementations
+
+If a PR implements a method, algorithm, or training approach from a research paper, it must also add a corresponding subsection to `paper_index.md`.
+
+When reviewing such PRs, ensure that `paper_index.md` was updated.
+
+### Code duplication and consistency
+
+Trainers in this repository are **self-contained by design**. Shared logic (generation, reward computation, metric logging, weight syncing, etc.) is deliberately duplicated across trainers rather than abstracted into a shared base class.
+
+This is intentional: each trainer must be readable, modifiable, and evolvable in isolation. The base class (`_BaseTrainer`) provides only minimal utilities (model card generation). Everything else — vLLM generation paths, `_get_per_token_logps_and_entropies`, `_calculate_rewards`, `_prepare_inputs`, metric logging — is copied in full.
+
+**The tradeoff**: duplication is accepted, but **consistency is mandatory**. When the same logic appears in multiple trainers, the duplicated blocks must stay aligned:
+
+- Same variable names (`self._last_loaded_step`, `self._metrics[mode]`, …)
+- Same control flow structure (if/elif/else branches in the same order)
+- Same comments (word-for-word when the logic is identical)
+- Divergences only where the trainer's semantics require it (e.g., GRPO extracts logprobs from vLLM, RLOO discards them)
+
+**Consistency over correctness**: this is a strong requirement. When duplicating code, reproduce it exactly, even if you believe the original has a bug. Do not silently fix the issue in your copy. Instead, keep your copy consistent with the source and report the problem so it can be fixed across all trainers in a dedicated PR. A correct-but-inconsistent codebase is harder to maintain than a consistently wrong one that can be fixed in a single sweep.
+
+**When modifying duplicated code**: if you change a pattern that exists in multiple trainers (e.g., the vLLM generation path in `_generate_single_turn`), apply the same change to all other trainers. A fix in GRPO often implies the same fix in RLOO, and vice versa. Not propagating a change is a bug.
+
+**When reviewing**: if a PR touches duplicated logic, verify that all copies are updated consistently. A common mistake is fixing one trainer and forgetting the others.
+
+### Simplicity
+
+This codebase values **leanness and simplicity above all**. Prefer straightforward, inline code over abstractions, helpers, or utilities — even at the cost of some robustness or generality.
+
+Concretely:
+
+- Do not add layers of indirection (registries, factory patterns, plugin systems). A contributor should be able to read a trainer top to bottom and understand the full flow.
+- Prefer a simple implementation that covers 90% of cases over a complex one that covers 100%. A function that handles the common path in 20 lines is better than a catch-all that handles every edge case in 80.
+- Do not add defensive code, fallback paths, or configuration options "just in case". Only handle cases that actually exist today.
+- Avoid `hasattr` and `getattr`. Their use is almost always a symptom of overly defensive programming or a disguised version check (e.g., "this attribute was added in version X"). Instead, either drop the conditional entirely or express the version check explicitly with a version comparison. There is nearly always a cleaner alternative.
+- When in doubt, prefer less code. Every new function, parameter, or branch is maintenance burden. The best abstraction is often no abstraction.
+
+## Documentation
+
+### Docstrings
+
+Docstrings must follow the repository format below. Do **not** convert docstrings to other styles (Google, NumPy, etc.).
+
+Rules:
+
+* Types appear in backticks inside parentheses: (`str`)
+* Optional parameters are marked with `*optional*`
+* Defaults are written as: `defaults to <value>`
+* When the default is `None`, prefer ```(`str`, *optional*)``` instead of ```(`str` or `None`, *optional*, defaults to `None`)```
+* Union types use `or`: `str` or `None`
+* References to classes use the format: [`~transformers.PreTrainedModel`]
+* Class docstrings may group parameters using headers such as: `> Parameters for X:`
+
+Example:
+
+````python
+def method(self, param1: str, param2: int = 1, param3: float | None = None):
+    """
+    Brief one-line description of what this does.
+
+    Args:
+        param1 (`str`):
+            Description of required param.
+        param2 (`int`, *optional*, defaults to `1`):
+            Description of optional param with default.
+        param3 (`float`, *optional*):
+            Description of optional param whose default is `None`.
+
+    Returns:
+        `dict` with keys:
+            - `key1` (`list[int]`):
+                Description of this key.
+
+    Examples:
+
+    ```python
+    >>> obj.method("hello")
+    ```
+    """
+````
+
+### Links to papers
+
+When linking to papers, use `https://huggingface.co/papers/<id>` instead of `https://arxiv.org/abs/<id>`; the `<id>` suffix is identical in both URLs.
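
Note on the paper-link rule that closes this diff: since the ID is shared between the two sites, the rewrite amounts to a prefix swap. The sketch below illustrates that under stated assumptions. The helper name `to_hf_papers_url` and both constants are hypothetical and not part of the repository, and the ID in the usage line is only an example. It handles plain `https://arxiv.org/abs/<id>` links and returns anything else (PDF links, `http://` links) unchanged.

```python
# Hypothetical helper, not part of the repository: rewrites an arXiv
# abstract link into the Hugging Face papers form described in BUGBOT.md.
ARXIV_ABS_PREFIX = "https://arxiv.org/abs/"
HF_PAPERS_PREFIX = "https://huggingface.co/papers/"


def to_hf_papers_url(url: str) -> str:
    """Return the Hugging Face papers URL for an arXiv abstract URL.

    URLs that do not start with the arXiv abstract prefix are returned as-is.
    """
    if url.startswith(ARXIV_ABS_PREFIX):
        # Keep the ID suffix untouched; only the prefix changes.
        return HF_PAPERS_PREFIX + url[len(ARXIV_ABS_PREFIX):]
    return url


if __name__ == "__main__":
    print(to_hf_papers_url("https://arxiv.org/abs/2402.03300"))
```

A reviewer could apply the same check by eye: any `arxiv.org/abs/` link in a docstring or in `paper_index.md` is a candidate for this rewrite.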