
Commit 195b629

Add repository-specific guidance for agents (AGENTS.md) (#5236)

1 parent 1850da5 commit 195b629
File tree

1 file changed: +96 -0 lines changed

AGENTS.md

Lines changed: 96 additions & 0 deletions
# AGENTS.md

## Repository-specific guidance

### Main code vs experimental code

The repository is separated into **main code** and **experimental code**.

* **Main code** should remain stable, consistent, and well-tested.
* **Experimental code** may be less stable and may contain inconsistent patterns or limited testing.
Small, non-invasive improvements that make experimental code more consistent with the main codebase are encouraged, but avoid large refactors.

### Paper implementations

If a PR implements a method, algorithm, or training approach from a research paper, it must also add a corresponding subsection to `paper_index.md`.

When reviewing such PRs, ensure that `paper_index.md` was updated.

### Code duplication and consistency

Trainers in this repository are **self-contained by design**. Shared logic (generation, reward computation, metric logging, weight syncing, etc.) is deliberately duplicated across trainers rather than abstracted into a shared base class.

This is intentional: each trainer must be readable, modifiable, and evolvable in isolation. The base class (`_BaseTrainer`) provides only minimal utilities (model card generation). Everything else — vLLM generation paths, `_get_per_token_logps_and_entropies`, `_calculate_rewards`, `_prepare_inputs`, metric logging — is copied in full.

**The tradeoff**: duplication is accepted, but **consistency is mandatory**. When the same logic appears in multiple trainers, the duplicated blocks must stay aligned:

- Same variable names (`self._last_loaded_step`, `self._metrics[mode]`, …)
- Same control flow structure (if/elif/else branches in the same order)
- Same comments (word-for-word when the logic is identical)
- Divergences only where the trainer's semantics require it (e.g., GRPO extracts logprobs from vLLM, RLOO discards them)

**Consistency over correctness**: this is a strong requirement. When duplicating code, reproduce it exactly — even if you believe the original has a bug. Do not silently fix the issue in your copy. Instead, keep your copy consistent with the source and report the problem so it can be fixed across all trainers in a dedicated PR. A correct-but-inconsistent codebase is harder to maintain than a consistently-wrong one that can be fixed in a single sweep.

**When modifying duplicated code**: if you change a pattern that exists in multiple trainers (e.g., the vLLM generation path in `_generate_single_turn`), apply the same change to all other trainers. A fix in GRPO often implies the same fix in RLOO, and vice versa. Not propagating a change is a bug.

**When reviewing**: if a PR touches duplicated logic, verify that all copies are updated consistently. A common mistake is fixing one trainer and forgetting the others.

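The alignment requirement can be sketched as follows. This is an illustrative example, not the real trainer code: `_maybe_sync_weights` and the two minimal classes are hypothetical stand-ins, and only the `self._last_loaded_step` name is taken from the guidance above. The point is that both copies are textually identical, down to the comment.

```python
# Hypothetical sketch: the same helper duplicated verbatim in two trainers,
# with identical variable names, control flow, and comments.

class GRPOTrainer:
    def __init__(self):
        self._last_loaded_step = -1

    def _maybe_sync_weights(self, step: int) -> bool:
        # Sync only when the step has advanced since the last weight load.
        if step == self._last_loaded_step:
            return False
        self._last_loaded_step = step
        return True


class RLOOTrainer:
    def __init__(self):
        self._last_loaded_step = -1

    def _maybe_sync_weights(self, step: int) -> bool:
        # Sync only when the step has advanced since the last weight load.
        if step == self._last_loaded_step:
            return False
        self._last_loaded_step = step
        return True
```

Under this convention, a bug found in one copy is reported and fixed in both copies in a dedicated PR, never patched in just one.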
### Simplicity

This codebase values **leanness and simplicity above all**. Prefer straightforward, inline code over abstractions, helpers, or utilities — even at the cost of some robustness or generality.

Concretely:

- Do not add layers of indirection (registries, factory patterns, plugin systems). A contributor should be able to read a trainer top to bottom and understand the full flow.
- Prefer a simple implementation that covers 90% of cases over a complex one that covers 100%. A function that handles the common path in 20 lines is better than a catch-all that handles every edge case in 80.
- Do not add defensive code, fallback paths, or configuration options "just in case". Only handle cases that actually exist today.
- When in doubt, prefer less code. Every new function, parameter, or branch is maintenance burden. The best abstraction is often no abstraction.

## Documentation

### Docstrings

Docstrings must follow the repository format below. Do **not** convert docstrings to other styles (Google, NumPy, etc.).

Rules:

* Types appear in backticks inside parentheses: (`str`)
* Optional parameters are marked with `*optional*`
* Defaults are written as: `defaults to <value>`
* When the default is `None`, prefer ```(`str`, *optional*)``` instead of ```(`str` or `None`, *optional*, defaults to `None`)```
* Union types use `or`: `str` or `None`
* References to classes use the format: [`~transformers.PreTrainedModel`]
* Class docstrings may group parameters using headers such as: `> Parameters for X:`

Example:

````python
def method(self, param1: str, param2: int = 1, param3: float | None = None):
    """
    Brief one-line description of what this does.

    Args:
        param1 (`str`):
            Description of required param.
        param2 (`int`, *optional*, defaults to `1`):
            Description of optional param with default.
        param3 (`float`, *optional*):
            Description of optional param without explicit default.

    Returns:
        `dict` with keys:

        - `key1` (`list[int]`):
            Description of this key.

    Examples:

    ```python
    >>> my_func("hello")
    ```
    """
````

### Links to papers

When linking to papers, use `https://huggingface.co/papers/<id>` instead of `https://arxiv.org/abs/<id>` (same ID suffix system).
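Because the two hosts share the same ID suffix, rewriting a link is a pure prefix swap. The helper below is a hypothetical illustration, not part of the repository:

```python
# Sketch of the paper-link convention: same paper ID, different host prefix.
# `to_hf_paper_url` is a hypothetical helper for illustration only.

def to_hf_paper_url(arxiv_url: str) -> str:
    # Swap the arXiv abstract prefix for the Hugging Face papers prefix.
    prefix = "https://arxiv.org/abs/"
    if arxiv_url.startswith(prefix):
        return "https://huggingface.co/papers/" + arxiv_url[len(prefix):]
    return arxiv_url

print(to_hf_paper_url("https://arxiv.org/abs/2402.03300"))
# → https://huggingface.co/papers/2402.03300
```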
