Commit 2aa9506

Fix docstring interlinks (#4221)
1 parent d6eeb29 commit 2aa9506

30 files changed: +178 -169 lines changed
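
The edit this commit repeats across files, turning a plain backticked class name into a documentation interlink, can be sketched as a small rewrite function. This is only an illustration, not the tooling used for the commit: `add_interlinks`, `EXTERNAL`, and `LOCAL` are hypothetical names, and the mappings cover just a few classes seen in the diff. It assumes the convention visible in the changed lines, where external classes are written as [`~module.Class`] and classes from the documented library itself as [`Class`].

```python
import re

# Hypothetical mapping from bare class names to fully qualified paths
# (only a few names that appear in this diff; not an exhaustive list).
EXTERNAL = {
    "GenerationConfig": "transformers.GenerationConfig",
    "PreTrainedModel": "transformers.PreTrainedModel",
    "Dataset": "datasets.Dataset",
    "DatasetDict": "datasets.DatasetDict",
}
# Hypothetical set of classes that live in the documented library itself.
LOCAL = {"DPOConfig", "GRPOConfig", "SFTConfig"}

# A backticked identifier that is not already wrapped in [ ... ].
_NAME = re.compile(r"(?<!\[)`([A-Za-z_][\w.]*)`(?!\])")


def add_interlinks(text: str) -> str:
    """Wrap known class names in interlink brackets; leave everything else alone."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        short = name.rsplit(".", 1)[-1]
        # External class, written either bare or fully qualified.
        if short in EXTERNAL and name in (short, EXTERNAL[short]):
            return f"[`~{EXTERNAL[short]}`]"
        # Class from the documented library: brackets only.
        if name in LOCAL:
            return f"[`{name}`]"
        # Unknown identifiers (arguments, methods, ...) stay untouched.
        return match.group(0)

    return _NAME.sub(repl, text)


print(add_interlinks("dataset (`Dataset` or `DatasetDict`):"))
```

Because names that are already bracketed are skipped, running the function twice gives the same result as running it once.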

docs/source/best_of_n.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ best_of_n = BestOfNSampler(model, tokenizer, queries_to_scores, length_sampler=o
 ```

 There is the option of setting the generation settings (like `temperature`, `pad_token_id`) at the time of instance creation as opposed to when calling the `generate` method.
-This is done by passing a `GenerationConfig` from the `transformers` library at the time of initialization
+This is done by passing a [`~transformers.GenerationConfig`] from the `transformers` library at the time of initialization

 ```python

docs/source/customization.md

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ trainer.train()

 ## Use the accelerator cache optimizer

-When training large models, you should better handle the accelerator cache by iteratively clearing it. To do so, simply pass `optimize_device_cache=True` to `DPOConfig`:
+When training large models, you should better handle the accelerator cache by iteratively clearing it. To do so, simply pass `optimize_device_cache=True` to [`DPOConfig`]:

 ```python
 training_args = DPOConfig(..., optimize_device_cache=True)

docs/source/judges.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ pip install trl[judges]

 ## Using the provided judges

-TRL provides several judges out of the box. For example, you can use the `HfPairwiseJudge` to compare two completions using a pre-trained model from the Hugging Face model hub:
+TRL provides several judges out of the box. For example, you can use the [`HfPairwiseJudge`] to compare two completions using a pre-trained model from the Hugging Face model hub:

 ```python
 from trl import HfPairwiseJudge

docs/source/logging.md

Lines changed: 4 additions & 4 deletions
@@ -3,7 +3,7 @@
 As reinforcement learning algorithms are historically challenging to debug, it's important to pay careful attention to logging.
 By default, TRL trainers like [`PPOTrainer`] and [`GRPOTrainer`] save a lot of relevant information to supported experiment trackers like Trackio, Weights & Biases (wandb) or TensorBoard.

-Upon initialization, pass the `report_to` argument to the respective configuration object (e.g., [`PPOConfig`] for `PPOTrainer`, or [`GRPOConfig`] for `GRPOTrainer`):
+Upon initialization, pass the `report_to` argument to the respective configuration object (e.g., [`PPOConfig`] for [`PPOTrainer`], or [`GRPOConfig`] for [`GRPOTrainer`]):

 ```python
 # For PPOTrainer
@@ -19,7 +19,7 @@ grpo_config = GRPOConfig(
 )
 ```

-If you want to log with TensorBoard, you might also need to specify logging directories, for example, by adding `logging_dir=PATH_TO_LOGS` to the configuration object (e.g., `PPOConfig` or `GRPOConfig`).
+If you want to log with TensorBoard, you might also need to specify logging directories, for example, by adding `logging_dir=PATH_TO_LOGS` to the configuration object (e.g., [`PPOConfig`] or [`GRPOConfig`]).

 ## PPO Logging

@@ -83,9 +83,9 @@ Here's a brief explanation for the logged metrics provided in the data for the G

 ### Policy and Loss Metrics

-* `kl`: The mean Kullback-Leibler (KL) divergence between the current policy and the reference policy. This is logged only if `beta` (the KL coefficient in `GRPOConfig`) is non-zero.
+* `kl`: The mean Kullback-Leibler (KL) divergence between the current policy and the reference policy. This is logged only if `beta` (the KL coefficient in [`GRPOConfig`]) is non-zero.
 * `entropy`: Average entropy of token predictions across generated completions.
-* If Liger GRPOLoss is used (`use_liger_loss: True` in `GRPOConfig`):
+* If Liger GRPOLoss is used (`use_liger_loss: True` in [`GRPOConfig`]):
 * `clip_ratio`: The fraction of policy updates where the probability ratio was clipped according to the GRPO loss's epsilon bounds.
 * If standard GRPOLoss is used (`use_liger_loss: False`):
 * `clip_ratio/low_mean`: The mean fraction of instances where the probability ratio `r_t(θ)` was clipped at the lower bound `1 - epsilon_low` (occurs when advantage is negative and ratio is below the bound).

docs/source/paper_index.md

Lines changed: 1 addition & 1 deletion
@@ -338,7 +338,7 @@ training_args = DPOConfig(
 )
 ```

-For the unpaired version, the user should utilize `BCOConfig` and `BCOTrainer`.
+For the unpaired version, the user should utilize [`BCOConfig`] and [`BCOTrainer`].

 ### Self-Play Preference Optimization for Language Model Alignment

docs/source/peft_integration.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ Note: if you don't want to log with `wandb` remove `log_with="wandb"` in the scr

 ## How to use it?

-Simply declare a `PeftConfig` object in your script and pass it through `.from_pretrained` to load the TRL+PEFT model.
+Simply declare a [`~peft.PeftConfig`] object in your script and pass it through `.from_pretrained` to load the TRL+PEFT model.

 ```python
 from peft import LoraConfig

docs/source/reducing_memory_usage.md

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ Packing, introduced in [Raffel et al., 2020](https://huggingface.co/papers/1910.
 Packing reduces padding by merging several sequences in one row when possible. We use an advanced method to be near-optimal in the way we pack the dataset. To enable packing, use `packing=True` in the [`SFTConfig`].

 > [!TIP]
-> In TRL 0.18 and earlier, packing used a more aggressive method that reduced padding to almost nothing, but had the downside of breaking sequence continuity for a large fraction of the dataset. To revert to this strategy, use `packing_strategy="wrapped"` in `SFTConfig`.
+> In TRL 0.18 and earlier, packing used a more aggressive method that reduced padding to almost nothing, but had the downside of breaking sequence continuity for a large fraction of the dataset. To revert to this strategy, use `packing_strategy="wrapped"` in [`SFTConfig`].

 ```python
 from trl import SFTConfig

trl/data_utils.py

Lines changed: 12 additions & 11 deletions
@@ -271,7 +271,7 @@ def maybe_apply_chat_template(
 messages, where each message is a dictionary with keys `"role"` and `"content"`. Additionally, the example
 may contain a `"chat_template_kwargs"` key, which is a dictionary of additional keyword arguments to pass
 to the chat template renderer.
-tokenizer (`PreTrainedTokenizerBase`):
+tokenizer ([`~transformers.PreTrainedTokenizerBase`]):
 Tokenizer to apply the chat template with.
 tools (`list[Union[dict, Callable]]`, *optional*):
 A list of tools (callable functions) that will be accessible to the model. If the template does not support
@@ -328,7 +328,7 @@ def unpair_preference_dataset(
 Unpair a preference dataset.

 Args:
-dataset (`Dataset` or `DatasetDict`):
+dataset ([`~datasets.Dataset`] or [`~datasets.DatasetDict`]):
 Preference dataset to unpair. The dataset must have columns `"chosen"`, `"rejected"` and optionally
 `"prompt"`.
 num_proc (`int`, *optional*):
@@ -337,7 +337,7 @@ def unpair_preference_dataset(
 Meaningful description to be displayed alongside with the progress bar while mapping examples.

 Returns:
-`Dataset`: The unpaired preference dataset.
+[`~datasets.Dataset`]: The unpaired preference dataset.

 Example:

@@ -371,7 +371,7 @@ def maybe_unpair_preference_dataset(
 Unpair a preference dataset if it is paired.

 Args:
-dataset (`Dataset` or `DatasetDict`):
+dataset ([`~datasets.Dataset`] or [`~datasets.DatasetDict`]):
 Preference dataset to unpair. The dataset must have columns `"chosen"`, `"rejected"` and optionally
 `"prompt"`.
 num_proc (`int`, *optional*):
@@ -380,7 +380,8 @@ def maybe_unpair_preference_dataset(
 Meaningful description to be displayed alongside with the progress bar while mapping examples.

 Returns:
-`Dataset` or `DatasetDict`: The unpaired preference dataset if it was paired, otherwise the original dataset.
+[`~datasets.Dataset`] or [`~datasets.DatasetDict`]: The unpaired preference dataset if it was paired, otherwise
+the original dataset.

 Example:

@@ -473,7 +474,7 @@ def maybe_extract_prompt(example: dict[str, list]) -> dict[str, list]:
 'rejected': [{'role': 'assistant', 'content': 'It is green.'}]}
 ```

-Or, with the `map` method of `datasets.Dataset`:
+Or, with the `map` method of [`~datasets.Dataset`]:

 ```python
 >>> from trl import extract_prompt
@@ -664,7 +665,7 @@ def pack_dataset(
 Pack sequences in a dataset into chunks of size `seq_length`.

 Args:
-dataset (`Dataset` or `DatasetDict`):
+dataset ([`~datasets.Dataset`] or [`~datasets.DatasetDict`]):
 Dataset to pack
 seq_length (`int`):
 Target sequence length to pack to.
@@ -679,8 +680,8 @@ def pack_dataset(
 Additional keyword arguments to pass to the dataset's map method when packing examples.

 Returns:
-`Dataset` or `DatasetDict`: The dataset with packed sequences. The number of examples may decrease as sequences
-are combined.
+[`~datasets.Dataset`] or [`~datasets.DatasetDict`]: The dataset with packed sequences. The number of examples
+may decrease as sequences are combined.

 Example:
 ```python
@@ -720,15 +721,15 @@ def truncate_dataset(
 Truncate sequences in a dataset to a specified `max_length`.

 Args:
-dataset (`Dataset` or `DatasetDict`):
+dataset ([`~datasets.Dataset`] or [`~datasets.DatasetDict`]):
 Dataset to truncate.
 max_length (`int`):
 Maximum sequence length to truncate to.
 map_kwargs (`dict`, *optional*):
 Additional keyword arguments to pass to the dataset's map method when truncating examples.

 Returns:
-`Dataset` or `DatasetDict`: The dataset with truncated sequences.
+[`~datasets.Dataset`] or [`~datasets.DatasetDict`]: The dataset with truncated sequences.

 Example:
 ```python

trl/mergekit_utils.py

Lines changed: 1 addition & 1 deletion
@@ -264,7 +264,7 @@ def merge_models(config: MergeConfig, out_path: str):
 Merge two models using mergekit

 Args:
-config (`MergeConfig`): The merge configuration.
+config ([`MergeConfig`]): The merge configuration.
 out_path (`str`): The output path for the merged model.
 """
 if not is_mergekit_available():

trl/models/modeling_base.py

Lines changed: 26 additions & 22 deletions
@@ -57,14 +57,17 @@


 class PreTrainedModelWrapper(nn.Module):
-r"""
-A wrapper class around a (`transformers.PreTrainedModel`) to be compatible with the (`~transformers.PreTrained`)
-class in order to keep some attributes and methods of the (`~transformers.PreTrainedModel`) class.
+"""
+Wrapper for a [`~transformers.PreTrainedModel`] implemented as a standard PyTorch [`torch.nn.Module`].
+
+This class provides a compatibility layer that preserves the key attributes and methods of the original
+[`~transformers.PreTrainedModel`], while exposing a uniform interface consistent with PyTorch modules. It enables
+seamless integration of pretrained Transformer models into custom training, evaluation, or inference workflows.

 Attributes:
-pretrained_model (`transformers.PreTrainedModel`):
+pretrained_model ([`~transformers.PreTrainedModel`]):
 The model to be wrapped.
-parent_class (`transformers.PreTrainedModel`):
+parent_class ([`~transformers.PreTrainedModel`]):
 The parent class of the model to be wrapped.
 supported_args (`list`):
 The list of arguments that are supported by the wrapper class.
@@ -111,19 +114,20 @@ def __init__(
 def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
 r"""
 Instantiates a new model from a pretrained model from `transformers`. The pretrained model is loaded using the
-`from_pretrained` method of the `transformers.PreTrainedModel` class. The arguments that are specific to the
-`transformers.PreTrainedModel` class are passed along this method and filtered out from the `kwargs` argument.
+`from_pretrained` method of the [`~transformers.PreTrainedModel`] class. The arguments that are specific to the
+[`~transformers.PreTrainedModel`] class are passed along this method and filtered out from the `kwargs`
+argument.

 Args:
-pretrained_model_name_or_path (`str` or `transformers.PreTrainedModel`):
+pretrained_model_name_or_path (`str` or [`~transformers.PreTrainedModel`]):
 The path to the pretrained model or its name.
-*model_args (`list`, *optional*)):
+*model_args (`list`, *optional*):
 Additional positional arguments passed along to the underlying model's `from_pretrained` method.
 **kwargs (`dict`, *optional*):
 Additional keyword arguments passed along to the underlying model's `from_pretrained` method. We also
-pre-process the kwargs to extract the arguments that are specific to the `transformers.PreTrainedModel`
-class and the arguments that are specific to trl models. The kwargs also support
-`prepare_model_for_kbit_training` arguments from `peft` library.
+pre-process the kwargs to extract the arguments that are specific to the
+[`~transformers.PreTrainedModel`] class and the arguments that are specific to trl models. The kwargs
+also support `prepare_model_for_kbit_training` arguments from `peft` library.
 """
 if kwargs is not None:
 peft_config = kwargs.pop("peft_config", None)
@@ -507,8 +511,8 @@ def add_and_load_reward_modeling_adapter(
 def push_to_hub(self, *args, **kwargs):
 r"""
 Push the pretrained model to the hub. This method is a wrapper around
-`transformers.PreTrainedModel.push_to_hub`. Please refer to the documentation of
-`transformers.PreTrainedModel.push_to_hub` for more information.
+[`~transformers.PreTrainedModel.push_to_hub`]. Please refer to the documentation of
+[`~transformers.PreTrainedModel.push_to_hub`] for more information.

 Args:
 *args (`list`, *optional*):
@@ -521,8 +525,8 @@ def push_to_hub(self, *args, **kwargs):
 def save_pretrained(self, *args, **kwargs):
 r"""
 Save the pretrained model to a directory. This method is a wrapper around
-`transformers.PreTrainedModel.save_pretrained`. Please refer to the documentation of
-`transformers.PreTrainedModel.save_pretrained` for more information.
+[`~transformers.PreTrainedModel.save_pretrained`]. Please refer to the documentation of
+[`~transformers.PreTrainedModel.save_pretrained`] for more information.

 Args:
 *args (`list`, *optional*):
@@ -596,14 +600,14 @@ def create_reference_model(
 Creates a static reference copy of a model. Note that model will be in `.eval()` mode.

 Args:
-model (`PreTrainedModelWrapper`): The model to be copied.
+model ([`PreTrainedModelWrapper`]): The model to be copied.
 num_shared_layers (`int`, *optional*):
 The number of initial layers that are shared between both models and kept frozen.
 pattern (`str`, *optional*): The shared layers are selected with a string pattern
 (e.g. "transformer.h.{layer}" for GPT2) and if a custom pattern is necessary it can be passed here.

 Returns:
-`PreTrainedModelWrapper`
+[`PreTrainedModelWrapper`]
 """
 if is_deepspeed_zero3_enabled():
 raise ValueError(
@@ -665,13 +669,13 @@ def create_reference_model(


 class GeometricMixtureWrapper(GenerationMixin):
-r"""
+"""
 Geometric Mixture generation wrapper that samples from the logits of two model's geometric mixture.

 Args:
-model (`PreTrainedModel`): The model to be wrapped.
-ref_model (`PreTrainedModel`): The reference model.
-generation_config (`GenerationConfig`): The generation config.
+model ([`~transformers.PreTrainedModel`]): The model to be wrapped.
+ref_model ([`~transformers.PreTrainedModel`]): The reference model.
+generation_config ([`~transformers.GenerationConfig`]): The generation config.
 mixture_coef (`float`, *optional* - default: 0.5): The mixture coefficient.
 """
