Fix documentation typos and list rendering (#6066)
* Fix list being rendered incorrectly in webdocs
I assume this extra blank line will fix the list not being correctly formatted on https://unity-technologies.github.io/ml-agents/#releases-documentation
* Fix typos in docs
* Fix more mis-rendered lists
Add a blank line before bulleted lists in markdown files to avoid them being rendered as in-paragraph sentences that all start with hyphens (a minimal illustration follows this list).
* Fix typos in python comments used to generate docs
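As a minimal illustration of the blank-line fix (an invented snippet, not one of this PR's actual hunks): inserting an empty line before the first `-` item lets Markdown render the items as a proper bulleted list instead of folding them into the preceding sentence.

```diff
 The plugin exposes two kinds of trainers:
+
 - on-policy trainers
 - off-policy trainers
```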
docs/Python-Custom-Trainer-Plugin.md (1 addition, 1 deletion)
@@ -5,7 +5,7 @@ capabilities. we introduce an extensible plugin system to define new trainers ba
in `Ml-agents` Package. This will allow rerouting `mlagents-learn` CLI to custom trainers and extending the config files
with hyper-parameters specific to your new trainers. We will expose a high-level extensible trainer (both on-policy,
and off-policy trainers) optimizer and hyperparameter classes with documentation for the use of this plugin. For more
-infromation on how python plugin system works see [Plugin interfaces](Training-Plugins.md).
+information on how python plugin system works see [Plugin interfaces](Training-Plugins.md).
## Overview
Model-free RL algorithms generally fall into two broad categories: on-policy and off-policy. On-policy algorithms perform updates based on data gathered from the current policy. Off-policy algorithms learn a Q function from a buffer of previous data, then use this Q function to make decisions. Off-policy algorithms have three key benefits in the context of ML-Agents: They tend to use fewer samples than on-policy as they can pull and re-use data from the buffer many times. They allow player demonstrations to be inserted in-line with RL data into the buffer, enabling new ways of doing imitation learning by streaming player data.
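For readers skimming the diff context above: the Overview paragraph describes off-policy learning from a buffer of previous data that can also hold streamed player demonstrations. The following is only a rough sketch of that idea, with invented class and method names; it is not ML-Agents' actual implementation.

```python
# Hypothetical sketch of an off-policy replay buffer that mixes RL data
# with player demonstrations, as described in the Overview paragraph above.
# Not ML-Agents' actual implementation; names are illustrative.
import random
from collections import deque
from typing import Deque, Dict, List, Tuple

# (obs, action, reward, next_obs, done)
Transition = Tuple[list, int, float, list, bool]


class MixedReplayBuffer:
    def __init__(self, capacity: int = 10_000) -> None:
        # One bounded deque per data source; oldest entries are dropped first.
        self._buffers: Dict[str, Deque[Transition]] = {
            "rl": deque(maxlen=capacity),
            "demo": deque(maxlen=capacity),
        }

    def add(self, transition: Transition, source: str = "rl") -> None:
        """Insert a transition gathered from the current policy ("rl")
        or streamed from a player demonstration ("demo")."""
        self._buffers[source].append(transition)

    def sample(self, batch_size: int, demo_ratio: float = 0.25) -> List[Transition]:
        """Draw a batch mixing demonstration and RL transitions, so the same
        buffered data can be re-used across many Q-function updates."""
        n_demo = min(int(batch_size * demo_ratio), len(self._buffers["demo"]))
        n_rl = min(batch_size - n_demo, len(self._buffers["rl"]))
        batch = random.sample(list(self._buffers["demo"]), n_demo)
        batch += random.sample(list(self._buffers["rl"]), n_rl)
        random.shuffle(batch)
        return batch
```

Sampling with a fixed demonstration ratio is just one possible design choice; the point is that buffered data can be re-used many times and mixed in-line with demonstrations, which is what gives off-policy methods their sample-efficiency and imitation-learning benefits.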