|
| 1 | +# Profiling ML-Agents in Python |
| 2 | + |
| 3 | +ML-Agents provides a lightweight profiling system, in order to identity hotspots in the training process and help spot |
| 4 | +regressions from changes. |
| 5 | + |
| 6 | +Timers are hierarchical, meaning that the time tracked in a block of code can be further split into other blocks if |
| 7 | +desired. This also means that a function that is called from multiple places in the code will appear in multiple |
| 8 | +places in the timing output. |
| 9 | + |
| 10 | +All timers operate using a "global" instance by default, but this can be overridden if necessary (mainly for testing). |
| 11 | + |
| 12 | +## Adding Profiling |
| 13 | + |
| 14 | +There are two ways to indicate code should be included in profiling. The simplest way is to add the `@timed` |
| 15 | +decorator to a function or method of interested. |
| 16 | + |
| 17 | +```python |
| 18 | +class TrainerController: |
| 19 | + # .... |
| 20 | + @timed |
| 21 | + def advance(self, env: EnvManager) -> int: |
| 22 | + # do stuff |
| 23 | +``` |
| 24 | + |
| 25 | +You can also used the `hierarchical_timer` context manager. |
| 26 | + |
| 27 | + |
| 28 | +``` python |
| 29 | +with hierarchical_timer("communicator.exchange"): |
| 30 | + outputs = self.communicator.exchange(step_input) |
| 31 | +``` |
| 32 | + |
| 33 | +The context manager may be easier than the `@timed` decorator for profiling different parts of a large function, or |
| 34 | +profiling calls to abstract methods that might not use decorator. |
| 35 | + |
| 36 | +## Output |
| 37 | +By default, at the end of training, timers are collected and written in json format to |
| 38 | +`{summaries_dir}/{run_id}_timers.json`. The output consists of node objects with the following keys: |
| 39 | + * name (string): The name of the block of code. |
| 40 | + * total (float): The total time in seconds spent in the block, including child calls. |
| 41 | + * count (int): The number of times the block was called. |
| 42 | + * self (float): The total time in seconds spent in the block, excluding child calls. |
| 43 | + * children (list): A list of child nodes. |
| 44 | + * is_parallel (bool): Indicates that the block of code was executed in multiple threads or processes (see below). This |
| 45 | + is optional and defaults to false. |
| 46 | + |
| 47 | +### Parallel execution |
| 48 | +For code that executes in multiple processes (for example, SubprocessEnvManager), we periodically send the timer |
| 49 | +information back to the "main" process, aggregate the timers there, and flush them in the subprocess. Note that |
| 50 | +(depending on the number of processes) this can result in timers where the total time may exceed the parent's total |
| 51 | +time. This is analogous to the difference between "real" and "user" values reported from the unix `time` command. In the |
| 52 | +timer output, blocks that were run in parallel are indicated by the `is_parallel` flag. |
| 53 | + |
0 commit comments