diff --git a/README.md b/README.md
index 1da552d7..450528d8 100644
--- a/README.md
+++ b/README.md
@@ -217,6 +217,11 @@ Because JAX installation is different depending on your CUDA version, Haiku does
 First, follow these instructions to install JAX with the relevant accelerator support.
 
+```
+pip install -r requirements.txt
+```
+
+
 ## General Information
 
 The project entrypoint is `pax/experiment.py`. The simplest command to run a game would be:
diff --git a/docs/getting-started/agents.md b/docs/getting-started/agents.md
index fb5aae31..d7c9711f 100644
--- a/docs/getting-started/agents.md
+++ b/docs/getting-started/agents.md
@@ -1,9 +1,92 @@
 # Agents
-## Agent 1
+## Overview
+
+Pax provides a number of learning agents to train and fixed opponents to train against.
+
+## Specifying an Agent
+
+Pax comes installed with an `Agent` class and several predefined agents. To specify an agent, import the `Agent` class and specify the agent parameters.
+
+```
+import jax
+import jax.numpy as jnp
+
+from pax import Agent  # adjust the import path to your installation
+
+args = {"hidden": 16, "observation_spec": 5}
+rng = jax.random.PRNGKey(0)
+bs = 1
+init_hidden = jnp.zeros((bs, args["hidden"]))
+obs = jnp.ones((bs, 5))
+
+agent = Agent(args)
+state, memory = agent.make_initial_state(rng, init_hidden)
+action, state, memory = agent.policy(rng, obs, memory)
+
+# traj_batch is a batch of trajectories collected from rollouts
+state, memory, stats = agent.update(
+    traj_batch, obs, state, memory
+)
+
+memory = agent.reset_memory(memory, False)
+```
+
+To run an experiment with a specific agent, use a pre-made `.yaml` file located in `conf/...` or create your own, and specify the agent. In the below example, `agent1` is a learning agent that learns via PPO and `agent2` is an agent that only chooses the Cooperate action.
+
+```
+# Agents
+agent1: 'PPO'
+agent2: 'Altruistic'
+
+...
+```
+
+## List of Agents
+
+```{note}
+Fixed agents are game-specific, while learning agents like PPO can be used in both games.
+```
+
+### agent1, agent2
+
+#### Fixed
+
+Matrix games
+
+| Agent | Description |
+| ----------- | ----------- |
+| **`Altruistic`** | Always chooses the Cooperate (C) action. |
+| **`Defect`** | Always chooses the Defect (D) action. |
+| **`GrimTrigger`** | Chooses the C action on the first turn and reciprocates with the C action until the opponent chooses D, at which point Grim switches to only choosing D.|
+| **`HyperAltruistic`** | Infinite matrix game variant of `Altruistic`. Always chooses the Cooperate (C) action.|
+| **`HyperDefect`** | Infinite matrix game variant of `Defect`. Always chooses the Defect (D) action.|
+| **`HyperTFT`** | Infinite matrix game variant of `TitForTat`. Chooses the C action on the first turn and reciprocates the opponent's last action.|
+| **`Random`** | Randomly chooses the C or D action. |
+| **`TitForTat`** | Chooses the C action on the first turn and reciprocates the opponent's last action.|
+
+
+Coin Game
+
+| Agent | Description|
+| ----------- | ----------- |
+| **`EvilGreedy`** | Attempts to pick up the closest coin. If equidistant to two colored coins, it chooses its opponent's color coin.|
+| **`GoodGreedy`** | Attempts to pick up the closest coin. If equidistant to two colored coins, it chooses its own color coin. |
+| **`RandomGreedy`** | Attempts to pick up the closest coin. If equidistant to two colored coins, it randomly chooses a color coin. |
+| **`Stay`** | Agent does not move.|
+
+#### Learning
+
+| Agent | Description |
+| ----------- | ----------- |
+| **`Naive`** | Simple learning agent that learns via REINFORCE. |
+| **`NaiveEx`** | Infinite matrix game variant of `Naive`. Simple learning agent that learns via REINFORCE. |
+| **`MFOS`** | Meta-learning algorithm for opponent shaping. |
+| **`PPO`** | Learning agent parameterised by a multilayer perceptron that learns via PPO. |
+| **`PPO_memory`** | Learning agent parameterised by a multilayer perceptron with a memory component that learns via PPO. |
+| **`Tabular`** | Learning agent parameterised by a single layer perceptron that learns via PPO. |
+
+```{note}
+`PPO_memory` serves as the core learning algorithm for both **Good Shepherd (GS)** and **Context and History Aware Other Shaping (CHAOS)** when training with meta-learning.
+```
-Lorem ipsum.
-## Agent 2
-Lorem ipsum.
diff --git a/docs/getting-started/environments.md b/docs/getting-started/environments.md
index 69b807b7..3f720a9e 100644
--- a/docs/getting-started/environments.md
+++ b/docs/getting-started/environments.md
@@ -1,9 +1,94 @@
 # Environments
-## Environment 1
+## Overview
+Pax supports two environments for learning agents to train within: matrix games and grid-world games.
+
+## Specifying the Environment
+
+Pax environments are similar to gymnax. To specify an environment, import the environment and specify the environment parameters.
+
+```
+import jax
+import jax.numpy as jnp
+
+from pax.envs.iterated_matrix_game import (
+    IteratedMatrixGame,
+    EnvParams,
+)
+
+rng = jax.random.PRNGKey(0)
+# payoff matrix for the iterated prisoner's dilemma
+payoff = [[-1, -1], [-3, 0], [0, -3], [-2, -2]]
+
+env = IteratedMatrixGame(num_inner_steps=5)
+env_params = EnvParams(payoff_matrix=payoff)
+
+# 0 = Defect, 1 = Cooperate
+actions = (jnp.ones(()), jnp.ones(()))
+obs, env_state = env.reset(rng, env_params)
+done = False
+
+while not done:
+    obs, env_state, rewards, done, info = env.step(
+        rng, env_state, actions, env_params
+    )
+```
+
+To specify the parameters for the environment:
+
+```
+...
+# Environment
+env_id: coin_game
+env_type: meta
+egocentric: True
+env_discount: 0.96
+payoff: [[1, 1, -2], [1, 1, -2]]
+...
+```
+
+## List of Environment Parameters
+
+### env_id
+| Name | Description |
+| :----------- | :----------- |
+|`iterated_matrix_game`| Classic normal form game with a 2x2 payoff matrix repeatedly played over `n` steps. |
+|`infinite_matrix_game` | Special case of the classic normal form game that calculates an exact value, simulating an infinite game. |
+|`coin_game` | Classic grid-world social dilemma environment. |
+
+### env_type
+
+| Name | Description |
+| :----------- | :----------- |
+|`sequential`| Standard sequential regime, where an agent learns within repeated episodes without meta-learning. |
+|`meta`| Meta-learning regime, where an agent learns via meta-learning. |
+
+### egocentric
+| Name | Description |
+| :----------- | :----------- |
+|*bool*| If `True`, sets an agent in the Coin Game environment to an egocentric view, empirically found to be more appropriate for other shaping. Otherwise, sets the agent to a non-egocentric view, in line with the original version of the game. |
+
+### env_discount
+
+| Name | Description |
+| :----------- | :----------- |
+|*Numeric*| Meta-learning discount factor. Between 0 and 1. |
+
+### payoff
+| Name | Description |
+| :----------- | :----------- |
+|*Array*| Custom payoff matrix for the game. |
+
+Example:
+
+```
+# if playing Coin Game
+payoff: [[1, 1, -2], [1, 1, -2]]
+```
+
+```
+# if playing Matrix Games
+payoff: [[-1, -1], [-3, 0], [0, -3], [-2, -2]]
+```
+
+```{note}
+Docstrings are under construction. Please check back later.
+```
+
+
-Lorem ipsum.
-## Environment 2
-Lorem ipsum.
diff --git a/docs/getting-started/evaluation.md b/docs/getting-started/evaluation.md
new file mode 100644
index 00000000..d4e2145d
--- /dev/null
+++ b/docs/getting-started/evaluation.md
@@ -0,0 +1,81 @@
+# Saving & Loading
+
+Pax provides an easy way to save and load your models.
+
+## Overview
+
+Saving and loading allows users to save or load models locally or from Weights and Biases. Users can configure the experiment `.yaml` file to set up the save and load file path, either locally or online.
+
+## List of Saving Parameters
+
+### save
+| Name | Description |
+| :----------- | :----------- |
+|*bool* | If `True`, the model is saved to the filepath specified by `save_dir`. |
+
+
+### save_dir
+| Name | Description |
+| :----------- | :----------- |
+|*String* | Filepath used to save a model. |
+
+### save_interval
+
+| Name | Description |
+| :----------- | :----------- |
+|*Int* | Number of iterations between saving a model. |
+
+Example
+```
+# config.yaml
+save: True
+save_interval: 10
+save_dir: "./exp/${wandb.group}/${wandb.name}"
+```
+
+## List of Loading Parameters
+
+### model_path
+| Name | Description |
+| :----------- | :----------- |
+|*String* | Filepath to load the model. |
+
+### run_path
+| Name | Description |
+| :----------- | :----------- |
+|*String* | If using Weights and Biases (i.e. `wandb.log=True`), this is the Weights and Biases run path used to locate the model. |
+
+Example
+```
+# config.yaml
+run_path: ucl-dark/cg/3mpgbfm2
+model_path: exp/coin_game-EARL-PPO_memory-vs-Random/run-seed-0/2022-09-08_20.41.03.643377/generation_30
+```
+
+### wandb
+
+```{note}
+The following parameters are used for Weights and Biases specific features.
+```
+
+```
+wandb:
+  entity: "ucl-dark"
+  project: cg
+  group: 'EARL-${agent1}-vs-${agent2}'
+  name: run-seed-${seed}
+  log: False
+```
+| Name | Description |
+| :----------- | :----------- |
+|`entity` | Weights and Biases entity. |
+|`project` | Weights and Biases project name. |
+|`group` | Weights and Biases group name. |
+|`name` | Weights and Biases run name. |
+|`log` | Whether to log the run to Weights and Biases. |
+
+
+
+
+
diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md
index 58232513..4653f9cf 100644
--- a/docs/getting-started/installation.md
+++ b/docs/getting-started/installation.md
@@ -1,7 +1,3 @@
 # Installation
-Pax is written in pure Python, but depends on C++ code via JAX.
-
-Because JAX installation is different depending on your CUDA version, Haiku does not list JAX as a dependency in requirements.txt.
-
-First, follow these instructions to install JAX with the relevant accelerator support.
+Pax will soon be available to install via the [Python Package Index](https://github.com/akbir/pax).
+For full installation instructions, please refer to the [Install Guide](https://github.com/akbir/pax) in the project README.
diff --git a/docs/getting-started/runners.md b/docs/getting-started/runners.md
index 899f24fb..2d385db3 100644
--- a/docs/getting-started/runners.md
+++ b/docs/getting-started/runners.md
@@ -1,9 +1,95 @@
 # Runner
-## Runner 1
+## Overview
-Lorem ipsum.
+
+Pax provides a number of experiment runners useful for different use cases of training and evaluating reinforcement learning agents.
-## Runner 2
+## Specifying a Runner
-Lorem ipsum.
+Pax centers around its runners: pieces of custom experiment logic that leverage the speed of JAX. After specifying the environment and agents, a runner carries out the experiment. The code below shows a portion of a runner that carries out a rollout and updates the agent:
+
+```
+def _rollout(carry, unused):
+    """Runner for inner episode"""
+    (
+        rngs,
+        obs,
+        a1_state,
+        a1_mem,
+        env_state,
+        env_params,
+    ) = carry
+
+    # unpack rngs
+    rngs = self.split(rngs, 4)
+    action, a1_state, new_a1_mem = agent1.batch_policy(
+        a1_state,
+        obs[0],
+        a1_mem,
+    )
+
+    next_obs, env_state, rewards, done, info = env.step(
+        rngs,
+        env_state,
+        (action, action),
+        env_params,
+    )
+
+    traj = Sample(
+        obs[0],
+        action,
+        rewards[0],
+        new_a1_mem.extras["log_probs"],
+        new_a1_mem.extras["values"],
+        done,
+        a1_mem.hidden,
+    )
+
+    return (
+        rngs,
+        next_obs,
+        a1_state,
+        new_a1_mem,
+        env_state,
+        env_params,
+    ), traj
+
+
+agent = Agent(args)
+a1_state, a1_mem = agent.make_initial_state(rng, init_hidden)
+
+for _ in range(num_updates):
+    final_timestep, batch_trajectory = jax.lax.scan(
+        _rollout,
+        (rngs, obs, a1_state, a1_mem, env_state, env_params),
+        None,
+        rollout_length,
+    )
+
+    (_, obs, a1_state, a1_mem, _, _) = final_timestep
+
+    a1_state, a1_mem, stats = agent.update(
+        batch_trajectory, obs[0], a1_state, a1_mem
+    )
+```
+
+To specify the runner in an experiment, use a pre-made `.yaml` file located in `conf/...` or create your own, and 
specify the runner with `runner`. In the below example, the `evo` flag selects the `EvoRunner`.
+
+```
+...
+# Runner
+runner: evo
+...
+```
+
+## List of Runners
+
+### runner
+| Runner | Description|
+| ----------- | ----------- |
+| **`eval`** | Evaluation runner, where a single, pre-trained agent is evaluated. |
+| **`evo`** | Evolution runner, where two independent agents are trained via Evolutionary Strategies (ES). |
+| **`rl`** | Multi-agent runner, where two independent agents are trained via reinforcement learning. |
+| **`sarl`** | Single-agent runner, where a single agent is trained via reinforcement learning. |
\ No newline at end of file
diff --git a/docs/getting-started/training.md b/docs/getting-started/training.md
new file mode 100644
index 00000000..f4adbbe3
--- /dev/null
+++ b/docs/getting-started/training.md
@@ -0,0 +1,178 @@
+# Training
+
+Pax provides fully configurable training parameters for experiments.
+
+## Overview
+
+Training parameters allow users to fully specify the training protocol of their experiment. Users can configure the experiment `.yaml` file to specify details such as episode length, number of environments, and much more.
+
+## List of Training Parameters
+
+### ppo
+
+| Name | Type | Description |
+| :----------- | :----------- | :----------- |
+| `num_minibatches` | *int*| Number of minibatches. |
+| `num_epochs` | *int* | Number of epochs. |
+| `gamma` | *Numeric*| Discount factor $\gamma$. |
+| `gae_lambda` | *Numeric*| Generalized advantage estimation $\lambda$ factor. |
+| `ppo_clipping_epsilon` | *Numeric*| Clipping factor $\epsilon$. |
+| `value_coeff` | *Numeric*| Value loss coefficient. |
+| `clip_value` | *bool*| Whether to clip the value function. |
+| `max_gradient_norm` | *Numeric*| Maximum gradient norm. |
+| `anneal_entropy` | *bool* | Whether to anneal the entropy term. |
+| `entropy_coeff_start` | *Numeric*| Starting entropy annealing coefficient. |
+| `entropy_coeff_horizon` |*Numeric*| Number of iterations before the entropy coefficient reaches `entropy_coeff_end`. |
+| `entropy_coeff_end` | *Numeric*| Ending entropy annealing coefficient. |
+| `lr_scheduling` | *bool* | Whether to anneal the learning rate. |
+| `learning_rate` | *Numeric*| Learning rate. |
+| `adam_epsilon` | *Numeric*| Adam epsilon. |
+| `with_memory` | *bool*| Whether to use memory. |
+| `with_cnn` |*bool* | Whether to use a CNN in Coin Game. |
+| `output_channels` | *int*| Number of output channels. |
+| `kernel_shape` | *Array*| Kernel shape. |
+| `separate` | *bool*| Whether to use separate networks in the CNN. |
+| `hidden_size` | *Numeric*| Hidden size of the memory layer. |
+
+Example
+```
+# config.yaml
+ppo:
+  num_minibatches: 8
+  num_epochs: 2
+  gamma: 0.96
+  gae_lambda: 0.95
+  ppo_clipping_epsilon: 0.2
+  value_coeff: 0.5
+  clip_value: True
+  ...
+```
+
+### es
+
+| Name | Type | Description |
+| :----------- | :----------- | :----------- |
+| `algo` | *String*| Algorithm to use. Currently supports `[OpenES, CMA_ES, SimpleGA]`. |
+| `sigma_init` | *Numeric* | Initial scale of isotropic Gaussian noise. |
+| `sigma_decay` | *Numeric*| Multiplicative decay factor. |
+| `sigma_limit` | *Numeric*| Smallest possible scale. |
+| `init_min` | *Numeric*| Range of parameter mean initialization - min. |
+| `init_max` | *Numeric*| Range of parameter mean initialization - max. |
+| `clip_min` | *Numeric*| Range of parameter proposals - min. |
+| `clip_max` | *Numeric*| Range of parameter proposals - max. |
+| `lrate_init` | *Numeric* | Initial learning rate. |
+| `lrate_decay` | *Numeric*| Multiplicative learning rate decay factor. |
+| `lrate_limit` |*Numeric*| Smallest possible learning rate. |
+| `beta_1` | *Numeric*| Adam beta_1. |
+| `beta_2` | *Numeric* | Adam beta_2. |
+| `eps` | *Numeric*| Adam epsilon constant. |
+| `elite_ratio` | *Numeric*| Percentage of elites to keep. |
+
+Example
+```
+# config.yaml
+es:
+  algo: OpenES
+  sigma_init: 0.04
+  sigma_decay: 0.999
+  sigma_limit: 0.01
+  init_min: 0.0
+  init_max: 0.0
+  clip_min: -1e10
+  clip_max: 1e10
+  lrate_init: 0.1
+  lrate_decay: 0.9999
+  lrate_limit: 0.001
+  beta_1: 0.99
+  beta_2: 0.999
+  eps: 1e-8
+  elite_ratio: 0.1
+```
+
+## List of Training Hyperparameters
+
+### num_devices
+| Name | Description |
+| :----------- | :----------- |
+|*Numeric* | Number of devices used to train the agent. Values greater than `1` require multiple GPUs.|
+
+```{note}
+The following piece of code can be used to debug multi-device setups on CPU if run at the top of `experiment.py`.
+```
+
+```
+import os
+from jax.config import config
+os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=2"
+config.update('jax_disable_jit', True)
+```
+
+### num_envs
+| Name | Description |
+| :----------- | :----------- |
+|*Numeric* | Number of environments used to train the agent.|
+
+### num_generations
+
+| Name | Description |
+| :----------- | :----------- |
+|*Numeric* | Number of generations to train the agent when training with evolution.|
+
+### num_inner_steps
+| Name | Description |
+| :----------- | :----------- |
+|*Numeric* | Number of inner steps within an episode. Set equal to `num_steps` when running `env_type: sequential`.|
+
+### num_opps
+| Name | Description |
+| :----------- | :----------- |
+|*Numeric* | Number of opponents in each environment. Typically set to `1`. |
+
+### num_steps
+| Name | Description |
+| :----------- | :----------- |
+|*Numeric* | Number of steps in a meta episode. |
+
+Example:
+```
+num_inner_steps: 16 # Episode length
+num_steps: 9600 # Steps in a meta-episode
+```
+
+Following the formula `number of episodes = num_steps / num_inner_steps`, we can calculate the number of episodes. In this example, each rollout will contain 600 episodes of length 16 (`600 episodes = 9600 steps / 16 steps per episode`).
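As a quick sanity check, the episode arithmetic above can be reproduced in plain Python (the variable names below are illustrative, not part of the Pax API):

```python
# Values taken from the example config above.
num_inner_steps = 16  # episode length
num_steps = 9600      # steps in a meta-episode

# number of episodes = num_steps / num_inner_steps
num_episodes = num_steps // num_inner_steps
print(num_episodes)  # → 600
```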
+
+### popsize
+| Name | Description |
+| :----------- | :----------- |
+|*Numeric* | Size of the population when training with evolution. |
+
+### top_k
+| Name | Description |
+| :----------- | :----------- |
+| *Numeric* | Number of top agents to log when training with evolution. |
+
+Example
+```
+# config.yaml
+top_k: 5
+popsize: 128
+num_envs: 50
+num_opps: 1
+num_devices: 2
+num_steps: 9600
+num_inner_steps: 16
+num_generations: 2000
+```
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
index 1bb44563..e5d0352f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,10 +1,14 @@
-# Pax - Multi-Agent Learning in JAX
+# Pax
 
-Pax is an experiment runner for multi-agent research built on top of JAX. It supports "other agent shaping", "multi agent RL" and "single agent RL" experiments. It supports regular and meta agents, and evolutionary and RL-based optimisation.
+````{note}
+This documentation is under construction. Please check back later.
+````
+
+Pax is an experiment platform for multi-agent shaping research built on top of JAX. It provides support for other-agent shaping and single/multi-agent reinforcement learning experiments with matrix/2D **environments**, regular/meta-learning **agents**, and evolutionary/RL-based optimisation **runners**.
 
 > *Pax (noun) - a period of peace that has been forced on a large area, such as an empire or even the whole world*
 
-Pax is composed of 3 components: Environments, Agents and Runners.
+
 
 ```{toctree}
@@ -17,4 +21,6 @@ getting-started/installation
 getting-started/environments
 getting-started/agents
 getting-started/runners
+getting-started/training
+getting-started/evaluation
 ```