Reproducible results, automatic `VecEnv` wrapping, env checker and more usability improvements
Breaking Changes:
- The `seed` argument has been moved from the `learn()` method to the model constructor in order to have reproducible results
- `allow_early_resets` of the `Monitor` wrapper now defaults to `True`
- `make_atari_env` now returns a `DummyVecEnv` by default (instead of a `SubprocVecEnv`); this usually improves performance
- Fix inconsistency of sample type, so that the mode/sample function returns a tensor of `tf.int64` in `CategoricalProbabilityDistribution`/`MultiCategoricalProbabilityDistribution` (@seheevic)
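The move of `seed` from `learn()` into the constructor can be illustrated with a toy sketch. The `ReproducibleModel` class below is hypothetical, not part of Stable Baselines; it only shows the idea that seeding the RNG once at construction makes repeated training runs deterministic.

```python
import random

import numpy as np


class ReproducibleModel:
    """Toy stand-in for a model that takes ``seed`` in its constructor
    (illustrative only, not the Stable Baselines API)."""

    def __init__(self, seed=None):
        # Seed all sources of randomness once, at construction time.
        self.rng = np.random.RandomState(seed)
        if seed is not None:
            random.seed(seed)

    def learn(self, total_timesteps):
        # All randomness is drawn from the seeded RNG, so two models
        # built with the same seed produce identical trajectories.
        return [self.rng.uniform(-1.0, 1.0) for _ in range(total_timesteps)]


run_a = ReproducibleModel(seed=42).learn(5)
run_b = ReproducibleModel(seed=42).learn(5)
assert run_a == run_b  # same seed at construction, same results
```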
New Features:
- Add `n_cpu_tf_sess` to the model constructor to choose the number of threads used by TensorFlow
- Environments are automatically wrapped in a `DummyVecEnv` if needed when passing them to the model constructor
- Added `stable_baselines.common.make_vec_env` helper to simplify `VecEnv` creation
- Added `stable_baselines.common.evaluation.evaluate_policy` helper to simplify model evaluation
- `VecNormalize` changes:
  - Now supports being pickled and unpickled (@AdamGleave)
  - New methods `.normalize_obs(obs)` and `.normalize_reward(rews)` apply normalization to arbitrary observations or rewards without updating statistics (@shwang)
  - `.get_original_reward()` returns the unnormalized rewards from the most recent timestep
  - `.reset()` now collects observation statistics (it used to only apply normalization)
- Add parameter `exploration_initial_eps` to DQN (@jdossgollin)
- Add type checking and PEP 561 compliance. Note: most functions are still not annotated; this will be a gradual process.
- DDPG, TD3 and SAC accept non-symmetric action spaces (@Antymon)
- Add `check_env` util to check if a custom environment follows the Gym interface (@araffin and @justinkterry)
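A minimal sketch of what an evaluation helper like `evaluate_policy` does: run the policy for a fixed number of episodes and report the mean and standard deviation of episode rewards. The `ConstantEnv` and `ZeroPolicy` classes and the simplified single-return `predict` signature below are illustrative assumptions, not the library API.

```python
import numpy as np


def evaluate_policy_sketch(model, env, n_eval_episodes=10, deterministic=True):
    """Simplified version of the idea behind evaluate_policy:
    roll out n episodes and summarize the episode rewards."""
    episode_rewards = []
    for _ in range(n_eval_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action = model.predict(obs, deterministic=deterministic)
            obs, reward, done, _info = env.step(action)
            total += reward
        episode_rewards.append(total)
    return np.mean(episode_rewards), np.std(episode_rewards)


class ConstantEnv:
    """Toy 3-step environment that always pays a reward of 1 (for testing)."""

    def __init__(self):
        self._t = 0

    def reset(self):
        self._t = 0
        return 0

    def step(self, action):
        self._t += 1
        return 0, 1.0, self._t >= 3, {}


class ZeroPolicy:
    """Toy policy that always returns action 0 (for testing)."""

    def predict(self, obs, deterministic=True):
        return 0


mean_r, std_r = evaluate_policy_sketch(ZeroPolicy(), ConstantEnv(), n_eval_episodes=4)
assert mean_r == 3.0 and std_r == 0.0
```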
Bug Fixes:
- Fix seeding, so it is now possible to have deterministic results on CPU
- Fix a bug in DDPG where the `predict` method with `deterministic=False` would fail
- Fix a bug in TRPO: `mean_losses` was not initialized, causing the logger to crash when there were no gradients (@MarvineGothic)
- Fix a bug in `cmd_util` caused by an API change in recent Gym versions
- Fix a bug in DDPG, TD3 and SAC where warmup and random exploration actions would end up scaled in the replay buffer (@Antymon)
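The replay-buffer fix above concerns the mapping between the environment's action bounds and the policy's internal `[-1, 1]` range: actions must be stored in one consistent space. The helper functions below sketch that mapping; their names and signatures are illustrative, not the library's internals.

```python
import numpy as np


def scale_action(action, low, high):
    """Map an action from the env's [low, high] range to [-1, 1]."""
    return 2.0 * (action - low) / (high - low) - 1.0


def unscale_action(scaled, low, high):
    """Map a [-1, 1] action back to the env's [low, high] range."""
    return low + 0.5 * (scaled + 1.0) * (high - low)


# Non-symmetric action space [0, 4]: scaling then unscaling must round-trip.
low, high = np.array([0.0]), np.array([4.0])
action = np.array([3.0])
assert np.allclose(scale_action(action, low, high), 0.5)
assert np.allclose(unscale_action(scale_action(action, low, high), low, high), action)
```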
Deprecations:
- `nprocs` (ACKTR) and `num_procs` (ACER) are deprecated in favor of `n_cpu_tf_sess`, which is now common to all algorithms
- `VecNormalize`: `load_running_average` and `save_running_average` are deprecated in favour of using pickle
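The deprecation of `load_running_average`/`save_running_average` in favour of pickle can be sketched with a minimal running-statistics object. The `RunningMeanStd` class below is a simplified stand-in for the statistics `VecNormalize` tracks, not the library's implementation; the point is that a plain-attribute object round-trips through `pickle` without dedicated save/load methods.

```python
import pickle

import numpy as np


class RunningMeanStd:
    """Simplified running mean/variance tracker (illustrative stand-in
    for the statistics a normalizing wrapper keeps)."""

    def __init__(self):
        self.mean, self.var, self.count = 0.0, 1.0, 1e-4

    def update(self, x):
        # Parallel-variance update: merge batch statistics into the totals.
        batch_mean, batch_var, batch_count = np.mean(x), np.var(x), len(x)
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean += delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta ** 2 * self.count * batch_count / total) / total
        self.count = total


stats = RunningMeanStd()
stats.update(np.array([1.0, 2.0, 3.0]))

# Instead of dedicated save_running_average()/load_running_average()
# methods, the whole object round-trips through pickle:
restored = pickle.loads(pickle.dumps(stats))
assert restored.mean == stats.mean and restored.var == stats.var
```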
Others:
- Add upper bound for TensorFlow version (<2.0.0).
- Refactored tests to remove duplicated code
- Add pull request template
- Replaced redundant code in load_results (@jbulow)
- Minor PEP8 fixes in dqn.py (@justinkterry)
- Add a message to the assert in `PPO2`
- Update replay buffer docstring
- Fix `VecEnv` docstrings
Documentation:
- Add plotting to the Monitor example (@rusu24edward)
- Add Snake Game AI project (@pedrohbtp)
- Add note on the supported TensorFlow versions.
- Remove unnecessary steps required for Windows installation.
- Remove `DummyVecEnv` creation when not needed
- Added `make_vec_env` to the examples to simplify `VecEnv` creation
- Add QuaRL project (@srivatsankrishnan)
- Add Pwnagotchi project (@evilsocket)
- Fix multiprocessing example (@rusu24edward)
- Fix `result_plotter` example
- Add JNRR19 tutorial (by @edbeeching, @hill-a and @araffin)
- Updated notebooks link
- Fix typo in algos.rst, "containes" to "contains" (@SyllogismRXS)
- Fix outdated source documentation for load_results
- Add PPO_CPP project (@Antymon)
- Add section on C++ portability of Tensorflow models (@Antymon)
- Update custom env documentation to reflect the new Gym API for the `close()` method (@justinkterry)
- Update custom env documentation to clarify what `step` and `reset` return (@justinkterry)
- Add RL tips and tricks for doing RL experiments
- Corrected lots of typos
- Add spell check to documentation if available