0.2.0
Highlights
- Using stable releases for TensorFlow (>=2.3.0), Reverb, and TensorFlow Probability.
- Added Critic Regularized Regression (code, paper)
- Added Discrete Batch-Constrained Deep Q-learning (code, paper)
- Added `EnvironmentLoop.run_episode()` for running a single episode.
- Update `EnvironmentLoop.run()` to take `num_steps`, allowing the control of step count rather than just episode count.
- Add more distribution types (e.g. GaussianMixture) which can be used by policies.
- Added an environment wrapper for action repeats.
- Improvements/tuning to datasets exposed by `make_dataset`.
- Add support for nested / multidimensional rewards and discounts.
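
To illustrate the difference between limiting a run by episodes and limiting it by steps, here is a minimal, self-contained sketch. It does not use the library itself: `ToyEnvironment` and `ToyLoop` are hypothetical stand-ins, and the choice to check the step budget only at episode boundaries (so a run may overshoot `num_steps` slightly) is an assumption of this sketch, not a statement about the actual `EnvironmentLoop` implementation.

```python
from typing import Optional


class ToyEnvironment:
    """Hypothetical environment whose episodes last a fixed number of steps."""

    def __init__(self, episode_length: int = 5):
        self._episode_length = episode_length
        self._t = 0

    def reset(self) -> int:
        self._t = 0
        return self._t

    def step(self, action: int):
        """Returns an (observation, reward, done) tuple."""
        self._t += 1
        done = self._t >= self._episode_length
        return self._t, 1.0, done


class ToyLoop:
    """Hypothetical loop exposing run_episode() and run(num_episodes, num_steps)."""

    def __init__(self, environment: ToyEnvironment):
        self._environment = environment

    def run_episode(self) -> dict:
        """Runs exactly one episode and returns its length and return."""
        self._environment.reset()
        steps, episode_return, done = 0, 0.0, False
        while not done:
            _, reward, done = self._environment.step(0)  # Fixed dummy action.
            steps += 1
            episode_return += reward
        return {"episode_length": steps, "episode_return": episode_return}

    def run(self,
            num_episodes: Optional[int] = None,
            num_steps: Optional[int] = None) -> dict:
        """Runs whole episodes until either budget is exhausted."""
        if num_episodes is None and num_steps is None:
            # This sketch refuses to loop forever; a real loop might allow it.
            raise ValueError("Specify num_episodes and/or num_steps.")
        episode_count, step_count = 0, 0
        while True:
            out_of_episodes = (num_episodes is not None
                               and episode_count >= num_episodes)
            out_of_steps = num_steps is not None and step_count >= num_steps
            if out_of_episodes or out_of_steps:
                break
            result = self.run_episode()
            episode_count += 1
            step_count += result["episode_length"]
        return {"episodes": episode_count, "steps": step_count}


loop = ToyLoop(ToyEnvironment(episode_length=5))
single = loop.run_episode()
by_steps = loop.run(num_steps=12)
by_episodes = loop.run(num_episodes=2)
```

With 5-step episodes, a 12-step budget in this sketch completes three full episodes (15 steps), because the budget is only checked between episodes.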
Minor changes and fixes
- `ConstantInfo` logger for logging constant information.
- Added a `should_update` parameter to the `EnvironmentLoop`.
- Various modifications and optimizations to the `make_reverb_dataset()` function.
- Improvements to typing and pytype usage.
- Other minor bug and documentation fixes.