
Commit ef10189

MWeltevrede, Miffyli and araffin authored
Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination (#948)
* Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination
* Modify test to avoid unsupported buffer configuration
* Change from assertion to raising of ValueError
* Update changelog
* Update style for consistency
* Use handle_timeout_termination when possible

Co-authored-by: Anssi <[email protected]>
Co-authored-by: Antonin Raffin <[email protected]>
1 parent d64bcb4 commit ef10189


3 files changed (+12 additions, -1 deletion)

docs/misc/changelog.rst

Lines changed: 2 additions & 1 deletion
```diff
@@ -33,6 +33,7 @@ Bug Fixes:
 - Added a check for unbounded actions
 - Fixed issues due to newer version of protobuf (tensorboard) and sphinx
 - Fix exception causes all over the codebase (@cool-RR)
+- Prohibit simultaneous use of optimize_memory_usage and handle_timeout_termination due to a bug (@MWeltevrede)
 
 Deprecations:
 ^^^^^^^^^^^^^
@@ -979,4 +980,4 @@ And all the contributors:
 @wkirgsn @AechPro @CUN-bjy @batu @IljaAvadiev @timokau @kachayev @cleversonahum
 @eleurent @ac-93 @cove9988 @theDebugger811 @hsuehch @Demetrio92 @thomasgubler @IperGiove @ScheiklP
 @simoninithomas @armandpl @manuel-delverme @Gautam-J @gianlucadecola @buoyancy99 @caburu @xy9485
-@Gregwar @ycheng517 @quantitative-technologies @bcollazo @git-thor @TibiGG @cool-RR
+@Gregwar @ycheng517 @quantitative-technologies @bcollazo @git-thor @TibiGG @cool-RR @MWeltevrede
```

stable_baselines3/common/buffers.py

Lines changed: 7 additions & 0 deletions
```diff
@@ -164,6 +164,7 @@ class ReplayBuffer(BaseBuffer):
         at a cost of more complexity.
         See https://github.com/DLR-RM/stable-baselines3/issues/37#issuecomment-637501195
         and https://github.com/DLR-RM/stable-baselines3/pull/28#issuecomment-637559274
+        Cannot be used in combination with handle_timeout_termination.
     :param handle_timeout_termination: Handle timeout termination (due to timelimit)
         separately and treat the task as infinite horizon task.
         https://github.com/DLR-RM/stable-baselines3/issues/284
```
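The docstring now states that `optimize_memory_usage` cannot be combined with `handle_timeout_termination`, and the commit only points to issue #934 for the bug itself. One plausible mechanism is sketched below with a hypothetical toy buffer (`ToyMemoryEfficientBuffer` is not SB3 code, and it assumes a single environment and integer observations). The memory-efficient variant writes `next_obs` into the following slot, so the next `add()` overwrites it with a reset observation. For a true terminal this is harmless because `done` masks the bootstrap, but timeout handling deliberately un-sets `done`, so the target would bootstrap from the wrong observation.

```python
from typing import List, Optional, Tuple


class ToyMemoryEfficientBuffer:
    """Hypothetical single-env circular buffer (not SB3 code) that stores each
    observation once, like a replay buffer with optimize_memory_usage=True."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.observations: List[Optional[int]] = [None] * size
        self.dones = [False] * size
        self.timeouts = [False] * size
        self.pos = 0

    def add(self, obs: int, next_obs: int, done: bool, timeout: bool) -> None:
        self.observations[self.pos] = obs
        # next_obs goes into the *next* slot, where the following add()
        # overwrites it with that transition's obs (e.g. a reset observation)
        self.observations[(self.pos + 1) % self.size] = next_obs
        self.dones[self.pos] = done
        self.timeouts[self.pos] = timeout
        self.pos = (self.pos + 1) % self.size

    def sample(self, idx: int) -> Tuple[Optional[int], Optional[int], bool]:
        # next_obs is read back from the following slot
        next_obs = self.observations[(idx + 1) % self.size]
        # timeout handling: a time-limit truncation is not a real terminal state,
        # so it is not treated as done and the target bootstraps from next_obs
        done = self.dones[idx] and not self.timeouts[idx]
        return self.observations[idx], next_obs, done


buf = ToyMemoryEfficientBuffer(size=8)
buf.add(0, 1, done=False, timeout=False)
buf.add(1, 2, done=True, timeout=True)      # episode truncated; terminal obs is 2
buf.add(10, 11, done=False, timeout=False)  # first step after reset to obs 10
print(buf.sample(1))  # (1, 10, False): bootstraps from 10 instead of 2
```

A genuine terminal (`timeout=False`) still comes back with `done=True`, so the stale `next_obs` is masked out; only the timeout path exposes it.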
```diff
@@ -188,6 +189,12 @@ def __init__(
         if psutil is not None:
             mem_available = psutil.virtual_memory().available
 
+        # there is a bug if both optimize_memory_usage and handle_timeout_termination are true
+        # see https://github.com/DLR-RM/stable-baselines3/issues/934
+        if optimize_memory_usage and handle_timeout_termination:
+            raise ValueError(
+                "ReplayBuffer does not support optimize_memory_usage = True and handle_timeout_termination = True simultaneously."
+            )
         self.optimize_memory_usage = optimize_memory_usage
 
         self.observations = np.zeros((self.buffer_size, self.n_envs) + self.obs_shape, dtype=observation_space.dtype)
```
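The new guard rejects exactly one of the four flag combinations. The sketch below copies the check into a stand-alone function so the behaviour can be exercised without constructing a buffer; `check_replay_buffer_flags` and `is_supported` are hypothetical names, not part of SB3.

```python
def check_replay_buffer_flags(optimize_memory_usage: bool, handle_timeout_termination: bool) -> None:
    """Stand-alone copy of the guard added to ReplayBuffer.__init__ (hypothetical helper)."""
    # see https://github.com/DLR-RM/stable-baselines3/issues/934
    if optimize_memory_usage and handle_timeout_termination:
        raise ValueError(
            "ReplayBuffer does not support optimize_memory_usage = True "
            "and handle_timeout_termination = True simultaneously."
        )


def is_supported(optimize_memory_usage: bool, handle_timeout_termination: bool) -> bool:
    """Return True if the guard accepts this flag combination."""
    try:
        check_replay_buffer_flags(optimize_memory_usage, handle_timeout_termination)
    except ValueError:
        return False
    return True


for opt in (False, True):
    for timeout in (False, True):
        print(f"optimize_memory_usage={opt}, handle_timeout_termination={timeout}: {is_supported(opt, timeout)}")
```

Because `handle_timeout_termination` defaults to `True` in `ReplayBuffer.__init__`, enabling `optimize_memory_usage` alone now raises; callers also have to pass `handle_timeout_termination=False`, which is what the test change does through `replay_buffer_kwargs`.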

tests/test_save_load.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -375,6 +375,9 @@ def test_warn_buffer(recwarn, model_class, optimize_memory_usage):
         select_env(model_class),
         buffer_size=100,
         optimize_memory_usage=optimize_memory_usage,
+        # we cannot use optimize_memory_usage and handle_timeout_termination
+        # at the same time
+        replay_buffer_kwargs={"handle_timeout_termination": not optimize_memory_usage},
         policy_kwargs=dict(net_arch=[64]),
         learning_starts=10,
     )
```
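The test sets `handle_timeout_termination` to `not optimize_memory_usage`, so timeout handling stays on whenever the memory-efficient buffer is off, in line with the commit's "Use handle_timeout_termination when possible". The hypothetical helper below (`build_replay_buffer_kwargs` is not an SB3 function) generalizes that choice while still letting an explicit caller setting win.

```python
from typing import Any, Dict, Optional


def build_replay_buffer_kwargs(
    optimize_memory_usage: bool, user_kwargs: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Hypothetical helper: keep timeout handling on unless the memory-efficient
    buffer is enabled, mirroring the test's `not optimize_memory_usage` choice."""
    # Copy so the caller's dict is never mutated
    kwargs = dict(user_kwargs or {})
    # Only fill in the flag if the caller did not choose it explicitly
    kwargs.setdefault("handle_timeout_termination", not optimize_memory_usage)
    return kwargs


print(build_replay_buffer_kwargs(False))
print(build_replay_buffer_kwargs(True))
```

An explicit `{"handle_timeout_termination": True}` combined with `optimize_memory_usage=True` is passed through unchanged, so the buffer's own `ValueError` still reports the conflict rather than the helper silently overriding the caller.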
