Implement Atari environment using envpool #136
Changes (new file, excerpt of `@@ -0,0 +1,90 @@`):

```python
from functools import cached_property

import cv2
import numpy as np
from environments.gym_environment import GymEnvironment
from gymnasium import spaces
from util.configurations import AtariConfig
import envpool


class AtariEnvironment(GymEnvironment):
    def __init__(self, config: AtariConfig, seed: int, evaluation: bool) -> None:
        super().__init__(config)
        self.env = envpool.make_gymnasium(
            config.task,
            num_envs=1,
            seed=seed,
            img_width=config.frame_width,
            img_height=config.frame_height,
            episodic_life=evaluation,
            reward_clip=evaluation,
            stack_num=config.frames_to_stack,
        )
        if config.display == 1:
            self.name = f"{config.task}-{seed}"
            cv2.namedWindow(self.name, cv2.WINDOW_GUI_NORMAL)

        self.reset()

    @cached_property
    def max_action_value(self) -> float:
        return self.env.action_space.high[0]

    @cached_property
    def min_action_value(self) -> float:
        return self.env.action_space.low[0]
```
Comment on lines +32 to +36

Suggested change (handle both `Box` and `Discrete` action spaces instead of assuming `Box`):

```python
@cached_property
def max_action_value(self) -> float:
    if isinstance(self.env.action_space, spaces.Box):
        return self.env.action_space.high[0]
    elif isinstance(self.env.action_space, spaces.Discrete):
        return self.env.action_space.n - 1
    else:
        raise ValueError(f"Unhandled action space type: {type(self.env.action_space)}")

@cached_property
def min_action_value(self) -> float:
    if isinstance(self.env.action_space, spaces.Box):
        return self.env.action_space.low[0]
    elif isinstance(self.env.action_space, spaces.Discrete):
        return 0
    else:
        raise ValueError(f"Unhandled action space type: {type(self.env.action_space)}")
```
**Copilot AI** (Dec 15, 2025):

Calling `self.env.reset()` without the `seed` parameter doesn't actually seed the environment. This method should accept and use the `seed` parameter, similar to how `OpenAIEnvironment` handles it with `self.env.reset(seed=seed)`.

Suggested change:

```python
self.env.reset(seed=seed)
```
**Copilot AI** (Dec 15, 2025):

The window name is created from the task name and seed, but if `config.display` is not 1, the `self.name` attribute won't be set. This will cause an `AttributeError` when `render()` is called and tries to use `self.name`. Consider initializing `self.name` regardless of the display setting, or adding a conditional check in the render method.
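One way to apply that suggestion, trimmed to the naming logic only (a hypothetical sketch, not the PR's actual code; the `cv2` calls are shown as comments so the sketch stays self-contained):

```python
class AtariEnvironmentSketch:
    """Hypothetical trimmed-down sketch: only the window-naming logic."""

    def __init__(self, task: str, seed: int, display: bool) -> None:
        # Set self.name unconditionally so render() can always reference it.
        self.name = f"{task}-{seed}"
        self.display = display
        if self.display:
            # Real code would open the window here:
            # cv2.namedWindow(self.name, cv2.WINDOW_GUI_NORMAL)
            pass

    def render(self) -> None:
        if not self.display:
            return  # no window was created, nothing to draw
        # Real code: cv2.imshow(self.name, frame); cv2.waitKey(1)


env = AtariEnvironmentSketch("Pong-v5", 42, display=False)
print(env.name)  # Pong-v5-42
env.render()  # safe no-op even though no window exists
```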
**Copilot AI** (Dec 15, 2025):

The `grab_frame` method assumes that if `state.shape` has 4 dimensions it's RGB, and takes the last 3 channels with `self.state[-3:]`. However, with envpool's frame stacking (`stack_num=4`), the observation shape for grayscale Atari games is `(4, 84, 84)`, so `len(self.state.shape) == 3` and execution falls into the else branch. The logic should check the actual channel dimension rather than the number of dimensions to distinguish RGB from grayscale stacked frames.
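A channel-count dispatch along the lines this comment suggests could look like the following. This is a hypothetical standalone helper, not the PR's `grab_frame`; the channel-first shapes it assumes are the ones the comment describes:

```python
import numpy as np


def grab_frame(state: np.ndarray, stack_num: int) -> np.ndarray:
    """Hypothetical helper: pick a displayable frame from a channel-first stack.

    Shapes assumed (per the review comment):
      grayscale stack: (stack_num, H, W)
      RGB stack:       (stack_num * 3, H, W)
    Dispatch on the channel count, not on len(state.shape).
    """
    channels = state.shape[0]
    if channels == stack_num:  # grayscale: return the most recent frame
        return state[-1]
    if channels == 3 * stack_num:  # RGB: last 3 channels, as (H, W, 3)
        return np.transpose(state[-3:], (1, 2, 0))
    raise ValueError(f"Unexpected channel count {channels} for stack_num={stack_num}")


gray = np.zeros((4, 84, 84), dtype=np.uint8)
print(grab_frame(gray, stack_num=4).shape)  # (84, 84)

rgb = np.zeros((12, 84, 84), dtype=np.uint8)
print(grab_frame(rgb, stack_num=4).shape)  # (84, 84, 3)
```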
The `evaluation` parameter is inverted for `episodic_life` and `reward_clip`. In Atari, these settings should be disabled during evaluation (set to `False`) for standard evaluation, and enabled during training (set to `True`). Currently, when `evaluation=True`, both are set to `True`, which is the opposite of the intended behavior.
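The intended mapping can be sketched with a small hypothetical helper whose result would feed the `envpool.make_gymnasium(...)` keyword arguments (illustrative only; the fix in the PR itself would just be `episodic_life=not evaluation, reward_clip=not evaluation`):

```python
def atari_env_flags(evaluation: bool) -> dict:
    """Hypothetical helper encoding the intended behavior: episodic life and
    reward clipping are training-time settings, disabled for evaluation."""
    return {
        "episodic_life": not evaluation,
        "reward_clip": not evaluation,
    }


print(atari_env_flags(evaluation=False))  # training: both True
print(atari_env_flags(evaluation=True))   # evaluation: both False
```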