
[Question] Does hyperparameter tuning support custom vectorized environments? #439

@antoinedang

❓ Question

Hello,

I have implemented a custom vectorized environment using MuJoCo that adheres to Stable Baselines3's VecEnv interface, but I haven't found any evidence of RL Zoo 3 supporting (or not supporting) vectorized environments. When I pass my environment's name in after registering it with Gym, RL Zoo 3 always wraps it in a VecEnv wrapper (such as DummyVecEnv or SubprocVecEnv) and crashes, because the interface of a plain Env is not the same as that of a VecEnv.

I am wondering whether there is a way (an argument I missed, or some source code I could modify) to pass in the name of an already-vectorized environment and have RL Zoo 3 skip the DummyVecEnv/SubprocVecEnv wrapping step. I've tried the vec_env_wrapper argument in my hyperparameter config, setting env_wrapper to None, and many Google and source-code searches, but haven't found anything. It doesn't sound like RL Zoo 3 supports this out of the box, so I'm wondering: is this by choice, did I miss a section in the documentation or a past issue, or can I update the source code so it works for me? I don't know much about the inner workings of RL Zoo 3, but it seems like an additional argument such as "is_env_vectorized" plus an if statement would do the trick (a rough sketch of the idea follows).

For context, my hyperparameter config is:

import torch.nn as nn  # needed for activation_fn below

default_hyperparams = dict(
    policy="MlpPolicy",
    n_timesteps=1e7,
    batch_size=256,
    n_steps=512,
    gamma=0.95,
    learning_rate=3.56987e-05,
    ent_coef=0.00238306,
    clip_range=0.3,
    n_epochs=5,
    gae_lambda=0.9,
    max_grad_norm=2,
    vf_coef=0.431892,
    policy_kwargs=dict(
        log_std_init=-2,
        ortho_init=False,
        activation_fn=nn.ReLU,
        net_arch=dict(pi=[256, 256], vf=[256, 256]),
    ),
)


hyperparams = {
    "GPUHumanoid": default_hyperparams
}

And I am calling RL Zoo 3's train.py entry point with the hyperparameter-tuning arguments set like this:

sys.argv = ["python", "-optimize",
                "--algo", "ppo",
                "--env", "GPUHumanoid",
                "--log-folder", "data/tuning_logs",
                "-n", "50000",
                "--n-trials", "1000",
                "--n-jobs", "2",
                "--sampler", "tpe",
                "--pruner", "median",
                "--env-kwargs", "num_envs:256",
                "--conf-file", "simulation.hyperparam_config"]
train()

The error I get when I use the above arguments together with the hyperparameter config is:

/usr/local/lib/python3.10/dist-packages/gymnasium/utils/passive_env_checker.py:189: UserWarning: WARN: The result returned by `env.reset()` was not a tuple of the form `(obs, info)`, where `obs` is a observation and `info` is a dictionary containing additional information. Actual type: `<class 'numpy.ndarray'>`
  logger.warn(
too many values to unpack (expected 2)

It seems like it's expecting the Env interface rather than the VecEnv one, and I can see from the source code that environments are wrapped in a DummyVecEnv after being gym.make()'d.
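To illustrate the mismatch with a minimal, self-contained demo (using CartPole instead of my custom env): a Gymnasium Env.reset() returns an (obs, info) tuple, while an SB3 VecEnv.reset() returns only the batched observations, so code that unpacks the result into two values fails exactly like above.

import gymnasium as gym
from stable_baselines3.common.vec_env import DummyVecEnv

# Plain Gymnasium Env: reset() returns an (obs, info) tuple.
env = gym.make("CartPole-v1")
obs, info = env.reset()  # fine

# SB3 VecEnv: reset() returns only the batched observations (an ndarray).
vec_env = DummyVecEnv([lambda: gym.make("CartPole-v1")] * 3)
try:
    obs, info = vec_env.reset()  # assumes the plain-Env interface
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)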

Is there something I am missing? Any help would be greatly appreciated!

Labels: documentation, question
