Replies: 2 comments 1 reply
I haven't tried this before, and it's not clear how adding the network to the environment would hook up to training. Does a feature extractor or action post-processor need to be part of the environment? Code pointer in brax: https://github.com/google/brax/blob/ab34392416af8a40045934a0ee02206babd34857/brax/training/networks.py#L368
The benefit of integrating it into the environment is that you don't need to alter or reimplement the RL training scripts. Since these NNs are frozen (not trained), there is no need to expose their parameters beyond the environment.

As usual, once I started building an MRE the issue changed: with small MLP-type networks this approach works easily, without issues! I'll share my not-quite-minimal UV project in case someone wants to try something similar. The issue persists with convolutional layers baked into the env. I'll keep investigating and give an update here if I figure it out!
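The commenter's project isn't shown here, but the "small MLP baked into the env" pattern can be sketched in plain JAX. Everything below is illustrative: `FrozenEncoderEnv`, the parameter names, and the shapes are hypothetical, not the playground or brax API. The key idea is that the frozen parameters are closed over inside the environment, so the trainer never sees them.

```python
import jax
import jax.numpy as jnp

# Hypothetical minimal env: the class and parameter names are
# illustrative, not the playground/brax API.
class FrozenEncoderEnv:
    def __init__(self, params):
        # Close over the frozen parameters; they are never exposed to
        # the trainer, which only interacts with step().
        def encode(x):
            h = jnp.tanh(x @ params["w1"] + params["b1"])
            return h @ params["w2"] + params["b2"]
        # jit once here; inside a jitted training step the call is
        # traced like any other pure function of the inputs.
        self._encode = jax.jit(encode)

    def step(self, raw_obs, action):
        # Post-process the raw observation with the frozen MLP.
        # (A real env would also advance physics and compute rewards.)
        return self._encode(raw_obs)

# Build random frozen parameters for a 4 -> 8 -> 3 MLP.
key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = {
    "w1": jax.random.normal(k1, (4, 8)) * 0.1,
    "b1": jnp.zeros(8),
    "w2": jax.random.normal(k2, (8, 3)) * 0.1,
    "b2": jnp.zeros(3),
}

env = FrozenEncoderEnv(params)
obs = env.step(jnp.ones(4), jnp.zeros(1))
print(obs.shape)  # (3,)
```

The same structure applies if the parameters come from a checkpoint (e.g. a flax model's `apply` partially applied with its restored variables) rather than being generated randomly as above.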
Hello!
I'm seeking general advice for using a fixed-parameter, pre-trained neural network as part of a playground environment, e.g. a feature extractor that processes the state for observations, or an action post-processor, etc.
I tried this by loading the parameters into a flax model in the environment's `__init__`. I then also stored the model's `apply()` as an environment member variable, figuring this would let me use it within `step`. Even though I'm using a tiny NN, performing inference with the fixed NN inside `step` leads to silent crashes (no error message, just return code 9 or 1; WSL crashes completely and I have to reboot). Are there any working examples of this approach in playground/brax environments online? If not, I can create a minimal reproduction of my crashes and discuss it here.
System info:
Running on WSL
Training with brax PPO
python 3.10
brax 0.13.0
jax 0.6.2
jax-cuda12-pjrt 0.6.2
jax-cuda12-plugin 0.6.2
jaxlib 0.6.2
jaxopt 0.8.5
msgpack 1.1.1
mujoco 3.3.5
mujoco-mjx 3.3.5
nvidia-cublas-cu12 12.9.1.4
nvidia-cuda-cupti-cu12 12.9.79
nvidia-cuda-nvcc-cu12 12.9.86
nvidia-cuda-nvrtc-cu12 12.9.86
nvidia-cuda-runtime-cu12 12.9.79
nvidia-cudnn-cu12 9.13.0.50
nvidia-cufft-cu12 11.4.1.4
nvidia-cusolver-cu12 11.7.5.82
nvidia-cusparse-cu12 12.5.10.65
nvidia-nccl-cu12 2.28.3
nvidia-nvjitlink-cu12 12.9.86
nvidia-nvshmem-cu12 3.4.5
opt-einsum 3.4.0
optax 0.2.5
playground 0.0.5