Using an "unknown" environment makes your code harder to test, which makes it hard to help. Sorry ;( Could you use a standard environment such as gym's "CartPole" instead? That would make it much easier.