I would like to find out whether PPO learns to maximize reward better than the Decision Transformer does.