
About the computation of Advantage and State Value in PPO #3

@mjbmjb

Description


In your implementation of the Critic, you feed the network both the observation and the action, and it outputs a 1-dim value. Can I infer that it is Q(s, a)?
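
As I understand it, the critic is shaped roughly like this (a minimal sketch in my own words; the layer sizes and names are placeholders, not the ones from this repository):

```python
import torch
import torch.nn as nn

class QCritic(nn.Module):
    """Critic that scores a (state, action) pair, i.e. Q(s, a)."""
    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # single scalar per (s, a) pair
        )

    def forward(self, state, action):
        # The observation and action are concatenated before scoring.
        return self.net(torch.cat([state, action], dim=-1))
```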
But the advantage you compute is:

```python
values = self.critic_target(states_var, actions_var).detach()
advantages = rewards_var - values
```
That is an estimate of q_t minus Q(s_t, a_t).
I think it should be Advantage = q_t - V(s_t).
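
That is, a state-value critic that sees only the observation, with the advantage built against it. A minimal sketch of what I mean (names like `returns_var` are my own placeholders for q_t, not identifiers from the repository):

```python
import torch
import torch.nn as nn

class VCritic(nn.Module):
    """State-value critic: V(s) depends on the observation only."""
    def __init__(self, state_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state):
        return self.net(state)

# Toy usage with random data, just to show the shapes involved.
v_critic = VCritic(state_dim=4)
states_var = torch.randn(8, 4)           # batch of 8 observations
returns_var = torch.randn(8, 1)          # q_t: empirical returns from the rollout
values = v_critic(states_var).detach()   # V(s_t), no gradient through the baseline
advantages = returns_var - values        # A(s_t, a_t) ~ q_t - V(s_t)
```

With a baseline that does not depend on the action, q_t - V(s_t) measures how much better the chosen action was than average, whereas q_t - Q(s_t, a_t) is roughly zero in expectation for any action and so gives no signal about which actions are better.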
