Why is DQV on-policy?

I noticed that DQV trains on samples collected with the behavior policy (the epsilon-greedy policy), not with the current policy (the greedy policy). Why, then, do you classify DQV as an on-policy method?