Thanks for your nice work!
As far as I know, the original NExT-GQA paper states that the dataset's training set does not include temporal grounding annotations. Could you please tell me whether your work uses the validation and test sets of this dataset? Or is there a version of the dataset whose training set also includes temporal grounding annotations?