Thank you for your work!
I noticed that the paper "PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving" states: "State-of-the-art end-to-end AV stacks use temporal information in inputs, and as a result have inferior performance in the first frame due to zero initialization of input features and ego vehicle’s state. AD-MLP addresses this by excluding the first two frames in their evaluation protocol."
How does the AD-MLP code achieve this? In other words, where in the evaluation protocol are the first two frames excluded?
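
To make the question concrete, here is a minimal sketch of what I imagine "excluding the first two frames" could look like. This is only my guess, not code from the AD-MLP repository: the function name, the `scene`/`timestamp`/`l2` keys, and the `skip` parameter are all hypothetical.

```python
from collections import defaultdict


def mean_l2_skipping_first_frames(samples, skip=2):
    """Average the L2 error over all frames except the first `skip` of each scene.

    `samples` is a list of dicts with hypothetical keys:
    "scene" (scene id), "timestamp" (sortable), and "l2" (planning error).
    """
    # Group the per-frame results by scene.
    by_scene = defaultdict(list)
    for sample in samples:
        by_scene[sample["scene"]].append(sample)

    kept = []
    for frames in by_scene.values():
        # Order frames in time, then drop the first `skip` frames of the scene.
        frames.sort(key=lambda s: s["timestamp"])
        kept.extend(f["l2"] for f in frames[skip:])

    if not kept:
        raise ValueError("no frames left after skipping")
    return sum(kept) / len(kept)
```

Is the exclusion done in a way similar to this during evaluation, or are those frames already filtered out when the dataset is built?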