In the code for many of the decoder models there is self.agvpool = nn.AdaptiveAvgPool2d((1,1)) (for example, in builder/models/detector_models/resnet_dilation_lstm.py at Line 119). If I understand it correctly, this averages the CNN module's output over both the channel and the time dimensions, so the output has size 1 along each of them. That makes sense to me for the channel dimension, but not for the time dimension: the pooled output is then fed to the LSTM module, whose whole point is to process time-series signals. If the sequence has length 1 along the time dimension, why is an LSTM needed at all? Am I missing something?
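
To make sure I'm reading this right, here is a minimal sketch of the shape collapse I'm describing. The concrete shapes and the hidden size are made up for illustration and are not taken from the repo:

```python
import torch
import torch.nn as nn

# Hypothetical CNN output: (batch, features, channel, time).
# The actual dimension layout in the repo may differ.
x = torch.randn(8, 64, 19, 200)  # batch=8, 64 feature maps, 19 channels, 200 time steps

pool = nn.AdaptiveAvgPool2d((1, 1))  # the same call as self.agvpool
y = pool(x)
print(y.shape)  # torch.Size([8, 64, 1, 1]) -- channel and time dims averaged to 1

# Flattened and handed to an LSTM, this is a sequence of length 1:
seq = y.flatten(1).unsqueeze(1)  # (batch, seq_len=1, 64)
lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
out, _ = lstm(seq)
print(out.shape)  # torch.Size([8, 1, 128]) -- the LSTM only ever sees one time step
```

If that matches what the model actually does, the LSTM appears to run over a single time step per sample, which is what I find confusing.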