Hi,
Thanks for providing this helpful codebase. I am also trying to use the ImageNet video dataset, and I might have to implement a Python version since I use Python in my project. I do not understand the details of this codebase. Does the code generate image chips with the objects centered, and do the generated chips have a fixed size?