Midway Network: Learning Representations for Recognition and Motion from Latent Dynamics

This repository hosts the code for the Midway Network (ICLR 2026), an architecture for self-supervised learning of visual representations for recognition and motion from videos.

Midway Network: Learning Representations for Recognition and Motion from Latent Dynamics
Christopher Hoang, Mengye Ren
International Conference on Learning Representations 2026
arXiv: 2510.05558

Pretrained models

| data                 | backbone | epochs | download                  |
| -------------------- | -------- | ------ | ------------------------- |
| BDD100K              | ViT-S    | 300    | full checkpoint, configs  |
| BDD100K              | ViT-B    | 300    | full checkpoint, configs  |
| Walking Tours Venice | ViT-S    | 100    | full checkpoint, configs  |

Code Structure

.
├── configs                   # directory in which all experiment '.yaml' configs are stored
├── src                       # the package
│   ├── main.py               #   main training loop for midway network
│   ├── midway.py             #   model definition
│   ├── utils.py              #   shared utilities
│   ├── vision_transformer.py #   encoder definition
│   └── datasets              #   datasets, data loaders
└── main.py                   # entrypoint to launch Midway Network pre-training locally or on SLURM cluster

Config files: All experiment parameters are specified in config files (as opposed to command-line arguments). See the configs/exp directory for example config files.
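For orientation, a config under configs/exp might look roughly like the following. This is a hypothetical sketch: the key names below are illustrative assumptions, not the repository's actual schema, which is defined by the code in src/main.py.

```yaml
# Hypothetical sketch of a configs/exp/*.yaml file.
# Key names are illustrative assumptions; consult configs/exp for real examples.
name: midway-example
data:
  dataset: decoded_walking_tours   # which dataloader in src/datasets to use
  root: /path/to/frames            # directory of extracted PNG frames
model:
  backbone: vit_small              # encoder defined in src/vision_transformer.py
optim:
  epochs: 100
  lr: 1.0e-3
```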

Launching Midway Network pre-training

submit.py is an entrypoint script for launching experiments with submitit and hydra. The actual implementation is in src/main.py, which parses the experiment config file and runs Midway Network pre-training.

Training

Here is an example of how to run Midway Network ViT-S WT-Venice pre-training on a local 2-GPU machine with config configs/exp/midway_wt_venice.yaml:

torchrun --standalone --nnodes=1 --nproc-per-node=2 submit.py \
    compute=local \
    exp=midway_wt_venice \
    name='midway-wt-venice-local'

Here is an example of how to run Midway Network ViT-B BDD pre-training on a SLURM cluster with 2 GPUs with config configs/exp/midway_bdd_vit_base.yaml:

python submit.py \
    compute/greene=2x1 compute/greene/node=ah \
    compute.timeout=1700 \
    compute.cpus_per_task=20 \
    exp=midway_bdd_vit_base \
    name='midway-bdd-vit-b-slurm'

See the scripts directory for example scripts. Note: Use scripts/decode_walking_tours.py to extract a Walking Tours (or any other) video into PNG frames so that it is compatible with the decoded_walking_tours dataloader.
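The frame-extraction step can be sketched as follows. This is a minimal stand-in for scripts/decode_walking_tours.py using the ffmpeg CLI, not the script itself; the output naming scheme and fps default are assumptions and may not match what the decoded_walking_tours dataloader expects.

```python
import subprocess
from pathlib import Path

def frame_command(video_path, out_dir, fps=30):
    """Build an ffmpeg command that dumps a video into PNG frames.

    Zero-padded filenames (%08d.png) keep frames sorted lexicographically,
    a common convention for frame-folder dataloaders (an assumption here).
    """
    return [
        "ffmpeg", "-i", str(video_path),
        "-vf", f"fps={fps}",               # resample to a fixed frame rate
        str(Path(out_dir) / "%08d.png"),   # one PNG per frame
    ]

def extract_frames(video_path, out_dir, fps=30):
    """Create the output directory and run ffmpeg (requires ffmpeg on PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(frame_command(video_path, out_dir, fps), check=True)
```

For example, `extract_frames("venice.mp4", "frames/venice", fps=30)` would populate frames/venice with 00000001.png, 00000002.png, and so on.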

Evaluation

We use MMSegmentation to evaluate on semantic segmentation and the evaluation setup in CroCo v2 to evaluate on optical flow.


Requirements

  • Python 3.10 (or newer)
  • PyTorch 2.2.0
  • torchvision 0.17.1 (built from source, for video_reader)
  • ffmpeg 5.1.2 (from conda-forge, for video_reader)
  • Other dependencies: decord, ffprobe-python, flow-vis, hydra-core, numpy, scipy, timm==0.3.2, wandb

Importing this version of timm will raise an import error with recent PyTorch versions; see here for a fix.

We provide an example environment.yaml file.


License

See the LICENSE file for details about the license under which this code is made available.

Citation

If you find this repository useful in your research, please consider giving a star ⭐ and a citation:

@inproceedings{hoang:2026:midway-network,
  title={Midway Network: Learning Representations for Recognition and Motion from Latent Dynamics},
  author={Chris Hoang and Mengye Ren},
  booktitle={International Conference on Learning Representations},
  year={2026}
}
