This project hosts the code for implementing the Midway Network (ICLR 2026) architecture for self-supervised learning of visual representations for recognition and motion from videos.
Midway Network: Learning Representations for Recognition and Motion from Latent Dynamics
Christopher Hoang, Mengye Ren
International Conference on Learning Representations 2026
arXiv: 2510.05558
| data | epochs | backbone | download | configs |
|---|---|---|---|---|
| BDD100K | 300 | ViT-S | full checkpoint | configs |
| BDD100K | 300 | ViT-B | full checkpoint | configs |
| Walking Tours Venice | 100 | ViT-S | full checkpoint | configs |
```
.
├── configs                   # directory where all experiment '.yaml' configs are stored
├── src                       # the package
│   ├── main.py               # main training loop for Midway Network
│   ├── midway.py             # model definition
│   ├── utils.py              # shared utilities
│   ├── vision_transformer.py # encoder definition
│   └── datasets              # datasets, data loaders
└── submit.py                 # entrypoint to launch Midway Network pre-training locally or on a SLURM cluster
```
Config files: Note that all experiment parameters are specified in config files (as opposed to command-line arguments). See the configs/exp directory for example config files.
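For illustration, an experiment config might look like the sketch below. The keys shown here are hypothetical placeholders; the actual parameter names are defined by the files in configs/exp:

```yaml
# Hypothetical sketch of an experiment config.
# The real keys are defined by the files in configs/exp, not by this example.
name: midway-example
data:
  dataset: bdd100k
  batch_size: 64
model:
  backbone: vit_small
optim:
  epochs: 300
  lr: 1.0e-4
```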
submit.py is an entrypoint script for launching experiments with submitit and hydra. The actual implementation is in src/main.py, which parses the experiment config file and runs Midway Network pre-training.
Here is an example of how to run Midway Network ViT-S WT-Venice pre-training on a local 2-GPU machine with the config configs/exp/midway_wt_venice.yaml:

```shell
torchrun --standalone --nnodes=1 --nproc-per-node=2 submit.py \
    compute=local \
    exp=midway_wt_venice \
    name='midway-wt-venice-local'
```
Here is an example of how to run Midway Network ViT-B BDD pre-training on a SLURM cluster with 2 GPUs using the config configs/exp/midway_bdd_vit_base.yaml:

```shell
python submit.py \
    compute/greene=2x1 compute/greene/node=ah \
    compute.timeout=1700 \
    compute.cpus_per_task=20 \
    exp=midway_bdd_vit_base \
    name='midway-bdd-vit-b-slurm'
```
See the scripts directory for example scripts. Note: Use scripts/decode_walking_tours.py to extract a Walking Tours (or any other) video into PNG frames so that it is compatible with the decoded_walking_tours dataloader.
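The frame-extraction step can be sketched as follows. This is a minimal illustration of what such a decoding script typically does (invoking ffmpeg to dump zero-padded PNG frames), not the actual implementation of scripts/decode_walking_tours.py, whose arguments and output naming may differ:

```python
import subprocess
from pathlib import Path

def build_ffmpeg_cmd(video_path, out_dir, fps=None):
    """Build an ffmpeg command that decodes a video into numbered PNG frames.

    Illustrative only: the real decode_walking_tours.py may use different
    options and filename patterns.
    """
    cmd = ["ffmpeg", "-i", str(video_path)]
    if fps is not None:
        cmd += ["-vf", f"fps={fps}"]  # optionally subsample the frame rate
    cmd += [str(Path(out_dir) / "frame_%06d.png")]  # zero-padded frame names
    return cmd

def decode_video(video_path, out_dir, fps=None):
    # Create the output directory, then let ffmpeg write one PNG per frame.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_cmd(video_path, out_dir, fps), check=True)
```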
We use MMSegmentation to evaluate on semantic segmentation and the evaluation setup in CroCo v2 to evaluate on optical flow.
- Python 3.10 (or newer)
- PyTorch 2.2.0
- torchvision 0.17.1 (built from source, for video_reader)
- ffmpeg 5.1.2 (from conda-forge, for video_reader)
- Other dependencies: decord, ffprobe-python, flow-vis, hydra-core, numpy, scipy, timm==0.3.2, wandb
Importing this version of timm will raise an import error; see here for a fix.
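Assuming this is the usual timm 0.3.2 incompatibility, the error comes from timm importing `container_abcs` from `torch._six`, which was removed in newer PyTorch releases. A commonly cited workaround is to edit `timm/models/layers/helpers.py` so the tuple helper falls back to the standard library:

```python
# In timm/models/layers/helpers.py, replace the
# `from torch._six import container_abcs` line with the
# standard-library equivalent:
import collections.abc as container_abcs
from itertools import repeat

def _ntuple(n):
    # Expand a scalar into an n-tuple; pass iterables through unchanged.
    def parse(x):
        if isinstance(x, container_abcs.Iterable):
            return x
        return tuple(repeat(x, n))
    return parse

to_2tuple = _ntuple(2)
```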
We provide an example environment.yaml file.
See the LICENSE file for details about the license under which this code is made available.
If you find this repository useful in your research, please consider giving a star ⭐ and a citation:
```bibtex
@inproceedings{hoang:2026:midway-network,
    title={Midway Network: Learning Representations for Recognition and Motion from Latent Dynamics},
    author={Chris Hoang and Mengye Ren},
    booktitle={International Conference on Learning Representations},
    year={2026}
}
```