From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities

We propose a novel training paradigm for multimodal learning through our BPE Image Tokenizer, which applies the principles of text tokenization to visual data. Unlike conventional approaches, our method directly incorporates structural prior information into image tokens, enabling Transformer models to better learn across modalities. Our Being-VL-0 model demonstrates how this paradigm effectively bridges the gap between visual and textual representation learning.

For more details, please refer to our paper: From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities (ICLR'25).

News

[2025-06-26]: 🔥🔥 We publish Being-VL-0.5, which is accepted by ICCV 2025! Check our paper here. The code is migrated to the new repository: BeingBeyond/Being-VL-0.5.
[2025-01-23]: 🎉🎉 Being-VL-0 is accepted by ICLR 2025! Check our paper here.
[2024-10-03]: We publish Being-VL-0, the first version of Being-VL series.

Code

We have refactored the codebase to be more modular and user-friendly. The new repository is available at BeingBeyond/Being-VL-0.5.

Citation

If you find our work useful, please consider citing us and give a star to our repository! 🌟🌟🌟

Being-VL-0.5

@inproceedings{zhang2025beingvl05,
  title={Unified Multimodal Understanding via Byte-Pair Visual Encoding},
  author={Zhang, Wanpeng and Feng, Yicheng and Luo, Hao and Li, Yijiang and Yue, Zihao and Zheng, Sipeng and Lu, Zongqing},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}

Being-VL-0

@inproceedings{zhang2025beingvl0,
  title={From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities},
  author={Zhang, Wanpeng and Xie, Zilong and Feng, Yicheng and Li, Yijiang and Xing, Xingrun and Zheng, Sipeng and Lu, Zongqing},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=3TnLGGHhNx}
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
docs/images		docs/images
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities

News

Code

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities

News

Code

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages