ByteDance-Seed/bamboo_mixer

👋 Hi, everyone!
We are the ByteDance Seed team.

You can get to know us better through the following channels👇

A Predictive and Generative Solution for Liquid Electrolyte Formulation

We are delighted to release Bamboo-Mixer. Bamboo-Mixer introduces a unified method that not only predicts electrolyte properties such as conductivity and anion ratio accurately, but also conditionally generates electrolyte formulations from target properties.

This provides a generative workflow for designing molecular mixtures beyond electrolytes. Samples generated by the workflow were further confirmed by experimental validation.

Updates

  • Feb 2026: Citation and README updated.
  • Jan 2026: Cleaned up the codebase for improved readability and a better open-source release of code and data.
  • Aug 2025: First public release. Use this tagged version (v0.0.1) to reproduce the results reported in our manuscript.

Branches

  • paper: The first publicly released and archived branch, with a tagged version (v0.0.1). This branch matches the data and trained model checkpoint open-sourced on Hugging Face. It is recommended if you would like to use the pretrained checkpoint.
  • main: The current and future working branch, with significant cleanups relative to the paper branch. This branch uses the data included in the data_oss folder and does not use anything from Hugging Face. It is recommended if you would like to modify the code, retrain the entire framework, or make pull requests.

Data preparation

There are two types of data, MonoData and FormulaData:

  • MonoData: Single-molecule data with molecular property labels.
  • FormulaData: Molecular mixture data for liquid electrolytes with conductivity and anion ratio information.
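
To make the distinction between the two data types concrete, here is a minimal Python sketch of what one record of each might look like. It is an illustration only, not the repository's actual schema: the class names, the field names, the use of SMILES strings as molecule identifiers, and the requirement that component fractions sum to 1 are all assumptions. The units and the 0 to 1 range for the anion ratio follow the property lists in this README.

```python
from dataclasses import dataclass


@dataclass
class MonoRecord:
    """Hypothetical single-molecule record (MonoData)."""
    smiles: str   # assumed molecule identifier
    labels: dict  # property key -> value, e.g. {"Tm": 236.0}


@dataclass
class FormulaRecord:
    """Hypothetical mixture record (FormulaData)."""
    components: dict     # assumed: molecule identifier -> fraction of the mixture
    conductivity: float  # mS/cm
    anion_ratio: float   # unitless, between 0 and 1

    def __post_init__(self):
        # Assumed convention: fractions describe the whole mixture.
        total = sum(self.components.values())
        if abs(total - 1.0) > 1e-6:
            raise ValueError(f"component fractions sum to {total}, expected 1")
        if not 0.0 <= self.anion_ratio <= 1.0:
            raise ValueError("anion_ratio must lie in [0, 1]")


# Example values are placeholders, not measurements from the dataset.
mono = MonoRecord(smiles="C1COC(=O)O1", labels={"dc": 89.8})
formula = FormulaRecord(
    components={"C1COC(=O)O1": 0.5, "COC(=O)OC": 0.4, "salt": 0.1},
    conductivity=9.0,
    anion_ratio=0.3,
)
```

Validating ranges at construction time is a design choice for this sketch: it catches malformed records before they reach training.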

More details about how to prepare data for different stages of training and inference can be found in Data Preparation Instructions.

Training

Predictive model (Three-stage training):

  1. Molecular pretraining: Multi-task learning of molecular property predictions.
  2. Computational pretraining: Prediction of formulation property from MD simulation.
  3. Experimental fine-tuning: Fine-tuning with experimental conductivity data.

Generative model (Dual-stage training):

  1. Conditional diffusion: A conditional denoising diffusion probabilistic model (DDPM) generates the electrolyte embedding.
  2. Decoder and distance matching: The electrolyte embedding is decoded to obtain the formulation.
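
The conditional diffusion stage can be illustrated with the closed-form forward (noising) step of a DDPM. This is a minimal pure-Python sketch, not the repository's implementation: the function names, the 4-dimensional embedding, the linear noise schedule, and conditioning by concatenating the target properties to the denoiser input are all assumptions made for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form; returns (x_t, eps)."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def denoiser_input(xt, t, condition, num_steps):
    """Assumed conditioning: concatenate x_t, normalized timestep, and targets."""
    return list(xt) + [t / num_steps] + list(condition)


num_steps = 1000
rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(num_steps))

embedding = [0.5, -1.0, 0.25, 2.0]  # hypothetical electrolyte embedding
condition = [8.0, 0.3]              # target conductivity (mS/cm), anion ratio

xt, eps = add_noise(embedding, 500, alpha_bars, rng)
features = denoiser_input(xt, 500, condition, num_steps)
```

During training, a denoising network would receive `features` and be trained to predict `eps`. At generation time, the same condition vector steers the reverse process toward embeddings with the requested properties.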

See Training Instructions for more details.

Inference and Generation

Molecular property prediction (11 molecular properties):

  • Tm: melting point (K)
  • bp: boiling point (K)
  • nD: refractive index (unitless)
  • nD_liquid: refractive index (liquid, unitless)
  • pka_a: pKa (unitless)
  • pka_b: pKb (unitless)
  • dc: dielectric constant (unitless)
  • ST: surface tension (mN/m)
  • density: mass density (g/cm^3)
  • vis: viscosity (cP)
  • vapP: vapor pressure (Pa)
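
When post-processing predictions, it can help to keep the property keys and units listed above in one lookup table. The sketch below assumes predictions arrive as a plain dictionary keyed by the property names above. That format and the helper name `describe_predictions` are hypothetical, so check the Inference Instructions for the actual output layout.

```python
# Property key -> (description, unit), copied from the list above.
PROPERTY_UNITS = {
    "Tm": ("melting point", "K"),
    "bp": ("boiling point", "K"),
    "nD": ("refractive index", ""),
    "nD_liquid": ("refractive index, liquid", ""),
    "pka_a": ("pKa", ""),
    "pka_b": ("pKb", ""),
    "dc": ("dielectric constant", ""),
    "ST": ("surface tension", "mN/m"),
    "density": ("mass density", "g/cm^3"),
    "vis": ("viscosity", "cP"),
    "vapP": ("vapor pressure", "Pa"),
}


def describe_predictions(preds):
    """Format a {property key: value} dict as readable lines with units."""
    lines = []
    for key, value in preds.items():
        if key not in PROPERTY_UNITS:
            raise KeyError(f"unknown property key: {key}")
        name, unit = PROPERTY_UNITS[key]
        suffix = f" {unit}" if unit else ""
        lines.append(f"{key} ({name}): {value:.3g}{suffix}")
    return lines


# Placeholder values, not model output.
report = describe_predictions({"Tm": 273.15, "density": 1.2, "dc": 89.8})
```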

Predictive model (2 electrolyte properties):

  • Conductivity (mS/cm).
  • Anion ratio (unitless, between 0 and 1).

Generative model (3 generated files):

  • Electrolyte embedding.
  • BoM vector of formulation.
  • Corresponding conductivity and anion ratio.
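
Because each generated formulation comes with a corresponding conductivity and anion ratio, one natural follow-up is to rank candidates by how closely they match the requested targets. The sketch below is one possible way to do that and is not part of the repository: the candidate dictionaries, the `rank_candidates` helper, and the scoring rule (relative error in conductivity plus absolute error in anion ratio) are all assumptions.

```python
def mismatch(candidate, target_conductivity, target_anion_ratio):
    """Assumed score: relative conductivity error + absolute anion-ratio error."""
    cond_err = abs(candidate["conductivity"] - target_conductivity) / abs(target_conductivity)
    ratio_err = abs(candidate["anion_ratio"] - target_anion_ratio)
    return cond_err + ratio_err


def rank_candidates(candidates, target_conductivity, target_anion_ratio):
    """Return candidates sorted from best to worst match to the targets."""
    if target_conductivity == 0:
        raise ValueError("target conductivity must be non-zero")
    return sorted(
        candidates,
        key=lambda c: mismatch(c, target_conductivity, target_anion_ratio),
    )


# Placeholder candidates, not generated by the model.
candidates = [
    {"id": "a", "conductivity": 10.0, "anion_ratio": 0.50},
    {"id": "b", "conductivity": 8.2, "anion_ratio": 0.32},
    {"id": "c", "conductivity": 6.0, "anion_ratio": 0.30},
]
ranked = rank_candidates(candidates, target_conductivity=8.0, target_anion_ratio=0.3)
```

The anion ratio uses an absolute error because it is already bounded between 0 and 1, whereas conductivity is scaled by the target so that both terms are of comparable size.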

See Inference Instructions for more details.

Data availability

The datasets and checkpoints can be found on HuggingFace: huggingface/bamboo_mixer

Contributing

We welcome contributions to Bamboo-Mixer! If you have suggestions or improvements, please refer to CONTRIBUTING.md.

License

This project is licensed under the Apache License, Version 2.0.

Citation

If you find Bamboo-Mixer useful for your research and applications, feel free to give us a star ⭐ or cite us using:

@article{yang2025unifiedpredictivegenerativesolution,
	author = {Yang, Zhenze and Wu, Yifan and Han, Xu and Zhang, Ziqing and Lai, Haoen and Mu, Zhenliang and Zheng, Tianze and Liu, Siyuan and Pu, Zhichen and Wang, Zhi and Yu, Zhiao and Gong, Sheng and Yan, Wen},
	date = {2026/01/28},
	doi = {10.1038/s42256-025-01173-w},
	isbn = {2522-5839},
	journal = {Nature Machine Intelligence},
	title = {A unified predictive and generative solution for liquid electrolyte formulation},
	url = {https://doi.org/10.1038/s42256-025-01173-w},
	year = {2026}
}

Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
