Add LSTMDoubleFit model for low-dimensional perovskite design #205

Wei-jie-Wu · 2025-11-13T12:31:42Z

✅ Description

📘 Overview

This PR contributes the Feature-Guided Inverse Design (LSTMDoubleFit) model for the inverse design of organic A-site cations in low-dimensional perovskites.
The project integrates descriptor calculation, LSTM-based generative learning, and feature-constrained molecular optimization into a unified Paddle-based workflow.

This work reproduces and extends the study:

Feature-Guided Inverse Design of Organic A-Site Cations for Perovskite Dimensional Engineering, Wei-jie Wu et al., 2025.

🧠 Model Workflow

Descriptor Calculation (Cal_ATSC1pe_MATS2c.py, Cal_SlogP_VSA2.py)
- Calculates molecular descriptors (e.g., ATSC1pe, MATS2c, SlogP_VSA2) from input SMILES.
- Results are stored in CSV files under Modeldata/.
Dataset Preparation
- Before training, merge all CSV files under the Modeldata/ directory into a single dataset:
```
cat Modeldata/*.csv > Modeldata.csv
```
  The merged file Modeldata.csv will serve as the unified training dataset.
Model Training and Generation (Best_Seq2seq.py)
- Implements an LSTM-based sequence-to-sequence model for SMILES reconstruction and generation.
- Inputs: one-hot encoded SMILES sequences + three physicochemical descriptors.
- Outputs: property-conditioned SMILES sequences (new organic cations).
Feature-Guided DoubleFit Model (MolecularDoubleFitting.py)
- Performs secondary regression to enforce property–structure consistency.
- Refines generated molecules according to target perovskite dimensional features.
Postprocessing
- Generated molecules are filtered, ranked, and optionally validated through structural optimization workflows.
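
The input format described for Best_Seq2seq.py (one-hot encoded SMILES plus three descriptors) could look roughly like the sketch below. This is an illustration only: the character-level tokenization, the padding character, the function names, the example descriptor values, and the choice to append the descriptor vector to every time step are all assumptions, not taken from the code in this PR.

```python
# Hypothetical sketch: names, PAD token, and tokenization are assumptions,
# not taken from Best_Seq2seq.py.
PAD = " "  # assumed padding character


def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings (plus PAD) to an index."""
    chars = sorted(set("".join(smiles_list)) | {PAD})
    return {ch: i for i, ch in enumerate(chars)}


def encode(smiles, descriptors, vocab, max_len):
    """Return a max_len x (len(vocab) + len(descriptors)) nested list.

    Each row is the one-hot vector of one character followed by the
    descriptor values, so every time step sees the same conditioning vector.
    """
    padded = smiles.ljust(max_len, PAD)[:max_len]
    rows = []
    for ch in padded:
        one_hot = [0.0] * len(vocab)
        one_hot[vocab[ch]] = 1.0
        rows.append(one_hot + list(descriptors))
    return rows


smiles_list = ["CC[NH3+]", "C[NH3+]"]
vocab = build_vocab(smiles_list)
# Three made-up descriptor values standing in for ATSC1pe, MATS2c, SlogP_VSA2.
x = encode("C[NH3+]", [0.12, -0.30, 4.50], vocab, max_len=10)
print(len(x), len(x[0]))  # sequence length, feature width
```

Character-level tokenization splits two-letter atoms such as Cl or Br into separate symbols; a real pipeline may tokenize differently.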

📁 Directory Structure

project/
└── Feature-Guided Inverse Design of LDPs/
    ├── Best_Seq2seq.py             # Main LSTM model: training & molecular generation
    ├── Cal_ATSC1pe_MATS2c.py       # Descriptor calculator (ATSC1pe, MATS2c)
    ├── Cal_SlogP_VSA2.py           # Descriptor calculator (SlogP_VSA2)
    ├── MolecularDoubleFitting.py   # Feature-guided molecular fitting model
    ├── MSEcalculation.py           # Evaluation metrics
    ├── ModelandDataAnalysis.py     # Dataset statistics & analysis
    ├── Modeldata/                  # Folder containing split CSV datasets
    ├── GreatMolecular.xlsx         # High-quality generated molecules
    ├── NewMolecules.xlsx           # Newly generated candidates
    ├── README.md                   # Project documentation
    └── data_parts/                 # (Optional) Split dataset parts (<100 MB each)

⚙️ How to Run

1. Environment

pip install paddlepaddle scikit-learn pandas numpy tqdm rdkit

2. Prepare dataset

Merge the CSV files in Modeldata/ into a single file:

cat Modeldata/*.csv > Modeldata.csv

3. Train and generate molecules

python Best_Seq2seq.py

4. Feature-guided molecular refinement

python MolecularDoubleFitting.py

📊 Dataset Note

The full dataset (~200 MB) was split into smaller CSV files under Modeldata/ to comply with GitHub’s 100 MB per-file limit. They must be merged before training as described above.
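
One caveat with a plain cat: if each part carries its own header row, those headers end up in the middle of the merged file. The PR does not say whether the parts have headers, so the following is only a minimal sketch of a header-aware merge under the assumption that every part begins with the same header row. merge_csv_parts is a hypothetical helper, not part of the submitted code.

```python
import csv
from pathlib import Path


def merge_csv_parts(parts_dir, out_path):
    """Concatenate all *.csv files in parts_dir into out_path.

    Assumes every part starts with the same header row; the header is
    written once and skipped in later parts. Returns the number of data
    rows written.
    """
    # Resolve the part list before the output file is created.
    part_paths = sorted(Path(parts_dir).glob("*.csv"))
    header = None
    n_rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in part_paths:
            with open(path, newline="", encoding="utf-8") as part_file:
                reader = csv.reader(part_file)
                first = next(reader, None)
                if first is None:
                    continue  # skip empty parts
                if header is None:
                    header = first
                    writer.writerow(header)
                elif first != header:
                    # This part has no repeated header: keep the row as data.
                    writer.writerow(first)
                    n_rows += 1
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```

Calling merge_csv_parts("Modeldata", "Modeldata.csv") would then write the merged file next to the Modeldata/ folder, in the same place the cat command puts it.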

🚀 Results

- LSTM reconstruction accuracy: >95%
- Enhanced novelty and property diversity in generated cations
- Generated organic A-site cations exhibit favorable dimensional preferences for RP- and DJ-type perovskites.

💡 Key Contributions

- DoubleFit Learning Mechanism: Joint optimization of molecular structure and descriptor features.
- Feature-Constrained Generation: Enables directionally controlled molecular design.
- Descriptor-Integrated Workflow: Fully compatible with PaddlePaddle for training and inference.

🧑‍💻 Author

Weijie Wu
South China Normal University

* feat: add task readme * fix error * update logo * Add files via upload * Update README.md * Add files via upload * Update README.md * Add files via upload * Update README.md * Add files via upload * Update README.md * Delete docs/paddlematerial_overview_en.png * Delete docs/paddlematerial_overview_ch.png

* feat: add task readme * fix error * update logo * Add files via upload * Update README.md * Add files via upload * Update README.md * Add files via upload * Update README.md * Add files via upload * Update README.md * Delete docs/paddlematerial_overview_en.png * Delete docs/paddlematerial_overview_ch.png * Delete docs/logo_ppmat.png * Delete docs/ppmat_overview_en.png * Add files via upload * Update README.md * Update README.md * Update README.md * fix conflict

* feat: add task readme * fix error * update logo * Add files via upload * Update README.md * Add files via upload * Update README.md * Add files via upload * Update README.md * Add files via upload * Update README.md * Delete docs/paddlematerial_overview_en.png * Delete docs/paddlematerial_overview_ch.png * Delete docs/logo_ppmat.png * Delete docs/ppmat_overview_en.png * Add files via upload * Update README.md * Update README.md * Update README.md * fix conflict * fix words error

* matbench_dataset * training files * Delete megnet_matbench_bulk_modulus_t_20250731_041800_s_42 directory * Delete megnet_matbench_shear_modulus_t_20250731_041740_s_42 directory * adapt matbench dataset * revise PR * adapt jarvis dataset * revise megnet_readme * update requirements, update jarvis_dataset

…set name=alex_mp_20 for mattergen training with alex_mp20 dataset. (PaddlePaddle#200) * fix diffnmr model and config. * fix AlexMP20MatterGenDataset name=alex_mp_20 for mattergen training with alex_mp20 dataset.

paddle-bot · 2025-11-13T12:31:49Z

Thanks for your contribution!

CLAassistant · 2025-11-13T12:31:55Z

All committers have signed the CLA.

leeleolay · 2025-11-18T02:49:41Z

Thanks for your contribution!
Please fetch the latest version of the repository code and resubmit your changes. We recommend using the ppmat architecture to fit your model. If you run into any problems with the adaptation, please contact us!

leeleolay

please revise this PR

leeleolay

please revise this PR

zhiminzhang0830 and others added 30 commits July 23, 2024 09:21

update

1d675e9

update readme

0d019b8

update config

801973f

update

8c6f81c

del pdb

b758057

update config

eed6358

update readme

7f23f6d

add io code

beee69c

del useless code and add prediction_keys

9cbcea7

update main

60386f5

update megnet 3d pretrain cfg

96de807

update 2d config

f6b3124

update

ee840a6

add old config

7beb17b

add result for 3d exp

5b62e30

update readme

3ed118d

update readme

bebdc95

Initial commit

f628de1

Merge remote-tracking branch 'pp4mat/master' into develop

717d7a4

Merge pull request PaddlePaddle#1 from zhiminzhang0830/develop

511bbbd

update code from origin repo

feat: atom type diffusion with d3pm

854babe

chore: delete useless code

fb57507

chore: move data utils to dataset

d880928

move data_utils.py to dataset/utils.py delete useless code

chore: move some function to utils

7e9aa60

chore: rename diff_utils.py to noise_schedule.py

5776255

chore: move time embedding to time_embedding.py file

1a350a9

chore: replace MAX_ATOMIC_NUM with num_classes

ee0465e

fix: change the atomic type ID to start from 0

6646796

chore: delete useless code

a82cd26

refactor: refactor time generation

9c9a361

leeleolay and others added 23 commits July 5, 2025 20:06

Update Install_cn.md

8ddfba0

fix bug

Update README.md

9172dc6

Update README.md

a44ac6d

Update README.md

a1a601b

feat: add ml2ddb (PaddlePaddle#173)

1b5ef0b

feat: add partner (PaddlePaddle#174)

2bc9f50

fix: fix chgnet model download link (PaddlePaddle#175)

50ea8ef

fix: set nan to 0 (PaddlePaddle#177)

b4ac426

* fix: fix chgnet model download link * fix: set nan to 0

update logo (PaddlePaddle#178)

d6d7d97

* feat: add task readme * fix error * update logo

fix: update reshape (PaddlePaddle#179)

f7052aa

* fix: update reshape * fix: fix

update readme (PaddlePaddle#180)

9879800

* feat: add task readme * fix error * update logo * Add files via upload * Update README.md * Add files via upload * Update README.md

update readme (PaddlePaddle#187)

672be93

* Update README.md * Update README.md

integrate DiffNMR and fix bugs (PaddlePaddle#189)

663316d

* add DiffNMR * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs of diffprior * fix bug * fix bugs

fix (PaddlePaddle#192)

86dc396

set output data type as 'int32' of paddle.cumsum for CHGNet model. (P…

ab2c8ef

…addlePaddle#196) support metax

update overview map and metax support info (PaddlePaddle#201)

2356071

Update jarvis_dataset.py (PaddlePaddle#203)

b0b5116

Add Feature-Guided Inverse Design of LDPs model under project directory

6384e31

paddle-bot bot added the contributor External developers label Nov 13, 2025

leeleolay added the non-compeleted need to revise label Dec 7, 2025

leeleolay reviewed Dec 7, 2025

leeleolay requested changes Dec 7, 2025

10 participants
