Property prediction model training fails when training data are very similar #5021

### Summary

A researcher is training a d33 (piezoelectric coefficient) prediction model on 2 training samples using the property prediction head. Both samples are perovskite-type (ABO3) structures with exactly the same composition (Pb1000Ti340Nb400Mg140In120O3000) and the same atomic positions. The only difference between them is how Ti, Nb, Mg, and In are randomly distributed over the B sites of the 10×10×10 perovskite supercell. The two pictures below show the x-direction projection of the B sites in the two samples. I used `StructureMatcher` from `pymatgen` to confirm that the two structures do not match. The labels of the two samples are 411 and 587, respectively.

<img width="698" height="652" alt="Image" src="https://github.com/user-attachments/assets/4b15074f-1e71-4f20-8cba-8d0e26e31c55" />

<img width="664" height="654" alt="Image" src="https://github.com/user-attachments/assets/4598c11d-eca7-4a96-80f1-37da551cb241" />
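To make the setup concrete, here is a minimal sketch of two configurations that share a composition but differ in B-site ordering. It is not the procedure used to generate the actual data: the helper names, the seeds, and the use of a plain shuffle are all assumptions for illustration, and only the B-site counts are taken from the formula above.

```python
import random
from collections import Counter

# B-site counts taken from the formula Pb1000Ti340Nb400Mg140In120O3000
B_SITE_COUNTS = {"Ti": 340, "Nb": 400, "Mg": 140, "In": 120}
N = 10  # N x N x N perovskite cells, one B site per cell


def random_b_site_occupation(seed):
    """Hypothetical helper: map each (i, j, k) cell index to a B-site species."""
    species = [s for s, n in B_SITE_COUNTS.items() for _ in range(n)]
    random.Random(seed).shuffle(species)  # assumed: uniform random ordering
    cells = [(i, j, k) for i in range(N) for j in range(N) for k in range(N)]
    return dict(zip(cells, species))


def x_projection(occupation, element):
    """Count how many `element` atoms fall in each (j, k) column along x."""
    counts = Counter()
    for (i, j, k), s in occupation.items():
        if s == element:
            counts[(j, k)] += 1
    return counts


config_a = random_b_site_occupation(seed=0)  # seeds are arbitrary
config_b = random_b_site_occupation(seed=1)

same_composition = Counter(config_a.values()) == Counter(config_b.values())
n_different_sites = sum(config_a[c] != config_b[c] for c in config_a)
ti_projection_a = x_projection(config_a, "Ti")
ti_projection_b = x_projection(config_b, "Ti")

print("same composition:", same_composition)
print("B sites with a different species:", n_different_sites)
print("Ti x-projections identical:", ti_projection_a == ti_projection_b)
```

The two configurations have identical element counts, so any descriptor that depends only on composition cannot separate them. Only the site-by-site ordering differs.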

I used the DPA-3.1-3M settings and trained for 1000 steps (500 epochs), but the model shows no predictive accuracy. The `dp test` result is:
```
# d33_database/1 - 0: data_property pred_property
4.111499938964843750e+02 4.762894698179545685e+02
# d33_database/2 - 0: data_property pred_property
5.865300292968750000e+02 4.762894713410120744e+02
```
It seems that the dp model cannot distinguish between the two similar structures.
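To quantify this, the short script below compares the spread of the labels with the spread of the predictions, using only the numbers copied from the `dp test` output above. The variable names are mine. The labels differ by about 175, while the two predictions differ only on the order of 1e-6.

```python
# (label, prediction) pairs copied from the `dp test` output above
results = {
    "d33_database/1": (4.111499938964843750e+02, 4.762894698179545685e+02),
    "d33_database/2": (5.865300292968750000e+02, 4.762894713410120744e+02),
}

labels = [label for label, _ in results.values()]
preds = [pred for _, pred in results.values()]

# How far apart the targets are vs. how far apart the model outputs are
label_spread = max(labels) - min(labels)
pred_spread = max(preds) - min(preds)

# Signed error (prediction minus label) for each system
errors = {name: pred - label for name, (label, pred) in results.items()}

print(f"label spread:      {label_spread:.4f}")
print(f"prediction spread: {pred_spread:.3e}")
for name, err in errors.items():
    print(f"{name}: error = {err:+.4f}")
```

The model overpredicts the first sample and underpredicts the second, giving essentially the same output for both inputs.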

The `input.json` and data files to reproduce this are:
[input.json](https://github.com/user-attachments/files/23020817/input.json)
[d33_database.tgz](https://github.com/user-attachments/files/23020837/d33_database.tgz)

### DeePMD-kit Version

3.1.0

### Backend and its version

pytorch

### Python Version, CUDA Version, GCC Version, LAMMPS Version, etc

Python version: 3.12.11
CUDA Version: 12.4

### Details

See above
