This repository was archived by the owner on Jul 20, 2025. It is now read-only.

Commit fe8857c

Update README and Python requirements
1 parent 8d01f10 commit fe8857c

2 files changed: 31 additions and 15 deletions

README.md

Lines changed: 30 additions & 15 deletions
@@ -1,12 +1,16 @@
1-
Data-driven predictions from the crystalline structure
1+
Data-driven predictions: from crystal structure to physical properties and vice versa
22
======
33

4+
[![DOI](https://zenodo.org/badge/110734326.svg)](https://zenodo.org/badge/latestdoi/110734326)
5+
46
![Materials simulations ab datum](https://raw.githubusercontent.com/mpds-io/mpds-ml-labs/master/crystallographer_mpds_cc_by_40.png "Materials simulation ab datum")
57

6-
Live demo
8+
9+
Live demos
710
------
811

9-
[mpds.io/ml](https://mpds.io/ml)
12+
[mpds.io/ml](https://mpds.io/ml) and [mpds.io/materials-design](https://mpds.io/materials-design)
13+
1014

1115
Rationale
1216
------
@@ -22,6 +26,9 @@ This is the proof of concept, how a relatively unsophisticated statistical model
2226
- linear thermal expansion coefficient
2327
- band gap (or its absence, _i.e._ whether a crystal is a conductor or an insulator)
2428

29+
Further, the reverse task of predicting a possible crystalline structure from a set of given properties is solved. Suitable chemical elements are found, and the resulting structure is generated (if possible) based on the available MPDS prototypes.
30+
31+
2532
Installation
2633
------
2734

@@ -33,17 +40,19 @@ cd REPO_FOLDER
3340
pip install -r requirements.txt
3441
```
3542

36-
Currently only *Python 2* is supported (*Python 3* support is coming).
43+
Currently only *Python 2* is supported (*Python 3* support is almost there).
44+
3745

3846
Preparation
3947
------
4048

4149
The model is trained on the MPDS data using the MPDS API and the scripts `train_regressor.py` and `train_classifier.py`. A subset of the full MPDS data is open and can be obtained via the MPDS API [for free](https://mpds.io/open-data-api).
4250

51+
4352
Architecture and usage
4453
------
4554

46-
It can be used either as a standalone command-line application or as a client-server application. In the latter case, the client and the server communicate over HTTP, and any client able to execute HTTP requests is supported, be it the `curl` command-line client or a rich web-browser user interface. As an example of the latter, a simple HTML5 app `index.html` is supplied in the `webassets` folder. The server part is a Flask app:
55+
It can be used either as a standalone command-line application or as a client-server application. In the latter case, the client and the server communicate over HTTP, and any client able to execute HTTP requests is supported, be it the `curl` command-line client or a rich web-browser user interface. For example, the simple HTML5 apps `props.html` and `design.html` are supplied in the `webassets` folder. The server part is a Flask app:
4756

4857
```shell
4958
python mpds_ml_labs/app.py
@@ -57,7 +66,10 @@ Used descriptor and model details
5766

5867
The term _descriptor_ stands for a compact, information-rich representation that allows convenient mathematical treatment of the encoded complex data (_i.e._ the crystalline structure). Any crystalline structure is replicated to fill a fixed, relatively large volume of at least one cubic nanometer. The descriptor is then constructed from the periodic numbers of the atoms and the lengths of their radius vectors. The details are in the file `mpds_ml_labs/prediction.py`.
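
The idea described above can be illustrated with a minimal sketch. This is not the code from `mpds_ml_labs/prediction.py`: the function name `sketch_descriptor`, the restriction to an orthogonal cell, the reading of "periodic numbers" as atomic numbers, and the sorting by distance are all assumptions made only for illustration.

```python
import math
from itertools import product

BOX_EDGE = 10.0  # 1 nm expressed in angstroms


def sketch_descriptor(cell_lengths, atoms, box_edge=BOX_EDGE):
    """Hypothetical illustration, not the repository's implementation.

    cell_lengths: (a, b, c) edges in angstroms of an orthogonal cell (assumption).
    atoms: list of (atomic_number, (fx, fy, fz)) with fractional coordinates.
    Returns (atomic_number, radius_vector_length) pairs sorted by length.
    """
    # Number of cell repetitions needed along each axis to span the box edge.
    repeats = [max(1, math.ceil(box_edge / edge)) for edge in cell_lengths]
    pairs = []
    for shift in product(*(range(n) for n in repeats)):
        for number, frac in atoms:
            # Cartesian position of the replicated atom.
            position = [(f + s) * edge
                        for f, s, edge in zip(frac, shift, cell_lengths)]
            radius = math.sqrt(sum(x * x for x in position))
            pairs.append((number, radius))
    # Order by distance from the origin, then by atomic number.
    pairs.sort(key=lambda pair: (pair[1], pair[0]))
    return pairs
```

For a cubic cell with a 5 Å edge and a single atom at the origin, the cell is repeated twice along each axis, so the sketch yields eight (atomic number, distance) pairs.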
5968

60-
As the machine-learning model, an ensemble of decision trees ([random forest regressor](http://scikit-learn.org/stable/modules/ensemble.html)) is used, as implemented in the [scikit-learn](http://scikit-learn.org) Python machine-learning toolkit. The whole MPDS dataset can be used for training. In order to estimate the prediction quality of the _regressor_ model, the _mean absolute error_ and _R2 coefficient of determination_ are saved. In order to estimate the prediction quality of the binary _classifier_ model, the _fraction incorrect_ (_i.e._ _error percentage_) is saved. The evaluation process is repeated at least 30 times to achieve statistical reliability.
69+
As the machine-learning model, an ensemble of decision trees ([random forest regressor](http://scikit-learn.org/stable/modules/ensemble.html)) is used, as implemented in the [scikit-learn](http://scikit-learn.org) Python machine-learning toolkit. The whole MPDS dataset can be used for training. In order to estimate the prediction quality of the _regressor_ model, the _mean absolute error_ and _R2 coefficient of determination_ are saved. In order to estimate the prediction quality of the binary _classifier_ model, the _fraction incorrect_ (_i.e._ the _error percentage_) is saved. The evaluation process is repeated at least 30 times to achieve statistical reliability.
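
The quality measures named above can be written out in a few lines. The sketch below uses plain Python so that the definitions are visible; `mae` and `r2` compute the same quantities as `mean_absolute_error` and `r2_score` in `sklearn.metrics`. The helper names and the `evaluate_once` callback are hypothetical and are not taken from the training scripts.

```python
import random
import statistics


def mae(y_true, y_pred):
    """Mean absolute error of a regressor."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R2 coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fraction_incorrect(labels_true, labels_pred):
    """Share of misclassified samples for a binary classifier."""
    wrong = sum(1 for t, p in zip(labels_true, labels_pred) if t != p)
    return wrong / len(labels_true)


def repeated_evaluation(evaluate_once, n_repeats=30, seed=0):
    """Call evaluate_once(split_seed) n_repeats times (hypothetical callback
    that trains on a fresh random split and returns one score), then report
    the mean and the sample standard deviation of the scores."""
    rng = random.Random(seed)
    scores = [evaluate_once(rng.randrange(2 ** 31)) for _ in range(n_repeats)]
    return statistics.mean(scores), statistics.stdev(scores)
```

Reporting the spread over the repeats alongside the mean shows how much a score depends on the particular train/test split.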
70+
71+
For generating the crystal structure from the physical properties, see `mpds_ml_labs/test_design.py`.
72+
6173

6274
API
6375
------
@@ -66,18 +78,21 @@ At the local server:
6678

6779
```shell
6880
curl -XPOST http://localhost:5000/predict -d "structure=data_in_CIF_or_POSCAR"
81+
curl -XPOST http://localhost:5000/design -d "numerics=ranges_of_values_of_the_8_properties_in_JSON"
6982
```
7083

71-
At the demonstration Tilde server (may be switched off):
84+
At the demonstration MPDS server (may be switched off):
7285

7386
```shell
74-
curl -XPOST https://tilde.pro/services/predict -d "structure=data_in_CIF_or_POSCAR"
87+
curl -XPOST https://labs.mpds.io/predict -d "structure=data_in_CIF_or_POSCAR"
88+
curl -XPOST https://labs.mpds.io/design -d "numerics=ranges_of_values_of_the_8_properties_in_JSON"
7589
```
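
The same requests can be issued from Python. The following is a minimal client-side sketch in Python 3 using only the standard library; the helper names `build_request` and `send` are hypothetical, and since the response format is not described here, the reply is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(base_url, endpoint, field, payload):
    """Prepare a form-encoded POST, as `curl -XPOST ... -d "field=payload"` does."""
    url = base_url.rstrip("/") + "/" + endpoint
    body = urlencode({field: payload}).encode("ascii")
    return Request(url, data=body, method="POST")


def send(request, timeout=30):
    """Send the prepared request and return the response body as text."""
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder payload, exactly as in the curl example above.
request = build_request("http://localhost:5000", "predict",
                        "structure", "data_in_CIF_or_POSCAR")
# reply = send(request)  # requires a running server
```

A `design` request is built the same way with the field name `numerics` and an already serialized JSON string as the payload.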
7690

91+
7792
Credits
7893
------
7994

80-
This project is built on top of the following open-source scientific software:
95+
This project is built on top of open-source scientific software, such as:
8196

8297
- [scikit-learn](http://scikit-learn.org)
8398
- [pandas](https://pandas.pydata.org)
@@ -87,17 +102,17 @@ This project is built on top of the following open-source scientific software:
87102
- [cifplayer](http://tilde-lab.github.io/player.html)
88103
- [MPDS API client](http://developer.mpds.io)
89104

105+
90106
License
91107
------
92108

93109
- The client and the server code: *LGPL-2.1+*
94-
- The [open part](https://mpds.io/open-data-api) of the MPDS data (5%): *CC BY 4.0*
95-
- The closed part of the MPDS data (95%): *commercial*
110+
- The machine-learning MPDS data generated as presented here: *CC BY 4.0*
111+
- The [open part](https://mpds.io/open-data-api) of the experimental MPDS data (5%): *CC BY 4.0*
112+
- The closed part of the experimental MPDS data (95%): *commercial*
113+
96114

97115
Citation
98116
------
99117

100-
[![DOI](https://zenodo.org/badge/110734326.svg)](https://zenodo.org/badge/latestdoi/110734326)
101-
102-
Also please feel free to cite:
103-
- Blokhin E, Villars P, PAULING FILE and MPDS materials data infrastructure, in preparation, **2018**
118+
- Blokhin E, Villars P, Quantitative trends in physical properties of inorganic compounds via machine learning, [arXiv](https://arxiv.org/abs/1806.03553), **2018**

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ pycodcif == 0.8.9
66
spglib == 1.9.9
77
pandas
88
sklearn
9+
imblearn
910
mpds_client
1011
progressbar
1112
pg8000
