This repository was archived by the owner on Jul 20, 2025. It is now read-only.

 
-Live demo
+
+Live demos
 ------
 
-[mpds.io/ml](https://mpds.io/ml)
+[mpds.io/ml](https://mpds.io/ml) and [mpds.io/materials-design](https://mpds.io/materials-design)
+
 
 Rationale
 ------
@@ -22,6 +26,9 @@ This is the proof of concept, how a relatively unsophisticated statistical model
 - linear thermal expansion coefficient
 - band gap (or its absence, _i.e._ whether a crystal is a conductor or an insulator)
 
+Further, the reverse task of predicting a possible crystalline structure from a set of given properties is solved. Suitable chemical elements are found, and the resulting structure is generated (if possible) based on the available MPDS prototypes.
+
+
 Installation
 ------
 
@@ -33,17 +40,19 @@ cd REPO_FOLDER
 pip install -r requirements.txt
 ```
 
-Currently only *Python 2* is supported (*Python 3* support is coming).
+Currently only *Python 2* is supported (*Python 3* support is almost there).
+
 
 Preparation
 ------
 
 The model is trained on the MPDS data using the MPDS API and the scripts `train_regressor.py` and `train_classifier.py`. A subset of the full MPDS data is open and can be obtained via the MPDS API [for free](https://mpds.io/open-data-api).
 
+
 Architecture and usage
 ------
 
-Can be used either as a standalone command-line application or as a client-server application. In the latter case, the client and the server communicate over HTTP, and any client able to execute HTTP requests is supported, be it a `curl` command-line client or rich web-browser user interface. As an example of the latter, a simple HTML5 app `index.html`is supplied in the `webassets` folder. Server part is a Flask app:
+It can be used either as a standalone command-line application or as a client-server application. In the latter case, the client and the server communicate over HTTP, and any client able to execute HTTP requests is supported, be it a `curl` command-line client or a rich web-browser user interface. For example, the simple HTML5 apps `props.html` and `design.html` are supplied in the `webassets` folder. The server part is a Flask app:
 
 ```python
 python mpds_ml_labs/app.py
@@ -57,7 +66,10 @@ Used descriptor and model details
 
 The term _descriptor_ stands for a compact, information-rich representation that allows convenient mathematical treatment of the encoded complex data (_i.e._ crystalline structure). Any crystalline structure is expanded to fill a relatively large fixed volume of at least one cubic nanometer. Then the descriptor is constructed using the periodic numbers of atoms and the lengths of their radius-vectors. The details are in the file `mpds_ml_labs/prediction.py`.
 
-As a machine-learning model an ensemble of decision trees ([random forest regressor](http://scikit-learn.org/stable/modules/ensemble.html)) is used, as implemented in [scikit-learn](http://scikit-learn.org) Python machine-learning toolkit. The whole MPDS dataset can be used for training. In order to estimate the prediction quality of the _regressor_ model, the _mean absolute error_ and _R2 coefficient of determination_ is saved. In order to estimate the prediction quality of the binary _classifier_ model, the _fraction incorrect_ (_i.e.__error percentage_) is saved. The evaluation process is repeated at least 30 times to achieve a statistical reliability.
+As the machine-learning model, an ensemble of decision trees ([random forest regressor](http://scikit-learn.org/stable/modules/ensemble.html)) is used, as implemented in the [scikit-learn](http://scikit-learn.org) Python machine-learning toolkit. The whole MPDS dataset can be used for training. To estimate the prediction quality of the _regressor_ model, the _mean absolute error_ and the _R2 coefficient of determination_ are saved. To estimate the prediction quality of the binary _classifier_ model, the _fraction incorrect_ (_i.e._ the _error percentage_) is saved. The evaluation process is repeated at least 30 times to achieve statistical reliability.
+
+For generating the crystal structure from the physical properties, see `mpds_ml_labs/test_design.py`.
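
The evaluation protocol described in this README (a random forest, the mean absolute error and R2 for the regressor, the fraction incorrect for the binary classifier, at least 30 repetitions) might be sketched roughly as follows. This is not the code from `train_regressor.py` or `train_classifier.py`: the function names, the random train/test split, the test fraction, the number of trees and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_regressor(X, y, n_repeats=30, test_size=0.33, n_estimators=50):
    """Repeat a random train/test split; collect MAE and R2 of each run."""
    maes, r2s = [], []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        model.fit(X_tr, y_tr)
        y_pred = model.predict(X_te)
        maes.append(mean_absolute_error(y_te, y_pred))
        r2s.append(r2_score(y_te, y_pred))
    return np.array(maes), np.array(r2s)


def evaluate_classifier(X, labels, n_repeats=30, test_size=0.33, n_estimators=50):
    """Repeat a random train/test split; collect the fraction incorrect."""
    errors = []
    for seed in range(n_repeats):
        X_tr, X_te, l_tr, l_te = train_test_split(
            X, labels, test_size=test_size, random_state=seed)
        model = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
        model.fit(X_tr, l_tr)
        # Fraction incorrect is the complement of the accuracy
        errors.append(1.0 - accuracy_score(l_te, model.predict(X_te)))
    return np.array(errors)


# Synthetic stand-in for descriptors (X) and a property (y); NOT MPDS data
rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 8))
y = 3.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.05, size=200)
labels = (y > np.median(y)).astype(int)  # toy binary target

maes, r2s = evaluate_regressor(X, y)
errs = evaluate_classifier(X, labels)
print("MAE: %.3f +/- %.3f" % (maes.mean(), maes.std()))
print("R2: %.3f +/- %.3f" % (r2s.mean(), r2s.std()))
print("Fraction incorrect: %.3f +/- %.3f" % (errs.mean(), errs.std()))
```

Reporting the mean and spread over the repeated runs, rather than a single split, is what makes the quoted error figures comparable between properties.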
 - Blokhin E, Villars P, PAULING FILE and MPDS materials data infrastructure, in preparation, **2018**
+- Blokhin E, Villars P, Quantitative trends in physical properties of inorganic compounds via machine learning, [arXiv](https://arxiv.org/abs/1806.03553), **2018**