This repository was archived by the owner on Jul 20, 2025. It is now read-only.

Commit 3d73374

Update README
1 parent 47a302e commit 3d73374

1 file changed

+48
-12
lines changed

README.md

Lines changed: 48 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -43,22 +43,28 @@ pip install -r requirements.txt
4343
Currently only *Python 2* is supported (*Python 3* support is almost there).
4444

4545

46-
Preparation
46+
Preparation for work
4747
------
4848

49-
The model is trained on the MPDS data using the MPDS API and the scripts `train_regressor.py` and `train_classifier.py`. Some subset of the full MPDS data is opened and possible to obtain via MPDS API [for free](https://mpds.io/open-data-api).
49+
The model is trained on the MPDS data using the MPDS API and the scripts `train_regressor.py` and `train_classifier.py`. A subset of the full MPDS data is open and can be obtained via the MPDS API [for free](https://mpds.io/open-data-api). If the training is performed on a limited (_e.g._ open) data subset, the scripts must be modified to make queries accordingly. The MPDS API returns the HTTP status code `402` if a user's request is authenticated but not authorized. See the [full list of HTTP status codes](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes).
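As an illustration of the status-code handling mentioned above, a training script might check the response status before parsing any data. This is only a minimal sketch: the helper name `explain_mpds_status` and the message texts are assumptions made for illustration, and only the meaning of `402` is taken from the paragraph above.

```python
def explain_mpds_status(status_code):
    """Return a short hint for an HTTP status code.

    Hypothetical helper for illustration; it is not part of the MPDS API client.
    """
    if status_code == 402:
        # As stated above: the request is authenticated, but not authorized
        return "authenticated, but not authorized for the requested data"
    if 200 <= status_code < 300:
        return "ok"
    if 400 <= status_code < 500:
        return "client error"
    if 500 <= status_code < 600:
        return "server error"
    return "unexpected status"
```

With the open data subset, a `402` hint of this kind would suggest narrowing the query to the data the account is entitled to, rather than retrying.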
50+
51+
The code tries to use the settings exemplified in a template:
52+
53+
```shell
54+
cp data/settings.ini.sample data/settings.ini
55+
```
5056

5157

5258
Architecture and usage
5359
------
5460

55-
Can be used either as a standalone command-line application or as a client-server application. In the latter case, the client and the server communicate over HTTP, and any client able to execute HTTP requests is supported, be it a `curl` command-line client or rich web-browser user interface. For example, the simple HTML5 apps `props.html` and `design.html` are supplied in the `webassets` folder. Server part is a Flask app:
61+
The code can be used either as a *standalone command-line* application or as a *client-server* application.
5662

57-
```python
58-
python mpds_ml_labs/app.py
59-
```
63+
Examples of the *standalone command-line* architecture are the scripts `mpds_ml_labs/test_props_cmd.py` and `mpds_ml_labs/test_design_cmd.py`.
6064

61-
Web-browser user interface is then available under `http://localhost:5000`. By default, to serve the requests the development Flask server is used. Therefore an _AS-IS_ deployment in an online environment without the suitable WSGI container is **highly discouraged**. For the production environments under the high load it is recommended to use something like [TensorFlow Serving](https://www.tensorflow.org/serving).
65+
In the case of the *client-server* architecture, the client and the server communicate over HTTP using a simple API, and any client able to execute HTTP requests is supported, be it a `curl` command-line client, a Python script, or a rich web-browser user interface. Examples of such Python scripts are `mpds_ml_labs/test_props_client.py` and `mpds_ml_labs/test_design_client.py`.
66+
67+
The server part is a Flask app `mpds_ml_labs/app.py`. The simple HTML5 client apps `props.html` and `design.html`, supplied in the `webassets` folder, are served by the Flask app under `http://localhost:5000`. By default, the development Flask server is used to serve the requests. Therefore an _AS-IS_ deployment in an online environment without a suitable WSGI container is **highly discouraged**. For production environments under high load it is recommended to use something like [TensorFlow Serving](https://www.tensorflow.org/serving).
6268

6369

6470
Used descriptor and model details
@@ -68,24 +74,54 @@ The term _descriptor_ stands for the compact information-rich representation, al
6874

6975
As a machine-learning model an ensemble of decision trees ([random forest regressor](http://scikit-learn.org/stable/modules/ensemble.html)) is used, as implemented in the [scikit-learn](http://scikit-learn.org) Python machine-learning toolkit. The whole MPDS dataset can be used for training. In order to estimate the prediction quality of the _regressor_ model, the _mean absolute error_ and the _R2 coefficient of determination_ are saved. In order to estimate the prediction quality of the binary _classifier_ model, the _fraction incorrect_ (_i.e._ the _error percentage_) is saved. The evaluation process is repeated at least 30 times to achieve statistical reliability.
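The quality measures named above can be written down in a few lines. The following is a minimal pure-Python sketch of their definitions, not the repository's evaluation code; the function names and the `average_over_runs` helper (with its `evaluate` callback) are assumptions for illustration. In practice scikit-learn offers equivalent functions in `sklearn.metrics`.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / float(len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / float(len(y_true))
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fraction_incorrect(labels_true, labels_pred):
    """Share of misclassified samples for the binary classifier."""
    wrong = sum(1 for t, p in zip(labels_true, labels_pred) if t != p)
    return wrong / float(len(labels_true))


def average_over_runs(evaluate, n_runs=30):
    """Call evaluate(run_index) n_runs times and average the scores.

    `evaluate` is a hypothetical callback that trains and scores one model,
    e.g. on a fresh random train/test split.
    """
    scores = [evaluate(run) for run in range(n_runs)]
    return sum(scores) / float(len(scores))
```

Averaging over repeated runs, as in `average_over_runs`, smooths out the variation caused by a particular random split of the data.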
7076

71-
For generating the crystal structure from the physical properties, see `mpds_ml_labs/test_design.py`.
77+
Generating the crystal structure from the physical properties is done as follows. The decision-tree property predictions for nearly 115k distinct MPDS phases are used for radius-based neighbor learning. This makes it possible to extrapolate the possible chemical elements for almost any given combination of physical properties. The results of the neighbor learning, approximately 7.6M rows, are stored in a Postgres table `ml_knn`:
78+
79+
```sql
80+
CREATE TABLE ml_knn (
81+
id INT PRIMARY KEY,
82+
z SMALLINT NOT NULL,
83+
y SMALLINT NOT NULL,
84+
x SMALLINT NOT NULL,
85+
k SMALLINT NOT NULL,
86+
w SMALLINT NOT NULL,
87+
m SMALLINT NOT NULL,
88+
d SMALLINT NOT NULL,
89+
t SMALLINT NOT NULL,
90+
els VARCHAR(19)
91+
);
92+
CREATE SEQUENCE ml_knn_id_seq START WITH 1 INCREMENT BY 1 NO MINVALUE NO MAXVALUE CACHE 1;
93+
ALTER SEQUENCE ml_knn_id_seq OWNED BY ml_knn.id;
94+
ALTER TABLE ONLY ml_knn ALTER COLUMN id SET DEFAULT nextval('ml_knn_id_seq'::regclass);
95+
CREATE INDEX prop_z ON ml_knn USING btree(z);
96+
CREATE INDEX prop_y ON ml_knn USING btree(y);
97+
CREATE INDEX prop_x ON ml_knn USING btree(x);
98+
CREATE INDEX prop_k ON ml_knn USING btree(k);
99+
CREATE INDEX prop_w ON ml_knn USING btree(w);
100+
CREATE INDEX prop_m ON ml_knn USING btree(m);
101+
CREATE INDEX prop_d ON ml_knn USING btree(d);
102+
CREATE INDEX prop_t ON ml_knn USING btree(t);
103+
```
104+
105+
The full contents of this table can be provided on request. The elements found to match the given property ranges are used to compile a crystal structure based on the available MPDS structure prototypes (via the MPDS API). See `mpds_ml_labs/test_design_cmd.py`.
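To make the lookup by property ranges more concrete, the sketch below builds a parametrized range query against the `ml_knn` table. It is an assumption about how such a lookup could be written, not the query used in `mpds_ml_labs`: the function name `build_knn_query`, the input format (a column name mapped to a `(low, high)` pair) and the `%s` placeholder style (as used by the `psycopg2` driver) are all chosen for illustration.

```python
# Property columns, as named in the ml_knn table definition above
PROPERTY_COLUMNS = ("z", "y", "x", "k", "w", "m", "d", "t")


def build_knn_query(ranges):
    """Build (sql, params) selecting element sets within property ranges.

    `ranges` maps a column name to a (low, high) pair of integers;
    this input format is hypothetical.
    """
    unknown = set(ranges) - set(PROPERTY_COLUMNS)
    if unknown:
        raise ValueError("unknown columns: %s" % ", ".join(sorted(unknown)))

    clauses, params = [], []
    for column in PROPERTY_COLUMNS:
        if column not in ranges:
            continue
        low, high = ranges[column]
        # Column names come from the fixed tuple above, values stay parameters
        clauses.append("{0} BETWEEN %s AND %s".format(column))
        params.extend([low, high])

    sql = "SELECT DISTINCT els FROM ml_knn"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

The per-column btree indexes defined above are what would make range conditions of this kind practical on a table of several million rows.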
72106

73107

74108
API
75109
------
76110

77-
At the local server:
111+
These are examples of using the `curl` command-line client.
112+
113+
For the local server:
78114

79115
```shell
80116
curl -XPOST http://localhost:5000/predict -d "structure=data_in_CIF_or_POSCAR"
81-
curl -XPOST http://localhost:5000/design -d "numerics=ranges_of_values_of_the_8_properties_in_JSON"
117+
curl -XPOST http://localhost:5000/design -d "numerics=ranges_of_values_of_8_properties_in_JSON"
82118
```
83119

84-
At the demonstration MPDS server (may be switched off):
120+
For the demonstration MPDS server:
85121

86122
```shell
87123
curl -XPOST https://labs.mpds.io/predict -d "structure=data_in_CIF_or_POSCAR"
88-
curl -XPOST https://labs.mpds.io/design -d "numerics=ranges_of_values_of_the_8_properties_in_JSON"
124+
curl -XPOST https://labs.mpds.io/design -d "numerics=ranges_of_values_of_8_properties_in_JSON"
89125
```
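The same requests can also be prepared from a Python script. This is a minimal sketch using only the standard library, not a copy of `test_props_client.py` or `test_design_client.py`; the helper names are assumptions, and the layout of the `numerics` JSON is not specified here, so it is left entirely to the caller.

```python
import json

try:
    from urllib.request import Request, urlopen  # Python 3
    from urllib.parse import urlencode
except ImportError:
    from urllib2 import Request, urlopen  # Python 2
    from urllib import urlencode

LOCAL_SERVER = "http://localhost:5000"


def build_request(endpoint, field, value, server=LOCAL_SERVER):
    """Prepare a form-encoded POST request without sending it."""
    body = urlencode({field: value}).encode("ascii")
    # Supplying a request body makes this a POST request
    return Request(server + endpoint, data=body)


def predict_request(structure, server=LOCAL_SERVER):
    """`structure` is the crystal structure text in CIF or POSCAR format."""
    return build_request("/predict", "structure", structure, server)


def design_request(numerics, server=LOCAL_SERVER):
    """`numerics` is any JSON-serializable object with the property ranges."""
    return build_request("/design", "numerics", json.dumps(numerics), server)


def send(request):
    """Send a prepared request and return the raw response body."""
    return urlopen(request).read()
```

For example, `send(predict_request(open("structure.cif").read()))` would post a structure file to the local server, where `structure.cif` is a placeholder file name.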
90126

91127
