Minor fixing

blokhin · blokhin · commit 7b3c36f6eff1 · 2018-05-01T00:27:30.000+02:00
diff --git a/.gitignore b/.gitignore
@@ -1,5 +1,5 @@
 *.py[co]
-data/settings.ini
+data/settings.*
 *DS_Store*
 pip-selfcheck.json
 pip-log.txt
diff --git a/README.md b/README.md
@@ -57,7 +57,7 @@ Used descriptor and model details
 
 The term _descriptor_ stands for the compact information-rich representation, allowing the convenient mathematical treatment of the encoded complex data (_i.e._ crystalline structure). Any crystalline structure is populated to a certain relatively big fixed volume of minimum one cubic nanometer. Then the descriptor is constructed using the periodic numbers of atoms and the lengths of their radius-vectors. The details are in the file `mpds_ml_labs/prediction.py`.
 
-As a machine-learning model an ensemble of decision trees ([random forest regressor](http://scikit-learn.org/stable/modules/ensemble.html)) is used, as implemented in [scikit-learn](http://scikit-learn.org) Python machine-learning toolkit. The whole MPDS dataset can be used for training. In order to estimate the prediction quality of the _regressor_ model, the metrics of _mean absolute error_ and _R2 coefficient of determination_ are used. In order to estimate the prediction quality of the _classifier_ model (binary case), the simple error percentage is used (`(false positives + false negatives)/all outcome`). The evaluation process is repeated at least 30 times to achieve a statistical reliability.
+As a machine-learning model, an ensemble of decision trees ([random forest regressor](http://scikit-learn.org/stable/modules/ensemble.html)) is used, as implemented in the [scikit-learn](http://scikit-learn.org) Python machine-learning toolkit. The whole MPDS dataset can be used for training. To estimate the prediction quality of the _regressor_ model, the _mean absolute error_ and the _R2 coefficient of determination_ are saved. To estimate the prediction quality of the binary _classifier_ model, the _fraction incorrect_ (_i.e._ the _error percentage_, the share of false positives and false negatives among all outcomes) is saved. The evaluation process is repeated at least 30 times to achieve statistical reliability.
 
 API
 ------
diff --git a/train_regressor.py b/train_regressor.py
@@ -141,7 +141,7 @@ def tune_model(data_file):
     parameter_a = results[-1][0]
 
     results = []
-    for parameter_b in range(5, 31):
+    for parameter_b in range(10, 101, 2):
         avg_mae, avg_r2 = estimate_regr_quality(get_regr(a=parameter_a, b=parameter_b), X, y)
         results.append([parameter_b, avg_mae, avg_r2])
         print("%s\t\t\t%s\t\t\t%s" % (parameter_b, avg_mae, avg_r2))
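
Note on the metrics named in the README hunk above: the following is a minimal sketch, in plain Python 3, of how the mean absolute error, the R2 coefficient of determination, and the fraction incorrect could be computed. The function names are hypothetical and are not taken from `train_regressor.py`; the README states that the model comes from scikit-learn, which ships its own `mean_absolute_error` and `r2_score` in `sklearn.metrics`, and those are likely the better choice in practice. The final lines only enumerate the new candidate values for `parameter_b`.

```python
# Illustrative helpers only; names are assumptions, not the repository's code.

def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fraction_incorrect(y_true, y_pred):
    """Share of wrong labels: (false positives + false negatives) / all outcomes."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Candidate values swept for parameter_b after this commit: 10, 12, ..., 100.
candidates_b = list(range(10, 101, 2))

if __name__ == "__main__":
    print(mean_absolute_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
    print(r2_score([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
    print(fraction_incorrect([1, 0, 1, 1], [1, 1, 1, 0]))
    print(len(candidates_b))
```

The new sweep covers even values only, so it trades resolution for a wider range than the previous `range(5, 31)`.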