@@ -11,7 +11,7 @@ variables `x = (x~1~, x~2~,...,x~C~)` (called explanatory variables) based on a
1111image:http://latex.codecogs.com/gif.latex?h(x)%20=%20\theta_{0}%20+%20\sum_{j=1}^C%20\theta_{j}%20x_{j}[]
1212
1313This plugin enhances Elasticsearch's query engine by two new aggregations, which utilize the index data during search
14- for estimating a linear regression model in order to expose information like prediction of a value for the target variable,
14+ as training data for estimating a linear regression model in order to expose information like prediction of a value for the target variable,
1515anomaly detection and measuring the accuracy or rather predictiveness of the model.
1616Estimation is performed using the https://en.wikipedia.org/wiki/Ordinary_least_squares[OLS]
1717(ordinary least-squares) approach over the search result set.
@@ -40,8 +40,8 @@ regarding the estimated model with respect to a set of given input values for th
4040 of the linear hypothesis function ``h(x)``.
4141
4242Assuming the data consists of documents representing sold house prices with features
43- like number of bedrooms, bathrooms and size etc. we can predict or validate
44- the price for our house in Morro Bay with 2000 square feet, 4 bedrooms and 2 bathrooms.
43+ like number of bedrooms, bathrooms and size etc. we can have the plugin predict or validate
44+ the price for our house in Morro Bay with 2000 square feet, 4 bedrooms and 2 bathrooms with the following request:
4545
4646[source,js]
4747--------------------------------------------------
@@ -70,7 +70,7 @@ Assuming the data consists of documents representing sold house prices with feat
7070 have to be passed in array form in the order corresponding to the features listed in the `fields` attribute.
7171 The size of the `inputs` array is `C`, the number of explanatory variables.
7272
73- And the following may be the response with the estimated price for our house:
73+ The response might look as follows, with an estimated price of around $581,458 for our house:
7474
7575[source,js]
7676--------------------------------------------------
@@ -166,7 +166,14 @@ Do not forget to restart the node after installing.
166166|===
167167
168168## Algorithm
169- ...
169+ This implementation is based on a new parallel, single-pass OLS estimation algorithm for multiple linear regression
170+ (not yet published). Because it aggregates
171+ over the data only once and in parallel, the algorithm is well suited to large-scale, distributed data sets, where it
172+ avoids the repeated scans over the data required by multi-pass analytical OLS estimators and iterative optimization algorithms.
173+
174+ The overall complexity of the implemented algorithm to estimate the regression coefficients is `O(N C² + C³)`, where
175+ `N` denotes the size of the training data set (the number of documents in the search result set) and `C` the number
176+ of the indicated explanatory variables (fields).
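
The plugin's algorithm is unpublished, so the following is only a minimal Python sketch of one well-known way to reach the same `O(N C² + C³)` bound: accumulate the normal-equation sums `XᵀX` and `Xᵀy` in a single pass per shard (`O(N C²)`), merge the per-shard sums, and solve the resulting `(C+1) × (C+1)` system once (`O(C³)`). All function names (`empty_stats`, `accumulate`, `merge`, `solve`, `fit_shard`) and the sample data are hypothetical and are not taken from the plugin.

```python
from typing import List, Sequence, Tuple

Stats = Tuple[List[List[float]], List[float]]  # (X^T X, X^T y)


def empty_stats(c: int) -> Stats:
    """Zeroed sums for C explanatory variables plus the intercept."""
    n = c + 1
    return [[0.0] * n for _ in range(n)], [0.0] * n


def accumulate(stats: Stats, x: Sequence[float], y: float) -> Stats:
    """Add one document to the sums in O(C^2); updates stats in place."""
    xtx, xty = stats
    row = [1.0, *x]  # leading 1.0 carries the intercept theta_0
    for i, xi in enumerate(row):
        xty[i] += xi * y
        for j, xj in enumerate(row):
            xtx[i][j] += xi * xj
    return stats


def merge(a: Stats, b: Stats) -> Stats:
    """Combine the sums of two shards; the sums are simply additive."""
    xtx = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [p + q for p, q in zip(a[1], b[1])]
    return xtx, xty


def solve(stats: Stats) -> List[float]:
    """Solve (X^T X) theta = X^T y by Gaussian elimination in O(C^3)."""
    xtx, xty = stats
    n = len(xty)
    m = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    theta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][k] * theta[k] for k in range(i + 1, n))
        theta[i] = (m[i][n] - tail) / m[i][i]
    return theta


def fit_shard(docs, c: int) -> Stats:
    """Single pass over one shard's documents."""
    stats = empty_stats(c)
    for x, y in docs:
        accumulate(stats, x, y)
    return stats


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2, split over two shards.
shard_a = [((1.0, 2.0), 9.0), ((2.0, 1.0), 8.0)]
shard_b = [((3.0, 5.0), 22.0), ((0.0, 1.0), 4.0), ((4.0, 0.0), 9.0)]

theta = solve(merge(fit_shard(shard_a, 2), fit_shard(shard_b, 2)))
# theta is close to [1.0, 2.0, 3.0], up to floating-point rounding
```

Because the sums are additive, each shard can be scanned independently and the partial results combined in any order, which is what makes a single parallel pass sufficient in this sketch. Solving the normal equations directly can be numerically fragile for ill-conditioned data; a production implementation would likely need a more robust decomposition.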
170177
171178## Examples
172179...