
Commit fde15c0

Documentation
1 parent 9324468 commit fde15c0


1 file changed (+12, -5 lines)


README.adoc

@@ -11,7 +11,7 @@ variables `x = (x~1~, x~2~,...,x~C~)` (called explanatory variables) based on a
 image:http://latex.codecogs.com/gif.latex?h(x)%20=%20\theta_{0}%20+%20\sum_{j=1}^C%20\theta_{j}%20x_{j}[]
 
 This plugin enhances Elasticsearch's query engine by two new aggregations, which utilize the index data during search
-for estimating a linear regression model in order to expose information like prediction of a value for the target variable,
+as training data for estimating a linear regression model in order to expose information like prediction of a value for the target variable,
 anomaly detection and measuring the accuracy or rather predictiveness of the model.
 Estimation is performed regarding the https://en.wikipedia.org/wiki/Ordinary_least_squares[OLS]
 (ordinary least-squares) approach over the search result set.
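The linear hypothesis and the OLS criterion referenced in this hunk can be made concrete in a few lines. The following is a minimal JavaScript illustration, not code from the plugin: `hypothesis` and `sumOfSquaredErrors` are hypothetical helper names, and the coefficient layout `[θ0, θ1, ..., θC]` (intercept first) is an assumption made for the example. OLS picks the coefficients that minimize the sum of squared errors computed by the second function.

```javascript
// Hypothetical helpers for illustration only; not part of the plugin's API.
// Coefficients are assumed to be laid out as [θ0, θ1, ..., θC] (intercept first).

// Evaluates h(x) = θ0 + Σ_{j=1..C} θj * xj for one input vector x of length C.
function hypothesis(theta, x) {
  if (theta.length !== x.length + 1) {
    throw new Error("expected theta.length === x.length + 1");
  }
  let h = theta[0]; // intercept θ0
  for (let j = 0; j < x.length; j++) {
    h += theta[j + 1] * x[j]; // θ_{j+1} * x_{j+1} in the 1-based notation above
  }
  return h;
}

// The quantity OLS minimizes: Σ_i (h(x_i) - y_i)² over the training data.
function sumOfSquaredErrors(theta, xs, ys) {
  let sse = 0;
  for (let i = 0; i < xs.length; i++) {
    const r = hypothesis(theta, xs[i]) - ys[i]; // residual of document i
    sse += r * r;
  }
  return sse;
}

// Example with C = 2: h(x) = 1 + 2*x1 + 3*x2 evaluated at x = (4, 5).
console.log(hypothesis([1, 2, 3], [4, 5])); // 24
```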
@@ -40,8 +40,8 @@ regarding the estimated model with respect to a set of given input values for th
 of the linear hypothesis function ``h(x)``.
 
 Assuming the data consists of documents representing sold house prices with features
-like number of bedrooms, bathrooms and size etc. we can predict or validate
-the price for our house in Morro Bay with 2000 square feet, 4 bedrooms and 2 bathrooms.
+like number of bedrooms, bathrooms and size etc. we can let predict or validate
+the price for our house in Morro Bay with 2000 square feet, 4 bedrooms and 2 bathrooms by:
 
 [source,js]
 --------------------------------------------------
@@ -70,7 +70,7 @@ Assuming the data consists of documents representing sold house prices with feat
 have to be passed in array form in the order corresponding to the features listed in the `fields` attribute.
 The size of the `inputs` array is `C` equivalent to the number of the explanatory variables.
 
-And the following may be the response with the estimated price for our house:
+And the following may be the response with the estimated price of around $ 581,458 for our house:
 
 [source,js]
 --------------------------------------------------
@@ -166,7 +166,14 @@ Do not forget to restart the node after installing.
 |===
 
 ## Algorithm
-...
+This implementation is based on a new parallel, single-pass OLS estimation algorithm for multiple linear regression
+(not yet published). By aggregating
+over the data only once and in parallel the algorithm is ideally suited for large-scale, distributed data sets and
+in this respect surpasses the majority of existing multi-pass analytical OLS estimators or iterative optimization algorithms.
+
+The overall complexity of the implemented algorithm to estimate the regression coefficients is `O(N C² + C³)`, where
+`N` denotes the size of the training data set (the number of documents in the search result set) and `C` the number
+of the indicated explanatory variables (fields).
 
 ## Examples
 ...
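The commit describes the algorithm as new and not yet published, so its details are unknown. As a point of reference only, the sketch below shows a generic, well-known way to get single-pass, parallelizable OLS with the same `O(N C² + C³)` cost: each partition (for example a shard) accumulates the sufficient statistics XᵀX and Xᵀy in one pass over its documents, partial results are merged by addition, and the normal equations (XᵀX) θ = Xᵀy are solved once. This is NOT the plugin's algorithm; the function names `newStats`, `accumulate`, `merge` and `solve` are hypothetical. Solving the normal equations directly is simple but can lose precision on ill-conditioned data, where QR-based solvers are usually preferred.

```javascript
// Hypothetical sketch, NOT the plugin's unpublished algorithm: single-pass OLS
// via mergeable sufficient statistics. With a leading constant 1 for the
// intercept, each row adds to A = XᵀX ((C+1)×(C+1)) and b = Xᵀy (length C+1).

function newStats(c) {
  const d = c + 1; // C explanatory variables + intercept
  return {
    a: Array.from({ length: d }, () => new Array(d).fill(0)),
    b: new Array(d).fill(0),
  };
}

// One pass over a partition's rows: O(N C²) work in total.
function accumulate(stats, rows, ys) {
  for (let i = 0; i < rows.length; i++) {
    const x = [1, ...rows[i]]; // prepend 1 so that θ0 acts as the intercept
    for (let j = 0; j < x.length; j++) {
      stats.b[j] += x[j] * ys[i];
      for (let k = 0; k < x.length; k++) {
        stats.a[j][k] += x[j] * x[k];
      }
    }
  }
  return stats;
}

// Partial statistics from different partitions combine by element-wise addition.
function merge(s, t) {
  return {
    a: s.a.map((row, j) => row.map((v, k) => v + t.a[j][k])),
    b: s.b.map((v, j) => v + t.b[j]),
  };
}

// Solves A θ = b by Gaussian elimination with partial pivoting: O(C³).
function solve({ a, b }) {
  const d = b.length;
  const m = a.map((row, j) => [...row, b[j]]); // augmented copy [A | b]
  for (let col = 0; col < d; col++) {
    let pivot = col;
    for (let r = col + 1; r < d; r++) {
      if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
    }
    if (Math.abs(m[pivot][col]) < 1e-12) throw new Error("XᵀX is singular");
    [m[col], m[pivot]] = [m[pivot], m[col]];
    for (let r = col + 1; r < d; r++) {
      const f = m[r][col] / m[col][col];
      for (let k = col; k <= d; k++) m[r][k] -= f * m[col][k];
    }
  }
  const theta = new Array(d).fill(0);
  for (let r = d - 1; r >= 0; r--) {
    let s = m[r][d];
    for (let k = r + 1; k < d; k++) s -= m[r][k] * theta[k];
    theta[r] = s / m[r][r];
  }
  return theta; // [θ0, θ1, ..., θC]
}

// Two "shards" of exact data generated from y = 1 + 2*x1 + 3*x2.
const shard1 = accumulate(newStats(2), [[0, 0], [1, 0]], [1, 3]);
const shard2 = accumulate(newStats(2), [[0, 1], [1, 1], [2, 3]], [4, 6, 14]);
const theta = solve(merge(shard1, shard2));
console.log(theta); // close to [1, 2, 3]
```

Because `merge` is plain addition, it is associative and commutative (up to floating-point rounding), so partial statistics can be combined in any order; this is what allows one pass over the data and a distributed reduction, with only the small (C+1)×(C+1) system left to solve at the end.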
