@@ -145,6 +145,27 @@ and the last for the response variable. The above request returns the following
145145}
146146--------------------------------------------------
147147
148+ === Data conditions
149+ Due to algorithmic constraints, both aggregations return an empty response if
150+ * the search result size is less than or equal to the number of specified explanatory variables, or
151+ * the values of the explanatory variables in the search result set are linearly dependent (that is,
152+ one column can be written as a linear combination of the other columns).
153+
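To make the two conditions concrete, here is a hypothetical client-side pre-check. It assumes the explanatory values of the search result are available as an `n × C` list of rows. `matrix_rank` and `returns_empty_result` are illustrative names, not part of the plugin, and the absolute tolerance `1e-9` is an arbitrary choice for this sketch.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a list-of-lists matrix via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    if not m:
        return 0
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Pick the row (at or below `rank`) with the largest magnitude in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column depends on the earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for k in range(col, n_cols):
                m[r][k] -= factor * m[rank][k]
        rank += 1
    return rank


def returns_empty_result(x_rows):
    """Hypothetical mirror of the two documented conditions for an n x C matrix."""
    n = len(x_rows)
    c = len(x_rows[0]) if x_rows else 0
    if n <= c:
        return True  # too few documents for the number of explanatory variables
    return matrix_rank(x_rows) < c  # some column is a linear combination of the others
```

For example, `returns_empty_result([[1, 2], [2, 4], [3, 6]])` is `True`, because the second column is exactly twice the first.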
154+
155+ ## Algorithm
156+ This implementation is based on a new parallel, single-pass OLS estimation algorithm for multiple linear regression
157+ (not yet published). Because it aggregates over the data only once and in parallel,
158+ the algorithm is well suited to large-scale, distributed data sets. In this respect it outperforms most
159+ existing multi-pass analytical OLS estimators as well as iterative optimization algorithms.
160+
161+ The overall complexity of the implemented algorithm for estimating the regression coefficients is `O(N C² + C³)`, where
162+ `N` denotes the size of the training data set (the number of documents in the search result set) and `C` the number
163+ of specified explanatory variables (fields).
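
Since the plugin's algorithm is unpublished, the following is *not* that algorithm. It is a minimal sketch of the textbook sufficient-statistics approach (normal equations), swapped in because it shares the properties described above: a single pass per shard, partial results merged by addition, and `O(N C² + C³)` overall cost. It assumes a model with an intercept, represents each document as an `(x, y)` pair, and uses threads to stand in for shards. `shard_stats`, `merge`, `solve` and `fit_ols` are hypothetical names.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def shard_stats(docs, c):
    """One pass over a shard: accumulate Z^T Z and Z^T y with z = (1, x_1, ..., x_C).

    Each document costs O(C^2), so a shard with n documents costs O(n C^2).
    """
    p = c + 1  # intercept + C explanatory variables
    ztz = [[0.0] * p for _ in range(p)]
    zty = [0.0] * p
    for x, y in docs:
        z = [1.0, *x]
        for i in range(p):
            zty[i] += z[i] * y
            for j in range(p):
                ztz[i][j] += z[i] * z[j]
    return len(docs), ztz, zty


def merge(a, b):
    """Partial statistics combine by element-wise addition, so shard order is irrelevant."""
    n_a, ztz_a, zty_a = a
    n_b, ztz_b, zty_b = b
    return (n_a + n_b,
            [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ztz_a, ztz_b)],
            [u + v for u, v in zip(zty_a, zty_b)])


def solve(a, b, tol=1e-9):
    """Solve a w = b by Gauss-Jordan elimination in O(p^3); None if a is singular."""
    p = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            return None  # linearly dependent columns: no unique solution
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, p + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][p] / m[i][i] for i in range(p)]


def fit_ols(shards, c):
    """Return [intercept, b_1, ..., b_C], or None where the plugin would answer empty."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: shard_stats(s, c), shards))
    n, ztz, zty = functools.reduce(merge, partials, shard_stats([], c))
    if n <= c:
        return None  # too few documents
    return solve(ztz, zty)
```

For instance, `fit_ols([[([0, 0], 1), ([1, 0], 3)], [([0, 1], 4), ([1, 1], 6), ([2, 1], 8)]], 2)` recovers coefficients close to `[1, 2, 3]` for `y = 1 + 2 x₁ + 3 x₂`. Each shard only ships `(C+1)² + (C+1)` numbers, which is what makes one distributed pass sufficient. Solving the normal equations directly is simple, but it can be numerically less stable than QR-based solvers on ill-conditioned data.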
164+
165+ ## Examples
166+ ...
167+
168+
148169## Installation
149170
150171### Elasticsearch 5.x
@@ -165,18 +186,6 @@ Do not forget to restart the node after installing.
165186| https://github.com/scaleborn/elasticsearch-linear-regression/releases/download/5.3.0.1/elasticsearch-linear-regression-5.3.0.1.zip[5.3.0.1] | 5.3.0 | Jun 1, 2017
166187|===
167188
168- ## Algorithm
169- This implementation is based on a new parallel, single-pass OLS estimation algorithm for multiple linear regression
170- (not yet published). By aggregating
171- over the data only once and in parallel the algorithm is ideally suited for large-scale, distributed data sets and
172- in this respect surpasses the majority of existing multi-pass analytical OLS estimators or iterative optimization algorithms.
173-
174- The overall complexity of the implemented algorithm to estimate the regression coefficients is `O(N C² + C³)`, where
175- `N` denotes the size of the training data set (the number of documents in the search result set) and `C` the number
176- of the indicated explanatory variables (fields).
177-
178- ## Examples
179- ...
180189
181190## License
182191Copyright 2017 Scaleborn UG (haftungsbeschränkt).