Commit 2faad6d

#2 Don't fail aggregation in case of algorithmic conditions not satisfied by the data => just serve the empty aggregation
1 parent fe54c29 commit 2faad6d

22 files changed: +286 -179 lines changed

README.adoc

Lines changed: 94 additions & 16 deletions
@@ -33,11 +33,9 @@ regarding the estimated model with respect to a set of given input values for th
 `value`:: The predicted value for the response variable computed using the estimated linear hypothesis
 function ``h(x)`` with `x` given by `C` input values for the explanatory variables
 `x = [x~1~, x~2~,...,x~C~]`.
-`coefficients`:: Estimated slope coefficients
-image:http://latex.codecogs.com/gif.latex?\theta_1,%20\theta_2,%20\theta_3,.%20.%20.,%20\theta_C%20[]
+`coefficients`:: Estimated coefficients
+image:http://latex.codecogs.com/gif.latex?\theta_0,%20\theta_1,%20\theta_2,%20\theta_3,.%20.%20.,%20\theta_C%20[]
 of the linear linear hypothesis function ``h(x)``.
-`intercept`:: Estimated intercept coefficient image:http://latex.codecogs.com/gif.latex?\theta_0%20[]
-of the linear hypothesis function ``h(x)``.
 
 Assuming the data consists of documents representing sold house prices with features
 like number of bedrooms, bathrooms and size etc. we can let predict or validate
@@ -80,11 +78,11 @@ And the following may be the response with the estimated price of around $ 581,4
     "my_house_price": {
       "value": 581458.3087492324,
       "coefficients": [
+        227990.63952712028,
         248.92285661317254,
         -68297.7720278421,
         64406.52205356777
-      ],
-      "intercept": 227990.63952712028
+      ]
     }
   }
 }
@@ -99,11 +97,9 @@ The `linreg_stats` aggregation computes statistics for the estimated linear regr
 `rss`:: Residual sum of squares as a measure of the discrepancy between the data and the estimated model.
 The lower the `rss` number, the smaller the error of the prediction, and the better the model.
 `mse`:: Mean squared error or rather `rss` divided by the number of documents consumed for model estimation.
-`coefficients`:: Slope coefficients
-image:http://latex.codecogs.com/gif.latex?\theta_1,%20\theta_2,%20\theta_3,.%20.%20.,%20\theta_C%20[]
+`coefficients`:: Estimated coefficients
+image:http://latex.codecogs.com/gif.latex?\theta_0,%20\theta_1,%20\theta_2,%20\theta_3,.%20.%20.,%20\theta_C%20[]
 of the linear linear hypothesis function ``h(x)``.
-`intercept`:: Intercept coefficient image:http://latex.codecogs.com/gif.latex?\theta_0%20[]
-of the linear hypothesis function ``h(x)``.
 
 Assuming the data consists of documents representing house prices we can compute statistics for
 the estimated best fitting linear hypothesis function which predicts house prices based on number of
@@ -135,11 +131,11 @@ and the last for the response variable. The above request returns the following
       "rss": 49523788338938.734,
       "mse": 63410740510.80504,
       "coefficients": [
+        47553.18737564783,
         -100544.0725894584,
         45981.15827544966,
         309.6013051477475
-      ],
-      "intercept": 47553.18737564783
+      ]
     }
   }
 }
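A quick consistency check on the figures in this hunk: the README text defines `mse` as `rss` divided by the number of documents consumed for model estimation, so dividing `rss` by `mse` should recover that count. The sketch below does exactly that with the two values from the response. The class and method names are illustrative and not part of the plugin, and the resulting count of roughly 781 documents is inferred here, not stated in the README.

```java
// Illustrative helper, not part of the plugin: recovers the sample count from rss and mse.
final class MseCheck {

  /** Since mse = rss / n, the implied number of documents is n = rss / mse. */
  static double impliedDocCount(final double rss, final double mse) {
    return rss / mse;
  }

  public static void main(final String[] args) {
    final double rss = 49523788338938.734; // "rss" from the response above
    final double mse = 63410740510.80504;  // "mse" from the response above
    System.out.println("implied document count: " + impliedDocCount(rss, mse));
  }
}
```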
@@ -180,7 +176,8 @@ Do not forget to restart the node after installing.
 [frame="all"]
 |===
 | Plugin version | Elasticsearch version | Release date
-| https://github.com/scaleborn/elasticsearch-linear-regression/releases/download/5.3.0.1/elasticsearch-linear-regression-5.3.0.1.zip[5.3.0.1] | 5.3.0 | Jun 1, 2017
+| https://github.com/scaleborn/elasticsearch-linear-regression/releases/download/5.3.0.1/elasticsearch-linear-regression-5.3.0.2.zip[5.3.0.2] | 5.3.0 | Jul 16, 2017
+| https://github.com/scaleborn/elasticsearch-linear-regression/releases/download/5.3.0.1/elasticsearch-linear-regression-5.3.0.1.zip[5.3.0.1] | 5.3.0 | Jun 30, 2017
 |===
 
 ## Examples
@@ -198,7 +195,7 @@ https://github.com/scaleborn/elasticsearch-linear-regression/tree/master/example
 ./bin/logstash -f house-prices-import.conf
 ....
 
-The indexed data will have this form:
+The indexed documents will have this form:
 [source,js]
 --------------------------------------------------
 {
@@ -250,16 +247,97 @@ $ 650,000 to pay for the desired house in "Morro Bay".
     "dream_house_price": {
       "value": 649918.0709489314,
       "coefficients": [
+        228318.6161854365,
         249.02340193904183,
         -68314.4830871133,
         64248.05007337558
-      ],
-      "intercept": 228318.6161854365
+      ]
     }
   }
 }
 --------------------------------------------------
 
+By using sub aggregations we are able to find out the estimated prices per location:
+[source,js]
+--------------------------------------------------
+/houses/_search?size=0
+{
+  "aggs": {
+    "locations": {
+      "terms": {
+        "field": "location.keyword",
+        "size": 15
+      },
+      "aggs": {
+        "dream_house_price": {
+          "linreg_predict": {
+            "fields": ["size", "bedrooms", "bathrooms", "price"],
+            "inputs": [2000, 3, 2]
+          }
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
+
+The response uncovers that "Arroyo Grande" would be
+the most expensive region for our dream house:
+
+[source,js]
+--------------------------------------------------
+{
+  "aggregations": {
+    "locations": {
+      "buckets": [
+        {
+          "key": "Santa Maria-Orcutt",
+          "doc_count": 265,
+          "dream_house_price": {
+            "value": 256251.9105297585,
+            "coefficients": [
+              26437.192829649313,
+              81.19071633227178,
+              6825.9128627023265,
+              23477.773223729317
+            ]
+          }
+        },
+        {
+          "key": "Paso Robles",
+          "doc_count": 85,
+          "dream_house_price": {
+            "value": 365620.0386191703,
+            "coefficients": [
+              42958.257094706176,
+              151.7000907380368,
+              6486.477078139843,
+              -98.91559301451247
+            ]
+          }
+        },
+        ...
+        {
+          "key": " Arroyo Grande",
+          "doc_count": 12,
+          "dream_house_price": {
+            "value": 1140196.791331573,
+            "coefficients": [
+              728566.7474390095,
+              1956.6474540196602,
+              -706891.620925945,
+              -690495.0006844609
+            ]
+          }
+        }
+        ...
+      ]
+    }
+  }
+}
+--------------------------------------------------
+
+
 ## License
 Copyright 2017 Scaleborn UG (haftungsbeschränkt).
 
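The per-location buckets in the README make it possible to check how the merged `coefficients` array is meant to be read: with the intercept at index 0, the reported `value` should equal `coefficients[0]` plus the sum of the remaining coefficients multiplied by the `inputs` `[2000, 3, 2]`. The sketch below recomputes two of the bucket values this way. The class and method names are illustrative and not part of the plugin.

```java
// Illustrative helper, not part of the plugin.
final class PredictFromCoefficients {

  /** h(x) = theta_0 + theta_1 * x_1 + ... + theta_C * x_C, with theta_0 stored at index 0. */
  static double predict(final double[] coefficients, final double[] inputs) {
    if (coefficients.length != inputs.length + 1) {
      throw new IllegalArgumentException("expected one coefficient per input plus the intercept");
    }
    double value = coefficients[0];
    for (int i = 0; i < inputs.length; i++) {
      value += coefficients[i + 1] * inputs[i];
    }
    return value;
  }

  public static void main(final String[] args) {
    // "inputs" from the sub-aggregation request: size, bedrooms, bathrooms
    final double[] inputs = {2000, 3, 2};
    // "coefficients" of the "Santa Maria-Orcutt" bucket
    final double[] santaMaria = {
        26437.192829649313, 81.19071633227178, 6825.9128627023265, 23477.773223729317};
    // "coefficients" of the "Paso Robles" bucket
    final double[] pasoRobles = {
        42958.257094706176, 151.7000907380368, 6486.477078139843, -98.91559301451247};
    System.out.println("Santa Maria-Orcutt: " + predict(santaMaria, inputs));
    System.out.println("Paso Robles: " + predict(pasoRobles, inputs));
  }
}
```

Both results agree with the `value` fields of the corresponding buckets up to floating-point rounding, which is consistent with the intercept now being the first array element.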

gradle.properties

Lines changed: 1 addition & 1 deletion
@@ -4,4 +4,4 @@ wagon-ssh-external.version=2.10
 commons-math3.version=3.6.1
 group=org.scaleborn.elasticsearch.plugin
 name=elasticsearch-linear-regression
-version=5.3.0.1
+version=5.3.0.2

src/main/java/org/scaleborn/elasticsearch/linreg/aggregation/predict/InternalPrediction.java

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@
 import org.elasticsearch.common.logging.Loggers;
 import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
 import org.scaleborn.elasticsearch.linreg.aggregation.support.BaseInternalAggregation;
-import org.scaleborn.linereg.evaluation.SlopeCoefficients;
+import org.scaleborn.linereg.estimation.SlopeCoefficients;
 
 /**
  * Created by mbok on 11.04.17.

src/main/java/org/scaleborn/elasticsearch/linreg/aggregation/predict/PredictionResults.java

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@
 import org.elasticsearch.common.xcontent.XContentBuilder;
 import org.elasticsearch.search.aggregations.InternalAggregation.CommonFields;
 import org.scaleborn.elasticsearch.linreg.aggregation.support.ModelResults;
-import org.scaleborn.linereg.evaluation.SlopeCoefficients;
+import org.scaleborn.linereg.estimation.SlopeCoefficients;
 
 /**
  * Created by mbok on 11.04.17.

src/main/java/org/scaleborn/elasticsearch/linreg/aggregation/stats/InternalStats.java

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@
 import org.scaleborn.linereg.calculation.statistics.Statistics;
 import org.scaleborn.linereg.calculation.statistics.StatsCalculator;
 import org.scaleborn.linereg.calculation.statistics.StatsModel;
-import org.scaleborn.linereg.evaluation.SlopeCoefficients;
+import org.scaleborn.linereg.estimation.SlopeCoefficients;
 
 /**
  * Created by mbok on 21.03.17.

src/main/java/org/scaleborn/elasticsearch/linreg/aggregation/stats/StatsResults.java

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
 import org.scaleborn.elasticsearch.linreg.aggregation.support.ModelResults;
 import org.scaleborn.linereg.calculation.statistics.Statistics;
 import org.scaleborn.linereg.calculation.statistics.Statistics.DefaultStatistics;
-import org.scaleborn.linereg.evaluation.SlopeCoefficients;
+import org.scaleborn.linereg.estimation.SlopeCoefficients;
 
 /**
  * Created by mbok on 07.04.17.

src/main/java/org/scaleborn/elasticsearch/linreg/aggregation/support/BaseInternalAggregation.java

Lines changed: 31 additions & 12 deletions
@@ -30,11 +30,12 @@
 import org.elasticsearch.search.aggregations.InternalAggregation;
 import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
 import org.scaleborn.linereg.calculation.intercept.InterceptCalculator;
-import org.scaleborn.linereg.evaluation.DerivationEquation;
-import org.scaleborn.linereg.evaluation.DerivationEquationBuilder;
-import org.scaleborn.linereg.evaluation.DerivationEquationSolver;
-import org.scaleborn.linereg.evaluation.SlopeCoefficients;
-import org.scaleborn.linereg.evaluation.commons.CommonsMathSolver;
+import org.scaleborn.linereg.estimation.DerivationEquation;
+import org.scaleborn.linereg.estimation.DerivationEquationBuilder;
+import org.scaleborn.linereg.estimation.DerivationEquationSolver;
+import org.scaleborn.linereg.estimation.DerivationEquationSolver.EstimationException;
+import org.scaleborn.linereg.estimation.SlopeCoefficients;
+import org.scaleborn.linereg.estimation.commons.CommonsMathSolver;
 
 /**
  * Created by mbok on 07.04.17.
@@ -142,9 +143,7 @@ public InternalAggregation doReduce(final List<InternalAggregation> aggregations
 
     // return empty result if all samples are null
     if (aggs.isEmpty()) {
-      return buildInternalAggregation(this.name, this.featuresCount, null, null,
-          pipelineAggregators(),
-          getMetaData());
+      return buildEmptyInternalAggregation();
     }
 
     final S composedSampling = buildSampling(this.featuresCount);
@@ -154,14 +153,34 @@ public InternalAggregation doReduce(final List<InternalAggregation> aggregations
       composedSampling.merge((S) ((BaseInternalAggregation) aggs.get(i)).sampling);
     }
 
-    final M evaluatedResults = evaluateResults(composedSampling);
+    if (composedSampling.getCount() <= composedSampling.getFeaturesCount()) {
+      LOGGER.debug(
+          "Insufficient amount of training data for model estimation, at least {} are required, given {}",
+          composedSampling.getFeaturesCount() + 1, composedSampling.getCount());
+      return buildEmptyInternalAggregation();
+    }
+
+    M evaluatedResults = null;
+    try {
+      evaluatedResults = evaluateResults(composedSampling);
+    } catch (final EstimationException e) {
+      LOGGER.debug(
+          "Failed to estimate model", e);
+      return buildEmptyInternalAggregation();
+    }
 
     LOGGER.debug("Evaluated results: {}", evaluatedResults);
     return buildInternalAggregation(this.name, this.featuresCount, composedSampling,
         evaluatedResults,
         pipelineAggregators(), getMetaData());
   }
 
+  private InternalAggregation buildEmptyInternalAggregation() {
+    return buildInternalAggregation(this.name, this.featuresCount, null, null,
+        pipelineAggregators(),
+        getMetaData());
+  }
+
   protected abstract A buildInternalAggregation(final String name, final int featuresCount,
       final S linRegSampling,
       final M results,
@@ -171,12 +190,12 @@ protected abstract M buildResults(S composedSampling, SlopeCoefficients slopeCoe
       double intercept);
 
 
-  private M evaluateResults(final S composedSampling) {
-    // Linear regression evaluation
+  private M evaluateResults(final S composedSampling) throws EstimationException {
+    // Linear regression estimation
     final DerivationEquation derivationEquation = derivationEquationBuilder
         .buildDerivationEquation(composedSampling);
     final SlopeCoefficients slopeCoefficients = derivationEquationSolver
-        .solveCoefficients(derivationEquation);
+        .estimateCoefficients(derivationEquation);
     final M buildResults = buildResults(composedSampling, slopeCoefficients,
         interceptCalculator.calculate(slopeCoefficients, composedSampling, composedSampling));
     return buildResults;
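The control flow this file introduces in `doReduce` (check the sample count, attempt the estimation, fall back to an empty result instead of failing) can be tried in isolation with the sketch below. It is not the plugin's code: it swaps the `DerivationEquationSolver` for a closed-form least-squares fit with a single feature, uses a hypothetical nested `EstimationException`, and lets `Optional.empty()` stand in for the empty aggregation.

```java
import java.util.Optional;

// Illustrative sketch, not the plugin's implementation.
final class GuardedEstimation {

  /** Hypothetical stand-in for the solver's EstimationException. */
  static final class EstimationException extends Exception {
    EstimationException(final String message) {
      super(message);
    }
  }

  /** Least-squares fit of y = theta0 + theta1 * x; returns {theta0, theta1}. */
  static double[] fit(final double[] x, final double[] y) throws EstimationException {
    final int n = x.length;
    double meanX = 0;
    double meanY = 0;
    for (int i = 0; i < n; i++) {
      meanX += x[i];
      meanY += y[i];
    }
    meanX /= n;
    meanY /= n;
    double sxx = 0;
    double sxy = 0;
    for (int i = 0; i < n; i++) {
      final double dx = x[i] - meanX;
      sxx += dx * dx;
      sxy += dx * (y[i] - meanY);
    }
    if (sxx == 0.0) {
      // The data does not determine a slope: the algorithmic precondition is violated.
      throw new EstimationException("all x values are identical");
    }
    final double slope = sxy / sxx;
    return new double[] {meanY - slope * meanX, slope};
  }

  /** Returns the coefficients, or an empty result when the data does not allow an estimate. */
  static Optional<double[]> estimateOrEmpty(final double[] x, final double[] y) {
    if (x.length != y.length) {
      throw new IllegalArgumentException("x and y must have the same length");
    }
    final int featuresCount = 1;
    // Same guard as in the diff: at least featuresCount + 1 samples are required.
    if (x.length <= featuresCount) {
      return Optional.empty();
    }
    try {
      return Optional.of(fit(x, y));
    } catch (final EstimationException e) {
      // Serve the empty result instead of failing the whole request.
      return Optional.empty();
    }
  }
}
```

With one sample, or with samples that all share the same `x`, `estimateOrEmpty` returns an empty `Optional`; with well-conditioned data it returns the intercept followed by the slope.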

src/main/java/org/scaleborn/elasticsearch/linreg/aggregation/support/BaseSampling.java

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 package org.scaleborn.elasticsearch.linreg.aggregation.support;
 
 import java.io.IOException;
-import org.scaleborn.linereg.evaluation.SlopeCoefficientsSampling.SlopeCoefficientsSamplingProxy;
+import org.scaleborn.linereg.estimation.SlopeCoefficientsSampling.SlopeCoefficientsSamplingProxy;
 import org.scaleborn.linereg.sampling.Sampling.InterceptSampling;
 import org.scaleborn.linereg.sampling.io.StateInputStream;
 import org.scaleborn.linereg.sampling.io.StateOutputStream;

src/main/java/org/scaleborn/elasticsearch/linreg/aggregation/support/ModelResults.java

Lines changed: 13 additions & 30 deletions
@@ -17,69 +17,52 @@
 package org.scaleborn.elasticsearch.linreg.aggregation.support;
 
 import java.io.IOException;
+import java.util.Arrays;
 import org.elasticsearch.common.io.stream.StreamInput;
 import org.elasticsearch.common.io.stream.StreamOutput;
 import org.elasticsearch.common.io.stream.Writeable;
 import org.elasticsearch.common.xcontent.ToXContent;
 import org.elasticsearch.common.xcontent.XContentBuilder;
-import org.scaleborn.linereg.evaluation.SlopeCoefficients;
-import org.scaleborn.linereg.evaluation.SlopeCoefficients.DefaultSlopeCoefficients;
+import org.scaleborn.linereg.estimation.SlopeCoefficients;
 
 /**
  * Created by mbok on 07.04.17.
  */
 public class ModelResults implements Writeable, ToXContent {
 
-  private SlopeCoefficients slopeCoefficients;
-
-  private double intercept;
+  private final double[] coefficients;
 
   public ModelResults(final SlopeCoefficients slopeCoefficients, final double intercept) {
-    this.slopeCoefficients = slopeCoefficients;
-    this.intercept = intercept;
+    final int slopeLen = slopeCoefficients.getCoefficients().length;
+    this.coefficients = new double[slopeLen + 1];
+    System.arraycopy(slopeCoefficients.getCoefficients(), 0, this.coefficients, 1, slopeLen);
+    this.coefficients[0] = intercept;
   }
 
   public ModelResults(final StreamInput in) throws IOException {
-    this.slopeCoefficients = new DefaultSlopeCoefficients(in.readDoubleArray());
-    this.intercept = in.readDouble();
+    this.coefficients = in.readDoubleArray();
   }
 
   @Override
   public void writeTo(final StreamOutput out) throws IOException {
-    out.writeDoubleArray(this.slopeCoefficients.getCoefficients());
-    out.writeDouble(this.intercept);
-  }
-
-  public SlopeCoefficients getSlopeCoefficients() {
-    return this.slopeCoefficients;
-  }
-
-  public void setSlopeCoefficients(final SlopeCoefficients slopeCoefficients) {
-    this.slopeCoefficients = slopeCoefficients;
-  }
-
-  public double getIntercept() {
-    return this.intercept;
+    out.writeDoubleArray(this.coefficients);
   }
 
-  public void setIntercept(final double intercept) {
-    this.intercept = intercept;
+  public double[] getCoefficients() {
+    return this.coefficients;
   }
 
-
   @Override
   public String toString() {
     return "ModelResults{" +
-        "slopeCoefficients=" + this.slopeCoefficients +
-        ", intercept=" + this.intercept +
+        "coefficients=" + Arrays.toString(this.coefficients) +
         '}';
   }
 
   @Override
   public XContentBuilder toXContent(final XContentBuilder builder, final Params params)
       throws IOException {
-    builder.array("coefficients", this.getSlopeCoefficients().getCoefficients());
-    builder.field("intercept", this.getIntercept());
+    builder.array("coefficients", this.coefficients);
     return builder;
   }
 
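The new `ModelResults` constructor merges the intercept and the slope coefficients into one array with the intercept at index 0. A minimal standalone sketch of that packing step follows; the class and method names are illustrative, and the sample numbers are the former `intercept` and `coefficients` values from the first README response in this commit.

```java
import java.util.Arrays;

// Illustrative helper, not part of the plugin.
final class CoefficientPacking {

  /** Returns {intercept, slopes[0], ..., slopes[n - 1]}. */
  static double[] pack(final double intercept, final double[] slopes) {
    final double[] coefficients = new double[slopes.length + 1];
    coefficients[0] = intercept;
    // Copy the slopes behind the intercept, shifted by one position.
    System.arraycopy(slopes, 0, coefficients, 1, slopes.length);
    return coefficients;
  }

  public static void main(final String[] args) {
    final double[] slopes = {248.92285661317254, -68297.7720278421, 64406.52205356777};
    System.out.println(Arrays.toString(pack(227990.63952712028, slopes)));
  }
}
```

The packed array has the same element order as the `coefficients` field in the updated README responses.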

src/main/java/org/scaleborn/linereg/calculation/intercept/InterceptCalculator.java

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 
 package org.scaleborn.linereg.calculation.intercept;
 
-import org.scaleborn.linereg.evaluation.SlopeCoefficients;
+import org.scaleborn.linereg.estimation.SlopeCoefficients;
 import org.scaleborn.linereg.sampling.Sampling.InterceptSampling;
 import org.scaleborn.linereg.sampling.Sampling.SamplingContext;
