@@ -150,7 +150,7 @@ Due to algorithmic constraints both aggregations result an empty response, if
150150
151151* the search result size is less than or equal to the number of indicated explanatory variables,
152152* the values of the explanatory variables in the search result set are linearly dependent (that means
153- that a column can be written as a linear combination of the other columns)
153+ that a column can be written as a linear combination of the other columns).
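
The two conditions above can be checked on the client side before relying on an aggregation result. The following is a minimal sketch, not the plugin's implementation: the helper names `matrix_rank` and `regression_is_solvable` are hypothetical, and it only inspects the explanatory columns as described in the list (whether the plugin also accounts for the intercept column is not stated here).

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of rows), using exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on the previous ones
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def regression_is_solvable(rows):
    """rows: one list of explanatory variable values per matching document."""
    if not rows:
        return False
    n_vars = len(rows[0])
    # Condition 1: more documents than explanatory variables are needed.
    if len(rows) <= n_vars:
        return False
    # Condition 2: the explanatory columns must be linearly independent.
    return matrix_rank(rows) == n_vars


print(regression_is_solvable([[1, 2], [2, 4], [3, 6]]))  # second column = 2 * first
print(regression_is_solvable([[1, 2], [2, 1], [3, 5]]))
```

The first call prints `False` because the second column is a multiple of the first; the second prints `True`.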
154154
155155
156156## Algorithm
@@ -164,8 +164,81 @@ The overall complexity of the implemented algorithm to estimate the regression c
164164of the indicated explanatory variables (fields).
165165
166166## Examples
167- ...
167+ ### Predicting house prices
168+ Suppose our Elasticsearch index holds the prices of houses sold in our region,
169+ together with features such as the square footage of the house and the number
170+ of bathrooms and bedrooms. We want to estimate the price we would have to
171+ pay for the house of our dreams.
168172
173+ In this example we use test data from: http://wiki.csc.calpoly.edu/datasets/attachment/wiki/Houses/RealEstate.csv?format=raw
174+
175+ To import the data into Elasticsearch, we use Logstash with this pipeline config
176+ https://github.com/scaleborn/elasticsearch-linear-regression/tree/master/examples/houseprices/house-prices-import.conf[house-prices-import.conf]:
177+ [source,js]
178+ --------------------------------------------------
179+ ./bin/logstash -f house-prices-import.conf
180+ --------------------------------------------------
181+ The indexed documents have this form:
182+ [source,js]
183+ --------------------------------------------------
184+ {
185+ "_index": "houses",
186+ "_type": "prices",
187+ "_id": "AV0zjVhTomRh2LZNgmfJ",
188+ "_source": {
189+ "bathrooms": 3,
190+ "bedrooms": 4,
191+ "size": 4168,
192+ "mls": "140077",
193+ "price": 1100000,
194+ "location": "Morro Bay",
195+ "price_sq_ft": 263.92,
196+ "status": "Short Sale"
197+ }
198+ }
199+ --------------------------------------------------
200+
201+ We can now query the index for houses in "Morro Bay" and predict the price
202+ from the features of our dream house, which should have 3 bedrooms,
203+ 2 bathrooms and 2000 square feet:
204+ [source,js]
205+ --------------------------------------------------
206+ GET /houses/_search?size=0
207+ {
208+ "query": {
209+ "match" : {
210+ "location" : "Morro Bay"
211+ }
212+ },
213+ "aggs": {
214+ "dream_house_price": {
215+ "linreg_predict": {
216+ "fields": ["size", "bedrooms", "bathrooms", "price"],
217+ "inputs": [2000, 3, 2]
218+ }
219+ }
220+ }
221+ }
222+ --------------------------------------------------
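
The request body above can also be assembled programmatically. The sketch below assumes, based only on this example, that the last entry of `fields` is the response variable (`price`) and that `inputs` supplies one value per preceding field, in the same order. The helper name `build_predict_query` is hypothetical and not part of the plugin.

```python
import json


def build_predict_query(location, fields, inputs):
    """Build a linreg_predict search body like the one in the example."""
    # Assumption: all fields except the last are explanatory variables,
    # so exactly one input value is needed for each of them.
    if len(inputs) != len(fields) - 1:
        raise ValueError("expected one input per explanatory field")
    return {
        "query": {"match": {"location": location}},
        "aggs": {
            "dream_house_price": {
                "linreg_predict": {
                    "fields": list(fields),
                    "inputs": list(inputs),
                }
            }
        },
    }


body = build_predict_query(
    "Morro Bay", ["size", "bedrooms", "bathrooms", "price"], [2000, 3, 2]
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the `/houses/_search?size=0` request with any HTTP client.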
223+
224+ According to the following prediction response, we should expect to pay about
225+ $650,000 for the desired house in "Morro Bay".
226+ [source,js]
227+ --------------------------------------------------
228+ {
229+ "aggregations": {
230+ "dream_house_price": {
231+ "value": 649918.0709489314,
232+ "coefficients": [
233+ 249.02340193904183,
234+ -68314.4830871133,
235+ 64248.05007337558
236+ ],
237+ "intercept": 228318.6161854365
238+ }
239+ }
240+ }
241+ --------------------------------------------------
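
The predicted `value` in this response can be reproduced from the returned `intercept` and `coefficients`. The sketch below uses the numbers from the response and assumes the coefficients are listed in the same order as the explanatory fields of the query (`size`, `bedrooms`, `bathrooms`); the function name `predict` is only illustrative.

```python
import math

# Values copied from the response above.
coefficients = [249.02340193904183, -68314.4830871133, 64248.05007337558]
intercept = 228318.6161854365

# Features of the dream house: size, bedrooms, bathrooms.
inputs = [2000, 3, 2]


def predict(intercept, coefficients, inputs):
    """Evaluate the linear model: intercept + sum(coefficient_i * input_i)."""
    return intercept + sum(c * x for c, x in zip(coefficients, inputs))


price = predict(intercept, coefficients, inputs)
print(round(price))  # 649918

# Matches the "value" field of the response up to floating-point rounding.
assert math.isclose(price, 649918.0709489314, rel_tol=1e-9)
```

Note that the negative `bedrooms` coefficient means that, for a fixed size and number of bathrooms, each additional bedroom lowers the predicted price in this data set.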
169242
170243## Installation
171244