Commit c33a976

Documentation
1 parent d50fce3 commit c33a976

2 files changed: 108 additions and 2 deletions
README.adoc

Lines changed: 75 additions & 2 deletions
@@ -150,7 +150,7 @@ Due to algorithmic constraints both aggregations result an empty response, if
 
 * the search result size is less or equal than the number of indicated explanatory variables,
 * values of the explanatory variables in the search result set is linearly dependent (that means
-that a column can be written as a linear combination of the other columns)
+that a column can be written as a linear combination of the other columns).
 
 
 ## Algorithm
@@ -164,8 +164,81 @@ The overall complexity of the implemented algorithm to estimate the regression c
 of the indicated explanatory variables (fields).
 
 ## Examples
-...
+### Predicting house prices
+The idea is simple. Our Elasticsearch index holds the prices of houses sold
+in our region, together with features such as the square footage of the
+house and the number of bathrooms and bedrooms. Now we want to find out what
+price we would have to pay for the house of our dreams.
 
+In this example we use test data from: http://wiki.csc.calpoly.edu/datasets/attachment/wiki/Houses/RealEstate.csv?format=raw
+
+To import the data into Elasticsearch we use Logstash and this pipeline config
+https://github.com/scaleborn/elasticsearch-linear-regression/tree/master/examples/houseprices/house-prices-import.conf[house-prices-import.conf]:
+[source,sh]
+--------------------------------------------------
+./bin/logstash -f house-prices-import.conf
+--------------------------------------------------
+The indexed documents will have this form:
+[source,js]
+--------------------------------------------------
+{
+  "_index": "houses",
+  "_type": "prices",
+  "_id": "AV0zjVhTomRh2LZNgmfJ",
+  "_source": {
+    "bathrooms": 3,
+    "bedrooms": 4,
+    "size": 4168,
+    "mls": "140077",
+    "price": 1100000,
+    "location": "Morro Bay",
+    "price_sq_ft": 263.92,
+    "status": "Short Sale"
+  }
+}
+--------------------------------------------------
+
+We can now query the index for houses in "Morro Bay" and predict the price
+from the features of our dream house, which should have 3 bedrooms,
+2 bathrooms and a size of 2000 square feet:
+[source,js]
+--------------------------------------------------
+/houses/_search?size=0
+{
+  "query": {
+    "match" : {
+      "location" : "Morro Bay"
+    }
+  },
+  "aggs": {
+    "dream_house_price": {
+      "linreg_predict": {
+        "fields": ["size", "bedrooms", "bathrooms", "price"],
+        "inputs": [2000, 3, 2]
+      }
+    }
+  }
+}
+--------------------------------------------------
+
+According to the following prediction response, we should expect to pay about
+$650,000 for the desired house in "Morro Bay".
+[source,js]
+--------------------------------------------------
+{
+  "aggregations": {
+    "dream_house_price": {
+      "value": 649918.0709489314,
+      "coefficients": [
+        249.02340193904183,
+        -68314.4830871133,
+        64248.05007337558
+      ],
+      "intercept": 228318.6161854365
+    }
+  }
+}
+--------------------------------------------------
 
 ## Installation
 
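As a sanity check on the example response in the README.adoc diff, the reported `value` can be reproduced by hand. The sketch below assumes that `linreg_predict` evaluates an ordinary linear model: the intercept plus the dot product of the coefficients with `inputs`, taken in the order `size`, `bedrooms`, `bathrooms`. That reading is inferred from the numbers in the response, not from the plugin source, and the helper name `predict` is hypothetical.

```python
import math

# Values copied from the example response in README.adoc.
COEFFICIENTS = [249.02340193904183, -68314.4830871133, 64248.05007337558]
INTERCEPT = 228318.6161854365


def predict(inputs, coefficients=COEFFICIENTS, intercept=INTERCEPT):
    """Evaluate intercept + sum(coefficient_i * input_i).

    Assumption: inputs are ordered like the explanatory fields in the
    query, i.e. size, bedrooms, bathrooms.
    """
    if len(inputs) != len(coefficients):
        raise ValueError("need exactly one input per coefficient")
    return intercept + sum(c * x for c, x in zip(coefficients, inputs))


if __name__ == "__main__":
    price = predict([2000, 3, 2])  # the dream house from the query
    print(round(price, 2))
    # Compare against the "value" field of the example response.
    print(math.isclose(price, 649918.0709489314, rel_tol=1e-9))
```

With the example inputs this evaluates to roughly 649,918, which agrees with the `value` field in the response and supports the linear-model reading.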
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+input {
+  file {
+    path => ["RealEstate.csv"]
+    start_position => "beginning"
+    sincedb_path => "/dev/null"
+  }
+}
+
+filter {
+  csv {
+    columns => [
+      "mls",
+      "location",
+      "price",
+      "bedrooms",
+      "bathrooms",
+      "size",
+      "price_sq_ft",
+      "status"
+    ]
+    convert => { "price" => "float" "bedrooms" => "integer" "bathrooms" => "integer" "size" => "integer" "price_sq_ft" => "float" }
+  }
+}
+
+output {
+  elasticsearch {
+    action => "index"
+    hosts => ["127.0.0.1:9200"]
+    index => "houses"
+    document_type => "prices"
+    workers => 1
+  }
+}
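To make the effect of the `csv` filter in this pipeline config easier to see, here is a rough Python equivalent of its column naming and type conversion. It is only an illustration under stated assumptions, not how Logstash works internally: the function name `parse_rows` is hypothetical, and skipping lines that fail conversion (such as a header line) is a choice made for this sketch rather than documented Logstash behaviour.

```python
import csv
import io

# Column order and conversions mirror the csv filter in the pipeline config.
COLUMNS = ["mls", "location", "price", "bedrooms",
           "bathrooms", "size", "price_sq_ft", "status"]
CONVERTERS = {"price": float, "bedrooms": int, "bathrooms": int,
              "size": int, "price_sq_ft": float}


def parse_rows(text):
    """Yield one dict per CSV line, with the numeric fields converted."""
    for values in csv.reader(io.StringIO(text)):
        if len(values) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, values))
        try:
            for name, convert in CONVERTERS.items():
                row[name] = convert(row[name])
        except ValueError:
            continue  # sketch-only choice: drop lines with non-numeric cells
        yield row


if __name__ == "__main__":
    # Sample line assembled from the example document in README.adoc,
    # not copied from RealEstate.csv itself.
    sample = "140077,Morro Bay,1100000,4,3,4168,263.92,Short Sale\n"
    for document in parse_rows(sample):
        print(document)
```

Typed numeric fields matter here because the regression aggregations operate on the numeric values of `size`, `bedrooms`, `bathrooms` and `price`.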
