
Commit e7f2afa

Merge pull request #38 from JuliaML/boston
add boston housing
2 parents c3441c4 + 6fa1237 commit e7f2afa


7 files changed: 644 additions, 2 deletions


.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ addons:
 jobs:
   include:
     - stage: "Documentation"
-      julia: 1.0
+      julia: 1
       os: linux
       script:
         - julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd()));

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 name = "MLDatasets"
 uuid = "eb30cadb-4394-5ae3-aed4-317e484a6458"
-version = "0.5.0"
+version = "0.5.1"
 
 [deps]
 BinDeps = "9e28174c-4ba2-5203-b857-d8d62c4213ee"

src/BostonHousing/BostonHousing.jl

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
export BostonHousing

"""
Boston Housing Dataset.

Sources:
(a) Origin: This dataset was taken from the StatLib library, which is
maintained at Carnegie Mellon University.
(b) Creator: Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the
demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978.
(c) Date: July 7, 1993

Number of Instances: 506

Number of Attributes: 13 continuous attributes (including target
attribute "MEDV"), 1 binary-valued attribute.

Features:
1. CRIM per capita crime rate by town
2. ZN proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS proportion of non-retail business acres per town
4. CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5. NOX nitric oxides concentration (parts per 10 million)
6. RM average number of rooms per dwelling
7. AGE proportion of owner-occupied units built prior to 1940
8. DIS weighted distances to five Boston employment centres
9. RAD index of accessibility to radial highways
10. TAX full-value property-tax rate per 10,000 dollars
11. PTRATIO pupil-teacher ratio by town
12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT % lower status of the population

Target:
14. MEDV Median value of owner-occupied homes in 1000's of dollars

Note: Variable #14 seems to be censored at 50.00 (corresponding to a median price of \$50,000);
censoring is suggested by the fact that the highest median price of exactly \$50,000 is reported in 16 cases,
while 15 cases have prices between \$40,000 and \$50,000, with prices rounded to the nearest hundred.
Harrison and Rubinfeld do not mention any censoring.

The data file stored in this repo is a copy of the UCI ML housing dataset:
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/

## Interface

- [`BostonHousing.features`](@ref)
- [`BostonHousing.targets`](@ref)
- [`BostonHousing.feature_names`](@ref)
"""
module BostonHousing

using DataDeps
using DelimitedFiles

export features, targets, feature_names

const DATA = joinpath(@__DIR__, "boston_housing.csv")  # header row, then one row per example (13 features, MEDV last)

"""
    targets(; dir = nothing)

Return the targets (MEDV, the median home value) of the Boston Housing dataset
as a 1×506 `Matrix{Float64}`, one value per example. The `dir` keyword is currently unused.

```jldoctest
julia> using MLDatasets: BostonHousing

julia> target = BostonHousing.targets();

julia> size(target)
(1, 506)

julia> target[1]
24.0
```
"""
function targets(; dir = nothing)
    housing = readdlm(DATA, ',')  # row 1 is the header
    reshape(Vector{Float64}(housing[2:end,end]), (1, 506))  # MEDV is the last column; return it as a 1×506 row
end

"""
    feature_names()

Return the names of the features provided in the dataset.
"""
function feature_names()
    ["crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","b","lstat"]
end


"""
    features()

Return the features of the Boston Housing dataset as a 13×506 `Matrix{Float64}`:
one row per feature, in the order ["crim","zn","indus","chas","nox","rm","age","dis","rad","tax","ptratio","b","lstat"]
(see [`BostonHousing.feature_names`](@ref)), and one column per example (506 examples).

```jldoctest
julia> using MLDatasets: BostonHousing

julia> features = BostonHousing.features();

julia> size(features)
(13, 506)
```
"""
function features()
    housing = readdlm(DATA, ',')  # row 1 is the header
    Matrix{Float64}(housing[2:end,1:13])' |> collect  # transpose to 13×506 (features × examples)
end

end # module
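The loaders above read the CSV with `readdlm`, skip the header row, and return the features as a 13×N matrix (one column per example) and the targets as a 1×N row. A minimal sketch of that layout that runs without the package, assuming the CSV has a single header row followed by 14 comma-separated columns with MEDV last. It substitutes an in-memory string holding the first two records of the UCI housing data for `boston_housing.csv`; the header names in `CSV_TEXT` and the helpers `parse_features`, `parse_targets` and `count_censored` are hypothetical and not part of MLDatasets.

```julia
using DelimitedFiles

# Hypothetical stand-in for boston_housing.csv: an assumed header row plus
# the first two records of the UCI housing data (MEDV in the last column).
const CSV_TEXT = """
crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,b,lstat,medv
0.00632,18,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0
0.02731,0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
"""

# Same idea as features(): drop the header, keep the 13 feature columns,
# and transpose so that each column is one example (13×N).
function parse_features(text::AbstractString)
    housing = readdlm(IOBuffer(text), ',')  # Matrix{Any}: header strings + numbers
    collect(Matrix{Float64}(housing[2:end, 1:13])')
end

# Same idea as targets(): the last column is MEDV, returned as a 1×N row.
function parse_targets(text::AbstractString)
    housing = readdlm(IOBuffer(text), ',')
    y = Vector{Float64}(housing[2:end, end])
    reshape(y, 1, length(y))
end

# MEDV appears to be censored at 50.0 ($50,000); count the examples at the cap.
count_censored(y; cap = 50.0) = count(==(cap), y)

X = parse_features(CSV_TEXT)
y = parse_targets(CSV_TEXT)
println(size(X), " ", size(y), " ", count_censored(y))
```

Because examples are columns, a dataset-wide statistic such as the number of possibly censored targets is a single reduction over `y`. Applied to the real `BostonHousing.targets()`, `count_censored` would be expected to agree with the 16 cases at $50,000 cited in the docstring, assuming those values are stored as exactly 50.0.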
