
Commit a3cf3d7

Roll the first version of UCI data with the wine dataset
1 parent 6bdeec0 commit a3cf3d7

11 files changed, with 167 additions and 0 deletions

.Rbuildignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+^.*\.Rproj$
+^\.Rproj\.user$
+^data-raw$
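
As a rough illustration of what these three patterns exclude, the sketch below applies them to a handful of paths. It assumes the documented behaviour of `R CMD build`, which treats each `.Rbuildignore` line as a Perl-style regular expression matched case-insensitively against paths relative to the package root. The project file name `ucidata.Rproj` is hypothetical; the commit does not show it.

```r
# Patterns copied from the .Rbuildignore above (backslashes doubled for R strings).
patterns = c("^.*\\.Rproj$", "^\\.Rproj\\.user$", "^data-raw$")

# Example paths relative to the package root. "ucidata.Rproj" is an assumed name.
paths = c("ucidata.Rproj", ".Rproj.user", "data-raw",
          "data/wine.rda", "R/pkg_datasets.R")

# A path is ignored when at least one pattern matches it.
ignored = vapply(
  paths,
  function(p) any(vapply(patterns, grepl, logical(1),
                         x = p, perl = TRUE, ignore.case = TRUE)),
  logical(1)
)

print(ignored)
```

Only the RStudio project files and the `data-raw` directory are flagged, so the built `data/wine.rda` and the R sources still ship with the package.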

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+.Rproj.user
+.Rhistory
+.RData
+.Ruserdata

DESCRIPTION

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+Package: ucidata
+Title: Collection of Datasets from the UC Irvine Machine Learning Repository
+Version: 0.0.1
+Authors@R: person("James", "Balamuta", email = "[email protected]", role = c("aut", "cre"))
+Description: Assorted datasets from the UC Irvine Machine Learning Repository.
+Depends: R (>= 3.4.1)
+License: GPL (>= 2)
+Encoding: UTF-8
+LazyData: true
+Roxygen: list(markdown = TRUE)
+RoxygenNote: 6.0.1

NAMESPACE

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+# Generated by roxygen2: do not edit by hand
+

R/pkg_datasets.R

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+#' Wine Data Set
+#'
+#' This data set combines two datasets, one of red and one of white wine samples.
+#' The inputs include objective tests (e.g. pH values) and the output is based on sensory data
+#' (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality
+#' between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model
+#' these datasets under a regression approach. The support vector machine model achieved the
+#' best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T),
+#' etc. The original authors also plotted the relative importance of the input variables (as measured
+#' by a sensitivity analysis procedure).
+#' @format A data frame with 6497 observations (1599 Red and 4898 White) on the following 13 variables.
+#' - `fixed_acidity`
+#' - `volatile_acidity`
+#' - `citric_acid`
+#' - `residual_sugar`
+#' - `chlorides`
+#' - `free_sulfur_dioxide`
+#' - `total_sulfur_dioxide`
+#' - `density`
+#' - `pH`
+#' - `sulphates`
+#' - `alcohol`
+#' - `quality`
+#'     - Score between 0 and 10 based on sensory data
+#' - `color`
+#'     - `"White"` or `"Red"`
+#' @source P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
+#' Modeling wine preferences by data mining from physicochemical properties.
+#' In Decision Support Systems, Elsevier, 47(4):547-553, 2009. ISSN: 0167-9236.
+#' @references
+#' <https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.names>
+#' <https://archive.ics.uci.edu/ml/datasets/wine>
+"wine"

R/ucidata.R

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+#' @keywords internal
+"_PACKAGE"

data-raw/wine_build.R

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
### UCI Irving
2+
## Wine Data https://archive.ics.uci.edu/ml/datasets/wine
3+
4+
# Location of Data Sets
5+
red_wine_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
6+
white_wine_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"
7+
8+
# Note the .csv uses a `;` as the separater. Not `,`
9+
red_wine_data = read.csv(red_wine_url, sep = ";")
10+
white_wine_data = read.csv(white_wine_url, sep = ";")
11+
12+
# Load in Red vs. White Data
13+
red_wine_data$color = "Red"
14+
white_wine_data$color = "White"
15+
16+
# Merge the two data sets together
17+
wine = rbind(red_wine_data, white_wine_data)
18+
19+
# Convert color into a factor
20+
wine$color = as.factor(wine$color)
21+
22+
# Remove periods
23+
colnames(wine) = gsub("\\.", "_", colnames(wine))
24+
25+
devtools::use_data(wine)
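
The label, stack, and rename steps in this script can be tried without network access. The sketch below repeats them on two tiny stand-in data frames; the values are made up for illustration and are not taken from the UCI files, and the `_toy` names are hypothetical. Only the column layout mimics what `read.csv()` produces, with spaces in the CSV headers turned into periods.

```r
# Hypothetical two-row stand-ins for the red and white CSV files.
red_toy   = data.frame(fixed.acidity = c(7.4, 7.8), pH = c(3.51, 3.20), quality = c(5L, 5L))
white_toy = data.frame(fixed.acidity = c(7.0, 6.3), pH = c(3.00, 3.30), quality = c(6L, 6L))

# Label each data set with its wine color
red_toy$color   = "Red"
white_toy$color = "White"

# Stack the two data sets, then store color as a factor
wine_toy = rbind(red_toy, white_toy)
wine_toy$color = as.factor(wine_toy$color)

# Replace periods in column names with underscores
colnames(wine_toy) = gsub("\\.", "_", colnames(wine_toy))

# Sanity checks on the combined result
stopifnot(
  nrow(wine_toy) == nrow(red_toy) + nrow(white_toy),
  identical(colnames(wine_toy), c("fixed_acidity", "pH", "quality", "color")),
  identical(levels(wine_toy$color), c("Red", "White"))
)
```

The same `stopifnot()` pattern could be pointed at the real `wine` object before saving it, for example to confirm the 6497-row total stated in the documentation.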

data/wine.rda

77.4 KB
Binary file not shown.

man/ucidata-package.Rd

Lines changed: 15 additions & 0 deletions
Generated file; diff not rendered.

man/wine.Rd

Lines changed: 51 additions & 0 deletions
Generated file; diff not rendered.
