This project demonstrates how to collect and clean data to produce a tidy data set that can be used for subsequent analysis. The R script, run_analysis.R, performs this task.
The dataset used is: Human Activity Recognition Using Samsung Galaxy S Smartphones.
- You need to have Windows OS to run the R script run_analysis.R.
- You need to have RStudio and the following packages installed to run run_analysis.R successfully:
  - reshape2
  - dplyr
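As a quick way to confirm the package requirement before running the script, the following sketch checks which of the two packages are not yet installed. The variable names are illustrative and not taken from run_analysis.R, and the install call is left commented out so that nothing is installed without your consent.

```r
# Packages the README lists as required by run_analysis.R.
required <- c("reshape2", "dplyr")

# requireNamespace() returns FALSE (without raising an error) when a
# package is not installed.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
not_installed <- required[!installed]

if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(not_installed)
} else {
  message("All required packages are available.")
}
```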
CodeBook.md describes the variables, the data, and any transformations or work that was performed to clean up the data.
run_analysis.R contains all the code to perform the analysis described in the 5 steps of the course project. It can be run in RStudio by opening the file and sourcing it.
The output of the 5th step of the course project work is called tidydata.txt.
The R script, run_analysis.R, does the following:
- Downloads the dataset if it does not already exist in the working directory
- Loads the activity and feature info
- Loads both the training and test datasets, keeping only those columns which reflect a mean or standard deviation
- Loads the activity and subject data for each dataset, and merges those columns with the dataset
- Merges the two datasets
- Converts the activity and subject columns into factors
- Creates a tidy dataset that consists of the average (mean) value of each variable for each subject and activity pair.
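
The steps above can be sketched on a small made-up data set. This is a minimal illustration in base R, not the code in run_analysis.R (which relies on reshape2 and dplyr). The helper names (`fetch_if_missing`, `make_set`, `prepare`), the toy values, and the `"mean|std"` pattern used to pick columns are all assumptions made for the example. The download URL is not given in this README, so it is left as a parameter.

```r
# Hypothetical helper: download a file only when it is not already present.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) download.file(url, dest, mode = "wb")
  invisible(dest)
}

# Toy stand-ins for the feature and activity info (illustrative values).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-energy()-X")
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"))

# Build a fake training or test set: measurements plus subject/activity ids.
make_set <- function(subjects, activities, seed) {
  set.seed(seed)
  x <- as.data.frame(matrix(rnorm(length(subjects) * length(features)),
                            ncol = length(features)))
  names(x) <- features
  list(x = x, subject = subjects, activity = activities)
}
train <- make_set(c(1, 1, 2, 2), c(1, 1, 2, 2), seed = 1)
test  <- make_set(c(3, 3, 3, 3), c(1, 2, 1, 2), seed = 2)

# Keep only the columns whose names mention a mean or standard deviation.
wanted <- grepl("mean|std", features)

# Attach subject and activity to each set, then stack the two sets.
prepare <- function(set) {
  cbind(subject = set$subject, activity = set$activity,
        set$x[, wanted, drop = FALSE])
}
all_data <- rbind(prepare(train), prepare(test))

# Convert the identifier columns into factors with readable activity names.
all_data$activity <- factor(all_data$activity,
                            levels = activity_labels$id,
                            labels = activity_labels$name)
all_data$subject <- as.factor(all_data$subject)

# Average every remaining variable for each subject and activity pair.
value_cols <- setdiff(names(all_data), c("subject", "activity"))
tidy <- aggregate(all_data[, value_cols, drop = FALSE],
                  by = list(subject = all_data$subject,
                            activity = all_data$activity),
                  FUN = mean)
```

Here `tidy` has one row per subject and activity pair that occurs in the data and one column per retained variable, which is the shape the last step describes.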
The end result is shown in the file tidydata.txt.
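
To inspect the result from R, something like the following may work. It assumes tidydata.txt is a whitespace-separated text file with a header row, as produced by `write.table()` with `row.names = FALSE`; this README does not state the exact format, so adjust the arguments if the file differs. `read_tidy` is a hypothetical helper name.

```r
# Hypothetical helper: read the tidy data set back into a data frame.
# Assumes a header row and whitespace-separated columns.
read_tidy <- function(path = "tidydata.txt") {
  read.table(path, header = TRUE, stringsAsFactors = FALSE)
}

# Only attempt to read the file if it exists in the working directory.
if (file.exists("tidydata.txt")) {
  tidy <- read_tidy()
  str(tidy)
}
```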