This document describes how the R file run_analysis.R (see this directory) works.
##Functions
The file run_analysiscontains three functions.
startTidy- starts the other two functionsgetTidy1- writes the dataset 'tidy1.csv`getTidy2- writes the dataset 'tidy2.csv`
tidy1.csv is the result of a cleaning and selection of the original dataset.
tidy2.csv is the result of an aggregation of tidy1.csv.
The original data was derived from an experiment, which was carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz was captured. The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers was selected for generating the training data and 30% the test data.
For further info please see: http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
tidy1.csv consists of the following variables:
subjectdenotes the subject number performing an activity. In total there are 30 subjects.activityshows the activity performed by the subject. There are six activites:
- WALKING
- WALKING_UPSTAIRS
- WALKING_DOWNSTAIRS
- SITTING
- STANDING
- LAYING
From the original 561 features only features of the mean and the standard deviations were selected. There were 79 features left. Variables of the activity and the subjectID were added, making a total of 81 columns.
Original feature names were changed to make the more readable.
- The prefix
tfor "time" was changed intotime. - The prefix
fwas changed intofreq. - BodyBody has been replaced by Body
- "mean" and "std" were changed to capital letters to make reading easier.
Full information on the original features can be found in the original dataset, filename: "features_info.txt".
The observations from tidy1.csv were grouped by subject and activity and then aggregated using the mean function. With 30 subjects and 6 activities we get 180 aggregated observations, described by 81 columns (subject, activity and 79 features of mean and standard deviation).