CodeBook.md (mogrein/Getting-and-Cleaning-Data)

Steps performed by the script:

1. Reads the train data from the "train" folder.
2. Reads the test data from the "test" folder.
3. Merges the train data with the test data into the dataX, dataY and dataSubject variables.
4. Frees the memory held by the separate test and train data, as they are not used again.
5. Assigns names to the data columns:
   - For dataX it uses the names from features.txt.
   - For dataY it uses the name "activity". The script also replaces the coded ids with the actual activity names from activity_labels.txt.
   - For dataSubject it uses the name "subject".
6. Derives the indexes of the columns with -std() or -mean() in their names, using grep on the vector of dataX column names.
7. Binds the columns from dataY and dataSubject with the columns of dataX that have -std() or -mean() in their names.
8. Writes the data frame from step 7 to tidy_data_means_and_stddevs.txt using write.table.
9. Binds dataY, dataSubject and all of dataX into one data.frame.
10. Using the aggregate function, groups the data by activity and subject values and applies the mean function to each group.
11. Saves the result to tidy_data_means_of_all_columns.txt with the write.table function.
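
The column selection and aggregation steps can be sketched on toy data as follows. This is a minimal illustration, not the script itself: the values, the four feature names, and the variables selected, regexSelected, meansAndStds, allData and tidyMeans are assumptions made for the example. It also shows one detail worth knowing: as a regular expression, "-mean()" contains an empty group, so it also matches names such as "-meanFreq()", while fixed = TRUE matches the parentheses literally. Whether the script uses one form or the other is not stated here.

```r
# Toy stand-ins for the merged data; the real dataX has 561 feature columns.
dataX <- data.frame(matrix(c(0.1, 0.2, 0.3, 0.4,
                             1.0, 1.2, 1.4, 1.6,
                             5, 6, 7, 8,
                             9, 9, 9, 9), nrow = 4))
names(dataX) <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                  "tBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")
dataY <- data.frame(activity = c("WALKING", "WALKING", "SITTING", "SITTING"),
                    stringsAsFactors = FALSE)
dataSubject <- data.frame(subject = c(1L, 1L, 1L, 2L))

# Literal match: only names containing exactly "-mean()" or "-std()".
selected <- sort(c(grep("-mean()", names(dataX), fixed = TRUE),
                   grep("-std()", names(dataX), fixed = TRUE)))

# Regex match: "()" is an empty group, so "-meanFreq()" is matched too.
regexSelected <- grep("-mean()|-std()", names(dataX))

# Bind activity, subject and the selected feature columns.
meansAndStds <- cbind(dataY, dataSubject, dataX[, selected, drop = FALSE])

# Bind everything, then average every feature per activity and subject.
allData <- cbind(dataY, dataSubject, dataX)
tidyMeans <- aggregate(allData[, -(1:2)],
                       by = list(activity = allData$activity,
                                 subject = allData$subject),
                       FUN = mean)
print(tidyMeans)
```

On the toy data, tidyMeans has one row per activity and subject pair that occurs in the input, with the grouping columns first and the averaged features after them.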

The data in tidy_data_means_and_stddevs.txt is a 10299 x 81 data frame with the following structure:

The first column is activity, of type "character". It contains the following values:
- WALKING
- WALKING_UPSTAIRS
- WALKING_DOWNSTAIRS
- SITTING
- STANDING
- LAYING
The second column is the subject number, of type "integer". It contains values from 1 to 30.
All other columns are means and standard deviations of various measures. They have the "numeric" data type.
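
To check this structure after loading the file, something like the following sketch may help. It writes a small toy table to a temporary file and reads it back, so it runs without the real data. It assumes the file was written with a header row and without row names, which the description of the script does not confirm; the names toy, path and loaded are illustrative only.

```r
# Toy table with the same column layout: activity, subject, then measures.
toy <- data.frame(activity = c("WALKING", "LAYING"),
                  subject = c(1L, 30L),
                  x = c(0.28, 0.27),
                  stringsAsFactors = FALSE)
names(toy)[3] <- "tBodyAcc-mean()-X"

# Assumption: header row present, no row names.
path <- tempfile(fileext = ".txt")
write.table(toy, path, row.names = FALSE)

# check.names = FALSE keeps names such as "tBodyAcc-mean()-X" unchanged;
# stringsAsFactors = FALSE keeps activity as character rather than factor.
loaded <- read.table(path, header = TRUE, check.names = FALSE,
                     stringsAsFactors = FALSE)
print(sapply(loaded, class))
```

For the real file, replace path with the location of tidy_data_means_and_stddevs.txt and adjust the read.table arguments to match how the file was actually written.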

The data in tidy_data_means_of_all_columns.txt is a 180 x 563 data frame, with one row per activity and subject pair (6 activities x 30 subjects = 180). Its structure is almost the same:

The first column is activity, of type "character". It contains the following values:
- WALKING
- WALKING_UPSTAIRS
- WALKING_DOWNSTAIRS
- SITTING
- STANDING
- LAYING
The second column is the subject number, of type "integer". It contains values from 1 to 30.
All other columns are the means of the various measures for that activity and subject. They have the "numeric" data type.

For the meaning of the numeric measures, see http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones and features_info.txt in the data archive.
