Steps performed by the script:
- Reads the train data from the "train" folder
- Reads the test data from the "test" folder
- Merges the train and test data into the variables dataX (measurements), dataY (activity codes) and dataSubject (subject ids)
- Removes the separate train and test objects from memory, since they are no longer used
- Assigns names to the data columns:
  - dataX: names are taken from features.txt
  - dataY: the column is named "activity", and the numeric activity ids are replaced with the descriptive activity names from activity_labels.txt
  - dataSubject: the column is named "subject"
- Finds the indices of the columns whose names contain -mean() or -std(), using grep on the column names of dataX
- Binds the columns of dataY and dataSubject with the columns of dataX that have -mean() or -std() in their names
- Writes the resulting data frame to tidy_data_means_and_stddevs.txt using write.table
- Binds dataY, dataSubject and all columns of dataX into one data frame
- Uses the aggregate function to split the rows by activity and subject and applies the mean function to each subset
- Saves the result to tidy_data_means_of_all_columns.txt using write.table
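
The steps above can be sketched in R on a tiny, made-up dataset. This is a minimal illustration under stated assumptions, not the author's script: the file layout under train/ and test/ (X_<set>.txt, y_<set>.txt, subject_<set>.txt), the toy feature and activity tables, the temporary folder, the helper names `write_set` and `read_set`, and the `row.names = FALSE` option are all assumptions. The grep pattern escapes the parentheses because in a regular expression an unescaped `()` is an empty group, so a pattern such as `-mean()` can also match `-meanFreq()` columns.

```r
# ASSUMPTION: toy stand-ins for features.txt and activity_labels.txt (made up, 4 features).
features <- data.frame(id = 1:4,
                       name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                                "tBodyAcc-meanFreq()-X", "tBodyAcc-max()-X"),
                       stringsAsFactors = FALSE)
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

# ASSUMPTION: UCI-HAR-style layout <set>/X_<set>.txt, <set>/y_<set>.txt,
# <set>/subject_<set>.txt, written to a temporary folder (hypothetical helper).
root <- file.path(tempdir(), "har_sketch")
write_set <- function(set, x, y, subject) {
  dir.create(file.path(root, set), recursive = TRUE, showWarnings = FALSE)
  files <- list(X = x, y = y, subject = subject)
  for (prefix in names(files)) {
    write.table(files[[prefix]], file.path(root, set, paste0(prefix, "_", set, ".txt")),
                row.names = FALSE, col.names = FALSE)
  }
}
write_set("train", matrix(1:16, nrow = 4), y = c(1, 1, 2, 2), subject = c(1, 1, 1, 1))
write_set("test",  matrix(17:24, nrow = 2), y = c(1, 2),       subject = c(2, 2))

# Read each part of one set back as a data frame (hypothetical helper)
read_set <- function(set, prefix) {
  read.table(file.path(root, set, paste0(prefix, "_", set, ".txt")))
}
parts <- c(X = "X", y = "y", subject = "subject")
train <- lapply(parts, function(p) read_set("train", p))
test  <- lapply(parts, function(p) read_set("test", p))

# Merge: train rows first, then test rows
dataX       <- rbind(train$X, test$X)
dataY       <- rbind(train$y, test$y)
dataSubject <- rbind(train$subject, test$subject)
rm(train, test)  # the separate sets are no longer needed

# Name the columns and replace activity ids by their descriptive labels
names(dataX)       <- features$name
names(dataY)       <- "activity"
names(dataSubject) <- "subject"
dataY$activity <- activity_labels$name[match(dataY$activity, activity_labels$id)]

# Keep only mean() and std() columns; "\\(\\)" is escaped so meanFreq() is not matched
keep  <- grep("-(mean|std)\\(\\)", names(dataX))
tidy1 <- cbind(dataY, dataSubject, dataX[, keep, drop = FALSE])
write.table(tidy1, file.path(root, "tidy_data_means_and_stddevs.txt"), row.names = FALSE)

# Average every dataX column within each (activity, subject) group
all_data <- cbind(dataY, dataSubject, dataX)
tidy2 <- aggregate(all_data[, -(1:2)],
                   by = list(activity = all_data$activity, subject = all_data$subject),
                   FUN = mean)
write.table(tidy2, file.path(root, "tidy_data_means_of_all_columns.txt"), row.names = FALSE)
```

With the toy data, tidy1 keeps 2 of the 4 feature columns (the mean() and std() ones), and tidy2 has one row per activity and subject pair; for example, the WALKING row for subject 1 averages the first two training rows.
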
The data in tidy_data_means_and_stddevs.txt is a data frame with 10299 rows (one per observation in the merged train and test data) and 81 columns, structured as follows:
- The first column, activity, is of type "character" and contains the following values:
  - WALKING
  - WALKING_UPSTAIRS
  - WALKING_DOWNSTAIRS
  - SITTING
  - STANDING
  - LAYING
- The second column, subject, is of type "integer" and contains the subject number, from 1 to 30
- All other columns (79) are means and standard deviations of the various measurements and are of type "numeric"
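
The layout described above can be checked after reading the file back into R. A minimal sketch, assuming the file was written by write.table with a header row; `check_tidy_layout` is a hypothetical helper, not part of the script, and the two-row data frame is made up for illustration.

```r
# Hypothetical helper (not part of the original script): checks the documented
# column layout of the tidy output files.
activities <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                "SITTING", "STANDING", "LAYING")

check_tidy_layout <- function(df) {
  stopifnot(
    identical(names(df)[1:2], c("activity", "subject")),
    is.character(df$activity),
    all(df$activity %in% activities),
    is.integer(df$subject),
    all(df$subject >= 1 & df$subject <= 30),
    all(vapply(df[-(1:2)], is.numeric, logical(1)))  # remaining measure columns
  )
  invisible(TRUE)
}

# Reading the real file (ASSUMPTION: it is in the working directory):
# df <- read.table("tidy_data_means_and_stddevs.txt", header = TRUE,
#                  check.names = FALSE, stringsAsFactors = FALSE)
# check_tidy_layout(df)
# dim(df)  # documented as 10299 rows x 81 columns

# Made-up two-row example with a single measure column
toy <- data.frame(activity = c("WALKING", "LAYING"), subject = c(1L, 30L),
                  "tBodyAcc-mean()-X" = c(0.1, 0.2),
                  check.names = FALSE, stringsAsFactors = FALSE)
check_tidy_layout(toy)
```

`check.names = FALSE` keeps names such as tBodyAcc-mean()-X intact; without it, read.table would convert them to syntactic names like tBodyAcc.mean...X.
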
The data in tidy_data_means_of_all_columns.txt is a data frame with 180 rows (one per combination of the 6 activities and 30 subjects) and 563 columns. Its structure is almost the same:
- The first column, activity, is of type "character" and contains the following values:
  - WALKING
  - WALKING_UPSTAIRS
  - WALKING_DOWNSTAIRS
  - SITTING
  - STANDING
  - LAYING
- The second column, subject, is of type "integer" and contains the subject number, from 1 to 30
- All other columns (561) hold the mean of each measurement for the given activity and subject and are of type "numeric"
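
Because the averaged file should contain exactly one row per activity and subject pair (6 x 30 = 180), that property can be verified directly. A minimal sketch; `check_one_row_per_pair` is a hypothetical helper, not part of the script, and the example table is made up. If there are exactly `n_activities` distinct activities and `n_subjects` distinct subjects, no pair is duplicated, and the row count equals their product, then every pair appears exactly once.

```r
# Hypothetical helper (not part of the original script): checks that a table has
# exactly one row per (activity, subject) pair.
check_one_row_per_pair <- function(df, n_activities = 6, n_subjects = 30) {
  pairs <- unique(df[c("activity", "subject")])
  stopifnot(
    length(unique(df$activity)) == n_activities,
    length(unique(df$subject)) == n_subjects,
    nrow(pairs) == nrow(df),                    # no pair appears twice
    nrow(df) == n_activities * n_subjects       # so every pair appears once
  )
  invisible(TRUE)
}

# Reading the real file (ASSUMPTION: it is in the working directory):
# df <- read.table("tidy_data_means_of_all_columns.txt", header = TRUE,
#                  check.names = FALSE, stringsAsFactors = FALSE)
# check_one_row_per_pair(df)  # expects 180 rows

# Made-up example: 2 activities x 3 subjects = 6 rows
toy <- expand.grid(activity = c("WALKING", "SITTING"), subject = 1:3,
                   stringsAsFactors = FALSE)
toy$measure <- seq_len(nrow(toy)) / 10
check_one_row_per_pair(toy, n_activities = 2, n_subjects = 3)
```
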
For the meaning of the numeric measures, see http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones and features_info.txt in the data archive.