
# An explanation of the run_analysis.R script

## Part 1
### Here the training and test sets are extracted into separate data frames:

* The test and training activity data sets are read into the test_data_activities and train_data_activities data frames
* The single column in each of the test_data_activities and train_data_activities data frames is renamed to 'activity_id'
* The test and training subject data are then read into the test_data_subject and train_data_subject data frames
* The single column in each of the test_data_subject and train_data_subject data frames is renamed to 'subject_id'
* The activity labels are read and stored in the activity_labels data frame
* The feature labels are read and stored in the features data frame
* The columns in the activity_labels data frame are renamed to something more meaningful (activity_id and activity)
* The columns in the features data frame are renamed to something more meaningful (feature_id and feature)
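
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script itself: the tiny stand-in files and their contents are invented, only the test side is shown, and the files are assumed to be whitespace-delimited with no header row so that `read.table` can read them with its defaults.

```r
# Stand-in files in a temporary directory; all values are invented for illustration.
tmp <- tempfile("har_")
dir.create(tmp)
writeLines(c("5", "4", "5"), file.path(tmp, "y_test.txt"))
writeLines(c("2", "2", "4"), file.path(tmp, "subject_test.txt"))
writeLines(c("4 SITTING", "5 STANDING"), file.path(tmp, "activity_labels.txt"))
writeLines(c("1 featureA-mean()-X", "2 featureA-sd()-X"), file.path(tmp, "features.txt"))

# Read the activity ids and subject ids; each file holds a single unnamed column.
test_data_activities <- read.table(file.path(tmp, "y_test.txt"))
test_data_subject <- read.table(file.path(tmp, "subject_test.txt"))
names(test_data_activities) <- "activity_id"
names(test_data_subject) <- "subject_id"

# Read the two lookup tables and give their columns meaningful names.
activity_labels <- read.table(file.path(tmp, "activity_labels.txt"),
                              stringsAsFactors = FALSE)
features <- read.table(file.path(tmp, "features.txt"),
                       stringsAsFactors = FALSE)
names(activity_labels) <- c("activity_id", "activity")
names(features) <- c("feature_id", "feature")
```

The training files would be handled the same way, giving train_data_activities and train_data_subject.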

## Part 2
### Here, descriptive activity names are used to name the activities in the data set

* A column containing the activity description that corresponds to each activity_id is added to the test activity data
* The id column is then removed
* A column containing the activity description that corresponds to each activity_id is added to the training activity data
* The id column is then removed
* The test_data_subject data frame is merged with the test_data data frame
* The train_data_subject data frame is merged with the train_data data frame
* The test activities data frame is then merged with the test_data data frame
* The training activities data frame is likewise merged with the train_data data frame
* The test and train data frames are then concatenated
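
A minimal sketch of this part, using invented toy values. It assumes that "merged" means column-binding data frames whose rows are already in the same order, and that "concatenated" means stacking rows with `rbind`. The helper `add_activity` and the name `all_data` are hypothetical and do not come from the script.

```r
# Toy inputs standing in for the data frames built in Part 1 (values are invented).
activity_labels <- data.frame(activity_id = c(4, 5),
                              activity = c("SITTING", "STANDING"),
                              stringsAsFactors = FALSE)
test_data <- data.frame(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4))
train_data <- data.frame(V1 = c(0.5, 0.6), V2 = c(0.7, 0.8))
test_data_activities <- data.frame(activity_id = c(5, 4))
train_data_activities <- data.frame(activity_id = c(4, 4))
test_data_subject <- data.frame(subject_id = c(2, 2))
train_data_subject <- data.frame(subject_id = c(1, 3))

# Hypothetical helper: look up the description for each id, then drop the id column.
# match() keeps the original row order, which merge() would not guarantee.
add_activity <- function(activities, labels) {
  activities$activity <- labels$activity[match(activities$activity_id,
                                               labels$activity_id)]
  activities["activity"]
}
test_data_activities <- add_activity(test_data_activities, activity_labels)
train_data_activities <- add_activity(train_data_activities, activity_labels)

# Column-bind subject, activity and measurements, then stack test on top of train.
test_data <- cbind(test_data_subject, test_data_activities, test_data)
train_data <- cbind(train_data_subject, train_data_activities, train_data)
all_data <- rbind(test_data, train_data)
```

Row order matters here: the subject, activity and measurement files only line up by position, so any step that reorders rows before the column-bind would silently mislabel observations.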

## Part 3
### The data set is appropriately labelled with descriptive variable names.

## Part 4
### Extract only the measurements on the mean and standard deviation for each measurement.

* Separate data frames are set up to hold the mean columns and the standard deviation columns
* The names of the mean measures are derived from the existing features list and set as the rows of the mean_col_names data frame. A measure is treated as mean-based if its name contains 'mean()'.
* The names of the standard deviation measures are derived from the existing features list and set as the rows of the stdev_col_names data frame. A measure is treated as standard deviation-based if its name contains 'sd()'.
* The mean and standard deviation measures are stored in separate data frames
* Blank data frames with the correct number of rows are set up for the mean measures and for the standard deviation measures
* The mean data columns are bound together
* The sd data columns are bound together
* The row_num column, which was only an initial placeholder to establish the correct number of rows in each data frame, is then removed from the column list of each data frame
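
The column selection can be illustrated with a short sketch. The feature names and values below are made up, the name lists are kept as plain character vectors rather than data frames, and direct subsetting stands in for the placeholder-and-bind approach described above. The name `mean_sd_data` is hypothetical.

```r
# Hypothetical feature names; the real list comes from the features data frame.
features <- data.frame(feature_id = 1:4,
                       feature = c("signalA-mean()-X", "signalA-sd()-X",
                                   "signalA-meanFreq()-X", "signalA-max()-X"),
                       stringsAsFactors = FALSE)

# Toy measurements: two rows, one column per feature, named after the features.
all_data <- as.data.frame(matrix(1:8, nrow = 2))
names(all_data) <- features$feature

# fixed = TRUE matches the literal text, so "()" is not interpreted as regex syntax.
mean_col_names <- features$feature[grepl("mean()", features$feature, fixed = TRUE)]
stdev_col_names <- features$feature[grepl("sd()", features$feature, fixed = TRUE)]

# Keep only the matching columns, then bind the two groups side by side.
mean_data <- all_data[, mean_col_names, drop = FALSE]
sd_data <- all_data[, stdev_col_names, drop = FALSE]
mean_sd_data <- cbind(mean_data, sd_data)
```

Matching the literal string matters: as a regular expression, `mean()` would also match names such as the made-up `signalA-meanFreq()-X`, because the empty group `()` matches nothing.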

## Part 5
### A second, independent tidy data set with the average of each variable for each activity and each subject is then created.

* The data.table package is loaded with library()
* The data set resulting from Part 3 is converted into a data.table so that grouping calculations can be performed on it
* A variable is set up that holds only the names of the columns for which means are required
* The means of all numeric columns are calculated, grouped by activity and subject
* The result is then ordered and written to file
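
The grouping step might look roughly like the sketch below. It assumes the data.table package is installed. The toy values, the names `dt`, `measure_cols` and `tidy_means`, and the output file name are placeholders, not taken from the script.

```r
library(data.table)

# Toy combined data set (values invented); V1 and V2 stand in for the measurement columns.
all_data <- data.frame(subject_id = c(1, 1, 2, 2),
                       activity = c("SITTING", "SITTING", "SITTING", "STANDING"),
                       V1 = c(1, 3, 5, 7),
                       V2 = c(2, 4, 6, 8),
                       stringsAsFactors = FALSE)
dt <- as.data.table(all_data)

# Names of the columns to average: everything except the grouping columns.
measure_cols <- setdiff(names(dt), c("subject_id", "activity"))

# Mean of each measurement column within each activity/subject group.
tidy_means <- dt[, lapply(.SD, mean),
                 by = .(activity, subject_id),
                 .SDcols = measure_cols]

# Order the result and write it out; the file name is a placeholder.
setorder(tidy_means, activity, subject_id)
out_file <- file.path(tempdir(), "tidy_means.txt")
write.table(tidy_means, out_file, row.names = FALSE)
```

Each row of the result holds one activity/subject pair, with one averaged value per measurement column.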
