Getting and Cleaning Data

title	author	date	output
README	Richard A.	Saturday, May 23, 2015	html_document

Getting and Cleaning Data

The Course Assignment

The purpose of this project is to demonstrate our ability to collect, work with, and clean a data set. The goal is to prepare tidy data that can be used for later analysis. It is required that we submit:

a tidy data set as described below,
a link to a Github repository with your script for performing the analysis
a code book that describes the variables, the data, and any transformations or work that you performed to clean up the data called CodeBook.md.
a README.md (this file) that is in this repo with my script.

The Source Data

One of the most exciting areas in all of data science right now is wearable computing. Companies like Fitbit, Nike, and Jawbone Up are racing to develop the most advanced algorithms to attract new users. The data used for this project represent data collected from the accelerometers from the Samsung Galaxy S smartphone. A full description is available at the site where the data was obtained:

http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

Here are the data for the project:

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

The Script

I created one R script called run_analysis.R that does the following.

Reads in the data sets (data folder "UCI HAR Dataset" assumed to be in you working dirrecotry)
Merges the training and the test sets to create one data set.
Uses descriptive activity names to name the activities in the data set.
Extracts only the measurements on the mean and standard deviation for each measurement.
Appropriately labels the data set with descriptive variable names.
From the data set in step 5, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

Only full features with mean() or std() were used. Vectors obtained by averaging the signals in a signal window sample were ignored from this summarized data set. It was assumed that the Mean of these measures would be used to show the central tendency for each subject/activity/features. Having the mean of a 'signal window sample' did not make sense.

The Output Tidy Data Set

I tidy data set was created using the above script and meating the project requirement.

A Code Book for this resulting tidy data set can be found in this repo at Codebook.md

For it was assumened that the Wide data format was more appropriate for the subsequent analysis [that is unknown] would be prefered. The wide data file is here tidy_data.csv

A second long version of the file was also created as a back up. In this second file the features were broken up by domain, feature, statistic, and each with 4 measures for magnitude and the XYZ vector components. A code book was not created for this version. The long data file is here tidy_data_long.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Getting and Cleaning Data

The Course Assignment

The Source Data

The Script

The Output Tidy Data Set

About

Uh oh!

Releases

Packages

Languages

Name	Name	Last commit message	Last commit date
Latest commit History 16 Commits
Codebook.md	Codebook.md
README.md	README.md
run_analysis.R	run_analysis.R
tidy_data.csv	tidy_data.csv
tidy_data_long.csv	tidy_data_long.csv

Search code, repositories, users, issues, pull requests...

brashley/Tidy-Data-Example

Folders and files

Latest commit

History

Repository files navigation

Getting and Cleaning Data

The Course Assignment

The Source Data

The Script

The Output Tidy Data Set

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages