Difference between revisions of "Atom: Analysis"

From Analytics Practicum
Jump to navigation Jump to search
Line 23: Line 23:
 
| style="border-bottom:2px solid #33a048; border-top:2px solid #33a048; background:#33a048;" width="1%" |  
 
| style="border-bottom:2px solid #33a048; border-top:2px solid #33a048; background:#33a048;" width="1%" |  
 
|}
 
|}
 +
 +
=Data Cleaning and Explorations=
 +
 +
The data we received from MRC was site based and split up into individual excel files with a lot of unnecessary data. After Exploratory Data analysis there is a need to transform the time-based data into appropriately time stamped time series data in order to perform further analysis. For our group we utilized SQL Server Integration Services 2010 to look through all excel files and extract relevant data, as we were comfortable using this software from previous projects.
 +
 +
==Filtering and extracting data==
 +
 +
There were many variables in the excel sheet that was not helpful for our phase 2 analysis. We have decided on using 6 variables for our analysis, which are the most relevant to what we would like to analyze. The variables are peak_occupancy, non_peak_occupancy, peak_car_in, non_peak_car_in, peak_car_out, non_peak_car_out. We also filtered out 112 Katong as it was a pilot site and there were many missing data.
 +
 +
==Combining Data==
 +
 +
As the data we received from MRC was site based and split up into individual excel files, there is a need for us to combine all the sites together after filtering and extracting data from individual excel files. This file, includes attributes such as time, car_park, total_lots, peak_occupancy, non_peak_occupancy, peak_car_in, non_peak_car_in, peak_car_out, non_peak_car_out. There are a total of 28 sites that we plan to carry out our analysis.
 +
 +
==Recoding Time==
 +
 +
As the time given was in ##:##AM/PM format, there was a need for us to recode it into numbers in order for us to run Time Series Analysis on SAS Enterprise Miner. We used SAS Enterprise Guide to recode our time to Time ID starting from 1 before loading the cleaned data into SAS Server.

Revision as of 20:06, 28 February 2016

AtomTeamLogo.jpg


AtomHome.png

Home

  AtomTeam.png

Team

  AtomProjectOverview.png

Overview

  AtomDocumentation.png

Documentation

  AtomAnalysis.png

Analysis

 

Data Cleaning and Explorations

The data we received from MRC was site based and split up into individual excel files with a lot of unnecessary data. After Exploratory Data analysis there is a need to transform the time-based data into appropriately time stamped time series data in order to perform further analysis. For our group we utilized SQL Server Integration Services 2010 to look through all excel files and extract relevant data, as we were comfortable using this software from previous projects.

Filtering and extracting data

There were many variables in the excel sheet that was not helpful for our phase 2 analysis. We have decided on using 6 variables for our analysis, which are the most relevant to what we would like to analyze. The variables are peak_occupancy, non_peak_occupancy, peak_car_in, non_peak_car_in, peak_car_out, non_peak_car_out. We also filtered out 112 Katong as it was a pilot site and there were many missing data.

Combining Data

As the data we received from MRC was site based and split up into individual excel files, there is a need for us to combine all the sites together after filtering and extracting data from individual excel files. This file, includes attributes such as time, car_park, total_lots, peak_occupancy, non_peak_occupancy, peak_car_in, non_peak_car_in, peak_car_out, non_peak_car_out. There are a total of 28 sites that we plan to carry out our analysis.

Recoding Time

As the time given was in ##:##AM/PM format, there was a need for us to recode it into numbers in order for us to run Time Series Analysis on SAS Enterprise Miner. We used SAS Enterprise Guide to recode our time to Time ID starting from 1 before loading the cleaned data into SAS Server.