Difference between revisions of "Atom: Analysis"
Sh.yan.2012 (talk | contribs) |
Sh.yan.2012 (talk | contribs) |
||
Line 23: | Line 23: | ||
| style="border-bottom:2px solid #33a048; border-top:2px solid #33a048; background:#33a048;" width="1%" | | | style="border-bottom:2px solid #33a048; border-top:2px solid #33a048; background:#33a048;" width="1%" | | ||
|} | |} | ||
+ | |||
+ | =Data Cleaning and Explorations= | ||
+ | |||
+ | The data we received from MRC was site based and split up into individual excel files with a lot of unnecessary data. After Exploratory Data analysis there is a need to transform the time-based data into appropriately time stamped time series data in order to perform further analysis. For our group we utilized SQL Server Integration Services 2010 to look through all excel files and extract relevant data, as we were comfortable using this software from previous projects. | ||
+ | |||
+ | ==Filtering and extracting data== | ||
+ | |||
+ | There were many variables in the excel sheet that was not helpful for our phase 2 analysis. We have decided on using 6 variables for our analysis, which are the most relevant to what we would like to analyze. The variables are peak_occupancy, non_peak_occupancy, peak_car_in, non_peak_car_in, peak_car_out, non_peak_car_out. We also filtered out 112 Katong as it was a pilot site and there were many missing data. | ||
+ | |||
+ | ==Combining Data== | ||
+ | |||
+ | As the data we received from MRC was site based and split up into individual excel files, there is a need for us to combine all the sites together after filtering and extracting data from individual excel files. This file, includes attributes such as time, car_park, total_lots, peak_occupancy, non_peak_occupancy, peak_car_in, non_peak_car_in, peak_car_out, non_peak_car_out. There are a total of 28 sites that we plan to carry out our analysis. | ||
+ | |||
+ | ==Recoding Time== | ||
+ | |||
+ | As the time given was in ##:##AM/PM format, there was a need for us to recode it into numbers in order for us to run Time Series Analysis on SAS Enterprise Miner. We used SAS Enterprise Guide to recode our time to Time ID starting from 1 before loading the cleaned data into SAS Server. |
Revision as of 20:06, 28 February 2016
![]() |
![]() |
![]() |
![]() |
![]() |
Contents
Data Cleaning and Explorations
The data we received from MRC was site based and split up into individual excel files with a lot of unnecessary data. After Exploratory Data analysis there is a need to transform the time-based data into appropriately time stamped time series data in order to perform further analysis. For our group we utilized SQL Server Integration Services 2010 to look through all excel files and extract relevant data, as we were comfortable using this software from previous projects.
Filtering and extracting data
There were many variables in the excel sheet that was not helpful for our phase 2 analysis. We have decided on using 6 variables for our analysis, which are the most relevant to what we would like to analyze. The variables are peak_occupancy, non_peak_occupancy, peak_car_in, non_peak_car_in, peak_car_out, non_peak_car_out. We also filtered out 112 Katong as it was a pilot site and there were many missing data.
Combining Data
As the data we received from MRC was site based and split up into individual excel files, there is a need for us to combine all the sites together after filtering and extracting data from individual excel files. This file, includes attributes such as time, car_park, total_lots, peak_occupancy, non_peak_occupancy, peak_car_in, non_peak_car_in, peak_car_out, non_peak_car_out. There are a total of 28 sites that we plan to carry out our analysis.
Recoding Time
As the time given was in ##:##AM/PM format, there was a need for us to recode it into numbers in order for us to run Time Series Analysis on SAS Enterprise Miner. We used SAS Enterprise Guide to recode our time to Time ID starting from 1 before loading the cleaned data into SAS Server.