Atom: Analysis

From Analytics Practicum
Revision as of 23:30, 28 February 2016


Interim Analysis

Data Cleaning and Explorations

The data we received from MRC was site-based and split across individual Excel files containing a lot of unnecessary data. Exploratory data analysis showed that the time-based data had to be transformed into properly timestamped time series data before any further analysis could be performed. We used SQL Server Integration Services 2010 to iterate through all the Excel files and extract the relevant data, as we were comfortable with this software from previous projects.

Filtering and extracting data

Many variables in the Excel sheets were not helpful for our phase 2 analysis. We decided to use the six variables most relevant to what we want to analyze: peak_occupancy, non_peak_occupancy, peak_car_in, non_peak_car_in, peak_car_out and non_peak_car_out. We also filtered out 112 Katong, as it was a pilot site with a large amount of missing data.
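The filtering step can be illustrated with a small Python sketch. Our actual extraction was done in SQL Server Integration Services, so this is only an illustration under stated assumptions. It assumes each row is a dict keyed by column name, that the identifier columns time, car_park and total_lots are kept alongside the six analysis variables, and that the pilot site appears in the car_park column as the literal string "112 Katong". `filter_records` is a hypothetical helper name.

```python
# Hypothetical sketch, not our SSIS package: keep only the identifier columns
# plus the six analysis variables, and drop the pilot site.
ANALYSIS_VARS = [
    "peak_occupancy", "non_peak_occupancy",
    "peak_car_in", "non_peak_car_in",
    "peak_car_out", "non_peak_car_out",
]
# Assumption: identifier columns carried through alongside the six variables.
ID_VARS = ["time", "car_park", "total_lots"]
# Assumption: the pilot site is labelled exactly like this in the car_park column.
EXCLUDED_SITES = {"112 Katong"}


def filter_records(records):
    """Return rows restricted to ID_VARS + ANALYSIS_VARS, skipping excluded sites.

    Columns absent from a row come back as None so gaps stay visible.
    """
    kept = []
    for row in records:
        if row.get("car_park") in EXCLUDED_SITES:
            continue  # pilot site with many missing values
        kept.append({col: row.get(col) for col in ID_VARS + ANALYSIS_VARS})
    return kept
```

Any column outside the chosen set, such as an extra spreadsheet field, is dropped. A missing analysis column shows up as `None` rather than silently disappearing.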

Combining Data

Because the MRC data was site-based and split across individual Excel files, we needed to combine all the sites into a single file after filtering and extracting the data from each one. The combined file includes the attributes time, car_park, total_lots, peak_occupancy, non_peak_occupancy, peak_car_in, non_peak_car_in, peak_car_out and non_peak_car_out. In total, we plan to carry out our analysis on 28 sites.
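Combining the sites amounts to stacking every site's rows into one table and tagging each row with its car park. Below is a minimal Python sketch of that idea. It assumes the per-site rows are already in memory as a dict mapping car park name to a list of row dicts. `combine_sites`, `to_csv` and `COMBINED_COLUMNS` are hypothetical names, with the column order taken from the attribute list above.

```python
import csv
import io

# Hypothetical column order for the combined file, following the attribute list above.
COMBINED_COLUMNS = [
    "time", "car_park", "total_lots",
    "peak_occupancy", "non_peak_occupancy",
    "peak_car_in", "non_peak_car_in",
    "peak_car_out", "non_peak_car_out",
]


def combine_sites(site_tables):
    """Stack per-site rows into one list, tagging each row with its car park name.

    site_tables maps a car park name to that site's list of row dicts.
    Sites are processed in alphabetical order so the output is deterministic.
    """
    combined = []
    for car_park in sorted(site_tables):
        for row in site_tables[car_park]:
            combined.append({**row, "car_park": car_park})
    return combined


def to_csv(rows, columns=COMBINED_COLUMNS):
    """Serialise rows as CSV text; missing columns are written empty, extra keys ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The car park name is taken from the dict key rather than trusted from inside each sheet. This way every row in the combined file is attributable to exactly one site.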

Recoding Time

Because the times were given in ##:##AM/PM format, we needed to recode them as numbers in order to run Time Series Analysis in SAS Enterprise Miner. We used SAS Enterprise Guide to recode each time into a Time ID starting from 1 before loading the cleaned data into the SAS Server.
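The recoding was done in SAS Enterprise Guide. The Python sketch below only illustrates the same idea. It assumes all observations for a site fall within a single day, so time of day alone orders them, and that AM/PM markers are in English. `to_minutes` and `recode_time_ids` are hypothetical helper names.

```python
from datetime import datetime


def to_minutes(clock_text):
    """Convert an 'hh:mmAM/PM' string such as '7:30AM' or '12:00 PM' to minutes after midnight.

    %I is the 12-hour clock and %p the AM/PM marker (English markers assumed).
    """
    parsed = datetime.strptime(clock_text.replace(" ", ""), "%I:%M%p")
    return parsed.hour * 60 + parsed.minute


def recode_time_ids(clock_texts):
    """Replace each clock time with a Time ID: 1 for the earliest distinct time, 2 for the next, ...

    Assumes a single day of observations, so equal clock times share a Time ID.
    """
    distinct = sorted({to_minutes(t) for t in clock_texts})
    time_id = {minutes: i for i, minutes in enumerate(distinct, start=1)}
    return [time_id[to_minutes(t)] for t in clock_texts]
```

For example, `recode_time_ids(["7:00AM", "12:00PM", "6:30PM", "7:00AM"])` gives `[1, 2, 3, 1]`. Sorting by minutes after midnight, rather than by the raw strings, keeps "12:00PM" after "7:00AM".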

AtomI01.png


AtomI02.png


AtomI03.png


The figures above show that the raw data contains unnecessary rows and columns that are empty. Figure 4 below shows the recoded data after cleaning.
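Removing the empty rows and columns can be expressed as a short Python routine. This is a hedged sketch, not the procedure we ran. It assumes a sheet has been read into a list of rows, where a cell counts as empty if it is `None` or a whitespace-only string. `is_blank` and `drop_empty` are hypothetical names.

```python
def is_blank(cell):
    """Treat None and whitespace-only strings as empty cells."""
    return cell is None or (isinstance(cell, str) and cell.strip() == "")


def drop_empty(rows):
    """Remove rows, then columns, in which every cell is blank.

    Ragged rows are padded with None so every column index exists.
    """
    rows = [r for r in rows if not all(is_blank(c) for c in r)]
    if not rows:
        return []
    width = max(len(r) for r in rows)
    padded = [list(r) + [None] * (width - len(r)) for r in rows]
    keep = [j for j in range(width) if not all(is_blank(r[j]) for r in padded)]
    return [[r[j] for j in keep] for r in padded]
```

Empty rows are dropped first so that a fully blank row cannot keep an otherwise empty column alive.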

AtomI04.png

Initial Approach

Initially, our approach was to manually group the parking establishments by region before doing time series analysis. However, when we consulted Professor Kam on Feb 18, the flaw in this method was pointed out to us: the dataset should tell us what the groups and patterns are, instead of us manually deciding how to segregate the data.

Revised Analysis Approach

Time Series Methodology

We used SAS Enterprise Miner, which simplifies time series data mining on large amounts of data. Enterprise Miner also implements Dynamic Time Warping, an algorithm for measuring the similarity between two time-based sequences that may vary in timing or speed. By shifting and aligning time series against each other, Enterprise Miner can identify patterns and similarities between them.
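To make the idea behind Dynamic Time Warping concrete, here is the textbook dynamic-programming formulation in Python. It uses absolute difference as the point-wise cost and no warping window. These are assumptions for illustration, since we do not know the exact settings Enterprise Miner applies. `dtw_distance` is a hypothetical name.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step may
    advance in a, in b, or in both, which is what lets the series stretch and shift.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # point-wise cost (assumed absolute difference)
            cost[i][j] = d + min(
                cost[i - 1][j],      # repeat b[j-1] against another point of a
                cost[i][j - 1],      # repeat a[i-1] against another point of b
                cost[i - 1][j - 1],  # advance both sequences together
            )
    return cost[n][m]
```

For instance, `dtw_distance([0, 1, 2, 3], [0, 0, 1, 2, 3])` is `0.0` because the extra leading 0 is absorbed by the warping path. A lock-step point-by-point comparison could not even be computed for these unequal lengths. The same property lets two car parks whose occupancy curves peak at slightly different times still be judged similar.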

AtomI05.png