Atom FinalWiki

From Analytics Practicum
Revision as of 01:43, 16 April 2016 by Sh.yan.2012 (talk | contribs)

ABSTRACT

In Singapore, roads already take up 12% of the total land area. With limited land available, Singapore cannot afford to exhaust its land by building more roads to accommodate vehicles and further expand the road network.

Parking space is simply the provision for the storage of vehicles. Car parks can be provided in a variety of land uses, ranging from residential developments to shopping centers. A car park can also have a serious impact on aesthetics, whether it is on the street or in a multi-storey structure above or below ground. Car parks consume land and resources that might be put to better use elsewhere, for instance in building another development or private homes.

A strategic approach to parking would connect the separate decisions on parking provision at individual sites with the achievement of wider planning goals, for instance saving land for other uses. Poor car park planning results in jams, bad traffic management and overspill into the surrounding areas. This is avoidable only if an appropriate planning process is in place: it helps to determine future parking arrangements and so spares drivers unnecessary headaches. The main concern in planning parking is the way land and natural environments are conserved, valued, developed or organized using geographical understanding.

Data mining is the computational process of discovering patterns in large datasets, also known as “big data”. For this project, the data collected are in time-series format. Time-series data is considered multidimensional, as there is one observation per time unit and each time unit represents a dimension.

Parking utilization provides a time series of typical parking demand for the development in that area on a given parking day. Thus, by comparing parking utilization comprehensively, the study will be able to identify clearly the patterns and trends of high-usage and low-usage car parks.

Hence, this paper seeks to explore the use of time-series data mining techniques to discover patterns and trends among similar car park sites across 29 retail shopping malls.


INTRODUCTION

Parking requirements are the exclusive domain of local government and are subject to its concerns. At a minimum, parking requirements cover four important elements: (1) the land use for the parking, (2) the car park ratio relative to the size of the development, (3) the demand for and supply of car park lots and (4) the surrounding car parks, which also influence the demand (Marsden, 2006).

Cities created off-street parking to ensure that new developments have sufficient space and ample parking (Barter, 2010). A lack of parking generates traffic congestion and causes cars to park in, and spill over into, the surrounding areas. Therefore, car park planners and public officials must be able to estimate accurately the number of parking lots required at an amenity to eliminate these parking issues (Barter, 2010).

In Singapore, roads already take up 12% of the total land area. With limited land available, Singapore cannot afford to exhaust its land by building more roads to accommodate vehicles and further expand the road network.

With the increase in Singapore's population in recent years, scarce land must be allocated wisely. As Singapore continues to grow as a city, there is a need to increase the supply of housing, industrial and office estates. Therefore, it is not realistic for every Singaporean household to own a car (LTA, 2012).

That said, a car is not a basic necessity in Singapore, since public transportation is well developed and easily accessible. However, Singaporeans seem to think otherwise: the share of households in Singapore that own a car increased from 40% in 2008 to 45% in 2013. To curb car ownership and to keep the roads smooth-flowing and congestion-free, the authority affirmed that it would continue to emphasize vehicle ownership and usage restraint measures.

Since 1990, the Certificate Of Entitlement (COE) system has enabled Singapore to exercise effective control of vehicle population growth. As Singapore becomes more urbanized, the social cost of car ownership will also increase. This is because land has to be set aside for parking spaces at not only where we reside, but also at the places where we work, study and play. Allocating more land for car parks means that there is less land for other developments, such as housing, schools or healthcare facilities. On top of that, illegal parking and congestion in local neighborhoods may also become more prevalent.

With these considerations in mind, the authority would like to understand the car park occupancy situation in Singapore. It has therefore engaged a consultancy firm to find out more through site surveys and observation at these car parks. The team assists the consultancy firm in transforming the information collected into knowledge by producing detailed reports, infographics and consolidated data that summarize the findings for each car park site. This information is crucial to the authority, as it will help it to plan ahead and handle car park issues in Singapore.

Apart from assisting the consultancy firm in reporting on the situation at each individual car park site, the team will explore and demonstrate the effective use of time-series data mining in analyzing complex data. This research study will help the authority to discover new insights into clusters of shopping malls that are grouped together based on similar characteristics in their car park occupancy.

Car Park Sites

The consultancy firm has completed the data collection process and compiled the results. Its primary focus is to analyze and report the findings for the 65 car park sites. It has also created clusters by grouping nearby car park sites together. For instance, the car parks of Punggol Plaza and Punggol 21 CC are grouped together, as they are located next to each other.

The allocation of the reports required for these car park sites is shown below:

AtomFT1.png



As of the initial meeting on 30th December 2015, the consultancy firm had completed and submitted 10 reports to the authority. Hence, 45 outstanding reports, infographics and Excel files still need to be worked on. The consultancy firm’s submission deadline for the reports is 31st January 2016. Therefore, the team’s initial job scope is to assist the consultancy firm’s team in meeting this deadline.

After the completion of phase 1, the team will further process the data collected to come up with new insights. Additional initiatives will include a comparison of the different car park sites as well as a national average representation. This will allow the business owner to better understand the situation nationwide rather than looking at each car park site independently. The team will also conduct a focused study using time-series clustering on shopping mall car park sites, to explore and demonstrate the use of time-series analysis in grouping shopping mall car park sites with similar characteristics based on their car park occupancy. Lastly, the team will share the findings and evaluate the accuracy of the analysis by linking it back to the real world.

LITERATURE REVIEW

Parking Policy

One of the most important links between land use and transport is parking policy. The effectiveness of parking policies is often compromised by the perceived tension among three of the objectives that parking supports: regeneration, restraint and revenue. In particular, the belief that parking restraint measures could damage the attractiveness of city centers to both retail and commercial enterprises limits the political acceptability of pricing policies and planning (Strubbs, 2012).

Parking space is simply the provision for the storage of vehicles (Dolnick, 1999). Car parks can be provided in a variety of land uses, ranging from residential developments to shopping centers. A car park can also have a serious impact on aesthetics, whether it is on the street or in a multi-storey structure above or below ground. Car parks consume land and resources that might be put to better use elsewhere, for instance in building another development or private homes.

A strategic approach to parking would connect the separate decisions on parking provision at individual sites with the achievement of wider planning goals, for instance saving land for other uses (March, 2007). Poor car park planning results in jams, bad traffic management and overspill into the surrounding areas. This is avoidable only if an appropriate planning process is in place: it helps to determine future parking arrangements and so spares drivers unnecessary headaches. The main concern in planning parking is the way land and natural environments are conserved, valued, developed or organized using geographical understanding (Aldridge et al., 2006).

To achieve desirable land-use arrangements for car parks, planning is essential and must be established through reiterated rules, goals, standards, designs and decision systems. In this sense, there is a need to examine the existing understanding of parking issues as a first step towards reconsidering how collective action might be taken on the basis of this knowledge. This information usually has spatio-temporal characteristics; hence, data mining techniques are applied to explore car park overspill patterns.

Time-Series Data Mining

Data mining is the computational process of discovering patterns in large datasets, also known as “big data”. Conventional data mining is also known as Knowledge Discovery in Databases (KDD). The objective of the KDD process is to extract information and transform it into knowledge, an understandable structure for future use by business users (Frawley et al., 1992). There are three main types of data mining techniques: association rules, classification and statistical data mining. Association rules are used to discover relations between variables in a large dataset. Classification is a data mining function that generates a set of rules for classifying instances into predefined classes. Lastly, statistical data mining is driven by the data to discover new patterns and build predictive models. Although these conventional techniques are broadly used in many industries, they are not appropriate for time-series data. Hence, another set of techniques, time-series data mining, was developed to cater for time-series data (Fuller, 1995).

Time-series data mining has four major tasks: clustering, indexing, classification and segmentation (Harvey, 1994). Firstly, clustering finds time series with similar patterns and groups them together. Next, indexing finds the time series most similar to a given query series, in order of similarity. Thirdly, classification assigns each time series to a known category using a previously trained model. Lastly, segmentation partitions a time series into segments. Time-series data is considered multidimensional, as there is one observation per time unit and each time unit represents a dimension. In the real world, time-series data is usually high-dimensional (Lee et al., 2014). For instance, in a stock market setting, prices changing over time can be recorded every second. In other words, this accumulates to 3,600 records an hour and 86,400 records a day.
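The clustering task described above can be sketched in a few lines of Python. This is an illustration under stated assumptions only: the mall names and occupancy ratios are invented, plain Euclidean distance on equal-length profiles stands in for whatever similarity measure the study applies, and single-linkage agglomerative clustering is just one of several possible grouping methods, not necessarily the one used in this project.

```python
import math

def euclidean(a, b):
    """Point-by-point distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(series, k):
    """Merge the two closest clusters repeatedly until k clusters remain.

    series: dict mapping a site name to its occupancy profile (list of floats).
    Returns a list of clusters, each a sorted list of site names.
    """
    clusters = [[name] for name in series]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(series[a], series[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical occupancy ratios at four times of day (invented for illustration).
profiles = {
    "mall_a": [0.20, 0.50, 0.90, 0.60],
    "mall_b": [0.25, 0.55, 0.85, 0.60],
    "mall_c": [0.80, 0.40, 0.30, 0.70],
    "mall_d": [0.75, 0.45, 0.30, 0.65],
}
groups = single_linkage(profiles, k=2)
print(groups)
```

Each occupancy profile is treated as one point in a space with one dimension per time unit, which is the multidimensional view of time-series data described above.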

Through the routine collection of data, organizations are amassing sequentially ordered data. Observations and records in such datasets have a time element. Accordingly, this information is collected over a period of time (e.g. over a day, a week or even a decade). Examples of such data include sales transactions, delivery orders and traffic information. Over the years, businesses and organizations have increasingly realized the importance and value of these data. They seek to analyze time-series data to discover business insights that help them improve and grow.

When presenting the data, the data analyst has to put in extra effort to transform high-dimensional data from time-stamped transactions into a table that is suitable for time-series applications. This helps applications to identify the data as time-series data and to perform further analysis and pattern detection on them. The data analyst has to ensure that the data, previously of univariate or multivariate type, are transformed into a set of contiguous time instances.
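As a rough illustration of this transformation, the sketch below turns irregular time-stamped entry and exit events into a regular occupancy series, one value per contiguous interval. The event format, the function name `occupancy_series` and the sample timestamps are all assumptions made for this example; the project's actual data layout and tooling (Excel, SSIS, SAS Enterprise Miner) are not reproduced here.

```python
from datetime import datetime, timedelta

def occupancy_series(events, start, end, step_minutes=30):
    """Convert time-stamped events into a regularly spaced occupancy series.

    events: list of (timestamp, delta) pairs, where delta is +1 for a vehicle
            entering and -1 for a vehicle leaving (hypothetical format).
    Returns a list of (grid_time, occupancy) pairs, where occupancy counts
    every event with a timestamp at or before grid_time.
    """
    step = timedelta(minutes=step_minutes)
    ordered = sorted(events)
    series = []
    occupancy = 0
    i = 0
    t = start
    while t <= end:
        # Consume all events that happened up to this grid point.
        while i < len(ordered) and ordered[i][0] <= t:
            occupancy += ordered[i][1]
            i += 1
        series.append((t, occupancy))
        t += step
    return series

# Invented sample events for a single morning.
day = datetime(2016, 1, 4)
sample_events = [
    (day.replace(hour=9, minute=5), +1),
    (day.replace(hour=9, minute=10), +1),
    (day.replace(hour=9, minute=40), +1),
    (day.replace(hour=10, minute=20), -1),
    (day.replace(hour=11, minute=45), -1),
]
hourly = occupancy_series(sample_events, day.replace(hour=9),
                          day.replace(hour=12), step_minutes=60)
for grid_time, occupancy in hourly:
    print(grid_time.strftime("%H:%M"), occupancy)
```

Once every car park is expressed on the same time grid, the series can be compared position by position, which is what the similarity measures discussed next rely on.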

A common pitfall in analyzing time-series data is that the time-stamped data are recorded irregularly, which can result in two different time series being identified as having common trends (Esling & Argon, 2012). Where two time series do not occur concurrently, time-series data mining techniques cannot discern the relationship between them, as time is no longer a factor in the comparison.

AtomFF1.png



For example, in the two figures shown above, Fig 1a represents the traditional data mining similarity measure, Euclidean distance. It is used to compare the two time series Q and C, and the relationship between them is not discerned because they are out of phase. In Fig 1b, the Dynamic Time Warping (DTW) technique overcomes this issue by accounting for the time factor when comparing the two time series.

The DTW algorithm helps to identify similar trends that may occur over time across multiple arrays of sequenced data. This mathematical formulation serves well as an effective data mining technique for algorithmically comparing different sets of time-ordered data. DTW offers a better means of identifying similar trends across sets of sequenced records and observations.
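The contrast between the two measures can be made concrete with a minimal sketch. The code below implements the textbook dynamic-programming form of DTW with squared point differences and no warping-window constraint; these choices, and the two short sample series standing in for Q and C, are assumptions for illustration and may differ from the settings of the software used in the project.

```python
import math

def euclidean(q, c):
    """Lock-step distance: compares q[i] only with c[i]."""
    if len(q) != len(c):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))

def dtw(q, c):
    """Dynamic Time Warping distance between two series.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    points of q with the first j points of c.
    """
    n, m = len(q), len(c)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j],      # q advances, c repeats
                                 cost[i][j - 1],      # c advances, q repeats
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])

# Two invented series with the same peak, one time step out of phase.
q = [0, 0, 1, 2, 1, 0, 0]
c = [0, 1, 2, 1, 0, 0, 0]
print(euclidean(q, c))  # 2.0
print(dtw(q, c))        # 0.0
```

The Euclidean distance penalizes the phase shift even though both series have the same shape, whereas DTW stretches the time axis until the peaks line up and reports no difference.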

Hence, this paper seeks to explore these time-series data mining techniques to discover patterns and trends among similar car park sites across 29 retail shopping malls.


TOOLS & TECHNOLOGY

Analysis & Visualization Tools

The team will use four tools for analysis and visualization: (1) Microsoft Excel, (2) SAS Enterprise Miner, (3) Microsoft SQL Server Integration Services (SSIS) and (4) JMP Pro.

Microsoft Excel is a spreadsheet application designed for calculation, charts and visual aids, and pivot tables. The team will use it to analyze the data for phase 1, generating charts and calculations for each car park site.

SAS Enterprise Miner is analytical software that helps to streamline and simplify the data mining process. It allows easy retrieval and analysis of datasets, and lets users perform descriptive, predictive and time-series analysis on large amounts of data. The software also has interactive visualization functions and an easy-to-use interface in which most tasks can be performed by drag and drop.

Microsoft SQL Server Integration Services (SSIS) is a platform for building enterprise-level data integration and data transformation solutions. Integration Services allows users to extract and transform data and load it into a database.

JMP Pro is the advanced version of JMP, created for users who need sophisticated modeling techniques. It is statistical analysis software from SAS that provides a platform for interactive data visualization, exploration, analysis and communication.

For this project, the team will use these analytical tools to discover new insights into car park overspill patterns.

Reporting Tools

Apart from the analysis and visualization tools, the team will use two software packages for reporting: (1) Microsoft Word, a word processor, and (2) Microsoft PowerPoint, a slideshow presentation program.

Collaboration Tools

Lastly, the team uses (1) Dropbox, a file hosting service, (2) Google Drive, a cloud storage web application, and (3) SMU Wikipedia, an SMU encyclopedia web page built for collaboration, to collaborate within the team and with external stakeholders.