Atom FinalWiki

From Analytics Practicum

ABSTRACT

In Singapore, roads take up 12% of the total land area. With land already scarce, Singapore cannot afford to consume more of it by building parking spaces that are inefficiently utilized.

Parking space is simply the provision for the storage of vehicles. Car parks take several forms and serve a variety of land uses, ranging from residential roads to shopping centers. They also affect the aesthetics of their surroundings, whether they are on the street or in multi-story aboveground or underground structures. These car parks consume both land and resources that would ideally be put to better use, for instance for other developments or homes.

A more ideal approach to parking would connect the separate decisions on parking provision at individual sites in order to achieve wider strategic goals, for example using underutilized car park spaces more efficiently. Car park spaces that are not planned strategically result in traffic jams, poor traffic management and overspill (legal or illegal) into the surrounding areas. Planning for parking is mainly concerned with the ways land and natural environments are conserved, valued and developed using better geographical understanding.

Data mining is the computational process of discovering patterns in large datasets, also known as “big data”. The data collected for this study has the characteristics of time-series data. Time-series data is considered to be multidimensional data, as there is one observation per time unit and each time unit represents a dimension.

Parking utilization provides a time-series of the typical parking demand for a development on a given day. Thus, by comparing parking utilization comprehensively, the study aims to identify patterns and trends across the different car parks in Singapore.

This paper explores the use of time-series data mining techniques to discover patterns and trends among similar car park sites within 29 unique retail malls, which would ideally assist in the planning and utilization of car parks in Singapore.

INTRODUCTION

Parking requirements are in the exclusive domain of local government and are subject to its concerns. Minimally, parking requirements include four important elements (Marsden, 2006):
1. Land use for the parking,
2. Car park ratio with regard to the size of the development,
3. Consideration of the demand and supply for the car park lots and
4. Surrounding car parks that influence the demand required.
Cities created off-street parking to ensure that new developments have sufficient space and ample parking (Barter, 2010). A lack of parking might result in traffic congestion and cause cars to spill over and park in the surrounding areas. Therefore, car park planners and public officials must be able to accurately estimate the number of parking lots required at an amenity to eliminate these parking issues (Barter, 2010).

In Singapore, roads take up 12% of the total land area. With land already scarce, Singapore cannot afford to consume more of it by building parking spaces that are inefficiently utilized.

With the increase in Singapore's population in recent years, land must be allotted wisely. As Singapore continues to grow as a city, there is a need to increase the supply of housing, industrial and office estates. Therefore, it is not realistic for every Singaporean household to own a car (LTA, 2012).

Having said that, a car is not a basic necessity in Singapore, as public transportation is well developed and easily accessible. However, Singaporeans seem to think otherwise: the proportion of households in Singapore that own a car increased from 40% in 2008 to 45% in 2013. In order to curb car ownership and to keep the roads smooth-flowing and congestion-free, the authority affirmed that it would continue to emphasize vehicle ownership and usage restraint measures.

Since 1990, the Certificate Of Entitlement (COE) system has enabled Singapore to exercise effective control of vehicle population growth. As Singapore becomes more urbanized, the social cost of car ownership will also increase. This is because land has to be set aside for parking spaces at not only where we reside, but also at the places where we work, study and play. Allocating more land for car parks means that there is less land for more meaningful developments, such as housing, schools or healthcare facilities. On top of that, illegal parking and congestion in local neighborhoods may also become more prevalent.

With these considerations in mind, the authority would like to gain a better understanding of the car park occupancy situation in Singapore. Thus, the authority has engaged a consultancy firm to find out more through site surveys and observation in these car parks. The information collected is transformed into knowledge with the help of the team, which assists the consultancy firm in producing detailed reports, info-graphics and consolidated data for each car park site to summarize the findings. This information is crucial to the authority, as it will help it to plan ahead and handle car park issues in Singapore.

Apart from assisting the consultancy firm in reporting the situation at each individual car park site, the team will explore and analyze complex data using time-series data mining. This research study will ideally help the authority to discover new insights into the clusters of shopping malls that can be grouped together based on the characteristics of their car park occupancies.

Car Park Sites

The consultancy firm had already completed the data collection process and compiled the results. The primary focus was on the analysis and reporting of the findings of 65 car park sites. Additionally, they had also created clusters by grouping nearby car park sites together. For instance, the car parks of Punggol Plaza and Punggol 21 CC are grouped together, as they are geographically located next to each other.

The allocation of the reports required for all of these car park sites is shown below:

AtomFT1.png



By the initial meeting on 30th December 2015, the consultancy firm had completed and submitted 10 reports to the authority. There were 45 outstanding reports, info-graphics and Excel files that still required analysis and reporting. As the consultancy firm’s submission deadline for the reports was 31st January 2016, the team’s initial job scope was to assist the consultancy firm in meeting that deadline.

After the completion of phase 1, the team will further process the data collected in order to produce new insights. Additional initiatives will include a comparison of the different car park sites, as well as a national average representation. This will allow the business owners to better understand the situation nationwide instead of only considering each car park site independently. Additionally, the team will conduct a focus study using time-series clustering on shopping mall car park sites. Using time-series analysis, the team would like to explore and demonstrate how shopping mall car park sites with similar characteristics can be grouped together based on their car park occupancy. The team will also share the findings and evaluate the accuracy of the analysis by applying it to real-life scenarios.

Project Objective And Business Problem

The objective of this project is to assist a consultancy firm in understanding the current parking situation at 65 different locations in Singapore. These locations comprise 30 retail malls, 15 Retail and Food & Beverage (F&B) clusters in housing estates, 10 hawker centers and 10 community clubs.

The study was previously conducted by the consultancy firm through parking occupancy surveys, human traffic counts, and interview-based surveys for selected locations at stipulated times. The collected data will be further processed before being submitted to the authority to understand the current parking situation at these locations. The team will be splitting the project into two phases to complete.

In phase 1, the team assisted the consultancy firm in analyzing the situation at each car park site. The results collected for each parking site were tabulated into a single Microsoft Excel spreadsheet file according to survey type. This Excel spreadsheet was used to generate charts and graphs for the info-graphics. Finally, a final report shared the findings for each parking site, including a write-up of the characteristics and methodology of the entire process, all location maps and captured images (if any). The final report adopts the following format:

 1. Executive Summary
 2. Site Background
 3. Site Characteristics
 4. Site Assessment
 5. Survey Deployment Plan
 6. Survey Findings
 7. Conclusion
 8. Appendices
  8.1 Site Map of the parking locations
  8.2 Car park characteristics
  8.3 Pre Survey Observations & Results
  8.4 Info-graphic to summarize the results collected
  8.5 Survey Questionnaire Template

LITERATURE REVIEW

Parking Policy

One of the most important links between land use and transport is parking policy. The effectiveness of parking policies is often compromised by the perceived tension among three of the objectives that parking supports: regeneration, restraint and revenue. The belief that parking restraint measures could damage the attractiveness of city centers to both retail and commercial enterprises limits the political acceptability of pricing policies and planning (Strubbs, 2012).

Parking space refers to the provision for the storage of vehicles (Dolnick, 1999). Car parks can be provided in a variety of land uses, ranging from residential areas to shopping centers. Car parks can have a high impact on aesthetics depending on their location: on the street, or in multi-story aboveground or underground structures. These car parks consume both land and resources that might be put to better use, such as building other developments or private homes.

A strategic approach to parking would connect the separate decisions on parking provision at individual sites with the achievement of wider planning goals. One example would be to save the land for other uses (March, 2007). Poor car park planning would result in traffic jams, poor traffic management and overspill into the surrounding areas. This is avoidable only if an appropriate planning process is in place to determine future parking arrangements, preventing unnecessary stress to drivers. The planning of parking areas is mainly concerned with the ways that land and natural environments are conserved, valued, developed or organized using geographical understanding (Aldridge et al., 2006).

To achieve desirable arrangements of land use for car parks, planning is essential and must be established through the reiteration of rules, goals, standards, designs and decision systems. There is thus a need for the team to examine the existing understanding of parking issues. The first step would be to reconsider the manner in which collective action might be taken on the basis of this knowledge. This information usually has spatio-temporal characteristics, hence data mining techniques are applied to explore car park overspill patterns.

Time-Series Data mining

Data mining is the computational process of discovering patterns in large datasets, also known as “big data”. Conventional data mining is also known as Knowledge Discovery in Databases (KDD). The objective of the KDD process is to extract information and transform it into knowledge, an understandable structure for future application by business users (Frawley et al., 1992). There are three main types of data mining techniques: association rules, classification and statistical data mining. Association rules are used to discover relations between variables in a large dataset. Classification is a data mining function that generates a set of rules for classifying instances into predefined classes. Lastly, statistical data mining is driven by the data to discover new patterns and build predictive models. Although these conventional data mining techniques are broadly used by many industries, they are not appropriate for time-series data. Another set of techniques, known as time-series data mining, was thus developed to cater for time-series data (Fuller, 1995).

Time-series data mining has four major tasks: clustering, indexing, classification and segmentation (Harvey, 1994). Firstly, clustering finds time series with similar patterns and groups them together. Next, indexing finds, in order of similarity, the time series that are most similar to a given query series. Thirdly, classification assigns each time series to a known category by using a model that was trained earlier. Lastly, segmentation separates and partitions the time series. Time-series data is considered to be multidimensional data, as there is one observation per time unit and each time unit represents a dimension. In the real world, time-series data is usually highly dimensional (Lee et al., 2014). For instance, in a stock market setting, prices changing over time can be recorded every second. In other words, a single series could accumulate up to 3,600 records an hour and 86,400 records a day.

Through the collection of data on a routine basis, organizations are amassing sequentially ordered data. Observations and records in such datasets possess a time element, as the information is collected over a period of time (e.g. over a day, a week or even up to a decade). Examples of such data types include sales transactions, delivery orders and traffic information. Over the years, businesses and organizations have increasingly realized the importance and value of this data. They seek to analyze this time-series data to discover more business insights that could help them improve and grow.

When preparing the data, data analysts have to put in the extra effort of transforming the high-dimensional data from time-stamped transactions into a table suitable for time-series applications. This helps applications to recognize the data as time-series data and to perform further analysis and pattern detection on it. Data analysts have to ensure that the data, whether univariate or multivariate, is transformed into a set of contiguous time instances.

One of the common pitfalls in analyzing time-series data is that time-stamped data is irregularly recorded, which can result in two different time series being wrongly identified as having common trends (Esling & Argon, 2012). In cases where two time series did not occur concurrently, time-series data mining techniques would not be able to discern the relationship, as time is no longer a factor in that comparison.

AtomFF1.png



For example, in the two figures shown above, Figure 1a represents the traditional data mining similarity measure using Euclidean distance. It is used to compare the similarity between the two time series Q and C, and the relationship is not discerned because the two series are out of phase. In Figure 1b, however, the Dynamic Time Warping (DTW) technique overcomes this issue by accounting for the time factor when comparing the two time series.

The DTW algorithm helps to identify similar trends that may occur over a time period across multiple sequences of data. It serves as an effective data mining technique for algorithmically comparing different sets of time-ordered data, and offers a better means of identifying similar trends across sets of sequenced records and observations.
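
The contrast between Figure 1a and Figure 1b can be illustrated with a minimal sketch. It is not the implementation used by the team's software; the function names and the two short series below are made up for illustration, and the DTW shown is the textbook dynamic-programming form with a squared point cost and no warping-window constraint.

```python
import math


def euclidean_distance(q, c):
    """Lock-step distance: compares q[i] only with c[i]."""
    if len(q) != len(c):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def dtw_distance(q, c):
    """Textbook DTW: cheapest monotonic alignment between q and c."""
    n, m = len(q), len(c)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of q
    # with the first j points of c.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            cost[i][j] = d + min(
                cost[i - 1][j],      # q advances, c stays
                cost[i][j - 1],      # c advances, q stays
                cost[i - 1][j - 1],  # both advance
            )
    return math.sqrt(cost[n][m])


# Hypothetical series: same shape, shifted by one time step.
q = [0, 0, 1, 2, 1, 0, 0]
c = [0, 1, 2, 1, 0, 0, 0]

print(euclidean_distance(q, c))  # penalizes the phase shift
print(dtw_distance(q, c))        # aligns the peaks before comparing
```

For these two made-up series the lock-step distance is non-zero even though the shapes are identical, while DTW stretches the time axis so that the peaks line up.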

This paper seeks to explore these time-series data mining techniques to discover patterns and trends of similar car park sites within 29 shopping retail malls.

TOOLS & TECHNOLOGY

Analysis & Visualization Tools

The team will use four tools for analysis and visualization, which are
(1) Microsoft Excel,
(2) SAS Enterprise Miner,
(3) Microsoft SQL Server Integration Services (SSIS) and
(4) JMP Pro.

Microsoft Excel is a spreadsheet application designed for calculation, charts and visual aids, as well as pivot tables. In this case, the team will be using it to analyze the data for phase 1, generating charts and calculations for each car park site.

SAS Enterprise Miner is analytical software that helps to streamline and simplify the data mining process. It allows easy retrieval and analysis of the datasets. Additionally, SAS Enterprise Miner allows users to perform descriptive, predictive and time-series analysis on large amounts of data. The software also has interactive visualization functions and an easy-to-use interface in which most tasks can be performed by drag and drop.

Microsoft SQL Server Integration Services (SSIS) is a platform for building enterprise-level data integration and data transformation solutions. Integration Services allows users to extract the data, transform it and load it into a database.

JMP Pro is the advanced version of JMP, created for users who need sophisticated modeling techniques. JMP Pro is statistical analysis software from SAS that provides a platform for interactive data visualization, exploration, analysis and communication.
For this project, the team will be using these analytical tools to gather and discover new insights into car park overspill patterns.

Reporting Tools

Apart from analysis and visualization tools, the team will be using two other software applications for reporting, namely (1) Microsoft Word, a word processor, and (2) Microsoft PowerPoint, a slideshow presentation program.

Collaboration tools

To collaborate within the team and with external stakeholders, the team will use (1) Dropbox, a file hosting service, (2) Google Drive, a cloud storage web application and (3) SMU Wikipedia, an SMU encyclopedia web page that is built for collaboration.

DATA COLLECTION AND DATASETS

Data Collection

The traffic planners’ objective is to review the current parking situation in four different types of developments at 65 different locations (30 retail malls (RM), 15 Retail and F&B clusters, 10 hawker centers (HC) and 10 community clubs (CC)). Due to the different nature of each premise, different methods were used to gather the count data for vehicles and patrons. Additionally, intercept surveys were carried out to gather sample data. At highly accessible points, automated counters were deployed to assist the count process. The entire data collection process took place between May 2015 and October 2015.

Each dataset contains the vehicle and human counts for the particular premises. For instance, in the Retail Mall setting, a handful of enumerators were deployed at the entrance and exit points of the building, as well as at the car park. These enumerators were deployed in pairs or trios. A pair of enumerators was in charge of counting the number of people entering and exiting the building: one counted the people entering the building while the other was responsible for the outbound count. Next, a trio of enumerators was deployed to count the vehicles, including motorcycles. One was assigned to count the vehicles entering the premises, while another counted the vehicles exiting the premises. The last enumerator was tasked with finding out the overspill demand by counting the number of vehicles queuing to enter the car park and noting down the number of vehicles parking or waiting illegally along the side streets. The same data collection process was used for both the Community Center settings and the Retail and F&B clusters.

As for the Hawker Centers, the trio of enumerators counting the vehicles entering and exiting followed the same process described in the previous paragraph. The pair of enumerators responsible for the human count patrolled the Hawker Centers instead of being stationed at the entrance or exit points. One of them was assigned to count the number of seated patrons while the other counted the number of patrons queuing at the stalls. The enumerators made their rounds every 15 minutes to count the human occupancy of the Hawker Centers.

This data was collected between 10am and 9pm, with each observation recorded in 15-minute blocks. Within a one-hour period there would thus be four records (for example 12pm, 12.15pm, 12.30pm and 12.45pm). Lastly, the data collection for each car park site lasted two days: one weekday (non-peak day) and one weekend day (peak day).

Additionally, a dedicated team was deployed on the ground to carry out intercept surveys, interviewing patrons and collecting survey results. All this information was compiled into a single spreadsheet for the core project team members to analyze before they reported their findings to the authority.

Datasets

Pre-Survey Report

Pre-survey reports are used to collate each site’s information, such as its unique characteristics, and to assess its eligibility. This allows the team to better understand the surroundings of a particular premise, the nature of its business and what makes it unique. Hence, it helps determine the most appropriate survey methodology to achieve results.

Human Count

The enumerators count the total number of patrons, as well as the passengers on board each vehicle. With this information, the team is able to determine the total number of people entering the premises during a particular time.

Vehicle Count

Likewise for the total vehicle count, the results gathered from the enumerators show the number of inbound and outbound vehicles, roadside parking and overspill count.

Interview Survey

As for the interview survey, demographic profiles and travel behavior were recorded. Demographic profiles capture the citizenship, gender, age and ethnicity of the patrons, whereas the travel behavior survey notes the number of patrons visiting the amenities that particular day, their frequency of visit, the main purpose of the visit, the duration of the visit, their form of commute to the premise, and the number of companions during their visit. For drivers, more information is collected, such as their vehicle parking location, their reason for parking there, and the accessibility from their parking location to the amenities they are visiting that day.

PHASE 1 ANALYSIS METHODOLOGY

Phase 1 (Jan 2016)

In order to gain insight into the current parking situation at these 65 selected locations, we have to gather all the information collected previously and process it further. The four key components, as mentioned in the previous section, are the Pre-Survey Report, Human Count, Vehicle Count and Interview Survey.

An illustration of the analytics methodology is shown below:

AtomFF2.png


With that, we will be able to derive both qualitative and quantitative findings. Qualitative results help us to gain a better understanding of the main reasons, motivations and behavioral patterns of patrons visiting the premise. Results from survey questionnaires are considered qualitative findings. On the other hand, quantitative results are facts and figures that quantify data and generalize results from a sample population. As such, the total human count, vehicle count, roadside parking count and overspill count are considered quantitative findings.

With the analysis from both the qualitative and quantitative figures, we will be able to draw insights from the current parking situations in that particular car park. Therefore, a report and info-graphics will be created to complement the key findings in each particular parking site. The team will be using Microsoft Excel to work on the analysis and findings, and use other tools like Microsoft Word and Microsoft PowerPoint to create the final report and info-graphics.

Phase 1 Deliverables

The team completed Phase 1 of the project in January, which was to assist the consultancy firm in understanding the car park issues at six development sites (AMK hub, AMK hawker centre, Compass Point, Jalan Salang F&B cluster, Rail Mall F&B cluster and Sengkang CC). As mentioned earlier, the consultancy firm grouped AMK hub and AMK hawker centre together due to the close proximity of the two sites. Similarly, Compass Point and Sengkang CC were grouped together to form the Sengkang Cluster. For Phase 1, the team analyzed and completed these deliverables for the consultancy firm:

Excel files

1. AMK Hub.xlsx (7 spreadsheets)
2. AMK Hawker Centre.xlsx (7 spreadsheets)
3. Compass Point.xlsx (7 spreadsheets)
4. Sengkang CC.xlsx (7 spreadsheets)
5. Rail Mall.xlsx (7 spreadsheets)
6. Jalan Salang.xlsx (7 spreadsheets)

Reports

1. AMK Cluster.pdf (82 pages)
2. SengKang.pdf (99 pages)
3. Rail Mall.pdf (87 pages)
4. Jalan Salang.pdf (62 pages)

Info-graphics files

1. AMK Hub.pdf (7 informative poster images)
2. AMK Hawker Centre.pdf (8 informative poster images)
3. Compass Point.pdf (8 informative poster images)
4. Sengkang CC.pdf (9 informative poster images)
5. Rail Mall.pdf (11 informative poster images)
6. Jalan Salang.pdf (11 informative poster images)
The Excel files were processed to assist in analyzing the data and plotting the charts for the info-graphics. The reports were generated to share the insights found at each development. Lastly, the info-graphics documents were prepared to capture and share the main insights from the respective sites.

Phase 1 General Findings

It is concluded that the development cluster is mainly patronized by residents living within the region. The human traffic profiles show that patrons visited the cluster most often from late afternoon to evening, which coincides with after-school hours, the start times of classes and activities held at the cluster, and the dinner peak period. The observations gathered show no distinct anomalies to suggest patronage for other purposes, e.g. famous food stalls or the mall being a popular shopping location for out-of-towners.


In general, weekday and weekend parking demand showed a similar traffic flow pattern during the lunch hour, whereas the weekend saw a spike at dinnertime. For the retail malls, the parking occupancy findings show that the parking supply for the cluster is sufficient for current demand, and that there is spare capacity in the public car parks within a 5-minute walk of the development cluster to handle overspill parking should it occur.

However, for the F&B clusters, local residents staying in the landed properties dominate roadside parking along the surveyed roads. While it was evident that patrons visiting the development contributed to the illegal parking, the Site Supervisor also observed that residents contributed their fair share of the illegally parked vehicles, as some of the cars were parked throughout the survey hours.


The team felt that surveying a single weekday and a single weekend day provided limited insight into the traffic patterns and trends of a particular development. The team also held the opinion that the survey results were too focused on individual sites, so there was a need to draw more insights on the similarity of the car parks. This was carried out in Phase 2 of the project.

PHASE 2 ANALYTICS METHODOLOGY

Phase 2 (Feb 2016 – Apr 2016)

The team will utilize the raw data provided by the consultancy firm to carry out a more in-depth analysis than what was required in Phase 1 of the project. Phase 1 mainly summarized each parking site individually. In Phase 2, we proposed time-series data mining to identify and analyze any patterns that might exist between different parking sites. For example, there might be a similarity between a retail mall in the east and one in the west that is not obvious from a superficial view of the data. Time-series data mining allows us to represent a collection of data obtained over a period of time and to view the shape of the data over time. With that, the team will perform a further analysis based on the site characteristics of the different clusters.

In Phase 2, the team will extend the scope of work to include comparisons with other car park sites, for instance comparing the car park occupancy of AMK hub (Retail Mall) with that of NEX (Retail Mall) over the weekend. The purpose of this comparison is to allow business users to better appreciate the data collected previously and to gain new insights that can be used as a reference in refining future planning provisions. In order to do this, the team will use SAS Enterprise Miner to create the visualization for this analysis.

Data Cleaning and Explorations

The data we received from the consultancy firm was site-based and split up into individual Excel files containing a lot of unnecessary data.

AtomFF3.png

Figure 3 above shows a screenshot of the sample data collected. Firstly, the data displays details of the car park and the time, with the peak day and non-peak day data recorded against each time. A weekend day (i.e. Saturday or Sunday) is treated as a peak day, and a weekday (i.e. Monday to Friday) as a non-peak day. A record is captured every 15 minutes and includes the number of cars in season lots, the number of cars in the car park and any overspill cars observed.

After some elementary analysis, we realized there was a need to transform the time-based data into appropriately time-stamped time-series data in order to perform further analysis. We used SQL Server Integration Services 2010 to go through all the Excel files and extract the relevant data, as we were comfortable with this software from previous projects.


Filtering and extracting data

There were many variables in the Excel sheets that were not helpful for our Phase 2 analysis. We decided to use six target variables, which were the most relevant to what we would like to analyze: peak_occupancy, non_peak_occupancy, peak_car_in, non_peak_car_in, peak_car_out and non_peak_car_out. We also filtered out 112 Katong, as it was a pilot site and had a number of missing values.

Combining Data

As the data we received from the consultancy firm was site-based and split up into individual Excel files, we needed to combine all the sites into a single file after filtering and extracting the data from the individual files. The combined file includes the attributes time, car_park (cross id), total_lots, peak_occupancy, non_peak_occupancy, peak_car_in, non_peak_car_in, peak_car_out and non_peak_car_out. We plan to carry out our analysis on a total of 28 sites.
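
The filtering and combining steps can be sketched in a few lines. The team carried out this work in SSIS; the Python below is only an illustrative equivalent, and the function name, the input layout (a dictionary of per-site row lists) and the sample values are assumptions rather than the team's actual package.

```python
# Columns kept in the combined file, as listed in the text above.
KEEP = [
    "time", "car_park", "total_lots",
    "peak_occupancy", "non_peak_occupancy",
    "peak_car_in", "non_peak_car_in",
    "peak_car_out", "non_peak_car_out",
]

# Pilot site dropped because of missing values.
EXCLUDED_SITES = {"112 Katong"}


def combine_sites(site_tables):
    """Stack per-site rows into one table, keeping only the KEEP columns.

    site_tables: dict mapping site name -> list of row dicts
    (hypothetical layout standing in for one Excel file per site).
    """
    combined = []
    for site, rows in site_tables.items():
        if site in EXCLUDED_SITES:
            continue
        for row in rows:
            record = dict(row, car_park=site)  # tag each row with its site
            combined.append({col: record.get(col) for col in KEEP})
    return combined


# Made-up sample rows; "remarks" stands for any unneeded column.
sample = {
    "Site A": [{"time": 1, "total_lots": 200, "peak_occupancy": 120,
                "non_peak_occupancy": 80, "peak_car_in": 30,
                "non_peak_car_in": 20, "peak_car_out": 10,
                "non_peak_car_out": 5, "remarks": "n/a"}],
    "112 Katong": [{"time": 1, "total_lots": 150}],
}

table = combine_sites(sample)
print(len(table), sorted(table[0]))
```

Columns missing from a source row come through as None, which keeps gaps visible instead of silently dropping the row.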

Recoding Time

As the time given was in ##:##AM/PM format, there was a need for us to recode it into numbers in order for us to run Time Series Analysis on SAS Enterprise Miner. We used SAS Enterprise Guide to recode our time to Time ID starting from 1 before loading the cleaned data into SAS Server.
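
As an illustration of the recoding step, the sketch below maps a clock label to a slot number. The team did this in SAS Enterprise Guide; this Python version is only an analogue, and it assumes that the first slot is 10:00AM (the start of the survey window) and that slots are 15 minutes apart. The function and constant names are hypothetical.

```python
from datetime import datetime

SURVEY_START = "10:00AM"  # assumed first slot, taken from the survey window
SLOT_MINUTES = 15         # recording interval stated in the text
TIME_FORMAT = "%I:%M%p"   # matches labels such as 12:15PM


def time_to_slot_id(label, start=SURVEY_START, slot_minutes=SLOT_MINUTES):
    """Convert a ##:##AM/PM label to a Time ID starting from 1."""
    t = datetime.strptime(label.strip().upper(), TIME_FORMAT)
    t0 = datetime.strptime(start, TIME_FORMAT)
    minutes = int((t - t0).total_seconds() // 60)
    if minutes < 0 or minutes % slot_minutes != 0:
        raise ValueError(f"{label!r} is not on the {slot_minutes}-minute grid")
    return minutes // slot_minutes + 1


for label in ["10:00AM", "10:15AM", "12:15PM"]:
    print(label, time_to_slot_id(label))
```

Rejecting labels that fall off the 15-minute grid is a design choice of this sketch: it surfaces irregularly recorded timestamps early rather than rounding them into a neighbouring slot.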

AtomI01.png


AtomI02.png


AtomI03.png


The figures above show that the raw data contains unnecessary rows and columns, as they are empty. The figure below shows the recoded data after cleaning.

AtomI04.png


Data Dictionary

Time: Timeslot ID number
Car_Park: Name of the development car park
Total_Lots: Total number of lots available in the development car park
Peak_Occupancy: Number of cars inside the car park on a peak day
Non_Peak_Occupancy: Number of cars inside the car park on a non-peak day
Peak_Car_In: Number of cars going into the car park on a peak day
Non_Peak_Car_In: Number of cars going into the car park on a non-peak day
Peak_Car_Out: Number of cars going out of the car park on a peak day
Non_Peak_Car_Out: Number of cars going out of the car park on a non-peak day

Variable used for the study

Parking utilization provides a time-series of typical parking demand for the development in that area that particular day (Moskovitz & Wheeler, 2011). By comparing parking utilization comprehensively, the study will be able to clearly identify patterns and trends of high and low usage car parks. Additionally, the study will be able to assess the utilization rate of the parking throughout the day.
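
As a small illustration of how a utilization rate can be derived from the variables in the data dictionary, the sketch below divides occupancy by Total_Lots. The function names and the percentage convention are assumptions made for this example, not a definition taken from the cited studies.

```python
def utilisation_rate(occupancy, total_lots):
    """Percentage of lots occupied at each time slot."""
    if total_lots <= 0:
        raise ValueError("total_lots must be positive")
    return [100.0 * cars / total_lots for cars in occupancy]


def free_lots(occupancy, total_lots):
    """Number of unoccupied lots at each time slot."""
    return [total_lots - cars for cars in occupancy]
```

Expressing occupancy as a percentage makes car parks of different sizes directly comparable.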

Various studies have suggested using occupancy as the measure for this kind of study (Chen, 2014; Soler, 2015), as it gives the number of cars in the development car park at a given point in time and therefore helps us to understand the utilization of the car park. Development car parks follow concrete temporal patterns or periodic behavior, hence data mining techniques are able to uncover these patterns (Soler, 2015).

As briefly mentioned above, the team has collected occupancy data for both weekdays and weekends for the car parks of 29 shopping centres. The data are recorded at 15-minute intervals for 11 hours, from 10am to 9pm. Data were captured on two days (one weekday [off-peak day] and one weekend day [peak day]) between May 2015 and October 2015. With this data the team is able to examine and understand the car park issue using time-series data mining. The team builds its models on the peak occupancy variable (Peak_Occupancy), as it reflects the situation in the shopping centre car parks on a peak day (weekend).

EXPLORATORY DATA ANALYSIS

Time Series Analysis

The first step in performing a time series analysis is to transform irregularly recorded time-stamped data into data measured at regular time intervals. For the analysis in this paper, the time interval is 15 minutes. This accumulation can produce a variety of statistics, such as the total, average, minimum or maximum at each interval. Often, several cross-sectional series are available in a time series data set.
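
The accumulation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: events are given as (minute of day, value) pairs, and the function name `accumulate` is hypothetical. It does not reproduce how SAS Enterprise Miner performs the accumulation.

```python
from collections import defaultdict
from statistics import mean


def accumulate(events, interval_minutes=15):
    """Group irregular observations into regular intervals.

    events is an iterable of (minute_of_day, value) pairs. The result
    maps each interval start (in minutes) to its summary statistics.
    """
    buckets = defaultdict(list)
    for minute, value in events:
        start = (minute // interval_minutes) * interval_minutes
        buckets[start].append(value)
    return {
        start: {
            "total": sum(values),
            "average": mean(values),
            "minimum": min(values),
            "maximum": max(values),
        }
        for start, values in sorted(buckets.items())
    }
```

Which statistic to keep depends on the variable: a total suits counts of cars entering or leaving, while an average or maximum is more natural for occupancy.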

SAS Workflow

AtomFF4.png


The SAS Enterprise Miner workflow that our team implemented consisted of two main steps, namely Time-Series (TS) Data Preparation and TS Similarity nodes.

The TS Data Preparation node is mainly used to transform the dataset into the time-stamped format that the program recognizes, so that the TS Similarity node can then be run. Additionally, the results of the TS Data Preparation node allow the user to analyze the time series data set through summary statistics.

AtomFF5.png



TS Data Preparation

For this paper, we used Development Car Park as our cross ID, with Peak Occupancy as the target for the Peak Occupancy Analysis and Non Peak Occupancy as the target for the Non Peak Occupancy Analysis. For each value of Development Car Park, an aggregated time series is created using the Time Series Data Preparation (TSDP) node in SAS Enterprise Miner.

The TSDP node provides several other techniques, such as:
• Creating time series IDs and metadata
• Detecting and specifying time intervals
• Seasonality information
• Start times and end times
• Missing value replacement
• Differencing
• Transforming and transposing time series data

Multiple Time Series Comparison Plot

Plots the time series of multiple car parks on a single graph.

TSID Map Table

Shows how the original dataset, with the time series variable plotted by the CrossID variable, was transposed and converted into 28 time series with 28 unique variables. In the table, each time series is named TS_n, where n is the value of the TSID created for that time series.
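
The transposition described here can be illustrated with a short Python sketch that turns long-format records into one series per cross ID. It is only an approximation of what the node does: the function name `transpose_by_cross_id`, the default column names and the rule of numbering TSIDs in order of first appearance are all assumptions.

```python
def transpose_by_cross_id(rows, cross_id="car_park", time_id="time",
                          target="peak_occupancy"):
    """Pivot long records into one time series per cross ID value.

    Returns (tsid_map, series): tsid_map links each cross ID value to
    a name of the form TS_n, and series maps each TS_n name to a
    {time: value} dictionary.
    """
    tsid_map = {}
    series = {}
    for row in rows:
        key = row[cross_id]
        if key not in tsid_map:
            # Assumed numbering rule: order of first appearance.
            tsid_map[key] = f"TS_{len(tsid_map) + 1}"
        series.setdefault(tsid_map[key], {})[row[time_id]] = row[target]
    return tsid_map, series
```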

TSID Map Summary Table

Shows the level, count, frequency and percentage information for the CrossID and TSID variables. There are a total of 28 development car parks for the input and target variables, which results in 28 unique TSIDs.

Time Series Metadata Table

Shows the occupancy input data, which uses Time as the TimeID variable and covers intervals 1 to 57, consisting of data from 1000hrs to 2100hrs.

Time Series Plot

Plots the time series of a particular car park.

Time Series Summary

AtomI10.png


Shows the distribution of occupancy for each car park in the form of bar graphs, which include the max, mean, min and sum.

Time Series Similarity (TSS)

The TS Similarity node allows us to analyze similarities between different time series, grouped into clusters. The Dynamic Time Warping algorithm is applied in Enterprise Miner to match similar time series of different lengths within one group.
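
To illustrate the idea behind Dynamic Time Warping, a minimal Python sketch follows. It is not the Enterprise Miner implementation: the absolute-difference cost, the absence of a warping window and the function name `dtw_distance` are assumptions made for this example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the smallest cumulative cost of aligning the
    first i points of a with the first j points of b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed moves:
            # repeat a point of b, repeat a point of a, or advance both.
            cost[i][j] = step + min(cost[i - 1][j],
                                    cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[n][m]
```

Because a point in one series may be matched to several points in the other, two series with the same shape but different lengths or slightly shifted peaks can still be judged similar.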

Cluster Constellation Plot

AtomI11.png


Shows a simple view of the identified clusters.

Cluster Dendrogram

AtomI12.png


A tree hierarchy that displays the steps by which the clusters are formed.

Distance Map

AtomI13.png


Shows the clustered time series on both axes, providing a visual display of the similarity between each cluster and every other cluster. Blue indicates similarity and red indicates dissimilarity.

Using the above workflow, we are able to gain meaningful insights into the occupancy patterns of each cluster, where car parks are grouped by similar time series. Further analysis of the TSDP and TSS results is discussed in the findings section that follows.

FINDINGS AND ANALYSIS

By repeating the steps above to load the data into the SAS server, we generated 7 smaller time series data sets from the main data set. From these, our group identified the car parks that belonged to the different clusters, namely:

Peak Occupancy

Occupancy Analysis

AtomFF6.png


There are a total of 7 clusters for the analysis of occupancy.

AtomFF7.png


These 7 clusters were identified using the Cluster Dendrogram, where a very large merge of clusters is deemed unsuitable for similarity analysis. As the diagram shows, no large merges of clusters occur before the cut-off point.
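
The idea of cutting a dendrogram before any large merge can be sketched in Python as below. The sketch works on a symmetric matrix of pairwise distances between time series and stops merging once the closest pair of clusters is farther apart than a chosen cut-off. Average linkage, the explicit threshold and the function name `agglomerate` are assumptions; the linkage method used by Enterprise Miner is not stated in this report.

```python
def agglomerate(dist, max_merge_distance):
    """Average-linkage agglomerative clustering with a distance cut-off.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Returns a list of clusters, each a list of row indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        # Mean distance over all cross-cluster pairs.
        total = sum(dist[i][j] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > max_merge_distance:
            break  # the next merge would be too large: cut here
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters
```

A series that is far from every other series never falls below the cut-off and therefore remains in a cluster of its own.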

AtomFF8.png


The resulting time series graphs show that there might be some similarities between the car parks; however, we would not be able to identify them without further analysis of the time series.


Cluster 1

AtomI16.png


AtomT01.png


Cluster 1 is an interesting cluster: the retail malls in it are located within or around Ang Mo Kio, Hougang and Novena, which are geographically very close to each other.

For cluster 1, the main purpose of visit was found to be food and beverages. Punggol Plaza has many food outlets, and residents have no other dining choices in the area. AMK Hub and Heartland Mall have hawker centres located around the development site. United Square and Square 2 are connected to Novena Square; however, only United Square and Square 2 offer a variety of food for their patrons to choose from.

Cluster 2

AtomI17.png


AtomT02.png


Cluster 2 consists of Anchorpoint, Causeway Point, East Point, JEM, Jurong Point, NEX, NorthPoint, One KM and Tampines One. The majority of the development sites are located near or within bus interchanges and MRT stations.
These retail malls are understood to offer a wide range of services, from food and beverages to shopping and retail.
This shows that these development sites are the concentrated areas of activity for each estate, which also explains their consistently high trend of peak occupancy.

Cluster 3

AtomI18.png


AtomT03.png


Cluster 3 consists of Novena Square, Compass Point and Bedok Point. The purpose of visit for these 3 malls was found to be mainly shopping and F&B, with a higher percentage for shopping. It also appears that teenagers are the main visitors to these shopping centres, as Compass Point and Bedok Point house tuition centres and Novena Square consists mainly of sports shops.
Another similarity within this cluster is that these shopping malls are integrated with MRT stations.


Cluster 4

AtomI19.png


AtomT04.png


Cluster 4 consists of Boon Lay Shopping Centre. It is clustered by itself as the development site does not have a development carpark and the carpark used for analysis was an open carpark shared by multiple HDBs.

Cluster 5

AtomI20.png


AtomT05.png


Cluster 5 consists of Century Square, Clementi Mall, Hougang Mall, Seletar Mall and Thomson Plaza. The purposes of visit for these 5 sites are very evenly spread between shopping, F&B and supermarket. This is characteristic of heartland malls, which patrons visit for supermarket purposes. There is a consistent increasing trend from morning till afternoon for the majority of the malls in this cluster, followed by a sharp fall in the late afternoon.

Cluster 6

AtomI21.png


AtomT06.png


Cluster 6 consists of Changi City Point, Vivo City, West Mall, and West Coast Plaza. There are churches located very near to all of these development sites. The peaks for this cluster are at noon and in the evening, when services end. Another possible reason for the evening peak is that patrons visit these developments after working hours and have dinner there before returning home.

Cluster 7

AtomI22.png


AtomT07.png


Cluster 7 consists of Pioneer Mall. It is clustered by itself as the development site does not have a development carpark and the carpark used for analysis was a multi storey carpark which is being shared with Blk 638A.

From the results we identified a few interesting similarities and insights. Boon Lay Shopping Centre and Pioneer Mall appear to be outliers: both are retail malls located in the West of Singapore that differ significantly from the rest of the retail malls in the data set, yet are not similar enough to each other to belong to the same cluster.

Off Peak Occupancy

Off Peak Occupancy Analysis

By repeating the steps above to load the data into the SAS server, we generated 6 smaller time series data sets from the main data set. From these, our group identified the car parks that belonged to the different clusters, namely:

AtomI23.png


There are a total of 6 clusters for the analysis of off-peak occupancy.

AtomI24.png


These 6 clusters were identified using the Cluster Dendrogram, where a very large merge of clusters is deemed unsuitable for similarity analysis. As the diagram shows, no large merges of clusters occur before the cut-off point.

AtomI25.png


The resulting time series graphs show that there might be some similarities between the car parks; however, we would not be able to identify them without further analysis of the time series.

Cluster 1

AtomI26.png


AtomT08.png


Occupancy in this cluster starts increasing from 10:00am onwards, and the rate of increase becomes larger from 11:30am. At 1:45pm, the non-peak occupancy starts to decrease quite sharply, but it never falls back to the 11:30am level.
The non-peak occupancy continues falling, albeit very slowly, until 5:30pm, when a spike in occupancy can be observed. The spike is most pronounced at the Vivocity development site, while the other 4 developments show a smaller rate of increase.
The interesting thing about this pattern is that the dip after 1:45pm is not very pronounced: the largest dip is only around 25% of the maximum occupancy (1:15pm compared to 5:15pm). This pattern is unique to this cluster; in other clusters the dip in occupancy after meal times is much more pronounced.
This cluster behavior is consistent with the main purpose of visit recorded in the site survey data. The majority of patrons (more than 50%) arrive for Food & Beverage and Shopping. Food & Beverage behavior is reflected in the spikes during meal times, while patrons arriving for shopping keep the fall in occupancy after meal times less pronounced than in other clusters.
In addition to patron behavior, these sites are all located very close to or right beside bus interchanges. This common characteristic might be a critical factor that causes the developments to be similar and clustered together.
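
The size of a dip expressed as a share of maximum occupancy can be computed as in the sketch below. The formula is our reading of the comparison described above (the difference between two time slots divided by the series maximum); the function name `dip_percentage` is hypothetical.

```python
def dip_percentage(series, peak_index, trough_index):
    """Drop between two time slots as a percentage of the maximum.

    series is a list of occupancy counts; peak_index and trough_index
    are positions in that list (for example the 1:15pm and 5:15pm slots).
    """
    maximum = max(series)
    if maximum <= 0:
        raise ValueError("series must contain a positive value")
    drop = series[peak_index] - series[trough_index]
    return 100.0 * drop / maximum
```

Dividing by the maximum rather than by the peak-slot value keeps dips comparable across car parks whose highest occupancy occurs at different times.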


Cluster 2

AtomI27.png


AtomT09.png


It was found that the majority of patrons visit the retail malls in this cluster for shopping and F&B. According to the time series graph, occupancy at all retail malls in the cluster gradually increases from 11:45am to 1:00pm before dropping and remaining at a low level. It gradually increases again from 6pm to 8pm. The highest level of occupancy for all the retail malls in this cluster falls between 6pm and 9pm.

The interesting thing about this pattern is that the dip after 1:00pm is quite significant, at around 30% of the maximum occupancy, and is followed by a steadily decreasing trend in occupancy. This pattern is unique to this cluster; in other clusters the dip in occupancy after meal times is smaller. This shows that visitors frequent the retail malls in this cluster at meal times (i.e. during the lunch break and after work).

Cluster 3

AtomI28.png


AtomT10.png


Cluster 3 consists of Boon Lay Shopping Centre. It is clustered by itself as the development site does not have a development carpark and the carpark used for analysis was an open carpark shared by multiple HDBs.

Cluster 4

AtomI29.png


AtomT11.png


It was found that the majority of patrons visit the retail malls in this cluster for F&B and other activities (social activities, enrichment classes, cinema, and supermarket). According to the time series graph, the peak hours of the retail malls in this cluster fall between 12 noon and 12:45pm and between 6:45pm and 8:00pm. However, the overall graphs of the majority of retail malls in this cluster are very inconsistent. This might be because some of the retail malls in this cluster are right next to each other,
for example Tampines One and Century Square, and Novena Square, Square 2 and United Square. As these retail malls are connected and right next to each other, their car park occupancies might depend on one another.

Cluster 5

AtomI30.png


AtomT12.png


This cluster shows an increasing trend in car occupancy from 10:45am to 12:45pm. At 12:45pm, the non-peak occupancy starts to decrease very sharply, all the way until 5:45pm, before it increases again from 5:45pm to 7:45pm.
The interesting thing about this pattern is that the dip after 12:45pm is very significant, at around 70% of the maximum occupancy, and it does not retain a constant decreasing trend in the level of occupancy. This pattern is unique to this cluster: unlike in any other cluster, occupancy increases only from 10:45am to 12:45pm (breakfast and lunch) and from 5:45pm to 7:45pm (dinner).
This shows that a high percentage of visitors frequent the retail malls in this cluster solely for meals. Both malls are located very near to a university (SUTD and NUS) and a hospital (CGH and NUH). Moreover, shuttle buses to the retail malls in this cluster are provided during meal times.

Cluster 6

AtomI31.png


AtomT13.png


Cluster 6 consists of Pioneer Mall. It is clustered by itself as the development site does not have a development carpark and the carpark used for analysis was a multi storey carpark which is being shared with Blk 638A.


CONCLUSION

Limitations and Assumptions

Although the team managed to conduct a proper time series analysis with the raw data set provided by the consultancy firm, there were many limitations that might cause issues as well as potential avenues of improvement for further studies.

The team believes there are many ways to improve and strengthen the current project analysis and findings. For instance, scaling up the datasets without compromising time and performance would provide a better understanding of, and greater insight into, the car park sites. This would also improve the team effort.

One of the main limitations is the limited period of data collection. Instead of covering only one day each for the peak and off-peak periods, from 1000hrs to 2100hrs, the data should ideally be collected over a few weeks or months. A larger dataset would allow the identification of more seasonal patterns, such as monthly or quarterly patterns. Additionally, instead of being limited to the car park sites of 28 retail malls, the analysis could also include other shopping malls.

The team holds the opinion that the data recorded were en car park and patron demographic information.

Possible Avenue for Future Works

As briefly mentioned earlier, our team's analysis was based on the car park sites of 28 retail malls. Increasing the number of car park sites and including other developments (HDB estates, etc.) would help the team to further improve the analysis and gain deeper insights.

Rather than just counting the raw numbers of cars in each lot, a better form of data collection can be to place video cameras to keep track of the duration each car remains in the parking lot. This opens up more avenues for car park analysis and allows better insights in future studies.

Another recommendation for future studies is to separate development sites based on the results of this study. As already highlighted, certain developments did not include a development car park, and this produced the outliers observed above. The presence of these outliers shows that developments with car parks should be studied separately from developments without car parks.

Apart from having more records, another initiative could be to build an interactive dashboard to visualize the data. This can be achieved by using R programming language.

Summary

In conclusion, we hope that the work performed brings much needed insights to the parking allocation policies of developments. Through time series data mining, we hope to have adequately highlighted to the local authority several ways to improve the system, through identifying the trends of parking occupancy based on the activity or location of developments.

Currently, the authority applies a policy based on the number of retail shops, under which a development car park is required to provide a minimum number of parking lots based on the number of retail shops within the development. The authority manually analyses each proposal on a case-by-case basis to find out whether a development requires additional parking lots. For example, having an interchange nearby may require more lots from the developer.

The authority should consider looking into a Varied Pricing system that uses time series to predict the demand for parking lots and varies car park pricing based on these projections. Many locations overseas already use such a system, for example San Francisco in the United States, where SFPark has been successfully implemented.

However, to carry out this study and accurately project the demand and price level of car parks, further analysis and data collection need to be carried out. Our group hopes that despite the limitations mentioned above, there is enough support and evidence to justify an investment into this analysis, as parking issues not only cause congestion on the road due to overspill, but also result in potential accidents.

ACKNOWLEDGEMENT

This research was partially supported by Land Transport Authority (LTA) and Mediacorp Research Centre (MRC). We would like to extend our gratitude to our project sponsors, Mr Jason Soriano and Mr Darren Lum, from MRC and kl:kk respectively. They had provided insight and expertise that assisted the research.

We would like to thank Professor Kam Tin Seong, project supervisor, for his kind assistance with analytical tools, insights and guidance that greatly improved the manuscript.

Lastly, we would like to thank 2 “anonymous” reviewers for their insights and assisting the team in writing this research paper. We are also immensely grateful for their comments on an earlier version of the manuscript.


REFERENCES

1. Aldridge, K., Carreno, M., Ison, S., Rye, T., & Straker, I. (2006). Car parking management at airports: A special case? Transport Policy, 13(6), 511-521.
2. Barter, P. A. (2010). Parking policy in Asian cities. Lee Kuan Yew School of Public Policy Research Paper No. LKYSPP, 10-15.
3. Code Of Practice. (2011). Retrieved January 2, 2016, from https://www.lta.gov.sg/content/dam/ltaweb/corp/Industry/files/VPCOP2011.pdf
4. Chen, X. (2014). Parking occupancy prediction and pattern analysis. Technical report, Stanford University. Machine Learning Final Projects.
5. Department of Infrastructure. (2007). Meeting Our Transport Challenges: Connecting our communities. Melbourne: Department of Premier and Cabinet.
6. Dolnick, F., & Davidson, M. (Eds.). (1999). A Glossary of zoning, development, and planning terms. Chicago: American Planning Association.
7. Fuller, W. A. (1995). Introduction to Statistical Time Series. New York: John Wiley & Sons.
8. Han, J., & Kamber, M. (2001). Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann Publishers.
9. Harvey, A. C. (1994). Time Series Models. Cambridge, MA: MIT Press.
10. iitb.vlab.co.in. (2011). Parking Analysis. Retrieved 14 March 2016, from iitb.vlab.co.in/?sub=42&brch=132&sim=466&cnt=1
11. Land Transport Authority (LTA). (2012). Land Transport Master Plan. Singapore.
12. Land Transport Masterplan 2013. (2013). Retrieved January 2, 2016, from https://www.lta.gov.sg/content/dam/ltaweb/corp/PublicationsResearch/files/ReportNewsletter/LTMP2013Report.pdf
13. Leonard, M., Sloan, J., Lee, T., & Elsheimer, B. (2010). An Introduction to Similarity Analysis Using SAS®. SAS Institute Inc., Cary, NC.
14. Lee, T., Zhang, R., Xiao, Y., & Dean, J. (2014). Feature Extraction Methods for Time Series Data in SAS® Enterprise Miner™. SAS Institute Inc.
15. Luqman, H. B. A. R., & Ying Ying, K. (2015). Analysing Mass Rapid Public Transportation Travel Patterns of Singapore Through Time-Series Data Mining.
16. March, A. (2007). Towards strategic planning for car parking. Working Paper, Urban Planning Programme, Faculty of Architecture, Building and Planning, University of Melbourne. Retrieved from http://www.abp.unimelb.edu.au/files/miabp/5GAMUT2008_JUN_02.pdf
17. Marsden, G. (2006). The evidence base for parking policies: a review. Transport Policy, 13(6), 447-457.
18. Moskovitz, D., & Wheeler, N. (2011). Bicycle Parking Analysis with Time Series Photography. Transportation Research Record: Journal of the Transportation Research Board, (2247), 64-71.
19. Nakkeeran, K., Garla, S., & Chakraborty, G. (2012). Application of Time Series Clustering using SAS® Enterprise Miner™ for a Retail Chain. SAS® Global Forum 2012.
20. Parking Studies. (2014, August 5). Retrieved March 14, 2016, from https://www.civil.iitb.ac.in/tvm/1111_nptel/581_Parking/plain/plain.html#table-in-out-parking-survey-solution
21. Esling, P., & Agon, C. (2012). Time-Series Data Mining. Institut de Recherche et Coordination, ACM Computing Surveys (45), pp. 1-4.
22. Rye, T., & Ison, S. (2005). Overcoming barriers to the implementation of car parking charges at UK workplaces. Transport Policy, 12(1), 57-64.
23. Sheller, M., & Urry, J. (2000). The city and the car. International Journal of Urban and Regional Research, 24(4), 737-757.
24. Soler, S. (2015). Creation of a web application for smart park system and parking prediction study for system integration (Doctoral dissertation).
25. Schubert, S., & Lee, T. (2011). Time Series Data Mining with SAS® Enterprise Miner. In Proceedings of SAS Global Forum 2011 conference.
26. Stubbs, M. (2012). Car Parking and Residential Development: Sustainability, Design and Planning Policy, and Public Perceptions of Parking Provision.
27. TCRP. (2003). ‘Parking Management and Supply: Traveler Response to Transportation System Changes’, Transit Cooperative Research Program Report 95, Chapter 18, Transportation Research Board, Washington D.C.
28. The City of Seattle Department of Transportation. (2011, August). Performance-Based Parking Pricing Study. Retrieved from http://www.seattle.gov/transportation/parking/docs/SDOT_PbPP_FinRpt.pdf
29. Van de Ven, T., Bakker, B., Koenders, E., & van Vugt, G. (2012). Parckr-Estimating and Forecasting Parking Occupancy Based on Floating Vehicle Data. In 19th ITS World Congress.
30. Vuchic, V. (2000). Transportation for Livable Cities. New Brunswick: Centre for Urban Policy Research.
31. Frawley, W., Piatesky-Shapiro, C., & Matheus, C. (1992). Knowledge Discovery in Database: Overview. AI Magazine, Fall, pp. 213-228.

APPENDIX

Car Park Capacity over Volume

AtomFA1.png


AtomFA2.png


Free lots over Occupancy

AtomFA3.png


AtomFA4.png


Percentage Occupancy

AtomFA5.png


AtomFA6.png