Difference between revisions of "File:Group3ProjectBanner.PNG"

From Visual Analytics and Applications
 
(9 intermediate revisions by 2 users not shown)
Latest revision as of 16:41, 20 October 2018



Background

Singapore is the crown jewel of the Formula One race circuit, the backdrop of the successful Hollywood film “Crazy Rich Asians” and the host of the memorable North Korea-United States Summit. Its ability to position itself as a neutral yet vibrant destination has drawn hordes of visitors to its sunny shores. It is no surprise that the tourism sector has been developing into a growth engine for Singapore’s economy. In 2017, Singapore’s tourism sector attained record highs in both tourist arrivals and spending. According to data released by the Singapore Tourism Board, the number of arrivals increased by 6.2 per cent to 17.4 million, while tourism receipts increased by 3.9 per cent to S$26.8 billion. The increasing affordability of travel, driven by the prevalence of low-cost carriers globally, has also contributed to this favourable trend.

Beyond tourism, Singapore is also an ideal venue for the conduct of business. Singapore has consistently been ranked among the top, if not at the top, of Asian cities for hosting Meetings, Incentive Travel, Conventions & Exhibitions (MICE) events. Its prime geographical location and stable political climate are the two main reasons it is a leading destination for international MICE events. In 2017, a total of 935 international meetings took place in Singapore.

Motivation

During our exploratory analysis of the data on tourist arrivals into Singapore, we noticed that the arrival patterns of tourists and business travellers from different countries are heterogeneous. A keen understanding of these distinct travel behaviours can reveal travel preferences, which local businesses need in order to devise plans that attract more tourism receipts and boost their revenue. Analysts who can work with the data and turn the insights into actionable business decisions will help their businesses flourish.

With the recent completion of Marina Cruise Centre and the ongoing construction of Jewel Changi Airport, tourism receipts are expected to continue growing steadily over the next decade, barring any black swan events.

Objectives

We aim to build an interactive platform that illustrates the trends and seasonality within time-series data on Singapore's tourism sector, so that users can better understand how tourism in Singapore has developed over the last ten years.

Through this project, we hope that tourism businesses, especially small and medium enterprises (to be checked against the SME contribution to Singapore's economy or tourism industry), can make better marketing and business decisions. We aim to create a platform that helps business owners and analysts draw useful insights from the relationship between tourism revenue and expenditure, in support of economic growth.

  • The platform gives an overview of visitor arrival patterns by country, age and mode of transport.
  • It also provides a geographic map illustrating visitor density across different countries.
  • It supports tourism demand forecasting.


Data Source

The Singapore tourism sector data is extracted from the CEIC database, which is available at:
https://insights-ceicdata-com.libproxy.smu.edu.sg

We have selected the following five datasets:

Arrival by country
Arrival by age
Arrival by transport
Length of stay
Tourism revenue and expenditure

The datasets are available in monthly format, yearly format, or both. For our analysis, we plan to use data from 2007 onwards.

Methodology

Exploratory Analysis

We will explore the trends in the time-series data provided by the various tourism datasets, including cyclicity and seasonality. Interactions between the identified attributes might provide insights that we can use in our analysis. We will visualize the time series in the following ways:

  • Geographic heat map: Displays the density of visitor arrivals by country for a selected calendar month.
  • Slopegraph: This visualization technique provides maximum information with “minimum ink”. It can help us see how the number of visitors has changed over the years.
  • Waterfall: Rather than the values themselves, a waterfall plot brings out the changes in the values. It gives an overview of the time series, as a line chart would, while also showing how large the difference is between two consecutive data points.
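
The changes that a waterfall plot emphasises can be computed before any plotting. The following is a minimal sketch in Python, used here only as a language-neutral illustration (the project itself names R packages). The function name waterfall_steps and the arrival figures are hypothetical and are not taken from the CEIC data.

```python
from typing import List, Tuple


def waterfall_steps(values: List[float]) -> List[Tuple[float, float, float]]:
    """Return (start, change, end) for each pair of consecutive observations."""
    steps = []
    for prev, curr in zip(values, values[1:]):
        steps.append((prev, curr - prev, curr))
    return steps


# Hypothetical yearly arrivals in millions (illustrative numbers only).
arrivals = [10.0, 11.5, 11.0, 13.0]

for start, change, end in waterfall_steps(arrivals):
    direction = "up" if change >= 0 else "down"
    print(f"{start:.1f} -> {end:.1f} ({direction} {abs(change):.1f})")
```

Each tuple corresponds to one floating bar of a waterfall chart: the bar starts at the previous value and its height is the change to the next value.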

Explanatory Analysis

  • Decompose each time series into its constituent parts: Observed, Seasonal, Trend and Random (Noise). From the separate parts, users can understand the different time-series patterns and derive insights.
  • Our dataset has many variables (columns), so its dimensionality is too high for effective analysis and the curse of dimensionality may arise. For this reason, it is important to reduce the dimensionality. One useful approach is to use time series representations, which reduce dimensionality, reduce noise and emphasize the main characteristics of the series. In this stage, we would like to cluster the time series in order to group countries with similar patterns.
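
The decomposition described in the first point can be sketched as a classical additive decomposition. This is a minimal Python illustration under stated assumptions, not the project's implementation: the trend is taken as a centred moving average, the seasonal component as the average detrended value at each position in the cycle, and the random component as what remains. The function names and the quarterly figures are hypothetical.

```python
from typing import List, Optional, Tuple


def centered_moving_average(x: List[float], period: int) -> List[Optional[float]]:
    """Trend estimate; None where the window does not fit at the ends."""
    n = len(x)
    half = period // 2
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        window = x[t - half : t + half + 1]
        if period % 2 == 0:
            # 2 x m moving average: the two end points get half weight.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def decompose_additive(
    x: List[float], period: int
) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Split x into trend, seasonal and random parts (observed = sum of the three)."""
    if len(x) < 2 * period:
        raise ValueError("need at least two full periods of data")
    trend = centered_moving_average(x, period)
    detrended = [None if tr is None else obs - tr for obs, tr in zip(x, trend)]
    # Average the detrended values at each position within the cycle.
    means = []
    for pos in range(period):
        vals = [d for i, d in enumerate(detrended) if d is not None and i % period == pos]
        means.append(sum(vals) / len(vals))
    # Centre the seasonal effects so that they sum to zero over one cycle.
    offset = sum(means) / period
    means = [m - offset for m in means]
    seasonal = [means[i % period] for i in range(len(x))]
    random = [
        None if tr is None else obs - tr - s
        for obs, tr, s in zip(x, trend, seasonal)
    ]
    return trend, seasonal, random


# Hypothetical quarterly arrivals: a linear trend plus a repeating pattern.
pattern = [1.0, -1.0, 2.0, -2.0]
observed = [float(t) + pattern[t % 4] for t in range(12)]
trend, seasonal, random = decompose_additive(observed, period=4)
print("trend:   ", trend)
print("seasonal:", seasonal[:4])
print("random:  ", random)
```

The trend and random components are undefined for the first and last half period, which is a known limitation of the moving-average approach.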

Predictive Analysis

Time series forecasting is the use of a model to predict future values based on previously observed values. In this case, we would like to use forecasting techniques such as seasonal exponential smoothing and ARIMA. After forecasting, we will compare the predicted tourism figures against the actual figures to assess the accuracy of our forecasts. The standard error and other statistics can also be estimated to further verify the forecasting models and help choose the best one.
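
The compare-and-choose step can be sketched with a hold-out evaluation. This minimal Python example deliberately swaps in two much simpler methods, simple exponential smoothing and a seasonal naive forecast, in place of the seasonal exponential smoothing and ARIMA models named above. The accuracy measures (RMSE and MAPE), the smoothing parameter alpha=0.5 and the quarterly figures are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List


def ses_forecast(train: List[float], alpha: float, horizon: int) -> List[float]:
    """Simple exponential smoothing: a flat forecast at the final smoothed level."""
    level = train[0]
    for obs in train[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * horizon


def seasonal_naive_forecast(train: List[float], period: int, horizon: int) -> List[float]:
    """Repeat the last observed season."""
    start = len(train) - period
    return [train[start + (h % period)] for h in range(horizon)]


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical quarterly visitor arrivals in thousands (not CEIC data).
series = [100, 120, 150, 110, 104, 126, 156, 114, 108, 130, 162, 118]
train, test = series[:8], series[8:]  # hold out the final year

candidates: Dict[str, List[float]] = {
    "simple exponential smoothing": ses_forecast(train, alpha=0.5, horizon=len(test)),
    "seasonal naive": seasonal_naive_forecast(train, period=4, horizon=len(test)),
}

for name, predicted in candidates.items():
    print(f"{name}: RMSE={rmse(test, predicted):.2f}, MAPE={mape(test, predicted):.2f}%")

best_name = min(candidates, key=lambda name: rmse(test, candidates[name]))
print("Lowest hold-out RMSE:", best_name)
```

The same pattern applies to any pair of candidate models: fit on the earlier observations, forecast the held-out period, and compare the error measures.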

Application Libraries & Packages

Package Name | Description
TSrepr | Methods for representations (i.e. dimensionality reduction, preprocessing, feature extraction) of time series to support more accurate and effective time series data mining. Non-data adaptive, data adaptive, model-based and data dictated (clipped) representation methods are implemented, along with min-max and z-score normalisations and forecasting accuracy measures.
ggplot2 | A system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics and which graphical primitives to use, and it takes care of the details.
cluster | Methods for cluster analysis.
clusterCrit | Computes clustering validation indices; its bestCriterion function returns the best index value according to a specified criterion.
ggmap | A package for spatial data visualization. It can retrieve map tiles from various online sources (e.g. Google Maps) for use as layers within the ggplot2 plotting system.
slopegraph | Converts a data frame (containing a panel dataset, where rows are observations and columns are time periods) into an Edward Tufte-inspired "slopegraph" using either base or ggplot2 graphics.
forecast | Methods and tools for displaying and analysing univariate time series forecasts, including exponential smoothing via state space models and automatic ARIMA modelling.
tseries | Time series analysis tools; used here for the Augmented Dickey-Fuller test of the null hypothesis that a series has a unit root.
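
To illustrate what a time series representation does before clustering, the following minimal Python sketch applies z-score normalisation followed by piecewise aggregate approximation (PAA), one example of a non-data adaptive representation. It is a plain-Python illustration of the idea, not the TSrepr API. The function names, the segment length and the three country series are hypothetical.

```python
import math
from typing import List


def z_score(x: List[float]) -> List[float]:
    """Rescale to zero mean and unit (sample) standard deviation."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (len(x) - 1))
    if sd == 0:
        return [0.0] * len(x)
    return [(v - mean) / sd for v in x]


def paa(x: List[float], segment_length: int) -> List[float]:
    """Piecewise aggregate approximation: the mean of each consecutive segment."""
    segments = [x[i : i + segment_length] for i in range(0, len(x), segment_length)]
    return [sum(seg) / len(seg) for seg in segments]


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length representations."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Hypothetical monthly arrivals for three source countries (illustrative only).
country_a = [10, 12, 15, 11, 10, 12, 15, 11, 10, 12, 15, 11]
country_b = [v * 10 for v in country_a]                    # same shape, larger scale
country_c = [15, 11, 10, 12, 15, 11, 10, 12, 15, 11, 10, 12]  # different timing

representations = {
    name: paa(z_score(series), segment_length=2)
    for name, series in [("a", country_a), ("b", country_b), ("c", country_c)]
}

print("distance a-b:", euclidean(representations["a"], representations["b"]))
print("distance a-c:", euclidean(representations["a"], representations["c"]))
```

After normalisation, countries whose arrivals follow the same pattern at different volumes end up close together, while the shorter PAA vectors reduce the number of dimensions a clustering algorithm has to handle.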

References

1. Tay, F. (2018, February 12). Tourist arrivals, spending in Singapore hit record high for 2nd straight year; China top source of visitors.
Retrieved October 15, 2018, from https://www.straitstimes.com/singapore/tourist-spending-in-singapore-hit-record-268b-in-2017-china-top-source-of-visitors

2. (2018, June 09). Newsletters Singapore Excels as MICE Destination.
Retrieved October 15, 2018, from https://www.stb.gov.sg/news-and-publications/newsletters/Pages/June 2015/Singapore-Excels-as-MICE-.aspx

3. Laurinec, P. (2018, March 13). TSrepr use case - Clustering time series representations in R.
Retrieved October 13, 2018, from https://petolau.github.io/TSrepr-clustering-time-series-representations/

4. Turner, P. (2012, November). The Comparative Economic Impact of Travel & Tourism. WTTC.
Retrieved October 12, 2018, from https://www.wttc.org//media/files/reports/benchmark%20reports/the_comparative_economic_impact_of_travel tourism.pdf

5. Dalinina, R. (2017, January 10). Introduction to Forecasting with ARIMA in R.
Retrieved October 11, 2018, from https://www.datascience.com/blog/introduction-to-forecasting-with-arima-in-r-learn-data-science-tutorials

6. Powell, C. (2018, June 22). Creating Slopegraphs with R.
Retrieved October 11, 2018, from https://datascienceplus.com/creating-slopegraphs-with-r/

7. Tan, A. (2017, October 24). Singapore tourism doubled in 10 years, supports 164,000 jobs.
Retrieved October 11, 2018, from https://www.businesstimes.com.sg/government-economy/singapore-tourism-doubled-in-10-years-supports-164000-jobs-wttc

File history

Date/Time | Dimensions | User
current: 22:01, 14 October 2018 | 1,050 × 351 (841 KB) | Anna.zuo.2017