Group03 Proposal

From Visual Analytics and Applications

AN INTERACTIVE VISUALISATION PLATFORM ON VISITORS PROFILING

Tourism Investigator


Background

Crown jewel of the Formula One race circuit, backdrop of the hit Hollywood film “Crazy Rich Asians” and host of the memorable North Korea-United States Summit, Singapore has positioned herself as a neutral yet vibrant destination, drawing hordes of visitors to her sunny shores. It is no surprise that the tourism sector has become a growth engine for Singapore’s economy. In 2017, Singapore’s tourism sector attained record highs in both tourist arrivals and spending. According to data released by the Singapore Tourism Board, arrivals increased by 6.2 per cent to 17.4 million, while tourism receipts increased by 3.9 per cent to S$26.8 billion. The increasing affordability of travel, driven by the prevalence of low-cost carriers globally, has contributed to this upward trend.

Beyond tourism, Singapore is also an ideal venue for conducting business. Singapore has consistently been ranked among the top Asian cities, if not the top, for hosting Meetings, Incentive Travel, Conventions & Exhibitions (MICE) events. Its prime geographical location and stable political climate are the two main reasons it is a leading destination for international MICE events. In 2017, a total of 935 international meetings took place in Singapore.

[Images: F1 Singapore; Crazy Rich Asians; Kim-Trump Summit]


Past Work Reviews

WORLD BANK TOURISM DATA


[Figure: World Bank international tourism indicator chart, 1995 to 2016 (data.worldbank.org)]

The World Bank presents how tourism has evolved over time, both in aggregate and across individual nations, through the following indicators:

  • International tourism, number of arrivals
  • International tourism, number of departures
  • International tourism, receipts (US$)
  • International tourism, receipts (% of total exports)
  • International tourism, receipts for passenger transport items (current US$)
  • International tourism, receipts for travel items (current US$)
  • International tourism, expenditures for travel items (current US$)
  • International tourism, expenditures (current US$)

The World Bank presents the listed measures as line graphs. Such visual representations do not provide clear or deep insight into the origin of travellers. Moreover, the measures used are not clearly explained, and the distinction between expenditures and receipts may confuse users who are not familiar with the industry.


STB STATISTICS & INSIGHTS

[Figure: STB Statistics & Market Insights chart (stb.gov.sg)]

The chart from STB combines a line graph and a bar graph, which could be misleading as the zero baselines are at different heights. In addition, the use of two y-axes with different units of measurement is likely to confuse readers. As such, we strive to present our results clearly and concisely.

Motivation & Objectives

During our exploratory analysis of the data on tourist arrivals into Singapore, we noticed that the arrival patterns of tourists and business travellers from different countries are heterogeneous. The analyses from the World Bank and the Singapore Tourism Board provide only a macro view of overall tourism activity. As such, our team aims to address this gap by shifting the analysis to the country level. A keen understanding of each country's travel behaviour can reveal its visitors' travel preferences, which local businesses need in order to devise plans that attract more tourism receipts and boost their revenue. Analysts who can work with the data and turn the insights into actionable business decisions will help their businesses flourish. In addition, beyond analysis, we aim to provide a forecast of visitors' future travel and expenditure patterns. This will allow local businesses to be better prepared to capture tourism dollars in the next few years.

With the recent completion of Marina Cruise Centre and ongoing construction of Jewel Changi Airport, the tourism receipts are expected to continue to grow steadily for the next decade, barring any black swans.

Through this project, we hope that businesses in the tourism industry, especially small and medium enterprises (which account for 65% of Singapore's employment and contribute 50% of its GDP), can arrive at better marketing solutions and business decisions. We attempt to create a platform that helps business owners and analysts draw useful insights from the relationship between tourist arrivals and their respective expenditure, in order to promote economic growth.

  • The platform gives an overview of visitors' arrival patterns by country, age and mode of transport.
  • It provides a geographic map to illustrate visitor density across different countries.
  • It forecasts tourism demand.


Data Quality & Quantity

The Singapore tourism sector data is extracted from the CEIC database.
[Figure: Data selection]


Though many datasets are available from CEIC, not all are applicable given that we want to provide a holistic overview of each country's tourism attributes. Thus, from the list, we have selected the following:

  • Arrival by country: the dataset consists of 47 countries.
  • Arrival by transport: we will illustrate the mode of arrival of these tourists.
  • Tourism revenue and expenditure: the dataset consists of 20 countries.


The selection has been narrowed down to these 3 datasets because they are broken down by country. The other datasets, such as "length of stay", "visitors by age" and "hotel room occupancy rates", portray overall tourist activity and are not country-specific. Therefore we are unable to use them to meet our objectives.


For our system analysis, we will be using filtered data from 2007 onwards.
The datasets have been reconstructed into a monthly or yearly format, or both where applicable.

Methodology

Exploratory Analysis

  • Dashboard: The one-page dashboard on the first page provides general information on the tourism situation in Singapore. The left-hand side shows the average expenditure per capita (SGD) and total arrivals (persons) for all tourists in Singapore for the selected year (2007 to 2017). The right-hand side covers the individually selected country: the value boxes provide the average expenditure per capita (SGD) and total arrivals (persons) from that country in the selected year, and also show the months with the highest and lowest arrivals from that country in that year.
  • Cluster (k-means): We run k-means clustering on the arrivals and expenditures of 20 countries, and then group the countries to see which group is more profitable. The result is presented in a table at the bottom of the dashboard, which lists the other countries in the same cluster as the selected one, together with their expenditure and arrivals.
  • Slopegraphs: These illustrate how the expenditure of the 20 countries changed over the past 10 years. Users can adjust the time interval they wish to explore with the time slider, and the highlight-country selection helps them compare one country against the others.
  • Ternary Plot: A ternary plot is displayed for every year to show which mode of transport (sea, land or air) is more popular among tourists from different countries. With the animation button, users can see how the trend changed over the past ten years.
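
The clustering step in the list above could be sketched as follows. This is a minimal illustration in plain Python rather than the R packages the project uses. The country names and figures are hypothetical, and the farthest-point initialisation is an assumption made only to keep the example deterministic; it is not taken from the proposal.

```python
import math

def standardize(rows):
    """Scale each column to zero mean and unit variance so both measures count equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point start (an assumption)."""
    centres = [list(points[0])]
    while len(centres) < k:
        # Pick the point farthest from all centres chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centre.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Hypothetical data: (arrivals in thousands, expenditure per capita in SGD).
data = {
    "Country A": (3200, 1100),
    "Country B": (3000, 1250),
    "Country C": (450, 2900),
    "Country D": (500, 3100),
    "Country E": (900, 800),
    "Country F": (1000, 700),
}

labels = kmeans(standardize(list(data.values())), k=3)
for country, label in zip(data, labels):
    print(country, "-> cluster", label)
```

Standardising first matters here because arrivals and expenditure are on different scales; without it, the measure with the larger numbers would dominate the distance calculation.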

Explanatory Analysis

We will explore the different trends in the time-series data provided by the various tourism datasets (periodicity, cyclicity and seasonality). Different interactions among the identified attributes might provide insights that we can use for our analysis. We will visualize the time series in the following ways:

  • Time series line charts for TOTAL, AIR, LAND and SEA separately: each line chart may show a different pattern for the selected country.
  • Decomposition of the time series into its constituent parts: Observation, Seasonal, Trend and Random (Noise). From the separate parts, users can understand the different time-series patterns.
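
The decomposition into trend, seasonal and random parts could be sketched as below. This is a classical additive decomposition written in plain Python for illustration. The monthly series is synthetic, and the additive form and an even period of 12 are assumptions; the proposal does not state which decomposition method the application uses.

```python
def decompose_additive(series, period=12):
    """Classical additive decomposition: observed = trend + seasonal + remainder.

    Assumes an even period and at least two full periods of data.
    """
    n = len(series)
    half = period // 2
    # Trend: centred 2 x period moving average (end points get half weight).
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    # Seasonal: average the detrended values for each position in the cycle,
    # then centre the pattern so it sums to zero over one period.
    detrended = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            detrended[t % period].append(series[t] - trend[t])
    raw = [sum(v) / len(v) for v in detrended]
    centre = sum(raw) / period
    pattern = [r - centre for r in raw]
    seasonal = [pattern[t % period] for t in range(n)]
    # Remainder: whatever is left after removing trend and seasonal parts.
    remainder = [series[t] - trend[t] - seasonal[t] if trend[t] is not None else None
                 for t in range(n)]
    return trend, seasonal, remainder

# Synthetic monthly arrivals: linear growth plus a fixed seasonal pattern.
true_pattern = [-30, -20, -10, 0, 10, 40, 50, 20, -10, -20, -30, 0]
series = [100 + 2 * t + true_pattern[t % 12] for t in range(36)]

trend, seasonal, remainder = decompose_additive(series)
print("trend at month 6:", trend[6])
print("seasonal pattern:", [round(s, 2) for s in seasonal[:12]])
```

The trend is undefined for the first and last six months because the moving average needs a full window on both sides.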


Forecasting Analysis

We import the last decade of tourism arrival data, from 2007 onwards. Our models are trained on the data from 2007 to 2016, and the full year of 2017 is held out for testing. Since the decomposition shows the full picture (seasonality and trend) of each country's time series, we would like to use forecasting techniques such as ETS and ARIMA. The standard error and other accuracy statistics, such as MASE, can then be used to validate the forecasting models and help choose the best one.
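
The hold-out evaluation described above could look like the following sketch. It uses two simple baseline forecasters (seasonal naive, with and without drift) as stand-ins, not ETS or ARIMA, and a synthetic monthly series in place of the real data. MASE is scaled here by the in-sample seasonal naive error, which is one common convention for seasonal data and is an assumption about how the project computes it.

```python
def seasonal_naive(train, horizon, period=12):
    """Forecast by repeating the last observed season."""
    last_season = train[-period:]
    return [last_season[h % period] for h in range(horizon)]

def seasonal_naive_with_drift(train, horizon, period=12):
    """Seasonal naive plus the average year-on-year change seen in training."""
    drift = sum(train[t] - train[t - period]
                for t in range(period, len(train))) / (len(train) - period)
    base = seasonal_naive(train, horizon, period)
    return [b + drift * (h // period + 1) for h, b in enumerate(base)]

def mase(train, actual, forecast, period=12):
    """Mean absolute scaled error, scaled by the in-sample seasonal naive MAE."""
    scale = sum(abs(train[t] - train[t - period])
                for t in range(period, len(train))) / (len(train) - period)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale

# Synthetic monthly arrivals for 2007 to 2017 (132 months).
pattern = [-30, -20, -10, 0, 10, 40, 50, 20, -10, -20, -30, 0]
series = [100 + 2 * t + pattern[t % 12] for t in range(132)]

train, test = series[:120], series[120:]  # 2007-2016 for training, 2017 for testing

candidates = {
    "seasonal naive": seasonal_naive,
    "seasonal naive with drift": seasonal_naive_with_drift,
}
scores = {name: mase(train, test, model(train, len(test)))
          for name, model in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print("best model:", best)
```

A MASE below 1 means the candidate beats the seasonal naive benchmark on the held-out year; the same comparison would apply to ETS and ARIMA fits.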

Application Libraries & Packages

Package name and description:

  • cluster: Generates hierarchical and k-means clusters for every year on the 20 countries, based on tourist arrivals and expenditures.
  • ggplot2: A system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
  • tidyverse: A collection of R packages that share common principles and are designed to work together seamlessly, such as readr (reads flat files such as csv, tsv and fwf into R), ggplot2 and dplyr.
  • plotly: For plotting the interactive time series line charts and ternary plot.
  • broom: Use tidy() to construct a data frame that summarizes a model's statistical findings, for example to extract forecasting results as a data frame.
  • ggrepel: Use geom_text_repel() to avoid overlapping labels.
  • forecast: Methods and tools for displaying and analysing univariate time series forecasts, including exponential smoothing via state space models and automatic ARIMA modelling. Use ggsubseriesplot() to draw the cycle plot, and ets() and auto.arima() for better parameter selection.
  • reshape2: Use melt() to reshape a table into the proper format.
  • DT: Provides the function datatable() to display R data via the DataTables library, such as the cluster table and the forecast result table.
  • data.table: Fast data manipulation operations such as subset, group, update and join.
  • timetk: Use timetk::tk_tbl() to coerce time-series objects to tibbles.

References

The following also lists the referenced literature considered in our problem statement and methodology.

[1] Tay, F. (2018, February 12). Tourist arrivals, spending in Singapore hit record high for 2nd straight year; China top source of visitors.
https://www.straitstimes.com/singapore/tourist-spending-in-singapore-hit-record-268b-in-2017-china-top-source-of-visitors

[2] (2018, June 09). Newsletters Singapore Excels as MICE Destination.
https://www.stb.gov.sg/news-and-publications/newsletters/Pages/June 2015/Singapore-Excels-as-MICE-.aspx

[3] MPA 635: Data visualization.
https://datavizf17.classes.andrewheiss.com/class/05-class/

[4] Turner, P. (2012, November) The Comparative Economic Impact of Travel & Tourism WTTC
https://www.wttc.org//media/files/reports/benchmark%20reports/the_comparative_economic_impact_of_travel tourism.pdf

[5] Dalinina, R. (2017, January 10). Introduction to Forecasting with ARIMA in R.
https://www.datascience.com/blog/introduction-to-forecasting-with-arima-in-r-learn-data-science-tutorials

[6] Powell, C. (2018, June 22). Creating Slopegraphs with R.
https://datascienceplus.com/creating-slopegraphs-with-r/

[7] Tan, A. (2017, October 24). Singapore tourism doubled in 10 years, supports 164,000 jobs.
https://www.businesstimes.com.sg/government-economy/singapore-tourism-doubled-in-10-years-supports-164000-jobs-wttc

[8] Gabriel Martos. Cluster Analysis with R. 
https://rpubs.com/gabrielmartos/ Cluster Analysis

[9] Ternary Plots in R with Plotly. 
https://xang1234.github.io/ternary/