Difference between revisions of "Group03 Proposal"

From Visual Analytics and Applications

Revision as of 22:57, 27 November 2018

AN INTERACTIVE VISUALISATION PLATFORM ON VISITORS PROFILING

Background

The crown jewel of the Formula One race circuit, backdrop of the successful Hollywood film “Crazy Rich Asians” and host of the memorable North Korea-United States Summit, Singapore has positioned herself as a neutral yet vibrant destination, drawing hordes of visitors to her sunny shores. It is no surprise that the tourism sector has been developing into a growth engine for Singapore’s economy. In 2017, Singapore’s tourism sector attained record highs in both tourist arrivals and spending. According to data released by the Singapore Tourism Board, arrivals increased by 6.2 per cent to 17.4 million, while tourism receipts increased by 3.9 per cent to S$26.8 billion. The increasing affordability of travel, with the prevalence of low-cost carriers globally, has contributed to this upward trend.

Beyond tourism, Singapore is also an ideal venue for the conduct of business. Singapore has consistently been ranked among the top Asian cities, if not the top, for hosting Meetings, Incentive Travel, Conventions & Exhibitions (MICE) events. Its prime geographical location and stable political climate are the two main reasons it is a leading destination for international MICE events. In 2017, a total of 935 international meetings took place in Singapore.

[Images: Formula One Singapore race; Crazy Rich Asians; Kim-Trump Summit]


Past Work Reviews

WORLD BANK TOURISM DATA


[Figure: World Bank chart for indicator ST.INT.TRNR.CD, 1995 to 2016 (data.worldbank.org)]

The World Bank presents how tourism has evolved over time, both in aggregate and across individual nations. The available measures are:

  • International tourism, number of arrivals
  • International tourism, number of departures
  • International tourism, receipts (US$)
  • International tourism, receipts (% of total exports)
  • International tourism, receipts for passenger transport items (current US$)
  • International tourism, receipts for travel items (current US$)
  • International tourism, expenditures for travel items (current US$)
  • International tourism, expenditures (current US$)

The World Bank presents each of the listed measures as a line graph. Such visual representations do not provide clear or deep insight into the origin of travellers. Moreover, the measures are not clearly explained, and the use of expenditure versus receipts may confuse users who are not familiar with the industry.


STB STATISTICS & INSIGHTS

[Figure: STB chart from www.stb.gov.sg/statistics-and-market-insights]

The chart from STB combines a line graph and a bar graph, which could be misleading because the zero baselines are at different heights. In addition, the use of two y-axes with different units of measurement is likely to confuse readers. We therefore aim to present our results clearly and concisely, avoiding these pitfalls.

Motivation & Objectives

During our exploratory analysis of the data on tourist arrivals into Singapore, we noticed that the arrival patterns of tourists and business travellers from different countries are heterogeneous. The analyses available from The World Bank and the Singapore Tourism Board provide only a macro view of overall tourism activity. As such, our team aims to address this gap by shifting the analysis to the country level. A keen understanding of each country's unique travel behaviours can reveal its visitors' travel preferences, which is essential for local businesses devising plans to attract more tourism receipts and boost their revenue. Analysts who can work with the data and turn the insights into actionable business decisions will help their businesses flourish. Beyond analysis, we also aim to provide a forecast of visitors' future travel and expenditure patterns. This will allow local businesses to be better prepared to capture tourism dollars in the next few years.

With the recent completion of Marina Cruise Centre and ongoing construction of Jewel Changi Airport, the tourism receipts are expected to continue to grow steadily for the next decade, barring any black swans.

Through this project, we hope that businesses in the tourism industry, especially small and medium enterprises, can make better marketing and business decisions. We aim to create a platform that helps business owners and analysts draw useful insights from the relationship between tourism revenue and expenditure, in order to promote economic growth.

  • The platform gives an overview of visitor arrival patterns by country, age and mode of transport.
  • It provides a geographic map to illustrate visitor density across different countries.
  • It supports tourism demand forecasting.

Data Quality & Quantity

The Singapore tourism sector data is extracted from the CEIC database.
[Figure: Data selection]


Although many datasets are available from CEIC, not all are applicable, given that we want to provide a holistic overview of each country's tourism attributes. From the list, we have therefore selected the following:

  • Arrival by country: The dataset consists of 47 countries.
  • Arrival by transport: We will illustrate the mode of arrival of these tourists.
  • Tourism revenue and expenditure: The dataset consists of 20 countries.


The selection has been narrowed down to these 3 datasets because they are reported on a per-country basis. The other datasets, such as "length of stay", "visitors by age" and "hotel room occupancy rates", portray overall tourist activity and are not country-specific. We are therefore unable to use them to meet our objectives.


For our system analysis, we will be using filtered data from 2007 onwards.
The datasets have been reconstructed into a monthly or yearly format, or both where applicable.

Methodology

Exploratory Analysis

We will explore the trends in the time-series data provided by the various tourism datasets, in particular cyclicity and seasonality. Interactions between the identified attributes may yield insights that we can use in our analysis. We will visualise the time series in the following ways:

  • Heatmap: Displays the density of visitor arrivals for a selected calendar month, so that seasonal patterns stand out across years.
  • Slopegraph: This visualisation technique provides maximum information with “minimum ink”. It helps us detect how the number of visitors has changed over the years.
  • Waterfall plot: Rather than the values themselves, a waterfall plot brings out the changes in the values. It provides an overview of the time-series line chart along with how large the difference is between two consecutive data points.
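
As a rough illustration of the data preparation behind these views, the sketch below pivots monthly figures into a year-by-month grid (the layout a calendar heatmap needs) and computes period-to-period changes (the quantity a waterfall plot emphasises). It is written in Python for brevity, whereas the project itself plans to use R packages; the function names and the sample figures are hypothetical and are not taken from the CEIC data.

```python
from collections import defaultdict


def month_year_grid(records):
    """Pivot (year, month, value) records into {year: [12 monthly values]}.

    Months with no record are left as None.
    """
    grid = defaultdict(lambda: [None] * 12)
    for year, month, value in records:
        grid[year][month - 1] = value
    return dict(grid)


def period_changes(values):
    """Return the change between each pair of consecutive observations."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


# Hypothetical arrival counts (in thousands), for illustration only.
sample_records = [(2017, 1, 120), (2017, 2, 95), (2018, 1, 130)]
grid = month_year_grid(sample_records)

yearly_totals = [100, 120, 115, 140]
changes = period_changes(yearly_totals)

print(grid[2017][:2])
print(changes)
```

Each row of the grid can be mapped to one row of heatmap cells, and the list of changes gives the bar heights for a waterfall view.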

Explanatory Analysis

  • Decompose the time-series information into its constituent parts: Observed, Seasonal, Trend and Random (noise). From the separate parts, users can understand the different time-series patterns and derive insights.
  • Our dataset has many variables (columns), so its dimensionality is too high for effective analysis and the curse of dimensionality may apply. For this reason, it is important to reduce dimensionality. One suitable approach is to use time-series representations, which reduce dimensionality, reduce noise and emphasise the main characteristics of each series. At this stage, we will cluster the time series to group countries with similar patterns.
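
The decomposition step can be sketched as a classical additive decomposition: a centred moving average estimates the trend, the average detrended value at each position in the cycle estimates the seasonal component, and whatever remains is the random part. The Python sketch below is only an illustration under that assumption; the proposal does not state which decomposition method will be used, and the function names and sample series are hypothetical.

```python
def centered_moving_average(series, period):
    """Estimate the trend with a centred moving average.

    For an even period, the two end points of the window get half weight
    so that the window stays centred. Positions without a full window
    are returned as None.
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def decompose_additive(series, period):
    """Split a series into trend, seasonal and random parts (additive model).

    Assumes the series covers at least two full cycles.
    """
    trend = centered_moving_average(series, period)

    # Average the detrended values at each position in the cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            sums[t % period] += y - tr
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]

    # Centre the seasonal pattern so that it sums to zero over one cycle.
    mean_raw = sum(raw) / period
    pattern = [r - mean_raw for r in raw]
    seasonal = [pattern[t % period] for t in range(len(series))]

    random_part = [
        None if tr is None else y - tr - s
        for y, tr, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, random_part


# Hypothetical quarterly series: linear trend plus a repeating pattern.
true_pattern = [3, -1, -4, 2]
series = [10 + 2 * t + true_pattern[t % 4] for t in range(12)]
trend, seasonal, random_part = decompose_additive(series, period=4)
print(seasonal[:4])
```

With monthly data the period would be 12, and the seasonal component then shows which calendar months sit consistently above or below the trend.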

Forecasting Analysis

Time series forecasting is the use of a model to predict future values based on previously observed values. In this case, we would like to use forecasting techniques such as ETS and ARIMA to perform the prediction. After the forecasting analysis, we will compare the predicted values against the actual tourism figures to assess the accuracy of our forecasts. The standard error and other accuracy measures, such as MASE, can then be used to further verify the forecasting models and help choose the best one.
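
To make the model comparison concrete, the sketch below computes MASE (mean absolute scaled error) for two simple candidate forecasts on a held-out period. MASE divides the forecast's mean absolute error by the in-sample mean absolute error of a naive forecast, so values below 1 indicate a forecast that beats the naive baseline on that scale. This is a Python illustration only: the naive forecast and simple exponential smoothing stand in for the ETS and ARIMA models the project would fit in R, and the function names, smoothing parameter and sample figures are hypothetical.

```python
def naive_forecast(train, horizon):
    """Repeat the last observed value for every future period."""
    return [train[-1]] * horizon


def ses_forecast(train, horizon, alpha=0.5):
    """Simple exponential smoothing with a fixed, assumed alpha.

    The level starts at the first observation and the forecast is flat.
    """
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def mase(train, actual, forecast, m=1):
    """Mean absolute scaled error.

    The scale is the in-sample mean absolute error of a naive forecast
    with lag m (m=1 for non-seasonal data, m=12 for monthly seasonality).
    """
    scale = sum(abs(train[i] - train[i - m]) for i in range(m, len(train)))
    scale /= len(train) - m
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Hypothetical arrival counts (in thousands): training and hold-out periods.
train = [10, 12, 11, 13, 15]
actual = [14, 13]

scores = {
    "naive": mase(train, actual, naive_forecast(train, len(actual))),
    "ses": mase(train, actual, ses_forecast(train, len(actual))),
}
best_model = min(scores, key=scores.get)
print(scores, best_model)
```

The same comparison extends to any number of candidate models: each one forecasts the hold-out period, and the model with the lowest MASE is preferred.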

Application Libraries & Packages

Package Name: Description
  • TSrepr: Methods for representations (i.e. dimensionality reduction, preprocessing, feature extraction) of time series to support more accurate and effective time series data mining. Non-data adaptive, data adaptive, model-based and data dictated (clipped) representation methods are implemented, along with min-max and z-score normalisations and forecasting accuracy measures.
  • ggplot2: A system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics and what graphical primitives to use, and it takes care of the details.
  • cluster: Methods for cluster analysis.
  • clusterCrit: Computes clustering validation indices. Its bestCriterion function returns the best index value according to a specified criterion.
  • ggmap: A package for spatial data visualisation. It can retrieve map tiles from various online sources (e.g. Google Maps) for use as layers within the ggplot2 plotting system.
  • slopegraph: Converts a data frame (containing a panel dataset, where rows are observations and columns are time periods) into an Edward Tufte-inspired "slopegraph" using either base or ggplot2 graphics.
  • forecast: Methods and tools for displaying and analysing univariate time series forecasts, including exponential smoothing via state space models and automatic ARIMA modelling.
  • tseries: Time series analysis tools. Used here for the Augmented Dickey-Fuller test of the null hypothesis that a series has a unit root.

References

The following also lists the referenced literature considered in our problem statement and methodology.

[1] Tay, F. (2018, February 12). Tourist arrivals, spending in Singapore hit record high for 2nd straight year; China top source of visitors.
https://www.straitstimes.com/singapore/tourist-spending-in-singapore-hit-record-268b-in-2017-china-top-source-of-visitors

[2] (2018, June 09). Newsletters Singapore Excels as MICE Destination.
https://www.stb.gov.sg/news-and-publications/newsletters/Pages/June 2015/Singapore-Excels-as-MICE-.aspx

[3] Laurinec, P. (2018, March 13). TSrepr use case - Clustering time series representations in R.
https://petolau.github.io/TSrepr-clustering-time-series-representations/

[4] Turner, P. (2012, November) The Comparative Economic Impact of Travel & Tourism WTTC
https://www.wttc.org//media/files/reports/benchmark%20reports/the_comparative_economic_impact_of_travel tourism.pdf

[5] Dalinina, R. (2017, January 10). Introduction to Forecasting with ARIMA in R.
https://www.datascience.com/blog/introduction-to-forecasting-with-arima-in-r-learn-data-science-tutorials

[6] Powell, C. (2018, June 22). Creating Slopegraphs with R.
https://datascienceplus.com/creating-slopegraphs-with-r/

[7] Tan, A. (2017, October 24). Singapore tourism doubled in 10 years, supports 164,000 jobs.
https://www.businesstimes.com.sg/government-economy/singapore-tourism-doubled-in-10-years-supports-164000-jobs-wttc