Difference between revisions of "Group 8 Overview"

From Visual Analytics and Applications
 
(49 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 +
<!-- BANNER -->
 +
[[Image:Group8ProjectBanner.png|1050px|right|width="100%"]]
 
<!--MAIN HEADER -->
 
<!--MAIN HEADER -->
 
{|style="background-color:#1B338F;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
 
{|style="background-color:#1B338F;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
| style="font-family:Century Gothic; font-size:110%; solid #000000; background:#000080; text-align:center;" width="25%" |  
+
| style="font-family:Century Gothic; font-size:110%; solid #000000; background:#2B3856; text-align:center;" width="25%" |  
 
;
 
;
[[Group_8_Overview| <font color="#FFFFFF">Overview</font>]]
+
[[Group_8_Overview| <font color="#FFFFFF">Proposal</font>]]
  
| style="font-family:Century Gothic; font-size:110%; solid #1B338F; background:#000080; text-align:center;" width="25%" |  
+
| style="font-family:Century Gothic; font-size:110%; solid #1B338F; background:#2B3856; text-align:center;" width="25%" |  
 
;
 
;
 
[[Group_8_Poster| <font color="#FFFFFF">Poster</font>]]
 
[[Group_8_Poster| <font color="#FFFFFF">Poster</font>]]
  
| style="font-family:Century Gothic; font-size:110%; solid #1B338F; background:#000080; text-align:center;" width="25%" |  
+
| style="font-family:Century Gothic; font-size:110%; solid #1B338F; background:#2B3856; text-align:center;" width="25%" |  
 
;
 
;
 
[[Group_8_Application| <font color="#FFFFFF">Application</font>]]
 
[[Group_8_Application| <font color="#FFFFFF">Application</font>]]
  
| style="font-family:Century Gothic; font-size:110%; solid #1B338F; background:#000080; text-align:center;" width="25%" |  
+
| style="font-family:Century Gothic; font-size:110%; solid #1B338F; background:#2B3856; text-align:center;" width="25%" |  
 
;
 
;
 
[[Group_8_Report| <font color="#FFFFFF">Report</font>]]
 
[[Group_8_Report| <font color="#FFFFFF">Report</font>]]
Line 20: Line 22:
  
 
== Background ==
 
== Background ==
===What is CPI ===
 
The Consumer Price Index (CPI), is a critical indicator to assess the consumer price inflation. To profile the weighted average price changes for households' cost of living, Singapore's CPI adopts a fixed basket of residents' commonly consumption goods and services. About 6,600 brands/varieties from 4,200 outlets are selected in the 2014-based CPI (the dataset for our study). In basket level, the composition of goods and services can be categorized into 10 major divisions, which are listed below:
 
Food, Clothing & Footwear, Housing & Utilities, Household Durables And Services, Health Care,
 
  
Transport, Communication, Recreation & Culture, Education, Miscellaneous Goods & Services
+
=== Time Series Explorer ===
  
To construct CPI, two main types of data are required - the price data of a sample of goods and services, and the weighting data to represent the shares of different divisions' expenditure. The price data is gathered through a combination of data collection modes, while the frequency of price collection depends on price behavior of the item. The weighting data is derived from the expenditure values collected in the Household Expenditure Survey,  the latest one came from HES 2012/13, updated to 2014 values by taking into account price changes between 2012/13 and 2014.
+
Time series data has always played an important role in understanding and evaluating past behaviours. The usage of time-series information has allowed companies and organizations to tune their operations instead of simply improving by trial and error. Time series data have also been used to evaluate whether an organization has grown throughout the years of operation. The analysis techniques for time-bound data help to describe and explore insights, while predicting the future outcomes such as revenues and profits.
  
As a Price index, CPI can also be affected by other types of costs in Singapore. Indicators like COE Bidding Price, Import and Export price index, Exchange Rates,  are assumed to have interesting relationships with CPI which worth further study.
+
This category of data analysis is not only constrained to the business world. A country's economic data is also a rich pool of information that can be analysed for insights. One of the important example measures that can allow us to get a glimpse would be the Consumer Price Index (CPI). Using this easily available measure, we can explore living standards of a country's citizens by understanding the factors affecting their day-to-day lives.
  
===Exchange Rate===
+
=== Use Case - Consumer Price Index ===
  
Singapore has adopted a market-oriented currency system since 1985. To ensure better control over economic measures such as inflation and import/export prices, the Monetary Authority of Singapore (MAS) has been set up to monitor the various exchange rates between the Singapore Dollar (SGD) and other important countries who are Singapore's trading partners.
+
The Consumer Price Index (CPI), is a critical indicator to assess a country's consumer price inflation. To profile the weighted average price changes for households' cost of living, Singapore's CPI adopts a fixed basket of residents' commonly consumption goods and services. About 6,600 brands/varieties from 4,200 outlets are selected in the 2014-based CPI (the dataset for our study). At basket level, the composition of goods and services can be categorized into 10 major divisions, which are listed below:
 +
Food, Clothing & Footwear, Housing & Utilities, Household Durables And Services, Health Care,
  
MAS allows the SGD to rise or fall against a secretive basket of currencies, only intervening when necessary to keep the particular exchange rate within an unspecified exchange rate policy band. The reason why there is sparse information about the basket of currencies and band value ranges is to deter any form of currency speculation. Based on the needs of the current appreciation or depreciation of the Singapore Dollar, MAS adjusts the slope, width and centre of this exchange rate policy band. The exchange rate that MAS targets is also trade-weighted, so the currencies of bigger trading partners of Singapore will bear more weight in the equation.
+
Transport, Communication, Recreation & Culture, Education, Miscellaneous Goods & Services
  
MAS policies and the floating exchange rates have great bearing upon Singapore's economic history and future, so it is beneficial to explore the deep repercussions that these changes bring to the country. Important information such as the Import/Export prices and Consumer Price Index can help to illustrate this complex economic relationship.
+
[[Image:SG_2014_BASED_CPI.png|500px|thumb|right|Figure 1: 2014 CPI Basket Items from Singstat<ref name="singstat">Department of Statistics Singapore. [http://www.singstat.gov.sg/educational-corner/faq-on-CPI], "FAQ on Consumer Price Index (CPI)", Retrieved on 30 November 2017</ref>]]
 +
To construct the CPI, two main types of data are required - the sample price data of the good or service, and the weighting data to represent the proportion of different categories' expenditure. The price data is gathered through a combination of data collection modes, while the frequency of price collection depends on the price behavior of the item. The weighting data is derived from the expenditure values collected in the Household Expenditure Survey,  the latest one came from HES 2012/13, updated to 2014 values by taking into account price changes between 2012/13 and 2014.
  
===Export & Import Index
+
As a Price index, CPI can also be affected by other types of costs in Singapore. Indicators like COE Bidding Price<ref name="COE">Yeap, R. [http://tralvex.com/pub/cars/coe.htm], "COE Prices", Retrieved on 30 November 2017</ref>, Import and Export price index<ref name="CEIC">CEICData.com [https://insights-ceicdata-com.libproxy.smu.edu.sg/Untitled-insight/views], "CEIC - A Euromoney Institutional Investor Company", Retrieved on 30 November 2017</ref>, Exchange Rates<ref name="MAS">Monetary Authority of Singapore. [https://secure.mas.gov.sg/msb/ExchangeRates.aspx], "Monetary Authority of Singapore", Retrieved on 30 November 2017</ref>, are assumed to have interesting relationships with CPI which are worthy of further study. These data sources are also acceptable within the proposed system design within certain format restrictions.
No matter how rich a country is, how small or big it is, it will never be totally independent from the rest and have everything it needs. As a strong advocate of free trade, Singapore has relatively few trade barriers. In 2016, Singapore is the 13th largest exporter ($353.3 billion), and 15th largest importer ($271.3 billion) in the world. It has strongly influenced the nation economy. Especially due to the geographical location and national conditions, most of the livelihood commodity in Singapore relies on import which affects civilian’s daily life.
 
  
== Motivation ==
 
  
With the background knowledge in mind, our team would like to create an exploratory model that showcases these complex relationships of Singapore's CPI, exchange rates, import and export pricing throughout the years (1990-2017) visualized on the world map. Time-series visualization techniques would also be adopted to look for hidden data trends for analysis.
 
  
There are several reasons why we found this project interesting:
 
  
* Closer to our daily standards of living, we wanted to understand what are the existing factors that can impact a country's CPI and subsequently its impact on its citizens.
 
* One of the potential factors that impact CPI, we wanted to look at how exchange rates work in Singapore's context and how they are used to Singapore's advantage when trading with the country's trading partners.
 
* Since Singapore is heavily dependent on its imported consumer goods, we also wanted to investigate how import and export pricing works as a whole, as well as its involvement as a moving part of a country's full economic system.
 
* As an overall conclusion, we know that the above economic indicators are highly related and would like to combine them to receive any data insights in terms of time period trends.
 
  
Even though Singapore is a relatively young country, it is able to provide rich data to help us explore these interesting observations.
 
  
== Objectives ==
 
1. Provide interactive platform to illustrate the relationship between Singapore's exchange rate information, Consumer Price Index and Import and Export markets.
 
  
2. Discover data insights using visualization and interactivity that cannot be easily represented using raw data.
 
  
3. Make use of freely available Singapore economic data to arouse the interests of potential viewers and increase their curiosity on our trading relationships with the country's important trading partners.
+
== Motivation ==
  
== Data Source ==
+
During our personal data analysis research and experiences, we discovered a lack of freely available analysis tools that can help us optimize the parameter settings of time-series models. The result is a large amount of time and effort utilized to enter in different combinations of parameters and waiting for the models to be trained. The time series system that we have in mind would need to help us estimate the model's accuracy rates automatically while we perform our data analyses, so we can simply choose the best models for comparison.
  
<CPI> <COE>
+
Our team would also like to create exploratory and predictive models, which showcase these complex time-related trends of Singapore's CPI throughout the years (1990-2017) for different categories. We would like to use time-series visualization techniques such as tables and line charts representing Trend, Seasonality and Random to investigate any insights and to display potential forecasts to the audience.
  
Singapore MAS Exchange Rate Data can be retrieved from Singapore MAS website, and showcases denominations of S$ per 100 of selected foreign currencies. The data set can be obtained in weekly and monthly format from the year 1988-2017.
+
We will apply system design principles to make the proposed system accept any form of generic time-series data. This is to allow flexibility of the system and expands its scope of usage.
  
Singapore import/export statistic dataset is downloaded from CEIC database and sourced by International Enterprise Singapore & Department of Statistics.
+
There are several reasons why we found this project interesting:
  
The dataset contains monthly import & export value by commodity section from 1964 to 2017. In order to align with other datasets used in this project, the data before 1990 will be excluded from the list.
+
* General exploration of time-series data. The project allows us to learn and re-learn time-series analysis techniques and concepts. This gives us the opportunity to let us put our theoretical knowledge into practical use.
 +
* Closer to our daily standards of living, we wanted to understand what are the current categories that make up a country's CPI index and subsequently its impact on its citizens.
 +
* We also wanted to explore whether different periods of time would indicate the different price index values in a cyclical manner. This would also allow us to understand the overall direction prices are taking, for every goods & services category in Singapore.
  
== Deliverables ==
+
Even though Singapore is a relatively young country, it is able to provide rich data to help us explore these interesting observations. This is also in part due to the government initiative of 'SmartNation.sg'.
  
=== Wiki Page ===
+
== Objectives ==
 +
1. Provide interactive platform to illustrate the trends and seasonalities within given time-series data (i.e. Singapore's CPI).
  
The project's wiki page will focus primarily on delivering the project's background, data processing, data analysis, project poster and the project final report in a concise and easy-to-understand manner to the viewer.
+
2. Discover data insights using visualization and interactivity that cannot be easily represented using raw data.
  
For the wiki-based final Visual Analytics Application report, it should contain the following sections:
+
3. Make use of freely available Singapore economic data to arouse the interests of potential viewers and increase their curiosity on the current state of Singapore's consumer goods and services.
  
* ''Motivation of the application''
+
== Data Source ==
* ''Review and critic on past works''
 
* ''Design framework'' - A detailed description of the design principles used and data visualization elements built
 
* ''Demonstration'' - Sample test cases
 
* ''Discussion points'':
 
# What has the audience learned from the project?
 
# What new insights or practices has the system enabled?
 
# A full blown user study is not expected, but informal observations of use that help evaluate the system are encouraged.
 
* ''Future Works'' - A description of how the system could be extended or refined.
 
* ''Installation guide'' - including hardware configuration and software integration.
 
* ''Sample Installation Guide''
 
* ''User Guide'' - Step-by-step guide on how to use the data visualization functions designed.
 
  
=== Poster ===
+
The Consumer Price Index (CPI) data is extracted from ''data.gov.sg<ref name="datagov">Government of Singapore. [https://data.gov.sg/dataset/consumer-price-index-monthly?view_id=0063aa5a-c5de-4c74-94be-b9ec443878be&resource_id=67d08d6b-2efa-4825-8bdb-667d23b7285e], Last Updated on 30 November 2017, Retrieved on 30 November 2017</ref>'' in a monthly format which reveals the figures from January 1961 to August 2017, while the index reference period is 2014. The data has an overall index representing changes in the price level of the whole basket with all items considered, and can also be drilled down to sub-indices and sub-sub-indices for different categories and sub-categories of goods and services. For our system analysis, we plan to use filtered data from 1990 onwards.
The project poster would provide an illustrated overview of the project.
 
  
Various sections would include:
+
== Methodology ==
  
* ''Background and problem statement''
+
==== Exploratory Analysis ====
* ''Motivation for the project''
 
* ''Project Approach''
 
* ''Project Results''
 
* ''Future Works''
 
  
The poster will be designed with learnt visualization techniques, coupled with aesthetics to deliver the correct data insights to the intended audience.
+
We will explore the different trends of time-series data provided by the various economic data sets (Period cyclicity and seasonality). Different interactions of identified attributes might provide certain data insights that we can use for our analysis.
  
Poster Dimensions:
+
==== Explanatory Analysis ====
 +
Relationships between our data will be explained based on our understanding of possible real-world events or causes. Using our CPI use-case as an example, the difference in CPI between the months of June and December can be explained as a result of the holiday seasons causing an increase of demand for clothing in December.
  
* ''Size = ISO A1 (594 × 841mm or 23.39 × 33.11inci)''
+
==== Predictive Analysis ====
* ''Resolution = 300dpi or above (high-resolution)''
+
We can use analytics techniques such as Exponential Smoothing and ARIMA to predict future trends of our time-series data, due to the data's cyclical and seasonal nature.
* ''File format = jpeg''
 
  
Poster will be uploaded to this wiki page and the project Dropbox before the poster presentation.
+
== Application ==
  
=== Presentation ===
+
The proposed system would have three major functions:
  
A poster presentation of the final project will be in the form of a poster session and live demo. A laptop will be set up near our team poster and be used to explain the project. As the venue will be open to any esteemed guests and the general public, a 5-10 minute oral explanation and demonstration will be given to all who are interested in the poster and visualization application.
+
'''Data Manipulation:'''
  
* Conference Venue: SIS SR B1-1 Basement 1
+
System users would be able to upload time-series data files within certain formats. The interface provided would allow some forms of data transformations such as the transposition of columns, the generation of an index column as a substitute for a missing datetime column, and the indication of missing time series periods. These functionalities are required because time series analysis needs data to be indexed by a form of datetime field. Metadata information of the uploaded dataset would be displayed for easy viewing and the uploaded dataset can also be previewed from the main panel.
* Poster Venue: SIS Faculty and Student Lounge Basement 1.
 
* Date/Time: 2nd December 2017 9:00am-6:00pm
 
  
=== Visual Analytics Artifact ===
+
'''Data Exploration:'''
  
Main packages used will be Tinyverse, ggplot2 and R Shiny.
+
The system should allow the functionality to decompose time-series information into its constituent parts: Observation, Seasonal, Trend, Random (Noise). From the separate parts, users can understand the different time-series patterns and derive insights. The system will also provide several filter functions such as denoting whether the data is additive or multiplicative trend, start and end dates of the data, and the frequency of the time-series periods.
  
Analysis will be provided in the form of an interactive visualization platform.
+
'''Forecasting:'''
  
Users will be able to make changes to various parameters to do self-exploratory analysis.
+
The system should also be able to forecast time-series data that have been filtered out from the Data Exploration feature. The forecasting techniques will utilize Exponential Smoothing and ARIMA techniques to perform predictions. An optimization algorithm will be used along with existing packages to find the best set of parameters and the top 3 models of each technique will be selected based on their AIC, BIC values. Once selected, the models can then be graphed on the page as a comparison.
  
Supplementary graphs and charts will be used to showcase the various differences and correlations of data.
+
== Application Libraries & Packages ==
 +
{|class="wikitable"
 +
|-
 +
! Package Name !! Descriptions
 +
|-
 +
| ''Shiny<ref name="shiny">RStudio.org. [https://shiny.rstudio.com/] "Interact. Analyze. Communicate.", Retrieved on 30 November 2017</ref>''  || Interactive web applications for data visualization
 +
|-
 +
| ''Tidyverse: tidyr, dplyr, ggplot2<ref name="tidyverse">tidyverse.org. [https://www.tidyverse.org] "R packages for data science", Retrieved on 30 November 2017</ref>''  || Tidying and manipulating data for visualizing in ggplot2
 +
|-
 +
| ''Shinythemes<ref name="shinythemes">Chang, W, RStudio, and etc. [https://cran.r-project.org/web/packages/shinythemes/index.html] "shinythemes: Themes for Shiny", Retrieved on 30 November 2017</ref>''  || Provide consistent UI elements for aesthetics
 +
|-
 +
| ''forecast<ref name="forecast">Hyndman, R, and etc. [https://cran.r-project.org/web/packages/forecast/index.html] "forecast: Forecasting Functions for Time Series and Linear Models", Retrieved on 30 November 2017</ref>, broom, sweep<ref name="timetk">www.business-science.io. [http://www.business-science.io/r-packages.html] "Open Source Software For Business & Financial Analysis", Retrieved on 30 November 2017</ref>''  || Packages used to "tidy" data models for easy forecasting. Forecast package uses ''ts'' objects that is difficult to manipulate. sw_sweep from the sweep package uses broom-style tidiers to extract model infomation into 'tidy' data frames. sweep package also uses timekit at the back-end to maintain the original time series index throughout the whole process.
 +
|-
 +
| ''tibbletime<ref name="tibbletime">Vaughan, D, and etc. [https://cran.r-project.org/web/packages/tibbletime/index.html] "tibbletime: Time Aware Tibbles", Retrieved on 30 November 2017</ref>''  || Time-based data subsetting
 +
|-
 +
| ''lubridate<ref name="lubridate">Spinu, V, and etc. [https://cran.r-project.org/web/packages/lubridate/index.html] "lubridate: Make Dealing with Dates a Little Easier", Retrieved on 30 November 2017</ref>''  || Easy manipulation of datetime data
 +
|-
 +
| ''timetk<ref name="timetk">www.business-science.io. [http://www.business-science.io/r-packages.html] "Open Source Software For Business & Financial Analysis", Retrieved on 30 November 2017</ref>''  || Extracting/checking of datetime index from ts objects
 +
|-
 +
| ''stringr<ref name="stringr">Wickham, H and RStudio. [https://cran.r-project.org/web/packages/stringr/index.html] "stringr: Simple, Consistent Wrappers for Common String Operations", Retrieved on 30 November 2017</ref>''  || String manipulation
 +
|-
 +
| ''DT<ref name="DT">Xie, Y, and etc. [https://cran.r-project.org/web/packages/DT/index.html] "DT: A Wrapper of the Javascript Library 'DataTables'", Retrieved on 30 November 2017</ref>''  || Sortable data table UI element for model accuracy measures
 +
|-
 +
| ''cowplot<ref name="cowplot">Wilke, C, and etc. [https://cran.r-project.org/web/packages/cowplot/index.html] "cowplot: Streamlined Plot Theme and Plot Annotations for 'ggplot2'", Retrieved on 30 November 2017</ref>''  || Graph arrangement of ''ggplots'' in a single renderPlot function
 +
|-
 +
| ''shinycssloaders<ref name="shinycssloaders">Sali, A, and etc. [https://cran.r-project.org/web/packages/shinycssloaders/index.html] "shinycssloaders: Add CSS Loading Animations to 'shiny' Outputs", Retrieved on 30 November 2017</ref>''  || Loading animation for large data loading and model training
 +
|-
 +
|}
  
 
== References ==
 
== References ==
1. https://secure.mas.gov.sg/msb/ExchangeRates.aspx
+
<references/>
 
 
2. https://insights-ceicdata-com.libproxy.smu.edu.sg/Untitled-insight/views
 

Latest revision as of 23:19, 30 November 2017



Background

Time Series Explorer

Time series data has always played an important role in understanding and evaluating past behaviours. Time-series information allows companies and organizations to tune their operations systematically instead of improving by trial and error, and to evaluate whether they have grown over their years of operation. Analysis techniques for time-bound data help to describe and explore insights, and to predict future outcomes such as revenues and profits.

This category of data analysis is not confined to the business world. A country's economic data is also a rich pool of information that can be analysed for insights. One important measure that offers such a glimpse is the Consumer Price Index (CPI). Using this readily available measure, we can explore the living standards of a country's citizens by understanding the factors affecting their day-to-day lives.

Use Case - Consumer Price Index

The Consumer Price Index (CPI) is a critical indicator for assessing a country's consumer price inflation. To profile the weighted average price changes in households' cost of living, Singapore's CPI adopts a fixed basket of goods and services commonly consumed by residents. About 6,600 brands/varieties from 4,200 outlets are selected in the 2014-based CPI (the dataset for our study). At basket level, the goods and services are categorized into 10 major divisions, listed below:

Food, Clothing & Footwear, Housing & Utilities, Household Durables And Services, Health Care, 
Transport, Communication, Recreation & Culture, Education, Miscellaneous Goods & Services
Figure 1: 2014 CPI Basket Items from Singstat[1]

To construct the CPI, two main types of data are required: sample price data for each good or service, and weighting data representing each category's share of expenditure. The price data is gathered through a combination of data collection modes, and the frequency of price collection depends on the price behavior of the item. The weighting data is derived from the expenditure values collected in the Household Expenditure Survey (HES). The latest weights came from HES 2012/13, updated to 2014 values by taking into account price changes between 2012/13 and 2014.
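The way price data and weighting data combine can be sketched as a weighted average of division-level indices. The application itself is planned in R; plain Python is used here only to keep the arithmetic visible. The function name, the three divisions chosen and all numbers are illustrative assumptions, not actual Singapore CPI weights or the official compilation method.

```python
def weighted_index(sub_indices, weights):
    """Combine division-level indices into one index using expenditure weights."""
    if sub_indices.keys() != weights.keys():
        raise ValueError("every division needs both an index and a weight")
    total_weight = sum(weights.values())
    # Each division contributes in proportion to its share of expenditure.
    return sum(sub_indices[d] * weights[d] for d in sub_indices) / total_weight


# Hypothetical indices (reference period = 100) and hypothetical weights.
sub_indices = {"Food": 102.0, "Transport": 98.0, "Housing & Utilities": 100.0}
weights = {"Food": 2, "Transport": 1, "Housing & Utilities": 1}

overall = weighted_index(sub_indices, weights)
print(overall)
```

Because Food carries twice the weight of the other two divisions in this made-up basket, its 2% rise outweighs the 2% fall in Transport and the overall index ends up above 100.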

As a price index, the CPI can also be affected by other types of costs in Singapore. Indicators such as the COE Bidding Price[2], the Import and Export price index[3] and Exchange Rates[4] are assumed to have interesting relationships with the CPI that are worthy of further study. The proposed system can also accept these data sources, subject to certain format restrictions.





Motivation

In our own data analysis research and experience, we found a lack of freely available analysis tools that help optimize the parameter settings of time-series models. As a result, a large amount of time and effort is spent entering different combinations of parameters and waiting for the models to be trained. The time series system that we have in mind would estimate each model's accuracy automatically while we perform our data analyses, so that we can simply choose the best models for comparison.

Our team would also like to create exploratory and predictive models that showcase the complex time-related trends of Singapore's CPI over the years (1990-2017) for different categories. We would like to use time-series visualization techniques, such as tables and line charts representing the Trend, Seasonality and Random components, to investigate insights and to display potential forecasts to the audience.

We will apply system design principles so that the proposed system accepts any form of generic time-series data. This keeps the system flexible and expands its scope of usage.

There are several reasons why we found this project interesting:

  • General exploration of time-series data. The project allows us to learn and re-learn time-series analysis techniques and concepts, and gives us the opportunity to put our theoretical knowledge into practical use.
  • Because the CPI is close to our daily standard of living, we wanted to understand the categories that currently make up a country's CPI and their subsequent impact on its citizens.
  • We also wanted to explore whether price index values vary cyclically across different periods of time. This would also allow us to understand the overall direction prices are taking for every goods & services category in Singapore.

Even though Singapore is a relatively young country, it is able to provide rich data to help us explore these interesting observations. This is also in part due to the government initiative of 'SmartNation.sg'.

Objectives

1. Provide an interactive platform to illustrate the trends and seasonalities within given time-series data (e.g. Singapore's CPI).

2. Discover data insights using visualization and interactivity that cannot be easily represented using raw data.

3. Make use of freely available Singapore economic data to arouse the interest of potential viewers and increase their curiosity about the current state of Singapore's consumer goods and services.

Data Source

The Consumer Price Index (CPI) data is extracted from data.gov.sg[5] in a monthly format covering January 1961 to August 2017, with 2014 as the index reference period. The data has an overall index representing changes in the price level of the whole basket with all items considered, and can also be drilled down to sub-indices and sub-sub-indices for different categories and sub-categories of goods and services. For our system analysis, we plan to use filtered data from 1990 onwards.

Methodology

Exploratory Analysis

We will explore the different trends in the time-series data provided by the various economic data sets (period cyclicity and seasonality). Interactions between identified attributes might provide data insights that we can use for our analysis.

Explanatory Analysis

Relationships between our data will be explained based on our understanding of possible real-world events or causes. Using our CPI use-case as an example, the difference in CPI between the months of June and December can be explained as a result of the holiday season causing an increase in demand for clothing in December.

Predictive Analysis

Because the data is cyclical and seasonal in nature, we can use analytics techniques such as Exponential Smoothing and ARIMA to predict future trends in our time-series data.

Application

The proposed system would have three major functions:

Data Manipulation:

System users would be able to upload time-series data files in certain formats. The interface would allow some data transformations, such as the transposition of columns, the generation of an index column as a substitute for a missing datetime column, and the indication of missing time-series periods. These functions are required because time series analysis needs data to be indexed by some form of datetime field. Metadata of the uploaded dataset would be displayed for easy viewing, and the uploaded dataset can also be previewed from the main panel.
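Two of the transformations described above, flagging missing periods and generating a substitute index column, can be sketched as follows. The application itself is planned in R; this is a stdlib-only Python illustration that assumes monthly data. All function names and sample values are hypothetical.

```python
from datetime import date


def month_range(start, end):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def missing_months(observed):
    """Return the months absent between the earliest and latest observation."""
    present = {date(d.year, d.month, 1) for d in observed}
    return [m for m in month_range(min(present), max(present)) if m not in present]


def add_index_column(values):
    """Attach a 1-based sequential index when no datetime column exists."""
    return [(i, v) for i, v in enumerate(values, start=1)]


# Hypothetical upload with March 2017 missing.
observed = [date(2017, 1, 1), date(2017, 2, 1), date(2017, 4, 1), date(2017, 5, 1)]
gaps = missing_months(observed)

# Hypothetical upload with values but no datetime column.
indexed = add_index_column([99.1, 99.4, 99.8])
```

A sequential index preserves the ordering of observations but carries no calendar meaning, so seasonal analysis would still need the user to state the frequency of the series.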

Data Exploration:

The system should be able to decompose time-series information into its constituent parts: Observation, Seasonal, Trend and Random (Noise). From the separate parts, users can understand the different time-series patterns and derive insights. The system will also provide several filter options, such as whether the decomposition is additive or multiplicative, the start and end dates of the data, and the frequency of the time-series periods.
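As a rough illustration of the additive case, the sketch below performs a classical decomposition: a centred moving average estimates the trend, the average detrended value at each position in the cycle estimates the seasonal component, and what is left over is the random component. This is a stdlib-only Python sketch rather than the R routines the application would rely on; the function name and the synthetic series are assumptions.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal and remainder lists (additive model)."""
    n = len(series)
    if period < 2 or n < 2 * period:
        raise ValueError("need a period of at least 2 and two full cycles of data")
    half = period // 2

    # Trend: centred moving average; None where the window does not fit.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: the two end points get half weight (2 x m average).
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period

    # Seasonal: mean detrended value per cycle position, centred to sum to zero.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    mean_raw = sum(raw) / period
    pattern_est = [r - mean_raw for r in raw]
    seasonal = [pattern_est[t % period] for t in range(n)]

    # Remainder: whatever trend and seasonal do not explain.
    remainder = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Synthetic quarterly series: linear trend plus a repeating pattern, no noise.
pattern = [1.0, -1.0, 2.0, -2.0]
series = [10.0 + 0.5 * t + pattern[t % 4] for t in range(12)]
trend, seasonal, remainder = decompose_additive(series, period=4)
```

For a multiplicative series the same steps apply with division in place of subtraction, which is why the choice between the two forms is exposed as a user option.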

Forecasting:

The system should also be able to forecast the time-series data that has been filtered in the Data Exploration feature. Forecasting will use Exponential Smoothing and ARIMA techniques. An optimization algorithm will be used along with existing packages to find the best set of parameters, and the top 3 models of each technique will be selected based on their AIC and BIC values. Once selected, the models can then be graphed on the page for comparison.
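The parameter search and ranking step can be sketched with a simple grid search. This is not the optimization algorithm of the proposed system: it is a stdlib-only Python illustration that swaps in two exponential smoothing variants (simple exponential smoothing and Holt's linear method), leaves ARIMA and BIC out, and uses a Gaussian AIC computed from one-step-ahead errors up to an additive constant. The function names, the grid and the sample series are assumptions.

```python
import math
from itertools import product


def one_step_sse(series, alpha, beta=None):
    """Sum of squared one-step-ahead errors for SES (beta=None) or Holt's method."""
    level = series[0]
    slope = 0.0 if beta is None else series[1] - series[0]
    sse = 0.0
    for y in series[1:]:
        forecast = level + slope
        sse += (y - forecast) ** 2
        new_level = alpha * y + (1 - alpha) * forecast
        if beta is not None:
            slope = beta * (new_level - level) + (1 - beta) * slope
        level = new_level
    return sse


def aic(sse, n_obs, n_params):
    """Gaussian AIC up to a constant; the floor avoids log(0) on a perfect fit."""
    return n_obs * math.log(max(sse, 1e-12) / n_obs) + 2 * n_params


def top_models(series, grid, keep=3):
    """Return the `keep` lowest-AIC parameter settings for each technique."""
    n_obs = len(series) - 1  # number of one-step-ahead errors
    ses = [
        {"alpha": a, "aic": aic(one_step_sse(series, a), n_obs, 1)}
        for a in grid
    ]
    holt = [
        {"alpha": a, "beta": b, "aic": aic(one_step_sse(series, a, b), n_obs, 2)}
        for a, b in product(grid, grid)
    ]

    def by_aic(model):
        return model["aic"]

    return {
        "SES": sorted(ses, key=by_aic)[:keep],
        "Holt": sorted(holt, key=by_aic)[:keep],
    }


# Made-up index values, not real CPI data.
series = [99.0, 99.4, 99.2, 99.9, 100.3, 100.1, 100.8, 101.2, 101.0, 101.7]
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
best = top_models(series, grid)
```

Within one technique every candidate has the same number of parameters, so the AIC ranking there matches a ranking by squared error; the penalty term only matters when comparing across techniques.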

Application Libraries & Packages

Package Name Descriptions
Shiny[6] Interactive web applications for data visualization
Tidyverse: tidyr, dplyr, ggplot2[7] Tidying and manipulating data for visualizing in ggplot2
Shinythemes[8] Provide consistent UI elements for aesthetics
forecast[9], broom, sweep[10] Packages used to "tidy" data models for easy forecasting. The forecast package uses ts objects, which are difficult to manipulate. sw_sweep from the sweep package uses broom-style tidiers to extract model information into 'tidy' data frames. The sweep package also uses timetk at the back-end to maintain the original time series index throughout the whole process.
tibbletime[11] Time-based data subsetting
lubridate[12] Easy manipulation of datetime data
timetk[10] Extracting/checking of datetime index from ts objects
stringr[13] String manipulation
DT[14] Sortable data table UI element for model accuracy measures
cowplot[15] Graph arrangement of ggplots in a single renderPlot function
shinycssloaders[16] Loading animation for large data loading and model training

References

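  1. Department of Statistics Singapore. "FAQ on Consumer Price Index (CPI)", http://www.singstat.gov.sg/educational-corner/faq-on-CPI, Retrieved on 30 November 2017
  2. Yeap, R. "COE Prices", http://tralvex.com/pub/cars/coe.htm, Retrieved on 30 November 2017
  3. CEICData.com. "CEIC - A Euromoney Institutional Investor Company", https://insights-ceicdata-com.libproxy.smu.edu.sg/Untitled-insight/views, Retrieved on 30 November 2017
  4. Monetary Authority of Singapore. "Monetary Authority of Singapore", https://secure.mas.gov.sg/msb/ExchangeRates.aspx, Retrieved on 30 November 2017
  5. Government of Singapore. https://data.gov.sg/dataset/consumer-price-index-monthly?view_id=0063aa5a-c5de-4c74-94be-b9ec443878be&resource_id=67d08d6b-2efa-4825-8bdb-667d23b7285e, Last Updated on 30 November 2017, Retrieved on 30 November 2017
  6. RStudio.org. "Interact. Analyze. Communicate.", https://shiny.rstudio.com/, Retrieved on 30 November 2017
  7. tidyverse.org. "R packages for data science", https://www.tidyverse.org, Retrieved on 30 November 2017
  8. Chang, W, RStudio, et al. "shinythemes: Themes for Shiny", https://cran.r-project.org/web/packages/shinythemes/index.html, Retrieved on 30 November 2017
  9. Hyndman, R, et al. "forecast: Forecasting Functions for Time Series and Linear Models", https://cran.r-project.org/web/packages/forecast/index.html, Retrieved on 30 November 2017
  10. www.business-science.io. "Open Source Software For Business & Financial Analysis", http://www.business-science.io/r-packages.html, Retrieved on 30 November 2017
  11. Vaughan, D, et al. "tibbletime: Time Aware Tibbles", https://cran.r-project.org/web/packages/tibbletime/index.html, Retrieved on 30 November 2017
  12. Spinu, V, et al. "lubridate: Make Dealing with Dates a Little Easier", https://cran.r-project.org/web/packages/lubridate/index.html, Retrieved on 30 November 2017
  13. Wickham, H and RStudio. "stringr: Simple, Consistent Wrappers for Common String Operations", https://cran.r-project.org/web/packages/stringr/index.html, Retrieved on 30 November 2017
  14. Xie, Y, et al. "DT: A Wrapper of the Javascript Library 'DataTables'", https://cran.r-project.org/web/packages/DT/index.html, Retrieved on 30 November 2017
  15. Wilke, C, et al. "cowplot: Streamlined Plot Theme and Plot Annotations for 'ggplot2'", https://cran.r-project.org/web/packages/cowplot/index.html, Retrieved on 30 November 2017
  16. Sali, A, et al. "shinycssloaders: Add CSS Loading Animations to 'shiny' Outputs", https://cran.r-project.org/web/packages/shinycssloaders/index.html, Retrieved on 30 November 2017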