Difference between revisions of "Group10 Overview"

From Visual Analytics and Applications
 
(13 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
<!-- BANNER -->
 
<!-- BANNER -->
[[Image:Banner.jpg|1000px|right|width="100%"]]
+
[[Image:Banner.png|1000px|centre|width="100%"]]
 
<!--MAIN HEADER -->
 
<!--MAIN HEADER -->
 
{|style="background-color:#00FFFF;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
 
{|style="background-color:#00FFFF;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0"  |
Line 22: Line 22:
  
 
== Background ==
 
== Background ==
 +
There are over 10,000 packages in R that support many kinds of economic and financial analysis. Many analysis methods and algorithms fail to be utilised or optimised by users: they are either poorly derived with great visualization or accurately derived with poor visualization. One such analysis is time series analysis, so we have taken up the Housing Price Index of the China housing market over 5 years. We analyse the housing price data over those 5 years using state-of-the-art time series clustering, which allows better grouping of cities. The analysis is also presented in an efficient way.
 +
== Time Series Analysis ==
 +
Time series analysis is about analyzing time series data to understand its characteristics and derive conclusions based on statistical results from the data. The methods we use are clustering and forecasting.
 +
=== Clustering ===
 +
Clustering is the grouping of similar variables. Time series clustering groups variables that behave similarly over a period of time. Unlike most time series clustering, which uses the Euclidean distance, the algorithm behind our time series clustering is dynamic time warping (DTW), which is based on a distance measure between the series over a time period.
  
=== Time Series Analysis ===
+
=== Forecasting ===
 +
The forecasting is based on an ARIMA model that predicts the next two years; the predictions are compared with the actual results to validate the model. First we build the ARIMA forecasting model, then we convert it to “tidy” data frames with the sweep package, and finally we use a grid we built ourselves to visualize the trend and forecast for each city.
  
+
=== Case Application ===
 +
The Housing Price Index is a major macroeconomic factor. It reflects not just the housing market but also the economy as a whole. The housing prices of each city are analysed and compared across cities to support further analysis.
  
=== Clustering Analysis ===
+
== Data Preparation ==
 
+
The data come from the CEIC Data Housing Price Index of cities in China. A total of 48 cities are selected, covering first-tier, second-tier and third-tier cities. There are 3 datasets in all.
 
+
{| class="wikitable"
 
 
[[Image:Geoclustering.jpeg|500px|right|width="100%"]]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
== Motivation ==
 
 
 
The real-estate market is ever growing and has more and more stakeholders. We aim to build an app that makes analysis of housing price market data easy and effective in just a few clicks. Major stakeholders such as economists and agents can get a better understanding of the market using the different clustering methods, time series analysis and forecast analysis on a geographic map.
 
 
 
 
 
== Objectives ==
 
1. Efficient Interactive Dashboard
 
 
 
2. Geographic Understanding
 
 
 
 
 
 
 
== Data Source ==
 
 
 
CEIC Housing Price Index
 
== Methodology ==
 
 
 
=== Exploratory Analysis ===
 
 
 
We will explore the different trends of time-series data provided by the various economic data sets (Period cyclicity and seasonality). Different interactions of identified attributes might provide certain data insights that we can use for our analysis.
 
 
 
=== Explanatory Analysis ===
 
Relationships between our data will be explained based on our understanding of possible real-world events or causes. Using our CPI use-case as an example, the difference in CPI between the months of June and December can be explained as a result of the holiday seasons causing an increase of demand for clothing in December.
 
 
 
=== Predictive Analysis ===
 
We can use analytics techniques such as Exponential Smoothing and ARIMA to predict future trends of our time-series data, due to the data's cyclical and seasonal nature.
 
 
 
== Application ==
 
 
 
The proposed system would have three major functions:
 
 
 
'''Data Manipulation:'''
 
 
 
xxx
 
 
 
'''Data Exploration:'''
 
xxx
 
 
 
'''Forecasting:'''
 
xxx
 
 
 
== Application Libraries & Packages ==
 
{|class="wikitable"  
 
|-
 
! Package Name !! Descriptions
 
|-
 
| ''Shiny''  || Interactive web applications for data visualization
 
|-
 
| ''Tidyverse: tidyr, dplyr, ggplot2''  || Tidying and manipulating data for visualizing in ggplot2
 
|-
 
| ''Shinythemes''  || Provide consistent UI elements for aesthetics
 
|-
 
| ''forecast, broom, sweep''  || Packages used to "tidy" data models for easy forecasting. The forecast package uses ''ts'' objects that are difficult to manipulate. sw_sweep from the sweep package uses broom-style tidiers to extract model information into 'tidy' data frames. The sweep package also uses timekit at the back-end to maintain the original time series index throughout the whole process.
 
|-
 
| ''tibbletime''  || Time-based data subsetting
 
 
|-
 
|-
| ''lubridate''  || Easy manipulation of datetime data
+
! Name !! Description
 
|-
 
|-
| ''timetk''  || Extracting/checking of datetime index from ts objects
+
| property price index (2010=100)_New constructed || The property price index for 48 cities. The data are monthly, running from 2010 to 2015. The index uses 2010 as its base, with the 2010 price set to 100.
 
|-
 
|-
| ''stringr'' || String manipulation
+
| Geolocation || The longitude and latitude of the 48 cities.
|-
 
| ''DT''  || Sortable data table UI element for model accuracy measures
 
|-
 
| ''cowplot''  || Graph arrangement of ''ggplots'' in a single renderPlot function
 
|-
 
| ''shinycssloaders''  || Loading animation for large data loading and model training
 
 
|-
 
|-
 +
| Geofacet || The location information for the 48 cities, which will be used in the ‘Geofacet’ package.
 
|}
 
|}
 +
 +
== Application and Analysis == 
 +
The application allows the user to run the different time series clustering and forecasting analyses on the cities they wish to see.

Latest revision as of 15:54, 10 December 2017


Background

There are over 10,000 packages in R that support many kinds of economic and financial analysis. Many analysis methods and algorithms fail to be utilised or optimised by users: they are either poorly derived with great visualization or accurately derived with poor visualization. One such analysis is time series analysis, so we have taken up the Housing Price Index of the China housing market over 5 years. We analyse the housing price data over those 5 years using state-of-the-art time series clustering, which allows better grouping of cities. The analysis is also presented in an efficient way.

Time Series Analysis

Time series analysis is about analyzing time series data to understand its characteristics and derive conclusions based on statistical results from the data. The methods we use are clustering and forecasting.

Clustering

Clustering is the grouping of similar variables. Time series clustering groups variables that behave similarly over a period of time. Unlike most time series clustering, which uses the Euclidean distance, the algorithm behind our time series clustering is dynamic time warping (DTW), which is based on a distance measure between the series over a time period.
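To illustrate why DTW can group series that a Euclidean comparison keeps apart, here is a minimal sketch in Python. It is not the project's R code: the function names `euclidean_distance` and `dtw_distance`, the absolute-difference local cost, the absence of a warping window, and the two toy series are all assumptions made for illustration.

```python
import math


def euclidean_distance(a, b):
    """Point-by-point distance; both series must have the same length."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical index values: city_b follows the same path one step earlier.
city_a = [100, 100, 105, 110, 110, 110]
city_b = [100, 105, 110, 110, 110, 110]

print(euclidean_distance(city_a, city_b))
print(dtw_distance(city_a, city_b))
```

The two toy series have the same shape shifted by one period, so the Euclidean distance is positive while the DTW distance is zero. This is the sense in which DTW treats cities with similar but slightly lagged price movements as close.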

Forecasting

The forecasting is based on an ARIMA model that predicts the next two years; the predictions are compared with the actual results to validate the model. First we build the ARIMA forecasting model, then we convert it to “tidy” data frames with the sweep package, and finally we use a grid we built ourselves to visualize the trend and forecast for each city.
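The validation step described here (forecast a held-out period, then compare with the actual values) can be sketched as follows. This Python sketch is an assumption-laden illustration, not the project's R pipeline: it swaps in the simplest ARIMA variant, ARIMA(0,1,0) with drift (a random walk with drift), and the names `drift_forecast` and `mape` and the sample values are hypothetical.

```python
def drift_forecast(train, horizon):
    """Forecast with ARIMA(0,1,0) plus drift.

    The drift is the mean of the first differences, which equals
    (last - first) / (n - 1); each step ahead adds one more drift.
    """
    if len(train) < 2:
        raise ValueError("need at least two observations")
    drift = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + drift * h for h in range(1, horizon + 1)]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical price index values; the last two points are held out.
series = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 107.0, 109.0]
train, holdout = series[:6], series[6:]

forecast = drift_forecast(train, horizon=len(holdout))
error = mape(holdout, forecast)
print(forecast, round(error, 2))
```

A full ARIMA fit would also estimate autoregressive and moving-average terms; the holdout comparison itself stays the same whichever model produces the forecast.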

Case Application

The Housing Price Index is a major macroeconomic factor. It reflects not just the housing market but also the economy as a whole. The housing prices of each city are analysed and compared across cities to support further analysis.

Data Preparation

The data come from the CEIC Data Housing Price Index of cities in China. A total of 48 cities are selected, covering first-tier, second-tier and third-tier cities. There are 3 datasets in all.

{| class="wikitable"
|-
! Name !! Description
|-
| property price index (2010=100)_New constructed || The property price index for 48 cities. The data are monthly, running from 2010 to 2015. The index uses 2010 as its base, with the 2010 price set to 100.
|-
| Geolocation || The longitude and latitude of the 48 cities.
|-
| Geofacet || The location information for the 48 cities, which will be used in the ‘Geofacet’ package.
|}
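The "2010=100" convention means every value is expressed relative to the base period. A minimal Python sketch of that rebasing is given below. How CEIC actually defines the 2010 base (for example, an average over the months of 2010) is not stated here, so averaging over the chosen base positions is an assumption, as are the function name `rebase_index` and the sample prices.

```python
def rebase_index(values, base_positions):
    """Scale a series so the mean over the base positions equals 100."""
    if not base_positions:
        raise ValueError("need at least one base position")
    base = sum(values[i] for i in base_positions) / len(base_positions)
    return [100.0 * v / base for v in values]


# Hypothetical prices; the first two observations form the base period.
prices = [8000.0, 8200.0, 8400.0, 9000.0]
index = rebase_index(prices, base_positions=[0, 1])
print([round(v, 1) for v in index])
```

An index value of 110 then reads directly as "10% above the base-period level", which makes cities with very different absolute price levels comparable.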

Application and Analysis

The application allows the user to run the different time series clustering and forecasting analyses on the cities they wish to see.