Difference between revisions of "Group 10 Report"

From Visual Analytics and Applications

Revision as of 00:02, 4 December 2017



Motivation and Objective

Problem

Many analysis methods and algorithms are underused or poorly applied by their intended users. Results are often either poorly derived but well visualised, or accurately derived but poorly visualised. R has over 10,000 packages, many of which support visualisation for advanced analysis, yet there is a gap between what R makes possible and what is used day to day. The mainstream packages are few and cover only basic analysis and algorithms, while many datasets require advanced analysis to produce more accurate and dependable results.

Economic and macroeconomic data in particular offer considerable scope for statistical analysis and can give major indications about the economy and the markets it influences. Even where such analyses can be built, they are rarely available as accessible, usable applications. The people who lose out are economists and financiers, who lack access to applications that can quickly and easily analyse the required data with advanced models and efficient visualisation tools.

Picture101.png

Solution

We have brought together methods for analysing time series efficiently. The advanced time series analysis is based on state-of-the-art time series clustering, using the DTW (dynamic time warping) algorithm, and on forecasting with an ARIMA model. The results are visualised in several ways to make it easier to compare and understand the characteristics of the data. The R Shiny application lets the user choose among the different options that are part of the analysis; with a few clicks, the result changes according to the chosen algorithm.

The Housing Price Index (HPI) is a major economic indicator that reflects not only the housing market but also the wider economy. We analyse housing prices in 48 cities of China to understand their trends and to compare cities with one another based on a distance measure. The advanced time series analysis methods help in understanding how the cities respond and in carrying out a comparative study over a period.
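To make the DTW idea concrete, the following is a minimal sketch of the classic dynamic-programming formulation of the DTW distance. It is written in plain Python purely for illustration and is not the application's code, which relies on R packages. The function name `dtw_distance` and the index values are made up for this example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Made-up index values, not taken from the HPI data
city_a = [100, 101, 103, 106, 110]
city_b = [100, 100, 101, 103, 106, 110]  # same shape as city_a, delayed by one step
city_c = [100, 99, 97, 96, 95]           # falling trend

print(dtw_distance(city_a, city_b))
print(dtw_distance(city_a, city_c))
```

Because DTW allows one series to be stretched against the other, `city_a` and `city_b` are treated as identical in shape even though their lengths differ, while the falling `city_c` is far from both.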

Data Preparation and Packages Used

Data Design

The data is the Housing Price Index for the five years from 2010 to 2015. It was taken from the CEIC page and cleaned down to 48 major cities. Different calibrations of the data were prepared for use in the different analyses.

To build the grid in geofacet, the geographic location of each city was also required. The grid was custom-made by placing each city at its corresponding position on a map of China that is available on GitHub.
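A custom geofacet grid is essentially a table with one entry per city giving a code, a name, and a row and column position. The sketch below mimics that structure in plain Python and checks it for common mistakes. The cities, codes and positions are illustrative assumptions, not the project's actual grid, and `validate_grid` and `render_grid` are hypothetical helper names.

```python
# Hypothetical grid entries: cities, codes and positions are illustrative only
grid = [
    {"code": "BJ", "name": "Beijing", "row": 1, "col": 3},
    {"code": "SH", "name": "Shanghai", "row": 2, "col": 4},
    {"code": "CD", "name": "Chengdu", "row": 3, "col": 1},
    {"code": "GZ", "name": "Guangzhou", "row": 4, "col": 3},
]


def validate_grid(cells):
    """Return a list of problems: duplicate codes or two cities in one cell."""
    problems = []
    seen_codes = set()
    seen_cells = {}
    for cell in cells:
        if cell["code"] in seen_codes:
            problems.append(f"duplicate code {cell['code']}")
        seen_codes.add(cell["code"])
        pos = (cell["row"], cell["col"])
        if pos in seen_cells:
            problems.append(f"{cell['name']} overlaps {seen_cells[pos]} at {pos}")
        else:
            seen_cells[pos] = cell["name"]
    return problems


def render_grid(cells):
    """Draw the layout as text so the placement can be checked by eye."""
    by_pos = {(c["row"], c["col"]): c["code"] for c in cells}
    n_rows = max(c["row"] for c in cells)
    n_cols = max(c["col"] for c in cells)
    lines = []
    for r in range(1, n_rows + 1):
        lines.append(" ".join(by_pos.get((r, c), "..") for c in range(1, n_cols + 1)))
    return "\n".join(lines)


print(validate_grid(grid))
print(render_grid(grid))
```

Checking the layout as text before plotting makes it easier to spot a city placed in the wrong cell or two cities sharing one position.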

R Packages Used

DTWClust
Usage
Picture102.png
Recharts

Adjust

Timetk

In input control

Picture105.png
Corrplot

In

500px
Geofacet

g

500px

Analysis and Application

The analysis is presented in several ways. The distance measure, which is the fundamental algorithm, is shown using a correlation plot, and the clusters are formed based on this similarity. The results are mainly combined with geographic location, which allows better comparison with regard to each city's position in the country. It can then be assessed whether the region, or being in a particular location, influences the result. Therefore, recharts is used to show the clustering on a map of China, and the forecasts are shown over a map of China using geofacet.

Time Series Analysis

Users can choose the cities whose trends they want to see, and the chosen cities are then plotted. The time series trend plots are built with ggplot2. Users can view and compare two or more cities by selecting them, and so examine side by side how the trends developed at the selected locations.

Clustering

Initially, the distance measure between the cities is computed and shown in the correlation plot, where each city can be compared against every other city. The darker the shade of a box, the more strongly the two cities are correlated; boxes closer to dark red indicate that they are negatively correlated. The clustering is based on this distance measure. The clustering is presented according to the user's choices: the user inputs the number of clusters, the type of clustering and the distance measure behind the clustering, which allows a detailed analysis under the user's preferred configuration. The clusters are shown together with the trends of their underlying cities, and the application updates according to the user's selections. Apart from the charts, the clusters are also depicted on maps, with cities coloured by cluster. From the shades of the cities on the map, users can see where the clusters are mainly located and whether there are any outliers.
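The workflow described here (pairwise distances first, then a clustering with a user-chosen number of clusters) can be sketched as follows. This is an illustrative Python outline, not the application's R code. It uses average-linkage agglomerative clustering as one possible clustering type and Euclidean distance as a stand-in distance measure; the `dist` argument is where another measure such as DTW would be plugged in. All names and index values are made up.

```python
def euclidean(a, b):
    """Stand-in distance measure for equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def pairwise_distances(series, dist):
    """Return the series names and the full distance matrix between them."""
    names = list(series)
    dmat = [[dist(series[p], series[q]) for q in names] for p in names]
    return names, dmat


def agglomerative(dmat, k):
    """Average-linkage agglomerative clustering cut at k clusters."""
    clusters = [[i] for i in range(len(dmat))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dmat[p][q] for p in clusters[i] for q in clusters[j])
                link = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or link < best[0]:
                    best = (link, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    labels = [0] * len(dmat)
    for label, members in enumerate(clusters):
        for item in members:
            labels[item] = label
    return labels


# Made-up index values for four hypothetical cities
series = {
    "city_a": [100, 102, 104, 106],
    "city_b": [100, 103, 105, 108],
    "city_c": [100, 99, 97, 95],
    "city_d": [100, 98, 97, 94],
}

names, dmat = pairwise_distances(series, euclidean)
labels = agglomerative(dmat, k=2)  # k plays the role of the user's cluster count
print(dict(zip(names, labels)))
```

With two clusters requested, the two rising series end up in one cluster and the two falling series in the other. Changing `k` or passing a different `dist` corresponds to the configuration choices the user makes in the application.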

Forecasting

Forecasting is presented using the timetk and geofacet packages. The forecast time series are shown against the actual time series and are laid out over the map of China using the custom-made grid.
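As a rough illustration of comparing a forecast against actual values, the sketch below fits a very small ARIMA(1,1,0)-style model with drift: the series is differenced once, an AR(1) coefficient is fitted to the differences by ordinary least squares, and the forecasts are integrated back to the original level. This is an assumption-laden simplification in plain Python; it is not the fitting procedure used by the application's R packages, and the data are made up.

```python
def fit_ar1_on_differences(series):
    """Least-squares fit of d[t] = c + phi * d[t-1] on first differences d."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    d = [series[i] - series[i - 1] for i in range(1, len(series))]
    x, y = d[:-1], d[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx else 0.0  # constant differences: no AR term needed
    c = mean_y - phi * mean_x
    return c, phi, d[-1]


def forecast(series, horizon):
    """Forecast `horizon` steps ahead by integrating predicted differences."""
    c, phi, last_diff = fit_ar1_on_differences(series)
    level = series[-1]
    out = []
    for _ in range(horizon):
        last_diff = c + phi * last_diff
        level = level + last_diff
        out.append(level)
    return out


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up index values: hold out the last two points and compare
history = [100.0, 100.8, 101.9, 103.4, 105.2, 107.5, 110.1, 113.0]
train, actual = history[:-2], history[-2:]
predicted = forecast(train, horizon=len(actual))
print(predicted, actual, mean_absolute_error(actual, predicted))
```

Holding out the most recent observations and measuring the error against them gives a simple numeric counterpart to the forecast-versus-actual view shown in the application.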

Future Scope

In future, the application could recommend the number of clusters based on cluster validity indices (CVI) and the best methodology based on compactness and separation of clusters. Currently it is used only for HPI, but other data could be added to expand the property market analysis, and the data could equally be any other time series. The algorithm is also used in machine learning, so any time-bound data can be fed into the system for analysis. The scope could also be expanded to larger regions, such as provinces, countries or even continents.
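One way such a recommendation could work is sketched below using the silhouette width, a common cluster validity index that combines compactness (mean distance to the own cluster) and separation (mean distance to the nearest other cluster). This is an illustrative Python sketch under that assumption; the report does not say which index would be used, and the points and labelings are made up.

```python
def silhouette(dmat, labels):
    """Mean silhouette width from a distance matrix and cluster labels."""
    groups = set(labels)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(labels)
    scores = []
    for i in range(n):
        same = [dmat[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(same) / len(same)  # compactness: mean distance within own cluster
        b = min(                   # separation: mean distance to nearest other cluster
            sum(dmat[i][j] for j in range(n) if labels[j] == g) / labels.count(g)
            for g in groups
            if g != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / n


# Made-up example: four points on a line, two obvious groups
points = [0, 1, 10, 11]
dmat = [[abs(p - q) for q in points] for p in points]

candidates = {
    "natural split": [0, 0, 1, 1],
    "mixed split": [0, 1, 0, 1],
}
scores = {name: silhouette(dmat, lab) for name, lab in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Scoring each candidate clustering (for example, one per number of clusters) and picking the highest index value is the kind of recommendation step described above.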