Difference between revisions of "Group 3 Overview"

From Visual Analytics and Applications
 
(25 intermediate revisions by the same user not shown)
Latest revision as of 12:21, 3 December 2017



Abstract

Geospatial analysis was originally developed for problems in the environmental and life sciences and has since been extended to almost all fields, including economics, defence, utilities, the social sciences and public safety. Geographically weighted regression (GWR) is an exploratory technique mainly intended to indicate where non-stationarity occurs on the map, that is, where the relationship between variables changes from place to place. It produces a set of location-based parameter estimates that can be mapped and analysed to show how the relationship between the explanatory variables and the response variable varies across space.

Our study uses economic data to explore district-level GDP in the eastern region of China. The project scope covers the analysis, modelling and visual representation of multivariate factors such as GDP, industry output, usual resident population, average wage, area, city construction rate, number of higher institutions, and teacher/student ratio, which contribute to economic development in each city of a province or municipality, with the assistance of interactive charts and graphs.



Motivation

Conventional linear regression models are commonly used to analyse economic problems and to identify the basic relationships among the factors that contribute to the economy. We believe that a geographically weighted regression model can also be built to perform spatial statistical analysis of economic issues. GWR allows local rather than global parameters to be estimated. The model is appropriate where a localised adjustment conveys a more meaningful message because the relationships vary across space. It uses a moving-window weighting mechanism in which a local model is calibrated at each target location, with nearby observations weighted more heavily than distant ones. Results are mapped in an interactive exploratory geo-application that takes the spatial heterogeneity of the data into account.
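
The moving-window weighting described above can be illustrated with a small sketch. This is not the project's implementation (the project relies on R packages such as spgwr); it is a minimal pure-Python illustration that assumes a Gaussian kernel, a fixed bandwidth and a single explanatory variable. The function names gaussian_weight, local_fit and gwr are hypothetical, and the coordinates and values are made up so that the western and eastern locations follow different slopes.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: the weight decays smoothly with distance from the target."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def local_fit(target, coords, x, y, bandwidth):
    """Weighted least squares for y = a + b * x, calibrated at one target location."""
    w = [gaussian_weight(math.dist(target, c), bandwidth) for c in coords]
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def gwr(coords, x, y, bandwidth):
    """Fit one local model per observation location (the 'moving window')."""
    return [local_fit(c, coords, x, y, bandwidth) for c in coords]


# Made-up data: three western and three eastern locations on a line.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
x = [1, 2, 3, 1, 2, 3]
y = [2, 4, 6, 5, 10, 15]  # y = 2x in the west, y = 5x in the east

local = gwr(coords, x, y, bandwidth=1.5)        # local parameter estimates
global_like = gwr(coords, x, y, bandwidth=1e9)  # every weight is effectively 1

for (cx, cy), (a, b) in zip(coords, local):
    print(f"location ({cx}, {cy}): intercept={a:.3f}, slope={b:.3f}")
```

With the small bandwidth, the western locations recover a slope of about 2 and the eastern locations a slope of about 5. With the very large bandwidth every weight approaches 1, so each local fit collapses to the single global regression line (slope 3.5 for this toy data), which hides the spatial variation.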

Objective

Our objective is to provide an interactive, exploratory geo-visual analytics tool that enables regional and urban planners and policy makers to visualize, analyse and model location-based data.

Data Source and Preparation

The following data source will be used for the project:
1. Economic data from the National Bureau of Statistics of China, 2015.

Variable, Full Name, Unit
City, City name, -
Province_Municipality, Province or municipality name, -
GDP_billion_b, GDP value, RMB (billion)
Primary_Industry_b, Primary industry output value, RMB (billion)
Secondary_Industry_b, Secondary industry output value, RMB (billion)
Teriary_Industry_b, Tertiary industry output value, RMB (billion)
Usual_Residence_k, No. of usual residents, thousand
Average_wage_RMB, Average wage, RMB
Area_sqkm, Total area, sq km
City_Construction_Rate, Rate of city construction, %
No_Higher_Institution, No. of higher institutions, count
Teacher/Student, Ratio of teachers to students, %


Approach

The main frameworks used for the application are R Shiny and Leaflet. Several R packages are used to build a smooth interactive interface for extracting, exploring and modelling the data. We implemented a GWR model and a parallel coordinates graph for spatial analysis, which enable us to study the influence of the various factors that contribute to local economic growth. Interactive graphs and charts have also been created to allow the user to apply different scenarios and explore the data in depth.
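
A parallel coordinates graph places variables measured in very different units (RMB billion, sq km, per cent) on adjacent vertical axes, so each variable is usually rescaled to a common range before the lines are drawn. The sketch below assumes min-max scaling to [0, 1] and uses three made-up rows. The function names and the values are illustrative only and are not taken from the project's data or code; only the column names come from the variable list above.

```python
def min_max_scale(values):
    """Rescale a column to [0, 1]; a constant column is placed mid-axis."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def to_polylines(table):
    """Turn a column-oriented table into one polyline per row.

    Each polyline is a list of (axis_index, scaled_value) points, one
    point per variable, in the order the columns were given.
    """
    columns = list(table)
    scaled = {name: min_max_scale(table[name]) for name in columns}
    n_rows = len(table[columns[0]])
    return [
        [(axis, scaled[name][row]) for axis, name in enumerate(columns)]
        for row in range(n_rows)
    ]


# Made-up values for three hypothetical cities.
table = {
    "GDP_billion_b": [100, 200, 300],
    "Average_wage_RMB": [50000, 60000, 70000],
    "Area_sqkm": [9000, 3000, 6000],
}

polylines = to_polylines(table)
for line in polylines:
    print(line)
```

Each polyline can then be drawn across the axes by the plotting layer. Scaling per column keeps a large-valued variable such as Average_wage_RMB from visually dominating a small-valued one.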

Selection of Tools

Since this course aims to develop a good command of the R language, we will use R to implement our visualization. More precisely, we will use R Markdown and relevant packages such as spgwr, pryr, GWmodel, tidyverse, rgdal, RColorBrewer, lubridate, ggplot2 and ggraph, together with JavaScript libraries such as d3.js and leaflet.js. For data preparation, we will also use JMP Pro and Tableau.

Challenge

1. Preparing a dataset that fulfils the minimum requirements of the model.
2. Making R graphs and maps interact within Shiny.
3. Building an effective and accurate GWR model in R.
4. Finding the most suitable tools and libraries to implement the visual features.

