Group09 Proposal
Background
A study by Emre Soyer and Robin Hogarth posed the same question about a dataset to three groups of economists. The group that was given only the graph answered the question better than the group that was given the dataset and a standard statistical analysis of the data. This result suggests that data visualization provides context and an accurate representation of the numbers, and helps users extract important information from data quickly and efficiently.
When we use a trading platform to invest, we have to work through a large amount of price data to reach a decision, including the opening price, the closing price, and the highest and lowest prices of the day. The K-line (candlestick) chart is the most popular visualization of stock data for investors. However, a new investor who is not yet experienced with financial data may be distracted by the many price series and be unable to make an appropriate financial decision.
Therefore, visualization of stock market data is useful for technical stock market analysis and helps investors gain a comprehensive understanding of how the stock market is changing, which leads to our analysis objective for this project.
Project Objective
This project will focus on visualizing stock market data from several perspectives.
First, we will apply visualization tools such as scatter plots and treemaps to derive useful and interesting insights about the IPO companies. We will also place the companies on a map and visualize their characteristics there, for example by showing companies by market share in different regions.
Second, to discover the trend for each issuer and find the best model to predict each issuer's future stock values, we will perform time-series analysis in R on a sequential dataset. Visualizing the time-series data also lets us make inferences about important components such as trend (a long-term increase or decrease) and seasonality (a pattern that repeats over a fixed period of time).
Data Source
The data set we are using consists of many attributes, but our analysis will mainly focus on the following ones.
| Company name | The name of the stock's issuer | 
| Stock Code | The stock symbol | 
| Market type | The market of the issuer (Growth market, Shanghai, Shenzhen) | 
| IPO date | The IPO date of the issuer | 
| Total capital stock | The number of common and preferred shares that a company is authorized to issue | 
| Region | The location of the issuer | 
| IPO price | The stock price on the IPO date | 
| Total No of shares issued | The total number of shares issued on the IPO date | 
| IPO value | Equals the company's IPO price * Total No of shares issued | 
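As a small worked example of the derived attribute, the sketch below computes IPO value in base R, assuming it is the IPO price multiplied by the number of shares issued on the IPO date. The company names, column names, and figures are hypothetical and are not taken from the project data set.

```r
# Hypothetical sample rows; names and figures are made up for illustration.
ipo <- data.frame(
  company       = c("A Corp", "B Corp", "C Corp"),
  ipo_price     = c(12.5, 8.0, 20.0),   # price per share on the IPO date
  shares_issued = c(4e7, 1e8, 2.5e7),   # shares issued on the IPO date
  stringsAsFactors = FALSE
)

# Assumed definition: IPO value = IPO price * number of shares issued
ipo$ipo_value <- ipo$ipo_price * ipo$shares_issued

print(ipo)
```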
Methodology
- Histogram:
A histogram is a graphical representation of the distribution of numeric data. The variable is cut into several bins, and the number of observations per bin is represented by the height of the bar. The shape of the histogram can change with the number of bins you set. We will visualize the distribution of the IPO value by market type (Growth market, Shanghai and Shenzhen), and users can interact with the histogram to customize their own visualization results.
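The effect of the bin count can be sketched in base R as follows. The data are simulated, and the variable names (ipo_value, market) are assumptions for illustration rather than columns of the actual data set.

```r
set.seed(1)
n <- 200

# Simulated, right-skewed IPO values and a market label per company (hypothetical)
ipo_value <- rlnorm(n, meanlog = 6, sdlog = 0.8)
market <- sample(c("Growth market", "Shanghai", "Shenzhen"), n, replace = TRUE)

# Two sets of equal-width bins over the same range: 5 bins and 30 bins
upper <- max(ipo_value) + 1
breaks_coarse <- seq(0, upper, length.out = 6)
breaks_fine   <- seq(0, upper, length.out = 31)

# plot = FALSE returns the bin counts without drawing anything
h_coarse <- hist(ipo_value, breaks = breaks_coarse, plot = FALSE)
h_fine   <- hist(ipo_value, breaks = breaks_fine, plot = FALSE)

# Same observations, different shapes: compare the bar heights
print(h_coarse$counts)
print(h_fine$counts)

# Bin counts per market type, using the same breaks so the groups are comparable
counts_by_market <- sapply(
  split(ipo_value, market),
  function(v) hist(v, breaks = breaks_coarse, plot = FALSE)$counts
)
print(counts_by_market)
```

Using the same breaks for every market keeps the bars aligned, so the distributions can be compared directly.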
- Treemap:
A treemap displays hierarchical data as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. A leaf node's rectangle has an area proportional to a specified dimension of the data. It is an alternative way of visualizing the structure of a tree diagram. We will use the treemap package in R to visualize the data.
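To illustrate how leaf areas end up proportional to a data dimension, here is a minimal base-R sketch of a two-level "slice" layout (markets as vertical strips, companies as horizontal strips inside each market). It is only an illustration of the idea: the helper slice_layout and the sample data are hypothetical, and the treemap package computes its own layout and draws the result itself.

```r
# Split a rectangle into strips whose areas are proportional to `size`.
slice_layout <- function(size, x0, y0, width, height, vertical = TRUE) {
  share  <- size / sum(size)
  offset <- c(0, cumsum(share)[-length(share)])  # start of each strip (fraction)
  if (vertical) {
    data.frame(x = x0 + width * offset, y = y0, w = width * share, h = height)
  } else {
    data.frame(x = x0, y = y0 + height * offset, w = width, h = height * share)
  }
}

# Hypothetical hierarchy: market -> company, sized by IPO value
ipo <- data.frame(
  market    = c("Shanghai", "Shanghai", "Shenzhen", "Shenzhen", "Shenzhen"),
  company   = c("A", "B", "C", "D", "E"),
  ipo_value = c(40, 20, 15, 15, 10),
  stringsAsFactors = FALSE
)

# Level 1: one rectangle per market inside the unit square
market_total <- tapply(ipo$ipo_value, ipo$market, sum)
branches <- slice_layout(as.numeric(market_total), 0, 0, 1, 1, vertical = TRUE)
branches$market <- names(market_total)

# Level 2: tile each market rectangle with its companies
leaves <- do.call(rbind, lapply(seq_len(nrow(branches)), function(i) {
  b    <- branches[i, ]
  rows <- ipo[ipo$market == b$market, ]
  cbind(rows, slice_layout(rows$ipo_value, b$x, b$y, b$w, b$h, vertical = FALSE))
}))

# Each leaf's area equals its share of the total IPO value
leaves$area <- leaves$w * leaves$h
print(leaves)
```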
- Scatter Plot:
A scatter plot displays the values of two variables on two dimensions. Each dot represents an observation, and its position on the X (horizontal) and Y (vertical) axes represents the values of the two variables. It is useful for studying the relationship between the two variables. It is common to provide even more information using colors or shapes (to show groups or a third variable). It is also possible to map another variable to the size of each dot, which produces a bubble plot. If you have many dots and struggle with overplotting, consider using a 2D density plot.
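A bubble-style scatter plot along these lines could be drawn with base R graphics as sketched below. The data are simulated and the column names are assumptions; color encodes the market type and dot size encodes the IPO value.

```r
set.seed(42)
n <- 60

# Simulated IPO records (hypothetical values and column names)
ipo <- data.frame(
  ipo_price     = runif(n, 5, 50),
  shares_issued = runif(n, 1e7, 1e8),
  market        = sample(c("Growth market", "Shanghai", "Shenzhen"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
ipo$ipo_value <- ipo$ipo_price * ipo$shares_issued

# Color by group (market type) and scale dot size by a third variable (IPO value)
market_f  <- factor(ipo$market)
palette3  <- c("steelblue", "tomato", "darkgreen")
point_col <- palette3[as.integer(market_f)]
point_cex <- 0.5 + 2.5 * ipo$ipo_value / max(ipo$ipo_value)

pdf(NULL)  # null device so the sketch runs without a display; remove to see the plot
plot(ipo$shares_issued, ipo$ipo_price,
     col = point_col, cex = point_cex, pch = 19,
     xlab = "Shares issued at IPO", ylab = "IPO price")
legend("topright", legend = levels(market_f),
       col = palette3[seq_along(levels(market_f))], pch = 19)
invisible(dev.off())
```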
- Time-series Analysis:
In R, the ts() function converts a numeric vector into a time series object. The format is ts(vector, start=, end=, frequency=), where start and end are the times of the first and last observations and frequency is the number of observations per unit time (1 = annual, 4 = quarterly, 12 = monthly, etc.). The forecast package also provides functions for the automatic selection of exponential smoothing and ARIMA models. The ets() function supports both additive and multiplicative models, and the auto.arima() function can handle both seasonal and nonseasonal ARIMA models. Models are selected according to one of several information criteria (for example AIC, AICc, or BIC).
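The following sketch shows ts() on a simulated monthly price series, followed by a decomposition into trend and seasonal components with decompose() from base R. The series is hypothetical (a linear trend plus a yearly cycle and noise). The auto.arima() step runs only if the forecast package happens to be installed.

```r
set.seed(7)

# Hypothetical monthly closing prices: upward trend + yearly cycle + noise
months <- 1:48
price  <- 20 + 0.3 * months + 2 * sin(2 * pi * months / 12) + rnorm(48, sd = 0.5)

# 48 monthly observations starting in January 2015 (frequency = 12)
price_ts <- ts(price, start = c(2015, 1), frequency = 12)

# Separate the series into trend, seasonal, and random components
parts <- decompose(price_ts)
print(round(parts$figure, 2))  # estimated seasonal effect for each month

# Optional: automatic ARIMA selection, only if the forecast package is available
if (requireNamespace("forecast", quietly = TRUE)) {
  fit <- forecast::auto.arima(price_ts)
  print(forecast::forecast(fit, h = 6))  # forecast six months ahead
}
```

Plotting parts with plot(parts) shows the trend and seasonal panels stacked, which is a quick way to check whether a seasonal model is worth fitting.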
- Portfolio Analysis:
The PerformanceAnalytics package consolidates functions to compute many of the most widely used performance metrics. tidyquant integrates this functionality so that it can be used at scale with the split, apply, combine framework within the tidyverse. Two primary functions provide the integration: tq_performance implements the performance analysis functions in a tidy way, and tq_portfolio provides a toolset for aggregating a group of individual asset returns into one or many portfolios.
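The aggregation idea behind a portfolio can be sketched in base R as below. This is a stand-in for illustration and does not call tidyquant or PerformanceAnalytics: the symbols, returns, and weights are hypothetical, and the weights are assumed to be reapplied in every period.

```r
# Hypothetical monthly returns for two assets in long (tidy) format
returns <- data.frame(
  month  = rep(1:3, times = 2),
  symbol = rep(c("AAA", "BBB"), each = 3),
  ret    = c(0.02, -0.01, 0.03, 0.01, 0.02, -0.02),
  stringsAsFactors = FALSE
)

# Assumed fixed portfolio weights, reapplied every month
weights <- c(AAA = 0.6, BBB = 0.4)

# Combine: portfolio return per month = sum of weight * asset return
returns$weighted <- returns$ret * unname(weights[returns$symbol])
portfolio <- aggregate(weighted ~ month, data = returns, FUN = sum)
names(portfolio)[2] <- "portfolio_ret"
print(portfolio)

# Split-apply-combine: one simple performance metric per asset
mean_by_asset <- aggregate(ret ~ symbol, data = returns, FUN = mean)
print(mean_by_asset)
```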
Future Work
Because of data limitations, we are only able to extract stock data from the SZSE and SSE, so limited market coverage will be the main shortcoming of our analysis. In the future, we will extend our analysis to foreign stock markets such as the NYSE and LSE. In addition, because the time-series data for all stocks in every year is too large to download, only a portion is used for analysis in this project.
Also, to validate our analysis, we need to test our visualizations with professionals who use technical chart analysis, then refine the analysis and apply it to real financial market decision making.
References
ggplot2
Treemap
Shanghai Stock Exchange A Share Index
Stock analysis
Performance Analysis with tidyquant
R Quantitative Analysis Package Integrations in tidyquant


