Difference between revisions of "Group08 Proposal"

From Visual Analytics and Applications

Revision as of 21:23, 21 November 2018

Visualizing Future of Crowd Funding with Yu’e Bao


40 Thieves Members

 ¥  Wong Yam Yip
 ¥  Wu Jinglong
 ¥  Song Chenxi

Abstract

Yu’e Bao (余额宝) is an investment product offered by Alipay (支付宝), a mobile and online payment platform established by China’s multinational conglomerate Alibaba Group. In June 2013, Alibaba Group launched Yu’e Bao, in collaboration with Tianhong Asset Management Co., Ltd., to form the first internet fund in China. Since then, Yu’e Bao has become the nation’s largest money market fund and, by Feb 2018, had US$251 billion under management. In Chinese, Yu’e Bao means “Leftover Treasure”. Alipay users can deposit their extra cash, for example, money left over from online shopping, into this investment product. The money is invested via a money market fund with no minimum amount or exit charges, and interest is paid daily. While major banks offer 0.35% annual interest on deposits, Yu’e Bao may offer users 6% interest, with the convenience and freedom to deposit and withdraw at any time via the Alipay mobile app. As a result, Yu’e Bao became extremely popular in China.

Using various data visualization methodologies, coupled with survival and time-series analysis, this project aims to build an interactive tool on the R Shiny framework to unearth the underlying treasures of associations between Yu’e Bao’s user profiles, behaviour, time and other financial factors.

Dataset

Source: Alibaba Cloud TIANCHI Competition: The Purchase and Redemption Forecasts - Challenge the Baseline. The dataset from this competition comprises Yu'e Bao users’ profiles, transaction behaviour, and financial interest rates over time, in 4 CSV tables:

{| class="wikitable"
! Table Name !! Description
|-
| user_balance_table.csv || 2,840,421 observations of cash flow time-series data from 28,041 Yu’e Bao users over 14 months, from 1st Jul 2013 to 31st Aug 2014. The cash flow data includes 18 variables covering account balances, different types of deposits, withdrawals, interest earned, and purchase categories where funds are used for online purchases.
|-
| user_profile_table.csv || 28,041 rows of user profile data in 4 columns: user ID, gender, zodiac sign, and registered city.
|-
| mfd_day_share_interest.csv || 427 observed dates from 1st Jul 2013 to 31st Aug 2014 and the corresponding Yu’e Bao daily and 7-day interest rates.
|-
| mfd_bank_shibor.csv || 294 observed dates from 1st Jul 2013 to 31st Aug 2014 and the corresponding 8 types of Shibor interest rates, from overnight to yearly. Although the time frame is the same as above, there are fewer observations here because there is no Shibor interest rate data for weekends and public holidays.
|}
Figure (g8_Metadata.png): Full description of the dataset variables.
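
The row counts of the two interest-rate tables can be sanity-checked against the calendar. The sketch below uses only base R and none of the competition data: it counts the calendar days and the Monday-to-Friday days in the stated period. Any gap between the weekday count and the 294 Shibor dates would then point to public holidays, assuming Shibor is quoted on every other weekday, which has not been verified here.

```r
# Calendar covered by the dataset: 1 Jul 2013 to 31 Aug 2014, inclusive.
dates <- seq(as.Date("2013-07-01"), as.Date("2014-08-31"), by = "day")

# as.POSIXlt()$wday is locale-independent: 0 = Sunday, ..., 6 = Saturday.
is_weekday <- as.POSIXlt(dates)$wday %in% 1:5

n_days     <- length(dates)    # one row per calendar day, as in mfd_day_share_interest.csv
n_weekdays <- sum(is_weekday)  # ceiling for the Shibor table if weekends are never quoted

cat("calendar days:", n_days, "\n")
cat("weekdays:     ", n_weekdays, "\n")
```

Comparing `n_weekdays` with the 294 rows of mfd_bank_shibor.csv gives a rough count of the weekday holidays that would need to be handled (for example by carrying the last quoted rate forward) before joining the two tables by date.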

Visualization and Analysis

The goal of the TIANCHI competition is to train a model to predict the future cash flow of Yu’e Bao users to aid Ant Financial Services Group, Alibaba Group’s affiliate company operating Alipay, in its business of processing the cash inflow and outflow of its users. To provide an alternative view of, and insights into, this data, our group has chosen to implement the modules below in the Shiny app:

Objective 1: Data Exploration and Visualization

In this module, we aim to provide an interactive data explorer to visualize the data in different ways. We will employ different visualization techniques, e.g. treemaps, heatmaps, corrplots, and time-series line graphs, to demonstrate the interactions and relationships between different combinations of cash flow behaviour, user profile, and financial data. App users will have the freedom to select the variables of interest to dynamically generate the corresponding visualization.
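
The variable-selection behaviour described above could be wired up roughly as follows. This is a minimal sketch, not the group's implementation: the data frame `toy_flows` and its columns (`gender`, `city`, `purchase_amt`, `redeem_amt`) are made-up stand-ins for the joined competition tables, and `summarise_flow()` is a hypothetical helper introduced only for illustration.

```r
library(shiny)

# Hypothetical stand-in for the joined user profile and cash flow tables.
toy_flows <- data.frame(
  gender       = c("F", "M", "F", "M", "F", "M"),
  city         = c("A", "A", "B", "B", "C", "C"),
  purchase_amt = c(100, 250, 80, 40, 300, 120),
  redeem_amt   = c(20, 0, 60, 40, 150, 10)
)

# Hypothetical helper: total of one numeric column per level of one grouping column.
summarise_flow <- function(df, group_var, value_var) {
  totals <- tapply(df[[value_var]], df[[group_var]], sum)
  data.frame(group = names(totals), total = as.numeric(totals))
}

ui <- fluidPage(
  selectInput("group_var", "Group by", choices = c("gender", "city")),
  selectInput("value_var", "Measure", choices = c("purchase_amt", "redeem_amt")),
  plotOutput("flow_plot")
)

server <- function(input, output, session) {
  # The plot is redrawn whenever either selection changes.
  output$flow_plot <- renderPlot({
    s <- summarise_flow(toy_flows, input$group_var, input$value_var)
    barplot(s$total, names.arg = s$group, ylab = input$value_var)
  })
}

# Build the app object; launch it with runApp(app) from an interactive session.
app <- shinyApp(ui, server)
```

Keeping the aggregation in a plain function, separate from the reactive code, means the same helper could feed a treemap or heatmap output later and can be checked without starting the app.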


Objective 2: Survival Analysis and Visualization

Performing Survival analysis....
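
The proposal has not yet fixed what counts as an "event" for the survival analysis, so the following is only an illustrative sketch of a Kaplan-Meier fit with the `survival` package listed under Libraries. The toy data, and the reading of `days` as time until some user event with `status = 0` marking users still active when observation ends, are assumptions and not part of the dataset.

```r
library(survival)

# Toy data (made up): follow-up time in days and whether the event was observed.
toy <- data.frame(
  days   = c(2, 3, 3, 5, 8),
  status = c(1, 1, 0, 1, 0)   # 1 = event observed, 0 = censored
)

# Kaplan-Meier estimate of the survival function for a single group.
fit <- survfit(Surv(days, status) ~ 1, data = toy)

# Estimated probability of remaining event-free beyond each distinct time.
km <- data.frame(time = fit$time, survival = fit$surv)
print(km)
```

Replacing `~ 1` with a grouping column such as gender would give one curve per group, and `survminer::ggsurvplot(fit, data = toy)` is one way the resulting curves could be drawn in the app.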

Objective 3: Time-Series Clustering and Visualization

dtwclust time series cluster....
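
To make the idea behind dynamic time warping (DTW) clustering concrete, here is a small base-R sketch. It deliberately does not use the `dtwclust` API: `dtw_distance()` is a hand-written textbook DTW with an absolute-difference cost, the four series are invented, and hierarchical clustering via `hclust()` is swapped in purely for illustration.

```r
# Textbook DTW distance between two numeric vectors (absolute-difference cost).
dtw_distance <- function(a, b) {
  n <- length(a)
  m <- length(b)
  cost <- matrix(Inf, n + 1, m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      step <- abs(a[i] - b[j])
      # Cheapest way to reach (i, j): from above, from the left, or diagonally.
      cost[i + 1, j + 1] <- step + min(cost[i, j + 1], cost[i + 1, j], cost[i, j])
    }
  }
  cost[n + 1, m + 1]
}

# Invented series: two rising and two falling, each pair differing only by a lag.
series <- list(
  up          = c(0, 1, 2, 3, 4),
  up_lagged   = c(0, 0, 1, 2, 3, 4),
  down        = c(4, 3, 2, 1, 0),
  down_lagged = c(4, 4, 3, 2, 1, 0)
)

# Pairwise DTW distance matrix.
k <- length(series)
d <- matrix(0, k, k, dimnames = list(names(series), names(series)))
for (i in seq_len(k)) {
  for (j in seq_len(k)) {
    d[i, j] <- dtw_distance(series[[i]], series[[j]])
  }
}

# Hierarchical clustering on the DTW distances, cut into two groups.
clusters <- cutree(hclust(as.dist(d), method = "average"), k = 2)
print(clusters)
```

Because DTW lets one series stretch in time to match another, the lagged copies sit at zero distance from their originals even though their lengths differ, which is the property that makes it attractive for users whose cash flow patterns are similar but shifted in time.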

Challenges

Libraries

The R libraries below will be considered for the project:

  • shiny
  • shinydashboard
  • shinyWidgets
  • dashboardthemes
  • tidyverse
  • lubridate
  • ggplot2
  • plotly
  • lattice
  • xts
  • treemap
  • d3treeR
  • survival
  • ggfortify
  • survminer
  • dplyr
  • TSclust
  • dtwclust
  • cluster