Group08 Proposal
Revision as of 20:26, 23 November 2018
40 Thieves Members
¥ Wong Yam Yip
¥ Wu Jinglong
¥ Song Chenxi
Abstract
Yu’e Bao (余额宝) is an investment product offered by Alipay (支付宝), a mobile and online payment platform established by China’s multinational conglomerate Alibaba Group. In June 2013, Alibaba Group launched Yu’e Bao, in collaboration with Tianhong Asset Management Co., Ltd., as the first internet fund in China. Since then, Yu’e Bao has become the nation’s largest money market fund and, by February 2018, had US$251 billion under management. In Chinese, Yu’e Bao means “Leftover Treasure”: Alipay users can deposit their spare cash, for example money left over from online shopping, into this investment product. The money is invested in a money market fund with no minimum amount or exit charges, and interest is paid daily. While major banks offer 0.35% annual interest on deposits, Yu’e Bao has offered users as much as 6% interest, together with the convenience of depositing and withdrawing at any time via the Alipay mobile app. As a result, Yu’e Bao became extremely popular in China.
Using various data visualization methods and techniques, coupled with survival analysis and time-series clustering, this project aims to build an interactive tool on the R Shiny framework, so as to unearth the hidden treasures of associations between Yu’e Bao users’ profiles, behaviour, time and other financial factors.
Dataset
The data comes from the Alibaba Cloud TIANCHI competition “The Purchase and Redemption Forecasts - Challenge the Baseline”. The dataset comprises Yu’e Bao users’ profiles, transaction behaviour, and financial interest rates over time, in the following 4 CSV tables:
Table Name | Description |
---|---|
user_balance_table.csv | 2,840,421 observations of time-series cash flow data from 28,041 Yu’e Bao users over 14 months, from 1st Jul 2013 to 31st Aug 2014. The cash flow data comprises 18 variables covering account balances, different types of deposits, withdrawals, interest earned and, where funds are used for online purchases, the purchase categories. |
user_profile_table.csv | 28,041 rows of user profile data, one per user ID, in 4 columns: user ID, gender, zodiac sign and registered city. |
mfd_day_share_interest.csv | 427 observed dates from 1st Jul 2013 to 31st Aug 2014, with Yu’e Bao’s corresponding daily and 7-day interest rates. |
mfd_bank_shibor.csv | 294 observed dates from 1st Jul 2013 to 31st Aug 2014, with the corresponding 8 types of Shibor (Shanghai Interbank Offered Rate) interest rates, from overnight to 1-year rates. Although the time frame is the same as above, there are fewer observations because there are no Shibor rates for weekends and public holidays. |
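Because Shibor rates exist only for business days (294 dates) while the other tables cover all 427 calendar days, the two need to be aligned before rates can be plotted or correlated against daily cash flow. The sketch below assumes a last-observation-carried-forward rule, where a weekend or holiday takes the rate of the preceding business day; that rule, the function name `forward_fill_daily` and the toy rates are illustrative assumptions, not part of the dataset. It is written in Python for brevity; in the R project the same step could be done with, for example, `tidyr::fill()`.

```python
from datetime import date, timedelta


def forward_fill_daily(rates, start, end):
    """Return one rate per calendar day in [start, end].

    `rates` maps business days to observed rates (hypothetical input shape).
    Days missing from `rates` (weekends, public holidays) carry forward the
    most recent earlier rate; days before the first observation get None.
    """
    filled = {}
    last = None
    day = start
    while day <= end:
        if day in rates:
            last = rates[day]
        filled[day] = last
        day += timedelta(days=1)
    return filled


# Toy overnight rates (made-up values): Friday 5 Jul and Monday 8 Jul 2013
shibor_on = {date(2013, 7, 5): 3.10, date(2013, 7, 8): 3.05}
daily = forward_fill_daily(shibor_on, date(2013, 7, 5), date(2013, 7, 8))
print(daily[date(2013, 7, 6)])  # Saturday inherits Friday's rate: 3.1

# The full study period spans 427 calendar days, matching the table above
print(len(forward_fill_daily({}, date(2013, 7, 1), date(2014, 8, 31))))  # 427
```

Carrying rates forward keeps every calendar day populated without inventing new rate values; an alternative would be to drop weekends from the cash flow side instead, at the cost of losing weekend behaviour.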
Visualization and Analysis
The goal of the TIANCHI competition is to train models that predict the future cash flow of Yu’e Bao users, to aid Ant Financial Services Group, the Alibaba Group affiliate that operates Alipay, in processing users' cash inflows and outflows. Our group, however, has chosen to take an alternative view of this dataset, seeking and visualizing insights that predictive modeling does not reveal. To do this, we will implement the following modules in our Shiny app:
Data Exploration and Visualization
In this module, we aim to provide an interactive data explorer that visualizes the data in different ways. We will employ visualization techniques such as treemaps, heatmaps, correlation plots (corrplots) and time-series line graphs to show the interactions and relationships between different combinations of cash flow behaviour, user profiles and interest rates. App users will be able to select the variables they are interested in and dynamically generate the corresponding visualizations.
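As an example of the aggregation behind a heatmap, daily amounts can be summed into a month × weekday grid, which a heatmap then colours cell by cell to expose weekly and seasonal patterns. This is a hypothetical sketch: the `(date, amount)` input shape, the function name and the choice of amount (for instance total daily purchases) are assumptions, and the real app would build the grid in R before plotting.

```python
from collections import defaultdict
from datetime import date


def weekday_month_grid(records):
    """Sum (date, amount) pairs into a month x weekday grid for a heatmap.

    Returns (months, grid): months is the sorted list of (year, month) keys,
    and grid[i][j] is the total for months[i] on weekday j (0 = Monday).
    Weekdays with no records in a month are filled with 0.0.
    """
    totals = defaultdict(float)
    for day, amount in records:
        totals[(day.year, day.month, day.weekday())] += amount
    months = sorted({(year, month) for (year, month, _) in totals})
    grid = [[totals.get((year, month, wd), 0.0) for wd in range(7)]
            for (year, month) in months]
    return months, grid


# Made-up records: two Mondays in July 2013 and one Saturday in August 2013
records = [
    (date(2013, 7, 1), 100.0),
    (date(2013, 7, 8), 50.0),
    (date(2013, 8, 3), 20.0),
]
months, grid = weekday_month_grid(records)
print(months)      # [(2013, 7), (2013, 8)]
print(grid[0][0])  # July, Monday: 150.0
print(grid[1][5])  # August, Saturday: 20.0
```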
Survival Analysis and Visualization
In contrast to predicting future cash flow, we will perform survival analysis on the user cash flow data. This will show what percentage of Yu’e Bao users withdraw their balances after depositing into the account, and how soon they do so. Separate survival analyses will also be performed for individual classes of user profiles. The app will allow us to visualize and compare the survival of Yu’e Bao users’ deposits between different classes, for example between male and female users, or between Taurus and Aries. With this, we can play MythBusters and test whether myths such as “Taurus are better savers than Aries” are indeed true.
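Survival curves of this kind are commonly estimated with the Kaplan-Meier estimator, S(t) = ∏ (1 − d_i / n_i) over event times t_i ≤ t, where d_i deposits are withdrawn at time t_i out of n_i deposits still held just before t_i. Deposits not yet withdrawn when the data ends (31st Aug 2014) are censored: they count as at risk until they drop out, but never as withdrawals. The Python sketch below only illustrates the estimator. The proposal does not specify how deposits and withdrawals are derived from the balance table, so the durations are made-up numbers and the two groups are hypothetical. In the R project, `survival::survfit()` with a `Surv(time, status)` formula would produce this estimate, with one curve per profile class.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of the probability a deposit is still held.

    durations[k]: days from deposit k to its withdrawal, or to the end of
                  observation if it was never withdrawn
    events[k]:    1 if the withdrawal was observed, 0 if censored
    Returns [(t, S(t)), ...] at each distinct time with at least one withdrawal.
    """
    withdrawals = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # every deposit leaves the risk set at its time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = withdrawals[t]  # Counter returns 0 for times with no withdrawal
        if d:
            survival *= 1 - d / at_risk  # S(t) = S(t-) * (1 - d_i / n_i)
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Made-up durations (days) for two hypothetical user groups
group_a = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
group_b = kaplan_meier([4, 6, 9, 10, 12], [1, 0, 1, 0, 0])
print([(t, round(s, 3)) for t, s in group_a])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
print([(t, round(s, 3)) for t, s in group_b])  # [(4, 0.8), (9, 0.533)]
```

Here group_b's curve stays higher for longer, which is the kind of difference the app would plot; whether such a difference is statistically meaningful would need a test such as the log-rank test (`survival::survdiff()` in R).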
Time Series Clustering and Visualization
The dataset provides rich time-series data on Yu’e Bao users’ cash flow. We will attempt to segment Yu’e Bao users by the time series of their account balances, using time-series clustering. Clustering based on Dynamic Time Warping (DTW) distance will be explored using the TSclust and dtwclust R packages (e.g. dtwclust's tsclust() function). Our Shiny app will provide a platform for evaluating different clustering results, through interactive visualization and comparison of cluster validity indices over a grid of cluster numbers and DTW clustering parameters. App users can also dynamically generate visualizations of the clusters, based on the clustering parameters they choose, to explore and compare the clusters.
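DTW compares two series after stretching or compressing their time axes, so two users whose balances follow the same shape at slightly different times are still considered similar. The Python sketch below shows the idea end to end: a basic DTW distance with an optional Sakoe-Chiba window, a full pairwise distance matrix, a naive k-medoids partition on that matrix, and the average silhouette width as one example of a cluster validity index for comparing different numbers of clusters. It is an illustration only, not dtwclust's implementation; that package offers more step patterns, normalizations, centroids and validity indices, and its defaults may give different distances. The function names and series are hypothetical.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW distance with absolute-difference local cost.

    window: optional Sakoe-Chiba band half-width; None means unconstrained.
    Uses the basic symmetric recursion (each step adds the local cost once).
    """
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def distance_matrix(series, window=None):
    """Full pairwise DTW matrix: n * n entries, so memory grows quadratically."""
    return [[dtw_distance(x, y, window) for y in series] for x in series]


def k_medoids(D, k, max_iter=20):
    """Naive k-medoids: alternate nearest-medoid assignment and medoid update."""
    n = len(D)
    medoids = list(range(k))  # simplistic deterministic start: first k series

    def assign():
        return [min(range(k), key=lambda c: D[i][medoids[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign()
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                new.append(medoids[c])
                continue
            new.append(min(members, key=lambda p: sum(D[p][q] for q in members)))
        if new == medoids:
            break
        medoids = new
    return assign(), medoids


def mean_silhouette(D, labels):
    """Average silhouette width (needs >= 2 clusters); near 1 = well separated."""
    scores = []
    for i, li in enumerate(labels):
        own = [D[i][j] for j, lj in enumerate(labels) if lj == li and j != i]
        if not own:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(D[i][j] for j, lj in enumerate(labels) if lj == c)
            / sum(1 for lj in labels if lj == c)
            for c in set(labels) if c != li
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Made-up balance series: two "bump" shapes (one shifted) and two flat ones
series = [
    [0, 1, 2, 3, 2, 1],
    [5, 5, 5, 5, 5],
    [0, 0, 1, 2, 3, 2, 1],
    [5, 6, 5, 5, 5],
]
D = distance_matrix(series)
labels, medoids = k_medoids(D, k=2)
print(labels)  # [0, 1, 0, 1]: the shifted bump joins the first bump
print(round(mean_silhouette(D, labels), 2))
```

Repeating the last three lines for several values of k (and several window sizes) yields the kind of grid of validity scores the evaluation platform would display.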
Challenges
Firstly, the metadata provided by the competition is not very detailed, and the meaning of some variables is not clearly explained. Research will therefore be needed to gain domain knowledge of how Yu'e Bao works and to understand the meaning of the dataset variables.
Secondly, time-series clustering is resource-intensive. In preliminary trials using all 2,840,421 observations, clustering failed with the error: cannot allocate vector of size 5.9GB. This appears to be a memory limit of R on Windows: a 32-bit R session cannot address more than 4 GB, even on a 64-bit Windows OS, and a 64-bit session is still bounded by the available RAM. We will need to look for ways to reduce the data to a suitable size.
The size of the data also slows down the dynamic generation of visualizations in the Shiny app, which will affect the app user's experience. Thus, we will need to explore data aggregation techniques before generating the visualizations.
Other issues with the dataset include missing values, many zero values, Chinese characters in the data, and a city variable represented by a 7-digit number, which prevents us from mapping the data in a geospatial visualization.
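A back-of-the-envelope calculation suggests why the allocation fails and how data reduction helps. Assuming (the error message itself does not confirm this) that the failing object was a dense 28,041 × 28,041 matrix of 8-byte doubles, such as a full pairwise distance matrix over all users, it would need about 5.86 binary gigabytes, which is consistent with the reported 5.9 figure. Because this grows with the square of the number of users, clustering a sample of users shrinks it quadratically; averaging each daily series into weekly values shortens the series (427 days become 61 weeks) and cuts the cost of each DTW comparison, which grows with the product of the two series lengths. The helper names and the 5,000-user sample size are hypothetical.

```python
def dense_matrix_gib(n, bytes_per_value=8):
    """Memory of a dense n x n matrix of doubles, in GiB (1024**3 bytes)."""
    return n * n * bytes_per_value / 1024 ** 3


def weekly_means(daily):
    """Average consecutive 7-day blocks; a final partial block uses its own length."""
    weeks = [daily[i:i + 7] for i in range(0, len(daily), 7)]
    return [sum(week) / len(week) for week in weeks]


print(round(dense_matrix_gib(28041), 1))  # all users: 5.9
print(round(dense_matrix_gib(5000), 2))   # a hypothetical 5,000-user sample: 0.19

series = [float(day % 7) for day in range(427)]  # made-up daily balances
weekly = weekly_means(series)
print(len(weekly))                                # 61 weekly values instead of 427
print((len(series) ** 2) // (len(weekly) ** 2))  # per-pair DTW cost matrix: 49x smaller
```

Weekly means also reduce the number of points a Shiny plot must render, which addresses the responsiveness concern as well as memory.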
Libraries
The following R libraries will be considered for the project:
- shiny
- shinydashboard
- shinyWidgets
- dashboardthemes
- tidyverse
- lubridate
- ggplot2
- plotly
- lattice
- xts
- treemap
- d3treeR
- survival
- ggfortify
- survminer
- dplyr
- TSclust
- dtwclust
- cluster
- corrplot
References
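- Banner image credit to: China Money Network
- Yu’e Bao (余额宝): http://yuebao.thfund.com.cn/
- Alipay (支付宝): https://www.alipay.com/
- Alibaba Group: https://www.alibabagroup.com/en/global/home
- Tianhong Asset Management Co., Ltd.: http://www.thfund.com.cn/en/index.html
- US$251 billion under management: https://yourstory.com/2018/08/alibaba-yue-bao-unearthed-hidden-treasure-from-digital-wallets/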