Difference between revisions of "ANLY482 AY2016-17 T2 Group 2 Project Overview Methodology"

Line 95: Line 95:
 
For missing values, we will first determine how many there are. If the number is significant, we will use prediction techniques to estimate these values from the rest of the data set. Otherwise, we will remove the affected transactions from our analysis so that they do not distort our findings.<br>
 
 
Lastly, we will perform data normalization and transformation. Some fields in the phone purchasing dataset and the internet purchasing dataset have different scales and values even though they represent the same information. Also, due to system changes in Kaiso's IT infrastructure, there are some differences in the way the data is stored and named. Therefore, we will normalize and transform the data so that values are consistent across both datasets before we perform any analysis.
 
</div>
 
 
<!--Association Rule Content-->
 
<div style="margin:20px; padding: 10px; background: #ffffff; text-align:left; font-size: 95%;-webkit-border-radius: 15px;-webkit-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96); -moz-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);">
 
{| color:#E6CCFF padding: 1px 0 0 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" |
 
| style="padding:0.3em; font-family:helvetica; font-size:100%; border-bottom:2px solid #626262; border-left:2px #66FF99; text-align:left;" width="20%" | <font color="#000000" size="3em"><strong>Association Rule Mining</strong><br></font>
 
|}
 
Association rule mining is a rule-based method to discover interesting relations in the dataset. We will conduct analysis on the ticketing transactions to determine purchasing patterns, which are known as rules, between customers and the different ticketing channels. These rules can then be used by Kaiso as the basis for marketing strategies for their products.
 
</div>
 
 
<!--Correlation Analysis Content-->
 
<div style="margin:20px; padding: 10px; background: #ffffff; text-align:left; font-size: 95%;-webkit-border-radius: 15px;-webkit-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96); -moz-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);">
 
{| color:#E6CCFF padding: 1px 0 0 0;" width="100%" cellspacing="0" cellpadding="0" valign="top" border="0" |
 
| style="padding:0.3em; font-family:helvetica; font-size:100%; border-bottom:2px solid #626262; border-left:2px #66FF99; text-align:left;" width="20%" | <font color="#000000" size="3em"><strong>Correlation Analysis </strong><br></font>
 
|}
 
We will perform correlation analysis to observe how the variables identified during EDA interact with the bet amount. The correlation coefficients will show the strength of these relationships and whether they correlate with the purchasing patterns for both ticketing channels.
 
 
</div>
 
</div>
  
Line 118: Line 102:
 
| style="padding:0.3em; font-family:helvetica; font-size:100%; border-bottom:2px solid #626262; border-left:2px #66FF99; text-align:left;" width="20%" | <font color="#000000" size="3em"><strong>Dashboard</strong><br></font>
 
 
|}
 
|}
Following the analysis, a dashboard will be built to aid in the visualization of the findings. The dashboard will showcase the important variables and their interactions with customer purchasing behaviour. This will give the customer engagement teams an easy way to explore and understand specific behaviours of their customers.
+
Following the analysis, an analytical dashboard will be built to visualize our findings. The dashboard will display the key variables of the data and how they affect customer purchasing behaviour. The customer engagement teams will be able to use the dashboard to display and better understand the differences in customer behaviour before and after the launch of the new system.
 +
The dashboard will use a framework that allows Singapore Pools to update their dashboard by uploading their dataset every time they have a new dataset. Design, statistics and visualization will be our main considerations when building the dashboard so that they can easily unveil the differences that they are looking for.
 +
 
 
</div>
 
</div>
  

Revision as of 12:23, 14 January 2017




Tools Used

Based on the client requirements for the project, we will use two programming languages, Python and R. Both have mature and growing ecosystems of open-source tools for mathematics and data analysis. We will work in Jupyter Notebook, which supports both languages and keeps code, output, and notes together in one document.

Methodology

Data Collection

Kaiso will provide us with five datasets on musical and concert data. The datasets consist of transaction records from both the phone booking and internet booking channels. Apart from the data provided, we will also collect external data that may affect our analysis, such as the dates of public holidays.

Literature Review

To gain domain knowledge, we will read research papers, articles, and news related to our topic area, ticketing analytics. We will focus our reading on online ticketing because it will form the basis of our analysis. This reading will also give us the theoretical knowledge needed to conduct these analyses.

Exploratory Data Analysis (EDA)

In the initial stage of this project, we will examine the datasets to understand their various aspects. This will also help us in the next stage, data preparation, by identifying outliers and anomalies and by showing where the data is inconsistent and needs normalization or transformation. We will also use EDA to identify important variables for subsequent steps such as correlation analysis.
Among other things, we will look at the frequency of transactions per account holder in relation to the different bet types, and at the most popular times, types, and amounts of transactions.
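The frequency counts described above can be sketched in a few lines. This is only an illustration using the Python standard library: the sample records and the field names (`account_id`, `bet_type`, `timestamp`, `amount`) are assumptions, not the actual schema of the datasets.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records; field names and values are assumptions.
transactions = [
    {"account_id": "A1", "bet_type": "X", "timestamp": "2016-11-05 19:12:00", "amount": 10.0},
    {"account_id": "A1", "bet_type": "X", "timestamp": "2016-11-05 19:40:00", "amount": 5.0},
    {"account_id": "A2", "bet_type": "Y", "timestamp": "2016-11-06 12:03:00", "amount": 20.0},
    {"account_id": "A1", "bet_type": "Y", "timestamp": "2016-11-07 19:55:00", "amount": 8.0},
]


def frequency_by_account_and_type(rows):
    """Count transactions for each (account, bet type) pair."""
    return Counter((r["account_id"], r["bet_type"]) for r in rows)


def transactions_per_hour(rows):
    """Count transactions for each hour of the day (0 to 23)."""
    return Counter(
        datetime.strptime(r["timestamp"], "%Y-%m-%d %H:%M:%S").hour for r in rows
    )


freq = frequency_by_account_and_type(transactions)
by_hour = transactions_per_hour(transactions)
# Most popular hour of the day and how many transactions fall in it.
busiest_hour, busiest_count = by_hour.most_common(1)[0]
```

The same counting pattern extends to transaction type or amount bands by changing the key that is passed to `Counter`.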

Data Preparation

Before performing any further analysis, we will prepare the data. We will clean the data to handle outliers and missing values, and we will normalize and transform the given datasets.
For outliers, we will first determine whether the values are due to human or system error. If they are, we can safely remove those transactions from our analysis. Otherwise, we will analyse the outlier values separately.
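A minimal sketch of this outlier routing follows. The methodology does not say how outliers will be detected, so the interquartile-range (IQR) rule used here is an assumption, as are the field names `id` and `amount` and the `confirmed_errors` set, which stands in for the manual check of whether a value is a human or system error.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return (low, high) fences at k * IQR beyond the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(rows, confirmed_errors, k=1.5):
    """Split rows into regular rows and genuine outliers for separate analysis.

    Outliers whose id is in confirmed_errors are dropped entirely.
    """
    low, high = iqr_fences([r["amount"] for r in rows], k)
    regular, separate = [], []
    for r in rows:
        if low <= r["amount"] <= high:
            regular.append(r)
        elif r["id"] in confirmed_errors:
            continue  # confirmed human or system error: remove from the analysis
        else:
            separate.append(r)  # genuine extreme value: analyse separately
    return regular, separate


# Hypothetical data: T8 is a genuine large purchase, T9 a system error code.
amounts = [5, 8, 10, 12, 9, 11, 7, 500, -999]
rows = [{"id": f"T{i}", "amount": a} for i, a in enumerate(amounts, start=1)]
regular, separate = split_outliers(rows, confirmed_errors={"T9"})
```

With this sample data, the seven ordinary transactions stay in `regular`, `T8` is set aside in `separate`, and `T9` is discarded.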
For missing values, we will first determine how many there are. If the number is significant, we will use prediction techniques to estimate these values from the rest of the data set. Otherwise, we will remove the affected transactions from our analysis so that they do not distort our findings.
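The decision rule for missing values could look like the sketch below. Two things here are assumptions rather than part of the methodology: the 5% threshold for what counts as "significant", and the use of the median as a simple stand-in for the prediction technique, which the project has not yet chosen.

```python
from statistics import median


def handle_missing(rows, field, threshold=0.05):
    """Drop rows with a missing field if they are rare, otherwise fill them in.

    Returns (cleaned_rows, action), where action is "dropped" or "imputed".
    """
    observed = [r[field] for r in rows if r.get(field) is not None]
    missing_count = len(rows) - len(observed)
    share = missing_count / len(rows) if rows else 0.0

    if share <= threshold:
        # Few missing values: remove the affected transactions.
        return [r for r in rows if r.get(field) is not None], "dropped"

    if not observed:
        raise ValueError(f"no observed values available to estimate {field!r}")

    # Many missing values: fill with the median (a placeholder for a real model).
    fill = median(observed)
    cleaned = [r if r.get(field) is not None else {**r, field: fill} for r in rows]
    return cleaned, "imputed"


# Hypothetical sample: one of four amounts is missing (25%).
sample = [{"amount": 10.0}, {"amount": None}, {"amount": 30.0}, {"amount": 20.0}]
imputed_rows, imputed_action = handle_missing(sample, "amount")
dropped_rows, dropped_action = handle_missing(sample, "amount", threshold=0.5)
```

A regression or nearest-neighbour estimate could replace the median without changing the surrounding logic.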
Lastly, we will perform data normalization and transformation. Some fields in the phone purchasing dataset and the internet purchasing dataset have different scales and values even though they represent the same information. Also, due to system changes in Kaiso's IT infrastructure, there are some differences in the way the data is stored and named. Therefore, we will normalize and transform the data so that values are consistent across both datasets before we perform any analysis.
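One way to picture this harmonization step is a small function that renames fields, rescales numeric values, and recodes categories into a shared schema. Every field name, unit, and code in this sketch (`acct_no`, `amt_cents`, `STD`, and so on) is hypothetical and only illustrates the kind of mismatch described above.

```python
def harmonise(row, rename=None, divide_by=None, recode=None):
    """Map one raw record onto the common schema.

    rename:    {old_field: new_field}
    divide_by: {field: divisor}, applied after renaming (e.g. cents to dollars)
    recode:    {field: {old_value: new_value}}, applied after renaming
    """
    rename = rename or {}
    out = {rename.get(key, key): value for key, value in row.items()}
    for field, divisor in (divide_by or {}).items():
        if out.get(field) is not None:
            out[field] = out[field] / divisor
    for field, mapping in (recode or {}).items():
        if field in out:
            out[field] = mapping.get(out[field], out[field])
    return out


# Hypothetical records describing the same purchase in the two channels.
phone_row = {"acct_no": "A1", "amt_cents": 1250, "ticket_type": "STD"}
internet_row = {"account_id": "A1", "amount": 12.5, "ticket_type": "Standard"}

phone_clean = harmonise(
    phone_row,
    rename={"acct_no": "account_id", "amt_cents": "amount"},
    divide_by={"amount": 100},
    recode={"ticket_type": {"STD": "standard"}},
)
internet_clean = harmonise(
    internet_row,
    recode={"ticket_type": {"Standard": "standard"}},
)
```

Keeping the mappings as plain dictionaries makes it easy to review them with the client and to add further rules as more naming differences are discovered.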

Dashboard

Following the analysis, an analytical dashboard will be built to visualize our findings. The dashboard will display the key variables of the data and how they affect customer purchasing behaviour. The customer engagement teams will be able to use the dashboard to display and better understand the differences in customer behaviour before and after the launch of the new system. The dashboard will use a framework that allows Singapore Pools to refresh it by uploading each new dataset as it becomes available. Design, statistics, and visualization will be our main considerations when building the dashboard, so that users can easily uncover the differences they are looking for.
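The upload-and-refresh idea can be illustrated with a function that recomputes a before/after summary each time a new file arrives. This is a sketch only: the dashboard framework has not been chosen, and the CSV layout (`date` and `amount` columns), the launch date, and the function name are all assumptions.

```python
import csv
import io
from datetime import date
from statistics import mean


def summarise_upload(csv_text, launch_date):
    """Summarise transaction amounts before and after the system launch date."""
    before, after = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = date.fromisoformat(row["date"])
        # Transactions on the launch date itself count as "after".
        (before if day < launch_date else after).append(float(row["amount"]))

    def describe(amounts):
        return {
            "count": len(amounts),
            "mean_amount": mean(amounts) if amounts else None,
        }

    return {"before": describe(before), "after": describe(after)}


# Hypothetical upload and launch date.
uploaded = "date,amount\n2016-09-01,10\n2016-09-15,20\n2016-11-01,30\n"
summary = summarise_upload(uploaded, launch_date=date(2016, 10, 1))
```

A dashboard would call a function like this on every upload and redraw its charts from the returned summary, so no code changes are needed when new data arrives.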

Recommendations & Insights

From our analysis and dashboard, we seek to help Kaiso understand the characteristics of their customers. We will propose business strategies and recommendations based on the insights we uncover.