Difference between revisions of "Red Dot Payment Data Source"

From Analytics Practicum
Jump to navigation Jump to search
Line 96: Line 96:
 
For data cleaning, we carried out the following steps:<br><br>
 
For data cleaning, we carried out the following steps:<br><br>
 
'''<big>General Cleaning</big>'''
 
'''<big>General Cleaning</big>'''
# Compiled 2016-2017 data
+
<br><br>
# Standardised all merchants’ names and currency into uppercase: e.g. Realised that JMP Pro recognises characters with uppercase and lowercase separately
+
1. Compiled 2016-2017 data <br><br>
# Standardised currency of all transaction values: Converted all non-SGD values into SGD values, based on average monthly historical exchange rates found on OANDA.com
+
2. Standardised all merchants’ names and currency into uppercase: e.g. Realised that JMP Pro recognises characters with uppercase and lowercase separately<br><br>
# Split ‘date_created’ column into ‘date’ and ‘time’ columns.  
+
3. Standardised currency of all transaction values: Converted all non-SGD values into SGD values, based on average monthly historical exchange rates found on OANDA.com<br><br>
# Created a new column ‘period’ to indicate the respective month of each date in the  ‘date’ column (e.g. 01/01/16 → Period 1; 01/01/17 → Period 12)
+
4. Split ‘date_created’ column into ‘date’ and ‘time’ columns.<br><br>
# Created a new column ‘hour’ to indicate respective hour of each time in the ‘time’ column (e.g. 1:00:00 AM → 1; 1:00:00 PM → 13)
+
5. Created a new column ‘period’ to indicate the respective month of each date in the  ‘date’ column (e.g. 01/01/16 → Period 1; 01/01/17 → Period 12)<br><br>
# Created a new column ‘day’ to indicate respective day in the week for each data in the ‘date’ column (e.g. Monday is assigned a value of 1, Sunday is assigned a value of # Removed deactivated or terminated merchants - 2016: 41 transactions removed; 2017: 841 transactions removed
+
6. Created a new column ‘hour’ to indicate respective hour of each time in the ‘time’ column (e.g. 1:00:00 AM → 1; 1:00:00 PM → 13)<br><br>
# Removed rows with ambiguous or illegible characters/ negative transaction value on JMP Pro: Removal of 1563 rows with ambiguous or illegible characters; Removal of 2 rows with negative transaction value
+
7. Created a new column ‘day’ to indicate respective day in the week for each data in the ‘date’ column (e.g. Monday is assigned a value of 1, Sunday is assigned a value of 7)<br><br>
 +
8. Removed deactivated or terminated merchants - 2016: 41 transactions removed; 2017: 841 transactions removed<br><br>
 +
9. Removed rows with ambiguous or illegible characters/ negative transaction value on JMP Pro: Removal of 1563 rows with ambiguous or illegible characters; Removal of 2 rows with negative transaction value

Revision as of 20:46, 25 February 2018

HOME

 

PROJECT OVERVIEW

 

PROJECT FINDINGS

 

PROJECT DOCUMENTATION

 

PROJECT MANAGEMENT

 

ANLY482 HOMEPAGE

Background Data Source Methodology

Metadata

Previously, we only have the transaction data of 2017 provided by RDP. Moving on, we managed to obtain transaction data of 2016 as well.

Below are the definitions of common terms used in their business model:

MERCHANT - A merchant is a business, often a retailer, that operates online. Each merchant has appointed RDP to be its payment processor to handle online transactions between them and their customers.

CUSTOMER - A customer is an entity that makes a transaction with the merchant. For a transaction to be made, a customer must make contact with a merchant and provide their payment details (e.g. Credit card number) to the merchant through RDP’s payment processing gateway.

The following table shows the list of variables in the data and their associated description:

NAME TYPE DESCRIPTION
date_created Date Time Data and time at which the transaction was carried out between merchant and customer
period Integer Month of the year the transaction was carried out between merchant and customer
hour Integer Hour in a day the transaction was carried out between merchant and customer
day Integer Day of the week the transaction was carried out between merchant and customer
time Time Time at which the transaction was carried out between merchant and customer
name Alphanumeric Name of merchant
amount of money Decimal Total transaction amount - unstandardised currency
currency CHAR(3) Currency of the transaction amount - usually in SGD or USD
converted value Decimal Total transaction amount - standardised in SGD
reason_code Alphanumeric A unique code tagged to each reason for each approved or rejected transaction made by customer
reason_code_description Alphanumeric Description of the reason for each approved or rejected transaction made by customer
card_data1 Integer First 6 digits of customer’s card number - reveals details about the issuing bank and card type
card_data2 Integer Last 4 digits of customer’s card number
contact_ip_address Alphanumeric IP address of the customer
contact_ip_country Alphanumeric IP country of the customer
Bin [based on log10] Integer Bin number each merchant belongs to


*Sample dataset shown in proposal, due to sensitivity of data

Data Cleaning

For data cleaning, we carried out the following steps:

General Cleaning

1. Compiled 2016-2017 data

2. Standardised all merchants’ names and currency into uppercase: e.g. Realised that JMP Pro recognises characters with uppercase and lowercase separately

3. Standardised currency of all transaction values: Converted all non-SGD values into SGD values, based on average monthly historical exchange rates found on OANDA.com

4. Split ‘date_created’ column into ‘date’ and ‘time’ columns.

5. Created a new column ‘period’ to indicate the respective month of each date in the ‘date’ column (e.g. 01/01/16 → Period 1; 01/01/17 → Period 12)

6. Created a new column ‘hour’ to indicate respective hour of each time in the ‘time’ column (e.g. 1:00:00 AM → 1; 1:00:00 PM → 13)

7. Created a new column ‘day’ to indicate respective day in the week for each data in the ‘date’ column (e.g. Monday is assigned a value of 1, Sunday is assigned a value of 7)

8. Removed deactivated or terminated merchants - 2016: 41 transactions removed; 2017: 841 transactions removed

9. Removed rows with ambiguous or illegible characters/ negative transaction value on JMP Pro: Removal of 1563 rows with ambiguous or illegible characters; Removal of 2 rows with negative transaction value