AY1516 T2 Team13 Natasha Studio Findings RuleMining

From Analytics Practicum
Revision as of 14:00, 17 April 2016 by Amy.tan.2012 (talk | contribs)


Process Flow

Process Flow of ARM Analysis

Using SAS® Enterprise Miner 12.1, we performed association rule mining (ARM) with the Association node. “Member” served as the ID variable, and we ran the analysis three times, each with a different Target variable: 1) “Price Package”, 2) “Package (Genre)” and 3) “Course/Open & Level”. This allowed us to identify key associations between the different packages that customers purchase.

Because our data has a gap in 2013, we first split our preliminary analysis into two periods: 2010-2012 and 2014-2015. This let us check whether the associations differ between the two periods and whether our subsequent analysis needed to be split as well. Preliminary results showed that the associations discovered do differ between the two periods. Consequently, we proceeded to analyze them separately.
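The period split described above can be sketched in Python as follows. This is only an illustration, not the SAS workflow we used: the record layout, the member IDs and the package names are hypothetical.

```python
from datetime import date

# Hypothetical records: (member ID, date purchased, package name).
records = [
    ("M1", date(2011, 6, 1), "Package A"),
    ("M2", date(2012, 11, 20), "Package B"),
    ("M3", date(2014, 9, 3), "Package A"),
    ("M4", date(2015, 1, 17), "Package C"),
    ("M5", date(2014, 3, 8), "Package B"),
]

def split_by_period(rows):
    """Return (2010-2012 rows, 2014-2015 rows); rows outside both periods are dropped."""
    early = [r for r in rows if 2010 <= r[1].year <= 2012]
    late = [r for r in rows if 2014 <= r[1].year <= 2015]
    return early, late

early, late = split_by_period(records)
print(len(early), len(late))  # prints: 2 3
```

Each subset would then be mined on its own, so that the rules from the two periods can be compared side by side.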

We also applied sequence discovery to enhance our ARM model. Adding “Date Purchased” as a Sequence variable takes the time of purchase into account. This enhancement is necessary because customers typically do not buy more than one package at a time. Instead, they buy one package, use it and then buy another. We therefore believe that the findings from sequence discovery are more applicable.
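To illustrate what taking purchase order into account means, the minimal sketch below counts, for every ordered pair of packages, how many members bought the first and then the second at a later date. It is a simplified stand-in for sequence discovery, not the algorithm used by the SAS Association node, and all member IDs and package names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase records: (member ID, date purchased, package).
purchases = [
    ("M1", date(2014, 1, 5), "Salsa Beginner"),
    ("M1", date(2014, 3, 2), "Salsa Intermediate"),
    ("M2", date(2014, 2, 1), "Salsa Beginner"),
    ("M2", date(2014, 4, 9), "Salsa Intermediate"),
    ("M3", date(2014, 1, 20), "Salsa Beginner"),
    ("M3", date(2014, 5, 15), "Bachata Open"),
]

def sequence_pairs(records):
    """Count members who bought package A and, on a later date, package B."""
    by_member = defaultdict(list)
    for member, when, package in records:
        by_member[member].append((when, package))

    pair_members = defaultdict(set)
    for member, items in by_member.items():
        items.sort()  # order each member's purchases by date
        for i, (d1, first) in enumerate(items):
            for d2, later in items[i + 1:]:
                if d2 > d1:  # same-day purchases are not treated as a sequence
                    pair_members[(first, later)].add(member)
    # Each member is counted at most once per ordered pair.
    return {pair: len(members) for pair, members in pair_members.items()}

counts = sequence_pairs(purchases)
```

Here `counts[("Salsa Beginner", "Salsa Intermediate")]` is 2, while the reversed pair does not appear at all. Plain association analysis would not make that distinction, because it ignores order.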

Calibration of ARM Analysis

The above shows the final calibration of our model. It was tuned to give us the most useful set of rules.

Comparing our association results with our sequence results, we find that focusing on the sequence results is sufficient, as the same rules are generally flagged under both analyses. The key difference is that, as mentioned, sequence discovery takes the time factor into account. As such, we proceeded to focus on our sequence analysis.


Statistical Analysis

2010-12ARMStats

2014-15ARMStats

From the two tables above (2010-2012 on top, 2014-2015 below), we see that the averages of the three measures (support, confidence and lift) are about the same for both periods. The average lift is just above 1, indicating that the rules are, on average, only slightly stronger than chance. The average support is also fairly low at 5.46%, meaning that, on average, these rules do not occur very often in our dataset. Thus, some rules are likely to be irrelevant and will be removed from our analysis if their support is too low or their lift is less than 1.

However, the maximum statistics show promising results: in 2014-2015, support reaches 22.63%, confidence 55.09% and lift 6.11. This means that there are rules that cover a large proportion of the dataset, and rules whose right-hand item is far more likely to follow the left-hand item than its overall frequency would suggest. Thus, even though the number of useful rules may be small, their quality is not necessarily low.
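For reference, the three measures can be computed from simple counts, as in the sketch below. It uses the standard textbook definitions of support, confidence and lift; the function name and the example counts are hypothetical, and SAS Enterprise Miner may count sequences slightly differently.

```python
def rule_metrics(n_total, n_left, n_right, n_both):
    """Support, confidence and lift for a rule 'left => right'.

    n_total: number of members (transactions) in the dataset
    n_left:  members who bought the left-hand item
    n_right: members who bought the right-hand item
    n_both:  members who bought both items
    """
    support = n_both / n_total               # how often the rule occurs at all
    confidence = n_both / n_left             # P(right | left)
    expected_confidence = n_right / n_total  # P(right) with no information
    lift = confidence / expected_confidence  # > 1 means better than chance
    return support, confidence, lift

# Hypothetical counts, for illustration only.
support, confidence, lift = rule_metrics(n_total=200, n_left=50, n_right=40, n_both=20)
print(f"support={support:.2%} confidence={confidence:.2%} lift={lift:.2f}")
```

With these made-up counts the rule occurs for 10% of members, holds for 40% of those who bought the left-hand item, and has a lift of 2, meaning the right-hand item is twice as likely as its baseline rate once the left-hand item has been bought.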

Rules Categorization

With the generated rules, we proceeded to split the results into two main groups (see tables below):
1) Rules with the same item on the left and right
2) Rules with different items on the left and right

Group 1 is set aside because these rules do not present any cross-selling opportunities. However, they may still be useful for identifying the purchasing behavior of customers, such as repeat purchases of the same package. Group 2 is used mainly to identify cross-selling opportunities. Natasha can then anticipate packages that are purchased one after another and possibly offer discounts to increase their popularity. Rules were also selected based on their support (S), confidence (C) and lift (L), with a lift above or close to 1 being the most important factor in accepting a rule.
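The grouping and selection step can be sketched as follows. The `Rule` structure, the example rules and the `min_lift` cut-off are assumptions made for illustration. In practice we judged "close to 1" case by case rather than with a fixed threshold.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    left: str
    right: str
    support: float     # in %
    confidence: float  # in %
    lift: float

def categorize(rules, min_lift=1.0):
    """Drop rules below min_lift, then split into Group 1 and Group 2."""
    kept = [r for r in rules if r.lift >= min_lift]
    group1 = [r for r in kept if r.left == r.right]  # same item: repeat purchase
    group2 = [r for r in kept if r.left != r.right]  # different items: cross-sell
    return group1, group2

# Hypothetical rules, not taken from our results.
rules = [
    Rule("Package A", "Package A", 12.0, 40.0, 1.8),
    Rule("Package A", "Package B", 6.5, 22.0, 1.3),
    Rule("Package B", "Package C", 3.0, 10.0, 0.7),
]

group1, group2 = categorize(rules)
```

Here the first rule lands in Group 1, the second in Group 2, and the third is dropped because its lift is below the cut-off.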

Group 1 Sequence Rules

2010-2012 Rules

Group1_2010Rules

2014-2015 Rules

Group1_2014Rules

Group 2 Sequence Rules

2010-2012 Rules

Group2_2010Rules

2014-2015 Rules

Group2_2014Rules