AY1516 T2 Team13 Natasha Studio Findings RuleMining

From Analytics Practicum
Revision as of 14:09, 17 April 2016 by Amy.tan.2012 (talk | contribs)


Process Flow

Process Flow of ARM Analysis

Using SAS® Enterprise Miner 12.1, we performed association rule mining (ARM) with the Association node, using “Member” as the ID variable and three different Target variables: 1) “Price Package”, 2) “Package (Genre)” and 3) “Course/Open & Level”. This allowed us to identify key associations between the different packages that customers purchase.

Because of the gap in our data for 2013, we first split our preliminary analysis into two periods: 2010-2012 and 2014-2015. This allowed us to see whether the associations differ between the two periods and whether our subsequent analysis needed to be split as well. Preliminary results showed that the associations discovered do differ between the two periods. Consequently, we proceeded to analyze them separately.
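As a minimal sketch of the period split described above, the purchase records can be grouped by year before any rules are mined. The record layout, member IDs and dates below are hypothetical and chosen only for illustration; the actual split was done in SAS Enterprise Miner.

```python
from datetime import date

# Hypothetical purchase records: (member_id, purchase_date, package).
purchases = [
    ("M001", date(2010, 3, 14), "Popping"),
    ("M001", date(2011, 1, 9), "Popping"),
    ("M002", date(2012, 7, 2), "06 Weeks (Full-Course)"),
    ("M003", date(2014, 5, 20), "Open Class: Beginner"),
    ("M003", date(2015, 2, 11), "Open Class: Beginner"),
]

# Period boundaries taken from the text; 2013 is the year with missing data.
PERIODS = {"2010-2012": (2010, 2012), "2014-2015": (2014, 2015)}


def split_by_period(records):
    """Group records by analysis period, dropping any outside both periods."""
    groups = {name: [] for name in PERIODS}
    for record in records:
        year = record[1].year
        for name, (first, last) in PERIODS.items():
            if first <= year <= last:
                groups[name].append(record)
    return groups


by_period = split_by_period(purchases)
```

Each list in `by_period` can then be mined on its own, which mirrors the decision to analyze the two periods separately.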

We also applied sequence discovery to enhance our ARM model. Adding “Date Purchased” as a Sequence variable takes the time of purchase into account. This enhancement is necessary for our model because customers typically do not buy more than one package at a time; instead, they buy one package, use it and then buy another. We therefore believe that the findings from sequence discovery are the more applicable ones.
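To illustrate what taking purchase order into account means, here is a minimal sketch that counts how many members bought one package and later bought another. The records and the helper names (`ordered_histories`, `contains_sequence`, `sequence_support`) are hypothetical, and the Association node's own computation may differ in detail, for example in how it treats same-day purchases.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase records: (member_id, purchase_date, package).
purchases = [
    ("M001", date(2014, 1, 5), "04 Weeks (Full-Course)"),
    ("M001", date(2014, 3, 2), "06 Weeks (Full-Course)"),
    ("M002", date(2014, 2, 1), "06 Weeks (Full-Course)"),
    ("M002", date(2014, 4, 1), "04 Weeks (Full-Course)"),
    ("M003", date(2014, 6, 1), "04 Weeks (Full-Course)"),
]


def ordered_histories(records):
    """Return each member's packages in order of purchase date."""
    histories = defaultdict(list)
    for member, _when, item in sorted(records, key=lambda r: (r[0], r[1])):
        histories[member].append(item)
    return dict(histories)


def contains_sequence(history, first, then):
    """True if `first` is bought and `then` is bought at a later position."""
    if first not in history:
        return False
    start = history.index(first)  # earliest purchase of `first`
    return then in history[start + 1:]


def sequence_support(records, first, then):
    """Share of members whose history contains `first` followed by `then`."""
    histories = ordered_histories(records)
    hits = sum(contains_sequence(h, first, then) for h in histories.values())
    return hits / len(histories)


short_then_long = sequence_support(
    purchases, "04 Weeks (Full-Course)", "06 Weeks (Full-Course)"
)
```

Unlike a plain association rule, the order matters here: member M002 bought both packages but in the reverse order, so only M001 counts towards the 04 Weeks then 06 Weeks sequence.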

Calibration of ARM Analysis

The above shows the final calibration of our model, which was designed to give us the most suitable set of rules.

Comparing our association results with our sequence results, we find that focusing on the sequence results is sufficient, as the same rules are generally flagged under both analyses. The key difference, as mentioned, is that the sequence analysis takes the time factor into account. As such, we proceeded to focus on our sequence analysis.


Statistical Analysis

2010-12ARMStats 2014-15ARMStats

From the two tables above (2010-2012 on top, 2014-2015 below), we see that the averages of the three measures (support, confidence and lift) are about the same for both periods. The average lift is just above 1, indicating that the rules are of limited strength on average. The average support is also fairly low at 5.46%, meaning that these rules do not, on average, occur very often in our dataset. It is therefore likely that some rules are irrelevant and should be taken out of our analysis if their support is too low or their lift is less than 1.

However, the maximum statistics show promising results, with support as high as 22.63%, confidence as high as 55.09% and lift as high as 6.11 in 2014-2015. This means that there are rules that cover a large proportion of the dataset and rules whose high lift indicates a strong association. Thus, even though the number of useful rules may be small, their quality is not necessarily low.
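The three measures discussed above can be made concrete with a small sketch based on the standard textbook definitions: support is the share of members whose purchases include both sides of the rule, confidence is the share of left-side buyers who also bought the right side, and lift is confidence divided by the overall share of right-side buyers. The basket data and the `rule_metrics` helper are hypothetical, and SAS Enterprise Miner's exact computation, particularly for sequence rules, may differ.

```python
# Hypothetical baskets: the set of packages each member has bought.
baskets = {
    "M001": {"Popping", "Unlimited"},
    "M002": {"Popping"},
    "M003": {"Unlimited"},
    "M004": {"Popping", "Unlimited"},
    "M005": {"Open Class: Beginner"},
}


def rule_metrics(baskets, left, right):
    """Return (support, confidence, lift) for the rule left -> right."""
    n = len(baskets)
    n_left = sum(left in b for b in baskets.values())
    n_right = sum(right in b for b in baskets.values())
    n_both = sum(left in b and right in b for b in baskets.values())
    if n_left == 0 or n_right == 0:
        raise ValueError("both sides of the rule must occur at least once")
    support = n_both / n              # how often the whole rule occurs
    confidence = n_both / n_left      # P(right | left)
    lift = confidence / (n_right / n) # confidence relative to P(right)
    return support, confidence, lift


support, confidence, lift = rule_metrics(baskets, "Popping", "Unlimited")
```

On this toy data the rule has a support of 40%, a confidence of about 66.7% and a lift of about 1.11. A lift of exactly 1 would mean that buying the left item tells us nothing about the right item, which is why rules with lift below 1 are candidates for removal.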

Rules Categorization

With the generated rules, we proceeded to split the results into two main groups (see tables below): 1) rules in which the left and right items are the same, and 2) rules in which the left and right items differ.

Group 1 is separated out because these rules do not present any cross-selling opportunities, although they may still be useful in identifying the purchasing behavior of customers. Group 2 is used mainly to identify cross-selling opportunities: Natasha can anticipate packages that are purchased one after another and possibly offer discounts to increase their popularity. Rules were also selected based on their level of support (S), confidence (C) and lift (L), with a lift above or close to 1 being the most important factor in determining whether a rule is accepted.
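The grouping and selection described above can be sketched as follows. The first two rules and their lift values are quoted from this page; the third rule, the `Rule` type and the cutoff of exactly 1.0 are assumptions made for illustration, since the actual selection also accepted rules with lift "close to" 1 and weighed support and confidence as well.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    items: tuple  # packages in purchase order
    lift: float


rules = [
    Rule(("Popping", "Popping", "Popping"), 3.96),                 # 2010-2012, from the text
    Rule(("06 Weeks (Full-Course)", "Unlimited Packages"), 1.52),  # 2010-2012, from the text
    Rule(("Hypothetical A", "Hypothetical B"), 0.80),              # invented, to show filtering
]

LIFT_CUTOFF = 1.0  # assumption: a hard cutoff standing in for "above or close to 1"


def is_group_1(rule):
    """Group 1: every item in the chain is the same package."""
    return len(set(rule.items)) == 1


def categorize(rules, cutoff=LIFT_CUTOFF):
    """Drop rules below the lift cutoff, then split the rest into two groups."""
    groups = {"group_1": [], "group_2": []}
    for rule in rules:
        if rule.lift < cutoff:
            continue
        key = "group_1" if is_group_1(rule) else "group_2"
        groups[key].append(rule)
    return groups


groups = categorize(rules)
```

Here the repeated “Popping” chain lands in Group 1, the course-to-unlimited rule lands in Group 2 as a cross-selling candidate, and the invented low-lift rule is discarded.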

Group 1 Sequence Rules

2010-2012 Rules

Group1_2010Rules

2014-2015 Rules

Group1_2014Rules

Group 2 Sequence Rules

2010-2012 Rules

Group2_2010Rules

2014-2015 Rules

Group2_2014Rules

Business Recommendations

Natasha's customers like to stick with what they know

Judging from the difference in the number of rules between Group 1 and Group 2, a larger proportion of customers tend to stick with the same package type. This highlights the routine nature of Natasha’s customers and their limited desire to venture beyond one particular package type. For example, in 2010-2012, the “best” rule is “Popping → Popping → Popping”, with the highest confidence at 54.55% and the highest lift at 3.96. This means that customers who have bought a “Popping” package are 3.96 times more likely than the average customer to go on to buy a “Popping” package two more times. This is consistent with the 2014-2015 data, where the “best” rule is “Popping → Popping”, with an astonishingly high lift of 6.11. Although the 2014-2015 rule is only a chain of 2, compared with the chain of 3 in 2010-2012, this is likely due to the lack of data in the 2014-2015 period, as the initial hardcopy data did not record the corresponding genre for open classes.

We also see that those who do not venture out (i.e. the Group 1 rules) are mainly within the unlimited-class and open-class packages. This makes sense, since such packages already offer the flexibility to choose the genre, time and day of each class, which removes the need to try another package type.

Potential to improve customer’s learning progression

Our results show a relatively high proportion of customers buying one “Open Class: Beginner” package after another, with the support for the chain of 2 at 7.83%. This means that the sequence “Open Class: Beginner → Open Class: Beginner” accounts for 7.83% of transactions. We also see a similar rule with a chain of 3 being flagged. This is a cause for concern, as it suggests that customers are stuck at the same level and are not progressing to higher-level courses such as “Intermediate” or “Advanced”. Our client acknowledges this, highlighting that Natasha caters to beginner learners and actively seeks to attract them. However, we believe that encouraging such students to advance to higher-level courses would help them better appreciate the classes, sustain their interest and ultimately improve customer retention. One way to do this could be to introduce courses with smaller increments in level, e.g. Beginner 1.5. This would smooth customers’ learning progression by breaking it into smaller steps.

Enhance revenue generation through 06 Weeks (Full-Course)

One key difference between the 2010-2012 and 2014-2015 data pertains to the rule “06 Weeks (Full-Course) → 06 Weeks (Full-Course)”. Our results show a sequence association for this rule, indicating that 06 Weeks (Full-Course) customers were more likely to repeat their purchase in 2014-2015 than in 2010-2012. This is an area worth developing further, given the higher revenue potential seen in our Exploratory Data Analysis (see Figure 12). Natasha should therefore look towards marketing more 06 Weeks courses.

Cross-selling opportunities mainly in Course type packages

Given the innate flexibility of open and unlimited packages, it seems intuitive that cross-selling opportunities would lie in course-type packages. This is in line with our Group 2 results, which show very few sequence rules, and those by and large relate to course-type packages.

In 2010-2012, customers were likely to shift from 06 Weeks (Full-Course) to Unlimited Packages, with a confidence of 44.87% and a lift of 1.52. As highlighted by our client, this is likely due to the intentional effort to promote unlimited classes.

In 2014-2015, there are even fewer rules, all limited to course packages. The rule with the highest confidence is “04 Weeks (Full-Course) → 06 Weeks (Full-Course)”. This seems intuitive, as members are more likely to try shorter courses before moving on to longer ones, and it contrasts with the limited progression seen in open classes. It is also consistent with the rule “Course: Level ABC → Course: Level I”, since ABC Level courses generally run for 4 weeks and Level I courses for 6.

By contrast, the rule seen in 2010-2012 was “Course: Level I → Course: Level II”. It therefore seems that, in addition to the majority of customers moving to open-type classes, the general level of customers has moved from a higher level, “Level I”, to “Level ABC”. As highlighted by our client, this is consistent with a change in course structure. In the early stages, Natasha offered full 08 Weeks courses but found that attendance tapered off towards the end, so the courses were split into shorter 04 Weeks and 06 Weeks courses. Given the high sequence discovery level, Natasha can consider bundling Level ABC and Level I courses together, for example by offering Course: Level I packages at a discount to those who have completed Course: Level ABC courses.