ANLY482 AY2016-17 T2 Group3: PROJECT FINDINGS Association

From Analytics Practicum
Revision as of 03:54, 22 April 2017 by Andrew.lim.2013




ANALYSIS

The last analysis that we will focus on is Association Analysis. As mentioned earlier, the main purpose of this analysis is to better understand customers' booking behaviour. Before conducting this analysis, a couple of technical terms have to be defined. Association Analysis tries to uncover potential associations between Items that share the same Transaction ID. In this case, an Item refers to the specific category of a service (e.g. nails, makeup) that the customer includes in a booking, and the Transaction ID refers to the ID of the transaction that the Item belongs to (i.e. the ID of the booking).


Figure 13. Bookings table


Currently, the Bookings table is structured so that each row is a booking record, and each record can hold multiple service IDs indicating the services that the customer has selected (as seen in Figure 13). However, this structure does not suit Association Analysis, which needs each Item on its own row, linked to its Transaction ID; no association rules can be derived while a booking's services are stored together in one record. Hence, the data in the Bookings table had to be transposed so that each service ID forms a row of its own. In Figure 14, for example, the 2nd and 3rd rows are services that belong to the same booking. After transposing the data, the level 1 category of the service (category_1) is generated for each row. The level 1 categories are the master categories that Vanitee has defined, so they are more generic in nature. Later in this section, we will examine whether this category_1 variable is suitable as the Item in Association Analysis.
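The transposition can be sketched in Python as a minimal illustration, not the workflow actually used (the analysis itself was done in JMP Pro). The booking IDs, service IDs, the `category_1` lookup and the helper names `transpose` and `to_transactions` are all hypothetical and invented for this sketch. It turns one row per booking into one row per service, then groups the rows back into one item set per Transaction ID, which is the shape Association Analysis works on.

```python
from collections import defaultdict

# Hypothetical booking records: (booking_id, [service_ids]).
# The IDs are invented for illustration and are not Vanitee data.
bookings = [
    ("B1", ["S10"]),
    ("B2", ["S11", "S12"]),
    ("B3", ["S10", "S13", "S14"]),
]

# Hypothetical lookup from service ID to its category_1 value.
category_1 = {"S10": "Nails", "S11": "Hair Removal", "S12": "Nails",
              "S13": "Nails", "S14": "Makeup"}


def transpose(bookings):
    """Return one (booking_id, service_id) row per service in each booking."""
    return [(booking_id, service_id)
            for booking_id, service_ids in bookings
            for service_id in service_ids]


def to_transactions(rows, category_of):
    """Group long-format rows into one item set per booking (Transaction ID).

    `category_of` maps a service ID to the value used as the Item.
    Using a set means repeated categories within a booking count once.
    """
    transactions = defaultdict(set)
    for booking_id, service_id in rows:
        transactions[booking_id].add(category_of[service_id])
    return dict(transactions)


rows = transpose(bookings)
print(rows)
print(to_transactions(rows, category_1))
```

Note that booking `B3` has three services but only two distinct category_1 Items, because two of its services map to the same category.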


Figure 14. Bookings table (after transposing)


Figure 15. Frequency distribution of category_1 (including bookings with only 1 service)


Figure 16. Frequency distribution of category_1 (excluding bookings with only 1 service)


The next issue was whether bookings with only 1 service should be included in the analysis. Understandably, these bookings weaken the results, as no association can be drawn from a booking that contains just 1 service. However, we wanted to see whether including or excluding these bookings would make a large difference to the analysis. Figures 15 and 16 above show the frequency distributions of the category_1 variable including and excluding bookings with only 1 service respectively. Interestingly, both show that far more bookings include nail services than any other service type. This is primarily because many more nail services are offered on the platform, as seen in the EDA section earlier.
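The include/exclude comparison behind Figures 15 and 16 can be sketched as follows. This is a minimal illustration on hypothetical rows, and it assumes the distributions count one entry per service row of the transposed table; the function name `category_frequency` is invented for this sketch.

```python
from collections import Counter

# Hypothetical transposed rows: (booking_id, category_1), one row per service.
rows = [
    ("B1", "Nails"),
    ("B2", "Nails"), ("B2", "Hair Removal"),
    ("B3", "Nails"), ("B3", "Nails"), ("B3", "Makeup"),
    ("B4", "Brow & Lash"),
]


def category_frequency(rows, exclude_single_service=False):
    """Count service rows per category_1, optionally dropping 1-service bookings."""
    # Number of services (rows) in each booking.
    services_per_booking = Counter(booking_id for booking_id, _ in rows)
    return Counter(
        category
        for booking_id, category in rows
        if not (exclude_single_service and services_per_booking[booking_id] == 1)
    )


print(category_frequency(rows))
print(category_frequency(rows, exclude_single_service=True))
```

Dropping the single-service bookings `B1` and `B4` removes one ‘Nails’ row and the only ‘Brow & Lash’ row, but ‘Nails’ still dominates, mirroring the pattern described for the two figures.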


Figure 17. Results of Association Analysis using category_1 (including bookings with only 1 service)


Next, we used the Association Analysis platform under “Screening” in JMP Pro and customized a few parameters (as seen at the top of Figure 17). Support refers to the proportion of transactions in which a specific item set appears. Confidence refers to the proportion of transactions containing the condition item set that also contain the consequent item set. Lift refers to the ratio of an association rule's confidence to its expected confidence, i.e. the support of the consequent. In general, the higher these 3 measures are, the more insightful the association rule is. However, in this analysis, we set the minimum thresholds for these measures to almost zero in an attempt to discover all association rules.
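The three measures can be written as supp(X) = share of transactions containing X, conf(X ⇒ Y) = supp(X ∪ Y) / supp(X), and lift(X ⇒ Y) = conf(X ⇒ Y) / supp(Y). The sketch below implements them in Python and enumerates every rule with a single-item consequent by brute force, with thresholds defaulting to 0 to mirror setting the minimums to almost zero. It is an illustration only, not JMP Pro's implementation; the transactions and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical transactions: the set of Items (categories) in each booking.
transactions = [
    {"Nails"},
    {"Nails", "Hair Removal"},
    {"Nails", "Makeup"},
    {"Nails", "Makeup", "Brow & Lash"},
    {"Brow & Lash", "Makeup"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(condition, consequent, transactions):
    """supp(condition ∪ consequent) / supp(condition)."""
    return (support(condition | consequent, transactions)
            / support(condition, transactions))


def lift(condition, consequent, transactions):
    """Confidence divided by the expected confidence, supp(consequent)."""
    return (confidence(condition, consequent, transactions)
            / support(consequent, transactions))


def all_rules(transactions, min_support=0.0, min_confidence=0.0):
    """Enumerate every rule X -> {y} whose item set clears both thresholds.

    Brute force over all item combinations, so only suitable for a
    handful of items. Item sets with zero support are skipped, which
    also avoids dividing by zero in confidence and lift.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in map(frozenset, combinations(items, size)):
            rule_support = support(itemset, transactions)
            if rule_support == 0 or rule_support < min_support:
                continue
            for y in sorted(itemset):
                x, consequent = itemset - {y}, frozenset({y})
                conf = confidence(x, consequent, transactions)
                if conf >= min_confidence:
                    rules.append((set(x), y, rule_support, conf,
                                  lift(x, consequent, transactions)))
    return rules


for x, y, s, c, l in all_rules(transactions):
    print(f"{sorted(x)} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

With these toy transactions, {Hair Removal} ⇒ {Nails} has confidence 1.0 and lift 1.25, because ‘Nails’ appears in 80% of all transactions but in every transaction that contains ‘Hair Removal’.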

The bottom 2 tables in Figure 17 show the results of the analysis when bookings with only 1 service are included. The left table lists the frequent item sets, while the right table lists the association rules that were found. For example, the {Nails} item set has a support of 75%, meaning that 75% of the bookings include a nail service. However, the support for the rest of the item sets is significantly lower. For the association rules, the first rule, with ‘Hair Removal’ as the condition and ‘Nails’ as the consequent, has a confidence of only 23% and a lift of 0.303. For a rule to be insightful, its confidence should be at least 50% and its lift should be above or close to 1. Hence, the results of this analysis are relatively insignificant.
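As a back-of-the-envelope check on the rounded percentages reported above (not output from JMP Pro), the lift of the first rule can be recovered from its confidence and the support of {Nails}:

```latex
\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)},
\qquad
\mathrm{lift}(\{\text{Hair Removal}\} \Rightarrow \{\text{Nails}\})
  \approx \frac{0.23}{0.75} \approx 0.31
```

The small gap to the reported 0.303 is most likely due to rounding 23% and 75%. A lift below 1 means that ‘Nails’ appears less often in bookings with ‘Hair Removal’ than in bookings overall.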


Figure 18. Results of Association Analysis using category_1 (excluding bookings with only 1 service)


On the other hand, the analysis was rerun with bookings with only 1 service excluded, to see if better results could be obtained. A quick glance at the rules generated in Figure 18 shows an improvement in the confidence and lift of the rules. Specifically, the first rule, with ‘Brow & Lash’ as the condition and ‘Nails’ as the consequent, has a confidence of 49% and a lift of 0.533. However, the frequent item sets table shows that the condition, ‘Brow & Lash’, has a low support of 7%. This means that the rule applies to only about 7% of bookings.
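Two quantities implied by the reported figures make the weakness of this rule concrete. This is arithmetic on the rounded values quoted above, with X = {Brow & Lash} and Y = {Nails}, not additional JMP output:

```latex
\mathrm{supp}(X \cup Y) = \mathrm{supp}(X)\cdot\mathrm{conf}(X \Rightarrow Y)
  \approx 0.07 \times 0.49 \approx 0.034,
\qquad
\mathrm{supp}(Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{lift}(X \Rightarrow Y)}
  \approx \frac{0.49}{0.533} \approx 0.92
```

So only around 3% of the multi-service bookings contain both ‘Brow & Lash’ and ‘Nails’. The implied support of roughly 92% for {Nails} also explains why the lift stays well below 1 despite a 49% confidence: almost every multi-service booking already contains a nail service.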


Figure 19. Bookings table with category_1


Moving on, we tried to understand the potential reasons for the unsatisfactory results. As mentioned earlier, the results may have been skewed by the high frequency of nail services, whether bookings with 1 service were included or excluded. Upon closer inspection, we realized that this high frequency resulted from the way the level 1 categories were classified. As shown in Figure 19, the services in the first 3 rows belong to the same booking, yet all of them are classified under Nails, which means there is no difference between the categories of services in this booking. Hence, we deduced that the category_1 values are too generic to be used in Association Analysis.
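One way to quantify this problem is to measure how many multi-service bookings collapse into a single Item once their services are mapped to category_1. The sketch below is a hypothetical illustration: the bookings are invented and `collapsed_share` is a name made up for this example.

```python
# Hypothetical bookings, each listed as its services' category_1 values.
bookings_cat1 = {
    "B1": ["Nails", "Nails", "Nails"],
    "B2": ["Nails", "Hair Removal"],
    "B3": ["Nails", "Nails"],
}


def collapsed_share(bookings):
    """Share of multi-service bookings whose services all share one category.

    Such bookings become single-item transactions, so they cannot
    contribute to any association rule.
    """
    multi = [cats for cats in bookings.values() if len(cats) > 1]
    if not multi:
        return 0.0
    collapsed = [cats for cats in multi if len(set(cats)) == 1]
    return len(collapsed) / len(multi)


print(collapsed_share(bookings_cat1))
```

Here two of the three multi-service bookings (`B1` and `B3`) reduce to a lone ‘Nails’ Item, just like the booking in Figure 19.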


Figure 20. Bookings table with category_2


Eventually, we decided to break down each level 1 category into its level 2 categories (category_2), which are still defined by Vanitee but are much more specific in nature. For example, in Figure 20, the ‘Nails’ category can be split into ‘Gel’ and ‘Nail Art’, while the ‘Hair Styling’ category can be further split into ‘Colour’ and ‘Cut’. As a result, at least some bookings that previously collapsed into a single category now contain more than one distinct Item.
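The effect of switching from category_1 to category_2 on a single booking can be sketched as follows. The category pairs follow the examples in the text, but the service IDs, the `categories` lookup and the `item_set` helper are hypothetical.

```python
# Hypothetical lookup: service_id -> (category_1, category_2).
# Category names follow the examples in the text; the IDs are invented.
categories = {
    "S1": ("Nails", "Gel"),
    "S2": ("Nails", "Nail Art"),
    "S3": ("Nails", "Gel"),
    "S4": ("Hair Styling", "Colour"),
    "S5": ("Hair Styling", "Cut"),
}


def item_set(service_ids, level):
    """Distinct Items of a booking at category level 1 or 2."""
    return {categories[s][level - 1] for s in service_ids}


booking = ["S1", "S2", "S3"]
print(item_set(booking, 1))  # a single Item, 'Nails': nothing to associate
print(item_set(booking, 2))  # two Items, 'Gel' and 'Nail Art'
```

The same three-service booking yields one Item under category_1 but two under category_2, giving Association Analysis something to work with.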


Figure 21. Results of Association Analysis using category_2 (including bookings with only 1 service)


Next, we applied the same methodology and ran the same analysis, starting with bookings with only 1 service included. As expected, the results improved considerably, especially the association rules generated. As seen in Figure 21, most of the association rules have a lift close to 1, and some have a confidence close to or above 50% (as highlighted by the red box). However, the same issue of low support for the condition remained. For example, the rule with ‘Nail Art, Removal’ as the condition and ‘Classic’ as the consequent has a healthy confidence of 61% and lift of 1.524. However, the frequent item sets table shows that the condition, ‘Nail Art, Removal’, has an extremely low support of 2%. This means that the rule applies to only about 2% of bookings, which makes it of little practical value.
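The criteria used informally in this discussion (confidence of at least 50%, lift above or close to 1, and a condition that is not too rare) can be written down as a small rule filter. This is a sketch under stated assumptions: the `Rule` type and `is_actionable` function are hypothetical, and the thresholds of 0.05 for condition support and 0.95 for "close to 1" are illustrative choices rather than values from the analysis. The two example rules use the figures quoted in the text for Figures 21 and 22.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    condition: frozenset
    consequent: str
    condition_support: float
    confidence: float
    lift: float


def is_actionable(rule, min_condition_support=0.05,
                  min_confidence=0.5, min_lift=0.95):
    """Keep rules that are confident, not below-independence, and not rare.

    min_confidence follows the 50% criterion in the text; the other two
    thresholds are assumptions made for this sketch.
    """
    return (rule.condition_support >= min_condition_support
            and rule.confidence >= min_confidence
            and rule.lift >= min_lift)


# Values as quoted in the text (Figure 21: 1-service bookings included;
# Figure 22: 1-service bookings excluded).
figure_21_rule = Rule(frozenset({"Nail Art", "Removal"}), "Classic", 0.02, 0.61, 1.524)
figure_22_rule = Rule(frozenset({"Nail Art"}), "Classic", 0.42, 0.60, 0.998)

print(is_actionable(figure_21_rule))  # False: condition support is below 0.05
print(is_actionable(figure_22_rule))  # True
```

Making the condition-support check explicit is what separates these two rules; on confidence and lift alone, the Figure 21 rule would look stronger.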


Figure 22. Results of Association Analysis using category_2 (excluding bookings with only 1 service)


Lastly, we suspected that the previous issue was brought about by including bookings with only 1 service. Hence, the analysis was run again, this time excluding bookings with only 1 service. Looking at Figure 22, the results are largely encouraging, as the support of the condition in the association rules is largely adequate. For example, the rule with ‘Nail Art’ as the condition and ‘Classic’ as the consequent has a healthy confidence of 60% and a lift of 0.998. Furthermore, ‘Nail Art’ has a decent support of 42%. In words, 60% of bookings that include a ‘Nail Art’ service also include a ‘Classic’ service. This rule is also relevant to a large share of bookings, since 42% of the multi-service bookings analysed already include a ‘Nail Art’ service.
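The reported figures can also be turned into absolute terms. This is arithmetic on the rounded values quoted above, with X = {Nail Art} and Y = {Classic}, not additional JMP output:

```latex
\mathrm{supp}(X \cup Y) = \mathrm{supp}(X)\cdot\mathrm{conf}(X \Rightarrow Y)
  \approx 0.42 \times 0.60 \approx 0.25,
\qquad
\mathrm{supp}(Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{lift}(X \Rightarrow Y)}
  \approx \frac{0.60}{0.998} \approx 0.60
```

In other words, roughly a quarter of the multi-service bookings contain both services. Because the lift is almost exactly 1, ‘Classic’ appears in ‘Nail Art’ bookings at about the same rate (around 60%) as in multi-service bookings overall.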

However, we do acknowledge some limitations posed by the behaviour of actual customers and professionals. For example, customers tend to book services that are related to each other (e.g. within the nails category) rather than two distinctly different services (e.g. nails and facial). Furthermore, professionals usually offer only the services they specialize in (e.g. a nail artist does not do hair styling).

Through this analysis, we gained a deeper understanding of the way customers make their bookings. Even though the rules generated may seem obvious, they still provide insight into exactly how often a specific rule holds, as measured by its confidence.