ANLY482 AY2017-18T2 Group10 Analysis & Findings

From Analytics Practicum
Revision as of 23:52, 25 February 2018 by Bryan.wee.2014 (talk | contribs) (→‎EDA For PLU Data)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Tennet logo.png


HOME

ABOUT US

PROJECT OVERVIEW

ANALYSIS & FINDINGS

PROJECT MANAGEMENT

BACK TO MAIN ANLY482

EDA

Recommendations

Model

Due to confidentiality, we will not be able to upload any charts onto the wiki. The fully disclosed analysis report is available in our Interim Report submission.

EDA For Inventory Data

80/20 Analysis

Given that the sponsor has a few hundred items in their inventory listing, we would like to identify which items make up a larger percentage of the total usage cost. We could then choose to focus on items which make up the bulk of the inventory costs, as the impact of any inventory forecasting model would be greater for those items.

From our analysis, we have found out that the top 70 items make up 60% of the total usage cost for the past two years, while the top 140 items make up almost 80% of the total usage cost for the same duration. The detailed From this we can tell that a substantial proportion of items make up 80% of the total usage value and thus not quite ‘80-20’.

The main cost drivers are largely the meat items, such as different cuts of lamb and beef, as opposed to the vegetables. Seafood items such as crayfish and prawns also take up a high percentage of total usage cost. This is not surprising since consumers want to maximize their “return” i.e. getting the most value for what they pay for. This leads to higher meat consumption since meat is often seen as “higher value” due to its higher cost. Models to be developed will take this into account and cater more accuracy towards predicting these items.


Total Inventory Usage Across Months Breakdown by Outlet

Next, we want to observe the general trend of inventory usage cost over the months to see if there is any seasonal trend in usage cost. Given that there are only two years worth of data, any trends observed might have other confounding factors and may not be completely attributed to the time of year.

Next, we can observe that the December months incurred the larger amount of inventory costs. On the other hand, the month of March for both years incurred one of the lowest inventory costs. Generally, inventory costs are increasing from the month of April up till December. Next, we also want to investigate the proportion of each outlet’s contribution to the total inventory cost by looking at the 100% stacked chart. We can see that the percentage of each outlet’s contribution to the total inventory cost does not vary substantially even though the actual usage values do fluctuate across the months. This shows that each outlet appears to be affected by the seasonality factors by roughly the same amount.

Finally, we also want to check if the inventory usage for the different outlets are statistically different. To do this we generated the quantiles, connecting letters report and the ordered differences report in JMP Pro 13.

From the connecting letters report we can see that there are two pairs of outlets which are not significantly different from each other - Outlet A - Outlet B and Outlet C- Outlet D. This shows that the outlets located at major heartland areas (A & B) - are not significantly different. While C and D are also both similar and are both located in town. Apart from these two pairs, all other pairs of outlets are significantly different from each other.


Inventory Usage Across Months Breakdown By Product

Subsequently, we wish to check if there are any trends present in the inventory usage for individual products across the months. In general, over the entire year, we observe that there is little trend for all the ingredients on the whole and that some ingredients have a more noticeable trend pattern than others. There are, however, some slight spikes in usage for the months of December and August for a majority of the items.

The high usage items (Usage Value > $10,000) appear to have a more noticeable trend than the others. These items include the “XXX” , “YYY” and the “ZZZ” (Yellow). However, we also observe that these items have been discontinued with effect from May 2017. The low usage items (Usage Value < $10,000) on the other hand, show minimal changes across the months and are fairly constant throughout the year. Another observation is on the introduction of new items such as the “YYY- XXX” which was added to the menu only in April 2017 but has seen steady growth.

EDA For Sales Data

Analysis of Revenue per Outlet per Day

Given that management would like to know if there are discrepancies in the ordering across the various outlets, a revenue analysis was performed to determine each outlet's respective contribution to the total revenue. Interestingly, we note that the top revenue generating outlet overall, was not always the top performing outlet in each individual month. This means that month-on-month performance does tend to fluctuate across outlets.

Next, to develop a more appropriate inventory model, distribution of sales was analysed on a per day basis. From the charts, it can be seen that sales tends to peak during the Friday - Sunday period. Such an observation is not surprising given that this period is considered to be the “weekend” period where customers have more time to spend dining. On closer observation, the distribution of sales across each outlet on a daily basis is relatively consistent, indicating that consumption pattern may not be random. Furthermore, the proportion of customers contributed by each outlet to the total is also fairly consistent throughout the week.

Applying the Tukey-Kramer’s test to generate the Connecting Letters Report and Ordered Differences Report for revenue against day of the week, we note that differences in revenue between days of the week are largely statistically significant (p-value < 0.0001). Thus, it may be said that the day of the week would play a part in predicting inventory usage, since it reflects the consumer demand.


Analysis of No. of Pax per Outlet per Day

From the quantiles output, it can be seen that average number of customers per day does vary quite greatly from outlet to outlet. This is further supported by the Connecting Letters Report and Ordered Differences Report generated in JMP Pro 13. From the Connecting Letters Report, it can be seen that three of the outlets are fairly similar, while the remaining three outlets are dissimilar from the first three outlets as well as each of the other outlets. This is further illustrated by the Ordered Differences Report indicating a p-value of < 0.0001. This means that if outlet name is used as a variable for modelling, these differences must be taken into consideration to ensure a more appropriate model.

EDA For PLU Data

Analysis of Breakdown of Patrons and Revenue over time

When analysing the breakdown of the number of patrons and revenue over the months, we could see a trend where there are more patrons/revenue are in the month of January and December. Such an observation may be because of the December Holidays or Public Holidays such as Chinese New Year and the New Year. In the other months, the amount of patrons/revenue that was obtained by adults were also fairly consistent.

Another observation for the student graph is that we identified a huge increase between August 2017 to December 2017 as compared to the previous year. This has resulted in 2 times more patrons and 3 times more revenue obtained from students. After consulting with the our sponsor, they have identified that they have changed the pricing model for students during August 2017 and it could be the reason for the spike. Thus, we can conclude that the new pricing strategy adopted by XYZ Company seems to be effective.


Analysis of proportion of patron categories for each outlet

The proportion of patrons for each outlet are shown based on their category (Adult, Child, Senior, Student, Tourist and FOC PAX). There are 2 highly distinct clusters that can be observed, the cluster which is heavily dependent on tourists and the cluster less dependent on tourists, we refer to them as the Heartland cluster. Using this information, we can help XYZ formulate various strategies, such as partnering with Travel Agencies or popular tourist areas to offer promotions for tourists, or setting up more “Heartland” outlets such as in places with many people, such as Jurong East or Bedok. We have also identified one outlet that is could possibly be under performing. Between 2016 and 2017, they only had around 33,000 adults and 23,000 tourists who patronised the outlet. On average, there are 80,000 adults and 37,000 tourists patronising the 6 outlets over the same period of time.


Statistical Significance of Daily PLU Data

Applying the Tukey-Kramer’s test to generate the Connecting Letters Report and Ordered Differences Report for number of patrons to the different outlets, we note that differences in no. of patrons between different outlets are largely statistically significant (p-value < 0.0001), less a few outlets. Thus, it may be said that those outlets that have a p-value < 0.0001 have a part to play for no. of patrons in predicting inventory usage.

We applied the Tukey-Kramer’s test to generate the Connecting Letters Report and Ordered Differences Report for number of patrons to the various patron categories, we note that differences in no. of patrons between various patron categories are largely statistically significant (p-value < 0.0001). Thus, it may be said that the no of patrons would play a part in predicting inventory usage.