ANLY482 AY2016-17 T2 Group10 Analysis & Findings
Initial Data Exploration and Analysis
We conducted initial data analysis using Exploratory Data Analysis (EDA) to gain general insights into the key determinants that govern the relationship between interaction counts and sales revenue. We hypothesized that two such factors could be “channel” and “therapy area” (sales team).
“Channel” is the classification for different types of clinics, such as General Practitioners, Restructured & Private Hospitals, and Specialists. From our basic understanding, each channel has its own protocols and practices that are likely to affect how receptive its doctors are to interactions. For instance, interactions with hospital doctors may be less impactful than those with GP doctors, because hospital doctors get their drugs from a centralized system, while GP doctors have the power to make purchasing decisions for their own clinics.
“Therapy area” is the name of a sales team, such as Uro CNS (Urology), Respi (Respiratory), Paed Vx (Pediatrics Vaccines), Allergy, Al Derm (Dermatology), and Ad Vx (Adult Vaccines), and it determines the corresponding product brands to promote. We postulate that different drugs have different levels of demand: established drugs may need only a small number of interactions to achieve good sales results, whereas new types of drugs may need more interactions to achieve the same level of sales results.
To give us a better understanding of the nature of the different channels and therapy areas, in this initial data analysis we explore how sales revenue differs by channel across quarters, and how each therapy area performs in terms of sales revenue within each channel.
Exploring sales revenue by channel and quarter allows us to understand significant demand patterns that arise from channel practices or from secondary consumers (patients).
For instance, we plot a line graph of total sales amount (response) against quarter (explanatory), with one line per channel.
A first look at the visualization shows that there are indeed intrinsic differences across channels and quarters.
One trend across quarters is that the highest sales for most channels were made in the first quarter. This is especially prominent for pharmacy, which made more than half of the second quarter sales. To explain this trend, we propose two reasons: 1) higher demand from secondary consumers and 2) the practice of stockpiling at the start of the year.
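The aggregation behind the line graph described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the record layout, the function name `total_sales_by_channel_and_quarter`, and all sales figures are hypothetical and chosen only to show how total sales per channel and quarter could be computed before plotting.

```python
from collections import defaultdict

# Hypothetical records: (channel, quarter, sales_amount).
# The values are made up for illustration and are not from the project data.
records = [
    ("General Practitioner", "Q1", 120.0),
    ("General Practitioner", "Q2", 90.0),
    ("Pharmacy", "Q1", 200.0),
    ("Pharmacy", "Q2", 80.0),
    ("Pharmacy", "Q1", 50.0),
]


def total_sales_by_channel_and_quarter(rows):
    """Sum sales per (channel, quarter); each channel becomes one line on the graph."""
    totals = defaultdict(lambda: defaultdict(float))
    for channel, quarter, amount in rows:
        totals[channel][quarter] += amount
    # Sort quarters so each channel's points are in x-axis order.
    return {
        channel: dict(sorted(by_quarter.items()))
        for channel, by_quarter in totals.items()
    }


series = total_sales_by_channel_and_quarter(records)
for channel, by_quarter in series.items():
    print(channel, by_quarter)
```

Each entry of `series` holds the points for one channel's line, with quarters on the x-axis and total sales on the y-axis.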
Exploring how each therapy area performs in terms of sales revenue for different channels can help to identify which channel each therapy area should focus on.
To visualize this, we plot a mosaic plot of sales revenue by therapy area and channel.
A mosaic plot visualizes data from two or more categorical variables in terms of a numerical variable (weight). Referencing Figure 12, the x-axis shows the different channels, the y-axis shows the different therapy areas, the area of each box depicts the proportion of sales revenue, and the labels show the percentage of each therapy area within a channel.
Firstly, we can see which therapy area generates the highest sales revenue within a channel: for Restructured Hospital, the Urology team generates the highest sales revenue; for Polyclinics, the Paediatrics Vaccine team generates the highest sales revenue.
Secondly, we can see how the channels fare against each other in terms of total sales revenue from their relative widths: Restructured Hospital has the highest sales revenue, followed by General Practitioner.
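The two quantities read off the mosaic plot (the relative width of each channel and the share of each therapy area within a channel) can be computed as in the sketch below. It assumes a simple list of (channel, therapy area, revenue) rows; the function name `mosaic_proportions` and the revenue figures are hypothetical and do not reproduce Figure 12.

```python
from collections import defaultdict

# Hypothetical rows: (channel, therapy_area, sales_revenue).
# The figures are invented for illustration only.
rows = [
    ("Restructured Hospital", "Uro CNS", 300.0),
    ("Restructured Hospital", "Respi", 100.0),
    ("Polyclinics", "Paed Vx", 150.0),
    ("Polyclinics", "Respi", 50.0),
]


def mosaic_proportions(rows):
    """Return (widths, shares) for a channel-by-therapy-area mosaic plot.

    widths[channel]        = channel revenue / total revenue (column width)
    shares[(channel, area)] = area revenue / channel revenue (box height, i.e. the label)
    Assumes every channel has a positive revenue total.
    """
    cell = defaultdict(float)
    channel_total = defaultdict(float)
    for channel, area, revenue in rows:
        cell[(channel, area)] += revenue
        channel_total[channel] += revenue

    grand_total = sum(channel_total.values())
    widths = {c: t / grand_total for c, t in channel_total.items()}
    shares = {(c, a): v / channel_total[c] for (c, a), v in cell.items()}
    return widths, shares


widths, shares = mosaic_proportions(rows)
print(widths)
print(shares)
```

The area of each box in the plot is then the product of the channel's width and the therapy area's share within that channel, which equals that cell's proportion of total sales revenue.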
This information shows not only the inherent variation in sales revenue across therapy areas, but also the variation across combinations of therapy area and channel, as it illustrates secondary consumers’ (patients’) preference for certain channels when it comes to treating different illnesses. This could be important in future work that uses two-way ANOVA to understand how these two factors together affect the relationship between sales interactions and sales revenue.