Difference between revisions of "ANLY482 AY2016-17 T1 Group6/Midterm Progress"

From Analytics Practicum
Jump to navigation Jump to search
Line 132: Line 132:
 
Each of these metrics will be calculated on for each month from January 1st to August 31st to establish time series trends.
 
Each of these metrics will be calculated on for each month from January 1st to August 31st to establish time series trends.
  
'''A. Relationship Score'''
+
'''A) Relationship Score'''
 +
 
 
The relationship score is given to each record in the Relationship Report and is to determine the strength of the relationship between a salesperson and an External/Internal party. The score focuses on two factors are Frequency of communication and Recency of communication.  
 
The relationship score is given to each record in the Relationship Report and is to determine the strength of the relationship between a salesperson and an External/Internal party. The score focuses on two factors are Frequency of communication and Recency of communication.  
  
Purpose:  
+
 
 +
''Purpose: ''
 +
 
 
1. To evaluate the relationship performance in terms of both building and maintaining them with internal and external parties.
 
1. To evaluate the relationship performance in terms of both building and maintaining them with internal and external parties.
 +
 
2. To evaluate the balance between internal and external relationships maintained by the salesperson
 
2. To evaluate the balance between internal and external relationships maintained by the salesperson
Formulation:  
+
 
Overall Relationship Score (ORS): Calculated for each internal and external relationship of the salesperson. Both X & Y variables will be standardized to give each equal weightage.
+
 
X: Total frequency of emails exchanged since Jan 1 2016 between the salesperson and the other party  
+
''Formulation:''
Y: The number of days since the last communication with the other party  
+
*Overall Relationship Score (ORS): Calculated for each internal and external relationship of the salesperson. Both X & Y variables will be standardized to give each equal weightage.
 +
*X: Total frequency of emails exchanged since Jan 1 2016 between the salesperson and the other party  
 +
*Y: The number of days since the last communication with the other party  
 +
 
 +
 
 
Formula = X+1/Y
 
Formula = X+1/Y
Explanation: 1/Y in this case is the recency factor as the number of days since the last contact will inversely affect the relationship score. (A higher number of days since last spoken will mean a lower relationship score)  
+
 
Total External to Internal Network Ratio of each salesperson
+
 
ΣORSE /ΣORSI
+
''Explanation:''
(ORSE: External Relationship Score, ORSI: Internal Relationship Score)
+
 
 +
1/Y in this case is the recency factor as the number of days since the last contact will inversely affect the relationship score. (A higher number of days since last spoken will mean a lower relationship score)  
 +
Total External to Internal Network Ratio of each salesperson:
 +
 
 +
ΣORSE /ΣORSI
 +
 
 +
ORSE: External Relationship Score, ORSI: Internal Relationship Score)
 +
 
 
Therefore, Sum of All External Relationship Scores divided by the Sum of All Internal Relationship Scores.
 
Therefore, Sum of All External Relationship Scores divided by the Sum of All Internal Relationship Scores.
  
B. Hot & Cold Relationships
 
  
The client advised us that a relationship can be considered considered Cold if no contact was made within 30 days and Hot if contact was made within the past 30 days. Therefore hot and cold relationships are a binary classification in this case. There can be cases in which people have not communicated within 30 days for a legitimate reason such as the contact being out of office etc. But we felt that 30 days is a lenient assumption for it is unusual that for an active conversations to not be for 30 days.  
+
'''B) Hot & Cold Relationships'''
 +
 
 +
The client advised us that a relationship can be considered considered Cold if no contact was made within 30 days and Hot if contact was made within the past 30 days. Therefore hot and cold relationships are a binary classification in this case. There can be cases in which people have not communicated within 30 days for a legitimate reason such as the contact being out of office etc. But we felt that 30 days is a lenient assumption for it is unusual that for an active conversations to not be for 30 days.  
 +
 
 +
'''C) Strong & Weak Relationships'''
 +
 
 +
Strong and Weak Relationships will be determined by comparing monthly relationship scores to the average baseline relationship score for January for all salespeople. Strong relationships in this case would be ones with above average relationship score and weak would be ones with below average relationship score.
 +
 
 +
'''D) Sales Stages'''
  
C. Strong & Weak Relationships
+
The absolute number of email threads belonging to each Sales Stage in the current month. (Determined by Subject Header Data)
• Strong and Weak Relationships will be determined by comparing monthly relationship scores to the average baseline relationship score for January for all salespeople. Strong relationships in this case would be ones with above average relationship score and weak would be ones with below average relationship score.
 
  
D. Sales Stages
+
'''E) Sales Conversion Rate'''
• The absolute number of email threads belonging to each Sales Stage in the current month. (Determined by Subject Header Data)
 
  
E. Sales Conversion Rate
+
The number of threads in a particular stage in current month divided by number of threads in that stage in the previous month.
The number of threads in a particular stage in current month divided by number of threads in that stage in the previous month.
 
  
 
</font></div>
 
</font></div>
Line 166: Line 185:
  
 
<div style="border-style: solid; border-width:0; background: #3CB371; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Century Gothic;border-bottom:5px solid white; border-top:5px solid black"><font color= #ffffff>Data Cleaning & Transforming</font></div>
 
<div style="border-style: solid; border-width:0; background: #3CB371; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Century Gothic;border-bottom:5px solid white; border-top:5px solid black"><font color= #ffffff>Data Cleaning & Transforming</font></div>
 +
'''1) Removing Junk & Irrelevant Emails'''
 +
 +
Figure 3 - Junk Cleared Dataset
 +
 +
 +
'''2)Creating Dataset 1 -  For Identifying Stages and Conversion Rates'''
 +
 +
(To be referred to in the rest of the report as The Sales Stage Dataset)
 +
 +
 +
A. Following The Data Cleaning in Step 1. We wished to reduce the emails to unique threads, i.e. 1 record for each thread, instead of each email.
 +
 +
 +
Out of these 20090 unique salesperson – contact – subject thread email communications, there were some with and without replies. Further steps needed to be taken to remove irrelevant emails.
 +
 +
 +
B. Two key aspects were used to determine if each unique salesperson–contact communication is relevant:
 +
*Is there at least 1 reply within each unique salesperson – contact relationship?
 +
*Do the subject threads between each unique salesperson – contact relationship contains the following keywords (“Relationship analytics”, “POC”)?
 +
 +
 +
''Rationale:''
 +
The Sales cycle consists of 5 stages. Email subject headers may change during these stages. There are no specific templates or subject headers used except for the following 2 keywords (“Relationship analytics”, “POC”). There would also be some subject headers that required a reply and those that don’t. Our team has concluded that within each sales process, there should be a minimum of 1 reply between a salesperson and his contact. We have categorized such unique salesperson – contact relationship with at least 1 reply for any thread to be relevant.
 +
 +
 +
Figure 4: Email exchange between unique salesperson – contact with at least 1 reply
  
 +
 +
C. Although a unique salesperson – contact relationship may have at least 1 reply throughout their email exchange, their communication exchange may not be relevant to sales or the process of acquiring a potential client. TrustSphere has a number of partners whom are important to them. As such, this communication may be email exchanges between TrustSphere and their partners, which are irrelevant. For each unique salesperson – contact – thread record, our team has identified if the thread has one of the following keywords (“Relationship analytics”, “POC”). A unique salesperson – contact communication is only relevant if there is at least 1 reply and at least 1 thread with stated keywords throughout their communication exchange.
 +
 +
 +
Figure 5 - Email exchange between unique salesperson – contact with (i) at least 1 reply and (ii) at least 1 thread with the keywords (“relationship analytics”, “POC”)
 +
 +
 +
D. At the end of this steps, a total of 1814 / 20090 unique salesperson – contact – thread records are identified as relevant. These unique records contain a total of 5864 emails sent and received.
 +
Our next step is to look at each unique salesperson – contact communication and determine their sales stages. TrustSphere has provided the team with a list of their partners. For each of these 1814 unique salesperson – contact – thread records, out team has categorized them into the following groups:
 +
 +
#Prospecting
 +
#Meeting
 +
#POC
 +
#Technical visibility call/configuration call
 +
#Analysis session
 +
#Closing
 +
#Renewal
 +
 +
 +
The team will consult with TrustSphere to ensure these categorizations are accurate. The team will also delve into the use of text analytics for further insights and alternative categorizations methodologies.
 +
 +
 +
E. Final Dataset 1 (Sales Stage Report)
 +
 +
Figure 6 - Cleaned & Transformed Sales Stage Dataset
 +
 +
 +
*Salesperson: TrustSphere Sales Staff
 +
*Contact: Email address of person a salesperson is in communication with
 +
*Thread: ThreadSubject
 +
*Incoming count: Total count of inbound emails in the thread
 +
*Outgoing count: Total count of outbound emails in the thread
 +
*Date First In / Out: Date email is first sent or received for a particular thread
 +
*Date Last In: Date email is last received for a particular thread
 +
*Date Last Out: Date email is last sent for a particular thread
 +
*Average Response Time: Average time taken for an email reply
 +
*Total Response Time: Total time taken for all email replies
 +
*Relevant: True if a Salesperson – Contact communication has at least 1 reply in their entire email communication. False if otherwise.
 +
*Keyword: True if a Salesperson – Contact communication with at least 1 thread with the keywords “relationship analytics” or “poc” in their entire email communication. False if otherwise.
 +
*Partner: True if Contact is a TrustSphere partner. False if otherwise.
 +
*Phase: Stage of the sales process (Prospecting, Meeting, POC, Technical visibility call/configuration call, Analysis session, Closing, Renewal)
 +
 +
 +
'''3. Creating Dataset 2 - For Identifying Each Unique Relationship Relationship of a Salesperson (referred to as The Relationship Report)'''
 +
*The 36,886 record junk cleared email (step 1) dataset was plugged into JMP.
 +
*They were grouped by Receiver and Originator Address using summary tables to get a unique Originator-Reciever record with the frequency of emails sent from the Originator to the Receiver
 +
 +
Note: N rows here is essentially the ‘Outbound Count’. An important point to note here is that 2 record of each ‘relationship’ exist at the moment. A—>B and B—>A as highlighted in the above image.
 +
 +
 +
A. A column to calculate ‘Last Out’ (Number of Days since the last email was sent out from the originator to the receiver) was added into the original 36,886 record junk cleared email communications dataset which we started with in step 1. The formula of that column  –
 +
 +
 +
B. This data was then imported into excel and VLOOKUP was used to get reverse frequency (‘Inbound Count’) of emails i.e the number of emails the originator received from the receiver. 
 +
 +
 +
C. VLOOKUP was again used to determine ‘Last In’ (The number of days ago the originator received an email from the receiver). The MIN function was used to calculate ‘Last Contact’.
 +
 +
As you can see duplicate records still exist. Therefore our next step is to remove duplicate records to get one unique relationship record for each relationship.
 +
 +
 +
D. Data was plugged into JMP to remove duplicate records using the select rows function.
 +
 +
 +
E. The Final Dataset 2 - Relationship Report - 2242 Records
 +
*Originator Address: Person A in the relationship
 +
*Receiver Address: Person B in the relationship
 +
*Relationship Type: Whether it is an external/internal relationship
 +
*Outbound Count: Number of Emails sent from Person A to Person B (YTD till 31st August)
 +
*Inbound Count: Number of Emails received by Person A from Person B (YTD till 31st August).
 +
*Total Frequency: Sum of Inbound and Outbound Count. Total emails exchanged between Person A and Person B.
 +
*Last Out: Number of days ago from the present date the last email from sent from Person A to Person B. 
 +
*Last In: Number of days ago from the present date the last email was received by Person A from Person B.
 +
*Last Contact: The lesser of Last in and Last Out. The number of days ago from the present date the last email between Person A and Person B was sent or received.
 +
 +
</font></div>
  
 
<div style="border-style: solid; border-width:0; background: #3CB371; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Century Gothic;border-bottom:5px solid white; border-top:5px solid black"><font color= #ffffff>Future Tasks and Deliverables</font></div>
 
<div style="border-style: solid; border-width:0; background: #3CB371; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Century Gothic;border-bottom:5px solid white; border-top:5px solid black"><font color= #ffffff>Future Tasks and Deliverables</font></div>
  
 +
</font></div>
  
 
<div style="border-style: solid; border-width:0; background: #3CB371; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Century Gothic;border-bottom:5px solid white; border-top:5px solid black"><font color= #ffffff>In Progress: Preliminary Calculations & Testing</font></div>
 
<div style="border-style: solid; border-width:0; background: #3CB371; padding: 7px; font-weight: bold; text-align:left; line-height: wrap_content; text-indent: 20px; font-size:20px; font-family:Century Gothic;border-bottom:5px solid white; border-top:5px solid black"><font color= #ffffff>In Progress: Preliminary Calculations & Testing</font></div>
 +
 +
</font></div>

Revision as of 23:57, 16 October 2016

MST Logo.jpeg

Home Team Project Overview Midterm Progress Final Progress Project Management Documentation


Recap & Objectives

Our sponsor, Trustsphere is a software company that provides relationship analytics solutions. Their products deliver insights that help clients across the globe improve key business issues including sales force effectiveness, enterprise-wide collaboration and corporate governance. The company engaged our team to utilize our technical and analytical capabilities to help them understand and tackle their business problem of little growth in sales and a longer than ideal sales cycle.

While the field of Sales Analytics has received plenty attention in the past, recent studies reveal that few companies have also delved into the area of Sales People Analytics. Salespeople communications to potential clients, especially in the B2B sphere, are wholly relied upon for marketing the company’s product. Furthermore, Steward et al. (2010) found that higher-performing salespeople also regularly activated their internal company networks, to coordinate a team of experts tailored to serve a particular customer. Just sales figures to evaluate salespeople performance covers a very narrow perspective as it disregards cycle time and in-progress pitches, therefore our team has defined our scope as to analyze the sales team’s internal and external communications to gain insight into their relationships with internal and external parties and to identify the sales stages that act as bottlenecks in the sales process.


Data Provided by Client

For this project, our team is working with two sets of data provided to us by TrustSphere:

A) Daily email communication data (main dataset)

This dataset contains year-to-date (up till 31 August 2016) records of daily email communication data of all 19 Trustsphere sales people across the globe. This data includes the following variables:

Date: Includes the date and time of a particular email being sent

Originator address: Sender email address

Recipient address: Receiver email address

Direction: Nature of communication (internal, inbound or outbound)

MsgID: Unique message ID of emails sent

Email Subject: Email subject header


Figure 1 - Original Dataset


B) Staff List

The dataset lists all of TrustSphere staff (57) with the following variables:


Name

Hierarchy

Department

Position

Location


We were also provided with a Relationship dataset, as mentioned previously in the proposal, which contained individual records of salespeople relationships – however we are not using this dataset in any of our analyses.


Exploratory Research & Revision of Scope

Our research objectives under the ‘Scope of Work’ section of our proposal remain largely unchanged. However, we have decided to remove most of the social network visualisations as TrustSphere is already working on that area.

We also had the opportunity to speak to Ms. Annabel Koh from the Sales Department. She provided us with a clearer idea of the sales stages so we could incorporate their importance (weightage column) into our analysis of the subject headers.

Figure 2 - Sales stages and its importance in the sales cycle.

She also gave us insight into other areas she thought would be useful to know for the Sales Department:


a. Time taken to transit between sales stages

b. Conversion rates

  • Response rate of prospects into meetings
  • Conversion rate from meetings into Proof of Concept (POC) trials

c. Amount of time spent communicating internally

  • Worries of over-collaboration

d. Overlap of relationships

  • Are there different salespeople pursuing/serving the same account?


Finalised Scope of Work & Analysis Metrics
Analysis Metrics for Each Salesperson


MST Salesperson Metrics 1.jpeg



Analysis Metrics Across the Entire Sales Team


MST SalesTeam Metrics 1.jpeg




Calculations of Metrics

Each of these metrics will be calculated on for each month from January 1st to August 31st to establish time series trends.

A) Relationship Score

The relationship score is given to each record in the Relationship Report and is to determine the strength of the relationship between a salesperson and an External/Internal party. The score focuses on two factors are Frequency of communication and Recency of communication.


Purpose:

1. To evaluate the relationship performance in terms of both building and maintaining them with internal and external parties.

2. To evaluate the balance between internal and external relationships maintained by the salesperson


Formulation:

  • Overall Relationship Score (ORS): Calculated for each internal and external relationship of the salesperson. Both X & Y variables will be standardized to give each equal weightage.
  • X: Total frequency of emails exchanged since Jan 1 2016 between the salesperson and the other party
  • Y: The number of days since the last communication with the other party


Formula = X+1/Y


Explanation:

1/Y in this case is the recency factor as the number of days since the last contact will inversely affect the relationship score. (A higher number of days since last spoken will mean a lower relationship score) Total External to Internal Network Ratio of each salesperson:

ΣORSE /ΣORSI

ORSE: External Relationship Score, ORSI: Internal Relationship Score)

Therefore, Sum of All External Relationship Scores divided by the Sum of All Internal Relationship Scores.


B) Hot & Cold Relationships

The client advised us that a relationship can be considered considered Cold if no contact was made within 30 days and Hot if contact was made within the past 30 days. Therefore hot and cold relationships are a binary classification in this case. There can be cases in which people have not communicated within 30 days for a legitimate reason such as the contact being out of office etc. But we felt that 30 days is a lenient assumption for it is unusual that for an active conversations to not be for 30 days.

C) Strong & Weak Relationships

Strong and Weak Relationships will be determined by comparing monthly relationship scores to the average baseline relationship score for January for all salespeople. Strong relationships in this case would be ones with above average relationship score and weak would be ones with below average relationship score.

D) Sales Stages

The absolute number of email threads belonging to each Sales Stage in the current month. (Determined by Subject Header Data)

E) Sales Conversion Rate

The number of threads in a particular stage in current month divided by number of threads in that stage in the previous month.


Data Cleaning & Transforming

1) Removing Junk & Irrelevant Emails

Figure 3 - Junk Cleared Dataset


2)Creating Dataset 1 - For Identifying Stages and Conversion Rates

(To be referred to in the rest of the report as The Sales Stage Dataset)


A. Following The Data Cleaning in Step 1. We wished to reduce the emails to unique threads, i.e. 1 record for each thread, instead of each email.


Out of these 20090 unique salesperson – contact – subject thread email communications, there were some with and without replies. Further steps needed to be taken to remove irrelevant emails.


B. Two key aspects were used to determine if each unique salesperson–contact communication is relevant:

  • Is there at least 1 reply within each unique salesperson – contact relationship?
  • Do the subject threads between each unique salesperson – contact relationship contains the following keywords (“Relationship analytics”, “POC”)?


Rationale: The Sales cycle consists of 5 stages. Email subject headers may change during these stages. There are no specific templates or subject headers used except for the following 2 keywords (“Relationship analytics”, “POC”). There would also be some subject headers that required a reply and those that don’t. Our team has concluded that within each sales process, there should be a minimum of 1 reply between a salesperson and his contact. We have categorized such unique salesperson – contact relationship with at least 1 reply for any thread to be relevant.


Figure 4: Email exchange between unique salesperson – contact with at least 1 reply


C. Although a unique salesperson – contact relationship may have at least 1 reply throughout their email exchange, their communication exchange may not be relevant to sales or the process of acquiring a potential client. TrustSphere has a number of partners whom are important to them. As such, this communication may be email exchanges between TrustSphere and their partners, which are irrelevant. For each unique salesperson – contact – thread record, our team has identified if the thread has one of the following keywords (“Relationship analytics”, “POC”). A unique salesperson – contact communication is only relevant if there is at least 1 reply and at least 1 thread with stated keywords throughout their communication exchange.


Figure 5 - Email exchange between unique salesperson – contact with (i) at least 1 reply and (ii) at least 1 thread with the keywords (“relationship analytics”, “POC”)


D. At the end of this steps, a total of 1814 / 20090 unique salesperson – contact – thread records are identified as relevant. These unique records contain a total of 5864 emails sent and received. Our next step is to look at each unique salesperson – contact communication and determine their sales stages. TrustSphere has provided the team with a list of their partners. For each of these 1814 unique salesperson – contact – thread records, out team has categorized them into the following groups:

  1. Prospecting
  2. Meeting
  3. POC
  4. Technical visibility call/configuration call
  5. Analysis session
  6. Closing
  7. Renewal


The team will consult with TrustSphere to ensure these categorizations are accurate. The team will also delve into the use of text analytics for further insights and alternative categorizations methodologies.


E. Final Dataset 1 (Sales Stage Report)

Figure 6 - Cleaned & Transformed Sales Stage Dataset


  • Salesperson: TrustSphere Sales Staff
  • Contact: Email address of person a salesperson is in communication with
  • Thread: ThreadSubject
  • Incoming count: Total count of inbound emails in the thread
  • Outgoing count: Total count of outbound emails in the thread
  • Date First In / Out: Date email is first sent or received for a particular thread
  • Date Last In: Date email is last received for a particular thread
  • Date Last Out: Date email is last sent for a particular thread
  • Average Response Time: Average time taken for an email reply
  • Total Response Time: Total time taken for all email replies
  • Relevant: True if a Salesperson – Contact communication has at least 1 reply in their entire email communication. False if otherwise.
  • Keyword: True if a Salesperson – Contact communication with at least 1 thread with the keywords “relationship analytics” or “poc” in their entire email communication. False if otherwise.
  • Partner: True if Contact is a TrustSphere partner. False if otherwise.
  • Phase: Stage of the sales process (Prospecting, Meeting, POC, Technical visibility call/configuration call, Analysis session, Closing, Renewal)


3. Creating Dataset 2 - For Identifying Each Unique Relationship Relationship of a Salesperson (referred to as The Relationship Report)

  • The 36,886 record junk cleared email (step 1) dataset was plugged into JMP.
  • They were grouped by Receiver and Originator Address using summary tables to get a unique Originator-Reciever record with the frequency of emails sent from the Originator to the Receiver

Note: N rows here is essentially the ‘Outbound Count’. An important point to note here is that 2 record of each ‘relationship’ exist at the moment. A—>B and B—>A as highlighted in the above image.


A. A column to calculate ‘Last Out’ (Number of Days since the last email was sent out from the originator to the receiver) was added into the original 36,886 record junk cleared email communications dataset which we started with in step 1. The formula of that column –


B. This data was then imported into excel and VLOOKUP was used to get reverse frequency (‘Inbound Count’) of emails i.e the number of emails the originator received from the receiver.


C. VLOOKUP was again used to determine ‘Last In’ (The number of days ago the originator received an email from the receiver). The MIN function was used to calculate ‘Last Contact’.

As you can see duplicate records still exist. Therefore our next step is to remove duplicate records to get one unique relationship record for each relationship.


D. Data was plugged into JMP to remove duplicate records using the select rows function.


E. The Final Dataset 2 - Relationship Report - 2242 Records

  • Originator Address: Person A in the relationship
  • Receiver Address: Person B in the relationship
  • Relationship Type: Whether it is an external/internal relationship
  • Outbound Count: Number of Emails sent from Person A to Person B (YTD till 31st August)
  • Inbound Count: Number of Emails received by Person A from Person B (YTD till 31st August).
  • Total Frequency: Sum of Inbound and Outbound Count. Total emails exchanged between Person A and Person B.
  • Last Out: Number of days ago from the present date the last email from sent from Person A to Person B.
  • Last In: Number of days ago from the present date the last email was received by Person A from Person B.
  • Last Contact: The lesser of Last in and Last Out. The number of days ago from the present date the last email between Person A and Person B was sent or received.
Future Tasks and Deliverables
In Progress: Preliminary Calculations & Testing