Difference between revisions of "APA Feature Engineering"
Revision as of 20:20, 22 February 2017
Subject Line weightage:
We will use subject line weightage as one of the components in determining how important and relevant a single email exchange is to the business. Our approach is as follows:
- First, run an analysis of all the terms occurring in the entire dataset.
  - This analysis will filter out common words, prepositions and other unimportant words that could skew the results.
  - The analysis will return a list of words along with the frequency of each term's occurrence in the dataset.
- Based on the results obtained, calculate the tf-idf of each term.
  - tf (term frequency): how often the term occurs in a document.
  - idf (inverse document frequency): how rare the term is across all documents; the more documents a term occurs in, the lower its idf.
  - tf-idf, the product of the two, will allow us to find the most important terms in the set of documents.
- Using its tf-idf value, assign each term a weightage that reflects how strongly it signals importance to the business.
- Each subject line will then receive an aggregated weightage of the terms appearing in it.
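The steps above can be sketched in Python. This is a minimal illustration and not the project's actual implementation: it assumes each subject line is treated as one document, uses a small hypothetical stop-word list, takes idf as log(N / df), and aggregates term weights by summing them. The function names are invented for this example.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "for", "and",
              "re", "fw", "fwd"}

def tokenize(subject):
    """Lowercase a subject line, split it into words, drop stop words."""
    words = re.findall(r"[a-z0-9]+", subject.lower())
    return [w for w in words if w not in STOP_WORDS]

def tfidf_weights(subjects):
    """Return one {term: tf-idf} dict per subject line."""
    docs = [tokenize(s) for s in subjects]
    n_docs = len(docs)
    # Document frequency: number of subject lines containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # tf = share of the subject line taken up by the term,
            # idf = log(N / df); assumed variants, others exist.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def subject_weightage(subjects):
    """Aggregate (here: sum) the term weights of each subject line."""
    return [sum(w.values()) for w in tfidf_weights(subjects)]

subjects = ["Re: gas contract renewal", "gas prices", "lunch on Friday"]
weights = tfidf_weights(subjects)
scores = subject_weightage(subjects)
```

With this choice of idf, a term that appears in every subject line gets a weight of zero, so terms shared by many subject lines contribute little to the aggregated weightage.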
Email Exchange Ratio:
This metric shows the number of emails exchanged between two employees as a ratio of the total number of emails exchanged by those employees.
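As a rough sketch, the ratio could be computed as follows. The exact formula is not reproduced in the text, so the denominator here is an assumption: every email that has either employee as sender or recipient is counted once. The `(sender, recipient)` tuple layout and the function name are hypothetical.

```python
def email_exchange_ratio(emails, a, b):
    """Ratio of emails between a and b to all emails involving a or b.

    emails: iterable of (sender, recipient) pairs (assumed layout).
    Assumes a and b are two different employees.
    """
    emails = list(emails)
    # Emails sent from a to b or from b to a.
    between = sum(1 for s, r in emails if {s, r} == {a, b})
    # Emails where a or b is the sender or the recipient (assumed denominator).
    total = sum(1 for s, r in emails if s in (a, b) or r in (a, b))
    return between / total if total else 0.0

emails = [("A", "B"), ("B", "A"), ("A", "C"), ("D", "B"), ("C", "D")]
ratio = email_exchange_ratio(emails, "A", "B")  # 2 of 4 emails -> 0.5
```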
Average Email Exchange Size:
This metric is the average size of all the emails exchanged between two employees A and B.
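A minimal sketch of this average, assuming each email is available as a hypothetical `(sender, recipient, size)` tuple with the size in bytes, and that emails in both directions count:

```python
def average_exchange_size(emails, a, b):
    """Mean size of emails sent between a and b, in either direction.

    emails: iterable of (sender, recipient, size) tuples (assumed layout).
    Returns None when the two employees exchanged no emails.
    """
    sizes = [size for s, r, size in emails if {s, r} == {a, b}]
    return sum(sizes) / len(sizes) if sizes else None

emails = [("A", "B", 100), ("B", "A", 300), ("A", "C", 1000)]
avg = average_exchange_size(emails, "A", "B")  # (100 + 300) / 2 -> 200.0
```

Returning `None` for a pair with no emails keeps "no exchange" distinct from an average size of zero.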