Difference between revisions of "AY1718 T2 Group21 Techonology"

From Analytics Practicum
 
Line 43: Line 43:
  
 
<div style="background: #688E26; line-height: 0.3em; font-family:Century Gothic;  border-left: #FAA613 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#000000"><strong>Data Source</strong></font></div></div>
 
+
All data was held in Google Analytics, which did not allow us to extract the information automatically, so we had to design how to present the primary data ourselves.
 
<br>
 
<br>
 
<br/>
 
<br/>
  
 
<div style="background: #688E26; line-height: 0.3em; font-family:Century Gothic;  border-left: #FAA613 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#000000"><strong>Data Dictionary</strong></font></div></div>
 
+
<br>
 +
 
 +
[[File:AY1718 T2 Group21 Metadata 25Feb.PNG | 1000px | center ]]
 
<br>
 
<br>
 
<br/>
 
<br/>
  
 
<div style="background: #688E26; line-height: 0.3em; font-family:Century Gothic;  border-left: #FAA613 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#000000"><strong>Data Preparation</strong></font></div></div>
 
+
<br>
 +
Data Preparation was a tedious and almost entirely manual process. All data was held in Google Analytics, which did not allow us to extract the information automatically, so we had to design how to present the primary data ourselves.
 +
We also accessed the Google Analytics API for Python and wrote brief test queries to extract data, and we tried to connect our data set through a Power BI extension.
 +
Both methods failed to achieve the desired results. Our hypothesis is that this happened because Google Analytics has no separately earmarked Event tabs that would give us a
 +
way to extract the exact pattern of websites visited by each customer and user.
 +
 
 +
Two main data sets were finally extracted, and the overarching steps taken for each are:
 +
<br>
 +
 
 +
<strong>1. Information on all Users: Dataset #1 </strong>
 +
<table class="wikitable centered" width="70%" color="blue">
 +
  <tr>
 +
  <th>No. </th>
 +
  <th >Step</th>
 +
  </tr>
 +
 
 +
  <tr>
 +
  <td>1. </td>
 +
  <td> Identify key information that we want to know about each User and list all possible subsets of each user</td>
 +
  </tr>
 +
 
 +
  <tr>
 +
  <td>2. </td>
 +
  <td> Export list of ClientIDs that are associated with each subset and label the subset</td>
 +
  </tr>
 +
 
 +
  <tr>
 +
  <td>3. </td>
 +
  <td> Combine all UserIDs with respective key information </td>
 +
  </tr>
 +
 
 +
  <tr>
 +
  <td>4. </td>
 +
  <td> Combine all information and information on subset into one data set </td>
 +
  </tr>
 +
  </table>
 +
 
 +
<br>
 +
 
 +
<strong>2. Information on all Customers: Dataset #2</strong>
 +
<table class="wikitable centered" width="70%" color="blue">
 +
  <tr>
 +
  <th >No. </th>
 +
  <th >Step</th>
 +
  </tr>
 +
 
 +
  <tr>
 +
  <td>1. </td>
 +
  <td> Filter Users to obtain all ClientIDs that</td>
 +
  </tr>
 +
 
 +
  <tr>
 +
  <td>2. </td>
 +
  <td> Export list of ClientIDs that are associated with each subset and label the subset</td>
 +
  </tr>
 +
 
 +
  <tr>
 +
  <td>3. </td>
 +
  <td> Combine all UserIDs with respective key information </td>
 +
  </tr>
 +
 
 +
  <tr>
 +
  <td>4. </td>
 +
  <td> Combine all information and information on subset into one data set </td>
 +
  </tr>
 +
  </table>
 +
 
 +
<br>
 +
<br>
 +
 
 +
NOTE: For information on the Data Cleaning process, proceed to the Interim Section or click [[AY1718 T2 Group21 Midterm Findings | <font color="#000000"><strong>here</strong></font>]]
 +
 
 +
 
 
<br>
 
<br>
 
<br/>
 
<br/>

Latest revision as of 22:00, 25 February 2018




Data Source

All data was held in Google Analytics, which did not allow us to extract the information automatically, so we had to design how to present the primary data ourselves.

Data Dictionary


AY1718 T2 Group21 Metadata 25Feb.PNG



Data Preparation


Data Preparation was a tedious and almost entirely manual process. All data was held in Google Analytics, which did not allow us to extract the information automatically, so we had to design how to present the primary data ourselves. We also accessed the Google Analytics API for Python and wrote brief test queries to extract data, and we tried to connect our data set through a Power BI extension. Both methods failed to achieve the desired results. Our hypothesis is that this happened because Google Analytics has no separately earmarked Event tabs that would give us a way to extract the exact pattern of websites visited by each customer and user.

Two main data sets were finally extracted, and the overarching steps taken for each are:

1. Information on all Users: Dataset #1

No. Step
1. Identify key information that we want to know about each User and list all possible subsets of each user
2. Export list of ClientIDs that are associated with each subset and label the subset
3. Combine all UserIDs with respective key information
4. Combine all information and information on subset into one data set
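
The steps above were carried out by hand, but the label-and-combine logic of steps 2 to 4 can be illustrated with a minimal Python sketch. This is not the team's actual procedure: the function names, the subset labels ("mobile", "returning") and the "sessions" field are hypothetical, and each export is assumed to be a plain list of ClientIDs.

```python
from collections import defaultdict


def label_subsets(exports):
    """Step 2: map each ClientID to the subsets it was exported under.

    `exports` is a dict of {subset label: list of ClientIDs}, one entry
    per exported list (hypothetical structure).
    """
    labels = defaultdict(set)
    for subset, client_ids in exports.items():
        for client_id in client_ids:
            labels[client_id].add(subset)
    # Sort the labels so the output is stable between runs.
    return {cid: sorted(subsets) for cid, subsets in labels.items()}


def combine(key_info, labels):
    """Steps 3 and 4: join key information and subset labels per ClientID.

    `key_info` is a dict of {ClientID: dict of key fields}. IDs missing
    from either input are kept, with the missing part left empty.
    """
    rows = []
    for client_id in sorted(set(key_info) | set(labels)):
        row = {"client_id": client_id}
        row.update(key_info.get(client_id, {}))
        row["subsets"] = ";".join(labels.get(client_id, []))
        rows.append(row)
    return rows


# Hypothetical example data, not taken from the project.
exports = {"mobile": ["A1", "B2"], "returning": ["B2", "C3"]}
key_info = {"A1": {"sessions": 3}, "B2": {"sessions": 7}}
dataset = combine(key_info, label_subsets(exports))
```

Keeping one row per ClientID, with all subset labels joined into a single field, is only one possible layout; a one-row-per-subset layout would work equally well for the combination step.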


2. Information on all Customers: Dataset #2

No. Step
1. Filter Users to obtain all ClientIDs that
2. Export list of ClientIDs that are associated with each subset and label the subset
3. Combine all UserIDs with respective key information
4. Combine all information and information on subset into one data set



NOTE: For information on the Data Cleaning process, proceed to the Interim Section (AY1718 T2 Group21 Midterm Findings).