Difference between revisions of "AY1718 T2 Group21 Techonology"

Latest revision as of 22:00, 25 February 2018

Data Source

All data was held in Google Analytics, which did not allow us to extract the information automatically, so we had to design how to present the primary data.

Data Dictionary

Data Preparation

Data Preparation was a tedious and almost entirely manual process. All data was held in Google Analytics, which did not allow us to extract the information automatically, so we had to design how to present the primary data. We also accessed the Google Analytics API from Python and wrote brief test queries to extract data; alongside this, we tried to connect our data set via a Power BI extension. Both methods failed to achieve the desired results. Our hypothesis is that this happened because Google Analytics has no separately earmarked Event tabs that would give us a way to extract the exact pattern of websites visited by each customer and user.

Two main data sets were finally extracted. The overarching steps taken for each are:

1. Information on all Users: Dataset #1

No.	Step
1.	Identify key information that we want to know about each User and list all possible subsets of each user
2.	Export list of ClientIDs that are associated with each subset and label the subset
3.	Combine all UserIDs with respective key information
4.	Combine all information and information on subset into one data set

2. Information on all Customers: Dataset #2

No.	Step
1.	Filter Users to obtain all ClientIDs that
2.	Export list of ClientIDs that are associated with each subset and label the subset
3.	Combine all UserIDs with respective key information
4.	Combine all information and information on subset into one data set
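
Steps 2 to 4 of either table can be sketched in Python as follows. This is only an illustrative sketch, not the process the team actually ran (which was largely manual): the function names, the subset labels ("mobile", "returning"), the "Sessions" field and the sample ClientIDs are all hypothetical, and the exports are assumed to be CSV text with a single ClientID column.

```python
import csv
import io


def label_subsets(exports):
    """Step 2: map each ClientID to the sorted subset labels it appears in.

    `exports` is a hypothetical dict of {subset_label: CSV text}, where each
    CSV is assumed to have a single "ClientID" column.
    """
    labels = {}
    for subset, csv_text in exports.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            labels.setdefault(row["ClientID"], set()).add(subset)
    return {cid: sorted(found) for cid, found in labels.items()}


def combine(key_info, labels):
    """Steps 3-4: join per-ID key information with subset labels into one table."""
    rows = []
    for cid in sorted(set(key_info) | set(labels)):
        row = {"ClientID": cid}
        row.update(key_info.get(cid, {}))        # key information, if any
        row["Subsets"] = ";".join(labels.get(cid, []))
        rows.append(row)
    return rows


# Made-up example data, only to show the assumed shape of the inputs.
exports = {
    "mobile": "ClientID\n1001\n1002\n",
    "returning": "ClientID\n1002\n1003\n",
}
key_info = {
    "1001": {"Sessions": 3},
    "1002": {"Sessions": 7},
}

dataset = combine(key_info, label_subsets(exports))
for row in dataset:
    print(row)
```

An ID that appears in several exports keeps all of its labels in one row, and an ID with no key information still appears in the combined data set, so gaps stay visible rather than being silently dropped.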

NOTE: For information on the Data Cleaning process, proceed to the Interim Section or click here

@@ Line 43: / Line 43: @@
 <div style="background: #688E26; line-height: 0.3em; font-family:Century Gothic;  border-left: #FAA613 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#000000"><strong>Data Source</strong></font></div></div>
-TO BE UPDATED
+All data was held in Google Analytics, which did not allow us to extract the information automatically, so we had to design how to present the primary data.
 <br>
 <br/>
 <div style="background: #688E26; line-height: 0.3em; font-family:Century Gothic;  border-left: #FAA613 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#000000"><strong>Data Dictionary</strong></font></div></div>
-TO BE UPDATED
+<br>
+[[File:AY1718 T2 Group21 Metadata 25Feb.PNG | 1000px | center ]]
 <br>
 <br/>
 <div style="background: #688E26; line-height: 0.3em; font-family:Century Gothic;  border-left: #FAA613 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#000000"><strong>Data Preparation</strong></font></div></div>
-TO BE UPDATED
+<br>
+Data Preparation was a tedious and almost entirely manual process. All data was held in Google Analytics, which did not allow us to extract the information automatically, so we had to design how to present the primary data.
+We also accessed the Google Analytics API from Python and wrote brief test queries to extract data; alongside this, we tried to connect our data set via a Power BI extension.
+Both methods failed to achieve the desired results. Our hypothesis is that this happened because Google Analytics has no separately earmarked Event tabs that would give us a
+way to extract the exact pattern of websites visited by each customer and user.
+Two main data sets were finally extracted. The overarching steps taken for each are:
+<br>
+<strong>1. Information on all Users: Dataset #1 </strong>
+<table class="wikitable centered" width="70%" color="blue">
+  <tr>
+  <th>No. </th>
+  <th >Step</th>
+  </tr>
+  <tr>
+  <td>1. </td>
+  <td> Identify key information that we want to know about each User and list all possible subsets of each user</td>
+  </tr>
+  <tr>
+  <td>2. </td>
+  <td> Export list of ClientIDs that are associated with each subset and label the subset</td>
+  </tr>
+  <tr>
+  <td>3. </td>
+  <td> Combine all UserIDs with respective key information </td>
+  </tr>
+  <tr>
+  <td>4. </td>
+  <td> Combine all information and information on subset into one data set </td>
+  </tr>
+  </table>
+<br>
+<strong>2. Information on all Customers: Dataset #2</strong>
+<table class="wikitable centered" width="70%" color="blue">
+  <tr>
+  <th >No. </th>
+  <th >Step</th>
+  </tr>
+  <tr>
+  <td>1. </td>
+  <td> Filter Users to obtain all ClientIDs that</td>
+  </tr>
+  <tr>
+  <td>2. </td>
+  <td> Export list of ClientIDs that are associated with each subset and label the subset</td>
+  </tr>
+  <tr>
+  <td>3. </td>
+  <td> Combine all UserIDs with respective key information </td>
+  </tr>
+  <tr>
+  <td>4. </td>
+  <td> Combine all information and information on subset into one data set </td>
+  </tr>
+  </table>
+<br>
+<br>
+NOTE: For information on the Data Cleaning process, proceed to the Interim Section or click [[AY1718 T2 Group21 Midterm Findings | <font color="#000000"><strong>here</strong></font>]]
 <br>
 <br/>
