AY1718 T2 Group21 Technology
Data preparation was a tedious and almost entirely manual process. Because all of the data lived in Google Analytics, which did not allow us to extract the information automatically, we had to design how to present the primary data ourselves.
We also accessed the Google Analytics API from Python and wrote brief test queries to extract the data; in parallel, we tried to connect our dataset through a Power BI extension.
Both methods failed to achieve the desired results. Our hypothesis is that Google Analytics has no separate, earmarked Event tabs that would give us a way to extract the exact pattern of web pages visited by each customer and user.
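A brief test query of the kind described above might look like the sketch below. It is not the team's original code: it assumes the Google Analytics Reporting API v4 (the API for Universal Analytics views at the time), accessed through google-api-python-client with a service-account key file. The view ID, key file path and the `ga:pagePath`/`ga:pageviews` pair are hypothetical placeholder choices. A report like this aggregates pageviews across all users rather than listing each user's sequence of pages.

```python
# Minimal sketch of a Google Analytics Reporting API v4 test query.
# Hypothetical: the view ID, key file path, and chosen dimension/metric.

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]


def build_report_request(view_id, start_date, end_date):
    """Build a batchGet body requesting pageviews per page path."""
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "metrics": [{"expression": "ga:pageviews"}],
                "dimensions": [{"name": "ga:pagePath"}],
                "pageSize": 1000,
            }
        ]
    }


def run_report(key_file, body):
    """Send the request (needs google-api-python-client and google-auth)."""
    # Imported here so the rest of the module works without these packages.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    credentials = service_account.Credentials.from_service_account_file(
        key_file, scopes=SCOPES
    )
    analytics = build("analyticsreporting", "v4", credentials=credentials)
    return analytics.reports().batchGet(body=body).execute()


def rows_from_response(response):
    """Flatten a batchGet response into (dimension values, metric values) pairs."""
    rows = []
    for report in response.get("reports", []):
        for row in report.get("data", {}).get("rows", []):
            # One metrics entry per date range; this request uses a single range.
            rows.append((row["dimensions"], row["metrics"][0]["values"]))
    return rows


# Example (requires credentials and a Universal Analytics view):
# body = build_report_request("VIEW_ID", "2018-01-01", "2018-03-31")
# rows = rows_from_response(run_report("service-account.json", body))
```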
Two main datasets were ultimately extracted. The overarching steps taken for each are:
1. Information on all Users: Dataset #1
No. | Step |
---|---|
1. | Identify the key information we want to know about each User and list all possible subsets a User can belong to |
2. | Export list of ClientIDs that are associated with each subset and label the subset |
3. | Combine all UserIDs with their respective key information |
4. | Combine the key information and the subset labels into one dataset |
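Steps 2 to 4 above can be sketched with Python's standard `csv` module. This is a minimal illustration, not the team's actual workflow: it assumes each subset export is a CSV with a `ClientID` column, that the key information per user is already a dictionary keyed by the same ID, and the subset labels and attribute names (`new_visitor`, `mobile`, `country`) are hypothetical.

```python
import csv
import io


def load_client_ids(csv_file, id_column="ClientID"):
    """Step 2: read one exported subset list and return its set of IDs."""
    reader = csv.DictReader(csv_file)
    return {row[id_column].strip() for row in reader if row[id_column].strip()}


def combine_user_dataset(key_info, subsets):
    """Steps 3-4: merge key information with a 0/1 flag per subset label.

    key_info: ID -> dict of key attributes for that user.
    subsets:  subset label -> set of IDs exported for that subset.
    """
    all_ids = set(key_info)
    for ids in subsets.values():
        all_ids |= ids
    rows = []
    for client_id in sorted(all_ids):
        row = {"ClientID": client_id, **key_info.get(client_id, {})}
        for label, ids in subsets.items():
            row[label] = int(client_id in ids)
        rows.append(row)
    return rows


def to_csv(rows):
    """Write the combined rows as CSV text, leaving missing attributes blank."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical subset exports and key information.
new_visitor = load_client_ids(io.StringIO("ClientID\n111.1\n222.2\n"))
mobile = load_client_ids(io.StringIO("ClientID\n222.2\n"))
rows = combine_user_dataset(
    {"111.1": {"country": "SG"}},
    {"new_visitor": new_visitor, "mobile": mobile},
)
print(to_csv(rows))
```

Storing each subset as a 0/1 column keeps exactly one row per user, so a user who belongs to several subsets is not duplicated in the combined dataset.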
2. Information on all Customers: Dataset #2
No. | Step |
---|---|
1. | Filter Users to obtain all ClientIDs that |
2. | Export list of ClientIDs that are associated with each subset and label the subset |
3. | Combine all UserIDs with their respective key information |
4. | Combine the key information and the subset labels into one dataset |
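Step 1 of Dataset #2 can be sketched as a filter over the per-user records, followed by the same kind of subset labelling. The filter condition is cut off in the table, so the `has_transaction` criterion below (at least one recorded transaction) is purely a hypothetical stand-in, as are the field and label names.

```python
def filter_customers(users, is_customer):
    """Step 1: keep only the users whose key information passes is_customer."""
    return {cid: info for cid, info in users.items() if is_customer(info)}


def has_transaction(info):
    """Hypothetical customer criterion: at least one recorded transaction."""
    return info.get("transactions", 0) > 0


def label_subsets(customers, subsets):
    """Steps 2-4: attach a 0/1 flag per subset label to each customer row."""
    rows = []
    for cid in sorted(customers):
        row = {"ClientID": cid, **customers[cid]}
        for label, ids in subsets.items():
            # IDs in a subset export that are not customers are simply ignored.
            row[label] = int(cid in ids)
        rows.append(row)
    return rows


users = {"111.1": {"transactions": 2}, "222.2": {"transactions": 0}, "333.3": {}}
customers = filter_customers(users, has_transaction)
rows = label_subsets(customers, {"returning": {"111.1", "222.2"}})
print(rows)  # [{'ClientID': '111.1', 'transactions': 2, 'returning': 1}]
```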
NOTE: For information on the Data Cleaning process, proceed to the Interim Section.