Difference between revisions of "ANLY482 AY2017-18T2 Group30 Youtube"

From Analytics Practicum
 
 
<div align="center">
 
<div align="center">
 
<div style=" width: 85%; background: #E6EDFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #8c8d94 solid 32px;"><font color="#5A6B96">Data Source</font></div>
 
<div style="width:90%;">
</div>
<font style="text-align: left">
 
<p>
 
<b>Facebook</b><br>
 
For data files from <i>Facebook Insights Data Export (Post Level)</i>, the sponsor provided data exported for different periods of the year in Excel format, with each export split across several metric tabs. The tabs included are:
 
# Key Metrics
 
# Lifetime: Number of unique people who have created a story about your Page post by interacting with it (unique users)
 
# Lifetime: Number of people who have clicked anywhere in your post, by type (unique users)
 
# Lifetime: Number of people who have given negative feedback on your post, by type (unique users)
 
<br></p>
 
  
<font style="text-align: left">
<div align="left">
<p>
<div style=" width: 85%; padding:75px; font-family: Arimo; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px;">
For data files from <i>Facebook Insights Data Export (Video Post)</i>, the sponsor provided data exported for different periods of the year in Excel format, with each export split across several metric tabs. The tabs included are:
<font>
# Lifetime Post Total Impression/Reach/Views
# Geographic Views
 
# Demographic Views
 
# Lifetime Post Total Views by (page_owned / Shared)
 
</p></font>
 
 
 
<div style="width:90%;">
 
<font style="text-align: left">
 
<b>YouTube</b><br/>
 
For data files from <i>YouTube (Watch Time)</i>, the sponsor provided exported watch-time data in Excel format, split across several metric tabs. The tabs included are:
 
 
# Video
 
#Geography
 
# Date

# Subscription Status

# YouTube Product

# Device Type

# Subtitles and CC
 
#Video Information Language
 
<br>
 
For data files from <i>YouTube (Demographics)</i>, the sponsor provided exported watch-time data broken down by viewer demographics, in Excel format with several metric tabs. The tabs included are:
 
# Viewer Age
 
# Viewer Gender
 
<br>
 
For data files from <i>YouTube (Traffic Sources)</i>, the sponsor provided exported watch-time data broken down by traffic source type.
</font>
 
</div>
 
 
 
<br/><b>Instagram</b><br/>
 
To retrieve data from the company's Instagram account, we used a web-scraping script from [https://github.com/timgrossmann/instagram-profilecrawl GitHub], which we modified to also capture each post's timestamp and caption. The scraped data includes:
 
* Caption
 
* Timestamp
 
* Img URL
 
* Tags
 
* No. of Likes
 
* No. of Comments
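A minimal Python sketch of how one scraped post could be held as a typed record with the fields listed above. The key names (<code>caption</code>, <code>img_url</code>, <code>likes</code>, and so on) and the assumption that counts arrive as display strings such as "1,234" are hypothetical; the modified crawler's real output format may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InstagramPost:
    """One scraped post; field names are hypothetical, mirroring the list above."""
    caption: str
    timestamp: str            # kept as scraped, e.g. an ISO 8601 string (assumption)
    img_url: str
    tags: List[str] = field(default_factory=list)
    likes: int = 0
    comments: int = 0


def parse_count(text: str) -> int:
    """Turn a displayed count such as '1,234' into an int (assumed format)."""
    return int(text.replace(",", "").strip() or 0)


def to_post(raw: dict) -> InstagramPost:
    """Build a typed record from one raw scraped dict with hypothetical keys."""
    return InstagramPost(
        caption=raw.get("caption", ""),
        timestamp=raw.get("timestamp", ""),
        img_url=raw.get("img_url", ""),
        tags=list(raw.get("tags", [])),
        likes=parse_count(str(raw.get("likes", "0"))),
        comments=parse_count(str(raw.get("comments", "0"))),
    )
```

Typing the counts as integers at this stage means later analysis (for example, sorting posts by likes) does not have to deal with comma-formatted strings.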
 
 
 
<br/><b>Blog</b><br/>
 
To retrieve the company's blog posts, we used [https://scrapy.org/ Scrapy], an open-source web-scraping framework, to extract data from the blog. We collected every post starting from the first one published, with the following information:
 
* Timestamp
 
* Author(s)
 
* Headline
 
* Category
 
* URL
 
* Tags
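The team used Scrapy for crawling. As a dependency-free illustration of the field-extraction step only, the sketch below swaps in Python's standard-library <code>html.parser</code> in place of Scrapy's selectors. The markup it expects (an <code>h1</code> headline, a <code>time</code> element with a <code>datetime</code> attribute, and links with the classes <code>author</code>, <code>category</code> and <code>tag</code>) is a hypothetical layout, not the actual blog's HTML.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class BlogPostParser(HTMLParser):
    """Pull headline, timestamp, authors, category and tags out of one post page.

    The element and class names checked here are hypothetical placeholders.
    """

    def __init__(self) -> None:
        super().__init__()
        self.headline = ""
        self.timestamp = ""
        self.authors: List[str] = []
        self.category = ""
        self.tags: List[str] = []
        self._field: Optional[str] = None  # field the current text node belongs to

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "h1":
            self._field = "headline"
        elif tag == "time":
            self.timestamp = attr.get("datetime") or ""
        elif tag == "a" and attr.get("class") in ("author", "category", "tag"):
            self._field = attr["class"]

    def handle_endtag(self, tag):
        if tag in ("h1", "a"):
            self._field = None

    def handle_data(self, data):
        if self._field == "headline":
            self.headline += data  # whitespace is stripped once at the end
        elif self._field and data.strip():
            value = data.strip()
            if self._field == "author":
                self.authors.append(value)
            elif self._field == "category":
                self.category = value
            elif self._field == "tag":
                self.tags.append(value)


def parse_blog_post(html: str, url: str) -> Dict[str, object]:
    """Return one scraped item with the fields listed above."""
    parser = BlogPostParser()
    parser.feed(html)
    parser.close()
    return {
        "timestamp": parser.timestamp,
        "authors": parser.authors,
        "headline": parser.headline.strip(),
        "category": parser.category,
        "url": url,
        "tags": parser.tags,
    }
```

In a Scrapy project, the equivalent extraction would sit in a spider's parse callback using CSS or XPath selectors; this sketch only shows which fields end up in each scraped item.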
 
 
 
 
</font></div>
 
</font></div>
<br/>
 
  
 
<div align="center">
 
<div align="center">
 
<div style=" width: 85%; background: #E6EDFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #8c8d94 solid 32px;"><font color="#5A6B96">Data Preparation</font></div>
 
<div style="width:90%;">
 
<div style="width:90%;">
<font style="text-align: left">
</div>
<p>
<div align="left">
To get an overview of the data across the whole year, we consolidated the various tabs into one combined file and concatenated the data from different periods under the same columns. We did this in IBM JMP Pro with the following steps:
<div style=" width: 85%; padding:75px; font-family: Arimo; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px;">
* Using the Post ID, Permalink (the permanent link to the campaign content), Post Message, Type, Countries and Posted columns as key identifiers across the tabs of each Excel file, we appended the desired columns from the other tabs to the end of the Key Metrics tab: the Share, Like and Comment columns from Tab 2; the Other Clicks, Link Clicks, Photo View and Video Play columns from Tab 3; and the Hide_Clicks, Hide_all_clicks, Unlike_page_clicks and report_spam_clicks columns from Tab 4. <br>This was done using the <i>Tables > Join</i> function, with the key identifiers as the “Matching Specification” and the appended columns as the “Output Columns”.
<font>
 
<!---------Enter Text Here ------->
* Next, we concatenated the joined tables from each period to obtain a full year of data.<br>This was done using the <i>Tables > Concatenate</i> function, adding the tables for every period under “Data Tables to be Concatenated”.
</font>
 
</div>
* Finally, we checked each column for missing data. For example, the Type column has five values: Link, Photo, Shared Video, Status and Video. Where Type was missing, we opened the post's permalink, checked which type of media had been posted, and filled in the value accordingly.
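The Join, Concatenate and missing-value steps can be sketched in plain Python over rows held as dictionaries. This illustrates the same logic rather than the JMP workflow itself: it performs a left join on the six key columns (JMP may treat unmatched rows differently depending on its join options), and the column spellings are assumed to match the export headers.

```python
from typing import Dict, List, Sequence

# Key identifiers from the Join step; exact header spellings are assumed.
KEYS = ("Post ID", "Permalink", "Post Message", "Type", "Countries", "Posted")

Row = Dict[str, object]


def key_of(row: Row) -> tuple:
    """Build the composite matching key for one row."""
    return tuple(row.get(k) for k in KEYS)


def join_tab(key_metrics: List[Row], other: List[Row], columns: Sequence[str]) -> List[Row]:
    """Left join: append `columns` from `other` to every Key Metrics row.

    Rows with no match in `other` get None for the appended columns.
    """
    lookup = {key_of(r): r for r in other}
    joined = []
    for row in key_metrics:
        match = lookup.get(key_of(row), {})
        merged = dict(row)
        for col in columns:
            merged[col] = match.get(col)
        joined.append(merged)
    return joined


def concatenate(periods: List[List[Row]]) -> List[Row]:
    """Stack the joined table of each period into one full-year table."""
    return [row for table in periods for row in table]


def missing_type(rows: List[Row]) -> List[object]:
    """Permalinks of posts whose Type is blank, to be checked by hand."""
    return [r.get("Permalink") for r in rows if not r.get("Type")]
```

For each period, <code>join_tab</code> would be called once per extra tab (Tab 2, 3 and 4 with their respective columns), the per-period results passed to <code>concatenate</code>, and <code>missing_type</code> used to list the permalinks that need a manual check.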
</div>
</p>
 
</font></div>
 
<br/>
 
  
 
<div align="center">
 
<div align="center">
<div style=" width: 85%; background: #E6EDFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #8c8d94 solid 32px;"><font color="#5A6B96">Data Cleaning</font></div>
<div style=" width: 85%; background: #E6EDFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #8c8d94 solid 32px;"><font color="#5A6B96">Exploratory Data Analysis</font></div>
<div style="width:90%;">
</div>
 
 
 
<div align="left">
 
<div align="left">
<b>Instagram Data</b><br/>
<div style=" width: 85%; padding:75px; font-family: Arimo; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px;">
After scraping, we found that the data needed cleaning: some values had been shifted out of their correct columns, as seen here:
<font>
(image)
<!---------Enter Text Here ------->
We also concatenated the "tags" into a single column.
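A small sketch of the tag-concatenation step, assuming the scraper wrote each tag to its own column named <code>tag_1</code>, <code>tag_2</code>, and so on (a hypothetical naming). The column-shift repair shown in the image is not reproduced here, since the exact offset is specific to the scraped file.

```python
from typing import Dict, Optional


def merge_tag_columns(row: Dict[str, Optional[str]], prefix: str = "tag_") -> Dict[str, Optional[str]]:
    """Collapse tag_1, tag_2, ... into one space-separated 'tags' column.

    Assumes every tag column is named '<prefix><integer>' (hypothetical).
    """
    # Order tag columns numerically so tag_10 comes after tag_9, not after tag_1.
    tag_cols = sorted((c for c in row if c.startswith(prefix)),
                      key=lambda c: int(c[len(prefix):]))
    merged = {c: v for c, v in row.items() if not c.startswith(prefix)}
    # Skip empty cells so posts with fewer tags do not leave stray spaces.
    merged["tags"] = " ".join(row[c].strip() for c in tag_cols
                              if row[c] and row[c].strip())
    return merged
```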
</font>
</div></div>
</div>
 
</div>
 
 
<div style=" width: 85%; background: #E6EDFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #8c8d94 solid 32px;"><font color="#5A6B96">Exploratory Data Analysis</font></div>
 
 
<br/>
 
  
 
<div align="center">
 
<div style=" width: 85%; background: #E6EDFA; padding: 12px; font-family: Arimo; font-size: 18px; font-weight: bold; line-height: 1em; text-indent: 15px; border-left: #8c8d94 solid 32px;"><font color="#5A6B96">Final Application: Learning Dashboard</font></div>
 
</div>
<div align="left">
<div style=" width: 85%; padding:75px; font-family: Arimo; font-size: 14px; font-weight: bold; line-height: 1em; text-indent: 15px;">
<font>
<!---------Enter Text Here ------->
</font>
</div>
</div>
 
<br/>
 
  
 
</div>
 
</div>

Revision as of 16:30, 7 February 2018





Data Source

For data files from YouTube (Watch Time), the sponsor provided exported watch-time data in Excel format, split across several metric tabs. The tabs included are:

  1. Video
  2. Geography
  3. Date
  4. Subscription Status
  5. YouTube Product
  6. Device Type
  7. Subtitles and CC
  8. Video Information Language


For data files from YouTube (Demographics), the sponsor provided exported watch-time data broken down by viewer demographics, in Excel format with several metric tabs. The tabs included are:

  1. Viewer Age
  2. Viewer Gender


For data files from YouTube (Traffic Sources), the sponsor provided exported watch-time data broken down by traffic source type.

Data Preparation

Exploratory Data Analysis


Final Application: Learning Dashboard