IS480 Team wiki: 2017T1 ELIXIR FinalWiki
- 1 Project Progress Summary
- 2 Project Management
- 3 Technical Complexity
- 4 Quality of product
- 5 Intermediate Deliverables
- 6 User Testing
Project Progress Summary
Project Athena is an intranet solution. The demo will be performed via remote access into TSL.
- User Acceptance Test 3 was completed on 2 November
- Implemented Voting module (Up vote/Down vote for each post) in Sprint 11 (2 Oct - 15 Oct)
- Implemented Client Reporting module for TSL Facebook, Instagram and YouTube page in Sprint 12 (16 Oct - 29 Oct)
- Implemented a Telegram Bot to send daily reports containing new posts and the Top 10 viral posts across platforms in Sprints 12-13 (16 Oct - 12 Nov)
- Integrated Telegram Bot to automatically send daily reports to 90 employees at the end of Sprint 13
- Integrated Google Analytics to track user activities at the end of Sprint 13
Project Schedule has no changes since Mid Terms.
Burndown Charts (Mid Terms - Finals)
- Y-axis: Number of story points remaining
Sprint 12 shows a larger deviation between the Target and Ideal Burndown because the team was implementing a module with a relatively larger scope - the Client Reporting module. The module involves retrieving post and video statistics from TSL pages (across Facebook, Instagram and YouTube), which took longer to implement than planned. However, the team still delivered client reporting on time by the end of Sprint 12.
Overview of all Burndown Charts
Sprint 3 - 7: Before Acceptance
Sprint 8 - 10: Before MidTerms
Sprint 10 - 13: Before Finals
We had no major changes after Mid Terms.
Changes After Mid Terms
|Sprint No.|Date|Category|Change Request|Issuer|Description|Decision|Action Taken|Status|
|---|---|---|---|---|---|---|---|---|
|13|2 Nov|User Interface|For the client reporting module, retrieve data and group them under Reach, Engagement and Views.|Sponsor|This is important as we have to consider the right parameters when retrieving information from TSL pages.|Accept|Implementation to take place in Sprint 12.|Closed|
|13|2 Nov|User Experience|At the client reporting page, highlight to the user whether the URL entered is correct or wrong, according to the chosen platform.|Sponsor|This lets users see whether their input is likely valid or invalid (red: wrong, yellow: warning, green: correct).|Accept|Implementation to take place in Sprint 13 and assigned to Jia Kai.|Closed|
Significant Changes in Project Athena
|Sprint No.|Date|Category|Change Request|Issuer|Description|Decision|Action Taken|Status|
|---|---|---|---|---|---|---|---|---|
|7|19 August|Functionality|Crawler Module (Advanced): Enhancement of the Instagram & Website crawlers to enable pagination crawl|Jia Kai|This enhancement will enable a larger number of posts to be crawled at one time|Accept|Updated project schedule to take in this change. Features to be implemented in Sprint 8.|Closed|
|7|28 August|Functionality|Crawler Module: Facebook crawler will be dropped from project scope|Sponsor|Unable to get a Facebook Access Token for authorised use of the Graph API for Facebook crawling|Accept|Updated project scope to take in this change. Informed Supervisor about this change.|Closed|
|8|28 August|Functionality|Crawler Module (Advanced): Implement batch calls for RSS websites|Jia Kai|This is to reduce the number of calls on the Graph API|Accept|Updated project schedule to take in this change. Task will take place in Sprint 9.|Closed|
Top 3 Risks
|S/N|Risk Type|Risk Event|Likelihood|Impact|Level|Mitigation|
|---|---|---|---|---|---|---|
|1|Resource Risk|API failure|Low|High|B|Check for API updates across all platforms and legacy issues regularly|
|2|Technical Risk|Overloading of the crawler when there are too many sites for it to support|Medium|Medium|B|Implement multithreading and effective queue management where the server prioritises sites to crawl|
|3|Technical Risk|Benchmark threshold being set too high, leading to viral posts being missed|Medium|High|A|The threshold is initially set lower (10%) so that we do not miss viral posts; subsequent adjustments will be made to the threshold|
In our Mid Terms wiki, we introduced the technical complexity of the Instagram API, the YouTube API, website RSS standardisation, website job batching and using SQL statements to calculate statistics. For the Finals wiki, the technical complexity of the Telegram Bot and Client Reporting will be highlighted.
Requirement: Send a Daily Posts Summary Report on the top 10 viral posts and new posts for each platform in the last 24 hours (i.e. Telegram Report) on Telegram in PDF at 9.00 AM.
Problem Statement: While we want to create user friendly reports, we were faced with unfamiliar APIs (Telegram Bot API, iTextPDF, J2HTML).
We decided to include tables, headings, a content page, and links so that readers can read each post more easily, know which part of the report they are looking at, and visit a post with a click on its title in the PDF. There is only one mature Java library for generating PDFs, iTextPDF. Due to our unfamiliarity with the library, it would have been very time consuming even to change text alignment, let alone add and position tables. Hence, we decided to format our report in HTML before converting the HTML into a PDF.
It was tedious, error-prone, and time consuming to write HTML with raw Strings, as we had to handle the syntax ourselves. We decided to use a mature Java library, J2HTML, to write the HTML so that the library takes care of the syntax and we focus solely on formatting the content.
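The idea can be sketched with J2HTML as follows (a minimal sketch, not the team's actual code: `Post` is a hypothetical data class, and the j2html library is assumed to be on the classpath):

```java
import static j2html.TagCreator.*;
import java.util.List;

// Sketch: building one report section as an HTML fragment with J2HTML,
// to be converted into a PDF afterwards. The library guarantees the
// tags are well-formed, so we only worry about content.
public class ReportSectionBuilder {
    record Post(String title, String url, long views) {}  // hypothetical model

    static String buildSection(String heading, List<Post> posts) {
        return div(
            h2(heading),
            table(
                tr(th("Title"), th("Views")),
                each(posts, p -> tr(
                    td(a(p.title()).withHref(p.url())),  // clickable title in the PDF
                    td(String.valueOf(p.views()))
                ))
            )
        ).render();  // render() emits the HTML as a String
    }

    public static void main(String[] args) {
        System.out.println(buildSection("Top 10 Viral Posts",
            List.of(new Post("Sample post", "https://example.com/post/1", 1234))));
    }
}
```

The rendered String can then be fed to an HTML-to-PDF converter in the iTextPDF family.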
We needed to include a content page in the Telegram Report, as the number of daily new posts can often run into the hundreds. A content page allows the reader to click on a section and jump to the selected page.
To generate this content page, we generate the sections as individual reports before merging them all into one final report. The relevant data (viral posts and new posts in the last 24 hours) is retrieved from the database before being formatted as HTML for each of the sections.
Our unfamiliarity with iTextPDF made this implementation complex, as we needed to find the specific method to get the number of pages of each sub-section and implement the function linking a click on a section in the content page to the corresponding page number.
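The bookkeeping behind those content-page links can be sketched as follows. In this sketch the per-section page counts are taken as input; in practice they would come from iTextPDF (e.g. opening each generated sub-report with `PdfReader` and asking for its number of pages). The section names and counts below are illustrative only:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: given the page count of each sub-report (in merge order) and the
// number of pages the content page itself occupies, compute the page each
// section starts on in the merged PDF - the page its content-page link
// should jump to.
public class ContentPageOffsets {
    static Map<String, Integer> startPages(LinkedHashMap<String, Integer> sectionPages,
                                           int contentPages) {
        Map<String, Integer> start = new LinkedHashMap<>();
        int next = contentPages + 1;       // first section follows the content page
        for (var e : sectionPages.entrySet()) {
            start.put(e.getKey(), next);   // target page for this section's link
            next += e.getValue();          // advance past this sub-report
        }
        return start;
    }

    public static void main(String[] args) {
        var pages = new LinkedHashMap<String, Integer>();
        pages.put("Top 10 Viral Posts", 2);
        pages.put("New Posts (Facebook)", 5);
        pages.put("New Posts (Instagram)", 3);
        System.out.println(startPages(pages, 1));
        // {Top 10 Viral Posts=2, New Posts (Facebook)=4, New Posts (Instagram)=9}
    }
}
```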
Another requirement was to send the report via a Telegram Bot, which meant following the Telegram Bot HTTP API. We wrote our own class that uses this HTTP API to send the report via a POST request.
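A hedged sketch of that POST request: the Bot API's sendDocument method takes a multipart/form-data POST with a chat_id field and the file itself. The bot token and chat id below are placeholders, and the multipart builder is our own helper, not part of any library:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: sending the PDF report through Telegram's sendDocument endpoint.
public class TelegramReportSender {
    static final String BOUNDARY = "----AthenaReportBoundary";  // arbitrary multipart boundary

    // Hand-rolled multipart/form-data body with a chat_id part and a PDF part.
    static byte[] multipartBody(String chatId, String fileName, byte[] pdf) {
        String head = "--" + BOUNDARY + "\r\n"
                + "Content-Disposition: form-data; name=\"chat_id\"\r\n\r\n"
                + chatId + "\r\n"
                + "--" + BOUNDARY + "\r\n"
                + "Content-Disposition: form-data; name=\"document\"; filename=\""
                + fileName + "\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n";
        String tail = "\r\n--" + BOUNDARY + "--\r\n";
        byte[] h = head.getBytes(StandardCharsets.UTF_8);
        byte[] t = tail.getBytes(StandardCharsets.UTF_8);
        byte[] body = new byte[h.length + pdf.length + t.length];
        System.arraycopy(h, 0, body, 0, h.length);
        System.arraycopy(pdf, 0, body, h.length, pdf.length);
        System.arraycopy(t, 0, body, h.length + pdf.length, t.length);
        return body;
    }

    // Builds the POST request; sending it would be
    // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).
    static HttpRequest sendDocumentRequest(String botToken, String chatId, Path pdfFile)
            throws Exception {
        byte[] body = multipartBody(chatId, pdfFile.getFileName().toString(),
                                    Files.readAllBytes(pdfFile));
        return HttpRequest.newBuilder(
                URI.create("https://api.telegram.org/bot" + botToken + "/sendDocument"))
            .header("Content-Type", "multipart/form-data; boundary=" + BOUNDARY)
            .POST(HttpRequest.BodyPublishers.ofByteArray(body))
            .build();
    }
}
```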
Requirement: Based on the different media campaigns, gather the required statistics demonstrating the overall media reach and engagement. These statistics must be formatted into a report that users can download as a PDF file (the most widely used document format). The file can then be easily shared with TSL clients, showing the results of their campaigns.
Problem Statement: Retrieving statistics of posts and videos from the TSL Facebook Page using the Facebook Graph API was new to the team. For user experience, a dynamic form field was required, as a client can have multiple campaigns, each advertised on a different platform.
The complexity faced is highly related to the user experience of the input forms and the PDF report. We wanted dynamic fields allowing users to add and remove fields as needed. We initially used a loop to create multiple fields with different item ids. However, problems arose when users tried to delete a field in the middle of the list after adding: the easy fix is to reassign the ids of all field items on screen, which the team found inefficient, as many operations are triggered every time a field in the middle is deleted. After much research, we found that the best solution is a field array, which provides easy manipulation of the fields it contains. With this, we achieved an input form that minimises the chance of entering values for the wrong platform while giving users the flexibility to use shortcuts.
Dynamic field arrays:
Secondly, despite the assistance of the API documentation and the Graph API Explorer, using the Facebook Graph API and its token was a challenge. Many fields were needed for the statistics, which required tweaking and familiarisation with the Graph API query language - for example, using limit(0).summary(true) to get a compiled total, or avoiding wrong usage of the insights object.
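The limit(0).summary(true) pattern can be illustrated as below: it asks the Graph API to return only summary counts (e.g. total comments and reactions) rather than the individual items. The post id and token are placeholders, and the API version shown is an assumption from the project's timeframe:

```java
// Sketch: building a Graph API URL that fetches only aggregate counts for a
// post. limit(0) suppresses the item list; summary(true) adds total_count.
public class GraphApiQuery {
    static String statsUrl(String postId, String accessToken) {
        return "https://graph.facebook.com/v2.10/" + postId
            + "?fields=comments.limit(0).summary(true),reactions.limit(0).summary(true)"
            + "&access_token=" + accessToken;
    }

    public static void main(String[] args) {
        // Placeholder ids - a GET on the real URL returns JSON with
        // comments.summary.total_count and reactions.summary.total_count.
        System.out.println(statsUrl("1234567890_0987654321", "<ACCESS_TOKEN>"));
    }
}
```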
Example of improper usage of 'Insights':
Lastly, we had to overcome our unfamiliarity with iTextPDF. After exploration, we identified the specific method to get the number of pages of each sub-section and implemented the function linking a click on a section in the content page to the specified page number.
Quality of product
Master Slave Architecture for Scalability
To let our server handle the growing number of sites our client will need monitored in the future, we had to ensure it is scalable. This avoids overworking a single machine, or spending more resources than necessary on high-end computers to achieve the same outcome.
Therefore, Project Athena is designed with a Master-Slave architecture. The server provides web services to the crawlers, which are small programs able to monitor any platform and any type of job that comes their way. The crawlers use little memory, so a single computer can run many crawler instances depending on its hardware. The server does not do the automated crawling itself and focuses on its role as the allocator, hence the 'Master'. Below is an abstract of how the architecture and communication look:
Multithreading Queue Management
As the central server is the core manager of all crawlers, it provides the web service to handle all requests from the crawlers depending on the type of job that comes in. Each platform's jobs are unique, requiring different processing times and formats. Queue management is crucial, as jobs need to be allocated with priority so that they do not get stuck in the queue while a single platform is engaged. To monitor the social sites, we have three different types of jobs for each platform at any point in time:
- a job that gets X number of posts from the site itself, sorted from latest to earliest;
- a job that looks for posts on a site newer than the most recent post stored in the database;
- a job that goes into a post and retrieves all its statistical information to store in our database as a record.
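The priority handling described above can be sketched with a thread-safe priority queue. This is a minimal sketch under our own assumptions (the job type names and priority scheme are illustrative, not the team's actual code):

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Sketch: a multithread-safe job queue where crawlers always take the most
// urgent job first, and jobs of equal priority are served in FIFO order so
// no platform's jobs starve while another is engaged.
public class JobQueue {
    enum Type { FETCH_LATEST, FETCH_NEWER_THAN_DB, FETCH_POST_STATS }  // the three job types

    record Job(Type type, String site, int priority, long enqueuedAt) {}

    private static final Comparator<Job> ORDER =
        Comparator.comparingInt((Job j) -> j.priority())   // lower value = more urgent
                  .thenComparingLong(j -> j.enqueuedAt()); // FIFO within a priority

    private final PriorityBlockingQueue<Job> queue = new PriorityBlockingQueue<>(11, ORDER);

    void submit(Job job) { queue.put(job); }                               // called by the server
    Job take() throws InterruptedException { return queue.take(); }        // blocks until a job exists
    Job poll() { return queue.poll(); }                                    // non-blocking variant
}
```

A crawler thread would loop on take(), so many crawler instances can drain the same queue concurrently.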
Handling Different RSS Sites
Due to the large number of different websites required for crawling, the most appropriate method to retrieve their data is RSS. However, further research showed that there is no standardised RSS XML format across sites. In order to cater to the different RSS formats, we have multiple layers of format cleaning and filtering in place. This was achieved through numerous tests of our RSS reader and figuring out how to modify our code to accommodate each additional non-parsable site.
This is an illustration of how the Websites Crawler is operationalised:
In the process of XML Cleaning, there are 3 stages:
1) String replacement to standardise tags
2) Remove the XML declaration
3) Remove HTML code embedded in tags
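The three stages can be sketched with simple string operations. The concrete replacement rules below are illustrative examples only; the real cleaning rules are site-specific and were grown through testing:

```java
// Sketch of the three-stage XML cleaning pipeline for non-standard RSS feeds.
public class RssCleaner {
    static String clean(String rawXml) {
        String xml = rawXml;
        // 1) String replacement to standardise tags across RSS variants
        //    (example rule: map the dc:creator extension tag to plain author)
        xml = xml.replace("<dc:creator>", "<author>")
                 .replace("</dc:creator>", "</author>");
        // 2) Remove the XML declaration so feeds parse uniformly
        xml = xml.replaceAll("<\\?xml[^?]*\\?>", "");
        // 3) Strip HTML markup embedded inside tag content
        //    (only known HTML tags, so RSS tags like <image> survive)
        xml = xml.replaceAll("<(/?)(b|i|br|p|img)\\b[^>]*>", "");
        return xml.trim();
    }

    public static void main(String[] args) {
        System.out.println(clean(
            "<?xml version=\"1.0\"?><item><title>Hi <b>there</b></title></item>"));
        // <item><title>Hi there</title></item>
    }
}
```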
Reducing number of calls on Graph API through Job Batching
To stay below the Facebook Graph API rate limit, we use Graph API batch calling, where a maximum of 50 requests are compiled into a single request that counts as one API call. The central server therefore collates Graph API jobs into one job and sends it to a single crawler (specialised for the Graph API), since all crawlers use the same network. That crawler then performs the batch call.
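The batch payload itself is a JSON array of sub-requests POSTed to graph.facebook.com with a single access token. A minimal sketch of assembling it (JSON built by hand here for illustration; relative URLs are placeholders and assume they need no further escaping):

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: compiling up to 50 Graph API sub-requests into one batch payload,
// which Facebook counts as a single API call.
public class GraphBatchBuilder {
    static final int MAX_BATCH = 50;  // Graph API's per-batch limit

    // relativeUrls: e.g. "<post-id>?fields=comments.limit(0).summary(true)"
    static String batchPayload(List<String> relativeUrls) {
        if (relativeUrls.size() > MAX_BATCH)
            throw new IllegalArgumentException(
                "Graph API allows at most " + MAX_BATCH + " sub-requests per batch");
        return relativeUrls.stream()
            .map(u -> "{\"method\":\"GET\",\"relative_url\":\"" + u + "\"}")
            .collect(Collectors.joining(",", "[", "]"));
        // The result is sent as the 'batch' form parameter of a POST to
        // https://graph.facebook.com, alongside 'access_token'.
    }
}
```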
Old vs new format of calls:
|Topic of Interest|Link|
|---|---|
|Project Overview|Project Scope|
|Project Management|Project Schedule|
Access detailed versions of UATs here: