IS480 Team wiki: 2017T1 ELIXIR FinalWiki


Project Progress Summary

Elixir finalsprojectsumamry.jpg

Elixirfinalsslideicon.png Elixirfinalspostericon.png

Project Athena is an intranet solution; the demo will be performed via remote access into TSL.

Project Highlights

  • User Acceptance Test 3 was completed on 2 November
  • Implemented Voting module (Up vote/Down vote for each post) in Sprint 11 (2 Oct - 15 Oct)
  • Implemented Client Reporting module for TSL Facebook, Instagram and YouTube page in Sprint 12 (16 Oct - 29 Oct)
  • Implemented Telegram Bot to send daily reports containing new posts and Top 10 Viral posts across platforms in Sprints 12-13 (16 Oct - 12 Nov)
  • Integrated Telegram Bot to automatically send daily reports to 90 employees at the end of Sprint 13
  • Integrated Google Analytics to track user activities at the end of Sprint 13

Project Management

Project Status

Elixir finalsheader.jpg

Elixir finalsprogress.jpg

Project Schedule

Elixir finalizedschedule.png

The Project Schedule has not changed since Mid Terms.

Project Metrics

Bug Management

Elixir bug.png

Project Management

Elixir finalspm.png

Burndown Charts (Mid Terms - Finals)

  • Y-axis: Number of story points remaining
Elixir finals3sprints.png

Sprint 12 has a larger deviation between Target and Ideal Burndown because the team was implementing a module with a relatively larger scope, the Client Reporting module. The module involves retrieving post and video statistics from TSL pages (across Facebook, Instagram and YouTube), which took longer to implement than planned. However, the team still delivered client reporting on time by the end of Sprint 12.

Overview of all Burndown Charts

Sprint 3 - 7: Before Acceptance
Sprint 8 - 10: Before Mid Terms
Sprint 11 - 13: Before Finals

Elixir finalsscrum.png

Change Management

We had no major changes after Mid Terms.

Changes After Mid Terms

Sprint No. | Date | Category | Change Request | Issuer | Description | Decision | Action Taken | Status
13 | 2 Nov | User Interface | For the client reporting module, retrieve data and group them under Reach, Engagement and Views. | Sponsor | This is important as we have to consider the right parameters when retrieving information from TSL pages. | Accept | Implementation to take place in Sprint 12. | Closed
13 | 2 Nov | User Experience | At the client reporting page, highlight to the user whether the URL entered is correct or wrong, according to the chosen platform. | Sponsor | This is for users to see whether their input is likely valid or invalid (red: wrong, yellow: warning, green: correct). | Accept | Implementation to take place in Sprint 13, assigned to Jia Kai. | Closed

Significant Changes in Project Athena

Sprint No. | Date | Category | Change Request | Issuer | Description | Decision | Action Taken | Status
7 | 19 August | Functionality | Crawler Module (Advanced): enhancement of the Instagram & Website crawlers to enable pagination crawl | Jia Kai | This enhancement enables a larger number of posts to be crawled at one time. | Accept | Updated project schedule to take in this change. Features to be implemented in Sprint 8. | Closed
7 | 28 August | Functionality | Crawler Module: Facebook crawler will be dropped from the project scope | Sponsor | Unable to get a Facebook Access Token for authorised use of the Graph API for Facebook crawling. | Accept | Updated project scope to take in this change. Informed Supervisor about this change. | Closed
8 | 28 August | Functionality | Crawler Module (Advanced): implement batch calls for RSS websites | Jia Kai | This is to reduce the number of calls on the Graph API. | Accept | Updated project schedule to take in this change. Task will take place in Sprint 9. | Closed

Project Risks

Top 3 Risks

S/N | Risk Type | Risk Event | Likelihood | Impact | Level | Mitigation
1 | Resource Risk | API failure | Low | High | B | Check regularly for API updates and legacy issues across all platforms
2 | Technical Risk | Crawler overload when there are too many sites for the crawler to support | Medium | Medium | B | Implement multithreading and effective queue management, where the server prioritises sites to crawl
3 | Technical Risk | Benchmark being set too high, leading to viral posts being missed | Medium | High | A | Threshold is set lower initially (10%) so that we do not miss viral posts; subsequent adjustments will be made to the threshold

Technical Complexity

In our Mid Terms wiki, we covered the technical complexity of the Instagram API, the YouTube API, website RSS standardisation, website job batching and using SQL statements to calculate statistics. For the Finals wiki, we highlight the technical complexity of the Telegram Bot and Client Reporting.

Telegram Bot

Requirement: Send a Daily Posts Summary Report covering the Top 10 viral posts and the new posts on each platform in the last 24 hours (i.e. the Telegram Report), delivered on Telegram as a PDF at 9.00 AM.
Problem Statement: While we wanted to create user-friendly reports, we were faced with unfamiliar APIs and libraries (Telegram Bot API, iTextPDF, J2HTML).

We decided to include tables, headings, a content page, and links, so that readers can read each post more easily, know which part of the report they are looking at, and visit a post with a click on its title in the PDF. The only mature Java library we found for generating PDFs is iTextPDF. Due to our unfamiliarity with the library, it would be very time-consuming even to change text alignment, let alone add and position tables. Hence, we decided to format our report in HTML before converting the HTML into a PDF.


Writing HTML by concatenating Strings was tedious, error-prone and time-consuming, as we had to manage the syntax ourselves. We therefore used a mature Java library, J2HTML, to write the HTML file, relying on the library for syntax so we could focus solely on formatting the content.


We needed to include a content page in the Telegram Report, as the number of daily new posts can run into the hundreds. The content page lets the reader click a section and jump to the corresponding page.


To generate this content page, we generate the sections as separate reports before merging them all into one final report. The relevant data (viral posts and new posts in the last 24 hours) is retrieved from the database and then formatted into HTML for each section.


Our unfamiliarity with iTextPDF made this implementation complex: we needed to use specific methods to obtain the page counts of the sub-sections, and to implement the links from the content page entries to the correct page numbers.
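The page-number bookkeeping behind the content page can be sketched with plain arithmetic. In this hypothetical sketch, the per-section page counts are assumed to come from iTextPDF after each sub-report is rendered; the class and method names are illustrative, not the team's actual code:

```java
import java.util.ArrayList;
import java.util.List;

public class ContentPageOffsets {
    // Given the page count of each generated sub-report (e.g. obtained from
    // iTextPDF after rendering each section), compute the page on which each
    // section starts in the merged report, so that the content page entries
    // can link to the right page numbers.
    static List<Integer> startingPages(int contentPages, List<Integer> sectionPageCounts) {
        List<Integer> starts = new ArrayList<>();
        int next = contentPages + 1; // first section begins right after the content page
        for (int pages : sectionPageCounts) {
            starts.add(next);
            next += pages;
        }
        return starts;
    }
}
```

For example, with a one-page content page and sections of 3, 5 and 2 pages, the sections would start on pages 2, 5 and 10 of the merged report.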



We also had to send the report via a Telegram Bot. Telegram requires us to use its set of HTTP Bot APIs to do so, so we wrote our own class that uses these HTTP APIs to send the report via a POST request.
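A minimal sketch of building such a POST request with the JDK's own HTTP client is shown below. The token and chat id are placeholders; note that sending the actual PDF would use the Bot API's sendDocument endpoint with a multipart body, while sendMessage is used here to keep the sketch simple:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class TelegramSendSketch {
    // Builds (but does not send) a Telegram Bot API sendMessage request.
    // "token" is the bot token and "chatId" the target chat, both placeholders.
    static HttpRequest buildSendMessage(String token, String chatId, String text) {
        String body = "chat_id=" + URLEncoder.encode(chatId, StandardCharsets.UTF_8)
                    + "&text=" + URLEncoder.encode(text, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder()
                .uri(URI.create("https://api.telegram.org/bot" + token + "/sendMessage"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The built request would then be dispatched with `HttpClient.send(...)` on the 9.00 AM schedule.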


Client Reporting

Requirement: Based on the different media campaigns, gather the statistics demonstrating the overall media reach and engagement. These statistics must be formatted into a report that users can download as a PDF file (the most widely used document format). The file can then be easily shared with TSL clients to show the results of their campaigns.
Problem Statement: Retrieving statistics of posts and videos from the TSL Facebook Page using the Facebook Graph API was new to the team. For a good user experience, dynamic form fields were required, as a client can have multiple campaigns, each advertised on a different platform.

The complexity here relates to the user experience of the input forms and the PDF report. We wanted dynamic fields that let users add and remove fields as needed. We initially used a loop to create multiple fields, each with a different item id. However, problems arose when users deleted fields in the middle of the list after adding: the simple fix is to re-assign the ids of all field items on the screen, but the team found this inefficient, as many operations are triggered every time a user deletes a middle field. After much research, we found that the best solution is a field array, which provides easy manipulation of the fields. With this, we achieved an input form that minimises the chance of entering values for the wrong platform, and gives users the flexibility to use shortcuts.
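The core idea behind the field array can be sketched language-neutrally in Java (the actual implementation is in the front end; the class and field names here are hypothetical): each field keeps a stable id assigned once, so deleting a middle field never forces re-numbering the others.

```java
import java.util.ArrayList;
import java.util.List;

public class CampaignFields {
    // One dynamic form field: a stable id plus the platform chosen for it.
    record Field(int id, String platform) {}

    private final List<Field> fields = new ArrayList<>();
    private int nextId = 0;

    // Adding assigns a fresh id; existing ids are never reused or shifted.
    int add(String platform) { fields.add(new Field(nextId, platform)); return nextId++; }

    // Removing a middle field is a single delete; no other field is touched.
    void remove(int id)      { fields.removeIf(f -> f.id() == id); }

    List<Integer> ids()      { return fields.stream().map(Field::id).toList(); }
}
```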

Dynamic field arrays:


Secondly, despite the API documentation and the Graph API Explorer, using the Facebook Graph API and its tokens was a challenge. Many fields were needed for the statistics, which required tweaking and familiarisation with the Graph API's query language - for example, using limit(0).summary(true) to get the compiled total value, or avoiding incorrect usage of the insights object.
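As a sketch, a statistics query for one post can be assembled as a field expansion string. The limit(0).summary(true) modifier asks the Graph API to skip the individual like/comment objects but still return the compiled totals; the API version, post id and token below are placeholders:

```java
public class GraphApiQuery {
    // Builds a Graph API URL requesting compiled like/comment totals and
    // shares for one post. limit(0) suppresses the individual entries;
    // summary(true) still returns the aggregate counts.
    static String statsUrl(String postId, String accessToken) {
        String fields = "likes.limit(0).summary(true),"
                      + "comments.limit(0).summary(true),"
                      + "shares";
        return "https://graph.facebook.com/v2.10/" + postId
             + "?fields=" + fields
             + "&access_token=" + accessToken;
    }
}
```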


Example of improper usage of 'Insights':


Lastly, we had to overcome our unfamiliarity with iTextPDF. After some exploration, we used the specific method to obtain the page counts of the sub-sections, and implemented the links from the content page sections to the specified page numbers.

Quality of product

Master Slave Architecture for Scalability

To handle the growing number of sites our client will need to monitor in the future, we had to ensure the server is scalable. This avoids overworking a single machine, or having to acquire high-end computers with more resources than necessary.
Project Athena is therefore designed with a Master-Slave architecture. The server provides web services to the crawlers, which are lightweight programs able to monitor any platform and handle any type of job that comes their way. The crawlers are small and take little memory; a single computer can run many crawler instances, depending on its hardware. The server does not do the automated crawling itself and focuses on its role as the allocator, hence the 'Master'. Below is an overview of the architecture and communication:


Multithreading Queue Management

As the central server is the core manager of all crawlers, it provides the web service that handles requests from the crawlers depending on the type of job that comes in. Each platform's jobs are unique, take different times to process, and use different formats. Queue management is crucial: jobs must be allocated with priority so that they do not get stuck in the queue while a single platform is engaged. To monitor the social sites, we have three types of jobs for each platform at any point in time:

  • A job that gets X number of posts from the site itself, sorted from latest to earliest
  • A job that looks for posts on a site newer than the most recent post stored in the database
  • A job that goes into a post and retrieves all its statistical information to store as a record in our database
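The prioritised queue can be sketched with the JDK's thread-safe PriorityBlockingQueue. The job types mirror the three described above; the numeric priorities here are illustrative (lower number = served first), not the team's actual scheme:

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class JobQueue {
    // The three job types described above.
    enum Type { LATEST_POSTS, NEW_POSTS, POST_STATISTICS }

    record Job(String platform, Type type, int priority) {}

    // Thread-safe priority queue: crawler threads can poll concurrently while
    // the server enqueues, and quick jobs are not starved behind a long
    // statistics crawl for a single platform.
    private final PriorityBlockingQueue<Job> queue =
            new PriorityBlockingQueue<>(11, Comparator.comparingInt(Job::priority));

    void submit(Job job) { queue.add(job); }

    Job next() { return queue.poll(); }
}
```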

Handling Different RSS Sites

Due to the large number of different websites that must be crawled, the most appropriate way to retrieve their data is through RSS. However, further research showed that there is no standardised XML format. To cater to the different RSS formats across websites, we have multiple layers of format cleaning and filtering in place. This was achieved by repeatedly testing our RSS reader and modifying our code to fit each additional non-parsable site.

This is an illustration of how the Websites Crawler is operationalised:


In the process of XML Cleaning, there are 3 stages:
1) String replacement to standardise tag
2) Remove xml declaration
3) Remove html codes in tags
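The three stages above can be sketched as one cleaning pass. The tag replaced in stage 1 and the HTML tags stripped in stage 3 are illustrative examples only; the real crawler handles many more site-specific variants:

```java
public class RssXmlCleaner {
    // Applies the three cleaning stages to a raw RSS payload.
    static String clean(String rawXml) {
        // 1) String replacement to standardise tags
        //    (example: map Dublin Core dates onto the standard <pubDate> tag)
        String xml = rawXml.replace("<dc:date>", "<pubDate>")
                           .replace("</dc:date>", "</pubDate>");
        // 2) Remove the XML declaration
        xml = xml.replaceAll("<\\?xml[^>]*\\?>", "");
        // 3) Remove HTML markup left inside tag bodies (illustrative subset)
        xml = xml.replaceAll("<(b|i|br|p|/b|/i|/p)\\s*/?>", "");
        return xml.trim();
    }
}
```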

Reducing number of calls on Graph API through Job Batching

To stay below the Facebook Graph API rate limit, we use Graph API batch calling, where a maximum of 50 requests are compiled into a single request that counts as 1 API call. The central server therefore collates Graph API jobs into one job and sends it to a single crawler that specialises in the Graph API (since all crawlers use the same network). That crawler then performs the batch call.
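The collation step can be sketched as simple list partitioning, with the Graph API's documented 50-request batch limit as the chunk size (the request representation here is a placeholder string per post):

```java
import java.util.ArrayList;
import java.util.List;

public class GraphApiBatcher {
    static final int MAX_BATCH_SIZE = 50; // Graph API's documented batch limit

    // Splits the collated Graph API jobs into batches of at most 50 requests;
    // each batch is sent as one HTTP request and counts as one API call.
    static List<List<String>> toBatches(List<String> requests) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < requests.size(); i += MAX_BATCH_SIZE) {
            batches.add(requests.subList(i, Math.min(i + MAX_BATCH_SIZE, requests.size())));
        }
        return batches;
    }
}
```

For 120 pending requests this yields three API calls (50 + 50 + 20) instead of 120.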

Old vs new format of calls:


Intermediate Deliverables

Topic of Interest | Link
Project Overview | Project Scope, X factor
Project Management | Project Schedule, Sprint Backlog, Risk Management, Change Management
Documentation | Diagrams

User Testing


Elixir finalsUATdescription.png

Access detailed versions of UATs here:

Elixir ut1icon.png       Elixir ut2icon.png       Elixir uat3icon.png      


Elixir finalsreflection.png