IS480 Team wiki: 2015T2 MineSweep Final


Team MineSweep

Executive Summary

This project is sponsored by Dentsu Aegis (DA), a multinational PR and marketing firm. Our client wants to discover what consumers are saying online about its customers' products before and after its campaigns or PR strategies.

In this project, we aim to gather as much data as possible from the social media platform Twitter and the popular e-commerce website Qoo10.sg. This data is then used for an initial analysis to determine the key topics that users are discussing in their tweets or product reviews. Our application generates these key topics and allows the data analysts to label them. These labels can be, but are not limited to, the stages of the consumer journey provided by our client, as shown below.


After the marketing / PR strategy is complete, the team can return to the application and retrieve data once more. They can then compare the key topics from before and after their strategy, and use the difference to measure the effectiveness of that strategy.

Project Progress

Key Milestones



Challenges Faced

Over the course of the project, our group faced both changing requirements and technical challenges.

Our project is intensive in 2 aspects:

  • Collection of Data
  • Analysis of Data

These two key areas are computationally intensive, and issues arose in several areas.


The application consists of a linear sequence of events:

  1. Social media feeds are pulled;
  2. Feeds go through a series of text-processing and transformation steps;
  3. The processed feeds are passed as input to the Latent Dirichlet Allocation (LDA) algorithm, which groups them into clusters of topics.
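
The sequence above can be sketched in Java as follows. This is a minimal illustration rather than the team's actual code: the class name `FeedPipeline`, the `TopicModel` interface, the sample feeds and the tiny stop-word list are all assumptions made for the example, and the feed-pulling and topic-modelling steps are replaced by stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative sketch only: all names here are hypothetical, not MineSweep's code. */
class FeedPipeline {

    // Tiny stand-in stop-word list; a real deployment would use a much larger one.
    private static final Set<String> STOP_WORDS = Set.of("the", "a", "is", "and", "to", "of");

    /** Step 1 stand-in: the real system would call Twitter or scrape Qoo10 here. */
    static List<String> pullFeeds() {
        return List.of(
            "The delivery is FAST and the price is great! http://example.com/x",
            "Battery life of this phone is poor");
    }

    /** Step 2: lower-case, strip URLs and punctuation, tokenise, drop stop words. */
    static List<String> preprocess(String feed) {
        String cleaned = feed.toLowerCase(Locale.ROOT)
            .replaceAll("https?://\\S+", " ")
            .replaceAll("[^a-z0-9\\s]", " ");
        List<String> tokens = new ArrayList<>();
        for (String token : cleaned.trim().split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Step 3 stand-in: whatever component performs the topic modelling. */
    interface TopicModel {
        void train(List<List<String>> documents, int numTopics);
    }

    /** Runs the three steps strictly in series and returns the processed documents. */
    static List<List<String>> run(TopicModel model, int numTopics) {
        List<List<String>> documents = new ArrayList<>();
        for (String feed : pullFeeds()) {
            documents.add(preprocess(feed));
        }
        model.train(documents, numTopics);
        return documents;
    }

    public static void main(String[] args) {
        run((docs, k) -> System.out.println("Would train " + k + " topics on " + docs), 2);
    }
}
```

Because each step consumes the output of the previous one, the sketch has no opportunity to overlap them for a single batch of feeds, which is the dependency described in the challenges below.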

Technical Challenges

  • Each of these steps is important, computationally intensive and time-consuming.
  • Each step depends on the output of the previous one, so the steps must run in series.
  • We had to write very efficient code and modify existing open-source code to squeeze more performance out of the above three steps.

Changing Requirements

  • Key requirements were discussed at the onset of the project, and the team took the initial confirmed requirements to Acceptance;
  • However, our client later decided that our project could be more in-depth, and added requirements just prior to mid-terms. These scope change discussions will be explored in later sections.
  • These changing requirements proved to have a significant impact on our timeline and development progress.

Key Achievements

  • 3 successful user testing sessions
  • Usage of application by client for an ongoing project

Project Management

Project Scope

Our project scope changed between our major milestones. These changes were due both to technical difficulties that could not be overcome and to client requests. We implemented the changes accordingly.

Acceptance Scope

Post-Acceptance Update:
Because Facebook changed its privacy regulations, we brought forward the Qoo10 scrape and removed the Facebook scrape function.

Mid-Term Scope


Just prior to the mid-term presentation, we received new client requests on the 23rd of February:

New Client Requests:

  • 2-Step Process:
  1. For the first run of data retrieval, the application will carry out topic modelling to establish ground truth. The user will then assign topic names to keywords, and the application will assign a percentage value to each keyword for the topics named by the user.
  2. For subsequent runs, the application will assign keywords to topics based on the ground truth established in the first run above.

  • Flexibility of Application:
  1. Allow users to modify keywords / documents from certain topics.
  2. Allow users to define k number of topics.
  3. Allow for scraping of reviews from more Singapore-based e-commerce websites.
  4. Allow for users to import data retrieved from other sources into application for analysis.

  • Usability of Application:
  1. Allow for users to scroll through multiple tweets, instead of viewing just one most relevant tweet.

  • Other Inputs for User Interface:
  1. Keyword input should support AND in addition to OR
  2. Provide a location filter
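
To make the requested 2-step process more concrete, the sketch below shows one simple way a ground truth of labelled keywords could be reused in later runs. It is an assumption-laden simplification and not the method used in the application: it scores a document by summing the stored keyword weights per topic, whereas the real system relies on a modified Mallet LDA. The class `GroundTruthAssigner`, its methods, and the labels "Awareness" and "Purchase" are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: a simplified stand-in for reusing first-run ground truth. */
class GroundTruthAssigner {

    // keyword -> (user-defined topic label -> weight between 0 and 1), set in the first run
    private final Map<String, Map<String, Double>> keywordWeights = new HashMap<>();

    /** First run: record the weight of a keyword under a topic named by the user. */
    void label(String keyword, String topicLabel, double weight) {
        keywordWeights.computeIfAbsent(keyword, k -> new HashMap<>()).put(topicLabel, weight);
    }

    /** Subsequent runs: pick the label with the highest summed keyword weight. */
    String assign(List<String> tokens) {
        Map<String, Double> scores = new HashMap<>();
        for (String token : tokens) {
            Map<String, Double> weights = keywordWeights.get(token);
            if (weights == null) {
                continue; // token was not part of the ground truth
            }
            weights.forEach((topic, weight) -> scores.merge(topic, weight, Double::sum));
        }
        String best = "unassigned";
        double bestScore = 0.0;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        GroundTruthAssigner assigner = new GroundTruthAssigner();
        assigner.label("cheap", "Purchase", 0.8);   // hypothetical labels and weights
        assigner.label("cheap", "Awareness", 0.2);
        assigner.label("advert", "Awareness", 0.9);
        System.out.println(assigner.assign(List.of("saw", "cheap", "advert")));
    }
}
```

Keeping the ground truth as a plain keyword-to-weight map also leaves room for the flexibility requests above, since a user edit to a topic's keywords becomes a change to one map entry.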

Final Scope


Project Timeline


Project Metrics

Schedule Metrics

Mnswp-Schedule Metrics.png

Over the course of our project, there were no significant schedule anomalies in development. Our core issue was managing expectations rather than keeping to the schedule.

Bug Metrics

Mnswp-Bug Metrics.png

Project Risks



Risk: Changing requirements from client.

Mitigation: Meet more frequently with the client, to lower the risk of misunderstanding and to help them understand why we take up certain functions and are unable to do others.

Risk: Insufficient data is pulled from social media sites during a certain period.

Mitigation: Implement an option to upload data that was pulled previously and stored, or to upload data that has been purchased.

Risk: Team member is unable to attend meetings / carry out work due to personal reasons.

Mitigation: Members of the team are to be constantly updated on one another's progress, so that someone can take over if a member is unavailable.

Project Details

Final Deliverables

Key Features

Data Retrieval


Data Analysis


Analysis Comparison


Additional Features

The additional features that set our project apart from others include:

  • At any one time, our database supports 1 to 5 million records;
  • 3 separate modular software components that work seamlessly together (appearing to the user as one single web app);
  • Usage of a large open-source LDA implementation that had to be modified to improve performance and to provide the "Comparison Analysis" feature;
  • Usage of 1 database server, 1 background processing server and 1 web application server;
  • Written using Java and Linux shell script.

Technology Used

Front End

  • JavaScript/jQuery
  • HTML
  • CSS3
  • BootstrapUI Framework
  • d3.js Chart Framework

Purpose: implementing an intuitive user interface.

Limitations:

  • These are the best available libraries, but their functionality may be limited.
  • Customisation is difficult because the libraries are written with differing programming conventions and concepts.

Mitigation:

  • Contribute to the open-source code and convince the original authors to accept the changes.
  • Research on sites like Stack Overflow and learn from the tech/development community.

Back End

  • Language(s): Java
  • Libraries: Twitter4J
  • Database: MySQL
  • Host: Amazon Web Services
  • Software: NetBeans IDE, Sublime Text 2

Purpose: back-end functions support the front-end features.

Limitations:

  • Scraping of irrelevant data
  • Storage space on host

Data Analysis

  • JMallet

Purpose: clustering and topic modelling.

Reasons for choice:

  • Clustering and topic modelling tools are typically written in R. However, Mallet is written in Java, which reduces our learning curve.
  • Tutorial materials are comprehensively documented at http://mallet.cs.umass.edu/topics-devel.php
  • Google Topic Modelling Toolkit makes use of the Mallet API as its base source code, lending credence to Mallet as a viable tool.

Limitations:

  • Relative to R, there is a lack of an active online support community. Should we run into issues in development, we might not be able to get timely support.

Mitigation:

  • Consider other packages in R with similar functionalities, in the event that a particular issue is impossible to resolve.

Technical Complexity

Configuring Mallet

A fully independent software module was written to serve as a wrapper on top of Mallet, the open-source software that implements the LDA algorithm. Originally, Mallet writes its results to multiple text files on the server's hard drive.

However, this was a slow and space-consuming process. As such, Mallet was modified to save all results directly to the MySQL database. This direct integration with MySQL increased Mallet's execution speed and also freed up storage space on the server.
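
As a rough illustration of writing topic results straight to the database instead of to text files, the sketch below batches keyword weights into a single JDBC transaction. It does not reproduce the team's Mallet modifications: the table `topic_keyword`, its columns, and the `TopicResultWriter` class are assumptions made for the example, and the topic data is passed in as plain maps rather than read from Mallet.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: the table, columns and class names are hypothetical. */
class TopicResultWriter {

    /** One row of an assumed table topic_keyword(project_id, topic_id, keyword, weight). */
    static final class Row {
        final int projectId;
        final int topicId;
        final String keyword;
        final double weight;

        Row(int projectId, int topicId, String keyword, double weight) {
            this.projectId = projectId;
            this.topicId = topicId;
            this.keyword = keyword;
            this.weight = weight;
        }
    }

    static final String INSERT_SQL =
        "INSERT INTO topic_keyword (project_id, topic_id, keyword, weight) VALUES (?, ?, ?, ?)";

    /** Flattens per-topic keyword weights (list index = topic id) into insertable rows. */
    static List<Row> flatten(int projectId, List<Map<String, Double>> topics) {
        List<Row> rows = new ArrayList<>();
        for (int topicId = 0; topicId < topics.size(); topicId++) {
            for (Map.Entry<String, Double> entry : topics.get(topicId).entrySet()) {
                rows.add(new Row(projectId, topicId, entry.getKey(), entry.getValue()));
            }
        }
        return rows;
    }

    /** Sends all rows as one batch inside one transaction, rolling back on failure. */
    static void save(Connection connection, List<Row> rows) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            for (Row row : rows) {
                statement.setInt(1, row.projectId);
                statement.setInt(2, row.topicId);
                statement.setString(3, row.keyword);
                statement.setDouble(4, row.weight);
                statement.addBatch();
            }
            statement.executeBatch();
            connection.commit();
        } catch (SQLException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }

    public static void main(String[] args) {
        // No database is needed to see the flattened rows that would be inserted.
        Map<String, Double> topic0 = new LinkedHashMap<>();
        topic0.put("delivery", 0.6);
        topic0.put("fast", 0.4);
        for (Row row : flatten(1, List.of(topic0))) {
            System.out.println(row.topicId + " " + row.keyword + " " + row.weight);
        }
    }
}
```

Calling `save` requires a live `Connection` (for example from a MySQL JDBC driver), which is deliberately left out of the sketch.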

Separate Services

These steps also had to be split into separate software modules for better performance and easier code maintenance. We therefore implemented 3 separate software modules in this project:

  1. Twitter and Qoo10 pulling web service
  2. LDA (i.e. Topic Model) web service
  3. Web application that provides project management, a web-based interface to the aforementioned web services, and data visualization for the results from topic modelling.

Separate Servers

Mnswp-Archi Structure.png

For this project, our team implemented 2 servers:

  • The first server is dedicated to data collection and analysis.
    • We assume that data pulling will occur regularly, in parallel, and for extended periods of time.
    • This is because Dentsu Aegis runs multiple projects at the same time.
    • Each project is either in the "pulling feeds" phase or the "analysing data" phase.
  • The second server is dedicated to the web application.
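
The workload assumed for the first server, with several projects running side by side while each one moves through its own phases in order, might be modelled as in the sketch below. This is only an assumption about how such scheduling could look: the `ProjectScheduler` class, the `Phase` enum and the worker count are hypothetical, and the phases are placeholders that do no real pulling or analysis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative sketch only: names and structure are hypothetical. */
class ProjectScheduler {

    enum Phase { PULLING_FEEDS, ANALYSING_DATA, DONE }

    static final class Project {
        final String name;
        final List<Phase> history = new ArrayList<>();

        Project(String name) {
            this.name = name;
        }

        /** The phases of one project run in series on a single worker thread. */
        void run() {
            history.add(Phase.PULLING_FEEDS);  // placeholder for feed collection
            history.add(Phase.ANALYSING_DATA); // placeholder for topic modelling
            history.add(Phase.DONE);
        }
    }

    /** Runs different projects in parallel and waits until all of them finish. */
    static void runAll(List<Project> projects, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Project project : projects) {
                Runnable task = project::run;
                pending.add(pool.submit(task));
            }
            for (Future<?> future : pending) {
                future.get(); // block until this project has completed
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for projects", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A project failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Project> projects = List.of(new Project("Campaign A"), new Project("Campaign B"));
        runAll(projects, 2);
        for (Project project : projects) {
            System.out.println(project.name + " -> " + project.history);
        }
    }
}
```

Keeping this long-running work off the web application server means a slow pull or analysis never competes with user requests for the same machine, which is the motivation given for the split.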

User Experience

User Testing

User Testing 1 - 28/01/2015

  • Gather information to improve user experience

Number of Participants: 10 DA Data Analysts

User Testing 2 - 23/02/2015

  • Gather information to improve user experience
  • Test load capacity of application
  • Test new functions

Number of Participants: 20 DA Data Analysts and SMU undergraduates

User Testing 3 - 10/04/2015

  • Allow users to interact with final application
  • Gather feedback

Number of Participants: 10 DA Data Analysts


Deployed Application

Final Reflections

Team Reflections

  • Rachel Yap

As project manager for MineSweep, I learned how to interact with our client as well as manage their expectations. Initially, when they requested scope changes, I was unable to properly balance the interests of the client with the interests of the team. I learned how to really exercise mitigation plans when risks become reality. This became increasingly pronounced as the project progressed, because the client's scope change requests did not cease. Toward the end of the project, I learned how to scope the work to best benefit both the team and the client.

  • In Jin Zaw

Initially, I was not very sure what kind of business value this project could bring to the users. In addition, at the beginning of the project, there was a steep learning curve with regard to the algorithms, and we slowly but surely overcame this challenge. During the whole year, I learnt many useful things, both life skills and academic knowledge. With respect to life skills, I learned how to balance meeting the client's expectations with ensuring that the team's capabilities matched those expectations. As we were the only FYP group dealing with social media analytics, this project has really enriched my learning journey in university as an analytics major. Towards the end of this project, after going through many insightful discussions with the support of the team, profs and client, and getting feedback from testers, I am very sure that the project we have undertaken can be very valuable to whoever uses it. Gaining this confidence has been very uplifting and is definitely a source of motivation to create an even better application.

  • Jedaiah Tan

This project taught me tenacity, as I had to familiarise myself with the LDA algorithm and be able to explain it to anyone and everyone who asked. As an analytics student, it really gave me direction in what I might want to pursue in the future.

  • Kim Evangelista

This project gave me a crash course on consumer behavior and analytics. I learned a lot about data acquisition from utilizing the APIs provided by social media entities. Furthermore, the project was technically challenging given its nature of intensive data collection and analytics. Overall, it was a rewarding experience working on this project together with the talented team members of MineSweep.

  • Mohamed Safiullah

I have learnt to evaluate different user interface design principles, such as flat and material design, through self-exploration. I have also learnt about the different visualization techniques for presenting insightful information on social media feeds. Furthermore, due to the technical complexity of our project, it was difficult to integrate the backend business logic with the user interface. However, I've learnt that exposing the backend business logic as a web service makes it easy to integrate the user interface with it.

Client Feedback

It helps us understand how social media works, and it gives us some direction of how we’re going to improve. Nobody in the market has tried to create this type of product as of yet.