
IS480 Team wiki: 2015T2 MineSweep Final

Revision as of 00:05, 11 April 2015 by Rachel.yap.2012 (talk | contribs)


Executive Summary

This project is sponsored by Dentsu Aegis (DA), a multi-national PR and marketing firm. Our client is interested in discovering what consumers are saying online about their customers' products before and after their campaigns or PR strategies.

In this project, we aim to gather as much data as possible from the social media platform Twitter and the popular e-commerce website Qoo10.sg. This data is then used for an initial analysis to determine the key topics that users discuss in their tweets or product reviews. Our application generates these key topics and allows the data analysts to label them. These labels can be, but are not limited to, the stages of the consumer journey provided by our client, shown below.

Minesweep-Consumer-journey.png



After the marketing / PR strategy is complete, the team can return to the application and retrieve data once more. They can then compare the key topics from before and after the strategy, and use the difference to measure how effective the strategy was.
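The before-and-after comparison described above can be pictured with a small sketch. It is an illustration only, not the application's code: the class name `TopicShift`, the idea of representing each run as a map from topic label to its share of documents, and the labels used in the usage note are all assumptions.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: compares topic shares from two data-retrieval runs.
class TopicShift {

    // Returns, for every topic seen in either run, (share after) minus (share before).
    // A topic missing from one run is treated as having a share of 0.
    static Map<String, Double> delta(Map<String, Double> before, Map<String, Double> after) {
        Set<String> topics = new TreeSet<>(before.keySet());
        topics.addAll(after.keySet());

        Map<String, Double> change = new TreeMap<>();
        for (String topic : topics) {
            double was = before.getOrDefault(topic, 0.0);
            double now = after.getOrDefault(topic, 0.0);
            change.put(topic, now - was);
        }
        return change;
    }
}
```

For example, if a hypothetical "Purchase" topic covered a quarter of the documents before a campaign and half of them afterwards, `delta` would report a change of 0.25 for that topic. A positive value means the topic was discussed more after the strategy.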

Project Progress

Key Milestones

Acceptance

Mid-Term

Challenges Faced

Over the course of the project, our group faced both changing requirements and technical challenges.

Our project is intensive in two aspects:

  • Collection of Data
  • Analysis of Data


Both areas are computationally expensive, and issues arose in each of them.

3-step-tech-difficulty.png


The application runs as a linear sequence of three steps:

  1. Social media feeds are pulled;
  2. The feeds go through a series of text-processing and transformation steps;
  3. The processed feeds are passed as input to the Latent Dirichlet Allocation (LDA) algorithm, which groups them into clusters of topics.
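The text-processing step in the sequence above can be illustrated with a minimal sketch. This is not the team's code: the class name `FeedPreprocessor`, the stop-word list and the regular expressions are assumptions chosen for illustration. It lowercases a feed, strips URLs, @mentions and punctuation, and drops stop words, so that only candidate topic words would reach the LDA step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; the stop-word list below is a tiny illustrative sample.
class FeedPreprocessor {

    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "the", "is", "it", "to", "and", "of", "i", "this"));

    // Turns one raw tweet or review into a list of cleaned tokens.
    static List<String> tokenize(String feed) {
        String cleaned = feed.toLowerCase(Locale.ROOT)
                .replaceAll("https?://\\S+", " ")   // drop URLs
                .replaceAll("@\\w+", " ")           // drop @mentions
                .replaceAll("[^a-z0-9\\s]", " ");   // drop punctuation and symbols

        List<String> tokens = new ArrayList<>();
        for (String word : cleaned.trim().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }
}
```

Calling `FeedPreprocessor.tokenize("I love this phone! http://t.co/abc @shop")` returns the tokens `love` and `phone`. A real pipeline would likely use a much larger stop-word list and may add steps such as stemming.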

Technical Challenges

  • Each of these steps is important, computationally intensive and time-consuming.
  • Each step depends on the one before it, so the steps must run in series.
  • We had to write very efficient code and modify existing open-source code to get enough performance out of the three steps above.

Changing Requirements

  • Key requirements were discussed at the outset of the project, and the team took the initial confirmed requirements to Acceptance;
  • However, our client later decided that our project could be more in-depth, and added requirements just prior to mid-terms. These scope change discussions are explored in later sections.
  • These changing requirements had a significant impact on our timeline and development progress.

Key Achievements

  • 3 successful user testing sessions
  • Usage of application by client for an ongoing project

Project Management

Project Scope

Our project scope changed between our major milestones. These scope changes were due both to technical difficulties that could not be overcome and to client requests. We implemented the changes accordingly.

Acceptance Scope

Mswp-final-accpt-scope.png
Post-Acceptance Update:
Because Facebook changed its privacy regulations, we removed the Facebook scrape function and brought forward the Qoo10 scrape instead.

Mid-Term Scope

Mnswp-final-midtermscope.png

Just prior to the mid-term presentation, we received new client requests on the 23rd of February:

New Client Requests:

  • 2-Step Process:
  1. For the first run of data retrieval, the application will carry out topic modelling to establish ground truth. The user will then assign names to the resulting topics, and the application will have assigned each keyword a percentage value for each topic named by the user.
  2. For subsequent runs, the application will assign keywords to topics based on the ground truth established in the first run above.


  • Flexibility of Application:
  1. Allow users to modify the keywords / documents in certain topics.
  2. Allow users to define k, the number of topics.
  3. Allow for scraping of reviews from more Singapore-based e-commerce websites.
  4. Allow users to import data retrieved from other sources into the application for analysis.


  • Usability of Application:
  1. Allow for users to scroll through multiple tweets, instead of viewing just one most relevant tweet.


  • Other Inputs for User Interface:
  1. Keyword input should support combining keywords with AND, instead of just OR.
  2. Provide a location filter.
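The two-step process requested above can be pictured with a simplified sketch. It is a stand-in for illustration, not LDA inference and not the Mallet API: the class name `GroundTruthAssigner`, the map layout, and the scoring rule (sum the first-run keyword weights per topic and pick the largest) are all assumptions.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: assigns a later document to a topic named in the first run.
class GroundTruthAssigner {

    // Topic name (chosen by the user) -> keyword -> weight from the first run.
    private final Map<String, Map<String, Double>> groundTruth;

    GroundTruthAssigner(Map<String, Map<String, Double>> groundTruth) {
        this.groundTruth = groundTruth;
    }

    // Returns the topic whose keywords carry the highest total weight in the
    // document, or null when none of the document's tokens is a known keyword.
    String assign(List<String> tokens) {
        String bestTopic = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Map<String, Double>> topic : groundTruth.entrySet()) {
            double score = 0.0;
            for (String token : tokens) {
                score += topic.getValue().getOrDefault(token, 0.0);
            }
            if (score > bestScore) {
                bestScore = score;
                bestTopic = topic.getKey();
            }
        }
        return bestTopic;
    }
}
```

With a hypothetical ground truth in which "Purchase" weights `bought` at 0.7 and `price` at 0.3, a later review containing both words would be assigned to "Purchase" without the user having to label topics again.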

Final Scope

Mnswp-final-finalscope.png

Project Timeline

Mnswp-finaltimeline.png

Project Metrics

Schedule Metrics

Mnswp-Schedule Metrics.png

Over the course of the project, there were no significant schedule anomalies in development. Our core issue was managing expectations rather than keeping to the schedule.

Bug Metrics

Mnswp-Bug Metrics.png

Project Risks

Project Details

Final Deliverables

Key Features

Data Retrieval
Data Analysis
Analysis Comparison

Additional Features

The additional features that set our project apart from others include:

  • At any one time, our database supports 1 to 5 million records;
  • 3 separate modular software components that work seamlessly together (appearing to the user as a single web app);
  • Use of a large open-source LDA implementation that had to be modified to improve performance and to provide the "Comparison Analysis" feature;
  • Use of 1 database server, 1 background processing server and 1 web application server;
  • Written in Java and Linux shell script.

Technology Used

Front End

Technology:

  • JavaScript/jQuery
  • HTML
  • CSS3
  • Bootstrap UI Framework
  • d3.js Chart Framework

Usage:

  • Implementing an intuitive user interface.

Risks:

  • Even the best available libraries may be limited.
  • Customisation is difficult because the libraries are written with differing programming conventions and concepts.

Mitigation:

  • Contribute to the open-source code and convince the original authors to accept the changes.
  • Research on sites like Stack Overflow and learn from the tech/development community.

Back End

Technology:

  • Language(s): Java
  • Libraries: Twitter4J
  • Database: MySQL
  • Host: Amazon Web Services
  • Software: NetBeans IDE, Sublime Text 2

Usage:

  • Backend functions support the front-end uses.

Risks:

  • Scraping of irrelevant data
  • Storage space on host

Data Analysis

Technology:

  • JMallet

Usage:

  • Used for clustering and topic modelling.

Clustering and topic modelling tools are typically written in R. However, Mallet is written in Java, which reduces the learning curve.

Tutorial materials are comprehensively documented at http://mallet.cs.umass.edu/topics-devel.php

Google Topic Modelling Toolkit makes use of the Mallet API as its base source code, lending credence to Mallet as a viable tool.

Risks:

  • Relative to R, there is a lack of an active online support community. Should we run into issues in development, we might not be able to get timely support.

Mitigation:

  • Consider other packages in R with similar functionalities, in the event that a particular issue is impossible to resolve.

Technical Complexity

Configuring Mallet

An independent software module was written as a wrapper around Mallet, the open-source software that implements the LDA algorithm. Originally, Mallet wrote its results to multiple text files on the server's hard drive.

However, this was slow and consumed disk space. The Mallet software was therefore modified to save all results directly to the MySQL database. This direct integration sped up Mallet's execution and freed storage space on the server.
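One way to picture this change is a small "sink" abstraction that receives each topic-keyword result and sends it to MySQL in a batch rather than writing it to a file. The sketch below is an assumption-heavy illustration, not the team's modification and not part of Mallet: the interface `TopicSink`, both classes, and the table and column names (`topic_keyword`, `topic_id`, `keyword`, `weight`) are hypothetical. Only standard JDBC calls are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction for where topic-keyword results are written.
interface TopicSink {
    void write(int topicId, String keyword, double weight) throws SQLException;
    void flush() throws SQLException;
}

// Queues rows on a prepared statement and sends them to the database in one batch.
class JdbcTopicSink implements TopicSink {
    private final PreparedStatement insert;

    JdbcTopicSink(Connection connection) throws SQLException {
        // Table and column names are assumptions for illustration.
        this.insert = connection.prepareStatement(
                "INSERT INTO topic_keyword (topic_id, keyword, weight) VALUES (?, ?, ?)");
    }

    @Override
    public void write(int topicId, String keyword, double weight) throws SQLException {
        insert.setInt(1, topicId);
        insert.setString(2, keyword);
        insert.setDouble(3, weight);
        insert.addBatch();          // queue the row instead of writing it immediately
    }

    @Override
    public void flush() throws SQLException {
        insert.executeBatch();      // send all queued rows to the database together
    }
}

// In-memory stand-in so the sketch can be exercised without a database.
class InMemoryTopicSink implements TopicSink {
    final List<String> rows = new ArrayList<>();

    @Override
    public void write(int topicId, String keyword, double weight) {
        rows.add(topicId + "," + keyword + "," + weight);
    }

    @Override
    public void flush() {
        // Nothing to send; rows are already held in memory.
    }
}
```

Batching the inserts avoids both the intermediate text files and a database round trip per row. Whether the actual modification batched its writes is not stated in this report.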

Separate Services

The three steps described earlier were implemented as three separate software modules to allow better performance and easier code maintenance:

  1. Twitter and Qoo10 pulling web service
  2. LDA (i.e. Topic Model) web service
  3. Web application that provides project management, a web-based interface to the aforementioned web services, and data visualization for the results from topic modelling.

Separate Servers

Mnswp-Archi Structure.png

For this project, our team implemented 2 servers:

  • The first server is dedicated to data collection and analysis.
    • We assume that data pulling will occur regularly, in parallel, and for extended periods of time.
    • This is because Dentsu Aegis runs multiple projects at the same time.
    • Each project is in either the "pulling feeds" phase or the "analysing data" phase.
  • The second server is dedicated to the web application.

User Experience

User Testing

User Testing 1 - 28/01/2015

Purpose:

  • Gather information to improve user experience.

Number of Participants: 10 DA Data Analysts

Outcome: Outcome 1

User Testing 2 - 23/02/2015

Purpose:

  • Gather information to improve user experience
  • Test load capacity of application
  • Test new functions

Number of Participants: 20 DA Data Analysts and SMU undergraduates

Outcome: Outcome 1

User Testing 3 - 10/04/2015

Purpose:

  • Allow users to interact with final application
  • Gather feedback

Number of Participants: Number X

Outcome: Outcome 1

Deployment

Final Reflections

Team Reflections

  • Rachel Yap

As project manager for MineSweep, I learned how to interact with our client as well as manage their expectations. Initially, when they requested a scope change, I was unable to properly balance the interests of the client with the interests of the team. I learned how to really exercise mitigation plans when risks become reality. This became more pronounced as the project progressed, because the client's scope change requests did not cease. Toward the end of the project, I learned how to scope the work so that it benefited both the team and the client.

  • In Jin Zaw
  • Jedaiah Tan
  • Kim Evangelista
  • Mohamed Safiullah

Client Feedback

It helps us understand how social media works, and it gives us some direction of how we’re going to improve. Nobody in the market has tried to create this type of product as of yet.

Resources