Difference between revisions of "IS480 Team wiki: 2015T2 MineSweep Final"
Revision as of 00:24, 11 April 2015
Executive Summary
This project is sponsored by Dentsu Aegis (DA), a multi-national PR and marketing firm. Our client is interested in discovering what consumers are saying online about their customers' products before and after their campaigns or PR strategies.
In this project, we aim to gather as much data as possible from the social media platform Twitter and the popular e-commerce website Qoo10.sg. This data is then used for an initial analysis, to determine the key topics of what users are discussing in their tweets or product reviews. Our application will generate these key topics and allow the data analysts to label them. These labels can be, but are not limited to, the consumer journey as provided by our client as shown below.
After the marketing / PR strategy is complete, the team can return to the application and retrieve data once more. They can then compare the key topics from before and after the strategy, and use the difference to measure how effective the strategy was.
Project Progress
Key Milestones
Acceptance
Mid-Term
Challenges Faced
Over the course of the project, our group faced both changing requirements and technical challenges.
Our project is intensive in 2 aspects:
- Collection of Data
- Analysis of Data
These two key areas are computationally intensive, and issues arose in several places.
The application consists of a linear sequence of events where:
- Social media feeds are pulled;
- Feeds go through a series of text-processing and transformation steps;
- The processed feeds are passed as input to the Latent Dirichlet allocation (LDA) algorithm, which groups them into clusters of topics.
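The serial nature of these steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name `FeedPipelineSketch`, the hard-coded sample feeds standing in for the Twitter and Qoo10 pull, the tiny stop-word list and the cleaning rules are not taken from the project's code, and the LDA step is only marked with a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the serial pipeline; all names are illustrative only.
final class FeedPipelineSketch {

    // Tiny illustrative stop-word list (an assumption, not the project's list).
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "is", "to", "and", "of", "this"));

    // Step 1 (stub): the real system would pull feeds from Twitter or Qoo10 here.
    static List<String> pullFeeds() {
        return Arrays.asList(
                "Loving the NEW phone! http://example.com/p1",
                "This phone battery is terrible");
    }

    // Step 2: normalise one feed into a list of lower-case tokens.
    static List<String> preprocess(String feed) {
        String cleaned = feed.toLowerCase(Locale.ROOT)
                .replaceAll("https?://\\S+", " ")  // drop URLs
                .replaceAll("[^a-z0-9\\s]", " ");  // drop punctuation
        List<String> tokens = new ArrayList<>();
        for (String token : cleaned.trim().split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // The steps run strictly in series: each stage consumes the previous stage's output.
    static List<List<String>> run() {
        List<List<String>> documents = new ArrayList<>();
        for (String feed : pullFeeds()) {
            documents.add(preprocess(feed));
        }
        // Step 3 would hand `documents` to the LDA module at this point.
        return documents;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because step 3 cannot start until step 2 has produced its tokens, any slowdown in an earlier stage delays the whole run.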
Technical Challenges
- Each of these steps is important, computationally intensive and time-consuming.
- Each step depends on the output of the previous one, so the steps must run in series.
- We had to write very efficient code and modify existing open-source code to squeeze as much performance as possible out of the above three steps.
Changing Requirements
- Key requirements were discussed at the onset of the project, and the team took the initial confirmed requirements to Acceptance;
- However, our client later decided that our project could be more in-depth, and added further requirements just prior to mid-terms. These scope change discussions will be explored in later sections.
- These changing requirements proved to have a significant impact on our timeline and development progress.
Key Achievements
- 3 successful user testing sessions
- Usage of application by client for an ongoing project
Project Management
Project Scope
Our project scope changed between our major milestones. These scope changes were due both to technical difficulties that could not be overcome and to client requests. We implemented the changes accordingly.
Acceptance Scope
Post-Acceptance Update:
Due to Facebook having changed its privacy regulations, we have pushed forward the Qoo10 scrape instead and removed the Facebook scrape function.
Mid-Term Scope
Just prior to the mid-term presentation, we received new client requests on the 23rd of February:
New Client Requests:
- 2-Step Process:
- For the first run of data retrieval, the application will carry out topic modelling to establish ground truth. The user will then assign topic names to keywords, and the application will assign a percentage value to each keyword for each topic named by the user.
- For subsequent runs, the application will assign keywords to topics based on the ground truth established in the first run above.
- Flexibility of Application:
- Allow users to modify keywords / documents from certain topics.
- Allow users to define k number of topics.
- Allow for scraping of reviews from more Singapore-based e-commerce websites.
- Allow for users to import data retrieved from other sources into application for analysis.
- Usability of Application:
- Allow for users to scroll through multiple tweets, instead of viewing just one most relevant tweet.
- Other Inputs for User Interface:
- Keyword input to consider using AND as well instead of just OR
- Provide a location filter
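The requested 2-step process can be sketched as follows. This is only one possible reading of the request, not the team's implementation: the class `GroundTruthSketch`, its methods, the sample topic names and the rule of picking the topic with the highest stored percentage are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two-step process; class and method names are illustrative.
final class GroundTruthSketch {

    // keyword -> (user-assigned topic name -> percentage value from the first run)
    private final Map<String, Map<String, Double>> weights = new HashMap<>();

    // First run: store the percentage value linking a keyword to a named topic.
    void record(String keyword, String topicName, double percentage) {
        weights.computeIfAbsent(keyword, k -> new HashMap<>()).put(topicName, percentage);
    }

    // Subsequent runs: return the topic with the highest stored percentage,
    // or null when the keyword was not part of the ground truth.
    String assign(String keyword) {
        Map<String, Double> byTopic = weights.get(keyword);
        if (byTopic == null) {
            return null;
        }
        String best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : byTopic.entrySet()) {
            if (entry.getValue() > bestValue) {
                bestValue = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        GroundTruthSketch truth = new GroundTruthSketch();
        // Sample topic names are made up for this example.
        truth.record("discount", "Purchase", 70.0);
        truth.record("discount", "Awareness", 30.0);
        System.out.println(truth.assign("discount"));
        System.out.println(truth.assign("unseen"));
    }
}
```

Keywords that never appeared in the first run have no ground truth, so the sketch returns `null` for them; a real application would need to decide how to treat such keywords. Ties between topics are not handled here.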
Final Scope
Project Timeline
Project Metrics
Schedule Metrics
In the course of our project, there were no significant schedule anomalies in the development. Our core issue in carrying out the project was managing expectations rather than following the schedule.
Bug Metrics
Project Risks
Project Details
Final Deliverables
Key Features
Data Retrieval
Data Analysis
Analysis Comparison
Additional Features
The additional features that set our project apart from others include:
- At any one time, our database supports 1 to 5 million database records;
- 3 separate modular software components that work seamlessly together (appearing to the user as one single web app);
- Usage of a large open-source LDA algorithm that had to be modified to allow for performance and provide the "Comparison Analysis" feature;
- Usage of 1 db server, 1 background processing server and 1 web application server;
- Written using Java and Linux shell script.
Technology Used
Technology Used | Usage | Risks | Mitigation
Front End
| For implementing an intuitive user interface. | |
Back End
| | |
Data Analysis
| | |
Technical Complexity
Configuring Mallet
A fully independent software module was written to serve as a wrapper on top of the open-source software Mallet, which implements the LDA algorithm. Originally, Mallet wrote its results to multiple text files on the server hard drive.
However, this was a slow and space-consuming process. As such, Mallet was modified to save all results directly to the MySQL database. This direct integration with MySQL increased Mallet's execution speed and also freed up storage space on the server.
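A minimal sketch of what writing topic results straight to a database could look like is given below, using plain JDBC batch inserts. It does not reproduce the team's Mallet modification: the table `topic_word`, its columns, the `Row` type and the class `TopicResultWriterSketch` are hypothetical, and the topic results are passed in as ordinary maps rather than read from Mallet's own data structures.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: persist topic-word weights with one JDBC batch instead of text files.
final class TopicResultWriterSketch {

    // Assumed table: topic_word(run_id INT, topic_id INT, word VARCHAR(255), weight DOUBLE)
    static final String INSERT_SQL =
            "INSERT INTO topic_word (run_id, topic_id, word, weight) VALUES (?, ?, ?, ?)";

    // One database row: a word and its weight within a topic.
    static final class Row {
        final int topicId;
        final String word;
        final double weight;

        Row(int topicId, String word, double weight) {
            this.topicId = topicId;
            this.word = word;
            this.weight = weight;
        }
    }

    // Flatten per-topic word weights into rows; the list index is used as the topic id.
    static List<Row> flatten(List<Map<String, Double>> topics) {
        List<Row> rows = new ArrayList<>();
        for (int topicId = 0; topicId < topics.size(); topicId++) {
            for (Map.Entry<String, Double> entry : topics.get(topicId).entrySet()) {
                rows.add(new Row(topicId, entry.getKey(), entry.getValue()));
            }
        }
        return rows;
    }

    // Write all rows in a single transaction; roll back if any insert fails.
    static void save(Connection conn, int runId, List<Row> rows) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (Row row : rows) {
                ps.setInt(1, runId);
                ps.setInt(2, row.topicId);
                ps.setString(3, row.word);
                ps.setDouble(4, row.weight);
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Batching the inserts inside one transaction avoids a round trip and a commit per row, which is typically where most of the time goes when many small results are written.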
Separate Services
Also, for better performance and easier code maintenance, these steps had to be implemented as separate software modules. The project therefore consists of 3 separate software modules:
- Twitter and Qoo10 pulling web service
- LDA (i.e. Topic Model) web service
- Web application that provides project management, a web-based interface to the aforementioned web services, and data visualization for the results from topic modelling.
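As a rough illustration of how a backend module can be exposed as a web service for the web application to call, the sketch below uses the HTTP server bundled with the JDK. It is not the project's service: the `/topics` path, the `project` query parameter, the JSON shape and the class `TopicServiceSketch` are assumptions, and the topic lookup is a stub.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a topic-model web service; endpoint and payload are illustrative.
final class TopicServiceSketch {

    // Stub standing in for the LDA module: returns an empty topic list as JSON.
    static String topicsAsJson(String projectId) {
        // Keep only safe characters so the id cannot break the JSON string.
        String safeId = projectId.replaceAll("[^A-Za-z0-9_-]", "");
        return "{\"project\":\"" + safeId + "\",\"topics\":[]}";
    }

    // Start an HTTP server that answers GET /topics?project=<id>.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/topics", exchange -> {
            String query = exchange.getRequestURI().getQuery();
            String projectId = (query != null && query.startsWith("project="))
                    ? query.substring("project=".length())
                    : "unknown";
            byte[] body = topicsAsJson(projectId).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080); // the port is an arbitrary choice for this example
    }
}
```

With this separation, the web application only needs to know the service's URL and response format, so each module can be deployed and maintained on its own.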
Separate Servers
For this project, our team implemented 2 servers:
- The first server is dedicated to data collection and analysis.
- This is due to the assumption that data pulling will occur regularly, in parallel, and for extended periods of time.
- This is because Dentsu Aegis runs multiple projects at the same time.
- Each project is either in the "pulling feeds" phase or "analysing data" phase.
- The second server is dedicated to the web application.
User Experience
User Testing
User Testing | Purpose | Number of Participants
User Testing 1 - 28/01/2015 | | 10 DA Data Analysts
User Testing 2 - 23/02/2015 | | 20 DA Data Analysts and SMU undergraduates
User Testing 3 - 10/04/2015 | | 10 DA Data Analysts
Deployment
Final Reflections
Team Reflections
- Rachel Yap
As project manager for MineSweep, I learned how to interact with our client as well as manage their expectations. Initially, when they requested scope changes, I was unable to properly balance the interests of the client with the interests of the team. I learned how to exercise mitigation plans when risks become reality. This became increasingly important as the project progressed, because the client's scope change requests did not cease. Toward the end of the project, I learned how to scope the work to best benefit both the team and the client.
- In Jin Zaw
Initially I was not very sure what kind of business value this project could bring to the users. In addition, at the beginning of the project, there was a steep learning curve with regard to the algorithms, and we slowly but surely overcame this challenge. During the whole year, I learnt many useful things, both life skills and academic ones. With respect to life skills, I learned how to balance meeting the client's expectations with ensuring that the team's capability matched those expectations. Being in the only FYP group that deals with social media analytics, this project has really enriched my learning journey in university as an analytics major. Towards the end of this project, after many insightful discussions with the support of the team, profs and client, and after getting feedback from testers, I am very sure that the project we have taken on can be very valuable to whoever uses it. Gaining this confidence has been very uplifting and is definitely a source of motivation to create an even better application.
- Jedaiah Tan
This project taught me tenacity, as I had to familiarise myself with the LDA algorithm and be able to explain it to anyone and everyone who asked. As an analytics student, it really gave me direction in what I might want to pursue in the future.
- Kim Evangelista
This project gave me a crash course on consumer behavior and analytics. I learned a lot about data acquisition from utilizing the APIs provided by social media entities. Furthermore, the project was technically challenging given its nature of intensive data collection and analytics. Overall, it was a rewarding experience working on this project together with the talented team members of MineSweep.
- Mohamed Safiullah
I have learnt to evaluate different user interface design principles, such as flat and material design, through self-exploration. I have also learnt about the different visualization techniques for presenting insightful information on social media feeds. Furthermore, due to the technical complexity of our project, it was difficult to integrate the backend business logic with the user interface. However, I've learnt that by exposing the backend business logic as a web service, it becomes easy to integrate the user interface with it.
Client Feedback
It helps us understand how social media works, and it gives us some direction of how we’re going to improve. Nobody in the market has tried to create this type of product as of yet.