IS484 AY2020/21 Term 1: The Miners
Group Introduction
Team Members
Name | Email | Role |
---|---|---|
Chen Ziyi | ziyi.chen.2017@sis.smu.edu.sg | Project Manager |
Nurul Khairina Binte Abdul KADIR | khairinak.2017@smu.edu.sg | Deputy Project Manager |
Chew Hui Ling | hlchew.2017@smu.edu.sg | IT Lead |
Wang Zi | zi.wang.2017@smu.edu.sg | Data Analytics Lead |
Tan Chou Leong | cltan.2016@business.smu.edu.sg | Quality Assurance |
Faculty Supervisor
Name | Email |
---|---|
Assistant Professor Manoj Thulasidas | manojt@smu.edu.sg |
Dr. Dennis Ng | dennisng@smu.edu.sg |
Assistant Professor Alan Megargel | alanmegargel@smu.edu.sg |
Sponsor and/or Clients
Name | Email | Role |
---|---|---|
Yuqian, Song | yuqian.song@citi.com | Head of APAC/EMEA Data Services and Head of Global Advanced Analytics Technology Solutions |
Project Overview
Project Description
Project Text Mining and Sentiment Analysis is a business analytics project that aims to deliver a system and an interactive dashboard for Citibank. The system performs sentiment analysis on customer reviews written in languages other than English.
Motivation
Currently, Citibank has a pre-trained sentiment analysis model designed specifically for English reviews, and it would like to extend this capability to non-English reviews. Project Text Mining and Sentiment Analysis adds value by providing sentiment analysis of those non-English reviews. This allows Citibank to identify the products and services customers most frequently complain about. Having identified these weaker offerings, Citibank can gain further insight into the specific areas that concern customers and improve those products and services accordingly.
Stakeholders
Sponsors | Citibank - Yuqian, Song |
Users | Citibank |
Advisors/Practitioners/Mentors | Citibank - Awasthi, Ashish |
Deliverables
Outcome
The newly developed system and UI will allow users to pass in reviews and get sentiment scores and statistics for the data. The interactive dashboard will generate actionable insights such as customers’ sentiment scores, patterns, and trends regarding the quality of Citibank’s products and services. A database will store all the collected reviews for visualization.
Value Statement
Through the visualisation of meaningful statistics, data, and insights presented on the dashboard, clients can:
- Improve on product and service quality to generate more sales
- Improve on customer satisfaction level with customer-centric service
- Build up brand reputation and hence attract more retail customers
Scope, Constraints, Assumptions
Scope
Continuously tracking the patterns in sentiment score in a category will allow the user to self-identify stress points via any spike in negative sentiments.
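One way such a spike could be flagged is sketched below. This is only an illustration under stated assumptions, not the project's actual method: the function names (`negative_share`, `find_spikes`), the negative-score threshold, the trailing window, and the sample weekly scores are all hypothetical. It compares each period's share of negative reviews against the recent average.

```python
from statistics import mean, stdev


def negative_share(scores, threshold=-0.1):
    """Fraction of scores in one period that fall below the (assumed) negative threshold."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s < threshold) / len(scores)


def find_spikes(period_scores, window=3, z_cutoff=2.0, min_sigma=0.05):
    """Return indices of periods whose negative share is unusually high
    compared with the preceding `window` periods."""
    shares = [negative_share(scores) for scores in period_scores]
    spikes = []
    for i in range(window, len(shares)):
        history = shares[i - window:i]
        mu = mean(history)
        # Use a floor on the spread so a perfectly flat history does not
        # turn every tiny increase into a spike.
        sigma = max(stdev(history), min_sigma)
        if shares[i] > mu + z_cutoff * sigma:
            spikes.append(i)
    return spikes


# Hypothetical sentiment scores in [-1, 1] for one category, grouped by week.
weekly_scores = [
    [0.6, 0.4, -0.3, 0.2],
    [0.5, -0.2, 0.3, 0.7],
    [0.1, 0.4, -0.5, 0.6],
    [-0.6, -0.4, -0.7, 0.2],  # most reviews negative this week
    [0.3, 0.5, -0.2, 0.4],
]

print(find_spikes(weekly_scores))
```

In this made-up data, only the fourth week (index 3) stands out. In practice the thresholds would need tuning against real review volumes.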
For example, if sentiment analysis of complaints shows that iPhone users are generally dissatisfied with using Face ID to log in to the Citibank iBanking mobile application, Citibank can change Face ID from the default login option to a secondary option.
Figure: Features of the Sentiment Analysis Application.
Constraints
The insights drawn from this sentiment analysis model depend largely on the strength and integrity of the text input. There is considerable room for error when it comes to irony and sarcasm: individuals may choose to express negative sentiments with positive words and phrases. The model is then likely to take these sentiments at face value, without understanding the context in which the feeling was expressed. Defining what neutrality means is another challenge that must be handled well for the sentiment analysis to be accurate. Since tagging data requires consistent tagging criteria, a clear definition of the problem is a must. Labeling the data may also require input from the client, since the team may not have sufficient domain knowledge.
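The sarcasm and neutrality issues can be illustrated with a deliberately naive word-counting scorer. This is a minimal sketch, not the model used in the project: the word lists, the `neutral_band` cut-off, and the sample reviews are hypothetical and chosen only to show the failure mode.

```python
# Hypothetical, deliberately tiny sentiment lexicon.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "crash", "crashes", "terrible", "useless"}


def lexicon_score(text):
    """Score text in [-1, 1] by counting positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def label(score, neutral_band=0.2):
    """Map a score to a label; the width of the neutral band is a
    tagging decision that has to be agreed on and applied consistently."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


sincere = "The app is slow and crashes often."
sarcastic = "Great, I just love waiting an hour to log in."
mixed = "The app is fast but the login is slow."

for review in (sincere, sarcastic, mixed):
    print(label(lexicon_score(review)), "|", review)
```

The sarcastic complaint is labeled positive because the scorer only sees the words "great" and "love", which is the face-value reading described above. The mixed review lands in the neutral band, and would flip to a different label if that band were defined differently.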
Assumptions
We assume that each review is sincere and accurately reflects the customer's sentiment.
Project Plan
Project Milestone
Risks
Risk Type | Risk Description | Consequence | Level | Mitigation Strategy |
---|---|---|---|---|
Client Management Risk | Mismatch between the team and the client in their understanding of the requirements or technical difficulty | Progress of the project will be affected. May result in a delay in project delivery | Likelihood: M Impact: H Grade: A | Hold bi-weekly sprint updates with the client to report on project progress and receive feedback on improvements. Ensure that the client and team members are aligned. |
Project Management Risk | Team may overestimate or underestimate the effort and time needed for tasks | Important milestones and deliverables may not be handed over on time. Project may be delayed | Likelihood: L Impact: H Grade: B | The team will plan early and constantly review and re-evaluate project schedule |
Technical Risk | Unfamiliarity with the Python libraries and models used for sentiment analysis | Progress of the project will be affected. Risk of facing difficulties in processing unstructured data | Likelihood: M Impact: M Grade: B | Perform research and read the documentation. Members will share what they learn with the rest of the team. Seek advice from professors. |
Resource and Reference
Since the team is unfamiliar with text mining and sentiment analysis as well as the libraries used (e.g., NLTK, TextBlob), we will use the learning resources below to understand the fundamental concepts of natural language processing and gain more experience in the technical aspects of the project.
Technology (Programming language, frameworks etc) | Description |
---|---|
Python | Data pre-processing, Machine learning models for sentiment analysis |
Tableau | Development of dashboard |
Visual Studio Code | Development of System and UI |
Type of Resource | Description | Link |
---|---|---|
Web Page Documentation | The Natural Language Toolkit (NLTK) and TextBlob library documentation will serve as a reference for the team throughout the project. It will expose us to the various features available in each library. | NLTK, TextBlob |
Slides | The resources used in the IS450 Text Mining & Language Processing class will equip the team with the basics of Natural Language Processing, text pre-processing, text mining algorithms and applications such as sentiment analysis. | |
Online Course (DataCamp) | Microsoft offers a two-month free subscription to DataCamp. We will use this platform to learn more about sentiment analysis in Python. | DataCamp Course |
Online Course (Udemy) | The NLP - Natural Language Processing with Python course by Jose Portilla will equip us with the basics of NLP, including the NLTK and spaCy libraries for Python and fundamental concepts such as stemming and tokenization. | Udemy course |
Diagrams
Process Diagram
Architecture Diagram
Tools Used