IS484 AY2020/21 Term 1: The Miners


Group Introduction

Team Members

Name Email Role
Chen Ziyi ziyi.chen.2017@sis.smu.edu.sg Project Manager
Nurul Khairina Binte Abdul KADIR khairinak.2017@smu.edu.sg Deputy Project Manager

Chew Hui Ling hlchew.2017@smu.edu.sg IT Lead
Wang Zi zi.wang.2017@smu.edu.sg Data Analytics Lead
Tan Chou Leong cltan.2016@business.smu.edu.sg Quality Assurance


Faculty Supervisor

Name Email
Assistant Professor Manoj Thulasidas manojt@smu.edu.sg
Dr. Dennis Ng dennisng@smu.edu.sg
Assistant Professor Alan Megargel alanmegargel@smu.edu.sg


Name Email Role
Yuqian, Song yuqian.song@citi.com Head of APAC/EMEA Data Services and Head of Global Advanced Analytics Technology Solutions


Project Overview

Project Description

Project Text Mining and Sentiment Analysis is a business analytics project that aims to deliver a system and an interactive dashboard for Citibank. It does so by conducting sentiment analysis on reviews written in languages other than English.


Motivation

Currently, Citibank has a pre-trained sentiment analysis model designed specifically for English reviews, and it would like to extend this capability to non-English reviews. Project Text Mining and Sentiment Analysis adds value by providing sentiment analysis of those non-English reviews. This allows Citibank to identify the products and services that customers complain about most frequently. Having identified these weaker offerings, Citibank can gain further insight into the specific areas that concern customers and improve the products and services accordingly.


Stakeholders

Sponsors Citibank - Yuqian, Song
Users Citibank
Advisors/Practitioners/Mentors Citibank - Awasthi, Ashish


Deliverables

Outcome

The newly developed system and UI will allow users to submit reviews and receive sentiment scores and summary statistics for the data. The interactive dashboard will generate actionable insights such as customers’ sentiment scores, patterns, and trends regarding the quality of Citibank’s products and services. A database will store all the collected reviews for visualization.
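As a rough illustration of the kind of summary statistics such a system could return, the sketch below groups already-scored reviews by product and reports the review count, mean score, and share of negative reviews. The function name `summarise`, the product names, and the scores are all hypothetical; the team's actual scoring model and data schema are not described here.

```python
from collections import defaultdict
from statistics import mean


def summarise(scored_reviews):
    """Group (product, score) pairs and compute simple per-product statistics.

    Scores are assumed to lie in [-1, 1], with values below 0 being negative.
    """
    by_product = defaultdict(list)
    for product, score in scored_reviews:
        by_product[product].append(score)

    summary = {}
    for product, scores in by_product.items():
        negatives = sum(1 for s in scores if s < 0)
        summary[product] = {
            "reviews": len(scores),
            "mean_score": mean(scores),
            "negative_share": negatives / len(scores),
        }
    return summary


# Hypothetical scored reviews, for illustration only.
sample = [
    ("mobile app", -0.5),
    ("mobile app", -0.5),
    ("credit card", 0.5),
    ("mobile app", 0.25),
    ("credit card", -0.25),
]

summary = summarise(sample)
# The product with the highest share of negative reviews.
most_complained = max(summary, key=lambda p: summary[p]["negative_share"])
print(most_complained, summary[most_complained])
```

A dashboard could chart the same three figures per product or per category; the statistics chosen here are only one plausible set.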


Value Statement

Through the visualisation of meaningful statistics, data, and insights presented on the dashboard, clients can:

  • Improve on product and service quality to generate more sales
  • Improve on customer satisfaction level with customer-centric service
  • Build up brand reputation and hence attract more retail customers


Scope, Constraints, Assumptions

Scope

Project 1st Scope

Continuously tracking the patterns in sentiment score in a category will allow the user to self-identify stress points via any spike in negative sentiments.
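One simple way a spike in negative sentiment could be flagged is to compare each period against a rolling baseline. The sketch below is an assumption about how this might work, not the team's implementation: the function name `find_negative_spikes`, the window size, the threshold, and the weekly figures are all hypothetical.

```python
from statistics import mean, stdev


def find_negative_spikes(negative_share, window=4, threshold=2.0):
    """Return indices of periods whose negative share is unusually high.

    A period is flagged when its value exceeds the mean of the preceding
    `window` periods by more than `threshold` sample standard deviations.
    """
    spikes = []
    for i in range(window, len(negative_share)):
        history = negative_share[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            # Flat history: any increase at all counts as a spike.
            if negative_share[i] > baseline:
                spikes.append(i)
            continue
        if (negative_share[i] - baseline) / spread > threshold:
            spikes.append(i)
    return spikes


# Hypothetical share of negative reviews per week in one category.
weekly_negative_share = [0.10, 0.12, 0.11, 0.09, 0.10, 0.35, 0.12]
spikes = find_negative_spikes(weekly_negative_share)
print(spikes)  # [5]
```

Only the week at index 5 is flagged. The following week is not, because the spike itself has entered the rolling history and raised the baseline.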


Project 2nd Scope

Suppose sentiment analysis of complaints shows that customers using iPhones are generally dissatisfied with Face ID when logging in to the Citibank iBanking mobile application. Citibank can then change Face ID from the default login option to a secondary option.


Project 3rd Scope

Shows the features for the Sentiment Analysis Application.


Constraints

The insights drawn from this sentiment analysis model depend largely on the strength and integrity of the text input. There is considerable room for error when it comes to irony and sarcasm: individuals may choose to express negative sentiments with positive words and phrases. Consequently, the model is likely to take these sentiments at face value, without the necessary understanding of the context in which a feeling was expressed. Defining what neutrality means is another challenge that must be tackled well in order to perform accurate sentiment analysis. Since tagging data requires consistent tagging criteria, a good definition of the problem is a must. The labeling of data may also require input from the client, since we may not have sufficient knowledge in this area.
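The sarcasm and neutrality issues can be seen even in a toy word-counting scorer. The sketch below is deliberately simplistic and is not the project's model: the word lists, the `polarity` and `label` functions, and the `neutral_band` cut-off are hypothetical choices made for illustration.

```python
import re

# Tiny hand-picked lexicons, for illustration only.
POSITIVE = {"great", "love", "good", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "terrible", "hate", "fail", "fails"}


def polarity(text):
    """Score text in [-1, 1] as (positive - negative) / (positive + negative).

    Returns 0.0 when the text contains no lexicon words.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def label(score, neutral_band=0.25):
    """Map a score to a label; the width of the neutral band is a design choice."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


sincere = "The app is slow and the login fails."
sarcastic = "Great, I just love waiting an hour to log in."
mixed = "Helpful staff but terrible app."

for review in (sincere, sarcastic, mixed):
    print(label(polarity(review)), "<-", review)
```

The sarcastic complaint is labeled positive because only its surface words are counted, while the mixed review lands in the neutral band. Changing `neutral_band` changes which reviews count as neutral, which is why the tagging criteria need to be agreed before any data is labeled.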


Assumptions

We assume that each review is sincere and accurately reflects the customer's sentiment.


Project Plan

Project Milestone

Project Milestone


Risks

Risk Matrix Table




Risk Type | Risk Description | Consequence | Level | Mitigation Strategy
Client Management Risk | Mismatch in understanding of requirements/technical difficulty between the team and client | Progress of the project will be affected. May result in delay in project delivery | Likelihood: M, Impact: H, Grade: A | Hold bi-weekly sprint updates with the client to update on the project progress and receive feedback on improvements. Ensure that there is an alignment between client and members.
Project Management Risk | Team may overestimate or underestimate the effort and time needed for tasks | Important milestones and deliverables may not be handed over on time. Project may be delayed | Likelihood: L, Impact: H, Grade: B | The team will plan early and constantly review and re-evaluate the project schedule.
Technical Risk | Unfamiliarity with the libraries used in Python for sentiment analysis and the models used | Progress of the project will be affected. Risk of facing difficulties in processing unstructured data | Likelihood: M, Impact: M, Grade: B | Perform research and read the documentation. Team will share knowledge learned with the rest of the team. Seek advice from professors.


Resource and Reference

Since the team is unfamiliar with text mining and sentiment analysis, as well as the libraries used (e.g. NLTK, TextBlob), we will use the learning resources listed below to understand the fundamental concepts of natural language processing and gain more experience in the technical aspects of the project.


Technology (Programming language, frameworks etc) Description
Python Data pre-processing, Machine learning models for sentiment analysis
Tableau Development of dashboard
Visual Studio Code Development of System and UI


Type of Resource Description Link
Web Page Documentation The Natural Language Toolkit (NLTK) and TextBlob library documentation will serve as a reference for the team throughout the project. It will expose us to the various features available in these libraries. NLTK, TextBlob
Slides The resources used in the IS450 Text Mining & Language Processing class will equip the team with the basics of Natural Language Processing, text pre-processing, text mining algorithms and applications such as sentiment analysis.
Online Course (DataCamp) Microsoft offers a 2-month free subscription to DataCamp. We will use this platform to learn more about sentiment analysis in Python. DataCamp Course
Online Course (Udemy) The NLP - Natural Language Processing with Python course by Jose Portilla will equip us with the basics of NLP, using the NLTK and spaCy libraries for Python, as well as fundamental NLP concepts such as stemming and tokenization. Udemy course


Diagrams

Process Diagram

Process Diagram


Architecture Diagram

Architecture Diagram


Tools Used

Tools Used


User Interface Prototype Diagram

User Interface Prototype