Difference between revisions of "GeViz"

From Visual Analytics for Business Intelligence

Revision as of 02:38, 25 November 2018

Geviz.png




PROBLEM & MOTIVATION

GeBIZ is the Singapore Government's one-stop e-procurement portal, which facilitates tender activities between the Singapore government and local and overseas suppliers. Currently, no tool is available to help the public and ministries understand and gain insights into the procurement made by the government under each ministry. Hence, we are motivated to create an interactive visualisation tool for government procurement spending that allows the public and ministries to identify spending patterns under each ministry.

OBJECTIVES

In this project, we are creating a visualisation that allows users to:

  • Gain an overview of procurement spending made by each ministry and agency
  • Identify the relationships between ministries, agencies and suppliers
  • Identify the goods and services procured by ministries and agencies under each category


SELECTED DATASETS

The following datasets will be used for analysis, as elaborated below:

Dataset/Source: Government Procurement Data (https://data.gov.sg/dataset/government-procurement)
Data Attributes:
  • Tender No
  • Agency
  • Tender Description
  • Award Date
  • Tender Detail Status
  • Supplier Name
  • Awarded Amount
Rationale of Usage: To gain information on government procurement such as tender description, amount and supplier information

Dataset/Source: Ministry and Agencies List
Data Attributes:
  • Ministry
  • Agency
Rationale of Usage: We will be looking through the Singapore Government Directory (https://www.gov.sg/sgdi/ministries) to categorise the agencies into their respective ministries. This will allow us to visualise the procurement spending on a ministry level.


APPROACH

Approach 2.png

Exploratory Data Analytics
We used Tableau to perform EDA to better understand our dataset and to aid us in the conceptualization of our storyboard.

Data Cleaning and Feature Creation
We used Excel and Python to create a new column showing the Ministry that each agency belongs to by merging with data obtained from the Singapore Government Directory.
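The merge described above can be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not the team's actual script: the column names, the sample rows, and the helper names `normalise` and `add_ministry_column` are all hypothetical, and the name normalisation step is an assumption about how agency spellings might be reconciled.

```python
import csv
import io

# Hypothetical sample rows. The real inputs would be the procurement data
# from data.gov.sg and an agency-to-ministry list compiled from the
# Singapore Government Directory.
PROCUREMENT_CSV = """tender_no,agency,awarded_amt
T001,Agency A,1000.0
T002,Agency B,2500.0
T003,Agency C,400.0
"""

MINISTRY_CSV = """agency,ministry
Agency A,Ministry X
Agency B,Ministry Y
"""


def normalise(name):
    """Collapse whitespace and case-fold so agency names compare reliably."""
    return " ".join(name.split()).casefold()


def add_ministry_column(procurement_rows, ministry_rows, default="Unknown"):
    """Left-join procurement rows to the agency-to-ministry lookup."""
    lookup = {normalise(r["agency"]): r["ministry"] for r in ministry_rows}
    merged = []
    for row in procurement_rows:
        enriched = dict(row)  # copy so the input rows stay untouched
        enriched["ministry"] = lookup.get(normalise(row["agency"]), default)
        merged.append(enriched)
    return merged


procurement = list(csv.DictReader(io.StringIO(PROCUREMENT_CSV)))
ministries = list(csv.DictReader(io.StringIO(MINISTRY_CSV)))
merged_rows = add_ministry_column(procurement, ministries)

for row in merged_rows:
    print(row["tender_no"], row["agency"], "->", row["ministry"])
```

Agencies that are missing from the lookup are kept and marked with a default value, so unmatched names can be reviewed instead of silently dropped.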

Text Classification using Support Vector Classifier (SVC)
One of the key challenges of working with the provided procurement dataset is that procurement transactions are not categorized. Instead of labelling them manually, we applied machine learning to classify the tender descriptions into different categories. We first scraped procurement descriptions and categories from the GeBIZ website using the Selenium and BeautifulSoup libraries in Python, and used them as the training and validation dataset for our Support Vector Classifier model. We achieved 90% training accuracy before predicting the categories.
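A minimal sketch of this kind of classifier, using scikit-learn, is shown below. It is not the team's actual model: the labelled examples and category names are hypothetical stand-ins for the scraped GeBIZ data, and the TF-IDF features and linear kernel are assumptions, since the write-up does not state how the descriptions were vectorised or which kernel was used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical labelled examples standing in for the scraped descriptions
# and categories; the real training set is not shown in this write-up.
train_texts = [
    "Supply and delivery of laptop computers",
    "Software development and maintenance services",
    "Supply of computer servers and network hardware",
    "Construction of new school building",
    "Building renovation and construction works",
    "Construction of covered walkway and road works",
    "Provision of catering services for events",
    "Catering of food for training courses",
    "Catering of meals and refreshments",
]
train_labels = [
    "IT", "IT", "IT",
    "Construction", "Construction", "Construction",
    "Catering", "Catering", "Catering",
]

# Assumed setup: TF-IDF turns each description into a sparse word-weight
# vector, and a linear-kernel SVC separates the categories in that space.
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
model.fit(train_texts, train_labels)

# Accuracy on the data the model was fitted on (training accuracy).
training_accuracy = model.score(train_texts, train_labels)
print("training accuracy:", training_accuracy)

# Predict categories for unlabelled tender descriptions.
new_descriptions = [
    "Construction of building extension works",
    "Catering of meals for staff events",
    "Supply of computer hardware and software",
]
predicted = model.predict(new_descriptions)
for text, label in zip(new_descriptions, predicted):
    print(label, "<-", text)
```

Training accuracy is measured on the same examples the model was fitted on, so a score on a held-out validation split gives a better indication of how well the predicted categories can be trusted.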

Government Procurement Dataset after Text Classification

Dataset/Source Data Attributes
Government Procurement Data
  • Tender No
  • Agency
  • Tender Description
  • Award Date
  • Tender Detail Status
  • Supplier Name
  • Awarded Amount
  • Category
  • Sub Category


Visualization in R
The web application will be built in R and deployed to Shinyapps.io.

BACKGROUND SURVEY OF RELATED WORKS

Some of the visualizations that we drew inspiration from are as follows:

Reference of Other Interactive Visualization What We Can Learn

Title : Pareto Analysis of Suppliers

Pareto Analysis Reference.png

Source: https://goo.gl/P9RjHk

  • The use of a time series chart lets users view the rise and fall of prices without being overwhelmed by the clutter that bar charts can produce.

Title : Word Cloud on Procurement Details

Word cloud reference.png

Source: https://linpack-for-tableau.com/data-visualizations/tableau-dashboards/procurement-dashboard/procurement-cockpit/

  • From this animation, we can learn how to show the temporal transition of the data points.
  • We can see the evolution of the data points; for example, in our case we can show the time transition for the lease end date. Users will be able to see a node change from green to red if the lease is ending soon.

Title : Breakdown on Government Cost Savings

Treemap ref.png

Source: http://www.thevisualeverything.com/tag/budgets/

  • What we can learn from this project is the use of cross-filtering to provide interactive filtering of data.
  • The charts on the map zoom into the details based on the user's filter preference.

Title : Team Budget Breakdown

Sankey diagram reference.png

Source: https://acquireprocure.com/spend-analysis-visualisation/3-reasons-procurement-professionals-use-sankey-diagrams

  • An area shading map allows us to quickly see which areas have more HDB flats of each type.
  • We can also see new areas and boundary changes across the years.
  • Information at a glance is also available at the side for users to view.

Title : Analyzing Involved Authorities, Tenders and Companies

Graph network references.png

Source: https://linkurio.us/blog/exploring-e1-3-trillion-in-public-contracts-with-graph-visualization/#!prettyPhotooard

  • Data is sorted in descending order, so that the viewer can make quick inferences.

BRAINSTORMING SESSIONS

First Draft

Brainstorm 1.png

[1] Treemap to show the spending breakdown for each category of all agencies under the selected ministry. The filters are year and ministry.
[2] Network diagram to show the relationships between agencies and suppliers of the selected ministry. The filters are year and ministry.
[3] Sankey diagram to show the cash flow between the selected agency and its suppliers for the selected category. The filters are year, ministry, agency and category.
[4] Word cloud to show an overview of the tender descriptions for the selected agency and category. The filters are year, ministry, agency and category.

After consulting our professor, we made improvements to our first draft. Below is the second and finalised draft of our procurement dashboard.

Second Draft

Brainstorm 2.png

[1] Treemap to show the spending breakdown for each category of all agencies under the selected ministry. The filters are year and ministry.
[2] Network diagram to show the relationships between agencies and suppliers of the selected ministry. The filters are year and ministry. We added a new filter that allows the user to filter the suppliers based on the procurement amount.
[3] Sankey diagram to show the cash flow between the selected agency and its suppliers for the selected category. The filters are year, ministry, agency and category.
[4] Word cloud to show an overview of the tender descriptions for the selected agency and category. The filters are year, ministry, agency and category. We added a searchable table below the word cloud to allow the user to search for keywords and view the exact tender description.
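The four views above all rely on filtering the classified dataset and then aggregating it in a different way. Although the application itself is to be built in R, the data shapes involved can be sketched in Python as below. Everything here is illustrative: the field names, sample records and function names are hypothetical, and applying the amount threshold to each agency-supplier total is an assumption about how the supplier filter might work.

```python
import re
from collections import Counter, defaultdict

# Hypothetical records shaped like the classified procurement dataset.
records = [
    {"year": 2017, "ministry": "Ministry X", "agency": "Agency A",
     "supplier": "Supplier P", "category": "IT",
     "description": "Supply of laptop computers", "amount": 1000.0},
    {"year": 2017, "ministry": "Ministry X", "agency": "Agency A",
     "supplier": "Supplier Q", "category": "IT",
     "description": "Supply of computer servers", "amount": 500.0},
    {"year": 2017, "ministry": "Ministry X", "agency": "Agency B",
     "supplier": "Supplier P", "category": "Catering",
     "description": "Catering of meals", "amount": 200.0},
    {"year": 2018, "ministry": "Ministry Y", "agency": "Agency C",
     "supplier": "Supplier R", "category": "Construction",
     "description": "Construction of school building", "amount": 9000.0},
]


def select(rows, **filters):
    """Keep rows whose fields equal every given filter value."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


def treemap_totals(rows):
    """Sum awarded amounts per (agency, category) for the treemap."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["agency"], r["category"])] += r["amount"]
    return dict(totals)


def agency_supplier_links(rows, min_amount=0.0):
    """Sum amounts per (agency, supplier) pair for the network and Sankey
    views, keeping only pairs whose total reaches min_amount."""
    flows = defaultdict(float)
    for r in rows:
        flows[(r["agency"], r["supplier"])] += r["amount"]
    return {pair: amt for pair, amt in flows.items() if amt >= min_amount}


def word_counts(rows):
    """Count lower-cased words in tender descriptions for the word cloud."""
    counts = Counter()
    for r in rows:
        counts.update(re.findall(r"[a-z]+", r["description"].lower()))
    return counts


selected = select(records, year=2017, ministry="Ministry X")
print(treemap_totals(selected))
print(agency_supplier_links(selected, min_amount=300.0))
print(word_counts(select(selected, agency="Agency A", category="IT")))
```

Keeping the filter step separate from the aggregation step means each chart can reuse the same selection logic with its own set of filters.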

PROPOSED STORYBOARD


To be filled!

TECHNOLOGIES

Tools and technologies

Tools used.png


Data Architecture

Data architecture 2.png


KEY CHALLENGES

The following are some of the key technical challenges that we may face throughout the course of the project:

Key Challenge: Unfamiliarity with R and R Shiny Libraries
Mitigation Plan:
  • Attend R Shiny Workshop
  • Independent learning via online resources such as Datacamp
  • Ask teammates for help

Key Challenge: Unfamiliarity with Libraries for Machine Learning and Web Crawling
Mitigation Plan:
  • Clean, transform and analyse data together
  • Independent learning via online resources

Key Challenge: Data Cleaning and Transformation
Mitigation Plan:
  • Crawl data from the website to obtain training data for text classification
  • Clean and transform the data together


TIMELINE


To be filled!



COMMENTS

Feel free to leave us some comments so that we can improve! We don't bite :)

No. Name Date Comments
1. Insert your name here Insert date here Insert comment here
2. Insert your name here Insert date here Insert comment here
3. Insert your name here Insert date here Insert comment here