
IS480 Team wiki: 2017T1 Team Atoms MidTerm

Atoms Logo.png




Project Progress Summary


Atoms midterm dashboard.png

Atoms midterm slides.png Atoms midterm deployedsite.png


User: demo | Password: demopassword123

Project Management

Project Status

Atoms midterm projectstatus.png

Project Schedule (Plan vs. Actual)

Several changes were made to the project schedule due to a greater emphasis on the Data Classification module and preparation for the Lab Release (live usage) requested by the sponsors. Atoms therefore rescheduled and dropped a few tasks that were not critical for lab completion (as reflected in the actual schedule). The changes to the iterations were made to ensure the project is completed on time while still meeting the sponsor's requirements. The team's progress is well-paced and we remain optimistic.

Planned Project Schedule

Atoms midterm plannedschedule.png

Actual Project Schedule

Atoms midterm actualschedule.png


Project Metrics

Change Management

Below is the change log for Iterations 7 to 10 (after acceptance):

Change Log
Iteration | Date | Type | Change Request | Rationale | Feasibility | Outcome | Priority | Status of Request | Issued By
7 | 19/8/2017 | Scope & Schedule | Reschedule classification and clustering modules | Reschedule functions based on lab exercise and release dates | Fits into the schedule without any expected delay | Accepted | High | Closed | Team
8 | 23/8/2017 | Scope | Remove Neural Network function | The team feels the scope is too large and the functionality is not critical for lab completion | Group, sponsor and supervisor agree to the removal of unnecessary functions | Accepted | High | Closed | Team
8 | 02/9/2017 | Scope | Add Aggregation function | The team realised that the function is required to effectively complete Lab1 | Fits into the schedule without any expected delay | Accepted | Low | Closed | Team
9 | 08/8/2017 | Scope | Add XGBoost function (new classification technique) | The sponsor highlighted that this is a commonly used classification technique that will be very useful for student projects; the team has spare capacity to complete it | Fits into the schedule without any expected delay | Accepted | Low | Closed | Sponsor
9 | 08/8/2017 | Scope | Add static documentation web page | The sponsor requested a documentation/user guide so that new users can familiarise themselves with the system; the team has spare capacity to complete it | Fits into the schedule without any expected delay | Accepted | Low | Closed | Sponsor

Project Risks

Technical Complexity

Frontend

1. Canvas Graph Traversing Algorithm

In KDD Labs, users can draw their own data mining process on the canvas using any valid combination of functions. When users execute the process on the canvas, they can choose to execute it either partially or fully. To support this feature, our team designed a graph traversal algorithm that handles every combination of nodes a user could draw on the canvas. The pseudo code and logical flow are as follows:

Atoms midterms pseudocode canvas.PNG
Atoms midterms canvaspseudocode.jpg

The following is an example in which the user executes "Decision Tree Classifier 2".

Atoms midterms processflow.jpg

Only the nodes in the execution list are traversed and executed, as shown below.

Atoms midterms processflow2.jpg
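As a rough illustration of the traversal idea above (the names Node, build_execution_list and execute_partial are ours for this sketch, not the actual KDD Labs code), the following Python snippet builds an execution list containing only the selected node and its upstream dependencies, ordered so that every node runs after its parents:

class Node:
    """Illustrative canvas node: one operation linked to the nodes it consumes."""
    def __init__(self, name, parents=None):
        self.name = name
        self.parents = parents or []  # upstream nodes feeding into this node

    def execute(self, inputs):
        # Placeholder for the real data mining step (sampling, classifier, ...)
        print("executing", self.name)
        return inputs

def build_execution_list(target):
    """Collect the target node and all of its ancestors in topological order."""
    ordered, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for parent in node.parents:  # walk upstream first (depth-first)
            visit(parent)
        ordered.append(node)         # appended only after all parents, so parents run first

    visit(target)
    return ordered

def execute_partial(target):
    """Partial execution: run only the nodes the selected node depends on."""
    result = None
    for node in build_execution_list(target):
        result = node.execute(result)
    return result

For example, if "Decision Tree Classifier 2" is the selected node, execute_partial(decision_tree_2) would run only the dataset, sampling and classifier nodes on its path, leaving the rest of the canvas untouched.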



Backend

2. Concurrency issue with Django & Matplotlib

  • Django’s default setup serves multiple requests concurrently (each request is handled by a separate worker thread or process) to cater to concurrent users and actions.
  • Matplotlib is a Python library used for plotting charts.
  • Standard requests such as read/write operations work out of the box without issue.
Atoms midterm concurrency.png
Atoms midterms processflow2.jpg

Problem

  • An issue arises in the common use case shown above, when an Ensemble node runs and triggers multiple Decision Trees
  • Each Decision Tree, when executed, plots an image (a confusion matrix); because Matplotlib’s pyplot keeps a single global “current figure”, the three charts end up drawn on the same canvas and overlap each other, becoming unreadable


Solution

  • There are limited resources and documentation on this specific topic, so we had to find a solution ourselves
  • One workaround is to assign each plot a random id in the range (0, 10000) and have every chart function draw on its own Figure object in the backend (see the sketch below)
  • We also found that each figure has to be closed after saving to prevent further complications (memory leaks)
  • As a result, we applied this to every other visualization as well, to prevent the same issue when multiple users run plots at the same time
Atoms midterm concurrencyconfusionmatrix.png
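A minimal sketch of this workaround, assuming a Django view that wants the chart back as PNG bytes (render_chart and plot_fn are illustrative names, not the actual KDD Labs code):

import io
import random
import matplotlib
matplotlib.use("Agg")            # headless backend suitable for a Django server
import matplotlib.pyplot as plt

def render_chart(plot_fn):
    """Draw one chart on its own Figure so concurrent requests never share
    pyplot's global current figure."""
    fig = plt.figure(random.randint(0, 10000))  # random figure id, as described above
    try:
        ax = fig.add_subplot(111)
        plot_fn(ax)                              # e.g. draws a confusion matrix on ax
        buf = io.BytesIO()
        fig.savefig(buf, format="png")           # save only this figure
        return buf.getvalue()
    finally:
        plt.close(fig)                           # close after saving to avoid memory leaks

Closing the figure in a finally block is what ultimately keeps figures from accumulating (and colliding) when many users plot at the same time.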



3. Ensemble Algorithm

In machine learning (classification), a hard Voting Classifier is an ensemble technique that “combines conceptually different machine learning classifiers and use[s] a majority vote” (scikit-learn documentation). For most algorithms, we make use of sklearn libraries to perform these tasks. However, there is a problem in the particular use case shown below:

Atoms midterms processflow2.jpg

Problem

  • In this scenario, each Classifier (Decision Tree) is trained before the Ensemble combines the results
  • However, sklearn’s VotingClassifier expects to create and train its constituent Classifiers itself as a whole - once the Ensemble is set up to accept the different Classifiers, each one loses its trained state!
  • This means we cannot use this library directly and have to implement our own Voting Classifier
Atoms midterm ensemble.png

Solution

  • Once we understood how Ensemble voting works, we had to call each Classifier’s “predict” function and select the most frequently occurring value for each row
  • This meant overriding the Ensemble’s predict function for our use case, as shown in the code below:
Atoms midterm overwritingensemble.png
Atoms midterm ensembleoutput.png
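A minimal sketch of a hard-voting predict over classifiers that were already trained separately (the class and variable names here are illustrative; the actual KDD Labs implementation may differ):

import numpy as np

class HardVotingEnsemble:
    """Combines classifiers that have each been fitted individually."""
    def __init__(self, fitted_classifiers):
        self.classifiers = fitted_classifiers

    def predict(self, X):
        # Predictions from every classifier: shape (n_classifiers, n_rows)
        all_preds = np.asarray([clf.predict(X) for clf in self.classifiers])
        majority = []
        for row_preds in all_preds.T:                   # one column of votes per row of X
            labels, counts = np.unique(row_preds, return_counts=True)
            majority.append(labels[np.argmax(counts)])  # most frequent label wins
        return np.array(majority)

With three pre-trained DecisionTreeClassifier objects, HardVotingEnsemble([dt1, dt2, dt3]).predict(X_test) returns the majority vote per row without refitting any of the trees.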


Quality of Product

Intermediate Deliverables

Topic of Interest | Link
Project Management | Project Schedule, Minutes, Metrics, Risk Management, Change Management
Project Overview | Project Overview, Team's Motivation, Project Scope
Project Documentation | Diagrams, Technologies Implemented, Low & Mid Fidelity Prototypes
Testing | Testing Documentation


Deployment

Note: The application link below points to our test server, which is only accessible from within the SMU network. Otherwise, connect via VPN to reach the SMU network.
To view the application, visit:
Test server: http://10.0.106.101/
Username: demo
Password: demopassword123


Note: This public server is currently in live usage by the IS424 Data Mining and Business Analytics students for their lab and project completion.
Production server: https://kddlabs.cn/

Testing

Atoms internaltesting.PNG Atoms UAT1.PNG Atoms UAT2.PNG Atoms Liveusage.PNG

Internal Testing

We conduct comprehensive manual testing in every iteration. Developers test their work individually before committing their code to our shared GitHub repository. We believe in testing the application manually at this level because tests can be adjusted specifically to cater to changes in the application, on both the front end and the back end. Furthermore, manual testing brings in the human factor, allowing us to better discover problems that might surface during real usage due to natural human behavior.

Once the developers have fixed the bugs, the fixed code is merged and integrated with the other functionalities. The integrated code is then deployed on the test server, where the quality assurance lead runs a final check against the set of test cases created. This helps to ensure that the deployed application works with no major incidents.

The quality assurance lead then performs regression testing on the test server, where previously developed functionalities are tested again. This helps to ensure that existing functionalities in the application are not affected by the integration. Once bugs have been identified, the quality assurance lead updates the bug-tracking Excel sheet and notifies the relevant developers of the issues and their corresponding priority levels.


The team’s list of test cases can be found on our private repository here.

User Acceptance Test 1 & 2

Team Atoms has conducted two user acceptance tests, which allowed us to better manage sponsor expectations and improve the usability of our application interface.

Atoms midterms uat summary.png

For a more detailed version of Team Atoms' user acceptance test results, access them here:

Atoms UAT1.PNG     Atoms UAT2.PNG


Live Usage

Through our live usage and roll-out, IS424 students were able to complete their take-home lab assignments on our system. This allowed us to gather feedback about the KDD Labs system directly from our end users. In addition, we were able to compare the user experience with the existing alternative used in class (SAS EM). The feedback we received about the KDD Labs system was positive: students were able to complete their in-class lab exercises, and they also found the KDD Labs system easier to use compared to SAS EM.


Lab1

Release Date: 01 Sep 2017, Friday
Duration: 2-3 hours per user
Number of Users: 45
Lab1 User Guide: The instructions created can be found here

FEEDBACK RESULTS FROM LIVE USERS

Atoms lab1feedback Q1.png
Atoms lab1feedback Q4.png

Lab1 Feedback Results: The rest of the results from the live-user feedback can be found here


Lab2

Release Date: 15 Sep 2017, Friday
Duration: 2-3 hours per user
Number of Users: 45
Lab2 User Guide: The instructions created can be found here
Lab2 Feedback Results: We are still in the process of collecting user feedback

Reflection

Team Reflection

Sponsors' Testimonial

Individual Reflections