Difference between revisions of "IS480 Team wiki: 2017T1 Team Atoms MidTerm"
(44 intermediate revisions by the same user not shown) | |||
Line 30: | Line 30: | ||
{| style="background-color:white; color:white padding: 5px 0 0 0;" width="100%" height=50px cellspacing="0" cellpadding="0" valign="top" border="0" | | {| style="background-color:white; color:white padding: 5px 0 0 0;" width="100%" height=50px cellspacing="0" cellpadding="0" valign="top" border="0" | | ||
− | | style="vertical-align:top;width:20%;" | <div style="padding: 1px; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #1D1D1D; font-family:Roboto"> [[IS480 Team wiki: 2017T1 | + | | style="vertical-align:top;width:20%;" | <div style="padding: 1px; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #1D1D1D; font-family:Roboto"> [[IS480 Team wiki: 2017T1 Atoms| <font color="#38474E"><b>Main Wiki</b>]] |
| style="vertical-align:top;width:20%;" | <div style="padding: 1px; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #1D1D1D; font-family:Roboto"> [[IS480 Team wiki: 2017T1 Team Atoms MidTerm| <font color="#A01D21"><b>MidTerm Wiki</b>]] | | style="vertical-align:top;width:20%;" | <div style="padding: 1px; text-align:center; line-height: wrap_content; font-size:16px; border-bottom:1px solid #1D1D1D; font-family:Roboto"> [[IS480 Team wiki: 2017T1 Team Atoms MidTerm| <font color="#A01D21"><b>MidTerm Wiki</b>]] | ||
Line 43: | Line 43: | ||
==Project Progress Summary== | ==Project Progress Summary== | ||
<br/> | <br/> | ||
− | [[Image:Atoms midterm dashboard.png|center| | + | [[Image:Atoms midterm dashboard.png|center|800px]] |
<center> | <center> | ||
− | [[Image:Atoms midterm slides.png|100px|link=]] | + | [[Image:Atoms midterm slides.png|100px|link=File:Atoms IS480 Midterm slide.pdf]] |
[[Image:Atoms midterm deployedsite.png|140px|link=http://10.0.106.101/]] | [[Image:Atoms midterm deployedsite.png|140px|link=http://10.0.106.101/]] | ||
Line 52: | Line 52: | ||
</center> | </center> | ||
− | |||
− | |||
==Project Management== | ==Project Management== | ||
− | ===Project Status=== | + | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Project Status</font></div>=== |
[[Image:Atoms midterm projectstatus.png|center|900px]] | [[Image:Atoms midterm projectstatus.png|center|900px]] | ||
− | ===Project Schedule (Plan vs. Actual)=== | + | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Project Schedule (Plan vs. Actual)</font></div>=== |
<div style="font-family:Roboto;font-size:16px"> | <div style="font-family:Roboto;font-size:16px"> | ||
− | Several changes were made to the project schedule due to greater emphasis on Data Classification module and preparation for Lab Release(Live usage) as requested by sponsors. Hence, Atoms had dropped afew tasks that were not critical for the lab completion (as reflected in the actual schedule). The changes in the iterations were made to ensure the completion of the project on time and to also optimize the sponsor's requirements. Progress of the team is well-paced and optimistic. | + | Several changes were made to the project schedule due to greater emphasis on the Data Classification module and preparation for the Lab Release (live usage), as requested by the sponsors. Hence, Atoms rescheduled and dropped a few tasks that were not critical for lab completion (as reflected in the actual schedule). The changes to the iterations were made to ensure that the project is completed on time and to better meet the sponsors' requirements. The team's progress is well paced and the outlook is optimistic. |
<br/> | <br/> | ||
− | ===Planned Project Schedule=== | + | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Planned Project Schedule</font></div>=== |
− | ===Actual Project Schedule=== | + | [[Image:Atoms midterm plannedschedule.png|center|1100px]] |
+ | |||
+ | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Actual Project Schedule</font></div>=== | ||
+ | [[Image:Atoms midterm actualschedule.png|center|1100px]] | ||
+ | |||
+ | |||
+ | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Project Metrics</font></div>=== | ||
+ | <!--Content--> | ||
+ | <br> | ||
+ | [[Image:Atoms schedule metric description.png|center|600px]] | ||
+ | [[Image:Atoms schedule metric.png|center|900px]] | ||
+ | <font size=3><center><b>Schedule Metric Breakdown: </b>https://www.dropbox.com/s/tmpczn3wr520nqf/Project_schedule%20.xlsx?dl=0 <font color="#20BCD2" size=2></font></center></font> | ||
+ | <br> | ||
+ | |||
+ | [[Image:Atoms bug metric description.png|center|600px]] | ||
+ | <br> | ||
+ | [[Image:Atoms bug metric.jpg|center|800px]] | ||
+ | <br> | ||
+ | [[Image:Atoms bug score.png|center|800px]] | ||
+ | <!--/Content--> | ||
+ | |||
+ | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Change Management</font></div>=== | ||
<div style="font-family:Roboto;font-size:16px"> | <div style="font-family:Roboto;font-size:16px"> | ||
Below is the change log for Iteration 7 to 10 (after acceptance): | Below is the change log for Iteration 7 to 10 (after acceptance): | ||
Line 125: | Line 144: | ||
|style="text-align: center;"| 08/8/2017 | |style="text-align: center;"| 08/8/2017 | ||
|style="text-align: center;"| Scope | |style="text-align: center;"| Scope | ||
− | |style="text-align: center;"| Add | + | |style="text-align: center;"| Add XG Boost(New Classification technique) function |
|style="text-align: center;"| Sponsor highlighted that this is a commonly used classification technique and will be really useful for student projects. Team has excess time to complete this | |style="text-align: center;"| Sponsor highlighted that this is a commonly used classification technique and will be really useful for student projects. Team has excess time to complete this | ||
|style="text-align: center;"| Fits into schedule without any expected delay as a consequence | |style="text-align: center;"| Fits into schedule without any expected delay as a consequence | ||
Line 148: | Line 167: | ||
</center> | </center> | ||
− | ===Project | + | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Project Risks</font></div>=== |
− | === | + | <br/> |
− | === | + | [[Image:Atoms Risk category.png|center|800px]] |
+ | <br/> | ||
+ | |||
+ | '''Existing & Potential Risk''' | ||
+ | <div style="font-family:Roboto;font-size:18px"> | ||
+ | Currently there are <b>no outstanding risks</b>. All identified risks and challenges have been addressed. However, between Acceptance and Mid-Terms, we faced and resolved concerns arising from 1) Technical Risk and 2) Client Management Risk, as described below: | ||
+ | |||
+ | </div> | ||
+ | <br/> | ||
+ | |||
+ | '''Risks & Challenges Faced''' | ||
+ | [[Image:Atoms midterm risk.png|center|900px]] | ||
+ | |||
+ | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Technical Complexity</font></div>=== | ||
+ | ====System Architecture==== | ||
+ | [[Image:Atoms system architecture.PNG| 500px]] | ||
+ | [[Image:Atoms system architecture2.PNG| 500px]] | ||
+ | ====Frontend==== | ||
+ | <div style="font-family:Roboto;font-size:18px"> | ||
+ | '''1. Canvas Graph Traversing Algorithm''' | ||
+ | </div> | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | In KDD Labs, users can draw their own data mining process on the canvas using any valid combination of functions. When users execute the process on the canvas, they can choose to execute it partially or fully. To support this feature, our team designed a graph traversal algorithm that handles all the possible combinations a user could draw on the canvas. | ||
+ | The pseudocode and logical flow are as follows: | ||
+ | </div> | ||
+ | [[Image:Atoms midterms pseudocode canvas.PNG|center|450px]] | ||
+ | [[Image:Atoms midterms canvaspseudocode.jpg|center|650px]] | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | The following example shows a user executing “Decision Tree Classifier 2”. | ||
+ | </div> | ||
+ | [[Image:Atoms midterms processflow.jpg|center|600px]] | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | Only the nodes in the execution list are traversed and executed, as shown below. | ||
+ | </div> | ||
+ | [[Image:Atoms midterms processflow2.jpg|center|650px]] | ||
+ | <br/><br/> | ||
+ | |||
+ | ====Backend==== | ||
+ | <div style="font-family:Roboto;font-size:18px"> | ||
+ | '''2. Concurrency issue with Django & Matplotlib''' | ||
+ | </div> | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | *Django applications serve multiple requests at the same time, through the worker threads or processes of the web server, to cater to concurrent users and actions. | ||
+ | *Matplotlib is a Python library used for plotting charts. | ||
+ | *Standard requests such as read/write operations work out of the box without issue. | ||
+ | </div> | ||
+ | [[Image:Atoms midterm concurrency.png|center|450px]] | ||
+ | [[Image:Atoms midterms processflow2.jpg|center|650px]] | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | '''Problem''' | ||
+ | *An issue arises in the common use case shown above, when an Ensemble node runs and triggers multiple Decision Trees. | ||
+ | *Each Decision Tree plots an image (Confusion Matrix) when executed. As a result, the images are drawn on the same canvas and the 3 charts overlap each other, which makes them unreadable. | ||
+ | <br/> | ||
+ | '''Solution''' | ||
+ | *There are limited resources and documentation on this specific topic, so we had to find a solution ourselves. | ||
+ | *One workaround is to assign each plot a random id in the range (0, 10000) and have every chart function create its plot on a different Figure object in the backend. | ||
+ | *We also found that each figure has to be closed after saving to prevent memory leaks. | ||
+ | *We then applied the same approach to every other visualization to prevent the same issue when multiple users run a plot at the same time. | ||
+ | </div> | ||
+ | [[Image:Atoms midterm concurrencyconfusionmatrix.png|center|450px]] | ||
+ | <br/><br/> | ||
+ | |||
+ | <div style="font-family:Roboto;font-size:18px"> | ||
+ | '''3. Ensemble Algorithm''' | ||
+ | </div> | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | In Machine Learning (Classification), a hard Voting Classifier Ensemble technique “combines conceptually different machine learning classifiers and use a majority vote” (Sklearn). | ||
+ | For most algorithms, we use sklearn libraries to perform tasks. However, there is a problem in the particular use case shown below: | ||
+ | </div> | ||
+ | [[Image:Atoms midterms processflow2.jpg|center|650px]] | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | '''Problem''' | ||
+ | *In this scenario, each Classifier (Decision Tree) is trained before the Ensemble combines the results. | ||
+ | *However, sklearn’s VotingClassifier requires the Classifiers to be created and trained together as a whole. When the Ensemble is fitted, it retrains its own copies of the Classifiers, so the trained state of the individual Classifiers is lost. | ||
+ | *This means that we could not use this library class and had to <b>implement our own Voting Classifier</b>. | ||
+ | [[Image:Atoms midterm ensemble.png|center|500px]] | ||
+ | '''Solution''' | ||
+ | *Once we understood how Ensemble Voting works, we had to call each Classifier’s “predict” function and select the most frequently occurring value for each row. | ||
+ | *This meant overriding the Ensemble’s predict function for our use case, as shown in the code below: | ||
+ | [[Image:Atoms midterm overwritingensemble.png|center|600px]] | ||
+ | [[Image:Atoms midterm ensembleoutput.png|center|400px]] | ||
+ | <br/> | ||
==Quality of Product== | ==Quality of Product== | ||
− | ===Intermediate Deliverables=== | + | <div style="font-family:Roboto;font-size:18px"> |
− | ===Deployment=== | + | '''1. Deployment Script''' |
− | ===Testing=== | + | </div> |
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | Manual deployment is prone to human error. Hence, we have created a deployment shell script that partially automates the deployment of our web application. The automated steps include: | ||
+ | <br/><br/> | ||
+ | 1. Stopping/Starting of system services running our web server and our web application<br/> | ||
+ | 2. Downloading of new source code from git repository<br/> | ||
+ | 3. Changing file system permission of directories and files<br/> | ||
+ | 4. Execution of Django specific deployment command<br/> | ||
+ | |||
+ | With a frequent deployment rate (every iteration, i.e. every 2 weeks), the chance of error due to manual deployment is much higher. The deployment script allows us to reduce such errors. | ||
+ | </div> | ||
+ | <br/> | ||
+ | |||
+ | <div style="font-family:Roboto;font-size:18px"> | ||
+ | '''2. Benchmarking for Visualization''' | ||
+ | </div> | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | This is a scatter matrix plot for the famous Iris data set. It involves plotting every column against every other column. Therefore, the computation is proportional to (# of rows) x (# of columns)<sup>2</sup>. | ||
+ | <br/><b>Complexity:</b> [[Image:Atoms midterm benchmarkingcomplexity.png|100px]] | ||
+ | </div> | ||
+ | [[Image:Atoms midterm benchmarking iris.png|center|500px]] | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | '''Problem''' | ||
+ | *Datasets with a large number of columns would therefore take a considerable amount of time to plot, consuming resources for a single user. | ||
+ | *If many users execute this chart at the same time, the response time would be very long. | ||
+ | <br/> | ||
+ | '''Solution''' | ||
+ | *To measure accurately how long the charts take for data sets of different dimensions, we generated datasets with different numbers of columns and rows and timed each charting function on them. | ||
+ | </div> | ||
+ | [[Image:Atoms midterm benchmarking timing.png|center|800px]] | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | '''Findings''' | ||
+ | *From the benchmark tests, we found that the number of rows did not affect performance as much as the number of columns. | ||
+ | *Based on these findings, we added validation that prevents users from selecting too many columns (>10) for the scatter matrix. | ||
+ | <br/> | ||
+ | </div> | ||
+ | <br/> | ||
+ | |||
+ | <div style="font-family:Roboto;font-size:18px"> | ||
+ | '''3. Secure API - System security''' | ||
+ | </div> | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | *All backend APIs require user login to prevent unauthorized direct API calls. | ||
+ | *For each API request that modifies files, the system verifies that the file belongs to the user before performing the operation. | ||
+ | *If the system receives an unauthorized API request, an appropriate error message is displayed and the request is logged for investigation. | ||
+ | </div> | ||
+ | [[Image:Atoms midterm secureapi.png|center|600px]] | ||
+ | <br/> | ||
+ | |||
+ | <div style="font-family:Roboto;font-size:18px"> | ||
+ | '''4. Google Analytics tracking implementation''' | ||
+ | </div> | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | Google Analytics can help our sponsor understand how students interact with our KDD Labs website: where they come from, how often they visit, and which parts of the site capture their attention and which do not. | ||
+ | </div> | ||
+ | [[Image:Atoms midterm googleanalytics.png|center|600px]] | ||
+ | <br/> | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | We also track users' behaviour and count the number of events triggered in our system. This allows us to closely monitor which functionalities are being used and assists us in tracking abnormal behaviour. Furthermore, it is a useful tool for the teaching team to understand the students' usage behaviour. A total of 32 different activities have been tracked since the website was announced to the students on 7 Sep 2017. The graph below shows the top 10 activities on our website. | ||
+ | [[Image:Atoms midterm googleanalytics top10.png|center|800px]] | ||
+ | </div> | ||
+ | <br/> | ||
+ | |||
+ | <div style="font-family:Roboto;font-size:18px"> | ||
+ | '''5. System logger''' | ||
+ | </div> | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | Our system consistently monitors and logs critical user actions and the problems users encounter. This helps us automatically track errors that occur when users use the KDD Labs system, which serves as internal feedback. We then analyse these errors further to derive their root causes and improve our system where possible. | ||
+ | </div> | ||
+ | [[Image:Atoms midterm systemlogger.png|center|600px]] | ||
+ | <br/> | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | Furthermore, to make the log files easy to locate, we have created a logger that rotates the log file twice a day. The rotated file is renamed with the date and time when it was last modified. | ||
+ | [[Image:Atoms midterm systemlogger rotation.png|center|500px]] | ||
+ | </div> | ||
+ | <br/> | ||
+ | |||
+ | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Intermediate Deliverables</font></div>=== | ||
+ | |||
+ | {| class="wikitable" style="background-color:#FFFFFF; font-family: Roboto" | ||
+ | |- | ||
+ | ! style="font-weight: bold;background: #38474E;color:#fff;" | Topic of Interest | ||
+ | ! style="font-weight: bold;background: #38474E;color:#fff;" | Link | ||
+ | |- | ||
+ | |rowspan="5"| Project Management | ||
+ | || [[IS480 Team wiki: 2017T1 Team Atoms Project Schedule | Project Schedule]] | ||
+ | |- | ||
+ | || [[IS480 Team wiki: 2017T1 Team Atoms Meeting Minutes | Minutes]] | ||
+ | |- | ||
+ | || [[IS480 Team wiki: 2017T1 Team Atoms Metrics | Metrics]] | ||
+ | |- | ||
+ | || [[IS480 Team wiki: 2017T1 Team Atoms Risk Management | Risk Management]] | ||
+ | |- | ||
+ | || [[IS480 Team wiki: 2017T1 Team Atoms Change Management | Change Management]] | ||
+ | |- | ||
+ | |||
+ | |rowspan=3| Project Overview | ||
+ | || [[IS480 Team wiki: 2017T1 Team Atoms Project Overview | Project Overview]] | ||
+ | |- | ||
+ | || [[IS480 Team wiki: 2017T1 Team Atoms Motivation | Team's Motivation]] | ||
+ | |- | ||
+ | || [[IS480 Team wiki: 2017T1 Team Atoms Project Scope | Project Scope]] | ||
+ | |- | ||
+ | |||
+ | |rowspan="3"| Project Documentation | ||
+ | || [[IS480 Team wiki: 2017T1 Team Atoms Diagrams | Diagrams]] | ||
+ | |- | ||
+ | || [[IS480 Team wiki: 2017T1 Team Atoms Technologies | Technologies Implemented]] | ||
+ | |- | ||
+ | || [[IS480 Team wiki: 2017T1 Team Atoms Documentation | Low & Mid Fidelity Prototypes]] | ||
+ | |- | ||
+ | |||
+ | |rowspan="3"| Testing | ||
+ | || [[IS480 Team wiki: 2017T1 Team Atoms Internal Testing | Testing Documentation]] | ||
+ | |} | ||
+ | |||
+ | <br/> | ||
+ | |||
+ | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Deployment</font></div>=== | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | '''Note: The application link provided is for our test server, which is only available within the SMU Network. Otherwise, consider using a VPN to access the SMU Network.''' | ||
+ | <br/> | ||
+ | To view the application, visit <br/> | ||
+ | Test server: http://10.0.106.101/ <br/> | ||
+ | Username: demo<br/> | ||
+ | Password: demopassword123 | ||
+ | |||
+ | |||
+ | '''Note: This public server is currently being utilized for Live Usage by the IS424 Data Mining and Business Analytics students for their labs and project completion. '''<br/> | ||
+ | Production server: https://kddlabs.cn/ | ||
+ | </div> | ||
+ | |||
+ | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Testing</font></div>=== | ||
+ | <center> | ||
+ | [[Image: Atoms internaltesting.PNG|180px|link=IS480 Team wiki: 2017T1 Team Atoms Internal Testing]] | ||
+ | [[Image: Atoms UAT1.PNG|150px|link=IS480 Team wiki: 2017T1 Team Atoms User Testing 1]] | ||
+ | [[Image: Atoms UAT2.PNG|150px|link=IS480 Team wiki: 2017T1 Team Atoms User Testing 2]] | ||
+ | [[Image: Atoms Liveusage.PNG|150px|link=IS480 Team wiki: 2017T1 Team Atoms Live Usage]] | ||
+ | </center> | ||
+ | ====Internal Testing==== | ||
+ | <div style="font-family: Roboto; font-size:16px"> | ||
+ | <p> | ||
+ | We carry out comprehensive manual testing in every iteration. The developers conduct individual testing before committing their code to our shared GitHub repository. We test the application manually at this level because tests can be adjusted to cater to changes in the application, on both the front end and the back end. Furthermore, manual testing brings in the human factor, allowing us to better discover problems that might surface during real usage due to natural human behavior. | ||
+ | </p> | ||
+ | |||
+ | <p> | ||
+ | Once the developers have fixed the bugs, the fixed code is merged and integrated with the other functionalities. The integrated code is then deployed on the test server, and the lead quality assurance runs a final check against the set of test cases created. This helps to ensure that the deployed application works with no major incidents. | ||
+ | </p> | ||
+ | |||
+ | <p> The team's lead quality assurance then performs regression testing on the test server where previous functionalities developed are tested again. This helps to ensure that existing functionalities in the application are not affected by the integration. Once bugs have been identified, the lead quality assurance will then update the bug-tracking Excel sheet and notify the relevant developers of the issues and the corresponding priority level. | ||
+ | </p> | ||
+ | <br/> | ||
+ | |||
+ | <p> | ||
+ | The team’s list of test cases can be found on our private repository [https://www.dropbox.com/sh/qiw71fh79t57bgc/AAC9vhkbvkcdIsbVseHNkZB1a?dl=0 here]. | ||
+ | </p> | ||
+ | </div> | ||
+ | |||
+ | ====User Acceptance Test 1 & 2==== | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | Team Atoms has conducted 2 user tests which allowed us to better manage sponsor expectations as well as improve on usability of our application interface. | ||
+ | </div> | ||
+ | <center> | ||
+ | [[Image: Atoms midterms uat summary.png|800px]] | ||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | For a more detailed version of Team Atoms' user acceptance test results, see: | ||
+ | </div> | ||
+ | [[Image: Atoms UAT1.PNG|150px|link=IS480 Team wiki: 2017T1 Team Atoms User Testing 1]] | ||
+ | | ||
+ | [[Image: Atoms UAT2.PNG|150px|link=IS480 Team wiki: 2017T1 Team Atoms User Testing 2]] | ||
+ | </center> | ||
+ | <br/> | ||
+ | |||
+ | ====Live Usage==== | ||
+ | <div style="font-family:Roboto; font-size:16px"> | ||
+ | Through our live usage and rollout, IS424 students were able to complete their take-home lab assignments on our system, which allowed us to gather feedback about the KDD Labs system directly from our end users. We were also able to compare the user experience with the existing alternative used in class (SAS EM). The feedback was positive: students stated that they were able to complete their in-class lab exercises, and they found the KDD Labs system easier to use than SAS EM. | ||
+ | </div> | ||
+ | |||
+ | <br/> | ||
+ | <b>Lab1</b> | ||
+ | <div style="font-family:Roboto; font-size:16px"> | ||
+ | <p> | ||
+ | <b>Release Date: </b>01 Sep 2017, Friday <br /> | ||
+ | <b>Duration: </b>2-3 hours per user<br /> | ||
+ | <b>Number of Users(s):</b> 45 <br /> | ||
+ | <b>Lab1 User Guide:</b> Created instructions can be found [https://www.dropbox.com/s/vblj4ebf20li9vp/Lab_1_KDD.pdf?dl=0 here] <br /> | ||
+ | <center><div style="font-family:Roboto; font-size:16px"> </div> | ||
+ | <b> FEEDBACK RESULTS FROM LIVE USERS </b> | ||
+ | |||
+ | [[Image:Atoms lab1feedback Q1.png|center|600px]] | ||
+ | [[Image:Atoms lab1feedback Q4.png|center|600px]] | ||
+ | </center> | ||
+ | <b>Lab1 Feedback Results:</b> The rest of the results from the Live user feedback can be found [https://www.dropbox.com/s/3k2w0rusy92ehf3/Lab_1_KDD_Feedback.pdf?dl=0 here] <br /> | ||
+ | </p> | ||
+ | </div> | ||
+ | <br/> | ||
+ | |||
+ | <b>Lab2</b> | ||
+ | <div style="font-family:Roboto; font-size:16px"> | ||
+ | <p> | ||
+ | <b>Release Date: </b>15 Sep 2017, Friday<br /> | ||
+ | <b>Duration: </b>2-3 hours per user<br /> | ||
+ | <b>Number of Users(s):</b> 45 <br /> | ||
+ | <b>Lab2 User Guide:</b> Created instructions can be found [https://www.dropbox.com/s/lych7e221u7a2y1/Lab_2_KDD.pdf?dl=0 here] <br /> | ||
+ | <b>Lab2 Feedback Results:</b> Live user feedback can be found [https://www.dropbox.com/s/gurygkcam51obiy/Lab_2_KDD_Feedback.pdf?dl=0 here] <br /> | ||
+ | </p> | ||
+ | </div> | ||
==Reflection== | ==Reflection== | ||
− | ===Team Reflection=== | + | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Team Reflection</font></div>=== |
− | ===Sponsors' Testimonial=== | + | |
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | This journey has proven to be an enriching learning experience for Team Atoms. The project offered many new learning points for the team as it was highly technical: we had to understand and grasp the concepts of the data mining process and its algorithms within a short period of time. We also learnt the importance of good stakeholder management, which allows us to react better to unforeseen circumstances. Through active team participation and communication, we were able to mitigate existing issues and deliver a quality project on time. | ||
+ | </div> | ||
+ | |||
+ | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Sponsors' Testimonial</font></div>=== | ||
+ | <br/> | ||
+ | [[Image:Atoms midterms sponsor.PNG|350px|center]] | ||
+ | |||
+ | <div style="font-family:Roboto;font-size:16px"> | ||
+ | "Team ATOMS is a capable and sincere team, that has done very well in the course of the project. KDD Labs project is very challenging and in particular requires a very diverse set of technical skills. ATOMS have made substantial efforts in acquiring new skills and integrating them to deliver a quality product in a timely manner. They have stoically faced the technical issues and challenging feature/change requests, and demonstrated an excellent work ethic in delivering on their targets. Discussions with them have been thought provoking and rewarding, and have significantly contributed towards improving the product quality" - <b>Sponsor, Doyen Sahoo </b> | ||
+ | </div> | ||
− | ===Individual Reflections=== | + | ===<div style="background: #38474E; padding: 15px; line-height: 0.3em; text-indent: 15px; font-size:16px; font-family:Garamond"><font color= #FFFFFF>Individual Reflections</font></div>=== |
+ | [[Image:Atoms midterms reflections.PNG|750px|center]] | ||
<!--/Content--> | <!--/Content--> |
Latest revision as of 20:30, 29 October 2017
Project Progress Summary
User: demo | Password: demopassword123
Project Management
Project Status
Project Schedule (Plan vs. Actual)
Several changes were made to the project schedule due to greater emphasis on the Data Classification module and preparation for the Lab Release (live usage), as requested by the sponsors. Hence, Atoms rescheduled and dropped a few tasks that were not critical for lab completion (as reflected in the actual schedule). The changes to the iterations were made to ensure that the project is completed on time and to better meet the sponsors' requirements. The team's progress is well paced and the outlook is optimistic.
Planned Project Schedule
Actual Project Schedule
Project Metrics
Change Management
Below is the change log for Iterations 7 to 10 (after acceptance):
Iteration | Date | Type | Change Request | Rationale | Feasibility | Outcome | Priority | Status of Request | Issued By |
---|---|---|---|---|---|---|---|---|---|
7 | 19/8/2017 | Scope & Schedule | Reschedule classification and clustering module | Reschedule functions based on lab exercise and release dates | Fits into schedule without any expected delay as a consequence. | Accepted | High | Closed | Team |
8 | 23/8/2017 | Scope | Remove Neural Network function | Team feels that scope is too large and functionality is not critical for labs completion | Group, sponsor and supervisor agree to removal of unnecessary functions | Accepted | High | Closed | Team |
8 | 02/9/2017 | Scope | Add Aggregation function | Team realized that the function is required to complete Lab1 effectively | Fits into schedule without any expected delay as a consequence | Accepted | Low | Closed | Team |
9 | 08/8/2017 | Scope | Add XGBoost (new classification technique) function | Sponsor highlighted that this is a commonly used classification technique and will be really useful for student projects. Team has excess time to complete this | Fits into schedule without any expected delay as a consequence | Accepted | Low | Closed | Sponsor |
9 | 08/8/2017 | Scope | Add static documentation web page | Sponsor requested documentation and a user guide to help new users familiarize themselves with the system. Team has excess time to complete this function | Fits into schedule without any expected delay as a consequence | Accepted | Low | Closed | Sponsor |
Project Risks
Existing & Potential Risk
Currently there are no outstanding risks. All identified risks and challenges have been addressed. However, between Acceptance and Mid-Terms, we faced and resolved concerns arising from 1) Technical Risk and 2) Client Management Risk, as described below:
Risks & Challenges Faced
Technical Complexity
System Architecture
Frontend
1. Canvas Graph Traversing Algorithm
In KDD Labs, users can draw their own data mining process on the canvas using any valid combination of functions. When users execute the process on the canvas, they can choose to execute it partially or fully. To support this feature, our team designed a graph traversal algorithm that handles all the possible combinations a user could draw on the canvas. The pseudocode and logical flow are as follows:
The following example shows a user executing “Decision Tree Classifier 2”.
Only the nodes in the execution list are traversed and executed, as shown below.
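A minimal sketch of how such an execution list could be built, assuming the canvas is stored as a list of (source, target) edges and contains no cycles. This is an illustration, not the team's actual pseudocode: the function name `execution_list` and every node name other than “Decision Tree Classifier 2” are hypothetical.

```python
from collections import defaultdict


def execution_list(edges, target):
    """Return the target node and all of its upstream nodes, in run order.

    Assumes `edges` is a list of (source, target) pairs forming an
    acyclic graph; both assumptions are made for this sketch only.
    """
    # Map every node to the nodes that feed into it.
    parents = defaultdict(list)
    for source, dest in edges:
        parents[dest].append(source)

    ordered, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for parent in parents[node]:
            visit(parent)
        # A node is appended only after all of its inputs.
        ordered.append(node)

    visit(target)
    return ordered


# Hypothetical canvas: one data source feeding two trees and an ensemble.
canvas_edges = [
    ("Read CSV", "Split Data"),
    ("Split Data", "Decision Tree Classifier 1"),
    ("Split Data", "Decision Tree Classifier 2"),
    ("Decision Tree Classifier 1", "Ensemble"),
    ("Decision Tree Classifier 2", "Ensemble"),
]
print(execution_list(canvas_edges, "Decision Tree Classifier 2"))
```

Executing a node partway through the canvas touches only its upstream nodes, while executing the final node (here the hypothetical “Ensemble”) yields the full process.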
Backend
2. Concurrency issue with Django & Matplotlib
- Django’s default architecture handles multiple requests using a built-in load balancer to cater to concurrent users and actions.
- Matplotlib is a library in Python used for plotting charts.
- Standard requests such as read/write operations work out of the box without issue
Problem
- An issue arises in the common use case shown above, when an Ensemble node runs and triggers multiple Decision Trees
- Each Decision Tree plots an image (a Confusion Matrix) when executed. As a result, the images are drawn on the same canvas and the 3 charts overlap, making them unreadable
Solution
- There are limited resources and documentation on this specific topic, so we had to find a solution ourselves
- One workaround is to assign each plot a random id between 0 and 10000 and have every chart function create its plot on a different Figure object in the backend
- We also found that each figure has to be closed after saving to prevent further complications (memory leaks)
- We therefore applied the same approach to every other visualization, to prevent the same issue when multiple users run a plot at the same time.
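A minimal sketch of this workaround, assuming the charts are rendered to PNG on the server with the non-interactive Agg backend. The function name and its parameters are hypothetical and this is not the team's actual code; it only illustrates giving each chart its own numbered Figure and closing it after saving.

```python
import io
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for server-side rendering
import matplotlib.pyplot as plt


def render_confusion_matrix(matrix, title):
    """Draw one confusion matrix on its own Figure and return PNG bytes."""
    # Pick a figure number that is not already open, so this chart never
    # reuses a Figure that another chart is still drawing on.
    fig_id = random.randint(0, 10000)
    while plt.fignum_exists(fig_id):
        fig_id = random.randint(0, 10000)

    fig = plt.figure(num=fig_id)
    try:
        # Draw through the Figure/Axes objects rather than the implicit
        # "current figure", which is shared global state.
        ax = fig.add_subplot(111)
        ax.imshow(matrix, cmap="Blues")
        ax.set_title(title)
        buffer = io.BytesIO()
        fig.savefig(buffer, format="png")
        return buffer.getvalue()
    finally:
        # Always release the Figure, otherwise open figures accumulate.
        plt.close(fig)
```

Note that pyplot keeps global state, so the check for a free figure number is not atomic across threads; the sketch shows the idea described above rather than a guarantee of thread safety.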
3. Ensemble Algorithm
In Machine Learning (Classification), a hard Voting Classifier Ensemble technique “combines conceptually different machine learning classifiers and use a majority vote” (Sklearn). For most algorithms, we use sklearn libraries to perform tasks. However, this particular use case, shown below, poses a problem:
Problem
- This scenario means that each Classifier (Decision Tree) is trained before the Ensemble combines the results
- However, sklearn’s VotingClassifier requires the Classifiers to be created and trained together as a whole. Once an Ensemble is created from different Classifiers, their trained state is lost!
- This means that we cannot use this library and would have to implement our own Voting Classifier.
Solution
- Once we understood how Ensemble Voting works, we called each Classifier’s “predict” function and selected the most frequent value for each row
- This meant overriding the Ensemble’s predict function for our use case, as shown in the code below:
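The majority-vote step can be sketched as follows. This is an illustration under assumptions, not the team's implementation: the class name is hypothetical, and each classifier is only assumed to be already trained and to expose a `predict(rows)` method that returns one label per row.

```python
from collections import Counter


class PretrainedVotingEnsemble:
    """Hard-voting ensemble over classifiers that are already trained."""

    def __init__(self, classifiers):
        if not classifiers:
            raise ValueError("at least one classifier is required")
        self.classifiers = list(classifiers)

    def predict(self, rows):
        # One list of predicted labels per classifier; nothing is refitted.
        all_predictions = [list(clf.predict(rows)) for clf in self.classifiers]
        # zip(*...) regroups the labels per row, one vote per classifier.
        # On a tie, the label voted for by the earliest classifier wins.
        return [
            Counter(votes).most_common(1)[0][0]
            for votes in zip(*all_predictions)
        ]
```

Because the ensemble only calls `predict`, the Decision Trees keep whatever state they had when they were trained upstream in the canvas.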
Quality of Product
1. Deployment Script
Manual deployment is prone to human error. Hence, we created a deployment shell script that partially automates the deployment of our web application. The automated steps include:
1. Stopping/Starting of system services running our web server and our web application
2. Downloading of new source code from git repository
3. Changing file system permission of directories and files
4. Execution of Django specific deployment command
With a frequent deployment rate (every iteration, i.e. every 2 weeks), the chance of error due to manual deployment is much higher. The deployment script reduces such errors.
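The team's script is a shell script and is not reproduced here. As a rough illustration of the four automated steps, the Python sketch below builds the command sequence and runs it, stopping at the first failure. The service names, the `www-data` owner, the project path and the choice of `migrate` and `collectstatic` as the Django-specific commands are all assumptions.

```python
import subprocess


def deployment_steps(project_dir, services=("nginx", "kddlabs")):
    """Return the deployment commands in the order they should run."""
    # 1. Stop the web server and the web application (hypothetical names).
    steps = [["systemctl", "stop", service] for service in services]
    steps += [
        # 2. Download the new source code from the git repository.
        ["git", "-C", project_dir, "pull", "--ff-only"],
        # 3. Reset ownership of directories and files (assumed owner).
        ["chown", "-R", "www-data:www-data", project_dir],
        # 4. Django-specific deployment commands (assumed choice).
        ["python3", f"{project_dir}/manage.py", "migrate", "--noinput"],
        ["python3", f"{project_dir}/manage.py", "collectstatic", "--noinput"],
    ]
    # Restart the services in reverse order once everything succeeded.
    steps += [["systemctl", "start", service] for service in reversed(services)]
    return steps


def deploy(project_dir, run=subprocess.run):
    """Run every step; check=True aborts the deployment on the first error."""
    for command in deployment_steps(project_dir):
        run(command, check=True)
```

Passing a different `run` callable makes a dry run possible, for example `deploy("/srv/kddlabs", run=lambda cmd, check: print(" ".join(cmd)))`.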
2. Benchmarking for Visualization
This is a scatter matrix plot for the famous Iris data set. It involves plotting every column against one another. Therefore, the computation is (# of rows) × (# of columns)²
Complexity: O((# of rows) × (# of columns)²)
Problem
- Datasets with a large number of columns would take a considerable amount of time and consume significant resources for a single user
- If many users execute this chart at the same time, it would result in a very long response time
Solution
- To accurately measure how long different dataset dimensions take, we generated datasets with different numbers of columns and rows and timed each charting function on them.
Findings
- From the benchmark tests, we found that the number of rows did not affect performance as much as the number of columns
- Based on these findings, we implemented validation that prevents users from selecting too many columns (>10) for the scatter matrix
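The benchmarking approach and the resulting column limit can be sketched as below. All function names are hypothetical, and `scatter_matrix_workload` is only a stand-in for the real charting call: it touches every point of every column pair, which reproduces the rows × columns² growth without drawing anything.

```python
import random
import time

MAX_SCATTER_MATRIX_COLUMNS = 10  # limit stated in the findings above


def make_dataset(n_rows, n_cols, seed=0):
    """Generate a synthetic numeric dataset of the requested shape."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_cols)] for _ in range(n_rows)]


def scatter_matrix_workload(dataset):
    """Stand-in for the chart: visit each point of each column pair."""
    n_cols = len(dataset[0])
    points = 0
    for i in range(n_cols):
        for j in range(n_cols):
            for row in dataset:
                points += 1  # a real chart would plot (row[i], row[j]) here
    return points


def benchmark(chart_fn, shapes):
    """Time chart_fn once for every (rows, columns) shape."""
    timings = {}
    for n_rows, n_cols in shapes:
        data = make_dataset(n_rows, n_cols)
        start = time.perf_counter()
        chart_fn(data)
        timings[(n_rows, n_cols)] = time.perf_counter() - start
    return timings


def validate_column_selection(columns):
    """Reject scatter matrix requests that select too many columns."""
    if len(columns) > MAX_SCATTER_MATRIX_COLUMNS:
        raise ValueError(
            f"Select at most {MAX_SCATTER_MATRIX_COLUMNS} columns for a scatter matrix"
        )
    return list(columns)
```

For example, `benchmark(scatter_matrix_workload, [(1000, 5), (1000, 10), (2000, 5)])` returns one timing per shape, which makes it easy to compare the effect of doubling the columns against doubling the rows.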
3. Secure API - System security
- All backend APIs require user login to prevent unauthorized direct API calls
- For each API request to modify files, there is an implementation to verify if the file belongs to the user before the operation.
- If there is an unauthorized API request to the system, an appropriate error message will be displayed and the request will also be logged for investigation
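The ownership check and the logging of unauthorized requests can be illustrated with a framework-independent sketch. It assumes, purely for illustration, that each user's files live under a per-user directory; the source does not say how ownership is actually stored, and the function, exception and logger names are hypothetical.

```python
import logging
from pathlib import Path

logger = logging.getLogger("kddlabs.security")  # hypothetical logger name


class UnauthorizedFileAccess(Exception):
    """Raised when a user requests a file that is not theirs."""


def resolve_user_file(storage_root, username, requested_path):
    """Return the resolved path only if it lies inside the user's directory."""
    user_dir = (Path(storage_root) / username).resolve()
    # resolve() collapses ".." segments, so traversal attempts are caught.
    target = (user_dir / requested_path).resolve()
    if user_dir not in target.parents:
        # Record the attempt for later investigation, then refuse it.
        logger.warning(
            "unauthorized file request: user=%s path=%s", username, requested_path
        )
        raise UnauthorizedFileAccess(
            "You do not have permission to modify this file."
        )
    return target
```

An API view would call `resolve_user_file` before any modification and turn `UnauthorizedFileAccess` into an error response for the client.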
4. Google Analytics tracking implementation
Google Analytics helps our sponsor understand how students are interacting with our KDD Labs website: where they’re coming from, how often they visit, and which parts of the site are capturing their attention or failing to spark interest.
We also track users' behaviour and count the number of events triggered in our system. This allows us to closely monitor which functionalities are being used and helps us track abnormal behaviour. It is also a useful tool for the teaching team to understand students' usage behaviour. A total of 32 different activities have been tracked since the website was announced to the students on 7 Sep 2017. The graph below shows the top 10 activities on our website.
5. System logger
Our system continuously monitors and logs critical user actions and the problems users encounter. This helps us automatically track errors that occur while users work with the KDD Labs system, which we use as internal feedback. We then analyse these errors further and derive their root causes to improve our system where possible.
Furthermore, to make the log files easy to locate, we have created a logger that rotates the log file twice a day. The file name will be changed to the date and time when the file was last modified.
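A twice-daily rotation can be sketched with Python's standard `logging.handlers.TimedRotatingFileHandler`. Whether the team used this handler is an assumption, as are the function name, logger name, suffix pattern and message format.

```python
import logging
from logging.handlers import TimedRotatingFileHandler


def build_rotating_logger(log_path, name="kddlabs"):
    """Create a logger whose file is rotated every 12 hours."""
    # when="H" with interval=12 rolls the file over twice a day.
    handler = TimedRotatingFileHandler(
        log_path, when="H", interval=12, backupCount=0
    )
    # Rotated files get a date-and-time suffix so they are easy to locate.
    # backupCount=0 keeps every rotated file, so no cleanup pattern is needed.
    handler.suffix = "%Y-%m-%d_%H-%M"
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

With this handler the timestamp in the rotated file name marks the start of the interval the file covers, and the 12-hour interval is counted from when the handler is created rather than from fixed clock times.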
Intermediate Deliverables
Topic of Interest | Link |
---|---|
Project Management | Project Schedule |
Project Management | Minutes |
Project Management | Metrics |
Project Management | Risk Management |
Project Management | Change Management |
Project Overview | Project Overview |
Project Overview | Team's Motivation |
Project Overview | Project Scope |
Project Documentation | Diagrams |
Project Documentation | Technologies Implemented |
Project Documentation | Low & Mid Fidelity Prototypes |
Testing | Testing Documentation |
Deployment
Note: The application link provided is for our test server, which is only available within the SMU Network. From outside, use a VPN to access the SMU Network.
To view the application, visit
Test server: http://10.0.106.101/
Username: demo
Password: demopassword123
Note: This public server is currently being utilized for Live Usage by the IS424 Data Mining and Business Analytics students for their labs and project completion.
Production server: https://kddlabs.cn/
Testing
Internal Testing
We engage in comprehensive manual testing in every iteration. The developers conduct individual testing before committing their code to our shared repository on GitHub. We believe in testing the application manually at this level because tests can be specially adjusted to cater to changes in the application, both on the front and back end. Furthermore, manual testing brings in the human factor, allowing us to better discover problems that might surface during real usage due to natural human behavior.
Once the developers have fixed the bugs, the fixed code is merged and integrated with the other functionalities. The integrated code is then deployed on the test server, and the lead quality assurance runs a final check against the set of test cases created. This helps to ensure that the deployed application works with no major incidents.
The team's lead quality assurance then performs regression testing on the test server where previous functionalities developed are tested again. This helps to ensure that existing functionalities in the application are not affected by the integration. Once bugs have been identified, the lead quality assurance will then update the bug-tracking Excel sheet and notify the relevant developers of the issues and the corresponding priority level.
The team’s list of test cases can be found on our private repository here.
User Acceptance Test 1 & 2
Team Atoms has conducted 2 user tests which allowed us to better manage sponsor expectations as well as improve on usability of our application interface.
For a more detailed version of Team Atoms' user acceptance test results, see here:
Live Usage
Through our live usage and roll-out, IS424 students were able to complete their take-home lab assignments on our system. We were thus able to gather feedback about the KDD Labs system directly from our end users. In addition, we were able to compare the user experience with the existing alternative used in class (SAS EM). The feedback was positive: students stated that they were able to complete their in-class lab exercise, and they found the KDD Labs system easier to use than SAS EM.
Lab1
Release Date: 01 Sep 2017, Friday
Duration: 2-3 hours per user
Number of User(s): 45
Lab1 User Guide: Created instructions can be found here
FEEDBACK RESULTS FROM LIVE USERS
Lab1 Feedback Results: The rest of the results from the Live user feedback can be found here
Lab2
Release Date: 15 Sep 2017, Friday
Duration: 2-3 hours per user
Number of User(s): 45
Lab2 User Guide: Created instructions can be found here
Lab2 Feedback Results: Live user feedback can be found here
Reflection
Team Reflection
This journey has proven to be an enriching learning experience for Team Atoms. The project had many new learning points for the team as it was highly technical: we had to understand and grasp the concepts of the data mining process and algorithms within a short period of time. In addition, we learnt the importance of good stakeholder management, which allows us to react better to unforeseen circumstances. Through active team participation and communication, we were able to mitigate existing issues and deliver a quality project on time.
Sponsors' Testimonial
"Team ATOMS is a capable and sincere team, that has done very well in the course of the project. KDD Labs project is very challenging and in particular requires a very diverse set of technical skills. ATOMS have made substantial efforts in acquiring new skills and integrating them to deliver a quality product in a timely manner. They have stoically faced the technical issues and challenging feature/change requests, and demonstrated an excellent work ethic in delivering on their targets. Discussions with them have been thought provoking and rewarding, and have significantly contributed towards improving the product quality" - Sponsor, Doyen Sahoo
Individual Reflections