IS480 Team wiki: 2012T1 6-bit Project Management UT4

Latest revision as of 15:13, 5 December 2012

6-bit logo.png
6-bit's Chapalang! is a social utility that connects people with friends and new friends
by offering a place for exchanging ideas and information on its public domain.
http://www.chapalang.com



Schedule

Planned Schedule

6-bit ScheduleDiagramOverview.png
Meeting Minutes

Team Meeting Minutes

Meeting Minute 1    Meeting Minute 11    Meeting Minute 21
Meeting Minute 2    Meeting Minute 12    Meeting Minute 22
Meeting Minute 3    Meeting Minute 13    Meeting Minute 23
Meeting Minute 4    Meeting Minute 14    Meeting Minute 24
Meeting Minute 5    Meeting Minute 15    Meeting Minute 25
Meeting Minute 6    Meeting Minute 16    Meeting Minute 26
Meeting Minute 7    Meeting Minute 17    Meeting Minute 27
Meeting Minute 8    Meeting Minute 18
Meeting Minute 9    Meeting Minute 19
Meeting Minute 10   Meeting Minute 20

Supervisor Meeting Minutes

Meeting Minute 1
Meeting Minute 2
Meeting Minute 3
Meeting Minute 4
Meeting Minute 5
Meeting Minute 6
Meeting Minute 7
Meeting Minute 8
Meeting Minute 9
Meeting Minute 10
Meeting Minute 11

Testing

Test Cases

Test Cases

Test Plans

Test Plan 1 on 17 September 2012
Test Plan 2 on 28 September 2012
Test Plan 3 on 19 October 2012
Test Plan 4 on 4 November 2012

User Testing

User Testing 1 User Testing 2 User Testing 3 User Testing 4

User Testing 4

6bituser-testing4.png

Test Description

User Test 4 focuses on scalability, performance and analytics testing of the system. It is a 2-part test session: the first part, on scalability and performance, does not require physical testers; the second, on inter-rater reliability, requires rating judges.

The scalability and performance test focuses on the bottleneck functions, which are the discussion forums and the marketplace. The terms “performance” and “scalability” are commonly used interchangeably, but the two are distinct: performance measures the speed with which a single request can be executed, while scalability measures the ability of a request to maintain its performance under increasing load.

Additionally, an Inter-Rater Reliability Test is performed on the Personalized Dashboard to determine the concordance between the personalized results and the actual personalities of the user stereotypes.

Testers' Background

Scalability & Performance Testing

As the test does not require physical testers, the test environment is appended below.
6-bit ut4spec.png

Inter-Rater Reliability Test

Testers assume the role of raters, or judges, for our Inter-Rater Reliability Test: a total of 20 people with a 50:50 male-to-female ratio. Testers are stratified across diverse backgrounds, intended to represent the personality stereotypes we designed.

Personality stereotypes include characteristics such as gender, age group, education, personality traits, online activity, mobility and topics of interest.

Test Groups

There is no test grouping employed in this test.

Test Procedures

Scalability & Performance Testing

The first part of the test covers scalability and performance. The Chapalang benchmark is configured to record the time just before each controller method starts and as soon as it ends, giving the elapsed time of every request. The results are used to study the performance of the system and application at different scales of operation.

Additionally, a custom application is used to perform a series of activities on the forum and marketplace, simulating an arbitrary number of concurrent users on the system (the load).

Subsequently, we study the benchmark timing data to understand the performance differences under different loads.
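
As an illustration, the sketch below generates such a load with plain Python threads. The target URL and the user counts are hypothetical stand-ins, not the team's actual harness.

```python
import threading
import time
from urllib.request import urlopen

# Hypothetical endpoint standing in for the page under test.
URL = "http://www.chapalang.com/marketplace/product/1"

def timed_request(results):
    """Fetch the page once and record its elapsed wall-clock time."""
    start = time.time()
    try:
        urlopen(URL, timeout=30).read()
    except OSError:
        pass  # failed requests still contribute load; skip their timing
    else:
        results.append(time.time() - start)

def simulate(concurrent_users):
    """Fire `concurrent_users` simultaneous requests; return avg and max times."""
    results, threads = [], []
    for _ in range(concurrent_users):
        t = threading.Thread(target=timed_request, args=(results,))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    if not results:
        return float("nan"), float("nan")
    return sum(results) / len(results), max(results)

# Markers of 25-user intervals up to 300, mirroring the test description.
for n in range(25, 301, 25):
    avg, worst = simulate(n)
    print(f"{n:3d} users: avg {avg:.2f}s, max {worst:.2f}s")
```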

Inter-Rater Reliability Test

Inter-Rater Reliability (IRR) is the degree of agreement among raters. It gives a score of how much homogeneity, or consensus, there is in the ratings given by judges.

The first rater is the system itself, which generates a list of 10 product and 10 discussion topic recommendations in descending order of relevance to a target user. Every recommendation is tied to a specific and distinct order number.

The second rater is a human tester, who is provided with the same list of 10 products and 10 discussion topics generated by the system in relevance to him or herself. To mitigate the effects of experimenter’s bias, the items are presented in a random order without any intended logic. The second rater is expected to reorder the items according to his or her preferences, in descending order of relevance.

Subsequently, we make use of Spearman’s Rank Correlation Coefficient to understand the reliability of our personalized dashboard, which features product and topic recommendations.

Test Instruction

Inter-Rater Reliability Test

This is a sample output of the first rater, for a product recommendation test.
6-bit ut4a.png
This is a sample input sheet for the second rater, for the product recommendation test.
6-bit ut4b.png
Ranks are in descending order of relevance: 1 represents the most relevant item, while 10 represents the least relevant.

Test Results

Scalability & Performance Test

The terms “performance” and “scalability” are commonly used interchangeably, but the two are distinct: performance measures the speed with which a single request can be executed, while scalability measures the ability of a request to maintain its performance under increasing load.

  • (a) Identification of activities

ChapalangA.png

  • (b) Identification of bottleneck

Based on the table above, we identified that, if a user journey is sequential, the “Display single product” page has the longest elapsed time. It also has the longest elapsed time on its own when compared with all the other functions. Hence, we identified this activity as the bottleneck.

  • (c) Measurement of performance

Referring to the same table, the average of 5 test runs of the “Display single product” activity with 1 concurrent user gives an elapsed time of 1.76997 seconds. To allow for natural variation in the recorded timings, we adopt 2 seconds as the performance upper limit.
ChapalangC.png

  • (d) Scalability

We now attempt to find the maximum number of concurrent users the system is able to support, given the performance for a single concurrent user. Taking the performance upper limit to be 2 seconds, an estimated 16 concurrent users can be served with elapsed times of 2 seconds or lower.
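
Read off the benchmark table, that estimate amounts to the following check (the timing figures here are illustrative placeholders, not our measured values):

```python
# Average elapsed time (seconds) per concurrent-user marker -- illustrative
# placeholder numbers, not the measured benchmark data.
timings = {1: 1.77, 4: 1.81, 8: 1.88, 12: 1.94, 16: 1.99, 20: 2.31, 25: 2.84}

LIMIT = 2.0  # adopted performance upper limit, in seconds

# Largest tested user count whose average elapsed time stays within the limit.
max_supported = max(n for n, t in timings.items() if t <= LIMIT)
print(max_supported)  # -> 16
```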

  • (e) Results of up to 300 concurrent users

We then performed further tests at markers of up to 300 concurrent users, in 25-user intervals, intentionally simulating load on the server to observe any deterioration in performance.

Firstly, we set an acceptable-performance threshold of 3 seconds of elapsed time for an optimal user experience. Given this benchmark, we observe that the system is capable of supporting up to 25 concurrent users.

Secondly, we observed deterioration in the trend line up to approximately 75 users, before the elapsed time at 100 concurrent users actually improved. Performance continued to improve until approximately 200 concurrent users, then began to deteriorate again. This was an unexpected and unusual observation for the team. After further consultation with technical experts, there are a variety of reasons that may explain this anomaly: it could be simple outliers in our data, the use of swap memory on the Linux system (equivalent to paging on Windows), or the Apache web server switching between threaded and non-threaded operation.

Thirdly, the test crashed after approximately 260 concurrent connections; MySQL was still running, but Apache required a restart. Investigating the log files, we found that MaxClients in our Apache configuration is 256, so it is natural for the server to fail at that point. In addition, we found many forked and unreleased processes on the Linux server, which could have been improved further with system-level adjustments.





While the test shows that our system is capable of supporting 25 concurrent users within our acceptable performance, there are several circumstantial variables that may limit the validity of this result. Network performance is one key variable that can directly affect a user’s actual experience. Other limitations of the test include:

  • A standard MySQL installation is set up to handle 100 concurrent connections; for the purpose of the test, we lifted this limit.
  • Actual operation of the server consists of multiple concurrent connections to different pages, making different SQL calls. The results above are limited because the test exercises a single bottleneck controller, which locks a table and executes before releasing it for the next process. Elapsed times therefore accumulate much faster than in real situations, so the results above may be pessimistic.
  • The server’s performance is also limited by its standard non-threaded Apache setup, which serves up to 256 concurrent users using the prefork MPM.

Even though the test result is not conclusive, it is sufficient for us to reasonably assume that the server and application are capable of scaling up to 50 concurrent users while maintaining the same performance, and of handling a load of approximately 25 concurrent users before suffering sub-optimal performance deterioration.

Inter-Rater Reliability Test

To evaluate the test results, we rely on the statistical model called Spearman’s Rank Correlation Coefficient (SRCC). The model is appended below:
6bituser-testing4formula.png
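For reference, the standard form of the coefficient for distinct, untied ranks is:

```latex
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}, \qquad d_i = x_i - y_i
```

where x_i and y_i are the ranks the two raters assign to item i, and n is the number of items ranked.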
In short, the SRCC model takes in the rank ratings from 2 different raters, represented by xi and yi respectively. The conventional coefficient of determination, by contrast, takes in absolute data to find a statistical, data-driven correlation between 2 inputs.

However, we are interested in the consensus of human judgment on the data, so SRCC is a suitable model of analysis. The SRCC model assumes that the rating scale is ordinal, i.e. a serial scale of rating. This assumption is aligned with our 1–10 rating score, which is incremental and serial. Additionally, the SRCC model considers only the relative positions of the ratings: for example, (1, 2, 1, 3) is considered perfectly correlated with (2, 3, 2, 4). This is acceptable in our test because each rating is distinct and exhaustive, with no repeated or unused scores allowed.

The following is a sample of data tabulation in visual form.
6bituser-testing4z.png
With each squared rank difference d_i² found, we sum them to obtain Σd_i² = 28. The value of n is 10. Substituting these values back into the equation, we evaluate ρ = 0.83.
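
A quick check of this arithmetic in plain Python; the two rank lists are hypothetical, chosen only so that Σd_i² = 28 with n = 10, as in the worked example:

```python
def spearman_rho(x, y):
    """Spearman's rank correlation coefficient for distinct ranks (no ties)."""
    n = len(x)
    sum_d2 = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Hypothetical rankings from the system and one tester (sum of d_i^2 is 28).
system_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
tester_rank = [4, 2, 3, 1, 7, 6, 5, 8, 10, 9]

print(round(spearman_rho(system_rank, tester_rank), 2))  # -> 0.83
```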
The correlation coefficient ρ ranges from -1 to 1. Here are some characteristics of the ρ value:

  • A negative value suggests a negative relationship,
  • A positive value suggests a positive relationship,
  • A magnitude closer to 1 or -1 suggests a stronger relationship, and
  • A ρ value of 0 suggests an absolute lack of relationship between the input attributes.

It should be noted that a strongly negative correlation may suggest that reversing the order of our system’s recommendations would eventually yield an ideal model.
The accuracy of such a recommendation system is, in principle, how well it agrees with human judgment; the applied accuracy in our test is meant to ensure an optimal user experience and optimal sales exposure.
However, a limitation of the model is that human raters typically agree only about 70% of the time, so even if a system were 100% accurate by assumption, humans would still disagree with it about 30% of the time. Hence, for the purpose of our study, an application with 100% human agreement can only be statistically justified to be 70% accurate, and more sophisticated methods should be used to endorse the remaining 30% of accuracy.

After conducting the IRR Test, the results are tabulated and appended as follows.
6bituser-testing4results.png
Tester IDs beginning with M denote male testers, while those beginning with F denote female testers. This categorization of results helps us understand a basic level of stereotyping accuracy by gender. If needed, the results can be further drilled down to represent more specific personality stereotypes.

Summarising the test results tabulated above, the average median ρ-value for male testers is 0.8695 and for female testers is 0.8675. Based on the median values, there is no significant difference in accuracy between males and females. In addition, it is satisfactorily justified that there is a strong positive correlation between the recommendations of our system and the preferences of the testers, a sample of our potential users, based on their consensus.



Milestones

6-bit schedule.png

Schedule Metric

Every iteration, schedule metric values are calculated to understand the project's progress. They are broadly categorized into 5 groups, to which different action plans apply. The acceptable range of values is 90% to 110%, offering some buffer for natural inaccuracies between forecasting and execution.

Total Schedule Metric Value = Planned no. of days taken (P) / Actual no. of days assigned (A) × 100%
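
A minimal sketch of this calculation, assuming only the 90–110% acceptable band from the text (the actual 5 action-plan groups are defined in the table image below):

```python
def schedule_metric(planned_days, actual_days):
    """Total schedule metric value = P / A * 100%."""
    return planned_days / actual_days * 100

def within_acceptable_range(value):
    """90-110% is the acceptable band described above."""
    return 90 <= value <= 110

# Illustrative figures: 18 planned days against 20 actually assigned.
value = schedule_metric(planned_days=18, actual_days=20)
print(f"{value:.0f}% acceptable={within_acceptable_range(value)}")  # 90% acceptable=True
```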

6-bit schedulemetric.png

Bug Metric

Log

6-bit BugMetric.png 6-bit BugLog.png

Bug Log: https://docs.google.com/spreadsheet/ccc?key=0Aqd6IiSLbMwQdEI5TldNSEhpcVRjb1puYzU3ZHJOckE


Bug logging for Chapalang! is designed to be practical and easily monitored from both macro and micro perspectives. Whenever a bug is found, a new row is entered with the following data (a minimal sketch of one such record follows the list):

  • Index number
  • Bug description
  • Found by
  • Found date
  • Expected solve-by date
  • Bug severity
  • Status
  • Owner of the function
  • Fixed date
  • Closed by (Tester)
  • Close date
  • Additional comments
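
A minimal sketch of one log row as a record; the field names mirror the list above, and the types are assumptions, since the sheet's exact formats are not specified:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BugLogRow:
    """One row of the Chapalang! bug log (field names mirror the list above)."""
    index: int
    description: str
    found_by: str
    found_date: str                  # kept as text; the sheet's date format is unspecified
    expected_solve_by: str
    severity: str
    status: str                      # e.g. "open" / "fixed" / "closed" (assumed values)
    function_owner: str
    fixed_date: Optional[str] = None
    closed_by: Optional[str] = None  # the tester who verified the fix
    close_date: Optional[str] = None
    comments: str = ""
```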

Metric

Bugs are classified into 3 categories of complexity: easy, moderate and hard. The categories are assigned 1, 5 and 10 points respectively; lower totals are better.

Total Points for Each Iteration = Σ Points of the Bugs in each iteration
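
In code, this tally is a weighted sum, using the complexity weights defined above (the sample bugs are illustrative):

```python
# Points per complexity category, as defined above (lower totals are better).
POINTS = {"easy": 1, "moderate": 5, "hard": 10}

def iteration_points(bug_complexities):
    """Total points for an iteration = sum of the points of its bugs."""
    return sum(POINTS[c] for c in bug_complexities)

# Illustrative iteration: two easy bugs, one moderate, one hard.
print(iteration_points(["easy", "easy", "moderate", "hard"]))  # -> 17
```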

6-bit BugMetricFormula.png


After assigning each bug points according to its complexity, we track the total bug score at the end of each week before deciding whether any action should be taken. The following is the action plan for our bug metric:

6-bit BugMetricFormula2.png

Risk & Mitigation


6-bit RiskDiagram.png