Difference between revisions of "IS480 Team wiki: 2012T1 6-bit Project Management UT4"
Revision as of 11:45, 4 December 2012
Chapalang! offers a place for exchanging ideas and information on its public domain: http://www.chapalang.com
Schedule
Planned Schedule
Meeting Minutes
Team Meeting Minutes
Supervisor Meeting Minutes
Meeting Minute 1
Meeting Minute 2
Meeting Minute 3
Meeting Minute 4
Meeting Minute 5
Meeting Minute 6
Meeting Minute 7
Meeting Minute 8
Meeting Minute 9
Meeting Minute 10
Meeting Minute 11
Testing
Test Cases
Test Plans
Test Plan 1 on 17 September 2012
Test Plan 2 on 28 September 2012
Test Plan 3 on 19 October 2012
Test Plan 4 on 4 November 2012
User Testing
User Testing 1 | User Testing 2 | User Testing 3 | User Testing 4
User Testing 4
Test Description
User Test 4 focuses on scalability, performance and analytics testing of the system. It is a two-part session: the first part covers scalability and performance and does not require physical testers; the second covers inter-rater reliability and requires rating judges.
The scalability and performance test concentrates on the bottleneck functions: the discussion forums and the marketplace. The terms “performance” and “scalability” are commonly used interchangeably, but the two are distinct: performance measures the speed with which a single request can be executed, while scalability measures the ability of a request to maintain its performance under increasing load.
Additionally, an Inter-Rater Reliability Test is performed on the Personalized Dashboard, to determine how well the personalized results agree with the actual personalities of the user stereotypes.
Testers Background
Scalability & Performance Testing
As this test does not require physical testers, the test environment is described below instead.
Inter-Rater Reliability Test
Testers assume the role of raters, or judges, for our Inter-Rater Reliability Test. A total of 20 people participate, with a 50:50 male-to-female ratio. Testers are drawn from diverse backgrounds, stratified to represent the personality stereotypes we designed.
Personality stereotypes cover characteristics such as gender, age group, education, personality traits, online activity, mobility and topics of interest.
Test Groups
There is no test grouping employed in this test.
Test Procedures
Scalability & Performance Testing
The first part of the test covers scalability and performance. Chapalang Benchmark is configured to record the time taken by each controller method, from the moment it starts until it ends. The results are used to study the performance of the system and application at different scales of operation.
Additionally, a custom application performs a series of activities on a forum and the marketplace, simulating an arbitrary number of concurrent users on the system (the load).
We then study the benchmark timing data to understand how performance differs under different loads.
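The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not Chapalang Benchmark's actual code: the `forum_request` function, its simulated workload and the load levels are all hypothetical stand-ins.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def forum_request():
    """Hypothetical stand-in for one forum/marketplace controller call."""
    start = time.perf_counter()
    sum(i * i for i in range(10_000))   # simulated controller work
    return time.perf_counter() - start  # per-request latency in seconds

def benchmark(load):
    """Fire `load` concurrent requests; return (average, worst) latency."""
    with ThreadPoolExecutor(max_workers=load) as pool:
        timings = list(pool.map(lambda _: forum_request(), range(load)))
    return sum(timings) / load, max(timings)

for load in (1, 50, 100):
    avg, worst = benchmark(load)
    print(f"load={load:3d}  avg={avg * 1000:.2f} ms  max={worst * 1000:.2f} ms")
```

A scalable system keeps the per-request latency roughly flat as the load grows; a widening gap between the single-user row and the high-load rows points at a bottleneck.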
Inter-Rater Reliability Test
Inter-Rater Reliability (IRR) is the degree of agreement among raters: a score of how much homogeneity, or consensus, there is in the ratings given by the judges.
The first rater is the system itself, which generates a list of 10 product and 10 discussion-topic recommendations in descending order of relevance to a target user. Every recommendation is tied to a specific and distinct order number.
The second rater is a human tester, who is given the same list of 10 products and 10 discussion topics that the system generated for him or her. To mitigate experimenter's bias, the list is presented in randomized order, carrying no trace of the system's ranking. The second rater then reorders the items according to his or her own preferences, in descending order of relevance.
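The randomization step can be sketched as follows. This is a minimal illustration: the function name and the seed argument are our own, not part of the test tooling.

```python
import random

def rater_sheet(ranked_items, seed=None):
    """Return the system's ranked list in shuffled order, so the sheet
    shown to the human rater carries no hint of the system's ranking."""
    sheet = list(ranked_items)
    random.Random(seed).shuffle(sheet)
    return sheet

products = [f"Product {i}" for i in range(1, 11)]  # system's ranked output
print(rater_sheet(products, seed=42))
```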
Subsequently, we use Spearman's Rank Correlation Coefficient to assess the reliability of our personalized dashboard, which features product and topic recommendations.
Test Instruction
Inter-Rater Reliability Test
This is a sample output of the first rater, for a product recommendation test.
This is a sample input sheet for the second rater, on product recommendation test.
In this descending order, 1 represents the most relevant item and 10 the least relevant.
Test Results
Scalability & Performance Test
- (a) Identification of activities
- (b) Identification of bottlenecks
- (c) Measurement of results for a single user
- (d) Maximum number of concurrent users, while maintaining single-user performance
- (e) Results for 1, 50, 100, 200 and 400 users
Inter-Rater Reliability Test
In order to evaluate the test results, we rely on the statistical model called Spearman's Rank Correlation Coefficient (SRCC): ρ = 1 − (6 Σ dᵢ²) / (n(n² − 1)), where dᵢ = xᵢ − yᵢ is the difference between the two raters' ranks for item i and n is the number of items.
In short, the SRCC model takes the rank ratings from two different raters, represented by xᵢ and yᵢ respectively. The conventional coefficient of determination, by contrast, takes absolute data to find a statistical, data-driven correlation between two inputs.
We, however, are interested in the consensus between human judgments on the data, so SRCC is the more suitable model of analysis.
The SRCC model assumes that the rating scale is ordinal, i.e. a serial scale of rating. This assumption is aligned with our 1–10 rating score, which is incremental and serial. Additionally, the SRCC model considers only the relative positions of the ratings: for example, (1, 2, 1, 3) is considered perfectly correlated with (2, 3, 2, 4). This is acceptable in our test because the ratings are distinct and exhaustive, with no repeated or unused score allowed.
The following is a sample of data tabulation in visual form.
With each dᵢ² found, we sum them to obtain Σ dᵢ² = 28. The value of n is 10. Substituting these values back into the equation, we evaluate ρ = 1 − (6 × 28) / (10 × (10² − 1)) ≈ 0.83.
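The worked example above can be checked with a few lines of Python. The two rank lists below are illustrative permutations chosen so that Σ dᵢ² = 28, as in our sample; they are not the actual tester data.

```python
def spearman_rho(x_ranks, y_ranks):
    """SRCC for two tie-free rank lists: rho = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x_ranks)
    d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * d2 / (n * (n * n - 1))

system_ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
human_ranks  = [4, 2, 3, 1, 7, 6, 5, 9, 8, 10]  # sum of d_i^2 = 28
print(round(spearman_rho(system_ranks, human_ranks), 2))  # → 0.83
```

Identical rankings give ρ = 1, fully reversed rankings give ρ = −1, matching the characteristics listed below the formula.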
The correlation coefficient ρ ranges between −1 and 1. Some characteristics of the ρ value:
- A negative value suggests a negative relationship,
- a positive value suggests a positive relationship,
- a value closer to 1 or −1 suggests a stronger relationship,
- and a ρ value of 0 suggests an absolute lack of relationship between the input attributes.
It should be noted that a strongly negative correlation would suggest that reversing the order of our system's recommendations might eventually yield the ideal model.
The accuracy of such a recommendation system is, in principle, how well it agrees with human judgment, although the applied purpose of accuracy in our test is to ensure an optimal user experience and optimal sales exposure.
However, a limitation of the model is that human raters typically agree only about 70% of the time: even if a system were 100% accurate by assumption, humans would still disagree with it about 30% of the time. Hence, for the purposes of our study, an application with 100% human agreement can only be statistically justified as 70% accurate, and more sophisticated methods would be needed to endorse the remaining 30% of accuracy.
After conducting the IRR Test, the results are tabulated and appended as follows.
Tester IDs beginning with M denote male testers, while those beginning with F denote female testers. This categorization of results helps us understand a basic level of stereotyping accuracy by gender. If needed, the results can be further drilled down to represent more specific personality stereotypes.
Summarising the test results tabulated above, the median ρ-value for male testers is 0.8695 and for female testers 0.8675. Based on these medians, there is no significant difference in accuracy between males and females. In addition, we are satisfied that there is a strong positive correlation between the recommendations of our system and the preferences of the testers, a sample of our potential users, based on their consensus.
Milestones
Schedule Metric
Every iteration, schedule metric values are calculated to gauge project progress. They fall into 5 different groups, each with its own action plan. The acceptable range is 90% to 110%, offering some buffer for natural inaccuracies between forecasting and execution.
Total Schedule Metric Value = Planned no. of days assigned (P) / Actual no. of days taken (A) × 100%
Bug Metric
Log
Bug Log: Click Here
Bug logging for Chapalang! is designed to be practical and easily monitored from both macro and micro perspectives. Whenever a bug is found, a new row is entered with the following data:
- Index number
- Bug description
- Found by
- Found date
- Expected solve-by date
- Bug severity
- Status
- Owner of the function
- Fixed date
- Closed by (Tester)
- Close date
- Additional comments
Metric
Bugs are classified into 3 categories of complexity: easy, moderate and hard, assigned 1, 5 and 10 points respectively (lower totals are better).
Total Points for Each Iteration = Σ Points of the Bugs in each iteration
After assigning each bug points according to its complexity, we track the total bug score at the end of each week before deciding whether any action should be taken. The following is the action plan for our bug metric:
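The weekly tally can be sketched as follows. The category names and point values come from the text above; the sample bug list is illustrative.

```python
BUG_POINTS = {"easy": 1, "moderate": 5, "hard": 10}

def iteration_bug_score(bug_categories):
    """Total points for an iteration = sum of each bug's complexity points."""
    return sum(BUG_POINTS[c] for c in bug_categories)

week_bugs = ["easy", "hard", "moderate", "easy"]  # illustrative log entries
print(iteration_bug_score(week_bugs))  # → 17
```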