Difference between revisions of "ANLY482 AY2017-18 T2 Group 17 Findings Finals"
Revision as of 15:37, 14 April 2018
Wilcoxon test
In an attempt to analyse user behaviour patterns, the time spent on each chapter is calculated from the proxy data. Provided that the 'SessionID' is the same for the rows being compared, the proxy for time spent on each chapter (t) is calculated using the following equation:
datetime(t-1) - datetime(t)
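The calculation above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the row layout, the sample values, and the function name `time_spent_per_chapter` are hypothetical. The source does not state how rows are ordered, so the sketch assumes they are sorted chronologically within each session and takes the gap to the next request as the time spent on the current chapter. The last request of a session has no later row and therefore gets no estimate.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical proxy-log rows: (SessionID, chapter, timestamp).
rows = [
    ("s1", "ch1", "2018-01-05 10:00:00"),
    ("s1", "ch2", "2018-01-05 10:04:30"),
    ("s1", "ch3", "2018-01-05 10:05:00"),
    ("s2", "ch1", "2018-01-05 11:00:00"),
    ("s2", "ch4", "2018-01-05 11:10:00"),
]

def time_spent_per_chapter(rows):
    """Return (SessionID, chapter, seconds) for every row that has a later row in the same session."""
    parsed = [
        (sid, chapter, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))
        for sid, chapter, ts in rows
    ]
    # Assumption: chronological order within each session.
    parsed.sort(key=lambda r: (r[0], r[2]))
    result = []
    for sid, group in groupby(parsed, key=lambda r: r[0]):
        group = list(group)
        # Pair each request with the next request of the same session.
        for current, following in zip(group, group[1:]):
            seconds = (following[2] - current[2]).total_seconds()
            result.append((sid, current[1], seconds))
    return result

spent = time_spent_per_chapter(rows)
for sid, chapter, seconds in spent:
    print(sid, chapter, seconds)
```

Differences are only taken within a session, which mirrors the 'SessionID' condition stated above.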
As time spent per chapter is a calculated field, its distribution is not known in advance. A parametric comparison of means between strata would therefore not be appropriate, because it would require assumptions about the distribution, for instance that the data follow a normal distribution. A non-parametric test is performed on the data instead.
Since the data is highly skewed towards the left-hand side, a Wilcoxon test is used to analyse whether there is a significant difference in time spent between the strata of interest. The Wilcoxon test works on the ranks of the observations, and the comparison is interpreted here through the median of each group. Using the median as a benchmark helps minimize the bias resulting from the skewed population. In the analysis, the groups of interest are as follows:
1. Analysis by distinct user utilization of books
2. Analysis by chapter view and chapter downloads
3. Analysis by different user groups
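A pairwise comparison of this kind can be sketched as below. This is an illustrative, hand-rolled two-sided Wilcoxon rank-sum test using the normal approximation with a tie correction and no continuity correction; it is not the tool the analysis was run with, and p-values from a statistics package may differ slightly. The function name `rank_sum_test`, the group labels, and the sample values are all hypothetical.

```python
from itertools import combinations
from statistics import NormalDist

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (U statistic of x, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all pooled values identical: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / var_u**0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p

# Hypothetical time-spent samples (seconds) for three strata.
groups = {
    "low": [12, 15, 20, 22, 30, 35],
    "mid": [25, 40, 41, 55, 60, 80],
    "high": [90, 110, 120, 150, 200, 240],
}

# Run the test on every pair of strata.
results = {
    (a, b): rank_sum_test(groups[a], groups[b])
    for a, b in combinations(groups, 2)
}
for pair, (u, p) in results.items():
    print(pair, u, round(p, 4))
```

With several pairs tested at once, a multiple-comparison adjustment of the p-values may be worth considering; the sketch reports unadjusted values.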
Analysis by distinct user utilization of books
[Figure: distinctuser1.png]
The boxplot diagram shows the distribution of time spent across the different distinct user utilization groups. From the boxplot and table above, it is observed that the median time spent on each chapter differs across groups. For books that are used by more distinct users (38), the median time spent appears to be much higher than for books used by 1 distinct user. To test whether these differences are statistically significant, we performed a Wilcoxon test on each pair of groups and noted the following results: