Difference between revisions of "REO Project Findings EDA"

Latest revision as of 17:18, 16 April 2018

Distribution

Users

Looking at the distribution of Member_Days (Figure 31), the median is 100 days higher than the mean. The distribution is left-skewed, with a large cluster of 3 to 4 year members (850 to 1100 Member_Days). These could be members who were recruited extensively during 99.co's first year after its launch in 2014.
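The link between a median above the mean and a left skew can be illustrated with a small sketch. The Member_Days values below are hypothetical and chosen only to mimic the shape described; they are not taken from the Users file.

```python
from statistics import mean, median

# Hypothetical Member_Days values: most members cluster at long tenures,
# with a few recent joiners forming a tail on the left.
member_days = [60, 200, 450, 880, 900, 950, 1000, 1050, 1100]

m = mean(member_days)
med = median(member_days)
print(f"mean={m:.1f}, median={med}")

# A median above the mean points to a left (negative) skew.
print("left-skewed" if med > m else "not left-skewed")
```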

Subscription

Of the 20762 users, 15323 (73.8%) are on the free subscription plan and 5439 (26.2%) are on a paid subscription plan (Figure 32).
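As a quick check of the shares quoted above, the percentages can be recomputed from the counts. The variable names are illustrative only.

```python
# Counts as reported in the text.
total_users = 20762
free_users = 15323
paid_users = 5439

# The two plans should account for every user.
assert free_users + paid_users == total_users

free_share = round(100 * free_users / total_users, 1)
paid_share = round(100 * paid_users / total_users, 1)
print(free_share, paid_share)
```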

Sessions

Tabulating the data in the “user_id” view gives the distribution of the number of sessions per user, with 118 outliers excluded. The distribution is highly right-skewed, with the mean almost four times the median. The N tabulated here is only about 62% of the total users in the Users file (Figure 33).

A property agent's work is not necessarily regular and may not follow the usual office hours (9 am to 5 pm), so the system log was viewed at the hourly level to identify trends in the users' usage. Sessions were tabulated by users_id with hours as columns, and the columns were then stacked for analysis by hour. 1423 outliers were excluded from the analysis.
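The tabulate-then-stack step can be sketched as follows. This is a minimal illustration with a hypothetical session log; the original analysis appears to have been done in a statistics package, and the names `session_log`, `wide` and `stacked` are assumptions made here for the example.

```python
from collections import Counter

# Hypothetical session log rows: (users_id, hour of day 0-23).
session_log = [(101, 9), (101, 9), (101, 22), (102, 14), (103, 9), (103, 14)]

counts = Counter(session_log)  # (users_id, hour) -> number of sessions
users = sorted({uid for uid, _ in session_log})
hours = sorted({hour for _, hour in session_log})

# Wide table: one row per user, one column per hour (0 where no session).
wide = {uid: [counts[(uid, h)] for h in hours] for uid in users}

# Stacked (long) table: one row per user-hour pair, ready for comparing hours.
stacked = [(uid, h, counts[(uid, h)]) for uid in users for h in hours]

print(wide)
print(stacked[:3])
```

The stacked form keeps the zero counts, so each hour is compared over the same set of users.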

The distribution for each hour is highly right-skewed, with the mean about twice the median. The Kruskal-Wallis test (Figure 34) returns a p-value <0.0001*, showing that the session count per user in at least one hour of the day differs significantly from that in another hour.
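For readers unfamiliar with the Kruskal-Wallis test, the sketch below computes the H statistic from pooled ranks, with the usual tie correction. The three groups are hypothetical counts, not the project data. With exactly three groups there are 2 degrees of freedom, and the chi-square upper tail is then exp(-H/2); that shortcut is used here only to keep the example free of external libraries.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    return h / correction

# Hypothetical session counts per user for three different hours.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(groups)
p_value = math.exp(-h_stat / 2)  # valid only for 3 groups (2 degrees of freedom)
print(round(h_stat, 3), round(p_value, 4))
```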

The Steel-Dwass method is used instead of Wilcoxon Each Pair to control the overall alpha level [12]. Nonparametric Comparisons for All Pairs using the Steel-Dwass Method (Figure 35) returns a p-value for each pair of hours, comparing the session counts per user between the two hours. Most hour pairs were found to be significantly different, with p-value <0.01.
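The need to control the overall alpha level can be seen from the number of pairwise comparisons involved. The sketch below is not the Steel-Dwass procedure itself. It only estimates how quickly the chance of at least one false positive grows if every pair is tested at alpha = 0.05 without adjustment, under the simplifying assumption that the tests are independent. The level counts (24 hours, 6 months, 7 days) follow the breakdowns described in the text.

```python
from math import comb

def familywise_error(levels, alpha=0.05):
    """Number of pairs and the unadjusted familywise error rate,
    assuming independent tests (a simplification)."""
    pairs = comb(levels, 2)
    return pairs, 1 - (1 - alpha) ** pairs

for label, levels in [("hours", 24), ("months", 6), ("days of week", 7)]:
    pairs, fwer = familywise_error(levels)
    print(f"{label}: {pairs} pairs, familywise error about {fwer:.2f}")
```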

As the data spans 6 months, seasonal effects or trends could affect the agents' usage, so the system log was also viewed by month to capture any such trends. Sessions were tabulated by users_id with months as columns, and the columns were then stacked for analysis by month. 712 outliers were excluded from the analysis.

The distribution for each month is highly right-skewed, with the mean more than 4 times the median. The Kruskal-Wallis test (Figure 36) returns a p-value <0.0001*, showing that the session count per user in at least one month differs significantly from that in another month.

Nonparametric Comparisons for All Pairs using the Steel-Dwass Method returns a p-value for each pair of months, comparing the session counts per user between the two months. With the exception of the December-September and December-October pairs, all pairs were found to be significantly different.

As a property agent's work may not follow the usual 5-day work week, the system log was also viewed by day of the week to identify trends in the users' usage. Sessions were tabulated by users_id with day_of_week as columns, and the columns were then stacked for analysis by day_of_week. 661 outliers were excluded from the analysis.

The distribution for each day of the week is highly right-skewed, with the mean more than 3 times the median. The Kruskal-Wallis test (Figure 38) returns a p-value <0.0001*, showing that the session count per user on at least one day of the week differs significantly from that on another day.

Nonparametric Comparisons for All Pairs using the Steel-Dwass Method (Figure 39) returns a p-value for each pair of days of the week, comparing the session counts per user between the two days. Most pairs were found to be significantly different, with p-value <0.01.

Listings

Tabulating the data in the “user_id” view gives the distribution of the number of listings per user, with 177 outliers excluded. The distribution is right-skewed, with the mean greater than the median. The N tabulated here is only about 83% of the users found in the Sessions file.

As creating listings is one of the key activities an agent can perform to gain more enquiries, this activity is broken down by hour, month, and day of the week in the same way as Sessions. Listings were tabulated by users_id with hours as columns, and the columns were then stacked for analysis by hour. 660 outliers were excluded from the analysis.

The distribution for each hour is right-skewed, with the mean greater than the median. Except for hours 16 to 19, the median for every hour is 0, which implies a low count of listings during most hours. The Kruskal-Wallis test returns a p-value <0.0001*, showing that the listing count per user in at least one hour differs significantly from that in another hour.

Nonparametric Comparisons for All Pairs using the Steel-Dwass Method returns a p-value for each pair of hours, comparing the listing counts per user between the two hours. Most pairs were found to be significantly different, with p-value <0.01.

Listings were tabulated by users_id with months as columns, and the columns were then stacked for analysis by month. 424 outliers were excluded from the analysis.

The distribution for each month is right-skewed, with the mean greater than the median. The Kruskal-Wallis test returns a p-value <0.0001*, showing that the listing count per user in at least one month differs significantly from that in another month.

Nonparametric Comparisons for All Pairs using the Steel-Dwass Method (see below) returns a p-value for each pair of months, comparing the listing counts per user between the two months. All pairs were found to be significantly different, with p-value <0.01.

Listings were tabulated by users_id with day_of_week as columns, and the columns were then stacked for analysis by day_of_week. 684 outliers were excluded from the analysis.

The distribution for each day of the week is right-skewed, with the mean greater than the median. The Kruskal-Wallis test returns a p-value <0.0001*, showing that the listing count per user on at least one day of the week differs significantly from that on another day.

Nonparametric Comparisons for All Pairs using the Steel-Dwass Method (see below) returns a p-value for each pair of days of the week, comparing the listing counts per user between the two days. Most pairs were found to be significantly different, with p-value <0.01.

Listings have 2 source types, as stated in the given data file. Agents can post either an organic listing, which is original, or a synced listing, which is imported from an external portal. Listings were tabulated by users_id with source_type as columns, and the columns were then stacked for analysis by source type. 367 outliers were excluded from the analysis.

The distribution for each source_type is highly right-skewed, with the mean greater than the median. The Wilcoxon 2-sample test returns a p-value <0.0001*, showing that the listing counts per user for the two source types are significantly different. Users have significantly more synced listings than organic listings.
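The two-sample Wilcoxon (rank-sum) comparison can be sketched as below. This is a simplified version using the normal approximation, with no tie or continuity correction in the variance, so it will not reproduce a statistics package's output exactly. The listing counts are hypothetical.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.
    Simplified: no tie or continuity correction is applied."""
    pooled = sorted(x + y)
    rank_of = {}  # value -> average 1-based rank in the pooled sample
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    w = sum(rank_of[v] for v in x)  # rank sum of the first sample
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w, z, p

# Hypothetical per-user listing counts by source type.
organic = [0, 1, 1, 2, 3]
synced = [4, 5, 6, 8, 9]
w, z, p = rank_sum_test(organic, synced)
print(w, round(z, 3), p < 0.05)
```

A negative z here means the first sample (organic) tends to rank lower than the second (synced).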

Cobroke

Tabulating the data in the “user_id” view gives the distribution of the number of cobroke requests sent by each user, with 67 outliers excluded. The distribution is right-skewed, with the mean greater than the median. The N tabulated here is only about 16% of the users found in the Sessions file.

As the user count for Cobroke is only a small percentage of the user count for Sessions, breaking the data down by hour, month, or day of the week would not be as meaningful. Furthermore, 99.co gave feedback that the cobroke request function has not always been used as intended. Agents may spam the function as a means of expanding their networks rather than offering potential leads to other agents.

Enquiries

Tabulating the data in the “user_id” view gives the distribution of the number of enquiries received by each user, with 101 outliers excluded. The distribution is right-skewed, with the mean greater than the median. The N tabulated here is only about 78% of the users found in the Sessions file.

Enquiries received can come either from other agents or from consumers, as stated in the given data file column “enquires_type”. Enquiries were tabulated by users_id with enquiries_type as columns, and the columns were then stacked for analysis by enquiry type. 170 outliers were excluded from the analysis.

The distribution for each enquiry type is right-skewed, with the mean greater than the median. The Wilcoxon 2-sample test returns a p-value <0.0001*, showing that the numbers of enquiries users receive from agents and from consumers are significantly different. Users receive significantly more enquiries from consumers than from agents.

Comparison by Subscription Plan

Under 99.co's subscription model, agents can choose between 2 subscription plans, free or paid. A free plan user is entitled to a maximum of 5 listings. A paid plan user gets 100 listings and has their listings featured on the portal. Given the differences in both payment and benefits, it is important to identify any differences in the data between paid and free users.

Sessions by plan

The Wilcoxon 2-sample test returns a p-value <0.0001*, showing that session counts differ significantly between the subscription plans. Paid users have significantly more sessions recorded than free users.

Listings types by plan

The Wilcoxon 2-sample test returns a p-value <0.0001*, showing that organic listing counts differ significantly between the subscription plans. Paid users have significantly more organic listings than free users.

The Wilcoxon 2-sample test returns a p-value of 0.69, showing that synced listing counts do not differ significantly between the subscription plans.

Cobroke by plan

The Wilcoxon 2-sample test returns a p-value <0.0001*, showing that cobroke request counts differ significantly between the subscription plans. Paid users send significantly more cobroke requests than free users.

Enquiries types by plan

The Wilcoxon 2-sample test returns a p-value <0.0001*, showing that the counts of enquiries from agents differ significantly between the subscription plans. Paid users receive significantly more enquiries from agents than free users.

The Wilcoxon 2-sample test returns a p-value <0.0001*, showing that the counts of enquiries from consumers differ significantly between the subscription plans. Paid users receive significantly more enquiries from consumers than free users.

The tests above show that paid users generally perform more activities and receive more enquiries than free users. As paid users incur a cost to access the platform, they were assumed to be more committed to using it than free users, and the results support this assumption.

Recency Analysis (December)

As the data spans 6 months, all of the analysis above used aggregated data. Breaking the data down by month could give a deeper understanding of the usage rate per user per month. December's data is used, as it is the most recent data collected. Each data file was filtered to the month labelled 12 and tabulated before the files were merged. Missing values indicate that the user did not perform that activity, so they were replaced with ‘0’. Users with 0 sessions in December were filtered out, so the distributions that follow show the usage rate of users with at least 1 session in December. After filtering out the 0-session users and the outliers, a total of 9570 users remain.
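The merge, zero-fill and filter steps can be sketched as below. The per-file tallies and the names `december`, `merged` and `active` are hypothetical, and outlier removal is left out because the text does not state the rule used.

```python
# Hypothetical per-file tallies for the month labelled 12: users_id -> count.
december = {
    "sessions": {101: 12, 102: 3, 104: 1},
    "listings": {101: 4, 103: 2},
    "enquiries": {101: 7, 104: 1},
}

# Every user who appears in any of the files.
all_users = sorted(set().union(*december.values()))

# Merge the tallies; a missing value means the activity was not performed, so use 0.
merged = {
    uid: {name: table.get(uid, 0) for name, table in december.items()}
    for uid in all_users
}

# Keep only users with at least 1 session in December.
active = {uid: row for uid, row in merged.items() if row["sessions"] > 0}
print(active)
```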

The distribution of each activity is highly skewed, and a large percentage of users have 0 counts for the other activities recorded. In fact, the medians for all other activities are less than 1. This implies that most users who logged onto the platform in December did not perform any other activity. This observation suggests that the team may face some difficulty in further analysis, namely clustering analysis.

@@ Line 38: / Line 38: @@
 | style="padding: 0.25em; font-size: 90%; border-top: 1px solid #cccccc; border-left: 1px solid #cccccc; border-right: 1px solid #cccccc; border-bottom: 1px solid #cccccc; text-align:center; background-color: #404040; width:30%" | [[REO_Project_Findings_EDA | <font color="#ffffff">Exploratory Data Analysis</font>]]
-| style="padding: 0.25em; font-size: 90%; border-top: 1px solid #cccccc; border-left: 1px solid #cccccc; border-right: 1px solid #cccccc; border-bottom: 1px solid #cccccc; text-align:center; background-color: none; width:30%" | [[REO_Project_Findings_Cluster | <font color="#053B6B">Cluster Analysis</font>]]
+| style="padding: 0.25em; font-size: 90%; border-top: 1px solid #cccccc; border-left: 1px solid #cccccc; border-right: 1px solid #cccccc; border-bottom: 1px solid #cccccc; text-align:center; background-color: none; width:30%" | [[REO_Project_Findings_Cluster | <font color="#053B6B">Clustering Analysis</font>]]
 |}
+==Distribution==
+<b>Users</b><br>
+[[File:REO_fig31.png|300px]] <br>
+Looking at the distribution of the Member_Days (Figure 31), the median value is 100 days higher than the mean value. This is a left-tail distribution with a bunch of 3 – 4 years members (850 to 1100 Member_Days). These could be the members who have been extensively recruited during their first year of launch in 2014.
-== Multivariate Analysis of Variables==
+<br><b>Subscription</b><br>
-[[File:REO_Correlations.png]]
+[[File:REO_fig32.png|300px]] <br>
-<i>Figure 1: Correlation Matrix of Variables (left); Pearson Correlation of Variables (right)</i>
+Among the 20762 total users, there are 15323 (73.8%) free subscription plans users while there are 5439 (26.2%) paid subscription plans users. (Figure 32)
-Some of the variables are highly correlated with each other, notably the total number of listings and total number of sessions as well as the number of organic listings and total number of listings. The former suggests that the more frequent a user logs onto the portal, the more posting the user posts. The latter relationship suggests that most of the listings posted are organic compared to synced.
-
+<br><b>Sessions</b><br>
-==Paid vs Free users==
+[[File:REO_fig33.png|300px]] <br>
-[[File:REO_Distribution_Session.png |300px ]][[File:REO_Stats_Listings.png |300 px]]
+By tabulating the data in the “user_id” view, the distribution can be generated for the number of sessions by each user. 118 outliers are excluded.
+The distribution is highly skewed to the right with the mean almost four times of the median. The N tabulated here is only about 62% of the total users as seen from the Users files. (Figure 33)
-The distribution of the total number of sessions as seen is positively skewed so the measures for the distribution regarding central tendency and dispersion are not very accurate. This is replicated across all variables created. As such, the team explores the option of splitting the users into paid and free. Based on our sponsor meeting and secondary research, we have also gathered that free users have restrictions regarding the number of listings they can post which could influence the number of sessions they logged onto the portal and number of enquiries they receive as well. There are 7536 free users and 5331 paid users in the dataset.
+The nature of a property agent work is not necessary regular and may not follow the usual office hours (9 am to 5 pm), hence the system log could be view from the hourly level to identify any trends from the users’ usage log. Sessions were tabulated by users_id with hours as columns before stacking the columns to conduct analysis on hours. 1423 outliers were excluded in the analysis.<br>
+[[File:REO_fig34.png|600px]] <br>
+The distribution for each hour are found to be highly skewed to the right with the mean to be about twice the median.  Using the Kruskai-Wallis Tests (Figure 34), which returns p-value <0.0001*, shows that the sessions count by the users for at least one hour period in the day is significantly different from another one hour period. <br>
+[[File:REO_fig35.png|600px]] <br>
+Steel-Dwass method is used instead of Wilcoxon Each Pair to control for overall alpha level [12]. Using the Nonparametric Comparisons for all Pairs using Steel-Dwass Method (Figure 35), which returns p-value for each pair of sessions count by users between any two hours. Most of the hours pairs were found to be significantly different with p-value <0.01.
-[[File:REO_Distribution_paidfree.png]]
+As the data spans over 6 months, there could be seasonal effect or trends that could affect the usage of the agents, hence the system log could be view in a monthly view to capture any trends or observations. Sessions were tabulated by users_id with months as columns before stacking the columns to conduct analysis on months. 712 outliers were excluded in the analysis. <br>
+[[File:REO_fig36.png|600px]] <br>
+The distribution for each month are found to be highly skewed to the right with the mean to be more than 4 times the median.  Using the Kruskai-Wallis Tests (Figure 36), which returns p-value <0.0001*, shows that the sessions count by the users for at least one month is significantly different from another month. <br>
+[[File:REO_fig37.png|600px]] <br>
+Using the Nonparametric Comparisons for all Pairs using Steel-Dwass Method, which returns p-value for each pair of sessions count by users between two months. With the exceptions of December and September pair, and December and October pair, all the remaining pairs are found to be significantly different.
+As a property agent work may not follow the usual 5-days work week, hence the system log could be view from the different day of the week to identify any trends from the users’ usage log. Sessions were tabulated by users_id with day_of_week as columns before stacking the columns to conduct analysis on day_of_week.  661 outliers were excluded in the analysis.<br>
+[[File:REO_fig38.png|600px]] <br>
+The distribution for each day of week are found to be highly skewed to the right with the mean to be more than 3 times the median.  Using the Kruskai-Wallis Tests (Figure 38), which returns p-value <0.0001*, shows that the sessions count by the users for at least one day of week are significantly different from another day of the week. <br>
+[[File:REO_fig39.png|600px]] <br>
+Using the Nonparametric Comparisons for all Pairs using Steel-Dwass Method (Figure 39), which returns p-value for each pair of sessions count by users between any two days of the week. Most pairs were found to be significantly different with p-value <0.01
-==Number of Days since Registration==
+<br><b>Listings</b> <br>
-[[File:REO_Distribution_numberofday.png]]
+[[File:REO_fig40.png|300px]] <br>
-The average number of days since registration for paid users is 823.2 which is around 2 years 3 months. It is also interesting to note that there is a spike of paid users (1110 users) joining between 975 days and 999 days. This could be matched with the period where there is publicity for 99.co due to the circulation of articles regarding 99.co receiving funding from Facebook co-founder, Sequoia Capital. Being associated with a well-known organisation could help them to gain users. On the other hand, the average number of days joined for free users is 812 which is around the same as paid users. The pattern seems similar as the paid users where the spike takes place between 975 days and 999 days.
+By tabulating the data in the “user_id” view, the distribution can be generated for the number of listings by each user. 177 outliers are excluded. The distribution is skewed to the right with the mean greater than median. The N tabulated here is only about 83% of the users found in Sessions file.
-==Key Activity Indicators==
+As creating listings is one of the key activities an agent could do to gain more enquiries, this activity is similarly broken down by hours, months, and days of week just like what was done in Sessions. Listings were tabulated by users_id with hours as columns before stacking the columns to conduct analysis on hours.  660 outliers were excluded in the analysis.
-[[File:REO_Key_Activity_Indicators.png]]
+<br>[[File:REO_fig41.png|600px]] <br>
-Figure 5: Activity Levels Across Variables for Paid and Free Users
+The distribution for each hour are found to be rightly skewed with the mean greater than median.  Other than hours 16 to 19, the median of the remaining hours are found to be 0 which implies the low count of listings during most hours. Using the Kruskai-Wallis Tests, which returns p-value <0.0001*, shows that the listings count by the users for at least one hour period is significantly different from another hour period.
-Looking at variables representing activity level of an user (i.e. number of cobrokes and enquiries received, the average number of listings posted as well as the average number of sessions posted), we can see the activity level of a paid user is higher than a free user across all variables.
+<br>[[File:REO_fig42.png|600px]] <br>
+Using the Nonparametric Comparisons for all Pairs using Steel-Dwass Method, which returns p-value for each pair of listing count by users between any two hours. Most pairs were found to be significantly different with p-value <0.01.
-
+Listings were tabulated by users_id with months as columns before stacking the columns to conduct analysis on months. 424 outliers were excluded in the analysis.
-==Average Activity across Weekday and Weekend==
+<br>[[File:REO_fig43.png|600px]] <br>
-[[File:REO_Weekday_Weekend.png]]
+The distribution for each month are found to be rightly skewed with the mean greater than the median.  Using the Kruskai-Wallis Tests, which returns p-value <0.0001*, shows that the listings count by the users for at least one month is significantly different from another month.
-Figure 6: Activity Level (Number of Sessions and Listings) Across Weekday and Weekend for Paid and Free Users<br>
+<br>[[File:REO_fig44.png|600px]] <br>
-To compare the number of sessions on weekdays and weekends, we have applied different weights to the total number of session (0.5 for weekends and 0.2 for weekdays) because of the number of days in the week. There is higher usage on weekends compared to weekdays as seen from the graphic above for both listings and number of sessions logged. This shows that there could be a segment of paid users that are active only on weekends.
+Using the Nonparametric Comparisons for all Pairs using Steel-Dwass Method (see below), which returns p-value for each pair of listing count by users between any two months. All of the pairs were found to be significantly different with p-value <0.01.
+Listings were tabulated by users_id with day_of_week as columns before stacking the columns to conduct analysis on day_of_week.  684 outliers were excluded in the analysis.
+<br>[[File:REO_fig45.png|600px]] <br>
+The distribution for each day of week are found to be rightly skewed with the mean greater than median.  Using the Kruskai-Wallis Tests, which returns p-value <0.0001*, shows that the listings count by the users for at least one day of week is significantly different from another day of the week.
+<br>[[File:REO_fig46.png|600px]] <br>
+Using the Nonparametric Comparisons for all Pairs using Steel-Dwass Method (see below), which returns p-value for each pair of listing count by users between any two days of the week. Most pairs were found to be significantly different with p-value <0.01.
-==Average Activity across Timeslots==
+Listings have 2 source types as stated in the given data file. Agents can either post an organic listing which is original or a synced listing which is transported from an external portal. Listings were tabulated by users_id with source_type as columns before stacking the columns to conduct analysis on source types.  367 outliers were excluded in the analysis.
-[[File:REO_Avg_act_across_timeslots.png]]
+<br>[[File:REO_fig47.png|600px]] <br>
-The team has previously established that the activity levels for paid and free users are different. As such, when we investigate the number of listings posted and sessions logged in 4 different timeslots, as seen below.
+The distribution for each source_type is found to be highly skewed to the right with the mean to be greater than median.  Using the Wilcoxon Tests, 2-Sample Test which returns p-value <0.0001*, shows that the listing count by the users for each source type are significantly different. Users have significantly more synced listings than organic listings.
-[[File:REO_timeslots.png]]
-As the nature of a property agent is flexible, the team hypothesised that the users would display an online pattern of appearing in the afternoon. This is because they are not restrained to a nine-to-six working hours. However, based on the graph above, we can conclude that the activity levels across timeslots 2,3 and 4 are rather similar.
-
+<br><b>Cobroke</b>
-==Type of Listings==
+<br>[[File:REO_fig48.png|300px]] <br>
-[[File:REO_listing]]
+By tabulating the data in the “user_id” view, the distribution can be generated for the number of cobroke requests sent by each user. 67 outliers are excluded. The distribution is skewed to the right with the mean greater than median. The N tabulated here is only about 16% of the users found in Sessions file.
-As mentioned before, synced listings refer to listings that are imported to 99.co while organic listings refer to listings that are created on 99.co. Despite posting similar number of synced listings as free users, paid users post significantly more organic listings than the free users. This is beneficial for 99.co since it shows that paid users priortise using 99.co compared to other platforms. This is especially relevant in today’s age where users can be subscribing to multiple platforms to post their listings. It is also interesting to note that free users tend to post synced listings whereas paid users tend to post organic listings.
+As the user count for Cobroke is only a small percentage of the user count for Sessions, the breakdown of the data into other time series format would not be as meaningful. Furthermore, 99.co also feedback that their cobroke request is a function which has not always been properly utilised. Agents may spam that function as a mean to gain new networks rather than offering potential leads to the other agents.
+<br><b>Enquiries</b>
+<br>[[File:REO_fig49.png|300px]] <br>
+By tabulating the data in the “user_id” view, the distribution can be generated for the number of enquiries received by each user. 101 outliers are excluded. The distribution is skewed to the right with the mean greater than median. The N tabulated here is only about 78% of the users found in Sessions file.
+Enquiries received can be either from other agents or from consumers as stated in the given data file column “enquires_type”. Enquiries were tabulated by users_id with enquiries_type as columns before stacking the columns to conduct analysis on source types.  170 outliers were excluded in the analysis.
+<br>[[File:REO_fig50.png|600px]] <br>
+The distribution for each enquire type is found to be rightly skewed with the mean to be greater than median.  Using the Wilcoxon Tests, 2-Sample Test which returns p-value <0.0001*, shows that the enquiries received by the users from agents or consumer are significantly different. Users have significantly more enquiries from consumers than agents.
+==Comparison by Subscription Plan==
+Based on the subscription model of 99.co, there are 2 available subscription plans for all agents, either free or paid. For a free plan user, they are only entitled to a maximum of 5 listings. As for a paid plan user, they get 100 listings and have their listings featured on the portal. Due to the difference in both the payment and the benefits, it would be crucial to identify any differences in the data between paid and free users.
+<br><b>Sessions by plan</b>
+<br>[[File:REO_fig51.png|700px]] <br>
+Using the Wilcoxon Tests, 2-Sample Test which returns p-value <0.0001*, shows that the sessions count by the subscription plan of the users are significantly different. It also showed that paid users have significantly more sessions recorded than free users.
+<br><b>Listings types by plan</b>
+<br>[[File:REO_fig52.png|700px]] <br>
+Using the Wilcoxon Tests, 2-Sample Test which returns p-value <0.0001*, shows that the organic listings count by the subscription plan of the users are significantly different. It also showed that paid users have significantly more organic listings than free users
+<br>[[File:REO_fig53.png|700px]] <br>
+Using the Wilcoxon Tests, 2-Sample Test which returns p-value of 0.69, shows that the synced listings count by the subscription plan of the users are not significantly different.
+<br><b>Cobroke by plan</b>
+<br>[[File:REO_fig54.png|700px]] <br>
+Using the Wilcoxon Tests, 2-Sample Test which returns p-value <0.0001*, shows that the cobroke request count by the subscription plan of the users are significantly different. It also showed that paid users send significantly more cobroke requests than free users
+<br><b>Enquiries types by plan</b>
+<br>[[File:REO_fig55.png|700px]] <br>
+Using the Wilcoxon Tests, 2-Sample Test which returns p-value <0.0001*, shows that the enquiries from agent count by the subscription plan of the users are significantly different. It also showed that paid users received significantly more enquiries from agents than free users
+<br>[[File:REO_fig56.png|700px]] <br>
+Using the Wilcoxon Tests, 2-Sample Test which returns p-value <0.0001*, shows that the enquiries from consumers count by the subscription plan of the users are significantly different. It also showed that paid users received significantly more enquiries from consumers than free users
+The few testes above have shown that paid users generally perform more activities and received more enquiries than the free users. Paid users had incurred a cost for the access of the platform, it is assumed and proven that they will be more committed in using the platform than the free users
+==Rencency Analysis (December)==
+As the data spanned over 6 months, all the analysis was done using aggregated data. Through breaking the data down could give a deeper understanding on the usage rate per user per month. December’s data will be used, and it would also be most recent data collected.
+Each data files were filter to month with label 12 and were tabulated before merging together. All the missing values indicate that the user did not perform that activity hence were replaced with ‘0’. Users with 0 sessions count on the month of December were filtered out, therefore the following distribution generated will be usage rate of the users who have at least 1 session count on December. After filtering the 0 count sessions as well as the outliers, there are a total of 9570 users.
+<br>[[File:REO_fig57.png|900px]] <br>
+The distributions of each activity are highly skewed and there are a huge percentage of users with 0 counts in other activities recorded. In fact, the median for all other activities are all less than 1. It also implies that among the users who log onto the platform on the month of Decembers, most of them do not perform other activities. This observation suggest that the team may face some difficult when conducting other further analysis – Clustering analysis.
