ANLY482 AY2016-17 T2 Group7: Project Findings
Chart 1: Overall Search Counts by Month
Chart 2: User Group Search Counts
Chart 3: Search Count of 'Others'
Subject Matter: | Awareness of the number of searches throughout the year |
Thought Process: | We want to understand how the number of searches varies throughout the year and whether there are any observable trends. From our personal experience as seniors at SMU, we expected searches to spike during the project weeks (Week 4 onwards). We therefore broke down the number of searches by month to see more clearly where the peak periods are. |
Analysis: | The number of searches varies greatly across the year. These searches on EzProxy are made by students: Undergraduate, Masters, PhD and others (international exchange, local exchange and visiting students). As the users of the EzProxy site are students of Singapore Management University, the number of searches spikes during the months of the regular semesters (Terms 1 and 2): January to March and mid-August to November.
Identifying the start and end of regular terms just by looking at the number of searches
In Chart 1, we could potentially identify the start and end of the two regular terms just by observing where the number of searches rises and dips. For both terms, the overall trend forms the shape of a jagged mountain, so the start and end of each mountain fall around the start and end of the corresponding term.
|
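As a minimal sketch of the monthly breakdown behind Chart 1, assuming each EzProxy log line has already been parsed into a Python `datetime` (the parsing step and the function name `searches_per_month` are hypothetical, not our actual charting workflow), monthly counts can be tallied like this:

```python
from collections import Counter
from datetime import datetime


def searches_per_month(timestamps):
    """Count logged searches per (year, month), returned in chronological order."""
    counts = Counter((ts.year, ts.month) for ts in timestamps)
    return dict(sorted(counts.items()))


# Toy timestamps for illustration only, not figures from our data set.
logs = [datetime(2016, 8, 15, 10, 0), datetime(2016, 1, 4, 9, 30), datetime(2016, 1, 5, 14, 0)]
print(searches_per_month(logs))  # {(2016, 1): 2, (2016, 8): 1}
```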
Chart 4: Dip in Weekends for Term 2: Jan-March 2016
Chart 5: Dip in Weekends for Term 1: Aug-Nov 2016
Chart 6: Search Count by Days for Term 2: Jan-March 2016
Chart 7: Search Count by Days for Term 1: Aug-Nov 2016
Chart 8: Chinese New Year in 2016
Subject Matter: | Understanding students’ search behaviour during the semesters. |
Thought Process: | At SMU, one common perception is that students study all day, every day, even on weekends. We wanted to see whether this perception of SMU students is true.
Next, we wanted to see whether searches surge when the semester reaches the weeks in which group projects are released. This is because SMU projects largely require students to perform desk research, and one of the many places to do so is the SMU Library’s EzProxy e-resources databases. |
Analysis: | Dip in Weekend Searches
From Charts 4 and 5, we noticed that the number of searches dips every weekend (Saturdays and Sundays). For example, the number of searches plunges on 16 January (a Saturday). This suggests that the perception of SMU students studying all day, every day, including weekends, may be untrue. Alternatively, SMU students may simply perform fewer searches for their research on weekends.
Research on Recess Week?
However, when we compared the trends in Charts 6 and 7 side by side, we discovered that in both terms there is always a spike in the number of searches on the first day of Recess Week. For Term 2, Recess Week starts on 22 Feb, with a visible spike from 21 Feb to 22 Feb. For Term 1, Recess Week starts on 3 Oct, with a similar spike from 2 Oct to 3 Oct (the highest in the whole of Term 1). In both cases, the number of searches then decreases gradually until the end of Recess Week (28 Feb for Term 2 and 9 Oct for Term 1). This is an interesting finding, as it suggests that students typically start their research on the first day of Recess Week, producing the spike, and do less research as the week draws to a close.
Highest Spike in Term 2: 11 Feb, end of CNY?
From Chart 8, we observed that the highest spike in Term 2 took place on 11 Feb 2016. The only explanation we could find is that it follows the end of the Chinese New Year holidays (9 & 10 Feb 2016): students who enjoyed their CNY a little too much may have felt guilty and started their research on the library databases on 11 Feb.
|
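The weekend dips and Recess Week spikes described above can be located from raw timestamps with a few small helpers. This is a minimal Python sketch, assuming log timestamps are already parsed into `datetime` objects; the function names are hypothetical and the timestamps in the usage lines are toy values, not our data.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(timestamps, start: date, end: date) -> dict:
    """Searches per calendar day from start to end inclusive; days without searches get 0."""
    per_day = Counter(ts.date() for ts in timestamps)
    n_days = (end - start).days + 1
    return {start + timedelta(days=i): per_day[start + timedelta(days=i)] for i in range(n_days)}


def is_weekend(day: date) -> bool:
    """Saturday (5) and Sunday (6) in Python's Monday=0 weekday numbering."""
    return day.weekday() >= 5


def largest_jumps(daily: dict, top: int = 3) -> list:
    """Days with the largest increase in searches over the previous day."""
    days = sorted(daily)
    jumps = [(day, daily[day] - daily[prev]) for prev, day in zip(days, days[1:])]
    return sorted(jumps, key=lambda item: item[1], reverse=True)[:top]


print(date(2016, 1, 16).strftime("%A"), is_weekend(date(2016, 1, 16)))  # Saturday True

# Toy timestamps around the start of Term 2 Recess Week.
logs = [datetime(2016, 2, 21, 10, 0), datetime(2016, 2, 22, 9, 0),
        datetime(2016, 2, 22, 11, 0), datetime(2016, 2, 22, 15, 0)]
daily = daily_counts(logs, date(2016, 2, 20), date(2016, 2, 23))
print(largest_jumps(daily, top=1))
```

Ranking day-over-day increases, rather than raw daily totals, is what surfaces a jump such as 21 Feb to 22 Feb even when neighbouring days are also busy.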
Chart 9: Percentage of Search Counts by Degrees in Weekends for Term 2: Jan-March 2016
Chart 10: Percentage of Search Counts by Degrees in Weekends for Term 1: Sep to Nov 2016
Subject Matter: | Understanding the percentage of weekend searches contributed by students in each degree programme |
Thought Process: | We wanted to dig deeper into weekend searches and find out who is still contributing to them despite the overall dip in weekend searches. |
Analysis: | In Chart 9, we noticed that 56.75% of weekend searches were made by students enrolled in the Bachelor of Laws programme, a clear majority of all searches performed on weekends. A further 16.91% came from Bachelor of Business Management students and 7.36% from the Juris Doctor programme.
One possible conclusion is that students in the Law field (the Bachelor of Laws and Juris Doctor programmes) do not typically stop searching or researching simply because it is the weekend. Students in the Bachelor of Business Management programme also contribute significantly to weekend searches, perhaps because the programme is research-intensive. This contrasts with less research-intensive programmes such as the Bachelor of Science (Information Systems), which accounts for only 1.64% of weekend searches. In Chart 10, we observe the same pattern for the Bachelor of Laws, Bachelor of Business Management and Juris Doctor programmes, so the trend holds for both Terms 1 and 2.
|
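Percentages like those in Chart 9 are each programme's share of all weekend searches. A minimal Python sketch of that calculation, assuming each log record has been reduced to a hypothetical `(timestamp, degree)` pair; the records below are toy values for illustration only:

```python
from collections import Counter
from datetime import datetime


def weekend_share_by_degree(records):
    """Percentage of weekend (Saturday/Sunday) searches made by each degree programme.

    records: iterable of (timestamp, degree) pairs; this layout is an assumption.
    """
    weekend = Counter(degree for ts, degree in records if ts.weekday() >= 5)
    total = sum(weekend.values())
    if total == 0:
        return {}
    return {degree: round(100 * n / total, 2) for degree, n in weekend.most_common()}


# Toy records: 16 Jan 2016 is a Saturday, 18 Jan 2016 a Monday (excluded).
records = [
    (datetime(2016, 1, 16, 14, 0), "Bachelor of Laws"),
    (datetime(2016, 1, 16, 15, 0), "Bachelor of Laws"),
    (datetime(2016, 1, 16, 16, 0), "Bachelor of Laws"),
    (datetime(2016, 1, 16, 17, 0), "Bachelor of Business Management"),
    (datetime(2016, 1, 18, 10, 0), "Bachelor of Science (Information Systems)"),
]
print(weekend_share_by_degree(records))
```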
Chart 11: Search Count by Schools for 2016
Chart 12: Percentage of Search Counts by Schools & Months
Subject Matter: | Understanding the percentage of searches contributed by students in each school and degree programme in 2016 |
Thought Process: | Having looked at the spikes in aggregate, we then considered whether some schools contribute more to the searches than others. We expected the main contributors to be similar to those on weekends, so we broke down the search count by school and month to analyze the general trend of searches for each degree. |
Analysis: | Similar to the weekend searches, we observed that the top 3 percentages of searches still come from the students enrolled in the Bachelor of Laws, Bachelor of Business Management and Juris Doctor.
In Chart 12, we observed that even though Term 1 and Term 2 start in Aug and Jan respectively, the total number of searches in Jan is significantly higher than in Aug. This is where additional information, such as the exact start dates of the terms, comes in handy: Term 2 started on 4 Jan 2016 and so spans almost the entire month of January, whereas Term 1 started only on 15 Aug 2016 and so spans only about half of August. Without this information, analysts might wrongly conclude that students are more hardworking in Term 2 than in Term 1 in terms of the number of searches they perform.
|
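One way to make the January and August totals comparable is to divide each month's count by the number of in-term days in that month. The sketch below is a hypothetical Python helper that uses only the start dates quoted above (4 Jan 2016 and 15 Aug 2016); it counts calendar days, and the monthly count passed to `searches_per_term_day` is an input you supply, not a figure from our data.

```python
import calendar
from datetime import date


def term_days_in_start_month(term_start: date) -> int:
    """Calendar days from the term start date to the end of that month, inclusive."""
    days_in_month = calendar.monthrange(term_start.year, term_start.month)[1]
    return days_in_month - term_start.day + 1


def searches_per_term_day(monthly_count: int, term_start: date) -> float:
    """Average searches per in-term day for the month in which the term starts."""
    return monthly_count / term_days_in_start_month(term_start)


print(term_days_in_start_month(date(2016, 1, 4)))   # 28 (Term 2)
print(term_days_in_start_month(date(2016, 8, 15)))  # 17 (Term 1)
```

With 28 in-term days in January against 17 in August, a January total roughly 1.6 times the August total would correspond to a similar daily search rate.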
Chart 13: Usage of Database by Schools
With reference to Chart 13, we selected two databases, Lawnet and Euromonitor, to focus on for this interim phase. These are the most commonly used databases among Law and Business students respectively, and these two schools are the biggest contributors to searches during the term.
The steps we applied to these two databases can then be repeated for the remaining databases.
Chart 14: Text analytics data preparation
First, we convert the search queries to lowercase for standardization, using Tableau’s LOWER() function, and then filter the data into two data sets: “euromonitor_text_data” for Euromonitor and “lawnet_text_data” for Lawnet.
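For readers reproducing this step outside Tableau, a minimal Python equivalent is sketched below. It assumes each log row is a dict with hypothetical `database` and `query` keys; Python's `str.lower()` matches Tableau's LOWER() for plain ASCII text, which covers most of these queries.

```python
def split_and_lowercase(rows, databases=("euromonitor", "lawnet")):
    """Lowercase each query and group it under '<database>_text_data'.

    rows: iterable of dicts with hypothetical 'database' and 'query' keys.
    Rows from any other database are skipped.
    """
    datasets = {f"{db}_text_data": [] for db in databases}
    for row in rows:
        key = f"{row['database'].strip().lower()}_text_data"
        if key in datasets:
            datasets[key].append(row["query"].strip().lower())
    return datasets


rows = [
    {"database": "Euromonitor", "query": "Singapore Tourism"},
    {"database": "Lawnet", "query": "Iskandar bin Rahmart"},
    {"database": "Other", "query": "Game Theory"},
]
print(split_and_lowercase(rows))
```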
We then use SAS Enterprise Miner 14.1 to carry out text analytics. We import “euromonitor_text_data” and “lawnet_text_data” using the File Import node and run each through the text mining process shown in Chart 15.
Chart 15: Text mining process
Chart 16: Text Parsing Configuration
We configure Text Parsing to ignore the parts of speech ‘Aux’, ‘Conj’, ‘Det’, ‘Interj’, ‘Part’, ‘Prep’ and ‘Pron’, as well as the attribute types ‘Num’ and ‘Punct’.
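SAS applies this filter internally; the sketch below only illustrates the rule. It assumes tokens have already been tagged by some tagger into hypothetical `(term, pos, attribute)` triples; the tag names are copied from our configuration, but the input format is not SAS's internal representation.

```python
# Tags taken from our Text Parsing configuration; the (term, pos, attribute)
# input format is an assumption for illustration only.
IGNORED_POS = {"Aux", "Conj", "Det", "Interj", "Part", "Prep", "Pron"}
IGNORED_ATTRIBUTES = {"Num", "Punct"}


def keep_terms(tagged_tokens):
    """Return the terms whose part of speech and attribute are both not ignored."""
    return [
        term
        for term, pos, attribute in tagged_tokens
        if pos not in IGNORED_POS and attribute not in IGNORED_ATTRIBUTES
    ]


tokens = [
    ("medical", "Adj", "Alpha"),
    ("tourism", "Noun", "Alpha"),
    ("in", "Prep", "Alpha"),
    ("singapore", "Prop", "Alpha"),
    ("2016", "Num", "Num"),
]
print(keep_terms(tokens))  # ['medical', 'tourism', 'singapore']
```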
Euromonitor
Chart 17: Search Count in Euromonitor by Schools & Admission Years
Subject Matter: | Contrasting BBM against BSc(IS) users across Admission Years |
---|---|
Thought Process: | As our team consists of BBM and BSc(IS) students, we discussed among ourselves, and then with peers from our faculties, how often we use Euromonitor in our research. From these discussions, we found that BSc(IS) students generally use Euromonitor much less than their BBM counterparts. However, some BSc(IS) students shared that they had used Euromonitor quite intensively in their 1st and 2nd years, mainly when researching for the University Core modules they have to take (e.g. TWC, BGS).
We then attempted to verify these accounts through the data. |
Analysis: | From Chart 17, we observe that the number of searches performed by BSc(IS) users across all admission years is significantly lower than that of their BBM counterparts. This supports our view that BSc(IS) users use Euromonitor for research less than BBM users do.
Most interestingly, BSc(IS) users admitted in AY_2016 performed the most searches of all BSc(IS) admission years. The first year of the SMU BSc(IS) curriculum usually includes University Core modules such as BGS (Business, Government & Society) and TWC (Technology and World Change), which are research-intensive by nature. It is therefore likely that BSc(IS) users in AY_2016, who were in their first year in 2016, performed so many searches because they were enrolled in these research-intensive modules. The number of research-intensive modules in the curriculum drops significantly from the 2nd year onwards, which is consistent with the low number of searches by BSc(IS) users in AY_2015 (1st/2nd year in 2016), AY_2014 (2nd/3rd year in 2016) and AY_2013 (3rd/4th year in 2016). In contrast, the number of searches by BBM users remains high across all admission years, possibly because the BBM curriculum includes research-intensive modules throughout. |
Among 30720 cases, 8257 (26.88%) are dropped after parsing the data.
Beyond parsing the data, we noticed that the term “singapore” has the highest frequency (2175), followed by the terms “consumer” and “tourism”.
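A term-frequency ranking like this can be approximated with a simple word count. This is a minimal Python sketch and a simplification of SAS Text Parsing: it counts single lowercase alphabetic words only, so it does not detect multi-word terms such as ‘hot drinks’, it drops digits, and it splits hyphenated words. The `ignored` parameter is a hypothetical stand-in for a stop list, and the queries are toy values.

```python
import re
from collections import Counter


def term_frequencies(queries, ignored=frozenset()):
    """Count how often each single-word term occurs across all queries."""
    counts = Counter()
    for query in queries:
        for term in re.findall(r"[a-z]+", query.lower()):
            if term not in ignored:
                counts[term] += 1
    return counts


queries = ["singapore tourism", "Singapore consumer lifestyle", "consumer health"]
freqs = term_frequencies(queries, ignored={"in", "the"})
print(freqs.most_common(2))
```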
From the graph above, we noticed that ‘singapore’ is linked to ‘hot drinks’, ‘hot’, ‘drink’, ‘singapore travel’, ‘consumer lifestyle’, ‘lifestyle’, ‘singapore consumer’ and ‘singapore airline’.
From the graph above, we noticed that ‘consumer’ is linked to ‘consumer health’, ‘consumer foodservice’, ‘electronics’, ‘singapore consumer’, ‘trend’, ‘consumer electronics’ and ‘global’.
From the graph above, we noticed that ‘tourism’ is linked to ‘medical’, ‘sport’, ‘cultural tourism’, ‘wellness’, ‘medical tourism’, ‘cultural’, ‘wellness tourism’ and ‘travel’.
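The idea behind these term links can be illustrated by ranking the terms that appear in the same queries as a target term. This is a minimal Python sketch under that simplification: SAS concept links are based on a strength-of-association measure rather than the raw shared-query counts used here, and the function name and queries are hypothetical.

```python
from collections import Counter


def linked_terms(queries, target, top=8):
    """Rank terms by the number of queries in which they co-occur with `target`."""
    links = Counter()
    for query in queries:
        terms = set(query.lower().split())
        if target in terms:
            links.update(terms - {target})
    return links.most_common(top)


queries = ["singapore hot drinks", "singapore travel", "hot drinks japan", "singapore travel guide"]
print(linked_terms(queries, "singapore", top=3))
```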
This is the result produced by the Text Topic node.
From the graph above, the Text Topic results show that “singapore”, “retail”, “beer”, “milk” and “juice” belong to one topic; “medical”, “tourism”, “technology” and “health” belong to another; and “lifestyle”, “consumer”, “singapore” and “japan” belong to a third.
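Text Topic derives its topics from a singular value decomposition (SVD) of the term-by-document matrix. The sketch below shows only the core idea with NumPy: it uses raw counts and unrotated singular vectors, whereas SAS applies term weighting and rotation and can assign a term to several topics, so its output will differ. The function name and queries are hypothetical.

```python
import numpy as np


def svd_topics(queries, k=2, top_n=3):
    """Group terms into k 'topics' using the leading singular vectors of a term-by-query matrix."""
    vocab = sorted({term for q in queries for term in q.lower().split()})
    row = {term: i for i, term in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(queries)))
    for j, query in enumerate(queries):
        for term in query.lower().split():
            counts[row[term], j] += 1
    # Columns of u hold each term's loading on a singular component.
    u, _, _ = np.linalg.svd(counts, full_matrices=False)
    topics = []
    for c in range(min(k, u.shape[1])):
        strongest = np.argsort(-np.abs(u[:, c]))[:top_n]
        topics.append([vocab[i] for i in strongest])
    return topics


queries = ["singapore retail beer", "singapore milk juice",
           "medical tourism health", "technology health tourism"]
print(svd_topics(queries, k=2, top_n=3))
```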
Lawnet
Among 172363 cases, 36066 (20.92%) are dropped. Lawnet has a far larger number of searches than Euromonitor.
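The drop rates quoted for the two databases follow directly from the case counts. Lawnet drops more cases in absolute terms but a smaller proportion:

```latex
\text{drop rate} = \frac{\text{cases dropped}}{\text{total cases}}, \qquad
\text{Euromonitor: } \frac{8257}{30720} \approx 26.88\%, \qquad
\text{Lawnet: } \frac{36066}{172363} \approx 20.92\%
```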
The most popular search terms are ‘slr’ and ‘ltd’, followed by ‘pte’. ‘slr’ stands for the Singapore Law Reports, while ‘ltd’ and ‘pte’ come from ‘Pte Ltd’ (Private Limited). This means that students frequently search for cases reported in the Singapore Law Reports.
From the graph above, we noticed that ‘slr’ (Singapore Law Reports) is linked to words that are presumably names, such as ‘chum tat’, ‘ngiam’, ‘chiew’, ‘chiew hock’ and ‘chum’, and to the period ‘1974-1976’. These could point to popular cases reported in the Singapore Law Reports and the period in which those cases took place.
From the graph above, we noticed that ‘singapore’ is linked to words such as ‘overseas enterprise’, ‘pte’ (presumably pte in Pte Ltd, the short form for Private Limited), ‘global singapore’, ‘southeast’, ‘finance’, ‘institutional’, ‘law’, ‘ltd’ (presumably Ltd in Pte Ltd) and ‘development bank’.
This is the result produced by the Text Topic node.
From the table above, the Text Topic results show that “slr”, “wlr”, “teck” and “attorney-general” belong to the same topic. This is possibly because people who searched the Singapore Law Reports (slr) also searched the Weekly Law Reports (wlr), while the Attorney-General is the legal adviser to the government and “teck” could be part of someone’s name. “sghc”, “bin”, “rahmart” and “iskandar” belong to another topic; “sghc” stands for Singapore High Court.
The terms ‘rahmart’, ‘bin’ and ‘iskandar’ are interesting: they refer to Iskandar bin Rahmat, a former police officer who was charged with the 2013 Kovan double murder. This is a widely known local criminal case and is most probably used as a prime example in the SMU Bachelor of Laws curriculum, which would explain the popularity of these keywords.
More interestingly, ‘rahmart’ is in fact a misspelling of ‘Rahmat’. This could indicate that most searches for ‘rahmart’ were performed by users who are not of Malay descent, or that the misspelling came from course material provided to the users, presumably Law students.
When we searched for “rahmart” or “iskandar bin rahmart” in Lawnet, we found nothing, as the correct spelling of the name in the case is “rahmat”. However, SAS Text Miner grouped “rahmart” with “iskandar” and “bin”, so we speculate that many students searched for “iskandar bin rahmart” and found nothing. A recommendation system that automatically links “rahmart” to the “iskandar bin rahmat” case would be welcome.
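One simple building block for such a system is fuzzy matching of each query word against a vocabulary of known terms. The sketch below uses Python's standard `difflib.get_close_matches`; the vocabulary, the 0.8 cutoff and the function name `correct_query` are hypothetical choices for illustration, not an existing Lawnet feature.

```python
from difflib import get_close_matches


def correct_query(query, vocabulary, cutoff=0.8):
    """Replace each unknown word with its closest vocabulary entry, if one is close enough."""
    corrected = []
    for token in query.lower().split():
        if token in vocabulary:
            corrected.append(token)
        else:
            match = get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
            corrected.append(match[0] if match else token)
    return " ".join(corrected)


# Hypothetical vocabulary; in practice it could be built from case names in the database.
CASE_VOCABULARY = ["iskandar", "bin", "rahmat", "singapore"]
print(correct_query("iskandar bin rahmart", CASE_VOCABULARY))  # iskandar bin rahmat
```

Words with no sufficiently close match are left unchanged, so the correction never removes what the user typed.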
Excessive System Logging of Search Queries
In our EDA, we discovered a problem of excessive system logging of search queries. We found two examples of this:
Time | Search Query Logged |
---|---|
12:55:02PM | Re |
12:55:04PM | Resol |
12:55:06PM | Resoluti |
12:55:08PM | Resolution |
Example 1: A log entry is recorded every 2 seconds
Key Press | Search Query Logged |
---|---|
1st Key Press: T | T |
2nd Key Press: r | Tr |
3rd Key Press: u | Tru |
4th Key Press: m | Trum |
5th Key Press: p | Trump |
Example 2: A log entry is recorded with every key press
In our analysis, this raises the question of how to determine which logged line is the actual search query a user intended. As the example of ‘User A’ below illustrates, a single session may contain several search queries; here we use three. The challenge is to sieve out the full search queries (e.g. Jack, Singapore) that User A searched for, including those completed before the end of the session.
Eg. List of 3 Search Queries being logged with every key press by User A:
[ Start of Session for User A ]
Re
Regu
Regula
Regulati
Regulation
Ja
Jack
Si
Sing
Singap
Singapor
Singapore
[ End of Session for User A ]
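A first heuristic for recovering the full queries from such a session is to keep a logged line only when the next line does not extend it. This is a minimal Python sketch of that heuristic, assuming a session's lines are available in logged order; the function name is hypothetical.

```python
def final_queries(session_log):
    """Keep a logged query only if the next logged line does not extend it.

    A line that is a prefix of the line after it is treated as an
    intermediate snapshot (key press or timed capture) of the same search.
    """
    finals = []
    following_lines = list(session_log[1:]) + [None]
    for current, following in zip(session_log, following_lines):
        if following is None or not following.lower().startswith(current.lower()):
            finals.append(current)
    return finals


user_a = ["Re", "Regu", "Regula", "Regulati", "Regulation",
          "Ja", "Jack",
          "Si", "Sing", "Singap", "Singapor", "Singapore"]
print(final_queries(user_a))  # ['Regulation', 'Jack', 'Singapore']
```

The same rule collapses both logging patterns above, for example "Re" to "Resolution" and "T" to "Trump". It has known limits: a genuinely new search that begins with the previous full query (such as "singapore" followed by "singapore airlines") would be merged into it, and backspaced corrections leave extra lines. Combining it with the time gap between entries may reduce such errors.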
We concluded that this shortfall affects not only us as project analysts but other stakeholders as well.
Interim Gap Analysis by Stakeholders
The Actual Performance is the scenario in which the status quo remains, meaning the problem of multiple logging of search queries persists.
The Desired Performance is the scenario in which this problem does not exist and one log line is created for each full, actual search query.
Stakeholders Involved/Impact of Performance | Actual Performance | Desired Performance |
---|---|---|
Our Team as Project Analysts | We need to work out which logged line is the actual, full search query entered by end-users before we can begin the analysis | Every logged line would be an actual, full search query, so we would not need to clean the dataset further. This reduces our workload and saves time that can be better spent progressing the analysis |
End-Users of Library’s e-Resources | End-users may experience unnecessary lag in obtaining the results of their search queries | No lag when completing searches would mean a better overall user experience. Such a seamless experience would mean that the system does not stand in the way of the intensive research students must do in their course of study, but instead serves as an effective aid. |
Library Team as Project Sponsors for this Practicum | The project sponsors run the risk that the project analysts cannot sieve out the full, actual search queries, and therefore cannot accurately determine what users are searching for | No such problem, as each search query would be logged exactly as entered. |
Library Team in charge of ensuring that the EzProxy server serves the users in the best possible way | Wastes resources and can potentially slow down the servers, as multiple log entries are triggered and recorded before a search is completed. This consumes the server’s processing capacity and RAM unnecessarily and takes up storage space for every extra logged line. | There would be no waste of the server’s processing capacity, RAM or storage, as one log line would be created for each full, actual search query entered by users. |