Difference between revisions of "ANLY482 AY2016-17 T2 Group7: Project Overview"
Jx.wang.2013 (talk | contribs) |
Yx.lim.2013 (talk | contribs) m |
Latest revision as of 18:22, 20 April 2017
Introduction
The project sponsor is the Singapore Management University (SMU) Library, which consists of the Li Ka Shing Library and the Kwa Geok Choo Law Library. Its electronic search platform offers a wide array of research resources through the EZproxy server. However, the Library needs better insight into how students access its online databases through EZproxy. Although the librarians hold an exhaustive repository of EZproxy log files, they lack the time and resources to process the data for analysis that could improve the user experience. This paper focuses on our solution, developed in Python 3.0.1 using Jupyter Notebook, which cleans and processes the EZproxy data. The processed data is then tested against 2 test cases: data analysis of the search count, and text analytics for 3 databases (Euromonitor, Lawnet and Marketline Advantage). The paper concludes with possible future continuations of the project.
Motivation
Currently, there is no single platform on which EZproxy log data can be processed into proper data frames and from which topics can be extracted. We saw this as an opportunity: the log files could be extracted and analyzed to give the SMU Library team insights for better optimizing the electronic resources database for its users. The motivation resonates with us as students who are active users of that database. We use it frequently for academic research and have often had difficulty finding the most relevant results on the most appropriate platform. We therefore believe that preparing the raw log data on a single platform, together with formulating possible directions for data analysis and text analytics, could allow the SMU Library team to work more efficiently with the data collected. This in turn could inform future projects on optimizing the electronic resources database for current and future SMU students.
Objectives
This project aims to create a single platform that enables the preparation of raw EZproxy log data and the extraction of search queries. It is built on Jupyter Notebook using Python 3.0.1 to offer a ‘plug & play’ solution that the SMU Library team can reuse to prepare future EZproxy data. The processed data is then tested against 2 test cases, covering insights on search count and text analytics on 3 electronic databases: Euromonitor, Lawnet and Marketline Advantage.
Datasets
The project uses two datasets: proxy log data and student information data (student identities are hashed).
Data Dictionary
Sample proxy log entry:
59.189.71.33 tDU1zb0CaV2B8qZ 65ff93f70ca7ceaabcca62de3882ed1633bcd14ecdebbe95f9bd826bd68609ba [01/Jan/2016:00:01:36 +0800] "GET http://heinonline.org:80/HOL/VMTP?base=js&handle=hein.journals/bclr54&div=62&collection=journals&input=(The%20Great%20Peace)&set_as_cursor=0&disp_num=1&viewurl=SearchVolumeSOLR%3Finput%3D%2528The%2520Great%2520Peace%2529%26div%3D62%26f_size%3D600%26num_results%3D10%26handle%3Dhein.journals%252Fbclr54%26collection%3Djournals%26set_as_cursor%3D0%26men_tab%3Dsrchresults%26terms%3D%2528The%2520Great%2520Peace%2529 HTTP/1.1" 200 2291 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
Proxy Log Data:
Parameter | Description | Example |
---|---|---|
libuser_ID | Student ID, hashed by the SMU Library to protect the identity of users | 65ff93f70ca7ceaabcca62de3882ed1633bcd14ecdebbe95f9bd826bd68609ba |
libsession_ID | Unique ID identifying one session by a single user | tDU1zb0CaV2B8qZ |
search_database | The e-resources database on which the search query was run | heinonline |
timestamp | Date and time when the user executed the search query, in the format DD/MMM/YYYY:HH:MM:SS | 01/Jan/2016:00:01:36 |
search_query | Search query entered by the user (URL-encoded in the raw log) | (The%20Great%20Peace) |
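As an illustration of how the fields above could be pulled out of a raw log line, here is a minimal sketch using only the Python standard library. It is not the project's actual code. The regular expression assumes every line follows the layout of the sample entry; the `input` query parameter is specific to the HeinOnline sample and would differ for other databases; and deriving `search_database` from the first label of the hostname is a simplifying assumption. `SAMPLE_LINE` is an abridged copy of the sample entry. Note that the query is returned URL-decoded, unlike the raw example in the table.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Assumed layout: ip, session ID, hashed user ID, [timestamp], "request", status, size, "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<libsession_ID>\S+) (?P<libuser_ID>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<user_agent>[^"]*)"'
)

# Abridged version of the sample entry (URL and user agent shortened).
SAMPLE_LINE = (
    '59.189.71.33 tDU1zb0CaV2B8qZ '
    '65ff93f70ca7ceaabcca62de3882ed1633bcd14ecdebbe95f9bd826bd68609ba '
    '[01/Jan/2016:00:01:36 +0800] '
    '"GET http://heinonline.org:80/HOL/VMTP?base=js&input=(The%20Great%20Peace)'
    '&set_as_cursor=0 HTTP/1.1" 200 2291 "Mozilla/5.0"'
)


def parse_log_line(line, query_param="input"):
    """Return a dict of data-dictionary fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    url = urlsplit(match["url"])
    # parse_qs URL-decodes values, so %20 becomes a space.
    params = parse_qs(url.query)
    # Heuristic (assumption): drop a leading "www" and keep the first hostname label.
    labels = [part for part in (url.hostname or "").split(".") if part != "www"]
    return {
        "libuser_ID": match["libuser_ID"],
        "libsession_ID": match["libsession_ID"],
        "search_database": labels[0] if labels else "",
        "timestamp": datetime.strptime(match["timestamp"], "%d/%b/%Y:%H:%M:%S %z"),
        "search_query": params.get(query_param, [""])[0],
    }


if __name__ == "__main__":
    print(parse_log_line(SAMPLE_LINE))
```

Lines that do not match the assumed layout return `None`, so a caller can count or inspect them separately rather than silently dropping them.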
Student Information Data:
Parameter | Description | Example |
---|---|---|
libuser_ID | Student ID, hashed by the SMU Library to protect the identity of users | 65ff93f70ca7ceaabcca62de3882ed1633bcd14ecdebbe95f9bd826bd68609ba |
school | The school that the user belongs to | School of Law |
programme_type | The specific programme the user is enrolled in | Bachelor of Laws |
admission_year | The year in which the user was admitted to SMU | AY_2013 |
graduating_year | The year in which the user graduates from SMU | GY_2017 |
education_level | The user's level of education, typically a Master's or Bachelor's programme | UNDERGRADUATE STUDENTS |
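Because both datasets share `libuser_ID`, search counts can be broken down by student attributes. The sketch below shows one way this might be done with plain dictionaries; it is an assumption about how the two tables could be combined, not the project's implementation. The function name `searches_per_school` and the second user's hash are hypothetical and used only for illustration.

```python
from collections import Counter


def searches_per_school(log_records, students):
    """Count log records per school by joining on the hashed libuser_ID."""
    school_by_user = {s["libuser_ID"]: s["school"] for s in students}
    counts = Counter()
    for record in log_records:
        # Users missing from the student table are grouped under "UNKNOWN".
        counts[school_by_user.get(record["libuser_ID"], "UNKNOWN")] += 1
    return counts


if __name__ == "__main__":
    law_user = "65ff93f70ca7ceaabcca62de3882ed1633bcd14ecdebbe95f9bd826bd68609ba"
    students = [{"libuser_ID": law_user, "school": "School of Law"}]
    log_records = [
        {"libuser_ID": law_user, "search_database": "heinonline"},
        {"libuser_ID": law_user, "search_database": "heinonline"},
        {"libuser_ID": "hypothetical-unmatched-hash", "search_database": "heinonline"},
    ]
    print(searches_per_school(log_records, students))
```

Keeping unmatched users in an explicit "UNKNOWN" bucket makes gaps in the student table visible in the counts instead of hiding them.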