IS428 2016-17 Term1 Assign2 Gwendoline Tan Wan Xin
Theme Of Interest
The analysis is conducted using the 101 Innovations – Research Tools Survey Dataset from Kaggle. With new tools and sites appearing to support every stage of the research process, the range of tools and sites that people can choose from is enormous. However, no one can use every available tool at once. As such, I am interested in identifying the factors that could affect people’s choice of tools/sites to support their research process.
Analytical/Investigation Questions
The analysis attempts to answer one main question:
- What factors could affect people’s choice of tools/sites used throughout the research process?
However, as the analysis progressed, two additional questions arose:
- In which phase of the research process do people use the largest number of tools?
- Do the types of tools used to search for data differ by role, by discipline, or by region of the world?
The "Analysis & Visualization Construction Process" section explains how these questions evolved from one to the next.
Selected Data Attributes for Analysis
The following data attributes are selected for analysis:
- Participant’s Role
- Country
- Discipline
- Date of Publication
- Tools/Sites to Search Literature/Data
Data Transformation Process
Before the analysis began, the dataset was examined to identify its format and attributes. The dataset comes from a survey of 20,663 researchers, librarians and members of other groups. Many of the survey questions allowed participants to choose multiple answers. For example, one of the questions asked participants to check all the tools that they use when searching for data. They could select multiple tools, such as a combination of Google Scholar and Web of Science. Each tool is a column in the dataset: if a participant indicated that they use the tool, the column is filled; otherwise, the column is left blank. An example of a row in the dataset is as follows: