Difference between revisions of "IS428 2016-17 Term1 Assign2 Rachel Tay"
Revision as of 20:19, 25 September 2016
Theme of Interest
I have decided to look into innovative research tools. With advances in technology and a changing research landscape, many new tools have been developed to support scholarly communication in all phases of the research workflow. A survey was created in the hope of bridging the gap in information about these tools.
Formulation of Questions for Investigation
Being a student myself, I feel that, out of the 3 topics, this topic is the most relevant and interesting. The first question that came to mind was "What are the factors that determine the respondents' choice of research tools?"
Given that the survey provides too many different variables, I narrowed the topic down to consider only the attributes of the users. At this point, my question was "How do the attributes of the users affect the respondents' choice of research tools?"
Taking into account that different users have different requirements when performing different research activities, I refined the question to: "How do the attributes of the users affect respondents' choice of research tools when performing each of the research activities?"
The Dataset
Distribution of Respondents
The user attributes that I am interested in, and that the survey provides, are the respondents' roles and disciplines. A large share of the respondents were from Social Science and Economics, and the other disciplines were well represented, except for Law. As for research roles, most respondents are from inside academia. I believe that different roles and disciplines have different requirements and hence might prefer different tools. Therefore, I decided to investigate these 2 attributes.
On top of these, I also took an interest in the first year of publication. This piece of information was collected as a proxy for the respondents' career stage. There was a fairly even distribution of respondents across the various career stages.
Limitations of Dataset
The mix of respondents is uneven across research roles: 41.7% of the respondents are professors, associate professors or assistant professors, while publishers and industry/government respondents are under-represented. This could bias any comparison across the different research roles.
Dataset Transformation and Rearrangement Process
JMP was utilized throughout this whole process.
Having already determined the data that I required, such as the respondents' roles and disciplines, I deleted the irrelevant columns to avoid confusion and potential mistakes.
Using the Column Viewer functionality in JMP, I viewed the summary statistics and properties of all the remaining columns. The resulting information table was then sorted by "N Categories".
INSERT PICTURE
Through this step, I identified the columns that I could group together under a single column. There are 2 separate relevant categories: "Discipline" and "Tools". The columns can be consolidated under their respective categories using the "Stack" function in JMP.
With this step, I also discovered columns with very high values under "N Categories" and "N Missing". These columns correspond to survey questions that allow free-text answers. In order to keep the data consistent, I chose to represent them under the "Others" group.
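The check described above can be sketched outside JMP as well. The following is a minimal illustration in plain Python, not the JMP Column Viewer itself; the sample rows, the column names and the `summarize` helper are all hypothetical, and an empty string is assumed to mark a missing value.

```python
# Hypothetical sample rows; column names and values are illustrative only,
# not taken from the real survey data.
rows = [
    {"ID": "r1", "Role": "PhD student", "Other_tool": "Zotero plugin"},
    {"ID": "r2", "Role": "Professor", "Other_tool": ""},
    {"ID": "r3", "Role": "PhD student", "Other_tool": "custom script"},
    {"ID": "r4", "Role": "Librarian", "Other_tool": ""},
]

def summarize(rows, column):
    """Return (n_categories, n_missing) for one column; '' counts as missing."""
    values = [row[column] for row in rows]
    present = [v for v in values if v != ""]
    return len(set(present)), len(values) - len(present)

summary = {col: summarize(rows, col) for col in rows[0]}

# List the columns by number of distinct categories, highest first.
for col, (n_cat, n_miss) in sorted(
    summary.items(), key=lambda item: item[1][0], reverse=True
):
    print(col, n_cat, n_miss)
```

Columns that show many distinct categories together with many missing values are the likely free-text fields.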
The following illustrations show only the steps for grouping the relevant columns under the "Discipline" category. The picture below shows the inputs of the Stack window.
INSERT PICTURE
After stacking, the data from all 7 columns are consolidated into only 2. From this table, one can see each respondent and the disciplines they are in. Given that each respondent can be in multiple disciplines, the respondent ID may repeat.
From this table, one can also tell that there are many null values in the Data column, all of which are meaningless and should be deleted. I went to the Data Filter tab, added Data as my column of interest, and deleted all of the rows that were then selected.
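The stack-then-filter step can be sketched in plain Python as follows. This is only an illustration of the idea, not what JMP does internally; the sample table, the discipline column names, the `stack` helper and the "Label" output column are assumptions made for the example.

```python
# Hypothetical wide table: one column per discipline, filled in only when
# the respondent selected that discipline. Names are illustrative.
wide = [
    {"ID": "r1", "Law": "", "Medicine": "Medicine", "Arts": "Arts"},
    {"ID": "r2", "Law": "Law", "Medicine": "", "Arts": ""},
]
discipline_columns = ["Law", "Medicine", "Arts"]

def stack(rows, id_column, value_columns):
    """Emit one long-format row per (respondent, source column) pair."""
    return [
        {id_column: row[id_column], "Label": col, "Data": row[col]}
        for row in rows
        for col in value_columns
    ]

stacked = stack(wide, "ID", discipline_columns)

# Drop the rows whose Data cell is empty, mirroring the deletion step.
cleaned = [row for row in stacked if row["Data"] != ""]

for row in cleaned:
    print(row["ID"], row["Data"])
```

Because a respondent can select several disciplines, the same ID can appear on more than one row of `cleaned`.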
INSERT PICTURE
After repeating the same steps for the "Tools" column, the only relevant information still missing was the research function of each tool and its respective research activity. Going through the 567,166 rows of data and inputting this information individually would be too tedious and time-consuming. Hence, I decided to use the information provided in the "survey_cleaned_variable_list.csv" document. To do this, I needed 2 new columns inside this document.
INSERT PICTURE
For research function, I referred to the survey questions to determine the specific function each tool is used for. For research activity, I referred to the following website (http://dashboard101innovations.silk.co/page/Research-activities) and the function column inputs to determine the specific activity that the function of each tool falls under. With this information, I filled in the relationship of each variable name (tool) to its function and respective activity.
The irrelevant columns, namely the column references, survey question numbers, input type and whether multiple questions are allowed, were then deleted.
Once this was completed, the 2 data tables were joined by matching on the tools column.
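The join can be sketched as a simple lookup on the tool name. This is a minimal illustration under assumed column names ("Tool", "Function", "Activity"); the tool names and function labels are hypothetical and not taken from the survey files.

```python
# Hypothetical long table of tool usage, one row per respondent and tool.
usage = [
    {"ID": "r1", "Tool": "tool_a"},
    {"ID": "r2", "Tool": "tool_b"},
    {"ID": "r3", "Tool": "tool_a"},
]

# Hypothetical variable list with the 2 added columns.
variable_list = [
    {"Tool": "tool_a", "Function": "Search", "Activity": "Discovery"},
    {"Tool": "tool_b", "Function": "Peer review", "Activity": "Assessment"},
]

# Index the variable list by tool name so each usage row is matched once.
lookup = {row["Tool"]: row for row in variable_list}

# Keep only usage rows whose tool appears in the variable list.
joined = [
    {**row, **lookup[row["Tool"]]}
    for row in usage
    if row["Tool"] in lookup
]

for row in joined:
    print(row["ID"], row["Tool"], row["Function"], row["Activity"])
```

Filling in the function and activity once per tool, then joining, avoids entering the same information on every usage row.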
INSERT PICTURE
With this, the final data table is ready for use.
Constructing the Visualization
Given that most of my data are categorical, except for the number of records in each category, I looked into different visualizations for categorical data and considered parallel sets. In an attempt to answer the final question, I tried plotting the different attributes as parallel sets.
To show how the utilization of tools differs between different users, I split the attributes so that the lines would be more prominent and not too spread out. This allows for easier reading and comparison. Consequently, I have 3 different diagrams for the 3 different attributes.
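In a parallel sets diagram, the width of each band reflects the number of records that share a combination of categories. The per-attribute split can be sketched as one count table per attribute. The records and the `band_widths` helper below are hypothetical and only illustrate the counting, not the plotting.

```python
from collections import Counter

# Hypothetical records from the final data table; values are illustrative.
records = [
    {"Role": "PhD student", "Discipline": "Law", "Activity": "Discovery"},
    {"Role": "PhD student", "Discipline": "Medicine", "Activity": "Assessment"},
    {"Role": "Professor", "Discipline": "Law", "Activity": "Discovery"},
    {"Role": "PhD student", "Discipline": "Law", "Activity": "Discovery"},
]

def band_widths(records, attribute):
    """Count the records for each (attribute value, research activity) pair."""
    return Counter((r[attribute], r["Activity"]) for r in records)

# One count table per attribute, matching one diagram per attribute.
by_role = band_widths(records, "Role")
by_discipline = band_widths(records, "Discipline")

for pair, count in by_role.most_common():
    print(pair, count)
```

Counting one attribute at a time keeps the number of bands small, which is the reason for drawing separate diagrams.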
INSERT PICTURE
A few trends can be derived from these visualizations. One is that, regardless of role, discipline and career stage, the ranking of the tools used for a research function or activity is the same. Another is that, regardless of their discipline or role, respondents generally invest a consistent amount in each of the research activities.
However, absolute comparison is tedious given the large number of variables involved in this chart. With so many variables, each line is split into more portions, so the shaded bands emitted from each segment, and the differences between them, become harder to distinguish. As such, I decided to turn to Tableau to create the final visualization.
The Final Visualization
The final visualization can be found at the following link: https://public.tableau.com/views/IS428Assignment2/Story1?:embed=y&:display_count=yes
Caption: What innovation research tools are people using these days?
The variables used to define user attributes are the respondents' roles, disciplines and career stages. Generally, discovery tools are the most utilized and assessment tools the least utilized across the disciplines. Respondents in the Social Science & Economics discipline generally have a higher utilization of tools across all research activities than respondents from other disciplines. Among professors, associate professors and assistant professors, the least experienced use the more traditional tools less than the more experienced do. However, this trend is reversed for PhD students and postdocs. The professors' choice of tools also differs across career stages. One example is Acrobat, which is very popular with the more experienced professors but only around average among the least experienced.