ISSS608 2016-17 T1 Assign3 Parikshit Ravindra MAYEE
Overview
In this assignment I explore and visualize the communication patterns of visitors to DinoFun World, a fictitious amusement park, over three days. All of my analysis and visualization is consolidated in an interactive dashboard published to Tableau Public.
Approach
My approach for this assignment focused on answering the following questions:
1. Identify those IDs that stand out for their large volumes of communication. For each of these IDs
a. Characterize the communication patterns you see.
b. Based on these patterns, what do you hypothesize about these IDs?
2. Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.
3. From this data, can you hypothesize when the vandalism was discovered? Describe your rationale.
Data
Only the communication dataset for the three days was used in this analysis. Movement data was not used.
The following files were used:
• DinoFunWorld_CommData.zip consists of in-app communication data over the three days of the Scott Jones celebration.
• DinoFunWorld_LayoutMap.zip consists of a jpg file.
• DinoFunWorld_Website.zip consists of webpages of DinoFun World Park.
Data Preparation & Analysis
Communications data for park visitors was made available for each day. Each record in the dataset shows the timestamp of the communication, who initiated it, who received it, and the location it was sent from.
Basic data exploration & Combining 3 days data:
To simplify visualization in Tableau, I first analysed the data in SAS JMP Pro.
I performed basic data exploration by plotting the distributions of the columns. I observed that the ‘To’ column contained the text ‘external’ for every communication whose recipient was outside the park and therefore had no ID. I recoded (Cols ==> Utilities ==> Recode) these ‘external’ values as ‘0’.
After recoding, I changed the data type and modeling type of the ‘From’ & ‘To’ columns to Numeric and Nominal respectively.
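The recode and type change were done through the JMP menus. As a rough equivalent outside JMP, the following minimal Python sketch maps ‘external’ to 0 and converts both columns to integers. The function name and the sample IDs are hypothetical and used only for illustration.

```python
def recode_recipient(value: str) -> int:
    """Return 0 for the text 'external', otherwise the recipient ID as an int."""
    cleaned = value.strip()
    if cleaned.lower() == "external":
        return 0
    return int(cleaned)


# Hypothetical sample rows; the IDs are invented for illustration.
sample_rows = [
    {"From": "1001", "To": "external"},
    {"From": "1002", "To": "1001"},
]

# Convert both columns to integers so they can be treated as nominal IDs.
recoded_rows = [
    {"From": int(row["From"]), "To": recode_recipient(row["To"])}
    for row in sample_rows
]
print(recoded_rows)
```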
I performed a quick check to identify any missing patterns (Tables ==> Missing data patterns). No missing values were observed in the dataset.
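A comparable missing-value check can be sketched in Python by counting empty fields per column. This is only an illustration of the idea, not the JMP procedure itself; the column names ‘Timestamp’ and ‘Location’, the timestamp format, and the sample values are assumptions.

```python
def count_missing(rows, columns):
    """Count empty or absent values per column."""
    missing = {column: 0 for column in columns}
    for row in rows:
        for column in columns:
            value = row.get(column)
            if value is None or str(value).strip() == "":
                missing[column] += 1
    return missing


# Hypothetical sample rows with invented IDs and placeholder values.
columns = ["Timestamp", "From", "To", "Location"]
sample_rows = [
    {"Timestamp": "2014-6-06 08:03:19", "From": "1001", "To": "0", "Location": "Area A"},
    {"Timestamp": "2014-6-06 08:03:47", "From": "1002", "To": "1001", "Location": "Area B"},
]

missing = count_missing(sample_rows, columns)
print(missing)
```

A result of zero for every column corresponds to the "no missing values" finding reported above.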
Next, I combined the three days of data into one file using JMP’s Tables ==> Concatenate function.
The combined dataset consisted of 4,153,329 rows. I saved it as a CSV file from JMP, then imported that file into Tableau as the data source for the analysis.
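The concatenation step can also be sketched with Python's standard csv module, assuming each day's file shares the same header. The column names and the in-memory sample data below are assumptions standing in for the real daily files.

```python
import csv
import io


def concatenate_csv(sources):
    """Stack rows from several CSV streams that share one header."""
    header = None
    combined = []
    for source in sources:
        reader = csv.DictReader(source)
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("column mismatch between files")
        combined.extend(reader)
    return header, combined


# Hypothetical in-memory stand-ins for the three daily files.
day1 = io.StringIO("Timestamp,From,To,Location\n2014-6-06 08:03:19,1001,0,Area A\n")
day2 = io.StringIO("Timestamp,From,To,Location\n2014-6-07 09:10:02,1002,1001,Area B\n")
day3 = io.StringIO("Timestamp,From,To,Location\n2014-6-08 10:15:40,1003,1002,Area C\n")

header, rows = concatenate_csv([day1, day2, day3])
print(header, len(rows))
```

Raising an error on a header mismatch is a design choice here: it avoids silently misaligning columns when files differ.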
Question 1: Identify those IDs that stand out for their large volumes of communication. For each of these IDs:
1. Characterize the communication patterns you see.
2. Based on these patterns, what do you hypothesize about these IDs?
I plotted bubble charts for the ‘From’ & ‘To’ columns, sizing each bubble by the number of communications for that ID. This let me identify the IDs sending and receiving high volumes of communication.
Based on this visualization, IDs 1278894 & 839736 account for an unusually high share of the communication volume compared to the other IDs.
The high-volume bubble labelled ‘0’ represents the communications sent to external recipients, which were recoded to ‘0’ during data preparation.
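The counts that size the bubbles can be approximated by tallying how often each ID appears as sender and as recipient. The following is a minimal sketch of that tally, not the Tableau calculation itself; the function name and sample IDs are hypothetical.

```python
from collections import Counter


def communication_volumes(rows):
    """Return (sent, received) counters keyed by ID."""
    sent = Counter(row["From"] for row in rows)
    received = Counter(row["To"] for row in rows)
    return sent, received


# Hypothetical sample rows; 0 stands for external recipients.
sample_rows = [
    {"From": 1001, "To": 0},
    {"From": 1001, "To": 1002},
    {"From": 1001, "To": 1003},
    {"From": 1002, "To": 1001},
]

sent, received = communication_volumes(sample_rows)
top_sender, top_sent = sent.most_common(1)[0]
print(top_sender, top_sent)
```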
The next step was to identify the communication patterns for these two IDs.
I went back to JMP to prepare a new dataset for them. Using the Select Rows functionality, I selected all rows where the communication was sent by or received by either 1278894 or 839736.
There was no communication between 1278894 & 839736. Because no row involves both IDs, the formulas used to derive the values do not violate any logic in this customized case.
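The row selection and the check for direct exchanges between the two IDs can be sketched as follows. This is a simplified illustration with invented sample rows, not the JMP Select Rows procedure; only the two focus IDs come from the analysis above.

```python
FOCUS_IDS = frozenset({1278894, 839736})


def select_focus_rows(rows, focus_ids=FOCUS_IDS):
    """Keep rows sent by or received by any focus ID."""
    return [
        row for row in rows
        if row["From"] in focus_ids or row["To"] in focus_ids
    ]


def count_direct_exchanges(rows, focus_ids=FOCUS_IDS):
    """Count rows where both sender and recipient are focus IDs."""
    return sum(
        1 for row in rows
        if row["From"] in focus_ids and row["To"] in focus_ids
    )


# Hypothetical sample rows; every ID other than the two focus IDs is invented.
sample_rows = [
    {"From": 1278894, "To": 1001},
    {"From": 1002, "To": 839736},
    {"From": 1001, "To": 1002},
    {"From": 839736, "To": 0},
]

selected = select_focus_rows(sample_rows)
direct = count_direct_exchanges(selected)
print(len(selected), direct)
```

A direct-exchange count of zero means every selected row belongs to exactly one of the two IDs.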
Tools Utilized
1. SAS JMP Pro: Used for initial data analysis and data cleaning. Also used for creating the Trellis plot visualization in Iteration 2.
2. Tableau: Used for exploratory data analysis and to generate visual representations.
3. Tableau Public: The visual dashboard was published to Tableau Public and the web URL is shared above.
Results
Results for my visual analysis are available on Tableau Public: Sentiment analysis about Wikipedia as teaching resource (Updated)
The dashboard published above is interactive and can be used to explore the sentiments expressed through the survey about Wikipedia as a teaching resource.
1. The Respondents tab helps to answer my first question: who are my survey respondents?
2. The Broken Down Sentiments tab helps to answer how sentiments change with respect to varying factors.
3. The Overall Sentiment tab helps to answer what overall sentiment was expressed through the survey for each question.