ISSS608 2016-17 T1 Assign3 Li Nanxun

From Visual Analytics and Applications
Revision as of 22:11, 28 October 2016 by Nanxun.li.2015 (talk | contribs)


Abstract


By leveraging the communication data and the movement data, several tourist patterns were found. However, no apparent suspicious or criminal pattern was found; all the data show that DinoFun World ran properly and reasonably.

Background

DinoFun World is a typical modest-sized amusement park, sitting on about 215 hectares and hosting thousands of visitors each day. It has a small town feel, but it is well known for its exciting rides and events. One event last year was a weekend tribute to Scott Jones, internationally renowned football (“soccer,” in US terminology) star. Scott Jones is from a town near DinoFun World. He was a classic hometown hero, with thousands of fans who cheered his success as if he were a beloved family member. To celebrate his years of stardom in international play, DinoFun World declared “Scott Jones Weekend”, where Scott was scheduled to appear in two stage shows each on Friday, Saturday, and Sunday to talk about his life and career. In addition, a show of memorabilia related to his illustrious career would be displayed in the park’s Pavilion. However, the event did not go as planned. Scott’s weekend was marred by crime and mayhem perpetrated by a poor, misguided and disgruntled figure from Scott’s past. While the crimes were rapidly solved, park officials and law enforcement figures are interested in understanding just what happened during that weekend to better prepare themselves for future events. They are interested in understanding how people move and communicate in the park, as well as how patterns change and evolve over time, and what can be understood about motivations for changing patterns.


Tools Utilized


  • Excel – to prepare data and derive node properties (e.g. the total number of different contacts).
  • Tableau – to visualize the analysis.
  • JMP – to prepare data, mostly for data cleaning, joining, and filtering.
  • Gephi – to visualize the communication relationships.


Types of charts used: Bar Chart, Tableau Map, Gephi Degree.


Data Preparation


Data preparation took considerable effort, but the methods are basic and standard, so I do not describe them in detail. Instead, I briefly mention how the final worksheet was generated and mainly discuss the potential uses of the final data.
Data cleaning
After checking the columns one by one for each data sheet, I found several small problems in terms of data quality.

 1.	Missing values
 2.	Wrong map X and Y coordinates
 3.	Wrong IDs


Because these problems are easy to resolve, I simply deleted the rows containing the problematic data.
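The row deletion can be sketched in Python as an alternative to the JMP steps I actually used. This is only a rough illustration under several assumptions that are not confirmed by the data description above: the movement records have fields named `id`, `X` and `Y`, the map is a grid running from 0 to 99 on both axes, and a "wrong ID" means a non-numeric ID. The function name `clean_movement` is hypothetical.

```python
def clean_movement(rows, grid_min=0, grid_max=99):
    """Return only the rows that pass the three data-quality checks.

    rows: list of dicts with (assumed) keys "id", "X" and "Y".
    grid_min / grid_max: assumed bounds of the park map grid.
    """
    cleaned = []
    for row in rows:
        # 1. Missing values: drop rows with an empty or absent field.
        if any(row.get(key) in (None, "") for key in ("id", "X", "Y")):
            continue
        # 3. Wrong IDs: assumed here to mean IDs that are not purely numeric.
        if not str(row["id"]).isdigit():
            continue
        # 2. Wrong map X and Y: non-integer values or values off the grid.
        try:
            x, y = int(row["X"]), int(row["Y"])
        except (TypeError, ValueError):
            continue
        if not (grid_min <= x <= grid_max and grid_min <= y <= grid_max):
            continue
        cleaned.append(row)
    return cleaned
```

Passing different `grid_min` and `grid_max` values adapts the coordinate check if the real map bounds differ from the assumed ones.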
ID Summary
To obtain the properties of each ID (e.g. how many movement records each ID has, and how many different IDs each ID contacted), several new columns were derived from the data sheets and joined together. They are:

 1.	N of From – total number of Out-contacts (messages the tourist sent out to other IDs).
 2.	N of To – total number of In-contacts (messages the tourist received).
 3.	Check-in – number of check-ins, which shows how much the tourist participated in the park facilities.
 4.	Movement – number of movement records. According to my observation, the movement tracker writes a record each time the tourist moves 1 unit along either the X axis or the Y axis of the map, so the number of records can represent how far the tourist moved.
 5.	External – number of external communication records.
 6.	Coaster Alley – number of Out-contact records in Coaster Alley.
 7.	Entry Corridor – number of Out-contact records in Entry Corridor.
 8.	Kiddie Land – number of Out-contact records in Kiddie Land.
 9.	Tundra Land – number of Out-contact records in Tundra Land.
 10.	Wet Land – number of Out-contact records in Wet Land.
 11.	FromCommN – the number of IDs that the tourist sent messages to.
 12.	FromAverageComm – the average Out-contact message volume.
 13.	ToCommN – the number of IDs that the tourist received messages from.
 14.	ToAvgComm – the average In-contact message volume.


I generated all these columns mainly via Tabulate and Join functions in JMP.
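For readers without JMP, the communication-related columns can be approximated with a short Python sketch. It is not the Tabulate and Join workflow itself, and it rests on assumptions: each message is reduced to a `(from_id, to_id)` pair, and the two average columns are interpreted as messages per distinct contact, which is my reading of "average message volume" rather than a confirmed definition. The function name `summarize_contacts` is hypothetical.

```python
from collections import Counter, defaultdict


def summarize_contacts(messages):
    """Derive per-ID contact properties from (from_id, to_id) pairs."""
    n_from = Counter()              # messages sent by each ID
    n_to = Counter()                # messages received by each ID
    recipients = defaultdict(set)   # distinct IDs each ID sent messages to
    senders = defaultdict(set)      # distinct IDs each ID received messages from

    for src, dst in messages:
        n_from[src] += 1
        n_to[dst] += 1
        recipients[src].add(dst)
        senders[dst].add(src)

    summary = {}
    for pid in set(n_from) | set(n_to):
        from_comm_n = len(recipients[pid])
        to_comm_n = len(senders[pid])
        summary[pid] = {
            "N of From": n_from[pid],
            "N of To": n_to[pid],
            "FromCommN": from_comm_n,
            # Assumed definition: messages sent per distinct recipient.
            "FromAverageComm": n_from[pid] / from_comm_n if from_comm_n else 0.0,
            "ToCommN": to_comm_n,
            # Assumed definition: messages received per distinct sender.
            "ToAvgComm": n_to[pid] / to_comm_n if to_comm_n else 0.0,
        }
    return summary
```

The per-area columns (Coaster Alley, Entry Corridor, and so on) could be produced the same way by counting sent messages grouped by both sender and location, provided the communication records carry a location field.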


Data Analysis

Conclusion



Recommendation


Future Work