ISSS608 2016-17 T1 Assign3 Zhang Jinchuan

From Visual Analytics and Applications
Revision as of 22:54, 28 October 2016 by Jczhang.2015

Introduction

DinoFun World is a typical modest-sized amusement park, sitting on about 215 hectares and hosting thousands of visitors each day. It has a small-town feel, but it is well known for its exciting rides and events.

One event last year was a weekend tribute to Scott Jones, internationally renowned football star. Scott Jones is from a town near DinoFun World. He was a classic hometown hero, with thousands of fans who cheered his success as if he were a beloved family member. To celebrate his years of stardom in international play, DinoFun World declared “Scott Jones Weekend”, where Scott was scheduled to appear in two stage shows each on Friday, Saturday, and Sunday to talk about his life and career. In addition, a show of memorabilia related to his illustrious career would be displayed in the park’s Pavilion. However, the event did not go as planned. Scott’s weekend was marred by crime and mayhem perpetrated by a poor, misguided and disgruntled figure from Scott’s past.

While the crimes were rapidly solved, park officials and law enforcement figures are interested in understanding just what happened during that weekend to better prepare themselves for future events. They are interested in understanding how people move and communicate in the park, as well as how patterns change and evolve over time, and what can be understood about motivations for changing patterns.

This essay describes the data analytics approaches used to answer three questions:

  1. Identify those IDs that stand out for their large volumes of communication. For each of these IDs, characterize the communication patterns you see. Based on these patterns, what do you hypothesize about these IDs?
  2. Describe up to 10 communication patterns in the data. Characterize who is communicating, with whom, when and where.
  3. From this data, can you hypothesize when the vandalism was discovered? Describe your rationale.

Data Exploration

The data has two parts: communication data and movement data. The communication data records how people communicate in the park, and the movement data records the paths people take through the park.

Because the volume of data is large, we assume that only the people with large volumes of communication need close attention. Filtering the data down to these people makes it easier to handle and makes the crimes easier to detect. For example, we can select the visitors with large volumes of communication records from the Saturday communication data.
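The filtering itself was done in JMP, but the day filter can be sketched in Python. This is a minimal sketch assuming each communication record is a (timestamp, from, to) tuple with timestamps formatted as `%Y-%m-%d %H:%M:%S`; the sample records, the dates, and the function name `records_on_weekday` are hypothetical, not taken from the assignment data.

```python
from datetime import datetime

# Hypothetical records: (timestamp, from_id, to_id). The column layout and
# timestamp format are assumptions, not the actual file layout.
records = [
    ("2014-06-06 09:15:00", "1278894", "500001"),  # a Friday
    ("2014-06-07 10:02:13", "500001", "839736"),   # a Saturday
    ("2014-06-07 14:40:55", "500002", "1278894"),  # a Saturday
    ("2014-06-08 11:20:30", "500003", "500001"),   # a Sunday
]

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format

def records_on_weekday(rows, weekday):
    """Keep rows whose timestamp falls on `weekday` (Monday=0 ... Sunday=6)."""
    return [
        row for row in rows
        if datetime.strptime(row[0], TIMESTAMP_FORMAT).weekday() == weekday
    ]

saturday = records_on_weekday(records, 5)  # 5 == Saturday
print(len(saturday))  # 2
```

The same function with `weekday=4` or `weekday=6` gives the Friday or Sunday subsets, so each day can be analysed separately.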

The "to" column contains the value "external". Because communications with people outside the park are unlikely to help the crime analysis, these records are deleted.
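Removing the "external" records amounts to a single filter on the "to" field. A minimal sketch, assuming records are (from, to) tuples; the sample IDs and the helper name `drop_external` are hypothetical.

```python
# Hypothetical records: (from_id, to_id); "to" may hold the value "external".
records = [
    ("1278894", "500001"),
    ("500001", "external"),
    ("500002", "839736"),
    ("839736", "external"),
]

def drop_external(rows, to_index=1):
    """Remove communications whose recipient is outside the park."""
    return [row for row in rows if row[to_index] != "external"]

internal = drop_external(records)
print(internal)  # [('1278894', '500001'), ('500002', '839736')]
```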

Using JMP, we build summary tables showing the number of communications per ID; these tables are used in the subsequent analysis.
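The JMP summary tables count each ID's communications in the "from" and "to" directions. The same counts can be sketched with `collections.Counter`, assuming (from, to) tuples with hypothetical sample IDs.

```python
from collections import Counter

# Hypothetical (from_id, to_id) records after removing "external" rows.
records = [
    ("1278894", "500001"),
    ("1278894", "500002"),
    ("500002", "1278894"),
    ("500001", "839736"),
]

sent = Counter(sender for sender, _ in records)
received = Counter(recipient for _, recipient in records)
total = sent + received  # Counter addition sums the counts per ID

# One row per ID: id, sent, received, total (highest total first).
for person_id, n in total.most_common():
    print(person_id, sent[person_id], received[person_id], n)
```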

Approaches

Q1: Identify those IDs that stand out for their large volumes of communication. For each of these IDs, characterize the communication patterns you see. Based on these patterns, what do you hypothesize about these IDs?

From the communication data, we build a sheet counting each person's communication records in two directions, "from" and "to". The top 10% of these people are defined as high communicators. Among them, two IDs have extremely high record counts: 1278894 and 839736.
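The top-10% rule can be written as a ranking followed by a cut-off. This is a minimal sketch assuming per-ID totals are held in a `Counter`; the totals below are invented for illustration, and `top_fraction` is a hypothetical helper, not part of the original analysis.

```python
import math
from collections import Counter

def top_fraction(counts, fraction=0.10):
    """Return the IDs whose totals rank in the top `fraction` of all IDs.

    Ties at the cut-off follow Counter.most_common's ordering (first
    encountered wins), so borderline IDs with equal counts may be dropped.
    """
    ranked = counts.most_common()
    k = max(1, math.ceil(len(ranked) * fraction))
    return [person_id for person_id, _ in ranked[:k]]

# Hypothetical totals: two very heavy communicators plus 18 ordinary visitors.
counts = Counter({"1278894": 900, "839736": 850})
counts.update({f"6000{i:02d}": 10 + i for i in range(18)})

print(top_fraction(counts))  # ['1278894', '839736']
```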

These two IDs stand out even more clearly in the Gephi network graph, where they are connected to almost every other node; that is, they communicate with almost everyone. Based on this pattern, we hypothesize that they are park staff responsible for the security of the park.
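"Connected to almost everyone" can be quantified as the share of all other IDs each ID has exchanged messages with, which on an undirected graph without self-loops is normalised degree centrality, degree / (n − 1). A minimal stdlib sketch under that assumption; the sample edges and the name `partner_coverage` are hypothetical, and Gephi's own statistics were what the analysis actually used.

```python
from collections import defaultdict

def partner_coverage(rows):
    """Share of all other IDs that each ID has communicated with (either direction).

    Assumes no self-communications; then this equals normalised degree
    centrality on the undirected communication graph.
    """
    partners = defaultdict(set)
    for sender, recipient in rows:
        partners[sender].add(recipient)
        partners[recipient].add(sender)
    others = len(partners) - 1
    if others <= 0:
        return {}
    return {pid: len(p) / others for pid, p in partners.items()}

# Hypothetical edges: 1278894 reaches every other ID in this tiny sample.
rows = [
    ("1278894", "500001"),
    ("500002", "1278894"),
    ("1278894", "500003"),
    ("500001", "500002"),
]
coverage = partner_coverage(rows)
print(coverage["1278894"])  # 1.0
```

A coverage close to 1.0 for an ID is the numeric counterpart of the hub-like nodes visible in the graph.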

We then compare, for each visitor, how much they communicate with high communicators versus with people who talk infrequently. A new column named Central ratio is added, defined as the ratio between the numbers of these two kinds of contacts. Based on this ratio, visitors are divided into three patterns: high, medium, and low.
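The Central ratio column can be sketched as follows. This assumes the ratio is (distinct contacts who are high communicators) / (distinct other contacts), and that the high/medium/low bands use cut-offs of 0.5 and 2.0; both the direction of the ratio and the cut-offs are hypothetical choices, as are the sample records and helper names.

```python
import math

HIGH_IDS = {"1278894", "839736"}  # high communicators identified earlier

def central_ratio(rows, person_id, high_ids):
    """Distinct high-communicator contacts divided by distinct other contacts."""
    high, other = set(), set()
    for sender, recipient in rows:
        if person_id not in (sender, recipient):
            continue
        partner = recipient if sender == person_id else sender
        (high if partner in high_ids else other).add(partner)
    if not other:
        return math.inf if high else 0.0
    return len(high) / len(other)

def ratio_pattern(ratio, low_cut=0.5, high_cut=2.0):
    """Bin a ratio into the three patterns; cut-offs are hypothetical."""
    if ratio < low_cut:
        return "low"
    if ratio < high_cut:
        return "medium"
    return "high"

rows = [
    ("500001", "1278894"), ("839736", "500001"), ("500001", "500002"),
    ("500002", "500003"), ("500002", "500004"), ("500002", "1278894"),
    ("500003", "839736"),
]
patterns = {pid: ratio_pattern(central_ratio(rows, pid, HIGH_IDS))
            for pid in ["500001", "500002", "500003"]}
print(patterns)  # {'500001': 'high', '500002': 'low', '500003': 'medium'}
```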

Summary

Future Work