ISSS608 2016-17 T1 Assign3 CHIA Yong Jian Data Review

From Visual Analytics and Applications

Episode VA3: "A Crime-Filled Weekend at DinoFun World"

[Home]

[Data Review and Prep]

[Large Volume Comms IDs]

[Ten Comms Patterns]

[Discovery of Vandalism]



Data Review and Preparations

SAS JMP Pro 12 was used to review and prepare data.

1. Communications Data

Three days' worth of communications data (Friday, Saturday and Sunday) from the fateful weekend were provided, with each daily file containing between 948,739 and 1,655,866 records. Each file has the following columns:

| Column | Description | Remarks |
|---|---|---|
| Timestamp | Date and time in YYYY-MM-DD HH24:MI:SS format | - |
| from | Unique identifier of the sender | Every sender is a unique ID; no messages were sent by an "external" party |
| to | Unique identifier of the receiver | The receiver can also be "external", which is assumed to mean a party outside the park |
| location | Area of the park the message was sent from | All messages came from one of the following areas: Wet Land, Tundra Land, Kiddie Land, Entry Corridor, Coaster Alley |

No missing data was observed in the dataset.
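The review itself was done in JMP, but the same checks can be sketched in a few lines of Python for readers who want to reproduce them. The sketch below is only an illustration: the sample rows, the `review_comms` function name and the exact spelling of the `external` marker are assumptions, not taken from the actual files.

```python
import csv
import io

COLUMNS = ("Timestamp", "from", "to", "location")
PARK_AREAS = {"Wet Land", "Tundra Land", "Kiddie Land",
              "Entry Corridor", "Coaster Alley"}

# Hypothetical sample rows in the column layout described above.
SAMPLE = """Timestamp,from,to,location
2014-06-06 08:03:19,101,102,Kiddie Land
2014-06-06 08:03:47,102,101,Wet Land
2014-06-06 08:04:10,103,external,Entry Corridor
"""


def review_comms(rows):
    """Count records, missing values, external parties and unknown areas."""
    summary = {"records": 0, "missing_values": 0, "external_senders": 0,
               "external_receivers": 0, "unknown_locations": 0}
    for row in rows:
        summary["records"] += 1
        values = {col: (row.get(col) or "").strip() for col in COLUMNS}
        if not all(values.values()):
            summary["missing_values"] += 1
        if values["from"].lower() == "external":
            summary["external_senders"] += 1
        if values["to"].lower() == "external":
            summary["external_receivers"] += 1
        # An empty location is also counted as unknown here.
        if values["location"] not in PARK_AREAS:
            summary["unknown_locations"] += 1
    return summary


summary = review_comms(csv.DictReader(io.StringIO(SAMPLE)))
print(summary)
```

On the real files, the observations above would correspond to `missing_values`, `external_senders` and `unknown_locations` all being zero, with `external_receivers` greater than zero.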

For later loading into Gephi, the following columns will be renamed:

  • "from" renamed to "Source"
  • "to" renamed to "Target"

A node file will also be created, consisting of the unique IDs found in the prepared edge file.
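As a rough illustration of this preparation step outside JMP, the following Python sketch renames the two columns and derives a node list from the edges. The sample rows and the helper names `to_gephi` and `write_csv` are hypothetical. The `Id` header for the node file follows Gephi's usual spreadsheet import convention, and treating "external" as a node of its own is an assumption made here for simplicity.

```python
import csv
import io

# Hypothetical sample rows in the layout of the communications files.
SAMPLE = """Timestamp,from,to,location
2014-06-06 08:03:19,101,102,Kiddie Land
2014-06-06 08:03:47,102,101,Wet Land
2014-06-06 08:04:10,103,external,Entry Corridor
"""


def to_gephi(comm_rows):
    """Return (edges, nodes) built from communication records."""
    edges = []
    node_ids = set()
    for row in comm_rows:
        edge = {
            "Source": row["from"],   # "from" renamed to "Source"
            "Target": row["to"],     # "to" renamed to "Target"
            "Timestamp": row["Timestamp"],
            "location": row["location"],
        }
        edges.append(edge)
        node_ids.update((edge["Source"], edge["Target"]))
    # One node row per unique ID seen as a sender or a receiver.
    nodes = [{"Id": node_id} for node_id in sorted(node_ids)]
    return edges, nodes


def write_csv(rows, handle):
    """Write a list of dicts as CSV, using the first row's keys as header."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


edges, nodes = to_gephi(csv.DictReader(io.StringIO(SAMPLE)))
edge_file, node_file = io.StringIO(), io.StringIO()
write_csv(edges, edge_file)
write_csv(nodes, node_file)
print(len(edges), "edges,", len(nodes), "nodes")
```

In practice the two `io.StringIO` buffers would be replaced by files opened with `newline=""`, one for the edge table and one for the node table.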


2. Movement Data

Movement data was also provided for the same three days, with each daily file containing between 6 and 10 million records and the following columns:

| Column | Description | Remarks |
|---|---|---|
| Timestamp | Date and time in YYYY-MM-DD HH24:MI:SS format | - |
| id | Unique identifier of the person making the movement | - |
| type | Either "movement" or "check-in" | - |
| X | X-coordinate of the location | Values are between 0 and 100 |
| Y | Y-coordinate of the location | Values are between 0 and 100 |

A missing-data check found one record in the Sunday movement data that has no values in any column other than the timestamp. It is unclear whether this is simply dirty data or a sign of interference by the crime perpetrators (such as sabotage of the data).

[Figure: missing-data check on the Sunday movement data]
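A check of this kind can be approximated in Python as shown below. This is a minimal sketch and not the JMP procedure actually used: the sample rows, including the timestamp-only record, are invented for illustration, and the function names `find_incomplete` and `out_of_range` are hypothetical.

```python
import csv
import io

COLUMNS = ("Timestamp", "id", "type", "X", "Y")

# Hypothetical sample; the last row mimics a record with only a timestamp.
SAMPLE = """Timestamp,id,type,X,Y
2014-06-08 08:00:11,201,check-in,63,99
2014-06-08 08:00:15,202,movement,62,98
2014-06-08 08:00:20,,,,
"""


def find_incomplete(rows):
    """Return (line number, missing columns) for each incomplete record."""
    flagged = []
    # Line 1 of the file is the header, so data rows start at line 2.
    for line_no, row in enumerate(rows, start=2):
        missing = [col for col in COLUMNS
                   if not (row.get(col) or "").strip()]
        if missing:
            flagged.append((line_no, missing))
    return flagged


def out_of_range(rows, low=0.0, high=100.0):
    """Return (line number, column, value) for coordinates outside 0-100."""
    bad = []
    for line_no, row in enumerate(rows, start=2):
        for col in ("X", "Y"):
            text = (row.get(col) or "").strip()
            # Empty cells are left to find_incomplete; non-numeric text
            # would raise ValueError in this simple sketch.
            if text and not low <= float(text) <= high:
                bad.append((line_no, col, text))
    return bad


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(find_incomplete(rows))
print(out_of_range(rows))
```

For files with millions of records, the rows could be streamed directly from `csv.DictReader` instead of being collected into a list first.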

The movement data, together with the park map, will be plotted using Tableau to visualise movement of the individuals in the park.


Next: IDs with large volume of communications