ISSS608 Assign3 HoLiChin DataPrep

From Visual Analytics and Applications
Jump to navigation Jump to search

Introduction

Data Preparation

Task 1

Task 2

Task 3

Visualisation Links & Conclusion

 


1)Data Provided

The primary source of data for the participants is the DinoFun World app. All visitors to the park (except very young children) must use the app to check in to the park and its rides and to communicate with fellow visitors. Visitors are assigned IDs that remain active for the length of their ticket, which could be multiple days.

App users may send text messages to anyone within their own designated group (for example, a family could have its own group). An app user may also make a friend at the park where they can send and receive texts.

The main data provided consists of three days’ park movement and communication data and they are in csv format. The movement data are in the format of entries consisting of timestamp, id of each unique visitor, X coordinate and Y coordinate. Whereas for Communication data, the entries consisting of Timestamp, from (sender), to (receiver) and location.


2) Data Preparation & Exploration

The park movement and communication datasets for Fri, Sat and Sun were loaded into SAS JMP for data preparation. As the comm data have no information about the visitor X & Y when a communication took place, thus the following steps were performed to combine the comm data to movement tables to identify the visitor’s X & Y coordinates at the moment when the comm data took place.

Connect Communication Data with Movement Data


<Step 1> - Data preparation for comm tables

  • There are 3 JMP tables for communication data, namely comm-Fri, comm-Sat and comm-Sun.
  • Change variable names for all the comm tables (Fri, Sat, Sun)
    • Add in a “Day” Column
    • Rename column name “from” -> “Sender” and column name “to”-> “Receiver”

<Step 2> - Data preparation for movement tables

  • There are 3 JMP tables for movement data, namely movement-Fri, movement-Sat and movement-Sun.
  • Change column name “id” -> Sender (to prepare before concatenate tables)


<Step 3> - Concatenate both comm & movement (to identify the Sender X & Y coordinates for comm data)

  • Concatenate both tables on day by day basis, that means concatenate comm-Fri and movement-Sat, repeat the same for Sat and Sun.
Pic1



  • After concatenating comm + movement tables, in each new combined tables (by day)
    • Sort by Timestamp column (Ascending)
    • Then Sort by “Sender” (Ascending)
  • Add new columns movement_new, X_new, Y_new
    • Create 3 new columns using formulas as shown below
    • The formula below copied the value of “movement”, “X” and “Y” down the new columns for the comm data. Since all data have been sorted by Timestamp followed by Sender, thus the X & Y coordinates will be the closest to the visitor Id during the similar timestamp.


Pic2



Pic3



  • Filter to include only the Comm data, exclude all other movement data with missing Sender & Receiver info.


<Step 4> - Concatenate both comm & movement (to identify the Receiver X & Y coordinates for comm data)

  • Repeat <Step 3> for Receiver X & Y coordinates for Fri, Sat and Sun comm data.

<Step 5> - Perform Join using Matching X & Y coordinates to Park Paces Description

This step is to join using matching X & Y of Park Place description with Sender and Receiver X & Y in step 4 & 5 above. After the join, for each comm data entry, we can know where was the whereabouts of Sender and Receiver when a message exchanges took place.

New Columns for combined data with X, Y coordinates and Place Description

Pic4



<Step 6> - Finally concatenate all Fri, Sat & Sun tables into a single Master table.

  • Export the JMP master table into a csv file to be loaded into Tableau, Qlik Sense for subsequent analysis and data visualisation.