IS428 AY2019-20T1 Assign Lee Cheng Leng EDA

VAST 2019 MC2: Citizen Science to the Rescue

  
 
= Data Cleaning Process =
{| class="wikitable"
|-
! !! Problem #1
|-
| Issue || The static sensor locations and readings are provided in two separate files. To perform meaningful analysis, the data from both files needs to be joined into a single table.
|-
| Solution || Join the two files in Tableau Prep Builder using the 'Join' function, joining on sensor-id, which is the unique identifier of each static sensor.
|}
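
For readers who prefer to script this step, a minimal pandas sketch of the equivalent join is shown below. The file names and the Sensor-id, Lat and Long column names are assumptions based on the VAST 2019 MC2 data package.

<syntaxhighlight lang="python">
import pandas as pd

# Assumed file and column names from the VAST 2019 MC2 data package.
readings = pd.read_csv("StaticSensorReadings.csv")    # Timestamp, Sensor-id, Value
locations = pd.read_csv("StaticSensorLocations.csv")  # Sensor-id, Lat, Long

# Equivalent of the Tableau Prep 'Join' step: inner join on the shared Sensor-id key.
static = readings.merge(locations, on="Sensor-id", how="inner")
</syntaxhighlight>
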
{| class="wikitable"
|-
! !! Problem #2
|-
| Issue || Some sensor-ids, such as 15, appear in both the static and the mobile sensor data. If the two data sources were combined for analysis in Tableau, this could be misleading, as the user would not be able to tell whether a given sensor is static or mobile.
|-
| Solution || Add a calculated field "Sensor Type" in Tableau Prep Builder to identify whether each sensor is a static or a mobile sensor.
|}
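
Continuing the pandas sketch, a comparable flag can be added as a constant column on each source before they are combined; the column name mirrors the "Sensor Type" calculated field above, and the MobileSensorReadings file name is again an assumption.

<syntaxhighlight lang="python">
# Tag each source before combining so overlapping sensor-ids stay distinguishable.
mobile = pd.read_csv("MobileSensorReadings.csv")  # Timestamp, Sensor-id, Long, Lat, User-id, Value
static["Sensor Type"] = "Static"
mobile["Sensor Type"] = "Mobile"
</syntaxhighlight>
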
{| class="wikitable"
|-
! !! Problem #3
|-
| Issue || The static and mobile sensor readings are contained in two different files, StaticSensorReadings and MobileSensorReadings. To build a single Tableau dashboard covering both, the two files need to be combined.
|-
| Solution || After cleaning the static and mobile sensor data, combine the two sources with the 'Union' function in Tableau Prep Builder.
|}
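
The union step can be sketched in pandas as a row-wise concatenation of the two cleaned tables, assuming both frames share the Timestamp, Sensor-id, Lat, Long, Value and Sensor Type columns.

<syntaxhighlight lang="python">
# Equivalent of the Tableau Prep 'Union' step: stack the cleaned static and mobile rows.
combined = pd.concat([static, mobile], ignore_index=True, sort=False)
</syntaxhighlight>
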
{| class="wikitable"
|-
! !! Problem #4
|-
| Issue || Among the mobile sensors, the one with sensor-id 12 has an outlier reading of 57345 on Day 9 at 2am. We can hypothesise that this value was due to a malfunction in the device, as the readings collected by sensors in the vicinity of sensor-id 12 at that time, such as sensor-ids 7 and 8, average about 10.5.
|-
| Solution || I chose to remove the outlier from the dataset because it skewed the axis so heavily that no insights could be gained from the other data points. This can be done easily in Tableau Prep Builder using the 'Filter' function.
|}
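
A scripted version of the same filter might look like the sketch below; the Sensor-id and Value column names are assumptions, and the outlier reading of 57345 is the one described above.

<syntaxhighlight lang="python">
# Equivalent of the Tableau Prep 'Filter' step: drop the single implausible reading
# recorded by mobile sensor-id 12.
outlier = (
    (combined["Sensor Type"] == "Mobile")
    & (combined["Sensor-id"] == 12)
    & (combined["Value"] == 57345)
)
combined = combined[~outlier]
</syntaxhighlight>
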
{| class="wikitable"
|-
! !! Problem #5
|-
| Issue || Each sensor takes a reading every five seconds, so the granularity of the data is very fine. This might hinder us from creating visualisations from which actionable insights can be derived effectively.
|-
| Solution || I chose to group the data into longer, 10-minute intervals. This can be done in Tableau Prep Builder using a calculated field that truncates each timestamp to its 10-minute interval, so that the readings can later be aggregated into a much smaller table for use in Tableau. This allows the visualisations to be generated more quickly and actionable insights to be derived more effectively.
|}
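
In the pandas sketch, a comparable calculated field can be derived by truncating each timestamp down to its 10-minute interval, assuming the Timestamp column parses as a datetime.

<syntaxhighlight lang="python">
# Derive a 10-minute interval field from each reading's timestamp.
combined["Timestamp"] = pd.to_datetime(combined["Timestamp"])
combined["Interval"] = combined["Timestamp"].dt.floor("10min")
</syntaxhighlight>
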
{| class="wikitable"
|-
! !! Problem #6
|-
| Issue || Even after the above issue was solved, almost a million rows remained in the combined sensor data file, because grouping the readings into 10-minute intervals does not by itself reduce the number of records.
|-
| Solution || Hence, I chose to aggregate the data by the combination of 'Sensor Type', 'sensor-id', 'Timestamp' (truncated to the 10-minute interval), 'Lat' and 'Long', taking the average of the readings within each 10-minute interval so as to provide a more accurate representation of the sensor readings in that interval.
|}
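
The aggregation can be sketched with a pandas group-by. Here the Interval field from the previous sketch stands in for the truncated 'Timestamp' named above, and the remaining key and value column names are assumptions.

<syntaxhighlight lang="python">
# Collapse the 5-second readings to one averaged row per sensor per 10-minute interval.
aggregated = (
    combined
    .groupby(["Sensor Type", "Sensor-id", "Interval", "Lat", "Long"], as_index=False)
    .agg({"Value": "mean"})
)
</syntaxhighlight>
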
{| class="wikitable"
|-
! !! Problem #7
|-
| Issue || From the data description, the user-ids of the mobile sensors are not unique, as some users chose not to change their user-id from the default, 'MySensor'. This means that user-id cannot serve as a unique identifier of the mobile sensors. Furthermore, the static sensors do not have any user-id tagged to them, so the attribute is redundant.
|-
| Solution || Drop the user-id column during data cleaning so as to remove unnecessary attributes and reduce noise.
|}
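
Dropping the column is a one-liner in the pandas sketch; the User-id column name is an assumption about the MobileSensorReadings file, and the drop would happen before the union step.

<syntaxhighlight lang="python">
# Remove the non-unique, mobile-only identifier before combining the sources.
mobile = mobile.drop(columns=["User-id"])
</syntaxhighlight>
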
{| class="wikitable"
|-
! !! Problem #8
|-
| Issue || The map provided, 'StHimarkNeighborhoodMap.png', is too rich in colour to be used as a background map image.
|-
| Solution || I used PowerPoint to recolour the image to greyscale and applied a washout effect before setting it as the background map image. This gives a background that does not draw attention away from the actual data points.
|}
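
As an alternative to the PowerPoint recolouring, a similar greyscale-and-washout effect could be scripted with Pillow; the output file name below is hypothetical.

<syntaxhighlight lang="python">
from PIL import Image

# Greyscale the neighbourhood map, then blend it towards white for a washout effect
# so that overlaid data points remain the visual focus.
grey = Image.open("StHimarkNeighborhoodMap.png").convert("L").convert("RGB")
white = Image.new("RGB", grey.size, (255, 255, 255))
washed = Image.blend(grey, white, alpha=0.6)
washed.save("StHimarkNeighborhoodMap_washout.png")
</syntaxhighlight>
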
