Difference between revisions of "IS428 AY2019-20T1 Assign Lee Cheng Leng EDA"

From Visual Analytics for Business Intelligence
Line 67: Line 67:
 
! !! Problem #4
 
! !! Problem #4
 
|-
 
|-
| Issue || The map provided, 'StHimarkNeighborhoodMap.png', is too rich in colour to be used as a background map image.  
+
| Issue || From the data description, we can tell that the user-id values of the mobile sensors are not unique, as some users choose not to change theirs from the default, ‘MySensor’. The user-id therefore cannot serve as a unique identifier for the mobile sensors. Furthermore, static sensors do not have any user-id tagged to them, so the column is redundant. In addition, all readings are recorded in counts per minute (cpm), so the units column holds identical values throughout and is not needed.
 
|-
 
|-
| Solution || I used PowerPoint to recolour the image to greyscale and then applied the washout effect before setting it as the background map image. This gives a background that does not draw attention away from the actual data points.
+
| Solution || Drop the user-id and units columns during data cleaning to remove unnecessary attributes and reduce runtime. This can be done in Tableau Prep Builder.
 
|}
 
|}
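The column drop itself was done in Tableau Prep Builder. As a rough sketch of the same cleaning step outside Tableau, the Python below removes the two redundant columns from an in-memory sample. The sample rows, the exact header spellings and the helper name `drop_columns` are assumptions for illustration and are not taken from the assignment files.

```python
import csv
import io

# Hypothetical sample rows; the header names are assumed, not copied from the real file.
SAMPLE = """sensor-id,Timestamp,Long,Lat,Value,Units,User-id
1,2020-04-06 00:00:00,-119.9,0.15,25.3,cpm,MySensor
2,2020-04-06 00:00:05,-119.8,0.16,31.0,cpm,MySensor
"""


def drop_columns(rows, to_drop):
    """Return new row dicts without the named columns (case-insensitive match)."""
    drop = {name.lower() for name in to_drop}
    return [{k: v for k, v in row.items() if k.lower() not in drop} for row in rows]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = drop_columns(rows, ["User-id", "Units"])
print(list(cleaned[0].keys()))
```

Matching the column names case-insensitively is a design choice here, so that a header spelled `Units` or `units` is dropped either way.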
  
<!-- {| class="wikitable"
+
Final Data Preparation Workflow from Tableau Prep Builder:
 +
 
 +
 
 +
{| class="wikitable"
 
|-
 
|-
 
! !! Problem #5
 
! !! Problem #5
 
|-
 
|-
| Issue || For each sensor, readings are taken every five seconds. The data is therefore too fine-grained, which might hinder us from creating visualisations from which actionable insights can be derived effectively.
+
| Issue || The mobile sensor readings file contains only latitude and longitude data for the sensor positions and no neighbourhood data. This poses a problem if we want to analyse the sensor readings by the neighbourhood each sensor is located in and compare across neighbourhoods.
 
|-
 
|-
| Solution || I chose to group the data into longer 10-minute intervals. This can be done in Tableau Prep Builder using a calculated field, so that the data used in Tableau is much smaller than the original data provided. This allows the visualisations to be generated more quickly and actionable insights to be derived more effectively.
+
| Solution || We can map each point to a neighbourhood by performing a spatial join between MobileSensorReadings.csv and the StHimark.shp shapefile, namely an outer join on the Intersects condition. This creates an additional ‘Neighbourhood’ column that identifies the neighbourhood each sensor is located in, enabling more detailed analysis.
 
|}
 
|}
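The spatial join was done in Tableau against the shapefile. The sketch below only illustrates the underlying idea: test each reading's position against each neighbourhood polygon and keep unmatched readings with an empty neighbourhood, as an outer join would. The rectangular polygons, the sample coordinates and the function names are hypothetical, and a real shapefile would need a proper reader and more careful handling of points that fall exactly on a boundary.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tag_neighbourhood(readings, neighbourhoods):
    """Add a 'Neighbourhood' key to each reading; None when no polygon contains the point."""
    tagged = []
    for reading in readings:
        match = None
        for name, polygon in neighbourhoods.items():
            if point_in_polygon(reading["Long"], reading["Lat"], polygon):
                match = name
                break
        tagged.append({**reading, "Neighbourhood": match})
    return tagged


# Hypothetical neighbourhood outlines as (Long, Lat) vertex lists.
neighbourhoods = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
readings = [
    {"sensor-id": 1, "Long": 1, "Lat": 1},
    {"sensor-id": 2, "Long": 3, "Lat": 1},
    {"sensor-id": 3, "Long": 5, "Lat": 5},
]
tagged = tag_neighbourhood(readings, neighbourhoods)
print([row["Neighbourhood"] for row in tagged])
```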
 +
  
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! !! Problem #6
+
! !! Problem #5
 
|-
 
|-
| Issue || Even after solving the above issue, almost a million rows remained in the combined sensor data file. This is because grouping the sensor readings into 10-minute intervals does not by itself change the number of records in the file.
+
| Issue || The map provided, 'StHimarkNeighborhoodMap.png', is too rich in colour to be used as a background map image.  
 
|-
 
|-
| Solution || Hence, I chose to aggregate the data by the combination of 'Sensor Type', 'sensor-id', 'Timestamp', 'Lat' and 'Long'. I aggregated the values by taking the average of the readings within each 10-minute interval, which gives a more representative summary of the sensor readings in that interval.
+
| Solution || I used PowerPoint to recolour the image to greyscale and then applied the washout effect before setting it as the background map image. This gives a background that does not draw attention away from the actual data points.
 
|}
 
|}
 
-->
 
-->
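The 10-minute grouping and averaging described in the earlier revision's rows were done in Tableau Prep Builder. A minimal sketch of the same two steps in Python follows: floor each timestamp to its 10-minute bucket, then average the readings that share a sensor type, sensor id, bucket and position. The field names follow the ones quoted in the text, while the timestamp format, the sample values and the function names are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def floor_to_10_min(ts):
    """Round a datetime down to the start of its 10-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)


def aggregate(readings):
    """Average 'Value' per (Sensor Type, sensor-id, 10-minute bucket, Lat, Long)."""
    groups = defaultdict(list)
    for r in readings:
        # Assumed timestamp format; adjust to match the actual file.
        ts = datetime.strptime(r["Timestamp"], "%Y-%m-%d %H:%M:%S")
        key = (r["Sensor Type"], r["sensor-id"], floor_to_10_min(ts), r["Lat"], r["Long"])
        groups[key].append(r["Value"])
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Hypothetical readings from one static sensor.
readings = [
    {"Sensor Type": "Static", "sensor-id": 1, "Timestamp": "2020-04-06 00:00:05", "Lat": 0.15, "Long": -119.9, "Value": 10.0},
    {"Sensor Type": "Static", "sensor-id": 1, "Timestamp": "2020-04-06 00:09:55", "Lat": 0.15, "Long": -119.9, "Value": 20.0},
    {"Sensor Type": "Static", "sensor-id": 1, "Timestamp": "2020-04-06 00:10:00", "Lat": 0.15, "Long": -119.9, "Value": 30.0},
]
averages = aggregate(readings)
for key, mean in sorted(averages.items(), key=lambda item: item[0][2]):
    print(key[2], mean)
```

Because 'Lat' and 'Long' are part of the grouping key, a mobile sensor that moves within an interval still produces one row per distinct position, so the row reduction is largest for static sensors.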

Revision as of 05:02, 11 October 2019

VAST 2019 MC2: Citizen Science to the Rescue

Overview

Data Exploration and Transformation

Interactive Visualisation

Task Findings

References

Data Cleaning Process