Difference between revisions of "IS428 AY2019-20T1 Assign Sean Chai Shong Hee Data Preprosessing"

From Visual Analytics for Business Intelligence

Revision as of 23:11, 11 October 2019

Data Preprocessing

The data was provided as three CSV files:

  • MobileSensorReadings.csv: Contains radiation readings in counts per minute, along with the longitude and latitude where each reading was collected and the time it was recorded.
  • StaticSensorReadings.csv: Contains radiation readings in counts per minute and the time each reading was recorded.
  • StaticSensorLocations.csv: Contains the longitude and latitude of each static sensor.

A polygon Shapefile giving the spatial representation of St. Himark was also provided.

Before any meaningful observations can be made in Tableau, the CSV files have to be preprocessed so that Tableau can carry out a spatial join between the point data and the polygon data. To do this, the longitude and latitude values need to be converted to Point geometry values using GeoPandas. The GeoPandas DataFrame is then saved as a Shapefile.

1. Importing Python libraries

Importing relevant libraries.png

We start by importing the libraries needed to create the new Shapefile: Pandas, Shapely and GeoPandas.
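The screenshot is not reproduced in text, so the following is only a sketch of what the imports might look like. The aliases `pd` and `gpd` are the conventional ones and are an assumption here, as is the choice to import only `Point` from Shapely.

```python
import pandas as pd                  # reading and merging the CSV files
import geopandas as gpd              # GeoDataFrame and Shapefile export
from shapely.geometry import Point   # point geometry built from longitude/latitude
```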

2. Reading csv files to Pandas Dataframe

Reading StaticSensorReadings.csv and StaticSensorLocations.csv.png

Next, we read the CSV files into Pandas DataFrames.
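A minimal sketch of this step is shown below. So that it runs without the original files, it reads a few made-up rows from in-memory text; the column names ("Timestamp", "Sensor-id", "Value", "Lat", "Long") and all values are assumptions for illustration. The DataFrame names static_reading and static_loc follow the names used in the next step.

```python
import io
import pandas as pd

# Hypothetical rows standing in for the real files; column names are assumed.
readings_csv = io.StringIO(
    "Timestamp,Sensor-id,Value\n"
    "2020-04-06 00:00:05,1,15.0\n"
    "2020-04-06 00:00:10,1,14.0\n"
    "2020-04-06 00:00:05,4,13.0\n"
)
locations_csv = io.StringIO(
    "Sensor-id,Lat,Long\n"
    "1,0.15,-119.95\n"
    "4,0.16,-119.90\n"
)

# With the real data, pass the file paths instead,
# e.g. pd.read_csv("StaticSensorReadings.csv").
static_reading = pd.read_csv(readings_csv)
static_loc = pd.read_csv(locations_csv)

print(static_reading.head())
print(static_loc.head())
```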

3. Merging StaticSensorReadings and StaticSensorLocations

Merging dataframes.png

We then merge the static_reading DataFrame with the static_loc DataFrame. We use a full outer join on "Sensor-id" so that every row from both files is kept, even where the other file has no matching row and the corresponding fields are null.
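The merge can be sketched as follows. The sample rows are made up, and the column names other than "Sensor-id" are assumptions. Sensor 4 deliberately has no location and sensor 9 has no readings, to show what the outer join keeps.

```python
import pandas as pd

# Hypothetical sample data; only "Sensor-id" is named in the text above.
static_reading = pd.DataFrame({
    "Timestamp": ["2020-04-06 00:00:05", "2020-04-06 00:00:10", "2020-04-06 00:00:05"],
    "Sensor-id": [1, 1, 4],
    "Value": [15.0, 14.0, 13.0],
})
static_loc = pd.DataFrame({
    "Sensor-id": [1, 9],
    "Lat": [0.15, 0.20],
    "Long": [-119.95, -119.90],
})

# A full outer join keeps rows from both sides; missing fields become NaN.
static = pd.merge(static_reading, static_loc, on="Sensor-id", how="outer")

# Rows that found no partner on the other side.
unmatched = static[static["Lat"].isna() | static["Value"].isna()]
print(static)
print(unmatched)
```

Rows with a null latitude or longitude cannot be turned into points later, so it is worth inspecting `unmatched` before moving on.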

4. Creating our GeoPandas Dataframe

GeoPandas Dataframe.png

We first convert the Longitude and Latitude values into Point geometries. These points are stored in a new column called "geometry", which is then used to create the new GeoDataFrame.

Sample of dataframe.png

We can see that we now have a new column called "geometry", which contains a Point geometry for each row.
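One way to build the "geometry" column is sketched below. The sample rows and the "Lat" and "Long" column names are assumptions, and so is the coordinate reference system: EPSG:4326 (WGS 84) is a common choice for plain longitude/latitude data, but the source does not state which one was used.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical merged data; column names are assumed.
static = pd.DataFrame({
    "Sensor-id": [1, 4],
    "Value": [15.0, 13.0],
    "Lat": [0.15, 0.16],
    "Long": [-119.95, -119.90],
})

# Shapely points take (x, y), i.e. (longitude, latitude), in that order.
static["geometry"] = [
    Point(lon, lat) for lon, lat in zip(static["Long"], static["Lat"])
]

# EPSG:4326 is an assumption, not something stated in the write-up.
static_gdf = gpd.GeoDataFrame(static, geometry="geometry", crs="EPSG:4326")
print(static_gdf.head())
```

Swapping the argument order to `Point(lat, lon)` is an easy mistake to make and would place every sensor in the wrong location.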

5. Exporting our Dataframe to a Shapefile

Exporting Shapefile.png

Finally, we export the GeoDataFrame to a Shapefile. This Shapefile can later be used for a spatial join with the provided "StHimark.shp".

6. Creating Shapefile for MobileSensorReadings

Creating MobileSensorReadings Shapefile.png

Following the same steps, we process "MobileSensorReadings.csv" and generate the mobile sensors Shapefile.
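Because the mobile readings already carry their own coordinates, no merge is needed. The sketch below uses made-up rows, assumed column names, an assumed CRS and a hypothetical output file name. It also swaps in `gpd.points_from_xy`, a vectorised alternative to building `Point` objects one by one, which is not necessarily what the original notebook used.

```python
import io
import os
import tempfile
import pandas as pd
import geopandas as gpd

# Hypothetical rows standing in for MobileSensorReadings.csv; column names are assumed.
mobile_csv = io.StringIO(
    "Timestamp,Sensor-id,Long,Lat,Value\n"
    "2020-04-06 00:00:05,1,-119.95,0.15,20.0\n"
    "2020-04-06 00:00:10,2,-119.90,0.16,22.0\n"
)
mobile = pd.read_csv(mobile_csv)

# points_from_xy takes x (longitude) first, then y (latitude).
mobile_gdf = gpd.GeoDataFrame(
    mobile,
    geometry=gpd.points_from_xy(mobile["Long"], mobile["Lat"]),
    crs="EPSG:4326",  # assumed CRS
)

# Hypothetical output name, written to a temporary directory for the example.
out_path = os.path.join(tempfile.mkdtemp(), "MobileSensors.shp")
mobile_gdf.to_file(out_path, driver="ESRI Shapefile")
print(len(mobile_gdf))
```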