Difference between revisions of "ISSS608 2018-19 T1 Assign Yan Huilin"
Latest revision as of 13:49, 19 November 2018
Task 1: Spatio-temporal Analysis of Official Air Quality
Methodology:
Heatmap calendar. I chose a heatmap calendar because the data set consists of single-day records, so I needed a view that shows each individual day and also the overall trend. A heatmap calendar does both: color encodes the value of each day, and the calendar layout presents the entire data set.
Visualization Design:
Use Year as a filter and color to encode the concentration level. Place three months in a row for a better view.
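The layout described above can be sketched in a few lines. The chart itself was built in Tableau, so this Python snippet is only an illustration of the grid logic, not the Tableau implementation; the function names `month_panel` and `day_cell` are hypothetical, and weeks are assumed to start on Monday.

```python
import calendar
from datetime import date


def month_panel(month):
    """Return (row, col) of a month's panel when three months share a row."""
    return ((month - 1) // 3, (month - 1) % 3)


def day_cell(d):
    """Return (week_row, weekday_col) of a date inside its month panel.

    Monday is column 0 (assumption); week_row 0 is the first calendar week
    of the month.
    """
    # monthrange returns (weekday of the 1st, number of days in the month)
    first_weekday, _ = calendar.monthrange(d.year, d.month)
    week_row = (d.day - 1 + first_weekday) // 7
    return (week_row, d.weekday())


if __name__ == "__main__":
    d = date(2018, 1, 31)
    print(month_panel(d.month), day_cell(d))
```

Each day then becomes one colored cell at `day_cell(d)` inside the panel at `month_panel(d.month)`, with the color taken from that day's concentration.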
Reference: https://www.tableau.com/about/blog/2017/2/viz-variety-show-heatmaps-66330
Insights:
Characterize the past and most recent situation with respect to air quality measures in Sofia City:
2013: 2 heavily polluted days, about 10 moderately polluted days; overall pollution level: medium.
2014: 4 heavily polluted days, about 35 moderately polluted days; overall pollution level: medium.
2015: 6 heavily polluted days, about 45 moderately polluted days; overall pollution level: high.
2016: 1 heavily polluted day, about 1 moderately polluted day; overall pollution level: very low.
2018: 5 heavily polluted days, about 6 moderately polluted days; overall pollution level: low.
From 2013 to 2015, the number of polluted days increased, so the pollution level became higher. In 2016, however, the concentration plummeted, and there were very few polluted days across the year. In 2018, the number of polluted days rose slightly at the beginning of the year, but pollution stayed modest through the middle of September.
What does a typical day look like for Sofia city: 
Concentration: 1500 – 7500
Trends of possible interests:
In 2013, the polluted days fell in early January and late December. In 2014 and 2015, the polluted days fell in January, February, November and December. In 2016, the only heavily polluted day was in January. In 2018, the polluted days were in January. Sofia City is therefore most likely to be polluted at the beginning and the end of the year.
Anomalies found:
1) Different averaging periods: day or hour. From 2013 to 2015, the [ average time ] is day, while from 2016 onwards, the [ average time ] is hour.
2) 2017 data is largely unavailable: the data set only contains records from 2017.11.28 to 2017.12.31.
Potential problems of the anomalies:
1) When comparing years, the different time granularity limits which Tableau functions can be applied.
2) Because almost a full year of data is missing, it is not possible to get a complete view of the pollution trends, and the understanding of the pollution pattern may also be flawed.
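One possible way to reduce the first problem is to aggregate the hourly readings to daily means before comparing years, so that both periods share the same time grain. The sketch below is an assumption, not a step taken in this assignment; the function name `daily_means` and the timestamp format `YYYY-MM-DD HH:MM` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def daily_means(hourly_records):
    """Average hourly readings per calendar day.

    hourly_records: iterable of (timestamp, value) pairs, where timestamp is
    a string in the assumed format 'YYYY-MM-DD HH:MM'.
    Returns a dict mapping datetime.date -> mean value for that day.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in hourly_records:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").date()
        sums[day] += value
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sorted(sums)}


if __name__ == "__main__":
    # Illustrative values only, not taken from the Sofia data set.
    sample = [
        ("2018-01-01 00:00", 10.0),
        ("2018-01-01 01:00", 30.0),
        ("2018-01-02 00:00", 5.0),
    ]
    for day, mean in daily_means(sample).items():
        print(day, mean)
```

Days with only a few hourly readings still produce a mean here, so a stricter version might require a minimum number of readings per day.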
Task 2: Spatio-temporal Analysis of Citizen Science Air Quality Measurements
Methodology:
Use the geohash package to decode the geohash data, and use R to join the two data tables into one, since the file is too large to be combined by other methods. Use a map to visualize the data and animation to show the trends.
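The decoding step can be illustrated with a minimal sketch. The analysis itself used an R geohash package; the Python code below is only a self-contained illustration of standard geohash decoding, and the names `decode_geohash` and `attach_coordinates` as well as the row key `geohash` are hypothetical.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def decode_geohash(geohash):
    """Decode a geohash string to the (lat, lon) center of its cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash.lower():
        value = _BASE32.index(ch)  # raises ValueError on invalid characters
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return ((lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2)


def attach_coordinates(rows):
    """Return copies of the row dicts with 'lat' and 'lon' added."""
    enriched = []
    for row in rows:
        lat, lon = decode_geohash(row["geohash"])
        enriched.append({**row, "lat": lat, "lon": lon})
    return enriched


if __name__ == "__main__":
    print(decode_geohash("ezs42"))
```

Once every reading carries a latitude and longitude, it can be plotted directly on a map layer.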
Visualization design:
Use Year as a filter and Day as pages. This makes it easy to see the changes in the measurements and in the stations over time.
Insights:
Are they well distributed over the entire city:
Total number of sensors: 2017: 383; 2018: 1253
Number of sensors inside the city: 2017: 240; 2018: 713
The sensors are highly concentrated around the center of the city. They are not uniformly distributed across the city.
This may still make sense, however: the center of the city has a larger population and is more modernized, so this area needs to be monitored.
Are they all working properly at all times:
Number of sensors that shut down, by measurement:
| Year/Measurement | Humidity | Temperature | Pressure | 
|---|---|---|---|
| 2017 | 7 | 7 | 23 | 
| 2018 | 29 | 29 | 132 | 
Unexpected behaviors of the sensors through analyzing the readings:
1. On April 31 and July 5, 2018, the majority of sensors shut down, causing a sharp drop in the data.
2. There is a significant positive correlation between humidity and pressure. When the humidity goes up, the pressure tends to rise. However, no significant correlation was found with temperature.
3. Over time, the number of stations increased, but they remained concentrated in the center of the city. From the shape of the station cluster we can also see that the urban area develops along a north-west to south-east direction.
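The correlation statement in point 2 can be checked with a Pearson coefficient. How the original correlation was assessed is not stated, so the following is only a minimal sketch; the function name `pearson` is hypothetical and the sample values are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented readings, not taken from the sensor data.
    humidity = [40.0, 55.0, 60.0, 72.0, 80.0]
    pressure = [940.0, 945.0, 944.0, 950.0, 953.0]
    print(round(pearson(humidity, pressure), 3))
```

A value close to 1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 no linear relationship.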
Which part of the city shows relatively higher readings than others? Are these differences time dependent?
As mentioned above, the center of the city has higher readings than the other parts of the city. As the number of stations grows, they become even more concentrated in the center of the city.
Task 3
Methodology:
Use simple techniques, such as plotting the measurements line by line for comparison.
Visualization design:
Insights:
1. As learnt from the wiki: the dew point is related to humidity. A higher dew point means there is more moisture in the air.
The dew point temperature has a positive relationship with temperature:
2. The relative humidity has a positive relationship with wind speed:
3. The pollution concentration has a negative relationship with both dew point temperature and temperature:
4. There is no significant correlation between topography and meteorology. The city has a relatively high elevation in the south-west:
Conclusion
Assignment comments:
At the beginning, the city did not have many stations, and they were all located in the central area. As the network developed, the number of stations rose, but they remained concentrated in the center.
The beginning and the end of the year are when the city is most likely to be polluted; we could assume that this is due to external pollution sources.
For the meteorological measurements, humidity and temperature have a negative relationship with pollution concentration. One possible explanation is that the colder the weather, the harder it is for the air to circulate.
To prevent or ease the pollution, the city could consider both external and internal solutions. 1. Negotiate with neighboring countries about the transboundary pollution issue. 2. Develop environmental-protection measures according to the weather.
Personal thoughts:
By working on this assignment, I gained a much better understanding of what I have learnt this term.
My work might be rough, but I have started to appreciate the beauty of data visualization and the Tableau platform.
It occurred to me that I had once read that the future lies in the combination of technology and design. From my limited knowledge, I think data visualization is exactly such a combination, bringing together data analytics and visual design.
This is where Tableau has done a good job: the interface and the visualization functions are not only useful but also beautiful. That is the same reason why I hate JMP and SAS EM.
Of course, I strongly agree that data mining is dirty but necessary work that requires brave people to do it. I respect those who can do it. I simply hope that in the future there will be a better-designed interface, and maybe workflow, for these data mining platforms.
I am currently applying for a data visualization internship and look forward to applying my knowledge in a real workplace.
Lastly, my sincere regards to the tutor, Kam. I only listened to 80% of what he said in class, my homework only met 50% of his standards (maybe even lower), and after the term ends I will probably only remember 10% of what he taught me. Still, he shared his experience and knowledge with me, and to me what matters most when learning is not the substance but the concepts. He passed on the principles and beliefs that he holds, which I found inspiring and intriguing, and for that I am very grateful.
Dashboard
Task 1: https://public.tableau.com/profile/huilin.yan#!/vizhome/task1_65/Story1
Task 2: https://public.tableau.com/profile/huilin.yan#!/vizhome/task2_27/Story1
Task 3: https://public.tableau.com/profile/huilin.yan#!/vizhome/task3_2_0/Story1?publish=yes; https://public.tableau.com/profile/huilin.yan#!/vizhome/task3_1_1/Sheet1?publish=yes











