Group16 Report

From Visual Analytics and Applications
Revision as of 20:45, 9 August 2018 by Allorens.2016

Urban Pulse: A Case Study on Beijing's Traffic


Abstract

The increasing adoption of GPS and other location-tracking technologies is leading to the collection of large spatio-temporal datasets. This creates an opportunity to discover valuable knowledge about taxi fleet movement behaviour, which fosters innovative applications and services that can be monetized. The main objective of this project is to find traffic patterns in Beijing. To this end, we applied geospatial and statistical analysis to a Beijing taxi fleet dataset from 2008. Our methodology comprises two parts. First, we examine traffic conditions in different parts of Beijing over one day to find interesting patterns. Second, we explore the patterns and behaviour of Beijing taxi drivers. The analysis was carried out in R, and we also built a Shiny app so that users can explore the data visually from various angles.

Introduction

Beijing, formerly romanized as Peking, is the capital of the People's Republic of China, the world's third most populous city proper, and the most populous capital city. The city, located in northern China, is governed as a direct-controlled municipality under the national government, with 16 urban, suburban, and rural districts. According to a China Daily report from early 2018, Beijing is the second most traffic-congested city in mainland China, right behind Jinan, Shandong. Statistics from 2015 suggest that Beijing's rush-hour delay index reached 2.046, with rush-hour speed at 21.91 km/h. Given the severity of the situation, the Chinese government has been working hard to monitor traffic in Beijing and come up with implementable suggestions.

GPS-equipped taxis can be regarded as mobile sensors probing traffic flows on road surfaces, and taxi drivers are usually experienced in finding the fastest route to a destination based on their knowledge.

In this paper, we mine driving directions from the historical GPS trajectories of a large number of taxis in Beijing, China, to derive insights into traffic conditions in one of the busiest capitals of the world.

Our work is motivated by a large dataset of 10,357 text files containing the GPS position of each taxi, sampled every 10 minutes over a whole week. The final dataset comprises 16 million rows and 4 columns.

Dataset

The dataset comprises tracking samples of over 10 thousand taxis in Beijing by date. Each taxi is identified by a taxi ID. Longitude and latitude are provided, with a sample every 10 minutes.

The dataset contains the GPS trajectories of 10,357 taxis within Beijing during the period of Feb. 2 to Feb. 8, 2008. The total number of points in this dataset is about 15 million and the total distance of the trajectories reaches 9 million kilometers. The average sampling interval is about 177 seconds, with a distance of about 623 meters between samples. Each file is named by the taxi ID and contains the trajectories of one taxi.
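A minimal sketch of how the per-taxi text files could be combined into a single data frame is shown below. The file layout is an assumption, since the report does not state it: comma-separated, no header, columns in the order taxi ID, timestamp, longitude, latitude. The function name `read_taxi_dir` and the demo files are hypothetical, and base R is used here so that the sketch runs without extra packages.

```r
# Sketch: combine per-taxi text files into one data frame (base R only).
# Assumed layout (not stated in the report): comma-separated, no header,
# columns in the order taxi_id, timestamp, longitude, latitude.
read_taxi_dir <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  parts <- lapply(files, function(f) {
    # Skip empty files, which read.csv() would reject.
    if (file.size(f) == 0) return(NULL)
    read.csv(f, header = FALSE,
             col.names = c("taxi_id", "timestamp", "longitude", "latitude"),
             stringsAsFactors = FALSE)
  })
  df <- do.call(rbind, parts)
  # Parse with a fixed time zone label so results do not depend on the machine.
  df$timestamp <- as.POSIXct(df$timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  df
}

# Tiny demonstration with two made-up files in a temporary directory.
demo_dir <- file.path(tempdir(), "taxi_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1,2008-02-02 15:36:08,116.51172,39.92123",
             "1,2008-02-02 15:46:08,116.51135,39.93883"),
           file.path(demo_dir, "1.txt"))
writeLines("2,2008-02-02 13:33:52,116.36422,39.88781",
           file.path(demo_dir, "2.txt"))
taxis <- read_taxi_dir(demo_dir)
print(taxis)
```

With more than ten thousand files, building the list first and binding once at the end avoids repeatedly copying a growing data frame.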

The team derived three further variables:
- Time difference: calculated from the Time variable, this is the time elapsed between the previous sample and the current one. Most values are 600 seconds (10 minutes), but longer gaps help identify breaks between rides, for instance between the last ride of one day and the first ride of the next.
- Distance: the distance in meters from one point to the next.
- Speed: the average speed for each sample, calculated from the two variables above. This variable helped identify when a taxi was not moving. Combined with the time difference, it shows when a driver arrived home after the last trip of the day to sleep (time difference > 6 hours), which let us determine the location of each driver's home.

These new calculations provide a lot of information on driver behaviour and help us understand the patterns.
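The three derived variables can be sketched as follows. This is an illustration under stated assumptions, not the team's actual code: the report does not say which distance formula was used, so the haversine great-circle formula with an Earth radius of 6,371 km is assumed, and the column and function names (`time_diff_s`, `distance_m`, `speed_kmh`, `add_derived`) are hypothetical.

```r
# Haversine great-circle distance in metres between two lon/lat points.
# Assumption: the report does not name its distance formula.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Add time difference (s), distance (m) and speed (km/h) relative to the
# previous sample of the same taxi. The first sample of each taxi gets NA.
add_derived <- function(df) {
  df <- df[order(df$taxi_id, df$timestamp), ]
  n <- nrow(df)
  prev <- c(NA_integer_, seq_len(n - 1))
  same_taxi <- c(FALSE, df$taxi_id[-1] == df$taxi_id[-n])
  prev[!same_taxi] <- NA_integer_
  df$time_diff_s <- as.numeric(
    difftime(df$timestamp, df$timestamp[prev], units = "secs"))
  df$distance_m <- haversine_m(df$longitude[prev], df$latitude[prev],
                               df$longitude, df$latitude)
  # m/s to km/h; undefined when no time has elapsed.
  df$speed_kmh <- ifelse(df$time_diff_s > 0,
                         df$distance_m / df$time_diff_s * 3.6, NA)
  rownames(df) <- NULL
  df
}

# Made-up samples: taxi 1 moves 0.01 degrees north, then stands still.
samples <- data.frame(
  taxi_id = c(1, 1, 1, 2),
  timestamp = as.POSIXct("2008-02-02 08:00:00", tz = "UTC") + c(0, 600, 1200, 0),
  longitude = c(116.40, 116.40, 116.40, 116.30),
  latitude = c(39.90, 39.91, 39.91, 39.95)
)
derived <- add_derived(samples)
print(derived)
```

Sorting by taxi and timestamp before differencing matters: without it, the "previous" row could belong to another taxi or to a later time.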

Figure116.png

Visual Design Framework

[Figure: visual design framework]

Data Tables

[Figure: data tables]

Objectives

The team used R to develop insights from the dataset. Our aim is to design and develop a single dynamic view of traffic and its patterns in Beijing.

With one week of taxi GPS data, we can discover traffic patterns such as the peak traffic time in a certain area, the direction of the traffic stream, and the average duration of traffic congestion at a given site and time.

To sum up, we explore the following issues:
- Traffic pattern by hour around the CBD area
- Traffic pattern by district in Beijing
- Working patterns and behaviour of Beijing taxi drivers

The specific R packages used are:
- shiny
- ggplot2
- plotly
- leaflet
- dplyr
- lubridate

Insights and Visual Representations

Visualization of Traffic patterns by hour of the day

The following chart shows traffic patterns in Beijing over the 24 hours of the day. Traffic increases sharply from 10am onwards.


Figure216.png
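An hourly traffic profile like the one charted above can be approximated by counting GPS samples per hour of the day. The report does not say how traffic volume was measured, so the sample count is an assumption, and `hourly_counts` is a hypothetical helper name. A base R sketch:

```r
# Count GPS samples per hour of day (0 to 23).
# Assumption: the number of samples stands in for traffic volume.
hourly_counts <- function(timestamps) {
  hours <- as.integer(format(timestamps, "%H"))
  # factor() with fixed levels keeps hours without samples as zero counts.
  counts <- table(factor(hours, levels = 0:23))
  data.frame(hour = 0:23, samples = as.integer(counts))
}

# Made-up timestamps for illustration.
ts <- as.POSIXct(c("2008-02-03 08:05:00", "2008-02-03 10:00:00",
                   "2008-02-03 10:30:00", "2008-02-03 15:10:00"), tz = "UTC")
by_hour <- hourly_counts(ts)
print(by_hour[by_hour$samples > 0, ])
```

Keeping all 24 hours in the result, including empty ones, makes the output easier to plot as a continuous daily curve.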

Visualization of traffic patterns by district

Beijing comprises 16 urban and suburban districts. We analyzed the traffic pattern over 5 days for 9 urban districts: Changping, Chaoyang, Daxing, Dongcheng, Fengtai, Haidian, Miyun, Shunyi and Xicheng. The visualizations show that Shunyi and Changping are the least busy districts compared with the other 7. Overall, traffic decreases by 30% on Chinese New Year and holidays.



Figure316.png

Heatmap of home addresses and starting points for taxis per day

The plot below shows the starting points of the 10,357 taxis in Beijing as a heatmap developed in R. For this work, it was critical to identify each driver's last trip of the day, whose end point represents the home location.

The team assumed that the home point is determined by speed = 0 and a time difference to the next trip of more than 6 hours. Evidently, there was one such point for each taxi on each day. Drivers arrived home at different hours to rest before working the following day. Using the derived variables, we determined the 'home' point for each taxi driver and then plotted those points in an interactive heatmap.



Figure416.png
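The home-point rule (speed of zero, followed by a gap of more than 6 hours before the next sample) could be applied roughly as sketched below. This assumes a data frame that already holds a speed column; the names `speed_kmh` and `find_home_points` are hypothetical, and the heatmap rendering itself is not shown.

```r
# Flag samples where the taxi is stationary and the next sample of the
# same taxi comes more than min_gap_hours later.
find_home_points <- function(df, min_gap_hours = 6) {
  df <- df[order(df$taxi_id, df$timestamp), ]
  n <- nrow(df)
  nxt <- c(seq_len(n)[-1], NA_integer_)
  same_taxi <- c(df$taxi_id[-1] == df$taxi_id[-n], FALSE)
  nxt[!same_taxi] <- NA_integer_
  gap_h <- as.numeric(
    difftime(df$timestamp[nxt], df$timestamp, units = "hours"))
  is_home <- !is.na(gap_h) & gap_h > min_gap_hours &
    !is.na(df$speed_kmh) & df$speed_kmh == 0
  df[is_home, c("taxi_id", "timestamp", "longitude", "latitude")]
}

# Made-up evening for one taxi: a short stop at 21:50, then parked at 22:00
# until 06:30 the next morning.
trips <- data.frame(
  taxi_id = c(7, 7, 7, 7),
  timestamp = as.POSIXct(c("2008-02-02 20:00:00", "2008-02-02 21:50:00",
                           "2008-02-02 22:00:00", "2008-02-03 06:30:00"),
                         tz = "UTC"),
  longitude = c(116.40, 116.36, 116.35, 116.35),
  latitude = c(39.90, 39.94, 39.95, 39.95),
  speed_kmh = c(30, 0, 0, 12)
)
homes <- find_home_points(trips)
print(homes)
```

The short stop at 21:50 is not flagged because the next sample follows after 10 minutes, which is why the rule needs both conditions rather than speed alone.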


Insights on working patterns for taxi drivers in Beijing

In this section we dig deeper into taxi behaviour to get specific insights on the number of working hours, the number of taxis per hour, and how many days a week taxi drivers work in this Beijing sample.

The number of taxis per hour is represented below. We can observe that the number of hours diminishes on public holidays, as we assumed. The number of hours increases from 10am and peaks at 15:00, both inside and outside the inner circle.

Conversely, speed increases as traffic (and the number of hours) falls, as during Chinese New Year.

Total distance reaches a minimum during the night and increases during the day from 9am until 9pm.



Figure516.png
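Per-hour summaries of this kind (active taxis, mean speed, total distance) can be produced with a simple group-and-aggregate step. The sketch below uses base R and assumes columns named `speed_kmh` and `distance_m`; these names and the helper `summarise_by_hour` are hypothetical, and the split between inside and outside the inner circle is left out because the report does not define that boundary.

```r
# Summarise samples by hour of day: distinct taxis, mean speed, total distance.
summarise_by_hour <- function(df) {
  hour <- as.integer(format(df$timestamp, "%H"))
  groups <- split(df, hour)
  out <- do.call(rbind, lapply(names(groups), function(h) {
    g <- groups[[h]]
    data.frame(
      hour = as.integer(h),
      active_taxis = length(unique(g$taxi_id)),
      mean_speed_kmh = mean(g$speed_kmh, na.rm = TRUE),
      total_distance_km = sum(g$distance_m, na.rm = TRUE) / 1000
    )
  }))
  out[order(out$hour), ]
}

# Made-up samples: two taxis at 03:00, three taxis at 15:00.
day <- as.POSIXct("2008-02-04 00:00:00", tz = "UTC")
obs <- data.frame(
  taxi_id = c(1, 2, 1, 2, 3),
  timestamp = day + 3600 * c(3, 3, 15, 15, 15),
  speed_kmh = c(40, 50, 20, 10, 30),
  distance_m = c(6000, 8000, 3000, 2000, 4000)
)
per_hour <- summarise_by_hour(obs)
print(per_hour)
```

In this made-up data the busier hour has the lower mean speed, mirroring the inverse relationship between traffic and speed described above.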

We analyzed in R the number of working days for the 10,357 taxis. The distribution is shown in the chart below. Over 6,500 taxis, or almost 63% of the fleet, work 7 days a week, while 20% of the fleet in this data works 6 days a week. Only a very small percentage of taxi drivers work 4 or fewer days a week.



Figure616.png
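A distribution of working days per taxi can be computed by counting the distinct calendar days on which each taxi appears. The report does not define a working day, so this sketch assumes that a taxi worked on any day with at least one GPS sample. The helper name `working_days` is hypothetical.

```r
# Distribution of the number of distinct days each taxi appears in the data.
# Assumption: at least one sample on a calendar day counts as a working day.
working_days <- function(df) {
  pairs <- unique(data.frame(
    taxi_id = df$taxi_id,
    day = format(df$timestamp, "%Y-%m-%d")
  ))
  days_per_taxi <- table(pairs$taxi_id)
  dist <- table(days_worked = as.integer(days_per_taxi))
  data.frame(
    days_worked = as.integer(names(dist)),
    taxis = as.integer(dist),
    share_pct = round(100 * as.integer(dist) / sum(dist), 1)
  )
}

# Made-up fleet: taxis 1 and 2 appear on 3 days, taxi 3 on 1, taxi 4 on 2.
start <- as.POSIXct("2008-02-02 09:00:00", tz = "UTC")
fleet <- data.frame(
  taxi_id = c(1, 1, 1, 1, 2, 2, 2, 3, 4, 4),
  timestamp = start + 86400 * c(0, 0, 1, 2, 0, 1, 2, 0, 3, 4)
)
wd <- working_days(fleet)
print(wd)
```

Deduplicating taxi and day pairs first ensures that a taxi with many samples on one day still counts that day only once.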

Finally, we show the distribution of the number of working hours for the 10357 taxi drivers in our dataset.


Figure716.png

Conclusion

Beijing is one of the busiest cities in the world. From a simple dataset of 4 variables (taxi ID, latitude, longitude and timestamp), the team has derived insights from different angles on taxi driver behaviour, traffic patterns and other distributions.

This data analysis could be useful for Beijing’s municipalities. It could also serve as a research paper and tool for new start-ups who want to understand the taxi patterns and locations in Beijing.

Other uses of this research are:

- For government: Because we have limited information on Beijing's city construction and road planning, we can only give suggestions based on our own understanding. The suggestions may not be applicable in the real world, but they give a general direction on how to mitigate congestion by improving traffic arrangements.
- For taxi companies: We can suggest to taxi companies in Beijing how to allocate taxis more efficiently.
- For commuters: Commuters may gain a clearer view of traffic conditions at different times and sites, and so make better decisions when planning their transportation.


The data used for this analysis dates from 2008; however, it would be straightforward to apply the same analysis to a more recent dataset.

Overall, the team has demonstrated that by taking periodic position samples from any kind of vehicle, useful insights can be derived for decision making.


Acknowledgements

The authors wish to thank Ting Seong KAM, professor of Visual Analytics at the School of Information Systems, Singapore Management University, for his ongoing support.
