Group16 Report

From Visual Analytics and Applications
Revision as of 17:25, 12 August 2018

Urban Pulse: A Case Study on Beijing's Traffic


Motivation

The increasing adoption of GPS and other location-tracking technologies is leading to the collection of large spatio-temporal datasets. This brings the opportunity to discover valuable knowledge about taxi fleet movement behavior, which fosters new, innovative applications and services that can be monetized. The main objective of this project is to find traffic patterns in Beijing. To do so, we applied geospatial and statistical analysis to a 2008 Beijing taxi fleet dataset, and the user is able to explore the data from various angles. Sophisticated graphs made in R that the group had seen inspired us to pursue this work; in addition, two members of the team come from Beijing, and we were all very interested in deriving in-depth insights from a simple dataset of moving objects. Our methodology consists of two parts. First, we examine traffic conditions in different parts of Beijing over one day to find interesting patterns. Second, we explore the patterns and behaviour of Beijing taxi drivers. The analysis was done in R, and we also built a Shiny app for easy visualization.

Introduction

Beijing, formerly romanized as Peking, is the capital of the People's Republic of China, the world's third most populous city proper and its most populous capital city. Located in northern China, the city is governed as a direct-controlled municipality under the national government, with 16 urban, suburban, and rural districts. According to a China Daily report from early 2018, Beijing is the second most traffic-congested city in mainland China, after Jinan, Shandong. Statistics from 2015 suggest that Beijing's rush-hour delay index reached 2.046, with a rush-hour speed of 21.91 km/h. Given the severity of the situation, the Chinese government has been working hard to track traffic conditions in Beijing and to come up with implementable suggestions.

GPS-equipped taxis can be regarded as mobile sensors probing traffic flows on road surfaces, and taxi drivers are usually experienced in finding the fastest (quickest) route to a destination based on their knowledge.

In this paper, we mine the historical GPS trajectories of a large number of taxis in Beijing, China, to derive insights into traffic conditions in one of the busiest capitals in the world.

Design Framework

Our design framework consists of the following parts:

Data Preparation

Our data source is a large dataset downloaded from the Microsoft T-Drive Beijing trajectory data sample, which comprises 10357 text files. Each text file is named with a taxi ID and records the GPS coordinates of that taxi every 10 minutes during the week from 2008-02-02 to 2008-02-08. After aggregating all the text files, we obtained a table of 16 million rows and 4 columns (taxi ID, date-time, latitude and longitude).
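The aggregation step can be sketched as follows. This is a minimal Python sketch, not the team's R code, and it assumes each line of a T-Drive file has the form `taxi_id,YYYY-MM-DD HH:MM:SS,longitude,latitude` (longitude before latitude); please check that order against the actual files. The helper names `parse_tdrive_lines` and `load_all` are hypothetical, and the two sample records are made up for illustration.

```python
import csv
import io
from datetime import datetime
from pathlib import Path


def parse_tdrive_lines(lines):
    """Parse T-Drive style records into (taxi_id, timestamp, lat, lon) tuples.

    Assumed line format: taxi_id,YYYY-MM-DD HH:MM:SS,longitude,latitude
    """
    rows = []
    for rec in csv.reader(lines):
        if len(rec) != 4:
            continue  # skip blank or malformed lines
        taxi_id, ts, lon, lat = rec
        rows.append((int(taxi_id),
                     datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
                     float(lat), float(lon)))
    return rows


def load_all(folder):
    """Concatenate every *.txt file in `folder` (one file per taxi) into one list."""
    table = []
    for path in sorted(Path(folder).glob("*.txt")):
        with open(path, newline="") as fh:
            table.extend(parse_tdrive_lines(fh))
    return table


# Two made-up records in the assumed format.
sample = io.StringIO("1,2008-02-02 15:36:08,116.51172,39.92123\n"
                     "1,2008-02-02 15:46:08,116.51135,39.93883\n")
rows = parse_tdrive_lines(sample)
print(len(rows), rows[0])
```

Keeping the rows as plain tuples mirrors the 4-column table described above; at 16 million rows a columnar tool would be the practical choice, but the parsing logic is the same.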

One of the major challenges we faced is that the original dataset does not provide enough variables. To gain deeper insights into traffic patterns in Beijing, our team derived more information from the existing variables. Four important variables are derived from the time, latitude and longitude:

  • Time difference: calculated from the time variable, it gives the time elapsed between the previous sample and the current one. We converted every date-time to a numeric value and subtracted consecutive values to get the difference in seconds. Most time differences are 600 seconds (10 minutes), which is consistent with the description in Microsoft's documentation. The time difference also helps us identify longer breaks between rides; for instance, we can identify the last ride of the day and the first ride of the next day.
  • Distance: the distance in metres from one point to the next, calculated with the geosphere package. However, this package only calculates the straight-line distance between two latitude/longitude coordinates rather than the road distance, which introduces some inaccuracy. We did not call the Google API to find the real road distance because a) we have almost 16 million coordinates, and the calculation would take a very long machine time, and b) the data was recorded in 2008, and Beijing's streets today might be quite different from those of 10 years ago.
  • Speed: from the two previously derived variables we calculate the average speed for each sample, and we binned speed into 'low', 'medium', and 'high'. First, this variable provides information on traffic congestion in different streets. Second, it helps us understand when a taxi was not moving. Combined with the time difference, it shows when a driver arrived home after the last trip of the day (time difference > 6 hours) to sleep, and therefore lets us determine the location of each driver's home.
  • District: each coordinate is labelled with one of the 16 districts of Beijing. Since Beijing is very large, analyzing and comparing the traffic patterns of different areas helps us get deeper insights. There are two steps to get the district label: a) find a district-level shapefile of Beijing and convert the polygon coordinates into latitude and longitude with the help of the sp and proj4 packages; and b) label each point with a district using the sp package. Based on the location of the districts, we also binned the district variable into 'urban' and 'suburban'.
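The time difference, distance and speed derivations above can be sketched in Python as follows. The team used the geosphere package in R; here the haversine great-circle formula stands in for that call (we do not know which geosphere function was used), with an assumed mean Earth radius of 6,371 km. The report does not give the cut-offs for 'low', 'medium' and 'high', so the 20 and 40 km/h thresholds below are hypothetical, as are all function names.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Straight-line (great-circle) distance in metres; ignores the road network."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_bin(kmh, low=20.0, high=40.0):
    """Hypothetical thresholds: the report does not state its bin edges."""
    if kmh < low:
        return "low"
    return "medium" if kmh < high else "high"


def derive(rows):
    """rows: (taxi_id, timestamp, lat, lon) tuples in any order.

    Returns one dict per sample with time_diff_s, distance_m, speed_kmh and
    speed_bin measured from the previous sample of the same taxi
    (None for each taxi's first sample).
    """
    out = []
    prev = None
    for taxi_id, ts, lat, lon in sorted(rows, key=lambda r: (r[0], r[1])):
        rec = {"taxi_id": taxi_id, "ts": ts, "lat": lat, "lon": lon,
               "time_diff_s": None, "distance_m": None,
               "speed_kmh": None, "speed_bin": None}
        if prev is not None and prev[0] == taxi_id:
            dt = (ts - prev[1]).total_seconds()
            dist = haversine_m(prev[2], prev[3], lat, lon)
            rec["time_diff_s"] = dt
            rec["distance_m"] = dist
            if dt > 0:  # avoid dividing by zero on duplicate timestamps
                kmh = dist / dt * 3.6  # m/s -> km/h
                rec["speed_kmh"] = kmh
                rec["speed_bin"] = speed_bin(kmh)
        out.append(rec)
        prev = (taxi_id, ts, lat, lon)
    return out


rows = [
    (1, datetime(2008, 2, 2, 10, 0), 39.90, 116.40),
    (1, datetime(2008, 2, 2, 10, 10), 39.91, 116.40),
]
for rec in derive(rows):
    print(rec["time_diff_s"], rec["distance_m"], rec["speed_kmh"], rec["speed_bin"])
```

Sorting by taxi and time first guarantees that each difference is taken between consecutive samples of the same taxi, which is what the time-difference definition above requires.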

Below is a sample of our dataset after data wrangling and cleaning.

Group16 dataset.png

Design Framework and Visualization Methodology

Our project consists of three main parts; the design framework is shown in the figure.

  • Examine the traffic conditions by exploration
  • Find traffic patterns by statistical analysis
  • Analyze the working pattern of Beijing taxi drivers

Discussion

Visualization of Traffic patterns by hour of the day

The following chart shows traffic patterns in Beijing over the 24 hours of the day. Traffic increases sharply from 10 am onwards.
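An hour-of-day profile like this one can be produced by bucketing samples by hour. This minimal sketch assumes the raw number of GPS samples per hour is the traffic measure; the report does not say exactly which metric the chart uses, and `records_per_hour` is a hypothetical helper. It pools all days together; keying on `(ts.date(), ts.hour)` instead would give one profile per day.

```python
from collections import Counter
from datetime import datetime


def records_per_hour(rows):
    """Count GPS samples falling in each hour of the day (0-23), pooled over all days.

    rows: (taxi_id, timestamp, lat, lon) tuples.
    """
    counts = Counter(ts.hour for _, ts, _, _ in rows)
    return [counts.get(h, 0) for h in range(24)]


rows = [
    (1, datetime(2008, 2, 2, 9, 55), 39.90, 116.40),
    (1, datetime(2008, 2, 2, 10, 5), 39.91, 116.40),
    (2, datetime(2008, 2, 3, 10, 30), 39.95, 116.35),
]
hourly = records_per_hour(rows)
print(hourly[9], hourly[10])  # prints: 1 2
```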


Figure216.png

Visualization of traffic patterns by district

Beijing comprises 16 urban and suburban districts. We analyzed the traffic pattern over 5 days for 9 districts: Changping, Chaoyang, Daxing, Dongcheng, Fengtai, Haidian, Miyun, Shunyi and Xicheng. The visualizations show that Shunyi and Changping are less busy than the other 7 districts. Overall, traffic decreases by 30% on Chinese New Year and public holidays.
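A per-district, per-day comparison needs each point labelled with a district first. The team did this in R with the sp package; the sketch below swaps in a plain even-odd ray-casting point-in-polygon test in Python and then counts samples per (district, date). The two rectangles in `TOY_DISTRICTS` are made up and are NOT real Beijing boundaries; real polygons would come from the district shapefile. All names here are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Made-up rectangles of (lat, lon) vertices standing in for real district shapes.
TOY_DISTRICTS = {
    "DistrictA": [(39.90, 116.30), (39.90, 116.40), (40.00, 116.40), (40.00, 116.30)],
    "DistrictB": [(39.90, 116.40), (39.90, 116.50), (40.00, 116.50), (40.00, 116.40)],
}


def point_in_polygon(lat, lon, polygon):
    """Even-odd ray casting: count edge crossings of a ray going east from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        y1, x1 = polygon[i]
        y2, x2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def label_district(lat, lon, districts):
    """Return the first district whose polygon contains the point, else None."""
    for name, poly in districts.items():
        if point_in_polygon(lat, lon, poly):
            return name
    return None


def daily_counts_by_district(rows, districts):
    """Count samples per (district, date); points outside every polygon are dropped."""
    counts = Counter()
    for _, ts, lat, lon in rows:
        name = label_district(lat, lon, districts)
        if name is not None:
            counts[(name, ts.date())] += 1
    return counts


rows = [
    (1, datetime(2008, 2, 2, 9, 0), 39.95, 116.35),
    (1, datetime(2008, 2, 2, 9, 10), 39.95, 116.45),
    (2, datetime(2008, 2, 2, 9, 0), 39.96, 116.36),
    (2, datetime(2008, 2, 3, 9, 0), 40.05, 116.35),  # outside both toy polygons
]
counts = daily_counts_by_district(rows, TOY_DISTRICTS)
print(counts)
```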



Figure316.png

Heatmap of home addresses and starting points for taxis per day

The heatmap below, which we developed in R, shows the specific locations in Beijing that serve as starting points for the 10357 taxis. A critical step for this work was identifying each taxi driver's last trip of the day, since it represents the driver's home location.

The team assumed that the end point of that trip (home) is marked by speed = 0 and a time difference of more than 6 hours to the next sample. As expected, there was one such point for each taxi each day: drivers arrived home at different hours to rest before working the following day. We were thus able to derive new variables and determine the 'home' point of each taxi driver, and then plotted those points in an interactive heatmap.
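The home-detection rule described above (speed = 0 followed by a gap of more than 6 hours) can be sketched in Python for a single taxi. This assumes speed is measured from the previous sample, as in the derived variables, and `home_points` is a hypothetical helper. The `max_speed_kmh` parameter defaults to the report's exact zero; raising it slightly is an assumption that would tolerate GPS jitter on parked taxis. The detected points would then feed the heatmap.

```python
from datetime import datetime, timedelta

SIX_HOURS = timedelta(hours=6)


def home_points(track, max_speed_kmh=0.0):
    """Return (timestamp, lat, lon) for samples that look like 'arrived home'.

    track: one taxi's samples sorted by time, as (timestamp, lat, lon, speed_kmh)
    tuples, with speed_kmh measured from the previous sample (None for the first).
    Rule: speed at or below max_speed_kmh AND more than 6 hours until the next sample.
    """
    homes = []
    for cur, nxt in zip(track, track[1:]):
        ts, lat, lon, speed = cur
        if speed is not None and speed <= max_speed_kmh and nxt[0] - ts > SIX_HOURS:
            homes.append((ts, lat, lon))
    return homes


track = [
    (datetime(2008, 2, 2, 22, 0), 39.95, 116.35, 25.0),
    (datetime(2008, 2, 2, 23, 0), 39.96, 116.36, 0.0),   # parked
    (datetime(2008, 2, 3, 7, 0), 39.96, 116.36, 0.0),    # next sample 8 h later
    (datetime(2008, 2, 3, 7, 10), 39.97, 116.37, 30.0),
]
print(home_points(track))
```

The 07:00 sample also has speed 0, but it is followed by a 10-minute gap, so only the 23:00 stop qualifies.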



Figure416.png


Insights on working patterns for taxi drivers in Beijing

In this section we dig deeper into taxi behaviour to get specific insights into the number of working hours, the number of taxis per hour, and how many days a week taxi drivers work in this Beijing sample.

The number of taxis per hour is shown below. As we expected, the number of taxis on the road falls on public holidays. The number of taxis increases from 10 am and peaks at 15:00, both inside and outside the inner circle.

Conversely, speed increases when traffic (and the number of taxis) falls, as during Chinese New Year.

Total distance travelled reaches its minimum during the night and increases from 9 am until 9 pm.
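The three hourly measures discussed above (taxis per hour, speed and total distance) can be sketched as one aggregation keyed by date and hour, so that holidays such as Chinese New Year stay separate from ordinary days. This Python sketch assumes per-sample distance and speed have already been derived from the previous sample; `hourly_summary` is a hypothetical name, and the inside/outside inner-circle split is omitted.

```python
from collections import defaultdict
from datetime import datetime


def hourly_summary(samples):
    """samples: (taxi_id, timestamp, distance_m, speed_kmh) tuples, where distance
    and speed are measured from the previous sample (None for a taxi's first one).

    Returns {(date, hour): (distinct_taxis, total_distance_km, mean_speed_kmh)}.
    """
    taxis = defaultdict(set)
    dist_km = defaultdict(float)
    speeds = defaultdict(list)
    for taxi_id, ts, d, v in samples:
        key = (ts.date(), ts.hour)
        taxis[key].add(taxi_id)
        if d is not None:
            dist_km[key] += d / 1000.0
        if v is not None:
            speeds[key].append(v)
    summary = {}
    for key in sorted(taxis):
        vs = speeds[key]
        summary[key] = (len(taxis[key]), dist_km[key], sum(vs) / len(vs) if vs else None)
    return summary


samples = [
    (1, datetime(2008, 2, 2, 10, 0), None, None),
    (1, datetime(2008, 2, 2, 10, 10), 500.0, 3.0),
    (2, datetime(2008, 2, 2, 10, 20), 1500.0, 9.0),
    (2, datetime(2008, 2, 2, 11, 0), 0.0, 0.0),
]
for key, value in hourly_summary(samples).items():
    print(key, value)
```

Counting distinct taxi IDs per hour, rather than samples, avoids over-weighting taxis that report more often.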



Figure516.png

We analyzed in R the number of working days for the 10357 taxis; the distribution is shown in the chart below. Over 6500 taxis, or almost 63% of the taxi fleet, work 7 days a week, while 20% of the fleet in this data works 6 days a week. Only a very small percentage of taxi drivers work 4 or fewer days a week.
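As a check on the figures above, 6500 / 10357 ≈ 0.628, consistent with "almost 63%". The working-days distribution itself can be sketched in Python as below, under the assumption that a taxi counts as working on a date if it has at least one GPS sample that day; the report does not state its exact rule, and `working_days_distribution` is a hypothetical helper.

```python
from collections import Counter, defaultdict
from datetime import datetime


def working_days_distribution(rows):
    """rows: tuples whose first two fields are (taxi_id, timestamp).

    Assumed rule: a taxi works on a date if it has at least one sample that day.
    Returns {number_of_working_days: share_of_taxis}.
    """
    days = defaultdict(set)
    for rec in rows:
        taxi_id, ts = rec[0], rec[1]
        days[taxi_id].add(ts.date())
    hist = Counter(len(d) for d in days.values())
    total = len(days)
    return {n: hist[n] / total for n in sorted(hist)}


rows = [
    (1, datetime(2008, 2, 2, 9)),
    (1, datetime(2008, 2, 3, 9)),
    (1, datetime(2008, 2, 3, 10)),
    (2, datetime(2008, 2, 2, 9)),
]
print(working_days_distribution(rows))
```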



Figure616.png

Finally, we show the distribution of the number of working hours for the 10357 taxi drivers in our dataset.
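The report does not define how working hours were computed, so the following Python sketch uses a hypothetical rule: an interval between consecutive samples counts as working time when the taxi moved (speed > 0) and the gap is at most 30 minutes, so long gaps are treated as breaks. Both the rule and the `max_gap_s` default are assumptions, and `working_hours` is a hypothetical name.

```python
def working_hours(samples, max_gap_s=1800):
    """samples: (taxi_id, time_diff_s, speed_kmh) tuples, with time_diff_s and
    speed_kmh measured from the previous sample of the same taxi (None for the first).

    Hypothetical rule: an interval is working time if the taxi moved (speed > 0)
    and the gap is at most max_gap_s seconds.
    Returns {taxi_id: hours}; taxis that never qualify get 0.0.
    """
    secs = {}
    for taxi_id, dt, v in samples:
        secs.setdefault(taxi_id, 0.0)  # make sure every taxi appears
        if dt is not None and v is not None and v > 0 and dt <= max_gap_s:
            secs[taxi_id] += dt
    return {taxi_id: s / 3600.0 for taxi_id, s in secs.items()}


samples = [
    (1, None, None),
    (1, 600.0, 30.0),   # moving, 10-minute gap: counts
    (1, 600.0, 0.0),    # parked: does not count
    (1, 7200.0, 10.0),  # 2-hour gap: treated as a break
    (2, None, None),
]
print(working_hours(samples))
```

A histogram of these per-taxi values would give a distribution comparable in form to the chart, although the numbers depend on the assumed rule.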


Figure716.png

Future work

Beijing is one of the busiest cities in the world. From a simple dataset of 4 variables (taxi ID, latitude, longitude and timestamp), the team has derived insights from different angles into taxi driver behaviour, traffic patterns and other distributions.

This data analysis could be useful to Beijing's municipal authorities. It could also serve as a research paper and tool for new start-ups that want to understand taxi patterns and locations in Beijing.

Other uses of this research are:

  • For the government: due to limited information on Beijing's city construction and road planning, we can only give suggestions based on our own understanding. The suggestions may not be directly applicable in the real world, but they give a general direction for mitigating congestion by improving traffic arrangements.
  • For taxi companies: we can suggest to taxi companies in Beijing how to allocate taxis more efficiently.
  • For commuters: commuters can get a clearer view of traffic conditions at different times and places, and make better-informed decisions when planning their journeys.


The data used for this analysis dates from 2008; it would be valuable to apply the same analysis to a more recent dataset and to real-time data. Future work could consist of connecting our application to a real-time dataset.

Overall, the team has demonstrated that by periodically sampling the position of any vehicle, useful insights can be derived for decision making.


Acknowledgements

The authors wish to thank Ting Seong KAM, professor of Visual Analytics at the School of Information Systems, Singapore Management University, for his ongoing support.
