G7 Report
Objective and Motivation
Pronto Cycle Share was a public bicycle-sharing system in Seattle that operated from 2014 to 2017. The system, owned initially by a non-profit and later by the Seattle Department of Transportation, included 58 stations in the city's central neighbourhoods and more than 500 bicycles.
Bike sharing is a convenient form of short-distance transportation. Users can borrow a bike at any station and return it at any other. As a result, some stations receive too many incoming bikes and run out of free docks, while others empty quickly and have too few bikes for people to check out.
To address this problem, we measure the popularity of each station by calculating its degree, that is, the number of trips that start at the station (out-degree) or end there (in-degree). This tells us which stations people usually go to when picking up or dropping off a bike. We also compute the degree over different time ranges to monitor how it changes for each station, so that we can find the times at which a station has many bikes available and the times at which it has none. Finally, we use this information to propose a route that lets the company re-dispatch bikes among all stations at a lower cost.
Data Preparation
Dataset
Our data come from Kaggle and consist of three datasets named “Station”, “Trip” and “Weather”. The Station dataset records information about each station, including the station id, the station position and other useful fields. The Trip dataset is the most important one in our project: it provides the date of each trip, which supports our time series analysis, and the start and end stations, which we use to calculate the degree of each station. The Weather dataset records the weather and temperature for each day. However, we did not use this dataset this time.
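The degree calculation described above can be sketched in base R as follows. This is only a minimal illustration on made-up trips, not our actual processing code: the column names `from_station_id`, `to_station_id` and `starttime`, the timestamp format, and the helper `station_degree` are assumptions and may need adjusting to the real Trip dataset.

```r
# Toy trip records. Column names and timestamp format are assumptions.
trips <- data.frame(
  from_station_id = c("A", "A", "B", "C", "C", "C"),
  to_station_id   = c("B", "C", "C", "A", "A", "B"),
  starttime       = c("2015-06-01 08:05", "2015-06-01 08:40", "2015-06-01 09:10",
                      "2015-06-01 17:20", "2015-06-01 17:45", "2015-06-01 18:05"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count trips per station as origin and as destination.
station_degree <- function(trips) {
  stations <- sort(unique(c(trips$from_station_id, trips$to_station_id)))
  # Out-degree: trips starting at the station; in-degree: trips ending there.
  out_deg <- as.integer(table(factor(trips$from_station_id, levels = stations)))
  in_deg  <- as.integer(table(factor(trips$to_station_id, levels = stations)))
  data.frame(
    station    = stations,
    out_degree = out_deg,
    in_degree  = in_deg,
    net        = in_deg - out_deg,  # positive: bikes accumulate at the station
    stringsAsFactors = FALSE
  )
}

# Degree over the whole period.
degree_all <- station_degree(trips)

# Degree restricted to a time range (here: trips that start before 12:00).
start_hour <- as.integer(format(
  as.POSIXct(trips$starttime, format = "%Y-%m-%d %H:%M", tz = "UTC"), "%H"))
degree_morning <- station_degree(trips[start_hour < 12, ])

print(degree_all)
print(degree_morning)
```

Comparing `degree_all` with `degree_morning` shows how the net degree of a station can change sign depending on the time range, which is the effect we want to monitor.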
Data handling
R packages used
Circlize
TSP
The second package we used is TSP. It is not a plotting package; rather, it provides algorithms for the Travelling Salesman Problem (TSP), which we use to find a short route for re-dispatching bikes. Given a distance matrix, it computes a Hamiltonian path, and we take the shortest Hamiltonian path it finds as our re-dispatch route. Our approach is as follows. For each day, we calculate the difference between the in-degree and the out-degree of each station. Stations with a positive degree difference form one group, and all other stations form a second group. When the company re-dispatches bikes, the vehicle should visit the stations in the positive group first and then the others. We therefore compute two Hamiltonian paths, one per group, and merge them into a single path, which we regard as our optimized result.
Leaflet Map
Leaflet displays geographic data on an interactive map. We use it to show each station, draw the routes, and visualize in-degree and out-degree with circles of different colours.
Heatmap by ggmap
Although ggmap is not interactive, it offers many methods for analysis. Here we use kernel density estimation with ggmap to draw a heatmap showing the distribution of stations and which areas are used more frequently.
Plotly
The Plotly package provides interactive versions of basic plots. We use a scatter plot to show the correlation between usage and weather, and a boxplot to show the distribution of average demand over time. In the boxplot, the user can interactively choose any combination of stations. Grouping the time interval by hour, week, month or year helps us find how the flows change over time.
R shiny application
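Returning to the two-group routing idea described for the TSP package above, the following base R sketch illustrates it on made-up data. It is not our actual implementation: a simple nearest-neighbour heuristic stands in for the solver in the TSP package (where one would presumably build a `TSP` object from the distance matrix and call `solve_TSP`), straight-line distances on toy coordinates stand in for real distances, and the names `nn_path` and `route_group` are hypothetical.

```r
# Hypothetical stations with toy coordinates and a daily net degree
# (in-degree minus out-degree). All values are made up for illustration.
stations <- data.frame(
  id  = c("S1", "S2", "S3", "S4", "S5"),
  x   = c(0, 1, 2, 3, 4),
  y   = c(0, 0, 0, 0, 0),
  net = c(3, -2, 4, -1, -4),
  stringsAsFactors = FALSE
)

# Nearest-neighbour Hamiltonian path over a distance matrix, starting at `start`.
# This heuristic is a stand-in for a proper TSP solver and is not optimal in general.
nn_path <- function(d, start = 1) {
  n <- nrow(d)
  path <- start
  while (length(path) < n) {
    last <- path[length(path)]
    remaining <- setdiff(seq_len(n), path)
    path <- c(path, remaining[which.min(d[last, remaining])])
  }
  path
}

# Order the stations of one group along a nearest-neighbour path.
route_group <- function(df) {
  if (nrow(df) == 0) return(character(0))
  d <- as.matrix(dist(df[, c("x", "y")]))  # Euclidean distance (assumption)
  df$id[nn_path(d)]
}

# Split by sign of the degree difference, route each group, then merge.
surplus <- stations[stations$net > 0, ]
others  <- stations[stations$net <= 0, ]
route <- c(route_group(surplus), route_group(others))

print(route)
```

Visiting the positive group first means the vehicle collects surplus bikes before it reaches the stations that need them. The merged path does not optimize the hop between the two groups, which is a simplification of this sketch.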