Objective and Motivation

Pronto Cycle Share was a public bicycle sharing system in Seattle that operated from 2014 to 2017. The system, initially owned by a non-profit and later by the Seattle Department of Transportation, included 58 stations in the city's central neighbourhoods and more than 500 bicycles.

Bike sharing provides short-distance transportation that makes getting around the city more convenient. Users can borrow and return bikes at any station. Some stations receive too many incoming bikes and fill up, leaving no free docks for arriving riders, while others empty quickly and run out of bikes to check out.

Based on this problem, we want to measure the popularity of each station by calculating its degree, which tells us where people usually pick up or drop off bikes. We also want to compare each station's degree across different time ranges, so that we can find when a station has many bikes to pick up and when it has none available. Finally, we want to use this information to provide the company with a lower-cost route for re-dispatching bikes among all stations.

Data Preparation

Dataset

Our data come from Kaggle, in three datasets named “Station”, “Trip” and “Weather”. The Station dataset records information about each station, such as its id and position. The Trip dataset is the most important one for our project: it gives the date and time of each trip, which supports our time-series analysis, and the start and end stations, which we use to calculate the degree of each station. The Weather dataset records the daily weather and temperature; however, we did not use it this time.

Handling the data

Preparation step	Process
Date data	Convert the date strings to date-time types. Time series are essential for finding trends in bike flows. We use the lubridate and anytime packages to prepare our datasets, which lets us group trips by hour, weekday, month and year.
Degree	In-degree and out-degree count how many bikes arrive at and leave each station. We calculate both so that we can find which stations need bikes redistributed.
Distance	We use the osmar package and crawl routes from the osmar website, since collecting data from the web is more flexible. The trip data only gives the from-station and to-station of each trip, but the routes between stations are needed to design redistribution routes. After collecting the route data, we save it in a distance matrix, which is more convenient to read.
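The date and degree steps above can be sketched in Python. This is a minimal illustration, not the project's R code (which used lubridate and anytime). It assumes each trip record has a start time, stop time, from-station id and to-station id, with timestamps formatted like "10/13/2014 08:05"; the key names and sample trips are hypothetical. A bucket function chooses the time grouping, so the same counts can be taken by hour, weekday, month or year.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format, e.g. "10/13/2014 08:05" (month/day/year hour:minute).
TIME_FORMAT = "%m/%d/%Y %H:%M"


def parse_time(text):
    """Convert a timestamp string into a datetime object."""
    return datetime.strptime(text, TIME_FORMAT)


def degrees(trips, bucket):
    """Count in-degree and out-degree per (station, time bucket).

    `bucket` maps a datetime to a group key, e.g. its hour or weekday.
    Each trip is a dict with the hypothetical keys starttime, stoptime,
    from_station_id and to_station_id.
    """
    in_degree, out_degree = Counter(), Counter()
    for trip in trips:
        start = parse_time(trip["starttime"])
        stop = parse_time(trip["stoptime"])
        # A departure counts for the origin station in the bucket where the trip starts.
        out_degree[(trip["from_station_id"], bucket(start))] += 1
        # An arrival counts for the destination station in the bucket where the trip ends.
        in_degree[(trip["to_station_id"], bucket(stop))] += 1
    return in_degree, out_degree


def net_flow(in_degree, out_degree):
    """In-degree minus out-degree; a positive value means bikes accumulate."""
    keys = set(in_degree) | set(out_degree)
    return {key: in_degree[key] - out_degree[key] for key in keys}


# Hypothetical sample: two morning trips from A to B, one evening trip back.
trips = [
    {"starttime": "10/13/2014 08:05", "stoptime": "10/13/2014 08:20",
     "from_station_id": "A", "to_station_id": "B"},
    {"starttime": "10/13/2014 08:40", "stoptime": "10/13/2014 08:55",
     "from_station_id": "A", "to_station_id": "B"},
    {"starttime": "10/13/2014 17:10", "stoptime": "10/13/2014 17:30",
     "from_station_id": "B", "to_station_id": "A"},
]

by_hour = net_flow(*degrees(trips, lambda t: t.hour))
by_weekday = net_flow(*degrees(trips, lambda t: t.weekday()))
print(by_hour[("B", 8)], by_hour[("A", 8)])  # 2 -2
print(by_weekday[("B", 0)])  # 1 (13 October 2014 was a Monday)
```

Passing `lambda t: t.month` or `lambda t: t.year` as the bucket gives the monthly and yearly views. Counting arrivals by the trip's stop time is a design choice in this sketch; counting both ends by start time would also be defensible.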

R packages used

Circlize

This package can draw graphs such as bar charts, line charts or histograms in a circular layout. We chose it because it produces attractive graphics and, most importantly, it is friendly to beginners: we could plot a basic graph after reading a single example. It also supports more elaborate graphics and specialised analyses such as genomics. However, in my opinion, it offers far fewer interactive features than plotly.

Here is the basic concept of the package. The circle is first divided into sectors, one for each factor level. Each sector is then divided radially into tracks. The region where a sector and a track intersect is called a cell, and cells are where graphics are drawn. Tracks are usually plotted from the outside in, and the sectors are drawn in order, from “A” to “E”. This is how the circlize package works.
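The sector/track/cell layout described above can be illustrated with a small geometry sketch in Python. It is not circlize's API or its exact algorithm: the function and parameter names (`layout_cells`, `gap_deg`, `track_height`, `margin`) are hypothetical, and circlize exposes comparable settings through its own functions such as `circos.par()`. The sketch assumes sector angles proportional to each factor's width and tracks stacked inward from a unit circle.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One plotting region: the intersection of a sector and a track."""
    sector: str
    track: int       # 1 = outermost track
    start_deg: float
    end_deg: float
    outer_r: float
    inner_r: float


def layout_cells(sector_widths, n_tracks, gap_deg=2.0, track_height=0.2, margin=0.02):
    """Divide a unit circle into sectors (one per factor) and tracks.

    Sector angles are proportional to `sector_widths` (e.g. each factor's
    x-range), separated by `gap_deg`; tracks are stacked from the outside in.
    All parameter names are hypothetical, not circlize arguments.
    """
    usable = 360.0 - gap_deg * len(sector_widths)
    total_width = sum(sector_widths.values())
    cells = []
    angle = 0.0
    for name, width in sector_widths.items():
        span = usable * width / total_width
        outer = 1.0
        for track in range(1, n_tracks + 1):
            inner = outer - track_height
            cells.append(Cell(name, track, angle, angle + span, outer, inner))
            outer = inner - margin  # leave a small margin before the next track
        angle += span + gap_deg
    return cells


# Five equally wide sectors "A" to "E", each split into two tracks.
cells = layout_cells({name: 1.0 for name in "ABCDE"}, n_tracks=2)
for cell in cells[:3]:
    print(cell)
```

With five equal sectors and a 2-degree gap after each, every sector spans 70 degrees, giving ten cells in total.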

TSP

The second package we use is TSP. Rather than plotting graphs, it provides algorithms, which we need because we want the shortest path for re-dispatching bikes. TSP stands for the Travelling Salesman Problem. Given a distance matrix, the package computes a Hamiltonian path, that is, a path that visits every station exactly once, and we take this path as the re-dispatch route. The package's heuristic solvers return a short path, though not necessarily the shortest possible one.

Our approach is as follows. We calculate the daily difference between in-degree and out-degree for each station, then split the stations into two groups: those with a positive difference, where more bikes arrive than leave so bikes accumulate, and all the others. When the company re-dispatches bikes, we want its vehicle to visit the stations in the positive group first to collect surplus bikes, and then the other stations to deliver them. This gives two Hamiltonian paths, which we merge into one complete route, and we regard this route as our optimised result.
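The two-group routing idea above can be sketched in Python. This is not the TSP package's API or solver. As a stand-in, it builds each group's Hamiltonian path with a nearest-neighbour heuristic and then refines it with 2-opt, so the result is short but not guaranteed optimal. The distance matrix, net-flow values and station names are hypothetical. Net flow is taken as in-degree minus out-degree, so positive stations have surplus bikes. Starting the second path at the station closest to the end of the first is an assumption about how the two paths are merged.

```python
def path_length(path, dist):
    """Total distance along an open path."""
    return sum(dist[a][b] for a, b in zip(path, path[1:]))


def nearest_neighbour_path(stations, dist, start):
    """Greedy Hamiltonian path: always move to the closest unvisited station."""
    path = [start]
    unvisited = set(stations) - {start}
    while unvisited:
        here = path[-1]
        # Ties are broken by station name so the result is deterministic.
        nxt = min(unvisited, key=lambda s: (dist[here][s], s))
        path.append(nxt)
        unvisited.remove(nxt)
    return path


def two_opt(path, dist):
    """Shorten an open path by reversing segments; the first station stays fixed."""
    best = list(path)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if path_length(candidate, dist) < path_length(best, dist) - 1e-9:
                    best, improved = candidate, True
    return best


def redispatch_route(net_flow, dist):
    """Visit surplus stations (net flow > 0) first, then the remaining stations."""
    surplus = sorted(s for s, f in net_flow.items() if f > 0)
    others = sorted(s for s, f in net_flow.items() if f <= 0)
    route = []
    for group in (surplus, others):
        if not group:
            continue
        # The first group starts at its first station; the second group starts
        # at the station closest to where the first path ended.
        first = min(group, key=lambda s: (dist[route[-1]][s], s)) if route else group[0]
        route += two_opt(nearest_neighbour_path(group, dist, first), dist)
    return route


# Hypothetical stations on a straight street (positions in km).
positions = {"A": 0, "B": 1, "C": 5, "D": 6}
dist = {a: {b: abs(pa - pb) for b, pb in positions.items()} for a, pa in positions.items()}
flow = {"A": 3, "B": -1, "C": 2, "D": -4}  # in-degree minus out-degree

route = redispatch_route(flow, dist)
print(route, path_length(route, dist))  # ['A', 'C', 'D', 'B'] 11
```

Here the vehicle collects at A and C first, then delivers to D and B. The sketch ignores vehicle capacity and the number of bikes moved at each stop.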

Leaflet Map

Leaflet can be used to show geographic data on a map. We use leaflet to show each station, draw the routes on the map, and visualise in-degree and out-degree with circles of different colours and sizes.

Heatmap by ggmap

Although ggmap is not interactive, it offers many methods for analysis. Here we use kernel density estimation in ggmap to draw a heatmap showing the distribution of stations and which areas are used most frequently.
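The kernel density estimate behind such a heatmap can be sketched in Python. This is a simplified stand-in, not the exact estimator that ggmap and ggplot2 use. It assumes an isotropic Gaussian kernel with a single bandwidth, and it treats coordinates as planar, which is only a rough approximation for longitude and latitude over a small area. The points, grid and bandwidth are hypothetical.

```python
import math


def gaussian_kde_2d(points, grid_x, grid_y, bandwidth):
    """Evaluate a 2-D kernel density estimate on a grid.

    Uses an isotropic Gaussian kernel with a single bandwidth h:
    f(x, y) = 1 / (n * 2 * pi * h^2) * sum_i exp(-((x - x_i)^2 + (y - y_i)^2) / (2 * h^2))
    Returns a list of rows, one per value in grid_y.
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)
    density = []
    for y in grid_y:
        row = []
        for x in grid_x:
            total = sum(
                math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2))
                for px, py in points
            )
            row.append(norm * total)
        density.append(row)
    return density


# Hypothetical planar coordinates: three trip starts close together, one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
grid = gaussian_kde_2d(points, grid_x=[0.0, 5.0], grid_y=[0.0, 5.0], bandwidth=1.0)
print(grid[0][0] > grid[1][1])  # True: the busy corner is "hotter"
```

Colouring each grid cell by its density value produces the heatmap. A larger bandwidth gives a smoother map, and a smaller one gives sharper hot spots.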

Plotly

The plotly package provides interactive versions of basic plots. We use a scatter plot to show the correlation between usage and weather. A boxplot shows the distribution of average demand over time, and in it users can interactively choose any combination of stations. Grouping the time interval by hour, week, month and year helps us find trends in bike flows over time.
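What such a boxplot summarises can be sketched in Python. For each hour of the day, the sketch collects the number of trips on every day in the data and then computes the five-number summary a boxplot draws. This is a hypothetical illustration, not the project's plotly code. It treats "demand" as trip starts per hour, counts days with no trips in an hour as zero, and uses `statistics.quantiles` with `method="inclusive"`, whose quartile convention may differ from plotly's. The sample times are made up.

```python
import statistics
from collections import Counter
from datetime import datetime


def hourly_demand(start_times):
    """Map each hour of day to the list of trip counts in that hour, one per day.

    Days with no trips in a given hour contribute a zero.
    """
    counts = Counter((t.date(), t.hour) for t in start_times)
    days = sorted({t.date() for t in start_times})
    hours = sorted({t.hour for t in start_times})
    return {h: [counts[(d, h)] for d in days] for h in hours}


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile and maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical trip start times over three days.
starts = (
    [datetime(2014, 10, 13, 8, m) for m in (5, 40)]
    + [datetime(2014, 10, 14, 8, m) for m in (1, 15, 30, 45)]
    + [datetime(2014, 10, 15, 17, 10)]
)
demand = hourly_demand(starts)
print(demand[8])                       # [2, 4, 0]
print(five_number_summary(demand[8]))  # (0, 1.0, 2.0, 3.0, 4)
```

Grouping by week, month or year only changes the key: for example, `t.isocalendar()[1]` for the week number or `t.month` for the month.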

R shiny application

Shiny Page	Explanation
Dashboard	This Shiny page shows the stations on the map and the path between every two stations. The leaflet map is the basis for designing redistribution routes. The normalised in-degree and out-degree are shown on the map and are used to decide whether bikes should be brought in or taken out. Below the map, an interactive time-series boxplot shows average demand, which helps us find usage patterns such as rush hours and tourist usage.
Station Drilldown	The circle graph shows the hourly distribution of in-degree, out-degree and the difference between them, where green means out-degree and red means in-degree. Inside the circle are the links between stations, which show which pairs of stations are connected by trips. The table on the right gives the details of these links. Users can also change the from-station and to-station in the input panel. Once a station is chosen, its degree statistics by hour and by weekday are also shown.
Result	This page shows the result we provide to the company: the shortest re-dispatch route we found. The route is drawn on the map, and the table lists the order of the stations along it. Users can also choose the start station themselves; the chosen station then appears in the first position in the table.

G7 Report
