Objective and Motivation

Pronto Cycle Share was a public bicycle-sharing system in Seattle that operated from 2014 to 2017. The system, owned initially by a non-profit and later by the Seattle Department of Transportation, included 58 stations in the city's central neighbourhoods and more than 500 bicycles.

Bike sharing provides short-distance transportation that makes daily travel more convenient. Users can borrow and return bikes at any service station. As a result, some stations receive too many incoming bikes and fill up, leaving no free docks for arriving bikes, while other stations empty quickly and do not have enough bikes for people to check out.

To address this problem, we measure the popularity of each station by calculating its degree, which shows the stations where people most often pick up or drop off bikes. We also compare the degree of each station across different time ranges, to find the times at which a station has many bikes available for pick-up and the times at which it has none. Finally, we use this information to suggest a route by which the company can re-dispatch bikes among the stations at a lower cost.

Data Preparation

Dataset

Our datasets come from Kaggle. There are three of them, named “Station”, “Trip” and “Weather”. The Station dataset records information about each station, including the station id and the station position. The Trip dataset is the most important one in our project: it provides the date information of each trip, which supports our time-series analysis, and the start and end stations, which we use to calculate the degree of each station. The Weather dataset records the weather and temperature of each day; however, we did not use it this time.

Data handling

The preparation process consisted of three steps.

Transform the time-series data types. Time-series data is essential for finding trends in bike flows. We use the lubridate and anytime packages to parse the timestamps in our datasets, so that trips can be grouped by hour, weekday, month and year.
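The grouping step can be illustrated with a small sketch. The project did this in R with lubridate and anytime; the Python below is only a stand-in that uses the standard library, and the sample timestamps, their format and the function names `time_parts` and `count_by` are assumptions made for illustration, not taken from the Trip dataset.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip start times; the real column name and timestamp
# format of the Trip dataset are not given in this report.
start_times = [
    "2015-03-02 08:15",
    "2015-03-02 08:40",
    "2015-03-07 17:05",
    "2016-07-04 08:30",
]


def time_parts(stamp, fmt="%Y-%m-%d %H:%M"):
    """Parse one timestamp string and return its grouping keys."""
    t = datetime.strptime(stamp, fmt)
    return {
        "hour": t.hour,
        "weekday": t.weekday(),  # 0 = Monday ... 6 = Sunday
        "month": t.month,
        "year": t.year,
    }


def count_by(stamps, key):
    """Count trips per value of one grouping key ('hour', 'weekday', ...)."""
    return Counter(time_parts(s)[key] for s in stamps)


trips_by_hour = count_by(start_times, "hour")
trips_by_weekday = count_by(start_times, "weekday")
trips_by_year = count_by(start_times, "year")
```

Once every trip carries these keys, the same counting can be repeated per station to compare usage across time ranges.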

Calculate in-degree and out-degree. The in-degree and out-degree of a station are the numbers of bikes that arrive at it and depart from it. We calculate both so that we can find out which stations need redistribution.
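As a minimal sketch of this calculation, the Python below counts arrivals and departures from a list of trips. The station labels and the `trips` list are hypothetical, and `station_degrees` is an illustrative name, not a function from the project.

```python
from collections import Counter

# Hypothetical (from_station, to_station) pairs standing in for the
# start and end stations recorded in the Trip dataset.
trips = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "B"), ("A", "B")]


def station_degrees(trips):
    """Return in-degree, out-degree and their difference for each station."""
    out_degree = Counter(start for start, _ in trips)
    in_degree = Counter(end for _, end in trips)
    stations = sorted(set(out_degree) | set(in_degree))
    return {
        s: {
            "in": in_degree[s],
            "out": out_degree[s],
            # Positive: more bikes arrive than leave, so bikes accumulate.
            "net": in_degree[s] - out_degree[s],
        }
        for s in stations
    }


degrees = station_degrees(trips)
```

In this toy data, station B gains two bikes and station A loses two, so B would be a candidate source and A a candidate destination when bikes are redistributed.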

Collect routes between stations. We use the osmar package to crawl routes from the osmar website, since a web source is more flexible for collecting data. The trip data only gives the from-station and to-station of each trip; however, the routes between stations are useful for designing redistribution routes. After collecting the route data, we store it in a matrix, which is more convenient to read.
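Storing the routes in a matrix might look like the sketch below. It is written in Python rather than R, the route lengths are invented, and the choice to mark missing routes with infinity is an assumption; the report does not say how the project's matrix was laid out.

```python
import math

stations = ["A", "B", "C"]

# Hypothetical route lengths in metres, keyed by (from, to). Routes may
# differ by direction, for example because of one-way streets.
route_lengths_m = {
    ("A", "B"): 1200.0,
    ("B", "A"): 1350.0,
    ("A", "C"): 800.0,
    ("C", "A"): 800.0,
    ("B", "C"): 600.0,
}


def build_route_matrix(stations, route_lengths):
    """Return (matrix, index) where matrix[i][j] is the route length i -> j."""
    index = {name: i for i, name in enumerate(stations)}
    n = len(stations)
    # Zero on the diagonal, infinity where no route has been collected.
    matrix = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (start, end), length in route_lengths.items():
        matrix[index[start]][index[end]] = length
    return matrix, index


matrix, index = build_route_matrix(stations, route_lengths_m)
```

A matrix in this form can be passed directly to a route-finding step, which only needs to look up the cost of travelling from one station to another.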

R packages used

Circlize

This package can plot many kinds of graph, such as bar charts, line charts or histograms, inside a circular region. We use it because it is useful, produces attractive graphs and, most importantly, is friendly to beginners: a basic graph is easy to plot after reading one example. The package can also produce more elaborate graphs and support specialised analyses, for example in genomics. However, it has some shortcomings; in our opinion, it does not provide as many interactive functions as plotly.

Here is the basic concept of the package. Given a circle, the package first divides it into sectors, one for each factor in the data. It then divides each sector into tracks. The resulting regions are called cells, and a cell is where a graph is plotted. Usually the plotting order is from the outside to the inside, in the direction from “A” to “E”. This is how the circlize package works.

TSP

The second package we used is TSP. It does not plot graphs; instead, it provides algorithms. We use it because we want a shortest path that optimizes the way bikes are re-dispatched. TSP stands for Travelling Salesman Problem. Given a distance matrix, the package gives us a way to calculate a Hamiltonian path, that is, a path that visits every station exactly once, and we treat this path as the short route for re-dispatching. In our approach, we calculate the daily difference between the in-degree and out-degree of each station. We then put all stations with a positive degree difference into one group and the remaining stations into another. When the company re-dispatches bikes, we want them to visit the stations in the positive group first and then the others. This gives two Hamiltonian paths, which we merge into one complete path. We regard this path as our optimized result.
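The two-group idea can be sketched as follows. This is not the TSP package: it is a Python illustration that swaps in a simple nearest-neighbour heuristic, which gives a short path but not necessarily the shortest one. The station positions and degree differences are invented, and two details are assumptions of this sketch: the first path starts at the station with the largest surplus, and the second path starts at the station closest to where the first one ended.

```python
def nearest_neighbour_path(dist, nodes, start):
    """Visit every node once, always moving to the closest unvisited node."""
    path = [start]
    remaining = set(nodes) - {start}
    while remaining:
        last = path[-1]
        # sorted() makes tie-breaking deterministic.
        nxt = min(sorted(remaining), key=lambda n: dist[last][n])
        path.append(nxt)
        remaining.remove(nxt)
    return path


def redispatch_route(dist, net_degree):
    """Path through positive-difference stations first, then the others."""
    surplus = sorted(s for s, d in net_degree.items() if d > 0)
    others = sorted(s for s, d in net_degree.items() if d <= 0)
    first = []
    if surplus:
        start = max(surplus, key=lambda s: net_degree[s])  # assumption
        first = nearest_neighbour_path(dist, surplus, start)
    second = []
    if others:
        if first:
            start = min(others, key=lambda s: dist[first[-1]][s])  # assumption
        else:
            start = others[0]
        second = nearest_neighbour_path(dist, others, start)
    return first + second


def route_length(dist, route):
    """Total length of a route given as a list of stations."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))


# Hypothetical stations on a line, with positions in kilometres.
positions = {"A": 0, "B": 1, "C": 3, "D": 6}
dist = {a: {b: abs(pa - pb) for b, pb in positions.items()}
        for a, pa in positions.items()}
# Hypothetical daily in-degree minus out-degree per station.
net_degree = {"A": 2, "B": -1, "C": 1, "D": -2}

route = redispatch_route(dist, net_degree)
total = route_length(dist, route)
```

With this toy input the route is A, C, B, D: bikes are collected at the two surplus stations before the stations that need bikes are visited. Forcing the groups into this order can make the merged path longer than an unconstrained tour, which is the price of picking up bikes before dropping them off.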

Leaflet Map

Leaflet displays geographical visualizations on a map. We use leaflet to show each station, draw the routes on the map, and visualize in-degree and out-degree using circles and different colours.

Heatmap by ggmap

Although ggmap is not interactive, it offers many methods for analysis. Here we use kernel density estimation in ggmap to draw a heatmap of the distribution of stations and of the places that are used more frequently.
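To show what kernel density estimation computes, here is a minimal Python sketch of a two-dimensional Gaussian kernel estimate. It is not the ggmap code used in the project; the points, the bandwidth and the name `gaussian_kde_2d` are assumptions, and the coordinates are treated as flat x and y values rather than longitude and latitude.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a function giving the estimated density at any (x, y)."""
    n = len(points)
    h2 = bandwidth ** 2
    # Normalising constant of an isotropic 2-D Gaussian kernel, averaged over n points.
    norm = 1.0 / (n * 2.0 * math.pi * h2)

    def density(x, y):
        return norm * sum(
            math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * h2))
            for px, py in points
        )

    return density


# Hypothetical trip start locations, one point per trip, so a busy
# station contributes several points at (nearly) the same place.
trip_points = [(0.0, 0.0), (0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
density = gaussian_kde_2d(trip_points, bandwidth=0.5)

busy = density(0.0, 0.0)    # near the cluster of three trips
quiet = density(5.0, 5.0)   # near the single trip
empty = density(10.0, 10.0) # far from every trip
```

Evaluating the density on a grid and colouring each grid cell by its value gives the kind of heatmap described above: places with many nearby trips appear hotter than places with few.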

Plotly

The plotly package provides interactive versions of basic plots. We use a scatter plot to show the correlation between usage and weather. A boxplot shows the distribution of average demand over time; in the boxplot, you can interactively choose any combination of stations. Grouping the time interval by hour, week, month or year helps us find trends in the flows over time.

R shiny application

