| title | author | date |
|---|---|---|
| Developing Shiny App: Grab a Bike | Fig & Co: i. Noraisha Yusuf (WQD180008), ii. Low Tsu Siang (WQD180072), iii. Kaveenaasvini (WQD180017), iv. Prabavathi (WQD180030) | December 13, 2018 |
App.
A group assignment for WQD7001: Principles of Data Science, University of Malaya
A brief summary of this document is also available in presentation format at this link: RPubs link.
Introduction
Bike-sharing culture is on the rise, particularly in cities. Many urbanites have adopted this mode of transport. It provides an alternative means of travelling that is cheaper, healthier and environmentally friendly.
The City of Chicago launched Divvy, a bike share system designed to connect people to transit and to make short one-way trips across town easy. Divvy bikes have been gaining popularity since the system was established in 2013.
Question
Divvy has an impressive 592 bike stations in Chicago. We were particularly interested in the number of trips at these locations in 2017. When and where do bikers normally ride their rented Divvy bikes?
We believed that this information would be useful to bikers, nearby and new businesses, and the bike rental company itself. Since these parties are driven by different motives, it is essential that the Shiny app lets users explore the bike trips according to their preferences. It is challenging to visualize the distribution of bike trips across the 592 Divvy bike stations. Hence, we opted to develop a Shiny app that produces a spatial (map) visualization of the bike trips.
Goal
In order to answer our questions, we have to map out and categorize the popularity of the bike stations based on:
- specific date
- time frame
- gender
Note: This page focuses on our data preparation and exploratory data analysis. For the Shiny app code, the ui.r and server.r files are available in the same GitHub account.
Data Preparation
We obtained the dataset from https://www.kaggle.com/yingwurenjian/chicago-divvy-bicycle-sharing-data. The same dataset can also be obtained directly from Divvy Bike Share's website.
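- Load the data. Since our focus is only on 2017, we subset the data by keeping the rows where the year column equals 2017.
library(dplyr)      # group_by(), summarise(), glimpse(), sample_n()
library(tidyr)      # separate()
library(tidyverse)  # also attaches dplyr and tidyr, so the two calls above are optional
bike <- read.csv("data.csv")
dataset <- subset(bike, year == 2017)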
- A brief overview of the dataset.
glimpse(dataset)
- There are no missing values found in the dataset.
colSums(is.na(dataset))
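As a small illustration of what this check reports, the sketch below runs the same call on a hypothetical three-row data frame (`toy` and its values are invented for the example). Each entry of the result is the number of `NA` values in that column.

```r
# Hypothetical toy data, not taken from the Divvy dataset
toy <- data.frame(
  gender = c("Male", NA, "Female"),
  tripduration = c(7.5, 12.0, NA),
  year = c(2017, 2017, 2017)
)

# is.na() gives a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# A dataset with no missing values gives zero in every column
no_missing <- all(na_counts == 0)
```

For the processed Divvy data, the report above states that every column count is zero.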
- In line with our objective, we kept only the following variables:
- stoptime: the time when the bike trips end
- to_station_name: the name of the bike destination station
- latitude_end: part of the coordinates of the bike destination station
- longitude_end: part of the coordinates of the bike destination station
- gender: male or female bikers
dataset <- dataset[c(10,20,21,22,8)]
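Selecting by position depends on the column order of the CSV file. A possible alternative, sketched here on a hypothetical one-row stand-in for the raw data, is to select by name using the variable names listed above. The values in `toy` are invented, and the assumption is that the real file uses exactly these column names.

```r
# Hypothetical one-row stand-in for the raw data; only the column names matter here
toy <- data.frame(
  trip_id = 1L,
  gender = "Male",
  stoptime = "2017-06-30 17:45:00",
  to_station_name = "Canal St & Adams St",
  latitude_end = 41.879,     # invented coordinate
  longitude_end = -87.640,   # invented coordinate
  stringsAsFactors = FALSE
)

# Keep the five variables of interest, in the same order as the positional version
keep <- c("stoptime", "to_station_name", "latitude_end", "longitude_end", "gender")
selected <- toy[keep]
names(selected)
```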
- We separated the date and time from the variable stoptime into two columns respectively.
dataset <- separate(dataset, stoptime, into=c("arr_date","time"),sep=" ")
head(dataset)
- For the purpose of our analysis, we extracted the arrival hour from the time variable and dropped the minute and second columns.
dataset <- separate(dataset, time, into=c("arr_hour","min","sec"),sep=":")
dataset <- dataset[-c(3,4)]
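An alternative to splitting the string twice is to parse `stoptime` as a date-time and read the date and hour from it. The sketch below assumes the timestamps look like `2017-06-30 17:45:00` and uses invented values; the time zone is fixed to UTC only to keep the example deterministic.

```r
# Invented timestamps in the assumed "YYYY-MM-DD HH:MM:SS" layout
stoptime <- c("2017-06-30 17:45:00", "2017-12-31 23:58:10", "2018-01-01 00:04:30")

parsed <- as.POSIXct(stoptime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

arr_date <- as.Date(parsed, tz = "UTC")        # calendar date of arrival
arr_hour <- as.numeric(format(parsed, "%H"))   # hour of arrival, 0 to 23

data.frame(arr_date, arr_hour)
```

Parsing also turns malformed timestamps into `NA`, which makes them easier to spot than with string splitting.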
- We renamed the destination coordinate columns to latitude and longitude.
colnames(dataset)[c(4,5)] <- c("latitude","longitude")
- We created a new column, total_trips, that counts the number of trips for each combination of date, hour, destination station and gender.
dataset$arr_date <- as.Date(dataset$arr_date)
dataset$arr_hour <- as.numeric(dataset$arr_hour)
dataset <- dataset %>%
  group_by(arr_date, arr_hour, to_station_name, latitude, longitude, gender) %>%
  summarise(total_trips = n()) %>%
  glimpse()
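To make the aggregation concrete, here is a minimal base R sketch of the same counting idea on invented rows (one row per trip). It uses `aggregate()` rather than the dplyr pipeline, and the station names and values are illustrative only.

```r
# Invented trip-level rows, one row per trip
trips <- data.frame(
  arr_date = as.Date(c("2017-07-01", "2017-07-01", "2017-07-01", "2017-07-02")),
  arr_hour = c(8, 8, 17, 8),
  to_station_name = c("Station A", "Station A", "Station A", "Station B"),
  gender = c("Male", "Male", "Female", "Male"),
  stringsAsFactors = FALSE
)

# Give every trip a weight of 1, then sum the weights within each group
trips$total_trips <- 1
counts <- aggregate(total_trips ~ arr_date + arr_hour + to_station_name + gender,
                    data = trips, FUN = sum)
counts
```

The two morning trips by male riders to Station A collapse into a single row with `total_trips` equal to 2, while the other two trips remain as rows of 1.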
Exploratory Data Analysis
- We explored the summary statistics of our processed dataset.
- Arrival dates range from 1 January 2017 to 1 January 2018. The 2018 date presumably comes from trips that started late on 31 December 2017 and ended after midnight.
- The median arrival hour is around 3 pm, meaning that about half of the rows in the processed dataset have an arrival hour of 3 pm or earlier.
- By gender, male bikers outnumber female bikers by around 34%.
- Lastly, the total number of trips for a given date, hour, station and gender ranges from 1 to 98. The mean is 1.924, and 75% of these groups have at most 2 trips.
summary(dataset)
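Because each row of the processed dataset is a group of trips rather than a single trip, the median that `summary()` reports for `arr_hour` is a median over rows. A trip-level median needs the rows weighted by `total_trips`. The sketch below shows the difference on invented numbers; `weighted_median` is a hypothetical helper written for this example.

```r
# Invented grouped data: arrival hour and number of trips in each group
arr_hour    <- c(7, 8, 15, 17, 22)
total_trips <- c(1, 1, 1, 10, 1)

# Median over rows, which is what summary() reports
row_median <- median(arr_hour)

# Hypothetical helper: smallest value at which the cumulative share of
# weight reaches one half
weighted_median <- function(x, w) {
  ord <- order(x)
  cum_share <- cumsum(w[ord]) / sum(w)
  x[ord][which(cum_share >= 0.5)[1]]
}

trip_median <- weighted_median(arr_hour, total_trips)
c(row_median = row_median, trip_median = trip_median)
```

In this toy case the busy 5 pm group pulls the trip-level median later than the row-level median, so the two figures should not be read interchangeably.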
- Visualizing the distribution of the daily number of bike trips by gender in 2017.
We can observe that the peak period is similar for both genders, from mid-June to October 2017. Male riders can exceed 10,000 bike trips in a day during the peak season.
library(ggplot2)
dately <- dataset %>%
  group_by(arr_date, gender) %>%
  summarise(total = sum(total_trips))
ggplot(dately, aes(x = arr_date, y = total)) +
  geom_jitter(aes(color = gender)) +
  theme_light()
- Visualizing bike trips according to bike stations
This is a basic view using a tree map. From the diagram, we can see that the more popular destination stations are Canal St & Adams St, Clinton St & Madison St, Clinton St & Washington Blvd, Kingsbury St & Kinzie St, Canal St & Madison St, Daley Center Plaza, Michigan Ave & Washington St, and Franklin St & Monroe St.
station <- dataset %>%
  group_by(to_station_name) %>%
  summarise(total = sum(total_trips))
sample_n(station, 10)
library(treemap)
treemap(station,index = "to_station_name",vSize = "total")
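A ranked table can complement the tree map when reading off the busiest stations. The sketch below sorts a small invented `station` table by `total`; the names and counts are placeholders, not results from the Divvy data.

```r
# Invented per-station totals with the same column names as the station table
station <- data.frame(
  to_station_name = c("Station A", "Station B", "Station C", "Station D"),
  total = c(120, 950, 40, 610),
  stringsAsFactors = FALSE
)

# Sort by total trips, largest first, and keep the top three
ranked <- station[order(station$total, decreasing = TRUE), ]
top3 <- head(ranked, 3)
top3
```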
- Visualizing the peak number of bike trips by date
Most of the high-volume days occurred between mid-June and October 2017. Hence, summer to autumn is the most preferred period for biking in Chicago.
dateterm <- dataset %>%
  group_by(arr_date) %>%
  summarise(total = sum(total_trips))
ggplot(dateterm, aes(x = arr_date, y = total, fill = total)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(low = "gray", high = "blue") +
  theme_light() +
  coord_polar() +
  labs(title = "Polar Area Diagram", subtitle = "Daily total number of trips", x = "Date", y = "Total trips")
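One way to check the seasonal pattern numerically is to total the daily counts by month. This is a sketch on invented daily totals; with the real data, the `dateterm` table would take the place of `daily`.

```r
# Invented daily totals (placeholders, not Divvy results)
daily <- data.frame(
  arr_date = as.Date(c("2017-01-15", "2017-01-16", "2017-07-15",
                       "2017-07-16", "2017-10-01")),
  total = c(900, 1100, 14000, 15000, 9000)
)

# Label each day with its month number, then sum the totals within each month
daily$month <- format(daily$arr_date, "%m")
monthly <- tapply(daily$total, daily$month, sum)
monthly

# Month with the largest total
busiest_month <- names(monthly)[which.max(monthly)]
```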
- Visualizing the number of trips across the 24 hours of the day
Most riders preferred to bike around 7-8 am and 4-6 pm. The highest numbers of bike trips occurred around 8 am and 5 pm. Although few, there are some bike trips close to midnight. This is useful information for the bike share company, which needs to ensure that its bikes are safe to use late at night.
hourly <- dataset %>%
  group_by(arr_hour) %>%
  summarise(total = sum(total_trips))
ggplot(hourly, aes(x = arr_hour, y = total, fill = total)) +
  geom_bar(stat = "identity") +
  theme_light() +
  coord_polar() +
  scale_fill_gradient(low = "gray", high = "blue") +
  labs(title = "Polar Area Diagram", subtitle = "Hourly number of trips", x = "Hour", y = "Total trips")
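The two commuting peaks can also be read from the table rather than the chart. The sketch below uses invented hourly totals shaped like a two-peak day and finds the busiest arrival hour before noon and from noon onwards; the numbers are placeholders, not Divvy results.

```r
# Invented hourly totals for hours 0 to 23
hourly <- data.frame(
  arr_hour = 0:23,
  total = c(5, 3, 2, 1, 2, 10, 40, 90, 120, 60, 45, 50,
            55, 50, 48, 60, 100, 140, 110, 70, 40, 25, 15, 8)
)

# Hour with the most arrivals before noon and from noon onwards
am <- hourly[hourly$arr_hour < 12, ]
pm <- hourly[hourly$arr_hour >= 12, ]
am_peak <- am$arr_hour[which.max(am$total)]
pm_peak <- pm$arr_hour[which.max(pm$total)]
c(am_peak = am_peak, pm_peak = pm_peak)
```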
A quick look at the Grab a Bike app
Note: For the construction of the Shiny app, please refer to the ui.r and server.r files, which are available in the same GitHub account.
