UBER PICKUPS IN NEW YORK CITY

Data visualization using R

Contributors:

Shon Inouye
Edward Kim

Introduction

This repo is dedicated to create data visualization for uber and other for-hire vehicle pickups in New York City. We used datasets from Kaggle. The goal of this project was to visualize the data in different ways and point out any interesting discoveries. One thing we sought to discover was the meaning behind the base codes that accompanied each uber pickup. A quick search led to these codes being associated with several of Uber's bases.

Base Code	Base Name
B02512	Unter
B02598	Hinter
B02617	Weiter
B02682	Schmecken
B02764	Danach-NY
B02765	Grun
B02835	Dreist
B02836	Drinnen

These are Uber's bases located in New York. Each uber pickup is affiliated with a TLC (Taxi and Limousine Commission) company base.

Loading Packages

library(ggplot2)
library(plotly) #used along with ggplot2 for data visualization.
library(ggmap)  #used for geocoding
library(plyr)   #used along with dplyr to aggregate data
library(dplyr)

Plot of Total Uber Pickups

First, we read the data taken from Kaggle.com. In order to keep the plot simple, we only used Uber pickup data from April 2014.

# Read data
uber_apr14 <- read.csv("/Users/Shon/Documents/MyProjects/Data Science/uber data visualization/uber-pickups-in-new-york-city/uber-raw-data-apr14.csv")

After reading the data, added a column for the day of the week. This will be used later on to plot the data over the course of the month.

# Remove minutes and seconds from Date.Time and create Day column
uber_apr14$Date.Time <- as.Date(uber_apr14$Date.Time, "%m/%d/%Y")
uber_apr14$Day <- format(as.Date(uber_apr14$Date.Time, format = "%m/%d/%Y"), "%d") #adds a Day column

We then got a map of New York City using the ggmap package.

# Get map of New York City
NY <- get_map(location = c(lon = mean(uber_apr14$Lon), lat = mean(uber_apr14$Lat)), 
             zoom = 10, 
             maptype = "roadmap", 
             color = "bw")
NYmap <- ggmap(NY)

Using the map of New York City as the background, we plotted the Uber pickup locations. Because we used the "city" zoom value for our map of New York City, we decided that all pickup locations outside the map would be labeled as outliers.

# Plot Uber pickup locations on top of New York map
fullMap <- NYmap + geom_point(data = uber_apr14,
                                 aes(x = Lon, y = Lat),
                                 colour = '#000066',
                                 size = 0.1, alpha = 0.5, na.rm = TRUE)
fullMap

Plot of Uber Pickups by TLC Base Code

With every Uber pickup, there is a TLC base company code that is associated with it. To get a better understanding of what these codes mean, we created a bar graph of the number of pickups for each base code.

# SEPARATING DATA BASED ON UBER BASE 
uber_baseB02512 <- filter(uber_apr14, Base == "B02512")
uber_baseB02598 <- filter(uber_apr14, Base == "B02598")
uber_baseB02617 <- filter(uber_apr14, Base == "B02617")
uber_baseB02682 <- filter(uber_apr14, Base == "B02682")
uber_baseB02764 <- filter(uber_apr14, Base == "B02764")

# CREATE BAR GRAPH OF UBER DATA BY BASE
numrow_B02512 <- nrow(uber_baseB02512)
numrow_B02598 <- nrow(uber_baseB02598)
numrow_B02617 <- nrow(uber_baseB02617)
numrow_B02682 <- nrow(uber_baseB02682)
numrow_B02764 <- nrow(uber_baseB02764)

# Create bar graph using plotly
basedata <- plot_ly(x = c("B02512", "B02598", "B02617", "B02682", "B02764"),
                y = c(numrow_B02512, numrow_B02598, numrow_B02617, numrow_B02682, numrow_B02764),
                name = "Uber Pickups by TLC Base Code",
                type = "bar")
basedata

Here, we can see that there is a very large difference between the number of pickups for each base code. Base codes B02682 (Schmecken) and B02598 (Hinter) have over 150,000 pickups in the month of April while base codes B02512 (Unter) and B02764 (Danach-NY) do not even reach 50,000 pickups. Now, we can plot the pickup points on a map and distinguish them based on their base codes.

# PLOT BY BASES
base_plot <- NYmap + 
  geom_point(data = uber_apr14, 
         mapping = aes(x = Lon, y = Lat, colour = factor(uber_apr14$Base)), 
         size = 0.1, alpha = 0.4, na.rm = TRUE) + 
scale_color_manual(values = c("#CA226B", "#7F462c", "#FF0033", "#2B60DE", "#008080"))

base_plot

Plot of Uber Pickups for each Individual Base Code

We can also plot the pickups for each base individually in order to see which areas these codes cover.

### PLOT DATA BY UBER BASE ###
#Base B02512
base_B02512 <- nyMap + 
  geom_point(data = uber_baseB02512, 
         mapping = aes(x = Lon, y = Lat,  colour = factor(uber_baseB02512$Base)), 
         alpha = 0.4, na.rm = TRUE) +
  scale_color_manual(values = c("#008080"))

#Base B02598
base_B02598 <- nyMap + 
  geom_point(data = uber_baseB02598, 
         mapping = aes(x = Lon, y = Lat,  colour = factor(uber_baseB02598$Base)), 
         alpha = 0.4, na.rm = TRUE) +
  scale_color_manual(values = c("#7F462c"))

#Base B02617
base_B02617 <- nyMap + 
  geom_point(data = uber_baseB02617, 
         mapping = aes(x = Lon, y = Lat,  colour = factor(uber_baseB02617$Base)), 
         alpha = 0.4, na.rm = TRUE) +
  scale_color_manual(values = c("#FF2400"))

#Base B02682
base_B02682 <- nyMap + 
  geom_point(data = uber_baseB02682, 
         mapping = aes(x = Lon, y = Lat,  colour = factor(uber_baseB02682$Base)), 
         alpha = 0.4, na.rm = TRUE) +
  scale_color_manual(values = c("#2B60DE"))

#Base B02764
base_B02764 <- nyMap + 
  geom_point(data = uber_baseB02764, 
         mapping = aes(x = Lon, y = Lat,  colour = factor(uber_baseB02764$Base)), 
         alpha = 0.4, na.rm = TRUE) +
  scale_color_manual(values = c("#CA226B"))

plot(base_B02512)
plot(base_B02598)
plot(base_B02617)
plot(base_B02682)
plot(base_B02764)

Using Plotly to See Pickups Over Time

uber_apr14$Date.Time <- as.Date(uber_apr14$Date.Time, "%m/%d/%Y")
uber_apr14$Day <- format(as.Date(uber_apr14$Date.Time, format = "%m/%d/%Y"), "%d") #adds a Day column
#counting the rides for each day
newber <- count(uber_apr14, as.numeric(Day))
colnames(newber)[1] <- "Day"
colnames(newber)[2] <- "Number of Rides"


line <- ggplot(newber, aes(x = Day, y = `Number of Rides`)) +
 geom_area(alpha = 1, position = position_dodge(width = 0.05), color = '#99cc00', fill = '#787a76') +
xlab("Day") + ylab("Total Rides per Day") +
ggtitle("Total Uber Rides in New York City in the month of April") + 
theme_minimal()

ggplotly(line)

We created a new column in our dataset that had each uber pickup with the day it occured. This was necessary because the original date was in a format that was not compatible. Once we had the day for each pickup, we could sum the number of rides on for a particular day and graph it to create an interactive linegraph.

We chose specifically the month of april, not for any particular reason, in order to showcase a linegraph produced through plotly. We see that across the course of the month, the total number of rides fluctuate between highs and lows, and we see that at the end of the month there is a spike in usage.

Conclusion

Given our dataset, we produced points on a map, a histogram, and a line graph. Our original goal was to visualize the data in order to see any interesting observations. By geocoding, we were able to clearly see which areas of New York were more dense in activity. If the maps by base were not enough, the histogram clearly shows which Uber bases complete more pickups. Finally, the line graph allows us to see if there are any trends within the pickup activity.

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
IMAGE		IMAGE
IMAGES		IMAGES
Images		Images
Inertia7_uberDataVisualization.md		Inertia7_uberDataVisualization.md
README.md		README.md
uber_data_visualization.R		uber_data_visualization.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

UBER PICKUPS IN NEW YORK CITY

Data visualization using R

Contributors:

Introduction

Loading Packages

Plot of Total Uber Pickups

Plot of Uber Pickups by TLC Base Code

Plot of Uber Pickups for each Individual Base Code

Using Plotly to See Pickups Over Time

Conclusion

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

shoninouye/DataVisualization_Uber_R

Folders and files

Latest commit

History

Repository files navigation

UBER PICKUPS IN NEW YORK CITY

Data visualization using R

Contributors:

Introduction

Loading Packages

Plot of Total Uber Pickups

Plot of Uber Pickups by TLC Base Code

Plot of Uber Pickups for each Individual Base Code

Using Plotly to See Pickups Over Time

Conclusion

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages