The Cyclistic Bike Share Case Study is a capstone project for the Google Data Analytics Professional Certificate on Coursera. It demonstrates skills from the course by following the classic data analysis process: Ask, Prepare, Process, Analyze, Share and Act.
Cyclistic is a bike-share company based in Chicago, IL. It first launched in 2016 and has since grown significantly to a fleet of 5,824 bicycles with a network of 692 geo-tracked stations across the city. The bikes can be unlocked from any station and returned to any other station in the system at any time. This system is designed to encourage cycling as a mode of transportation, allowing users to rent bikes from any location at their convenience, without the challenges of owning, maintaining and storing their own bicycle. Cyclistic offers flexible pricing plans such as single-ride passes, full-day passes and annual memberships.
Cyclistic’s marketing strategy started with building general awareness to reach broad consumer segments. While previous marketing campaigns were designed to target all-new customers, the company’s marketing director believes there is a solid opportunity to convert casual riders into members, and that maximizing the number of annual memberships will be the key to future growth.
I am a junior data analyst working on the marketing analyst team at Cyclistic. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. The team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, my team will design a new marketing strategy to convert casual riders into annual members.
Director of Marketing (my manager) is responsible for developing campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
Cyclistic Marketing Analytics Team (my team) is responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy.
Cyclistic Executive Team is notoriously detail-oriented and will decide whether to approve the recommended marketing program.
I am assigned the first of three main questions that will guide Cyclistic’s future marketing program: how do annual members and casual riders use Cyclistic bikes differently?
Identify the data needed to complete analysis
Cyclistic is a fictional company based on divvybikes.com.
This case study uses public data made available by Motivate International Inc. under this license.
I used RStudio to write R code to complete all phases of this analysis, including loading, processing (cleaning) and building visualizations. I produced this report using R Markdown, enhanced with HTML/CSS.
Load, format, and clean data
R packages referenced in this project: pacman, tidyverse, readr, tidyr, dplyr, ggplot2, here, knitr, scales, kableExtra, glue
if (requireNamespace("pacman", quietly = TRUE)) {
  pacman::p_load(
    tidyverse, readr, tidyr, dplyr, ggplot2,
    here, knitr, scales, kableExtra, glue
  )
} else {
  stop("Please install the 'pacman' package before knitting.")
}
Retrieve 12 monthly .csv files (Jan - Dec 2024) to create a single dataset called tripdata
# Load 12 monthly csv files from divvy tripdata
tripdata_01_2024 <- read_csv(here("data","x202401-divvy-tripdata.csv"))
tripdata_02_2024 <- read_csv(here("data","x202402-divvy-tripdata.csv"))
tripdata_03_2024 <- read_csv(here("data","x202403-divvy-tripdata.csv"))
tripdata_04_2024 <- read_csv(here("data","x202404-divvy-tripdata.csv"))
tripdata_05_2024 <- read_csv(here("data","x202405-divvy-tripdata.csv"))
tripdata_06_2024 <- read_csv(here("data","x202406-divvy-tripdata.csv"))
tripdata_07_2024 <- read_csv(here("data","x202407-divvy-tripdata.csv"))
tripdata_08_2024 <- read_csv(here("data","x202408-divvy-tripdata.csv"))
tripdata_09_2024 <- read_csv(here("data","x202409-divvy-tripdata.csv"))
tripdata_10_2024 <- read_csv(here("data","x202410-divvy-tripdata.csv"))
tripdata_11_2024 <- read_csv(here("data","x202411-divvy-tripdata.csv"))
tripdata_12_2024 <- read_csv(here("data","x202412-divvy-tripdata.csv"))
# Combine to make single data set called tripdata
tripdata <- bind_rows(
tripdata_01_2024,
tripdata_02_2024,
tripdata_03_2024,
tripdata_04_2024,
tripdata_05_2024,
tripdata_06_2024,
tripdata_07_2024,
tripdata_08_2024,
tripdata_09_2024,
tripdata_10_2024,
tripdata_11_2024,
tripdata_12_2024
)
# Show total number of records
glue("<br>**Total Records:** {comma(nrow(tripdata))}<br>")
# Pare data to only the columns used for analysis
tripdata <- tripdata %>%
select(
ride_id,
rideable_type,
started_at,
ended_at,
member_casual
)
# Create metadata table to describe specific columns to be used for analysis
column_metadata <- tibble::tibble(
Column = c("ride_id", "rideable_type", "started_at", "ended_at", "member_casual"),
`Data Type` = c("character", "character", "POSIXct", "POSIXct", "character"),
description = c(
"Unique ID for each ride",
"Type of bike used",
"Ride start timestamp",
"Ride end timestamp",
"Rider type (casual or member)"
)
)
# Show column names, data type, description using kableExtra
column_metadata %>%
kable(
caption = "Pared columns used for analysis"
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
)
# Add vertical spacing
glue("<br>")
| Column | Data Type | description |
|---|---|---|
| ride_id | character | Unique ID for each ride |
| rideable_type | character | Type of bike used |
| started_at | POSIXct | Ride start timestamp |
| ended_at | POSIXct | Ride end timestamp |
| member_casual | character | Rider type (casual or member) |
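The twelve near-identical `read_csv()` calls above could also be written as a loop over file paths. The following is a minimal sketch of that idea in base R, so it runs without extra packages. It uses two tiny temporary files rather than the real Divvy data, and the helper name `read_monthly_files` and the demo file names are assumptions made for this illustration only.

```r
# Hypothetical helper: read every CSV in `paths` and stack them row-wise.
read_monthly_files <- function(paths) {
  monthly <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, monthly)
}

# Toy stand-ins for two monthly files (made-up values, not real trip data)
tmp_dir <- tempfile("tripdata_demo_")
dir.create(tmp_dir)
write.csv(
  data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
  file.path(tmp_dir, "202401-demo-tripdata.csv"), row.names = FALSE
)
write.csv(
  data.frame(ride_id = "B1", member_casual = "member"),
  file.path(tmp_dir, "202402-demo-tripdata.csv"), row.names = FALSE
)

# List the files in name order, then combine them into one data frame
demo_paths <- sort(list.files(tmp_dir, pattern = "tripdata\\.csv$", full.names = TRUE))
demo_tripdata <- read_monthly_files(demo_paths)
nrow(demo_tripdata)
```

With the tidyverse already loaded, the same pattern is usually written by mapping `read_csv()` over the paths and passing the resulting list to `bind_rows()`.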
The ride_id values are the unique identifiers for each tracked bike trip, so any repeated ride_id is treated as a duplicate record and removed.
# Count number of ride_id duplicates
dupe_ride_id_count <- sum(duplicated(tripdata$ride_id))
glue("<br>**Duplicate records removed:** {dupe_ride_id_count}<br>")
# Remove duplicate ride_ids from tripdata
tripdata <- tripdata %>%
distinct(ride_id, .keep_all = TRUE)
Duplicate records removed: 211
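To make the duplicate check concrete, here is a small sketch on made-up ride IDs. It uses base R's `duplicated()` for both the count and the removal; as far as I understand, keeping the rows where `!duplicated()` is TRUE retains the first occurrence of each ID, which is also what `distinct(ride_id, .keep_all = TRUE)` does. The object names are hypothetical.

```r
# Toy rides with one repeated ride_id (made-up values)
demo_rides <- data.frame(
  ride_id = c("R1", "R2", "R2", "R3"),
  member_casual = c("member", "casual", "casual", "member"),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences of each ID
dupe_count <- sum(duplicated(demo_rides$ride_id))

# Keep the first occurrence of every ride_id
deduped <- demo_rides[!duplicated(demo_rides$ride_id), ]

# The number of rows removed should equal the duplicates counted
rows_removed <- nrow(demo_rides) - nrow(deduped)
```

Comparing `rows_removed` with `dupe_count` is a quick sanity check that the reported figure matches what was actually dropped.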
Add calculated columns: trip_duration, month, day_of_week, hour_of_day, day_type
tripdata <- tripdata %>%
  mutate(
    trip_duration = as.numeric(difftime(ended_at, started_at, units = "mins")), # ride length in minutes (end - start)
    month = format(started_at, "%b"), # abbreviated month of start time
    day_of_week = weekdays(started_at), # day of week of start time
    hour_of_day = as.integer(format(started_at, "%H")), # hour of ride start (0 - 23), stored as integer
    day_type = ifelse(day_of_week %in% c("Saturday", "Sunday"), "Weekend", "Weekday") # weekend or weekday
  )
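The calculated columns can be checked by hand on a couple of made-up rides. One caveat in this sketch is my own observation rather than part of the original analysis: `weekdays()` and `format(x, "%b")` return names in the session's language, so the comparison against "Saturday" and "Sunday" assumes an English locale. The `%u` format code (ISO weekday number) is shown as one possible locale-independent alternative.

```r
# Two made-up rides; tz = "UTC" keeps the example reproducible
started_at <- as.POSIXct(c("2024-06-01 08:15:00", "2024-06-03 17:40:00"), tz = "UTC")
ended_at   <- as.POSIXct(c("2024-06-01 08:45:30", "2024-06-03 17:52:00"), tz = "UTC")

# Duration in minutes, as in the mutate() step
trip_duration <- as.numeric(difftime(ended_at, started_at, units = "mins"))

# Hour of day (0 - 23) taken from the start time
hour_of_day <- as.integer(format(started_at, "%H"))

# "%u" gives the ISO weekday number (1 = Monday ... 7 = Sunday),
# which does not depend on the session's language settings
iso_weekday <- as.integer(format(started_at, "%u"))
day_type <- ifelse(iso_weekday %in% c(6, 7), "Weekend", "Weekday")
```

The first ride starts on a Saturday and lasts 30.5 minutes; the second starts on a Monday and lasts 12 minutes.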
# Save off trips to be excluded from tripdata
excluded_trips <- tripdata %>%
filter(trip_duration < 1 | trip_duration > 1440) %>%
mutate(reason = case_when(
trip_duration < 1 ~ "Too Short (<1 min)",
trip_duration > 1440 ~ "Too Long (>1 day)"
))
# Summarize number of excluded trips by reason, then add a total row
excluded_summary <- excluded_trips %>%
count(reason, name = "Trip Count") %>%
bind_rows(
tibble(reason = "Total Trips Excluded Based on Duration Criteria", `Trip Count` = sum(.$`Trip Count`))
)
# Add vertical spacing
glue("<br>")
# Display summary table using kableExtra
excluded_summary %>%
kable() %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
) %>%
row_spec(nrow(excluded_summary), bold = TRUE)
# Remove excluded trips
tripdata <- tripdata %>%
filter(trip_duration >= 1, trip_duration <= 1440)
# Add vertical spacing
glue("<br>")
| reason | Trip Count |
|---|---|
| Too Long (>1 day) | 7553 |
| Too Short (<1 min) | 131530 |
| Total Trips Excluded Based on Duration Criteria | 139083 |
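The two exclusion counts add up to the reported total (7,553 + 131,530 = 139,083). The sketch below applies the same duration rule to made-up values that sit on and around the cutoffs, to show that trips of exactly 1 minute and exactly 1,440 minutes are kept. The variable names are hypothetical.

```r
# Made-up durations in minutes, chosen to sit on and around the cutoffs
demo_duration <- c(0.5, 1, 30, 1440, 1440.5)

# Same rule as the filter(): keep trips from 1 minute up to 1,440 minutes
keep <- demo_duration >= 1 & demo_duration <= 1440

# Label the excluded trips the way the summary table does
reason <- ifelse(demo_duration < 1, "Too Short (<1 min)",
                 ifelse(demo_duration > 1440, "Too Long (>1 day)", NA))

# Count excluded trips by reason (kept trips are NA and dropped by table())
table(reason)
```

Every trip is either kept or assigned exactly one reason, so the kept and excluded counts together equal the number of input rows.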
# Add vertical spacing
glue("<br>")
# Create summary table
rideable_summary <- tripdata %>%
count(rideable_type, name = "Trip Count") %>%
bind_rows(
summarise(., rideable_type = "Total", `Trip Count` = sum(`Trip Count`))
)
# Display summary table using kableExtra
rideable_summary %>%
kable() %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
) %>%
row_spec(nrow(rideable_summary), bold = TRUE)
# Add vertical spacing
glue("<br>")
| rideable_type | Trip Count |
|---|---|
| classic_bike | 2714623 |
| electric_bike | 2869067 |
| electric_scooter | 137584 |
| Total | 5721274 |
# Identify electric scooter rentals to be excluded from tripdata
electric_scooters <- tripdata %>%
filter(rideable_type == "electric_scooter") %>%
count(month, name = "Count") %>%
bind_rows(
summarize(.,
month = "Electric Scooter Trips Excluded",
Count = sum(Count))
)
# Add vertical spacing
glue("<br>")
# Display electric scooters summary table using kableExtra
electric_scooters %>%
kable() %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE
) %>%
row_spec(nrow(electric_scooters), bold = TRUE)
# Remove electric scooters from tripdata
tripdata <- tripdata %>%
filter(rideable_type != "electric_scooter")
# Add vertical spacing after the table
glue("<br>")
| month | Count |
|---|---|
| Aug | 82 |
| Sep | 137502 |
| Electric Scooter Trips Excluded | 137584 |
# Show total number of records after cleaning
glue("<br>**Total Records:** {comma(nrow(tripdata))}<br>")
# Create metadata table to describe specific columns to be used for analysis
column_metadata_cleaned <- tibble::tibble(
Column = c("ride_id", "rideable_type", "started_at", "ended_at", "member_casual", "trip_duration", "month",
"day_of_week", "hour_of_day", "day_type"),
`Data Type` = c("character", "character", "POSIXct", "POSIXct", "character", "double", "character", "character", "integer", "character"),
description = c(
"Unique ID for each ride",
"Type of bike used",
"Ride start timestamp",
"Ride end timestamp",
"Rider type (casual or member)",
"Calculate trip duration in minutes",
"Month of ride start",
"Day of week of ride start",
"Hour of ride start",
"Day type (weekend or weekday)"
)
)
# Show column names, data type, description using kableExtra
column_metadata_cleaned %>%
kable(caption = "Trip Data (cleaned): Column Descriptions") %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE)
# Add vertical spacing after the table
glue("<br>")
| Column | Data Type | description |
|---|---|---|
| ride_id | character | Unique ID for each ride |
| rideable_type | character | Type of bike used |
| started_at | POSIXct | Ride start timestamp |
| ended_at | POSIXct | Ride end timestamp |
| member_casual | character | Rider type (casual or member) |
| trip_duration | double | Calculate trip duration in minutes |
| month | character | Month of ride start |
| day_of_week | character | Day of week of ride start |
| hour_of_day | integer | Hour of ride start |
| day_type | character | Day type (weekend or weekday) |
Create visualizations to gather insights
# Summarize trip counts and compute share
rider_share <- tripdata %>%
count(member_casual) %>%
mutate(
share = n / sum(n),
label = paste0(comma(n), "\n", percent(share, accuracy = 0.1))
)
# Add vertical spacing before the chart
glue("<br><br>")
# Plot pie chart
ggplot(rider_share, aes(x = "", y = share, fill = member_casual)) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar("y") +
geom_text(aes(label = label), position = position_stack(vjust = 0.5), color = "white", size = 5, fontface = "bold") +
scale_fill_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) +
labs(
title = "Share of Trips by Rider Type",
fill = "Rider Type"
) +
theme_minimal() +
theme(
axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank(),
plot.title = element_text(
face = "bold",
hjust = 0.5,
size = 18,
margin = margin(t = 15, b=10)
),
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 12)
)
Insights:
# Summarize trip counts and compute share
bike_share <- tripdata %>%
count(member_casual, rideable_type) %>%
group_by(member_casual) %>%
mutate(
share = n / sum(n),
label = paste0(comma(n), "\n", percent(share, accuracy = .1))
)
# Add vertical spacing before the chart
glue("<br><br>")
# Plot pie chart
ggplot(bike_share, aes(x = "", y = share, fill = rideable_type)) +
geom_bar(stat = "identity", width = 1, color = "white") +
coord_polar("y") +
geom_text(aes(label = label),
position = position_stack(vjust = 0.5),
color = "white", size = 5, fontface = "bold") +
scale_fill_manual(
values = c("classic_bike" = "burlywood2", "electric_bike" = "darkseagreen"),
labels = c("classic_bike" = "Classic Bike", "electric_bike" = "Electric Bike")
) +
facet_wrap(~member_casual, labeller = as_labeller(c(
"casual" = "Casual",
"member" = "Member"
))) +
labs(
title = "Share of Trips by Bike Type",
subtitle = "Split by Rider Type",
fill = "Bike Type"
) +
theme_minimal() +
theme(
axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank(),
strip.text = element_text(size = 12, face="bold"),
# Add spacing below title and subtitle
plot.title = element_text(
face = "bold",
hjust = 0.5,
size = 18,
margin = margin(b = 10)
), # bottom margin
plot.subtitle = element_text(
face = "bold",
hjust = 0.5,
size = 14,
margin = margin(b = 30)
),
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 12)
)
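Because `group_by(member_casual)` comes before the `mutate()`, each share in this chart is a share within one rider type, not a share of all trips. A minimal base R sketch of the same calculation on made-up counts follows; `ave()` stands in for the grouped `mutate()`, and all values and object names are illustrative assumptions.

```r
# Made-up trip counts by rider type and bike type
demo_counts <- data.frame(
  member_casual = c("casual", "casual", "member", "member"),
  rideable_type = c("classic_bike", "electric_bike", "classic_bike", "electric_bike"),
  n = c(40, 60, 150, 50),
  stringsAsFactors = FALSE
)

# ave() applies sum() within each rider type, like group_by() + mutate()
group_total <- ave(demo_counts$n, demo_counts$member_casual, FUN = sum)
demo_counts$share <- demo_counts$n / group_total

# Each rider type's shares should add up to 1
share_check <- tapply(demo_counts$share, demo_counts$member_casual, sum)
```

Here the casual shares are 0.4 and 0.6, and the member shares are 0.75 and 0.25, so each facet's pie sums to 100%.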
Insights:
# Set month order
tripdata <- tripdata %>%
mutate(month = factor(
month,
levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
ordered = TRUE
))
# Summarize trip counts by month and rider type
monthly_counts <- tripdata %>%
count(month, member_casual)
# Identify min and max trips for each rider type
label_points <- monthly_counts %>%
group_by(member_casual) %>%
filter(n == max(n) | n == min(n)) %>%
ungroup()
# Add vertical spacing before the chart
glue("<br><br>")
# Plot bar chart
ggplot(monthly_counts, aes(x = month, y = n, fill = member_casual)) +
geom_col(position = "dodge") +
geom_text(
data = label_points,
aes(label = comma(n)),
position = position_dodge(width = 0.9),
vjust = -0.4,
size = 4,
fontface = "bold"
) +
scale_fill_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) +
scale_y_continuous(labels = label_number(scale = 1e-3, suffix = "K")) +
labs(
title = "Trips by Month and Rider Type",
y = NULL,
x = NULL,
fill = "Rider Type"
) +
theme_minimal() +
theme(
axis.text = element_text(size = 12),
plot.title = element_text(
face = "bold",
hjust = 0.5,
size = 18,
margin = margin(t = 15, b = 10)
),
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 12)
)
Insights:
# Extract date and day of week
tripdata <- tripdata %>%
mutate(
date = as.Date(started_at),
day_of_week = factor(
weekdays(date),
levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
ordered = TRUE
)
)
# Count daily trips per rider type and day of week
daily_counts <- tripdata %>%
group_by(date, day_of_week, member_casual) %>%
summarise(trips = n(), .groups = "drop")
# Average trips per day of week and rider type
avg_trip_counts <- daily_counts %>%
group_by(day_of_week, member_casual) %>%
summarise(avg_trips = mean(trips), .groups = "drop")
# Identify min/max for labels
extreme_counts <- avg_trip_counts %>%
group_by(member_casual) %>%
filter(avg_trips == max(avg_trips) | avg_trips == min(avg_trips)) %>%
ungroup()
# Add vertical spacing before the chart
glue("<br><br>")
# Plot bar chart with labels for min/max
ggplot(avg_trip_counts, aes(x = day_of_week, y = avg_trips, fill = member_casual)) +
geom_col(position = "dodge") +
geom_text(
data = extreme_counts,
aes(label = comma(round(avg_trips, 0))),
position = position_dodge(width = 0.9),
vjust = -0.5,
fontface = "bold",
size = 3.5
) +
scale_fill_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) +
scale_y_continuous(labels = label_number(scale = 1e-3, suffix = "K")) +
labs(
title = "Average Daily Trips by Day of Week and Rider Type",
y = "Avg Daily Trips",
x = NULL,
fill = "Rider Type"
) +
theme_minimal() +
theme(
plot.title = element_text(
face = "bold",
hjust = 0.5,
size = 18,
margin = margin(t = 15, b = 15)
),
axis.text = element_text(size = 12),
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 12)
)
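The average is computed in two steps: trips are first counted per calendar date, and only then averaged per day of week. Summing trips by day of week directly would depend on how many Mondays, Tuesdays, and so on fall in the period. The sketch below illustrates the difference on made-up dates, using the ISO weekday number (`%u`) so that it does not depend on locale; the names are hypothetical.

```r
# Made-up trips: one Sunday (3 trips) and two Mondays (2 and 4 trips)
trip_dates <- as.Date(c(rep("2024-06-02", 3), rep("2024-06-03", 2), rep("2024-06-10", 4)))

# Step 1: count trips per calendar date
daily <- as.data.frame(table(date = trip_dates),
                       responseName = "trips", stringsAsFactors = FALSE)
daily$iso_weekday <- format(as.Date(daily$date), "%u") # "1" = Monday, "7" = Sunday

# Step 2: average the daily counts within each weekday
avg_by_weekday <- tapply(daily$trips, daily$iso_weekday, mean)

# For comparison: raw totals, which grow with the number of Mondays observed
total_by_weekday <- tapply(daily$trips, daily$iso_weekday, sum)
```

In this toy data the Monday total (6) is twice the Sunday total (3), yet the average daily trips are the same for both days (3), which is the quantity the chart reports.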
Insights:
# Summary of average trips by day_type
trip_summary <- tripdata %>%
mutate(
date = as.Date(started_at),
day_of_week = weekdays(date),
day_type = ifelse(day_of_week %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
) %>%
group_by(date, day_type, member_casual) %>%
summarize(trip_count = n(), .groups = "drop") %>%
group_by(day_type, member_casual) %>%
summarize(avg_daily_trips = mean(trip_count), .groups = "drop")
# Add vertical spacing before the chart
glue("<br><br>")
# Plot average daily trip counts
ggplot(trip_summary, aes(x = day_type, y = avg_daily_trips, fill = member_casual)) +
geom_col(position = "dodge") +
geom_text(
aes(label = comma(round(avg_daily_trips))),
position = position_dodge(width = 0.9),
vjust = -0.5,
size = 3.5,
fontface = "bold"
) +
scale_fill_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) +
scale_y_continuous(labels = label_number(scale = 1e-3, suffix = "K")) +
labs(
title = "Average Daily Trips by Day Type and Rider Type",
x = NULL,
y = "Avg Daily Trips",
fill = "Rider Type"
) +
theme_minimal() +
theme(
plot.title = element_text(
face = "bold",
hjust = 0.5,
size = 18,
margin = margin(t = 15, b = 15)
),
axis.text = element_text(size = 12),
axis.title = element_text(size = 12),
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 12)
)
Insights:
# Prepare summarized data (filtered to Weekdays only)
hourly_avg_weekday <- tripdata %>%
mutate(
date = as.Date(started_at),
hour_of_day = as.integer(format(started_at, "%H")),
day_of_week = weekdays(date),
day_type = ifelse(day_of_week %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
) %>%
filter(day_type == "Weekday") %>%
group_by(date, hour_of_day, member_casual) %>%
summarize(trip_count = n(), .groups = "drop") %>%
group_by(hour_of_day, member_casual) %>%
summarize(avg_trips = mean(trip_count), .groups = "drop")
# Add vertical spacing before the chart
glue("<br><br>")
# Plot average weekday trips by hour, with highlighted peak hours
ggplot(hourly_avg_weekday, aes(x = hour_of_day, y = avg_trips, fill = member_casual)) +
# Highlight morning peak (7–9 AM)
annotate("rect", xmin = 6.5, xmax = 9.5, ymin = -Inf, ymax = Inf,
fill = "yellow", alpha = 0.3) +
# Highlight evening peak (4–6 PM)
annotate("rect", xmin = 15.5, xmax = 18.5, ymin = -Inf, ymax = Inf,
fill = "yellow", alpha = 0.3) +
geom_col(position = "dodge") +
scale_fill_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) +
scale_x_continuous(breaks = 0:23) +
scale_y_continuous(labels = scales::comma) +
labs(
title = "Average Weekday Trips by Hour and Rider Type",
subtitle = "Commute hours highlighted (7–9 AM, 4–6 PM)",
x = "Hour of Day",
y = "Avg Trips per Hour",
fill = "Rider Type"
) +
theme_minimal() +
theme(
plot.title = element_text(
face = "bold",
hjust = 0.5,
size = 18,
margin = margin(t = 15, b = 8)
),
plot.subtitle = element_text(
face = "bold",
hjust = 0.5,
size = 14,
margin = margin(b = 15)
),
axis.text = element_text(size = 12),
axis.title = element_text(size = 12),
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 12)
)
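One point that may be worth checking in the hourly averages: counting with `group_by()` and `n()` only produces rows for date and hour combinations that have at least one trip. If some quiet hours had no trips on some dates, the mean over the remaining rows would overstate the typical volume for those hours. I have not verified whether this occurs in the 2024 data; the sketch below only illustrates the effect on made-up trips, with hypothetical names.

```r
# Made-up weekday trips: each row is one trip (date and start hour)
demo <- data.frame(
  date = c("2024-06-03", "2024-06-03", "2024-06-03", "2024-06-04"),
  hour_of_day = c(3, 8, 8, 8),
  stringsAsFactors = FALSE
)

# Counting only observed date-hour pairs (mirrors group_by() + n())
observed <- aggregate(trips ~ date + hour_of_day,
                      data = transform(demo, trips = 1), FUN = sum)
avg_observed <- tapply(observed$trips, observed$hour_of_day, mean)

# table() keeps a zero count for every date-hour pair, including empty ones
full <- as.data.frame(table(date = demo$date, hour_of_day = demo$hour_of_day),
                      responseName = "trips", stringsAsFactors = FALSE)
avg_full <- tapply(full$trips, full$hour_of_day, mean)
```

In this toy data the 3 AM average is 1 when empty date-hours are ignored and 0.5 when they are counted as zero, while the 8 AM average is 1.5 either way. In a tidyverse pipeline, `tidyr::complete()` is one way to add the missing zero rows before averaging.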
Insights:
# Prepare summarized data for percent-based stacked bar by rider and bike type
hourly_bike_type_pct <- tripdata %>%
mutate(
date = as.Date(started_at),
hour_of_day = as.integer(format(started_at, "%H")),
day_of_week = weekdays(date),
day_type = ifelse(day_of_week %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
) %>%
filter(day_type == "Weekday") %>%
group_by(date, hour_of_day, member_casual, rideable_type) %>%
summarise(trip_count = n(), .groups = "drop") %>%
group_by(hour_of_day, member_casual, rideable_type) %>%
summarise(avg_trips = mean(trip_count), .groups = "drop")
# Add vertical spacing before the chart
glue("<br><br>")
# Plot percent-based stacked bars, split by rider type
ggplot(hourly_bike_type_pct, aes(x = hour_of_day, y = avg_trips, fill = rideable_type)) +
# Highlight morning and evening commute times
annotate("rect", xmin = 6.5, xmax = 9.5, ymin = -Inf, ymax = Inf, fill = "yellow", alpha = 0.3) +
annotate("rect", xmin = 15.5, xmax = 18.5, ymin = -Inf, ymax = Inf, fill = "yellow", alpha = 0.3) +
geom_col(position = "fill") +
facet_wrap(~ member_casual, labeller = as_labeller(c("casual" = "Casual Riders", "member" = "Members"))) +
scale_fill_manual(
values = c("classic_bike" = "burlywood2", "electric_bike" = "darkseagreen"),
labels = c("classic_bike" = "Classic Bike", "electric_bike" = "Electric Bike")
) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
scale_x_continuous(breaks = 0:23) +
labs(
title = "Bike Type Distribution by Hour of Day",
subtitle = "Percent of Classic vs Electric Bikes Used (Weekdays Only)",
x = "Hour of Day",
y = "Percent of Trips",
fill = "Bike Type"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", hjust = 0.5, size = 18, margin = margin(t = 15, b = 8)),
plot.subtitle = element_text(face = "bold", hjust = 0.5, size = 14, margin = margin(b = 15)),
axis.text.x = element_text(size = 9),
axis.text.y = element_text(size = 12),
axis.title = element_text(size = 12),
strip.text = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 12)
)
Insights:
# Ensure month is ordered
tripdata <- tripdata %>%
mutate(month = factor(
month,
levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
ordered = TRUE
))
# Calculate trip counts per month, rider type, and bike type
monthly_bike_share <- tripdata %>%
count(month, member_casual, rideable_type) %>%
group_by(month, member_casual) %>%
mutate(share = n / sum(n)) %>%
ungroup()
# Add spacing before the chart
glue("<br><br>")
# Plot chart
ggplot(monthly_bike_share, aes(x = month, y = share, fill = rideable_type)) +
geom_col(position = "fill") +
facet_wrap(~ member_casual, labeller = as_labeller(c(
"casual" = "Casual Riders", "member" = "Members"
))) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
scale_fill_manual(
values = c("classic_bike" = "burlywood2", "electric_bike" = "darkseagreen"),
labels = c("classic_bike" = "Classic Bike", "electric_bike" = "Electric Bike")
) +
labs(
title = "Bike Type Usage by Month",
subtitle = "Percent-based split between Classic and Electric Bikes",
x = "Month",
y = "Share of Monthly Trips",
fill = "Bike Type"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", hjust = 0.5, size = 18, margin = margin(t = 15, b = 8)),
plot.subtitle = element_text(face = "bold", hjust = 0.5, size = 14, margin = margin(b = 15)),
axis.text = element_text(size = 11),
axis.title = element_text(size = 12),
strip.text = element_text(size = 13, face = "bold"),
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 12)
)
Insights:
# Summarize data
monthly_duration <- tripdata %>%
group_by(month, member_casual) %>%
summarize(mean_duration = mean(trip_duration, na.rm = TRUE), .groups = "drop")
# Set order of months in summary table
monthly_duration <- monthly_duration %>%
mutate(month = factor(month,
levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
ordered = TRUE)
)
# Identify the max point for each rider type group
max_points <- monthly_duration %>%
group_by(member_casual) %>%
filter(mean_duration == max(mean_duration)) %>%
ungroup()
# Identify the min point for each rider type group
min_points <- monthly_duration %>%
group_by(member_casual) %>%
filter(mean_duration == min(mean_duration)) %>%
ungroup()
# Add vertical spacing before the chart
glue("<br><br>")
# Plot Line chart
ggplot(monthly_duration, aes(x = month, y = mean_duration, group = member_casual, color = member_casual)) +
geom_line(linewidth = 1.2) +
geom_point(size = 1) +
geom_text(
data = max_points,
aes(label = round(mean_duration, 1)), # or paste0(...) for more detail
vjust = -0.7,
fontface = "bold",
color = "black",
size = 3
) +
geom_text(
data = min_points,
aes(label = round(mean_duration, 1)),
vjust = 1.7,
fontface = "bold",
color = "black",
size = 3
) +
scale_color_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) +
scale_y_continuous(
limits = c(0, NA),
breaks = seq(0, 60, by = 5) # adjust as needed
) +
labs(
title = "Average Trip Duration by Month and Rider Type",
x = NULL,
y = "Minutes",
color = "Rider Type"
) +
theme_minimal() +
theme(
plot.title = element_text(
face = "bold",
hjust = 0.5,
size = 18,
margin = margin(t = 15, b = 20)
),
axis.text = element_text(size = 12),
axis.title = element_text(size = 12),
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 12)
)
# Add vertical spacing after the chart
glue("<br><br>")
Insights:
# Summarize by day of week
weekday_duration <- tripdata %>%
mutate(day_of_week = weekdays(as.Date(started_at))) %>%
group_by(day_of_week, member_casual) %>%
summarize(mean_duration = mean(trip_duration, na.rm = TRUE), .groups = "drop")
# Set order of days of week
weekday_duration <- weekday_duration %>%
mutate(day_of_week = factor(day_of_week,
levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
ordered = TRUE))
# Identify max point for each rider type
max_weekday_points <- weekday_duration %>%
group_by(member_casual) %>%
filter(mean_duration == max(mean_duration)) %>%
ungroup()
# Identify min point for each rider type
min_weekday_points <- weekday_duration %>%
group_by(member_casual) %>%
filter(mean_duration == min(mean_duration)) %>%
ungroup()
# Add vertical spacing before the chart
glue("<br><br>")
# Create plot
ggplot(weekday_duration, aes(x = day_of_week, y = mean_duration, group = member_casual, color = member_casual)) +
geom_line(linewidth = 1.2) +
geom_point(size = 1) +
geom_text(
data = max_weekday_points,
aes(label = round(mean_duration, 1)),
vjust = -0.7,
fontface = "bold",
color = "black",
size = 4
) +
geom_text(
data = min_weekday_points,
aes(label = round(mean_duration, 1)),
vjust = 1.7,
fontface = "bold",
color = "black",
size = 4
) +
scale_color_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) +
scale_y_continuous(
limits = c(0, NA),
breaks = seq(0, 60, by = 5)
) +
labs(
title = "Average Trip Duration by Day of Week and Rider Type",
x = NULL,
y = "Minutes",
color = "Rider Type"
) +
theme_minimal() +
theme(
plot.title = element_text(
face = "bold",
hjust = 0.5,
size = 18,
margin = margin(t = 15, b = 20)
),
axis.text = element_text(size = 12),
axis.title = element_text(size = 12),
legend.title = element_text(size = 12, face = "bold"),
legend.text = element_text(size = 12)
)
Insights:
Based on the analysis of rider behavior, the following marketing strategies are recommended to encourage casual riders to convert to annual Cyclistic members:
Seasonal Membership Offers
Introduce flexible or short-term membership plans (e.g., 3-month or summer-only) to appeal to casual riders who prefer biking during fair-weather months.
Commuter-focused Incentives
Promote weekday riding with incentives like reduced weekday rates or early morning discounts. Encourage casual riders to see Cyclistic as a viable commuting solution.
Membership Loyalty Program
Launch a points-based rewards system to incentivize frequent usage. Riders can earn perks (e.g., free rides, swag, partner discounts) for reaching ride milestones.
Night Rider Promotions
Since electric bike usage increases late at night, offer “Night Owl” discounts or event-based promotions (e.g., concert or nightlife partnerships) to boost off-peak ridership.
Weekend-Driven Conversions
Capitalize on heavy weekend usage by casual riders with targeted in-app ads or pop-ups showcasing how membership benefits can enhance weekend exploration and save money.
Social Media Engagement
Use member testimonials, user-generated content, and influencer partnerships to highlight the convenience, savings, and lifestyle appeal of being a Cyclistic member.
This analysis offers valuable insights into the behaviors and preferences of both Cyclistic members and casual riders. By tailoring marketing strategies based on these findings, Cyclistic is well-positioned to drive higher conversion of casual riders into loyal members.
To maximize impact, future initiatives should continue to be guided by data, aligned with rider trends, and tested through pilot campaigns. With a deeper understanding of when, how, and why riders engage with Cyclistic, the organization can grow its member base and promote long-term ridership sustainability.