Introduction

The Cyclistic Bike Share Case Study is a capstone project for the Google Data Analytics Professional Certificate on Coursera. It demonstrates skills from the course by following the classic data analysis process: Ask, Prepare, Process, Analyze, Share and Act.

Background

Cyclistic is a bike-share company based in Chicago, IL. It first launched in 2016 and has since grown significantly to a fleet of 5,824 bicycles with a network of 692 geo-tracked stations across the city. The bikes can be unlocked from any station and returned to any other station in the system at any time. This system is designed to encourage cycling as a mode of transportation, allowing users to rent bikes from any location at their convenience, without the challenges of owning, maintaining and storing their own bicycle. Cyclistic offers flexible pricing plans such as single-ride passes, full-day passes and annual memberships.

Cyclistic’s marketing strategy started with building general awareness to reach broad consumer segments. While previous marketing campaigns were designed to target all-new customers, the company’s marketing director believes there is a solid opportunity to convert casual riders into members, and that maximizing the number of annual memberships will be the key to future growth.

Scenario

I am a junior data analyst working on the marketing analyst team at Cyclistic. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. The team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, my team will design a new marketing strategy to convert casual riders into annual members.

Audience and Context

Director of Marketing (my manager) is responsible for developing campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
Cyclistic Marketing Analytics Team (my team) is responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy.
Cyclistic Executive Team is notoriously detail-oriented and will decide whether to approve the recommended marketing program.

Ask

Business Task

I am assigned to the first of 3 main questions to guide Cyclistic’s future marketing program:

How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?

Prepare

Identify the data needed to complete analysis

Data Source

Cyclistic is a fictional company based on divvybikes.com.

This case study uses public data made available by Motivate International Inc. under this license.

Tools

I used RStudio to write R code to complete all phases of this analysis, including loading, processing (cleaning) and building visualizations. I produced this report using R Markdown, enhanced with HTML/CSS.

Process

Load, format, and clean data

Environment Setup

Install and load packages

R packages referenced in this project: pacman, tidyverse, readr, tidyr, dplyr, ggplot2, here, knitr, scales, kableExtra, glue

if (requireNamespace("pacman", quietly = TRUE)) {
  pacman::p_load(
    tidyverse, readr, tidyr, dplyr, ggplot2,
    here, knitr, scales, kableExtra, glue
  )
} else {
  stop("Please install the 'pacman' package before knitting.")
}

Load and Combine Data

Retrieve 12 monthly .csv files (Jan - Dec 2024) to create a single dataset called tripdata

# Load 12 monthly csv files from divvy tripdata
tripdata_01_2024 <- read_csv(here("data","x202401-divvy-tripdata.csv"))
tripdata_02_2024 <- read_csv(here("data","x202402-divvy-tripdata.csv"))
tripdata_03_2024 <- read_csv(here("data","x202403-divvy-tripdata.csv"))
tripdata_04_2024 <- read_csv(here("data","x202404-divvy-tripdata.csv"))
tripdata_05_2024 <- read_csv(here("data","x202405-divvy-tripdata.csv"))
tripdata_06_2024 <- read_csv(here("data","x202406-divvy-tripdata.csv"))
tripdata_07_2024 <- read_csv(here("data","x202407-divvy-tripdata.csv"))
tripdata_08_2024 <- read_csv(here("data","x202408-divvy-tripdata.csv"))
tripdata_09_2024 <- read_csv(here("data","x202409-divvy-tripdata.csv"))
tripdata_10_2024 <- read_csv(here("data","x202410-divvy-tripdata.csv"))
tripdata_11_2024 <- read_csv(here("data","x202411-divvy-tripdata.csv"))
tripdata_12_2024 <- read_csv(here("data","x202412-divvy-tripdata.csv"))

# Combine to make single data set called tripdata
tripdata <- bind_rows(
  tripdata_01_2024,
  tripdata_02_2024,
  tripdata_03_2024,
  tripdata_04_2024,
  tripdata_05_2024,
  tripdata_06_2024,
  tripdata_07_2024,
  tripdata_08_2024,
  tripdata_09_2024,
  tripdata_10_2024,
  tripdata_11_2024,
  tripdata_12_2024
)
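
The twelve near-identical read calls above can also be expressed as one loop over the file names. The following is a minimal sketch, not the code used for this report: the helper name load_monthly_trips() is hypothetical, and it assumes every file sits in one directory and follows the same x2024MM-divvy-tripdata.csv naming pattern. It uses base R read.csv() so it runs without extra packages.

```r
# Hypothetical helper: read all monthly files for one year and stack them.
# Assumes file names follow the pattern used above: x<YYYY><MM>-divvy-tripdata.csv
load_monthly_trips <- function(data_dir, year = 2024, months = 1:12) {
  file_names <- sprintf("x%d%02d-divvy-tripdata.csv", year, months)
  paths <- file.path(data_dir, file_names)

  # Fail early with a clear message if any expected file is not present
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("Missing input files: ", paste(basename(not_found), collapse = ", "))
  }

  # Read each file, then row-bind the monthly data frames into one
  monthly <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, monthly)
}
```

In this project the call would look like load_monthly_trips(here("data")). Note that read.csv() reads the timestamp columns as character, whereas read_csv() parses them as date-times, so swapping readr::read_csv in for read.csv is likely the better fit for the rest of this analysis.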

Review Data

tripdata summary (pre-cleaned)

# Show total number of records
glue("<br>**Total Records:**  {comma(nrow(tripdata))}<br>")

# Pare data to only the columns used for analysis
tripdata <- tripdata %>%
  select(
    ride_id, 
    rideable_type, 
    started_at, 
    ended_at, 
    member_casual
  )

# Create metadata table to describe specific columns to be used for analysis
column_metadata <- tibble::tibble(
  Column = c("ride_id", "rideable_type", "started_at", "ended_at", "member_casual"),
  `Data Type` = c("character", "character", "POSIXct", "POSIXct", "character"),
  description = c(
    "Unique ID for each ride",
    "Type of bike used",
    "Ride start timestamp",
    "Ride end timestamp",
    "Rider type (casual or member)"
  )
)

# Show column names, data type, description using kableExtra
column_metadata %>%
  kable(
      caption = "Pared columns used for analysis"
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"), 
    full_width = TRUE
  )

# Add vertical spacing
glue("<br>")

Total Records: 5,860,568

Pared columns used for analysis
Column	Data Type	description
ride_id	character	Unique ID for each ride
rideable_type	character	Type of bike used
started_at	POSIXct	Ride start timestamp
ended_at	POSIXct	Ride end timestamp
member_casual	character	Rider type (casual or member)

Clean Data

Remove duplicate ride_id records

ride_id is the unique identifier for each tracked bike trip, so a repeated value indicates a duplicate record.

# Count number of ride_id duplicates
dupe_ride_id_count <- sum(duplicated(tripdata$ride_id))
glue("<br>**Duplicate records removed:**  {dupe_ride_id_count}<br>")

# Remove duplicate ride_ids from tripdata
tripdata <- tripdata %>%
  distinct(ride_id, .keep_all = TRUE)

Duplicate records removed: 211
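
To make the counting rule concrete, here is a small sketch on made-up data (the IDs are not from the Divvy files). It illustrates that duplicated() flags only the repeats after the first occurrence, and that keeping the first occurrence of each ride_id, which is what distinct(ride_id, .keep_all = TRUE) does, removes exactly that many rows. The base R subsetting shown is an equivalent used here only to keep the example dependency-free.

```r
# Toy data: ride_id "A1" appears three times, "B2" once (made-up values)
toy <- data.frame(
  ride_id = c("A1", "B2", "A1", "A1"),
  rideable_type = c("classic_bike", "electric_bike", "classic_bike", "classic_bike"),
  stringsAsFactors = FALSE
)

# duplicated() flags every repeat after the first occurrence
dupe_count <- sum(duplicated(toy$ride_id))

# Base R equivalent of distinct(ride_id, .keep_all = TRUE): keep first occurrence
deduped <- toy[!duplicated(toy$ride_id), ]

# Rows before = rows after + duplicates removed
stopifnot(nrow(toy) == nrow(deduped) + dupe_count)
```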

Add columns

trip_duration, month, day_of_week, hour_of_day, day_type

tripdata <- tripdata %>%
  mutate(
    trip_duration = as.numeric(difftime(ended_at, started_at, units="mins")), # sets end - start time in minutes
    month = format(started_at, "%b"),  # sets month of start time, abbreviated
    day_of_week = weekdays(started_at),  # sets day of week of start time
    hour_of_day = as.integer(format(started_at, "%H")),  # sets hour of ride 0 - 23 based on start time (integer)
    day_type = ifelse(day_of_week %in% c("Saturday", "Sunday"), "Weekend", "Weekday")  # sets as weekday or weekend
  )
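
One caveat with the columns above: weekdays() and format(..., "%b") return names in the session's locale, so the "Saturday"/"Sunday" comparison assumes an English locale. The sketch below shows one locale-independent alternative using the numeric fields of as.POSIXlt() with fixed English labels. The helper name derive_calendar_fields() and the example timestamps are hypothetical and only for illustration.

```r
# Fixed English labels, independent of the session locale
day_labels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")

# Hypothetical helper: derive calendar fields from a POSIXct vector
derive_calendar_fields <- function(started_at) {
  lt <- as.POSIXlt(started_at)
  data.frame(
    month = month.abb[lt$mon + 1],            # mon: 0 = January ... 11 = December
    day_of_week = day_labels[lt$wday + 1],    # wday: 0 = Sunday ... 6 = Saturday
    hour_of_day = as.integer(lt$hour),        # 0 - 23
    day_type = ifelse(lt$wday %in% c(0, 6), "Weekend", "Weekday"),
    stringsAsFactors = FALSE
  )
}

# Example timestamps (made up): a Saturday evening and a Monday morning
example_times <- as.POSIXct(c("2024-06-01 17:30:00", "2024-06-03 08:05:00"), tz = "UTC")
derive_calendar_fields(example_times)
```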

Remove trips with a trip_duration of less than 1 minute or greater than 1 day

# Save off trips to be excluded from tripdata
excluded_trips <- tripdata %>%
  filter(trip_duration < 1 | trip_duration > 1440) %>%
  mutate(reason = case_when(
    trip_duration < 1 ~ "Too Short (<1 min)",
    trip_duration > 1440 ~ "Too Long (>1 day)"
  ))

# Summarize number of excluded trips by reason, then add a total row
excluded_summary <- excluded_trips %>%
  count(reason, name = "Trip Count") %>%
  bind_rows(
    tibble(reason = "Total Trips Excluded Based on Duration Criteria", `Trip Count` = sum(.$`Trip Count`))
  )

# Add vertical spacing
glue("<br>")

# Display summary table using kableExtra
excluded_summary %>%
  kable() %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"), 
    full_width = TRUE
  ) %>%
  row_spec(nrow(excluded_summary), bold = TRUE)

# Remove excluded trips
tripdata <- tripdata %>%
  filter(trip_duration >= 1, trip_duration <= 1440)

# Add vertical spacing
glue("<br>")

reason	Trip Count
Too Long (>1 day)	7553
Too Short (<1 min)	131530
Total Trips Excluded Based on Duration Criteria	139083
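
The filter above keeps trips lasting from exactly 1 minute up to exactly 1,440 minutes (one day), inclusive at both ends. A short sketch with made-up timestamps shows how the boundary cases fall:

```r
# Made-up start/end times covering the boundary cases of the duration filter
started <- as.POSIXct("2024-07-01 08:00:00", tz = "UTC")
ended <- started + c(30, 60, 600, 86400, 86460)   # offsets in seconds

# Same calculation as trip_duration above: end - start, in minutes
trip_duration <- as.numeric(difftime(ended, started, units = "mins"))

# Same rule as the filter above: keep 1 <= duration <= 1440 minutes
keep <- trip_duration >= 1 & trip_duration <= 1440

data.frame(trip_duration, keep)
```

Here the 0.5-minute and 1,441-minute trips are dropped, while the trips of exactly 1 and exactly 1,440 minutes are kept.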

Examine bike types

# Add vertical spacing
glue("<br>")

# Create summary table
rideable_summary <- tripdata %>%
  count(rideable_type, name = "Trip Count") %>%
  bind_rows(
    summarise(., rideable_type = "Total", `Trip Count` = sum(`Trip Count`))
  )

# Display summary table using kableExtra
rideable_summary %>%
  kable() %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = TRUE
  ) %>%
  row_spec(nrow(rideable_summary), bold = TRUE)

# Add vertical spacing
glue("<br>")

rideable_type	Trip Count
classic_bike	2714623
electric_bike	2869067
electric_scooter	137584
Total	5721274

Remove electric scooter trips due to limited seasonal usage

# Identify electric scooter rentals to be excluded from tripdata
electric_scooters <- tripdata %>%
  filter(rideable_type == "electric_scooter") %>%
  count(month, name = "Count") %>%
  bind_rows(
    summarize(., 
              month = "Electric Scooter Trips Excluded", 
              Count = sum(Count))
  )

# Add vertical spacing
glue("<br>")

# Display electric scooters summary table using kableExtra
electric_scooters %>%
  kable() %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"), 
    full_width = TRUE
  ) %>%
  row_spec(nrow(electric_scooters), bold = TRUE)

# Remove electric scooters from tripdata
tripdata <- tripdata %>%
  filter(rideable_type != "electric_scooter")

# Add vertical spacing after the table
glue("<br>")

month	Count
Aug	82
Sep	137502
Electric Scooter Trips Excluded	137584

tripdata summary (cleaned)

# Show total number of records after cleaning
glue("<br>**Total Records:**  {comma(nrow(tripdata))}<br>")

# Create metadata table to describe specific columns to be used for analysis
column_metadata_cleaned <- tibble::tibble(
  Column = c("ride_id", "rideable_type", "started_at", "ended_at", "member_casual", "trip_duration", "month", 
             "day_of_week", "hour_of_day", "day_type"),
  `Data Type` = c("character", "character", "POSIXct", "POSIXct", "character", "double", "character", "character", "integer", "character"),
  description = c(
    "Unique ID for each ride",
    "Type of bike used",
    "Ride start timestamp",
    "Ride end timestamp",
    "Rider type (casual or member)",
    "Calculate trip duration in minutes",
    "Month of ride start",
    "Day of week of ride start",
    "Hour of ride start",
    "Day type (weekend or weekday)"
  )
)

# Show column names, data type, description using kableExtra
column_metadata_cleaned %>%
  kable(caption = "Trip Data (cleaned): Column Descriptions") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"), 
    full_width = TRUE)

# Add vertical spacing after the table
glue("<br>")

Total Records: 5,583,690

Columns (includes 5 added during cleaning)
Column	Data Type	description
ride_id	character	Unique ID for each ride
rideable_type	character	Type of bike used
started_at	POSIXct	Ride start timestamp
ended_at	POSIXct	Ride end timestamp
member_casual	character	Rider type (casual or member)
trip_duration	double	Calculate trip duration in minutes
month	character	Month of ride start
day_of_week	character	Day of week of ride start
hour_of_day	integer	Hour of ride start
day_type	character	Day type (weekend or weekday)
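
As a sanity check, the record counts reported at each cleaning step reconcile with one another. The arithmetic below uses only the figures printed in this report; the variable names are illustrative.

```r
# Record counts reported at each cleaning step above
raw_records        <- 5860568
duplicates_removed <- 211
duration_excluded  <- 7553 + 131530   # too long + too short
scooters_excluded  <- 137584

after_duration <- raw_records - duplicates_removed - duration_excluded
after_scooters <- after_duration - scooters_excluded

# Both values should match the totals printed in the report
stopifnot(after_duration == 5721274)   # total in the bike type table
stopifnot(after_scooters == 5583690)   # total records after cleaning

# Share of the raw data retained for analysis, in percent
retained_share <- after_scooters / raw_records
round(100 * retained_share, 1)
```

Roughly 95% of the raw records remain after cleaning.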

Analyze

Create visualizations to gather insights

Members vs Casual Riders

# Summarize trip counts and compute share
rider_share <- tripdata %>%
  count(member_casual) %>%
  mutate(
    share = n / sum(n),
    label = paste0(comma(n), "\n", percent(share, accuracy = 0.1))
  )

# Add vertical spacing before the chart
glue("<br><br>")

# Plot pie chart
ggplot(rider_share, aes(x = "", y = share, fill = member_casual)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y") +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5), color = "white", size = 5, fontface = "bold") +
  scale_fill_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) +
  labs(
    title = "Share of Trips by Rider Type",
    fill = "Rider Type"
  ) +
  theme_minimal() +
  theme(
    axis.title = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    panel.grid = element_blank(),
    plot.title = element_text(
      face = "bold", 
      hjust = 0.5, 
      size = 18,
      margin = margin(t = 15, b=10)
    ),
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 12)
  )

Insights:

Cyclistic members account for over 64% of all rides tracked in 2024.
Members took approximately 79% more rides than casual riders.
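
These two figures are two views of the same counts. With only two rider types, a share s for one group means that group took s / (1 - s) - 1 more rides than the other. The sketch below checks that the reported numbers agree; the function names are hypothetical.

```r
# Convert a two-group share into "X more rides than the other group" (as a fraction)
pct_more_from_share <- function(share) {
  share / (1 - share) - 1
}

# Inverse: the share implied by taking pct_more more rides than the other group
share_from_pct_more <- function(pct_more) {
  (1 + pct_more) / (2 + pct_more)
}

pct_more_from_share(0.64)   # lower bound implied by "over 64%"
share_from_pct_more(0.79)   # member share implied by "79% more rides"
```

A 64% share corresponds to about 78% more rides, and 79% more rides corresponds to a share of about 64.2%, so the two insights are consistent.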

Bike Type Preferences

# Summarize trip counts and compute share
bike_share <- tripdata %>%
  count(member_casual, rideable_type) %>%
  group_by(member_casual) %>%
  mutate(
    share = n / sum(n),
    label = paste0(comma(n), "\n", percent(share, accuracy = .1))
  )

# Add vertical spacing before the chart
glue("<br><br>")

# Plot pie chart
ggplot(bike_share, aes(x = "", y = share, fill = rideable_type)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar("y") +
  geom_text(aes(label = label), 
            position = position_stack(vjust = 0.5), 
            color = "white", size = 5, fontface = "bold") +
  scale_fill_manual(
    values = c("classic_bike" = "burlywood2", "electric_bike" = "darkseagreen"), 
    labels = c("classic_bike" = "Classic Bike", "electric_bike" = "Electric Bike")
  ) +
  facet_wrap(~member_casual, labeller = as_labeller(c(
    "casual" = "Casual",
    "member" = "Member"
  ))) +
  labs(
    title = "Share of Trips by Bike Type",
    subtitle = "Split by Rider Type",
    fill = "Bike Type"
  ) +
  theme_minimal() +
  theme(
    axis.title = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    panel.grid = element_blank(),
    strip.text = element_text(size = 12, face="bold"),
    # Add spacing below title and subtitle
    plot.title = element_text(
      face = "bold", 
      hjust = 0.5, 
      size = 18, 
      margin = margin(b = 10)
    ),  # bottom margin
    plot.subtitle = element_text(
      face = "bold", 
      hjust = 0.5, 
      size = 14, 
      margin = margin(b = 30)
    ),
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 12)
  )

Insights:

Classic and electric bike usage is split almost evenly across all riders, with a slight preference for electric bikes throughout 2024.
Members and casual riders are virtually identical in this preference, differing by only 0.5% in share.

Trips by Month

# Set month order
tripdata <- tripdata %>%
  mutate(month = factor(
    month,
    levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
    ordered = TRUE
  ))

# Summarize trip counts by month and rider type
monthly_counts <- tripdata %>%
  count(month, member_casual)

# Identify min and max trips for each rider type
label_points <- monthly_counts %>%
  group_by(member_casual) %>%
  filter(n == max(n) | n == min(n)) %>%
  ungroup()

# Add vertical spacing before the chart
glue("<br><br>")

# Plot bar chart
ggplot(monthly_counts, aes(x = month, y = n, fill = member_casual)) +
  geom_col(position = "dodge") +
  geom_text(
    data = label_points,
    aes(label = comma(n)),
    position = position_dodge(width = 0.9),
    vjust = -0.4,
    size = 4,
    fontface = "bold"
  ) +
  scale_fill_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) + 
  scale_y_continuous(labels = label_number(scale = 1e-3, suffix = "K")) +
  labs(
    title = "Trips by Month and Rider Type",
    y = NULL,
    x = NULL,
    fill = "Rider Type"
  ) +
  theme_minimal() +
  theme(
    axis.text = element_text(size = 12),
    plot.title = element_text(
      face = "bold",
      hjust = 0.5,
      size = 18,
      margin = margin(t = 15, b = 10)
    ),    
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 12)
  )

Insights:

Both Cyclistic members and casual riders follow similar seasonal usage trends: lowest in the winter months and highest in the summer months.
Members account for a larger share of rides in colder weather: 83% of January rides, compared to 17% for casual riders.
This usage gap shrinks considerably during the warmest months (Jul/Aug), when members account for 58% of rides and casual riders for 42%.
While both rider types follow very similar seasonal trends, casual riders appear to be more influenced by weather changes than Cyclistic members.
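
The code shown for this chart only counts trips per month and rider type; the within-month shares quoted above require one more step. A minimal sketch of that step follows, using toy counts (not the real data) chosen so that the shares match the quoted percentages. In dplyr the same result would come from group_by(month) followed by mutate(share = n / sum(n)).

```r
# Toy counts (not the real data), chosen so the shares match those quoted above
monthly_counts <- data.frame(
  month = c("Jan", "Jan", "Jul", "Jul"),
  member_casual = c("casual", "member", "casual", "member"),
  n = c(17, 83, 42, 58)
)

# Share of each rider type within its month: n divided by the month total
month_totals <- ave(monthly_counts$n, monthly_counts$month, FUN = sum)
monthly_counts$share <- monthly_counts$n / month_totals

monthly_counts
```

Expressed as a ratio, an 83/17 split means members take roughly 4.9 times as many rides as casual riders, while a 58/42 split is only about 1.4 times as many.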

Average Trips by Day of Week

# Extract date and day of week
tripdata <- tripdata %>%
  mutate(
    date = as.Date(started_at),
    day_of_week = factor(
      weekdays(date),
      levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
      ordered = TRUE
    )
  )

# Count daily trips per rider type and day of week
daily_counts <- tripdata %>%
  group_by(date, day_of_week, member_casual) %>%
  summarise(trips = n(), .groups = "drop")

# Average trips per day of week and rider type
avg_trip_counts <- daily_counts %>%
  group_by(day_of_week, member_casual) %>%
  summarise(avg_trips = mean(trips), .groups = "drop")

# Identify min/max for labels
extreme_counts <- avg_trip_counts %>%
  group_by(member_casual) %>%
  filter(avg_trips == max(avg_trips) | avg_trips == min(avg_trips)) %>%
  ungroup()

# Add vertical spacing before the chart
glue("<br><br>")

# Plot bar chart with labels for min/max
ggplot(avg_trip_counts, aes(x = day_of_week, y = avg_trips, fill = member_casual)) +
  geom_col(position = "dodge") +
  geom_text(
    data = extreme_counts,
    aes(label = comma(round(avg_trips, 0))),
    position = position_dodge(width = 0.9),
    vjust = -0.5,
    fontface = "bold",
    size = 3.5
  ) +
  scale_fill_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) +
  scale_y_continuous(labels = label_number(scale = 1e-3, suffix = "K")) +
  labs(
    title = "Average Daily Trips by Day of Week and Rider Type",
    y = "Avg Daily Trips",
    x = NULL,
    fill = "Rider Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(
      face = "bold",
      hjust = 0.5,
      size = 18,
      margin = margin(t = 15, b = 15)
    ),
    axis.text = element_text(size = 12),
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 12)
  )

Insights:

Cyclistic members take the most trips Monday through Friday, with a maximum average of 11,348 rides on Wednesdays and a minimum average of 7,750 rides on Sundays.
Casual riders show the inverse trend, with a maximum average of 8,009 rides on Saturdays and a minimum average of 4,072 rides on Tuesdays.
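
The averages above come from a two-step aggregation: first count trips per calendar date, then average those daily counts per day of week. The toy example below (made-up rides, base R aggregate() standing in for the dplyr code) illustrates why the first step matters: raw totals depend on how many times each weekday occurs, while the average of daily counts does not.

```r
# Toy rides (made-up): two Mondays and one Tuesday, one row per ride
rides <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-01", "2024-01-01",
                   "2024-01-08", "2024-01-02", "2024-01-02")),
  day_of_week = c("Monday", "Monday", "Monday", "Monday", "Tuesday", "Tuesday"),
  n = 1
)

# Step 1: rides per calendar date
daily <- aggregate(n ~ date + day_of_week, data = rides, FUN = sum)

# Step 2: mean of the daily counts for each day of week
avg <- aggregate(n ~ day_of_week, data = daily, FUN = mean)
avg
```

Monday has four rides in total and Tuesday two, yet both average two rides per day because the four Monday rides are spread over two dates. One limitation of this approach is that a date with no trips for a rider type produces no row at all, so it is left out of the average rather than counted as zero.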

Weekend vs Weekday Trips

# Summary of average trips by day_type
trip_summary <- tripdata %>%
  mutate(
    date = as.Date(started_at),
    day_of_week = weekdays(date),
    day_type = ifelse(day_of_week %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
  ) %>%
  group_by(date, day_type, member_casual) %>%
  summarize(trip_count = n(), .groups = "drop") %>%
  group_by(day_type, member_casual) %>%
  summarize(avg_daily_trips = mean(trip_count), .groups = "drop")

# Add vertical spacing before the chart
glue("<br><br>")

# Plot average daily trip counts
ggplot(trip_summary, aes(x = day_type, y = avg_daily_trips, fill = member_casual)) +
  geom_col(position = "dodge") +
  geom_text(
    aes(label = comma(round(avg_daily_trips))),
    position = position_dodge(width = 0.9),
    vjust = -0.5,
    size = 3.5,
    fontface = "bold"
  ) +
  scale_fill_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) + 
  scale_y_continuous(labels = label_number(scale = 1e-3, suffix = "K")) +
  labs(
    title = "Average Daily Trips by Day Type and Rider Type",
    x = NULL,
    y = "Avg Daily Trips",
    fill = "Rider Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(
      face = "bold",
      hjust = 0.5,
      size = 18,
      margin = margin(t = 15, b = 15)
    ),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 12),
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 12)
  )

Insights:

Consolidating the day-of-week trends into weekdays and weekends, Cyclistic members take 120% more rides than casual riders on weekdays.
On weekends, ridership is much closer between the two rider types, with members taking only 14% more rides than casual riders.
This shows that members tend to ride more during the standard work week, while casual riders prefer riding on the weekend.

Trips by Hour of Day

# Prepare summarized data (filtered to Weekdays only)
hourly_avg_weekday <- tripdata %>%
  mutate(
    date = as.Date(started_at),
    hour_of_day = as.integer(format(started_at, "%H")),
    day_of_week = weekdays(date),
    day_type = ifelse(day_of_week %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
  ) %>%
  filter(day_type == "Weekday") %>%
  group_by(date, hour_of_day, member_casual) %>%
  summarize(trip_count = n(), .groups = "drop") %>%
  group_by(hour_of_day, member_casual) %>%
  summarize(avg_trips = mean(trip_count), .groups = "drop")

# Add vertical spacing before the chart
glue("<br><br>")

# Plot average weekday trips by hour, with highlighted peak hours
ggplot(hourly_avg_weekday, aes(x = hour_of_day, y = avg_trips, fill = member_casual)) +
  # Highlight morning peak (7–9 AM)
  annotate("rect", xmin = 6.5, xmax = 9.5, ymin = -Inf, ymax = Inf, 
           fill = "yellow", alpha = 0.3) +
  # Highlight evening peak (4–6 PM)
  annotate("rect", xmin = 15.5, xmax = 18.5, ymin = -Inf, ymax = Inf, 
           fill = "yellow", alpha = 0.3) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) +
  scale_x_continuous(breaks = 0:23) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Average Weekday Trips by Hour and Rider Type",
    subtitle = "Commute hours highlighted (7–9 AM, 4–6 PM)",
    x = "Hour of Day",
    y = "Avg Daily Trips",
    fill = "Rider Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(
      face = "bold",
      hjust = 0.5,
      size = 18,
      margin = margin(t = 15, b = 8)
    ),
    plot.subtitle = element_text(
      face = "bold",
      hjust = 0.5,
      size = 14,
      margin = margin(b = 15)
    ),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 12),
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 12)
  )

Insights:

Both Cyclistic members and casual riders follow a similar hourly trend on weekdays, peaking during standard work commute times (7-9am and 4-6pm).
Both rider types show their highest ride counts in the 5pm hour, right in the middle of the standard afternoon commute (4-6pm).
Members show a far higher relative trip count during these peak times, further supporting their preference for using Cyclistic bikes to commute to work.
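
The peak hour can be read off the chart, but it can also be extracted from the summary table directly. The sketch below assumes a table shaped like hourly_avg_weekday and uses made-up values, so the numbers are illustrative only.

```r
# Toy hourly averages (made-up values), same shape as hourly_avg_weekday
hourly <- data.frame(
  hour_of_day = rep(c(8, 12, 17), times = 2),
  member_casual = rep(c("casual", "member"), each = 3),
  avg_trips = c(120, 200, 310, 600, 350, 900)
)

# For each rider type, return the hour with the highest average trip count
peak_hours <- sapply(split(hourly, hourly$member_casual), function(d) {
  d$hour_of_day[which.max(d$avg_trips)]
})
peak_hours
```

The result is a named vector with one peak hour per rider type. which.max() returns only the first maximum, so ties would need separate handling.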

Bike Type Usage by Hour

# Prepare summarized data for percent-based stacked bar by rider and bike type
hourly_bike_type_pct <- tripdata %>%
  mutate(
    date = as.Date(started_at),
    hour_of_day = as.integer(format(started_at, "%H")),
    day_of_week = weekdays(date),
    day_type = ifelse(day_of_week %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
  ) %>%
  filter(day_type == "Weekday") %>%
  group_by(date, hour_of_day, member_casual, rideable_type) %>%
  summarise(trip_count = n(), .groups = "drop") %>%
  group_by(hour_of_day, member_casual, rideable_type) %>%
  summarise(avg_trips = mean(trip_count), .groups = "drop")

# Add vertical spacing before the chart
glue("<br><br>")

# Plot percent-based stacked bars, split by rider type
ggplot(hourly_bike_type_pct, aes(x = hour_of_day, y = avg_trips, fill = rideable_type)) +
  # Highlight morning and evening commute times
  annotate("rect", xmin = 6.5, xmax = 9.5, ymin = -Inf, ymax = Inf, fill = "yellow", alpha = 0.3) +
  annotate("rect", xmin = 15.5, xmax = 18.5, ymin = -Inf, ymax = Inf, fill = "yellow", alpha = 0.3) +
  geom_col(position = "fill") +
  facet_wrap(~ member_casual, labeller = as_labeller(c("casual" = "Casual Riders", "member" = "Members"))) +
  scale_fill_manual(
    values = c("classic_bike" = "burlywood2", "electric_bike" = "darkseagreen"),
    labels = c("classic_bike" = "Classic Bike", "electric_bike" = "Electric Bike")
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_x_continuous(breaks = 0:23) +
  labs(
    title = "Bike Type Distribution by Hour of Day",
    subtitle = "Percent of Classic vs Electric Bikes Used (Weekdays Only)",
    x = "Hour of Day",
    y = "Percent of Trips",
    fill = "Bike Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 18, margin = margin(t = 15, b = 8)),
    plot.subtitle = element_text(face = "bold", hjust = 0.5, size = 14, margin = margin(b = 15)),
    axis.text.x = element_text(size = 9),
    axis.text.y = element_text(size = 12),
    axis.title = element_text(size = 12),
    strip.text = element_text(size = 13, face = "bold"),
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 12)
  )

Insights:

Both members and casual riders show nearly identical trends, with a fairly steady split between electric and classic bikes throughout the main part of each day, including typical commute times. This may be due to bike availability during high-traffic daytime hours, or to commuters' preference for routine and reliability.
Both rider groups show an increase in electric bike usage from late night to early morning (10pm - 4am). This may be due to a greater desire for a faster, easier ride home late at night. Electric bikes are also easier to ride in reduced traffic, when faster speeds can be fully utilized, and are more convenient after nightlife events.

Bike Type Usage by Month

# Ensure month is ordered
tripdata <- tripdata %>%
  mutate(month = factor(
    month,
    levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
    ordered = TRUE
  ))

# Calculate trip counts per month, rider type, and bike type
monthly_bike_share <- tripdata %>%
  count(month, member_casual, rideable_type) %>%
  group_by(month, member_casual) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

# Add spacing before the chart
glue("<br><br>")

# Plot chart
ggplot(monthly_bike_share, aes(x = month, y = share, fill = rideable_type)) +
  geom_col(position = "fill") +
  facet_wrap(~ member_casual, labeller = as_labeller(c(
    "casual" = "Casual Riders", "member" = "Members"
  ))) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_fill_manual(
    values = c("classic_bike" = "burlywood2", "electric_bike" = "darkseagreen"),
    labels = c("classic_bike" = "Classic Bike", "electric_bike" = "Electric Bike")
  ) +
  labs(
    title = "Bike Type Usage by Month",
    subtitle = "Percent-based split between Classic and Electric Bikes",
    x = "Month",
    y = "Share of Monthly Trips",
    fill = "Bike Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 18, margin = margin(t = 15, b = 8)),
    plot.subtitle = element_text(face = "bold", hjust = 0.5, size = 14, margin = margin(b = 15)),
    axis.text = element_text(size = 11),
    axis.title = element_text(size = 12),
    strip.text = element_text(size = 13, face = "bold"),
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 12)
  )

Insights:

Electric bike usage is relatively consistent year-round.
A slight dip in electric bike usage in February may be due to inclement weather, lower battery reliability in very low temperatures, or issues with bike availability.
Overall, bike availability, rather than season, may drive usage patterns.

Trip Duration by Month

# Summarize data
monthly_duration <- tripdata %>%
  group_by(month, member_casual) %>%
  summarize(mean_duration = mean(trip_duration, na.rm = TRUE), .groups = "drop")

# Set order of months in summary table
monthly_duration <- monthly_duration %>%
  mutate(month = factor(month, 
                        levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
                        ordered = TRUE)
  )

# Identify the max point for each rider type group
max_points <- monthly_duration %>%
  group_by(member_casual) %>%
  filter(mean_duration == max(mean_duration)) %>%
  ungroup()

# Identify the min point for each rider type group
min_points <- monthly_duration %>%
  group_by(member_casual) %>%
  filter(mean_duration == min(mean_duration)) %>%
  ungroup()

# Add vertical spacing before the chart
glue("<br><br>")

# Plot Line chart
ggplot(monthly_duration, aes(x = month, y = mean_duration, group = member_casual, color = member_casual)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 1) +
  geom_text(
    data = max_points,
    aes(label = round(mean_duration, 1)),  # or paste0(...) for more detail
    vjust = -0.7,
    fontface = "bold",
    color = "black",
    size = 3
  ) +
  geom_text(
    data = min_points,
    aes(label = round(mean_duration, 1)),
    vjust = 1.7,
    fontface = "bold",
    color = "black",
    size = 3
  ) +
  scale_color_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) +
  scale_y_continuous(
    limits = c(0, NA),
    breaks = seq(0, 60, by = 5)  # adjust as needed
  ) +
  labs(
    title = "Average Trip Duration by Month and Rider Type",
    x = NULL,
    y = "Minutes",
    color = "Rider Type"
  ) +
  theme_minimal() + 
  theme(
    plot.title = element_text(
      face = "bold",
      hjust = 0.5,
      size = 18,
      margin = margin(t = 15, b = 20)
    ),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 12),
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 12)
  )

# Add vertical spacing after the chart
glue("<br><br>")

Insights:

Casual riders take longer rides on average than members, particularly in fair-weather months.
Casual riders average 79% longer rides than members in May/June and 30% longer rides in December.
While Cyclistic members tend to take shorter rides, their ride duration is far more consistent throughout the year.

Trip Duration by Day of Week

# Summarize by day of week
weekday_duration <- tripdata %>%
  mutate(day_of_week = weekdays(as.Date(started_at))) %>%
  group_by(day_of_week, member_casual) %>%
  summarize(mean_duration = mean(trip_duration, na.rm = TRUE), .groups = "drop")

# Set order of days of week
weekday_duration <- weekday_duration %>%
  mutate(day_of_week = factor(day_of_week, 
                              levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
                              ordered = TRUE))

# Identify max point for each rider type
max_weekday_points <- weekday_duration %>%
  group_by(member_casual) %>%
  filter(mean_duration == max(mean_duration)) %>%
  ungroup()

# Identify min point for each rider type
min_weekday_points <- weekday_duration %>%
  group_by(member_casual) %>%
  filter(mean_duration == min(mean_duration)) %>%
  ungroup()

# Add vertical spacing before the chart
glue("<br><br>")

# Create plot
ggplot(weekday_duration, aes(x = day_of_week, y = mean_duration, group = member_casual, color = member_casual)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 1) +
  geom_text(
    data = max_weekday_points,
    aes(label = round(mean_duration, 1)),
    vjust = -0.7,
    fontface = "bold",
    color = "black",
    size = 4
  ) +
  geom_text(
    data = min_weekday_points,
    aes(label = round(mean_duration, 1)),
    vjust = 1.7,
    fontface = "bold",
    color = "black",
    size = 4
  ) +
  scale_color_manual(values = c("casual" = "salmon1", "member" = "lightskyblue2")) +
  scale_y_continuous(
    limits = c(0, NA),
    breaks = seq(0, 60, by = 5)
  ) +
  labs(
    title = "Average Trip Duration by Day of Week and Rider Type",
    x = NULL,
    y = "Minutes",
    color = "Rider Type"
  ) +
  theme_minimal() + 
  theme(
    plot.title = element_text(
      face = "bold",
      hjust = 0.5,
      size = 18,
      margin = margin(t = 15, b = 20)
    ),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 12),
    legend.title = element_text(size = 12, face = "bold"),
    legend.text = element_text(size = 12)
  )

Insights:

Casual riders consistently take longer rides than members, though their average durations are noticeably shorter Monday through Friday than on weekends.
Cyclistic members show more consistent average trip durations throughout the week.
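
One way to put numbers on "consistency" is to compare weekend and weekday averages and the spread between the longest and shortest day for each rider type. This is a sketch under stated assumptions: `toy_weekday_duration` is a hypothetical table shaped like `weekday_duration`, its values are illustrative only, and the `spread` measure is one possible choice rather than the method used in this analysis.

```r
library(dplyr)

# Hypothetical daily means (minutes) shaped like weekday_duration.
# Values are illustrative only, not results from the Cyclistic data.
toy_weekday_duration <- data.frame(
  day_of_week = rep(c("Sunday", "Monday", "Tuesday", "Wednesday",
                      "Thursday", "Friday", "Saturday"), times = 2),
  member_casual = rep(c("casual", "member"), each = 7),
  mean_duration = c(26, 20, 19, 19, 19, 21, 27,
                    13, 12, 12, 12, 12, 12, 13)
)

# Note: these are unweighted means of daily means, so days with
# more trips do not count more heavily than quiet days.
weekday_consistency <- toy_weekday_duration %>%
  mutate(is_weekend = day_of_week %in% c("Saturday", "Sunday")) %>%
  group_by(member_casual) %>%
  summarize(
    weekend_mean = mean(mean_duration[is_weekend]),
    weekday_mean = mean(mean_duration[!is_weekend]),
    spread = max(mean_duration) - min(mean_duration),  # longest day minus shortest day
    .groups = "drop"
  )

print(weekday_consistency)
```

A smaller `spread` indicates a steadier ride duration across the week. In the toy data the member spread is 1 minute against 8 minutes for casual riders.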

Act

Recommendations

Based on the analysis of rider behavior, the following marketing strategies are recommended to encourage casual riders to convert to annual Cyclistic members:

Seasonal Membership Offers

Introduce flexible or short-term membership plans (e.g., 3-month or summer-only) to appeal to casual riders who prefer biking during fair-weather months.

Commuter-Focused Incentives

Promote weekday riding with incentives like reduced weekday rates or early morning discounts. Encourage casual riders to see Cyclistic as a viable commuting solution.

Membership Loyalty Program

Launch a points-based rewards system to incentivize frequent usage. Riders can earn perks (e.g., free rides, swag, partner discounts) for reaching ride milestones.

Night Rider Promotions

Since electric bike usage increases late at night, offer “Night Owl” discounts or event-based promotions (e.g., concert or nightlife partnerships) to boost off-peak ridership.

Weekend-Driven Conversions

Capitalize on heavy weekend usage by casual riders with targeted in-app ads or pop-ups showcasing how membership benefits can enhance weekend exploration and save money.

Social Media Engagement

Use member testimonials, user-generated content, and influencer partnerships to highlight the convenience, savings, and lifestyle appeal of being a Cyclistic member.

Conclusion

This analysis offers valuable insights into the behaviors and preferences of both Cyclistic members and casual riders. By tailoring marketing strategies based on these findings, Cyclistic is well-positioned to drive higher conversion of casual riders into loyal members.

To maximize impact, future initiatives should continue to be guided by data, aligned with rider trends, and tested through pilot campaigns. With a deeper understanding of when, how, and why riders engage with Cyclistic, the organization can grow its member base and promote long-term ridership sustainability.

Cyclistic Bike Share Case Study

Jeff Hopp

2025-05-15

Introduction

Background

Scenario

Audience and Context

Ask

Business Task

Prepare

Data Source

Tools

Process

Environment Setup

Install and load packages

Load and Combine Data

Review Data

tripdata summary (pre-cleaned)

Clean Data

Remove duplicate ride_id records

Add columns

Remove trip_durations that are less than 1 minute and greater than 1 day

Examine bike types

Remove electric scooter trips due to limited seasonal usage

tripdata summary (cleaned)

Analyze

Members vs Casual Riders

Bike Type Preferences

Trips by Month

Average Trips by Day of Week

Weekend vs Weekday Trips

Trips by Hour of Day

Bike Type Usage by Hour

Bike Type Usage by Month

Trip Duration by Month

Trip Duration by Day of Week

Act

Recommendations

Conclusion
