Open In App

OLA Data Analysis with SQL

Last Updated : 30 Jan, 2025
Comments
Improve
Suggest changes
Like Article
Like
Report

Have you ever thought about how ride-hailing companies manage large amounts of booking data, how to analyze customer behaviour and decide on discounts to offer? In this blog, we will do an in-depth analysis of Bengaluru ride data using a large dataset of 50,000 Ola bookings.

It covers essential aspects like booking statuses, cancellations, ride distances, payment methods, and ratings. With SQL for data preparation and Power BI for visualization, we uncover trends like high weekend demand, popular vehicle types, and reasons for cancellations.

These insights explain how data-driven decisions help companies enhance customer satisfaction by optimizing operations and improving service quality.

The Dataset

The Ola dataset contains 50,000 rows of ride-booking data for Bengaluru over one month. This dataset contains a variety of data related to ride bookings, customer and driver interactions, cancellations and more. The overview of columns are:

  • Date: The date of booking.
  • Time: The time at which the booking was made.
  • Booking ID: A unique 10-digit identifier, prefixed by "CNR" (e.g., CNR1234567890).
  • Booking Status: The status of the booking (Successful/Cancelled).
  • Customer ID: Unique identifier for each customer.
  • Vehicle Type: Different types of vehicles used for the ride (e.g., Auto, Prime Plus, etc.).
  • Pickup Location: Dummy locations from 50 different areas in Bengaluru.
  • Drop Location: Another set of locations, chosen from the list of pickup locations.
  • Avg VTAT (Vehicle Time Arrival Time): The average time taken for the vehicle to arrive at the pickup location.
  • Avg CTAT (Customer Time Arrival Time): The average time taken for the customer to arrive at the pickup location.
  • Cancelled Rides by Customer: If the ride was cancelled by the customer.
  • Reason for Cancelling by Customer: Various reasons for cancellation (e.g., AC not working, change of plans).
  • Cancelled Rides by Driver: Cancellations by the driver, often due to personal or vehicle-related issues.
  • Incomplete Rides: Cases where the ride was incomplete for some reason.
  • Incomplete Rides Reason: Reasons for incomplete rides, such as vehicle breakdown or customer demand.
  • Booking Value: The fare for the ride.
  • Payment Method: The method used to pay (Cash/UPI/Card/Wallet).
  • Ride Distance: Distance of the ride in kilometres.
  • Driver Ratings: Ratings given to the driver by the customer.
  • Customer Rating: Ratings given by the driver to the customer.

You can download the dataset from the GitHub Repository.

Data Constraints and Project Goals

As part of the project, we will do some was tasked with ensuring specific constraints in the data to real-world conditions:

  • Booking Status: 62% of the rides should have a "Success" status with the remaining 38% being cancellations.
  • Cancelled Rides by Customer: No more than 7% of the bookings should be cancelled by customers.
  • Cancelled Rides by Driver: No more than 18% of the bookings should be cancelled by drivers.
  • Incomplete Rides: The percentage of incomplete rides should be under 6%.
  • Weekend and Match Day Orders: The number of orders should be higher on weekends and match days. I marked specific dates as match days.
  • Order Values: Around 70% of orders should have a fare under ₹500, while 28% should be above ₹500, with the remainder above ₹100.
  • Food Category: The dataset reflects 67% of orders being from the food category, a common service in Bengaluru.

Data Preparation with SQL

The first step in the project is to create the dataset using SQL which helped us to handle large volumes of data efficiently. We will create one database which called Oladb on the MySQL Workbench.

We will import the Bengaluru_Ola_Booking_Data.csv file to extract the dataset from the CSV files to SQL Workbench to perform EDA (Exploratory Data Analysis).

If you don’t know how to import CSV files to MySQL Workbench then follow this. Below is the overview of ola_booking_table.

Exploration Data Analysis (EDA)

Let's perform an Exploration Data Analysis (EDA) to find insight that can help to take decisions:

Query 1. Retrieve all successful bookings:

SELECT * FROM ola_booking_table WHERE Booking_Status = 'Success';

Explanation: This query retrieves all the booking details where the status is marked as "Success," providing insights into completed rides.

Query 2. Find the average ride distance for each vehicle type:

SELECT 'Vehicle Type', AVG(‘Ride Distance’) as avg_distance FROM ola_booking_table GROUP BY 'Vehicle Type';

Explanation: This query calculates the average ride distance for each type of vehicle, helping analyze performance and customer preferences by vehicle category.

Query 3. Get the total number of cancelled rides by customers:

SELECT COUNT(*) FROM ola_booking_table WHERE 'Booking Status' = 'cancelled by Customer';

Explanation: This query counts the total number of rides cancelled by customers, useful for understanding customer behaviour and cancellation trends.

Query 4. List the top 5 customers who booked the highest number of rides:

SELECT 'Customer ID', COUNT('Booking ID') as total_rides FROM ola_booking_table 
GROUP BY 'Customer ID'
ORDER BY total_rides DESC
LIMIT 5;

Explanation: This query identifies the top 5 customers based on the number of rides booked, highlighting the most active users.

Query 5. Get the number of rides cancelled by drivers due to personal and car-related issues:

SELECT COUNT(*) FROM ola_booking_table 
WHERE 'cancelled Rides by Driver' ='Personal & Car related issue';

Explanation: This query calculates the number of rides cancelled by drivers due to personal or car-related reasons, helping identify operational issues.

Query 6. Find the maximum and minimum driver ratings for Prime Sedan bookings:

SELECT MAX('Driver Ratings') as max_rating, MIN('Driver Ratings') as min_rating FROM ola_booking_table 
WHERE 'Vehicle Type' = 'Prime Sedan';

Explanation: This query finds the highest and lowest driver ratings for Prime Sedan bookings, helping assess service quality for this vehicle type.

Query 7. Retrieve all rides where payment was made using UPI:

SELECT * FROM ola_booking_table WHERE 'Payment Method' = 'UPI';

Explanation: This query retrieves all ride details where the payment method was made using UPI by providing insights into the popularity of digital payment modes.

Query 8. Find the average customer rating per vehicle type:

SELECT 'Vehicle Type', AVG('Customer Rating') as avg_customer_rating FROM ola_booking_table 
GROUP BY 'Vehicle Type';

Explanation: This query calculates the average customer rating for each vehicle type, helping evaluate customer satisfaction across different categories.

Query 9. Calculate the total booking value of rides completed successfully:

SELECT SUM('Booking Value') as total_successful_value FROM ola_booking_table WHERE
'Booking Status' = 'Success';

Explanation: This query computes the total revenue generated from successfully completed rides, providing key financial metrics.

Query 10. List all incomplete rides along with the reason:

SELECT 'Booking ID', 'Incomplete Rides Reason' FROM ola_booking_table WHERE 'Incomplete Rides'= 'Yes';

Explanation: This query lists all incomplete rides along with the reasons, helping identify and address the root causes of incomplete rides.

Data Analysis and Visualization with Power BI

After the Exploratory Data Analysis now we will export our data into MS Power BI for detailed analysis and visualization. Power BI helped us to transform raw data into interactive dashboards and reports by providing deep insights into ride-booking patterns.

Based on the dataset we will find the insights as shown below:

1. Ride Volume Over Time and Booking Status Breakdown

A time-series chart showing the number of rides per day/week.

A pie or doughnut chart displaying the proportion of different booking statuses (success, cancelled by the customer, cancelled by the driver, etc.)

overall
overall analysis

2. Top Vehicle Types by Ride Distance

This visualization highlights which vehicle types are most utilized in terms of ride distance. For instance, longer bars for "Prime SUV" or "Bike" may indicate their popularity for longer commutes or quick, short-distance trips. Businesses can leverage this information to allocate resources effectively and optimise vehicle availability based on demand trends.

vehicleType
Vehicle Type Analysis

3. Revenue by Payment Method

1. Revenue by Payment Method (Stacked Bar Chart)

A stacked bar chart categorizes total revenue by payment methods such as Cash, UPI and Credit Card. Each bar segment represents the contribution of a specific payment method to the overall revenue.

This visualization highlights the most popular payment methods and their impact on revenue by helping assess customer preferences and streamline payment processes.

2. Customer Spending Leaderboard:

A leaderboard visual ranks customers based on their total spending on bookings by listing the top contributors to revenue.

This helps identify high-value customers who contribute significantly to revenue, enabling targeted promotions or loyalty rewards to retain them.

3. Ride Distance Distribution (Histogram or Scatter Plot)

A histogram shows the frequency distribution of ride distances over different dates, while a scatter plot can depict individual ride distances against dates.

This visualization provides insights into ride patterns, including peak travel days and the variability of ride distances by helping in demand forecasting and route optimization.

revenues
Revenue Analysis

4. Reason for Cancelling Ride Either by Customer or Driver

A pie chart is used to visually represent the most prominent reasons why customers and drivers cancel rides.

The chart segments the data into proportions by highlighting which reason has the highest occurrence.

For customers, the leading reason for cancellations is often "Driver is not moving towards pickup location" reflecting potential delays or miscommunication.

For drivers, "Personal & Car related issues" is typically the top reason, showcasing operational challenges.

cancellation
Cncellation Analysis

5. Customer Rating and Driver Rating

Driver Ratings: These reflect customer satisfaction with the driver, including professionalism, driving skills, and overall service quality. High ratings indicate positive experiences, while low ratings highlight areas for improvement.

Customer Ratings: These are provided by drivers to evaluate the behavior and cooperation of customers during the ride. They help maintain service standards and identify problematic behaviors.

ratingIMG
Rating Analysis

Key Insights and Learnings

The below Power BI dashboard provides a quick overview of key performance metrics, including KPIs such as total bookings, total revenue, and total distance travelled, all of which are dynamically updated through a date slicer.

Users can easily filter data by date to make informed decisions based on the selected time period. The dashboard features card visualizations for each metric and a line chart to visualize trends over time, helping users track performance and spot patterns efficiently.

Ola_Dashboard
Dashboard

Through this project, we derived several key insights about Bengaluru’s ride-hailing data:

  • Booking Status Trends: Success rates were consistent but cancellations by customers peaked during certain times of the day, particularly during bad weather.
  • High Demand on Weekends and Match Days: The data confirmed higher booking volumes during weekends and on match days, which could influence ride-hailing strategies for these days.
  • Popular Vehicle Types: Autos and Prime Sedans saw the highest demand while eBikes were less frequent but growing in popularity.
  • Cancellation Insights: The most common reasons for cancellations were "Driver is not moving towards pickup location" and "Change of plans."
  • Incomplete Rides: Vehicle breakdowns were the leading cause of incomplete rides, followed by customer demands.

Conclusion

Overall, This data analysis project provided a comprehensive look into Bengaluru’s ride-hailing patterns, using SQL, Power BI, and Excel to handle large datasets and derive actionable insights. By adhering to the given constraints and using the appropriate tools, I was able to analyze booking trends, cancellations, ratings, and other key metrics.

These insights can help ride-hailing companies like OLA better understand customer behaviour, optimize driver assignments, and improve customer satisfaction. The project was also an excellent opportunity to strengthen your skills in SQL data management, Power BI visualization and Excel data manipulation.

For those interested in seeing the detailed analysis, I’ve uploaded the entire project, including the dataset and visualizations, to my GitHub repository. Feel free to check it out for more in-depth exploration and learning.


Next Article
Article Tags :

Similar Reads