Have you ever thought about how ride-hailing companies manage large amounts of booking data, how to analyze customer behaviour and decide on discounts to offer? In this blog, we will do an in-depth analysis of Bengaluru ride data using a large dataset of 50,000 Ola bookings.
It covers essential aspects like booking statuses, cancellations, ride distances, payment methods, and ratings. With SQL for data preparation and Power BI for visualization, we uncover trends like high weekend demand, popular vehicle types, and reasons for cancellations.
These insights explain how data-driven decisions help companies enhance customer satisfaction by optimizing operations and improving service quality.
The Dataset
The Ola dataset contains 50,000 rows of ride-booking data for Bengaluru over one month. This dataset contains a variety of data related to ride bookings, customer and driver interactions, cancellations and more. The overview of columns are:
- Date: The date of booking.
- Time: The time at which the booking was made.
- Booking ID: A unique 10-digit identifier, prefixed by "CNR" (e.g., CNR1234567890).
- Booking Status: The status of the booking (Successful/Cancelled).
- Customer ID: Unique identifier for each customer.
- Vehicle Type: Different types of vehicles used for the ride (e.g., Auto, Prime Plus, etc.).
- Pickup Location: Dummy locations from 50 different areas in Bengaluru.
- Drop Location: Another set of locations, chosen from the list of pickup locations.
- Avg VTAT (Vehicle Time Arrival Time): The average time taken for the vehicle to arrive at the pickup location.
- Avg CTAT (Customer Time Arrival Time): The average time taken for the customer to arrive at the pickup location.
- Cancelled Rides by Customer: If the ride was cancelled by the customer.
- Reason for Cancelling by Customer: Various reasons for cancellation (e.g., AC not working, change of plans).
- Cancelled Rides by Driver: Cancellations by the driver, often due to personal or vehicle-related issues.
- Incomplete Rides: Cases where the ride was incomplete for some reason.
- Incomplete Rides Reason: Reasons for incomplete rides, such as vehicle breakdown or customer demand.
- Booking Value: The fare for the ride.
- Payment Method: The method used to pay (Cash/UPI/Card/Wallet).
- Ride Distance: Distance of the ride in kilometres.
- Driver Ratings: Ratings given to the driver by the customer.
- Customer Rating: Ratings given by the driver to the customer.
You can download the dataset from the GitHub Repository.
Data Constraints and Project Goals
As part of the project, we will do some was tasked with ensuring specific constraints in the data to real-world conditions:
- Booking Status: 62% of the rides should have a "Success" status with the remaining 38% being cancellations.
- Cancelled Rides by Customer: No more than 7% of the bookings should be cancelled by customers.
- Cancelled Rides by Driver: No more than 18% of the bookings should be cancelled by drivers.
- Incomplete Rides: The percentage of incomplete rides should be under 6%.
- Weekend and Match Day Orders: The number of orders should be higher on weekends and match days. I marked specific dates as match days.
- Order Values: Around 70% of orders should have a fare under ₹500, while 28% should be above ₹500, with the remainder above ₹100.
- Food Category: The dataset reflects 67% of orders being from the food category, a common service in Bengaluru.
Data Preparation with SQL
The first step in the project is to create the dataset using SQL which helped us to handle large volumes of data efficiently. We will create one database which called Oladb on the MySQL Workbench.
We will import the Bengaluru_Ola_Booking_Data.csv file to extract the dataset from the CSV files to SQL Workbench to perform EDA (Exploratory Data Analysis).
If you don’t know how to import CSV files to MySQL Workbench then follow this. Below is the overview of ola_booking_table.
Exploration Data Analysis (EDA)
Let's perform an Exploration Data Analysis (EDA) to find insight that can help to take decisions:
Query 1. Retrieve all successful bookings:
SELECT * FROM ola_booking_table WHERE Booking_Status = 'Success';
Explanation: This query retrieves all the booking details where the status is marked as "Success," providing insights into completed rides.
Query 2. Find the average ride distance for each vehicle type:
SELECT 'Vehicle Type', AVG(‘Ride Distance’) as avg_distance FROM ola_booking_table GROUP BY 'Vehicle Type';
Explanation: This query calculates the average ride distance for each type of vehicle, helping analyze performance and customer preferences by vehicle category.
Query 3. Get the total number of cancelled rides by customers:
SELECT COUNT(*) FROM ola_booking_table WHERE 'Booking Status' = 'cancelled by Customer';
Explanation: This query counts the total number of rides cancelled by customers, useful for understanding customer behaviour and cancellation trends.
Query 4. List the top 5 customers who booked the highest number of rides:
SELECT 'Customer ID', COUNT('Booking ID') as total_rides FROM ola_booking_table
GROUP BY 'Customer ID'
ORDER BY total_rides DESC
LIMIT 5;
Explanation: This query identifies the top 5 customers based on the number of rides booked, highlighting the most active users.
Query 5. Get the number of rides cancelled by drivers due to personal and car-related issues:
SELECT COUNT(*) FROM ola_booking_table
WHERE 'cancelled Rides by Driver' ='Personal & Car related issue';
Explanation: This query calculates the number of rides cancelled by drivers due to personal or car-related reasons, helping identify operational issues.
Query 6. Find the maximum and minimum driver ratings for Prime Sedan bookings:
SELECT MAX('Driver Ratings') as max_rating, MIN('Driver Ratings') as min_rating FROM ola_booking_table
WHERE 'Vehicle Type' = 'Prime Sedan';
Explanation: This query finds the highest and lowest driver ratings for Prime Sedan bookings, helping assess service quality for this vehicle type.
Query 7. Retrieve all rides where payment was made using UPI:
SELECT * FROM ola_booking_table WHERE 'Payment Method' = 'UPI';
Explanation: This query retrieves all ride details where the payment method was made using UPI by providing insights into the popularity of digital payment modes.
Query 8. Find the average customer rating per vehicle type:
SELECT 'Vehicle Type', AVG('Customer Rating') as avg_customer_rating FROM ola_booking_table
GROUP BY 'Vehicle Type';
Explanation: This query calculates the average customer rating for each vehicle type, helping evaluate customer satisfaction across different categories.
Query 9. Calculate the total booking value of rides completed successfully:
SELECT SUM('Booking Value') as total_successful_value FROM ola_booking_table WHERE
'Booking Status' = 'Success';
Explanation: This query computes the total revenue generated from successfully completed rides, providing key financial metrics.
Query 10. List all incomplete rides along with the reason:
SELECT 'Booking ID', 'Incomplete Rides Reason' FROM ola_booking_table WHERE 'Incomplete Rides'= 'Yes';
Explanation: This query lists all incomplete rides along with the reasons, helping identify and address the root causes of incomplete rides.
Data Analysis and Visualization with Power BI
After the Exploratory Data Analysis now we will export our data into MS Power BI for detailed analysis and visualization. Power BI helped us to transform raw data into interactive dashboards and reports by providing deep insights into ride-booking patterns.
Based on the dataset we will find the insights as shown below:
1. Ride Volume Over Time and Booking Status Breakdown
A time-series chart showing the number of rides per day/week.
A pie or doughnut chart displaying the proportion of different booking statuses (success, cancelled by the customer, cancelled by the driver, etc.)
overall analysis2. Top Vehicle Types by Ride Distance
This visualization highlights which vehicle types are most utilized in terms of ride distance. For instance, longer bars for "Prime SUV" or "Bike" may indicate their popularity for longer commutes or quick, short-distance trips. Businesses can leverage this information to allocate resources effectively and optimise vehicle availability based on demand trends.
Vehicle Type Analysis3. Revenue by Payment Method
1. Revenue by Payment Method (Stacked Bar Chart)
A stacked bar chart categorizes total revenue by payment methods such as Cash, UPI and Credit Card. Each bar segment represents the contribution of a specific payment method to the overall revenue.
This visualization highlights the most popular payment methods and their impact on revenue by helping assess customer preferences and streamline payment processes.
2. Customer Spending Leaderboard:
A leaderboard visual ranks customers based on their total spending on bookings by listing the top contributors to revenue.
This helps identify high-value customers who contribute significantly to revenue, enabling targeted promotions or loyalty rewards to retain them.
3. Ride Distance Distribution (Histogram or Scatter Plot)
A histogram shows the frequency distribution of ride distances over different dates, while a scatter plot can depict individual ride distances against dates.
This visualization provides insights into ride patterns, including peak travel days and the variability of ride distances by helping in demand forecasting and route optimization.
Revenue Analysis4. Reason for Cancelling Ride Either by Customer or Driver
A pie chart is used to visually represent the most prominent reasons why customers and drivers cancel rides.
The chart segments the data into proportions by highlighting which reason has the highest occurrence.
For customers, the leading reason for cancellations is often "Driver is not moving towards pickup location" reflecting potential delays or miscommunication.
For drivers, "Personal & Car related issues" is typically the top reason, showcasing operational challenges.
Cncellation Analysis5. Customer Rating and Driver Rating
Driver Ratings: These reflect customer satisfaction with the driver, including professionalism, driving skills, and overall service quality. High ratings indicate positive experiences, while low ratings highlight areas for improvement.
Customer Ratings: These are provided by drivers to evaluate the behavior and cooperation of customers during the ride. They help maintain service standards and identify problematic behaviors.
Rating AnalysisKey Insights and Learnings
The below Power BI dashboard provides a quick overview of key performance metrics, including KPIs such as total bookings, total revenue, and total distance travelled, all of which are dynamically updated through a date slicer.
Users can easily filter data by date to make informed decisions based on the selected time period. The dashboard features card visualizations for each metric and a line chart to visualize trends over time, helping users track performance and spot patterns efficiently.
Dashboard Through this project, we derived several key insights about Bengaluru’s ride-hailing data:
- Booking Status Trends: Success rates were consistent but cancellations by customers peaked during certain times of the day, particularly during bad weather.
- High Demand on Weekends and Match Days: The data confirmed higher booking volumes during weekends and on match days, which could influence ride-hailing strategies for these days.
- Popular Vehicle Types: Autos and Prime Sedans saw the highest demand while eBikes were less frequent but growing in popularity.
- Cancellation Insights: The most common reasons for cancellations were "Driver is not moving towards pickup location" and "Change of plans."
- Incomplete Rides: Vehicle breakdowns were the leading cause of incomplete rides, followed by customer demands.
Conclusion
Overall, This data analysis project provided a comprehensive look into Bengaluru’s ride-hailing patterns, using SQL, Power BI, and Excel to handle large datasets and derive actionable insights. By adhering to the given constraints and using the appropriate tools, I was able to analyze booking trends, cancellations, ratings, and other key metrics.
These insights can help ride-hailing companies like OLA better understand customer behaviour, optimize driver assignments, and improve customer satisfaction. The project was also an excellent opportunity to strengthen your skills in SQL data management, Power BI visualization and Excel data manipulation.
For those interested in seeing the detailed analysis, I’ve uploaded the entire project, including the dataset and visualizations, to my GitHub repository. Feel free to check it out for more in-depth exploration and learning.
Similar Reads
SQL for Data Analysis
SQL (Structured Query Language) is an indispensable tool for data analysts, providing a powerful way to query and manipulate data stored in relational databases. With its ability to handle large datasets and perform complex operations, SQL has become a fundamental skill for anyone involved in data a
7 min read
Why Data Analysis is Important?
DData Analysis involves inspecting, transforming, and modeling data to discover useful information, inform conclusions, and support decision-making. It encompasses a range of techniques and tools used to interpret raw data, identify patterns, and extract actionable insights. Effective data analysis
5 min read
Healthcare Data Analysis using SQL
Healthcare data analysis plays a vital role in enhancing patient care, improving hospital efficiency and managing financial operations. By utilizing Power BI, healthcare professionals and administrators can gain valuable insights into patient demographics, medical conditions, hospital performance, a
7 min read
Data analysis using R
Data Analysis is a subset of data analytics, it is a process where the objective has to be made clear, collect the relevant data, preprocess the data, perform analysis(understand the data, explore insights), and then visualize it. The last step visualization is important to make people understand wh
10 min read
SQL vs R - Which to use for Data Analysis?
Data Analysis, as the name suggests, means the evaluation or examination of the data, in Laymanâs terms. The answer to the question as to why Data Analysis is important lies in the fact that deriving insights from the data and understanding them are extremely crucial for organizations and businesses
5 min read
What is Data Analytics?
Data analytics, also known as data analysis, is a crucial component of modern business operations. It involves examining datasets to uncover useful information that can be used to make informed decisions. This process is used across industries to optimize performance, improve decision-making, and ga
9 min read
Walmert Sales Data Analysis using SQL
Walmart is one of the largest retail chains globally which offers a wide range of products at competitive prices. It is known for its large network of stores and online platforms. It serves millions of customers worldwide by providing everything from groceries to electronics. Walmart can refine its
8 min read
How to Write Data Analysis Reports
Reports on data analysis are essential for communicating data-driven insights to decision-makers, stakeholders, and other pertinent parties. These reports provide an organized format for providing conclusions, analyses, and suggestions derived from data set analysis. In this guide, we will learn how
9 min read
Top 10 SQL Projects For Data Analysis
SQL stands for Structured Query Language and is a standard database programming language that is used in data analysis and to access data in databases. It is a popular query language that is used in all types of devices. SQL is a fundamental tool for data scientists to extract, manipulate, and analy
9 min read
Data Analysis Examples
Data analysis stands as the cornerstone of informed decision-making in today's data-driven world, driving innovation and yielding actionable insights across industries. From healthcare and finance to retail and urban planning, the applications of data analysis are vast and transformative. In this in
7 min read