SELECT DISTINCT vs GROUP BY in MySQL
Last Updated: 08 Feb, 2024
In MySQL, the two most common ways of retrieving unique values are SELECT DISTINCT and GROUP BY, but they are designed for different purposes. SELECT DISTINCT removes duplicate rows from a result set, so each value (or combination of values) of the selected columns appears only once. GROUP BY collects rows that share values in specific columns into groups so that aggregate functions can summarize each group. For some queries both produce the same output, but knowing how they differ helps you pick the one that expresses your intent and uses resources and time well.
SELECT DISTINCT
The SELECT DISTINCT statement is used to retrieve unique values from a single column or a combination of columns in a table. It is often used when you want to eliminate duplicate rows from the result set.
Syntax:
SELECT DISTINCT column1, column2
FROM table_name;
Parameters Used:
- column1, column2: Names of the columns whose combined values should be unique in the result.
- table_name: Table from where we want to fetch the records.
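DISTINCT compares the full combination of selected columns, not each column on its own. The minimal sketch below illustrates this using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption; this DISTINCT behaviour is standard SQL). The visits table and its rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; DISTINCT behaves the same here.
def distinct_pairs():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table containing one exact duplicate row
    conn.execute("CREATE TABLE visits (name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?)",
        [("John", "New York"), ("John", "New York"),
         ("John", "Paris"), ("Jane", "London")],
    )
    # DISTINCT removes rows whose (name, city) pair is identical;
    # ORDER BY only makes the output order predictable
    rows = conn.execute(
        "SELECT DISTINCT name, city FROM visits ORDER BY name, city"
    ).fetchall()
    conn.close()
    return rows

print(distinct_pairs())
# [('Jane', 'London'), ('John', 'New York'), ('John', 'Paris')]
```

John still appears twice because (John, New York) and (John, Paris) are different combinations; only the exact duplicate row is dropped.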
GROUP BY
The GROUP BY clause is used to group rows that have the same values in specified columns into summary rows, like calculating aggregates (e.g., COUNT, SUM, AVG) for each group. It is typically used with aggregate functions.
Syntax:
SELECT column1, column2, ..., aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column1, column2, ...;
Parameters Used:
- column1, column2, etc.: Columns by which you want to group the rows.
- aggregate_function(column_name): Aggregate functions like COUNT, SUM, AVG, etc., applied to the grouped rows.
- table_name: The name of the table you are querying.
- condition: Optional. Conditions to filter the rows before they are grouped.
- GROUP BY column1, column2, ...: Specifies the columns used for grouping. Rows with the same values in these columns will be grouped together.
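The order of operations in this syntax (WHERE filters rows first, then GROUP BY forms groups and the aggregates summarize each one) can be seen in the short sketch below. It again uses Python's sqlite3 module as a stand-in for MySQL (an assumption), reuses the sample orders rows from this article, and declares price as REAL for SQLite; the function name order_stats and the min_price parameter are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; WHERE/GROUP BY behave the same here.
def order_stats(min_price):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,"
        " product TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, 1, "Phone", 100.00), (2, 2, "Laptop", 500.00),
         (3, 1, "Tablet", 200.00), (4, 2, "Watch", 150.00)],
    )
    # WHERE removes rows before grouping; each group yields one summary row
    rows = conn.execute(
        "SELECT customer_id, COUNT(*), SUM(price), AVG(price) "
        "FROM orders WHERE price > ? "
        "GROUP BY customer_id ORDER BY customer_id",
        (min_price,),
    ).fetchall()
    conn.close()
    return rows

print(order_stats(120))
# [(1, 1, 200.0, 200.0), (2, 2, 650.0, 325.0)]
```

With a threshold of 120, the 100.00 Phone order is filtered out before grouping, so customer 1 is left with a single order.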
Let's understand the difference between SELECT DISTINCT and GROUP BY through some examples:
Example of SELECT DISTINCT vs GROUP BY in MySQL
Let's assume we have a customers table and an orders table. We are going to use these tables to show how DISTINCT and GROUP BY can be used for different use cases:
Query:
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
city VARCHAR(255) NOT NULL
);
INSERT INTO customers (customer_id, name, city) VALUES
(1, 'John Doe', 'New York'),
(2, 'Jane Smith', 'London'),
(3, 'Mike Brown', 'Paris'),
(4, 'Jane Smith', 'London');
Output:
customers table
Orders Table
Query:
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT NOT NULL,
product VARCHAR(255) NOT NULL,
price DECIMAL(10,2) NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
INSERT INTO orders (order_id, customer_id, product, price) VALUES
(1, 1, 'Phone', 100.00),
(2, 2, 'Laptop', 500.00),
(3, 1, 'Tablet', 200.00),
(4, 2, 'Watch', 150.00);
Output:
orders table
Example 1: Find Distinct Customer Cities
Here we only want the names of the distinct cities, so SELECT DISTINCT is the natural choice.
Use SELECT DISTINCT to retrieve the unique cities where customers reside:
SELECT DISTINCT city
FROM customers;
Output:
Distinct clause output
Explanation: The query returns each value from the city column of the customers table only once. London appears in two customer rows, but it is listed a single time in the output.
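Because a GROUP BY with no aggregate function also returns one row per group, the query above can be written either way. The sketch below checks that both forms return the same cities. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption) and the sample customers rows, with the repeated London customer stored under its own customer_id (4) so the primary key is not violated. ORDER BY is added to both queries only so the results can be compared row by row.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL for this comparison.
def distinct_vs_group_by():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL, city TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "John Doe", "New York"), (2, "Jane Smith", "London"),
         (3, "Mike Brown", "Paris"), (4, "Jane Smith", "London")],
    )
    # Same question asked two ways: unique cities
    via_distinct = conn.execute(
        "SELECT DISTINCT city FROM customers ORDER BY city"
    ).fetchall()
    via_group_by = conn.execute(
        "SELECT city FROM customers GROUP BY city ORDER BY city"
    ).fetchall()
    conn.close()
    return via_distinct, via_group_by

d, g = distinct_vs_group_by()
print(d == g, d)
# True [('London',), ('New York',), ('Paris',)]
```

When no aggregation is needed, SELECT DISTINCT states the intent more directly.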
Example 2: Count Orders Per Customer City
This time we also want to count the orders placed from each city. Because this requires an aggregate function (COUNT), we use GROUP BY.
Use GROUP BY to group customers by city and count their orders:
SELECT city, COUNT(*) AS order_count
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY city;
Output:
GROUP BY output
Explanation: The query joins the customers and orders tables on customer_id and groups the joined rows by city. COUNT(*) then counts the rows in each group, so the output lists each city that has at least one order together with the number of orders placed by its customers. Because this is an INNER JOIN, cities whose customers have placed no orders (Paris in this data) do not appear.
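The sketch below runs the Example 2 query against the sample data and adds SUM(o.price) to show that several aggregates can be computed per group in one pass. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption), declares price as REAL for SQLite, gives the repeated London customer customer_id 4, and adds ORDER BY c.city only to make the output order predictable; the total alias is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the query mirrors Example 2
# with an extra SUM aggregate.
def orders_per_city():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                                name TEXT NOT NULL, city TEXT NOT NULL);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             customer_id INTEGER NOT NULL,
                             product TEXT NOT NULL, price REAL NOT NULL,
                             FOREIGN KEY (customer_id) REFERENCES customers(customer_id));
        INSERT INTO customers VALUES
            (1, 'John Doe', 'New York'), (2, 'Jane Smith', 'London'),
            (3, 'Mike Brown', 'Paris'), (4, 'Jane Smith', 'London');
        INSERT INTO orders VALUES
            (1, 1, 'Phone', 100.00), (2, 2, 'Laptop', 500.00),
            (3, 1, 'Tablet', 200.00), (4, 2, 'Watch', 150.00);
    """)
    # One output row per city that has at least one matching order
    rows = conn.execute("""
        SELECT c.city, COUNT(*) AS order_count, SUM(o.price) AS total
        FROM customers c
        INNER JOIN orders o ON c.customer_id = o.customer_id
        GROUP BY c.city
        ORDER BY c.city
    """).fetchall()
    conn.close()
    return rows

print(orders_per_city())
# [('London', 2, 650.0), ('New York', 2, 300.0)]
```

Paris is missing from the result because customer 3 has no orders and the INNER JOIN drops unmatched rows before grouping.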
Key Differences Observed in This Example
- SELECT DISTINCT: Returns only unique city values, ignoring duplicates.
- GROUP BY: Groups the joined rows by city and counts the orders in each group, returning one row per city along with its order count.
- Aggregate function: COUNT(*) is used with GROUP BY to summarize data within each group.
Purposes and Functionalities
SELECT DISTINCT:
- Removes duplicate rows: It eliminates rows where all the selected columns have identical values.
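- No per-group aggregation: DISTINCT on its own does not summarize data; to compute SUM, AVG, etc. for each distinct value you need GROUP BY. DISTINCT can, however, appear inside an aggregate, as in COUNT(DISTINCT city).
- Simpler syntax: Easier to write and understand for simple de-duplication.
- Performance: For equivalent queries, MySQL usually executes SELECT DISTINCT and GROUP BY the same way; the MySQL manual describes DISTINCT as, in most cases, a special case of GROUP BY. Both can benefit from an index on the selected columns.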
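The difference between removing duplicate rows and using DISTINCT inside an aggregate is easy to see with COUNT. The sketch below compares COUNT(city) with COUNT(DISTINCT city) on the sample cities. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption; COUNT(DISTINCT ...) is valid in both), and the trimmed two-column customers table is a simplification of the article's table.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; COUNT(DISTINCT ...) exists in both.
def count_cities():
    conn = sqlite3.connect(":memory:")
    # Simplified customers table: only the columns this example needs
    conn.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "New York"), (2, "London"), (3, "Paris"), (4, "London")],
    )
    # COUNT(city) counts every non-NULL value; COUNT(DISTINCT city) counts unique ones
    all_values, unique_values = conn.execute(
        "SELECT COUNT(city), COUNT(DISTINCT city) FROM customers"
    ).fetchone()
    conn.close()
    return all_values, unique_values

print(count_cities())
# (4, 3)
```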
GROUP BY:
- Grouping data: Creates groups of rows based on shared values in one or more columns.
- Aggregation: Used with aggregate functions to summarize data within each group (e.g., COUNT, MAX, AVG).
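- More complex: Requires listing the grouping columns and, usually, one or more aggregate functions.
- Ordering: In MySQL 5.7 and earlier, GROUP BY implicitly sorted results by the grouping columns. MySQL 8.0 removed this implicit sort, so add ORDER BY whenever the order of the result matters.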
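Since the grouping itself should not be relied on for ordering, the sketch below sorts grouped results explicitly, here by an aggregate rather than by the grouping column. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption; only the explicit ORDER BY guarantees the order in either database), the sample orders rows trimmed to the needed columns, and a hypothetical function name and total alias.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. Only ORDER BY guarantees row order.
def totals_by_customer():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER, price REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 1, 100.00), (2, 2, 500.00), (3, 1, 200.00), (4, 2, 150.00)],
    )
    # Group first, then sort the summary rows by their total, largest first
    rows = conn.execute(
        "SELECT customer_id, SUM(price) AS total FROM orders "
        "GROUP BY customer_id ORDER BY total DESC"
    ).fetchall()
    conn.close()
    return rows

print(totals_by_customer())
# [(2, 650.0), (1, 300.0)]
```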
Choosing between them
- Use SELECT DISTINCT when you simply want to eliminate duplicate rows without summarizing data.
- Use GROUP BY when you need to group data and perform aggregations within each group.
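Feature | SELECT DISTINCT | GROUP BY
---|---|---
Purpose | Remove duplicate rows | Group rows and aggregate each group
Aggregate functions | No per-group aggregates (DISTINCT can appear inside one, e.g. COUNT(DISTINCT col)) | Yes, one result per group
Result order | Not guaranteed without ORDER BY | Not guaranteed without ORDER BY in MySQL 8.0+ (implicitly sorted in 5.7 and earlier)
Performance | Usually the same as the equivalent GROUP BY query | Usually the same as the equivalent DISTINCT query
Syntax | Simpler | More complex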
Conclusion
In MySQL, SELECT DISTINCT is used to retrieve unique values from one or more columns in a result set, eliminating duplicates without aggregation. Conversely, GROUP BY is employed to group rows sharing common values in specified columns, enabling aggregation functions to summarize data within each group. While SELECT DISTINCT is suitable for obtaining unique values, GROUP BY is essential for performing complex aggregations and generating summary results based on grouping criteria, each serving distinct purposes in MySQL query operations.