Sort CSV by Multiple Columns in Python



In Python, to sort a CSV file by multiple columns, we can use the 'sort_values()' Method provided by the Python Pandas library. This method is used for sorting the values by taking column names as arguments. Some of the common methods for sorting a CSV file by multiple columns are as follows.

  • sort_values() : To sort a DataFrame by multiple columns.

  • sort_values() without inplace : To sort a DataFrame by multiple columns without modifying the original DataFrame.

  • sort_index() : To sort by the index after sorting by the specified columns.

Using 'sort_values()'

The 'sort_values()' method will directly modify the original DataFrame to make it efficient by sorting multiple columns.

Example

In the below example code, the 'pd.read_csv()' function loads the CSV file into a Pandas DataFrame. Sorting by multiple columns using 'sort_values()' by passing a list of column names. The original dataframe is modified directly without creating a new DataFrame by setting inplace=True.

import pandas as pd

# Read the input CSV file
dataFrame = pd.read_csv("C:\Users\amit_\Desktop\SalesRecords.csv")

# Sort by multiple columns: Reg_Price and Car
dataFrame.sort_values(["Reg_Price", "Car"], axis=0, ascending=True, inplace=True, na_position='first')

print("\nSorted CSV file (according to multiple columns) = \n", dataFrame)

Input CSV File

Car Date_of_Purchase Reg_Price
0 BMW 10/10/2020 1000
1 Lexus 10/12/2020 750
2 Audi 10/17/2020 750
3 Jaguar 10/16/2020 1500
4 Mustang 10/19/2020 1100
5 Lamborghini 10/22/2020 1000

Sorted CSV File

Car Date_of_Purchase Reg_Price
2 Audi 10/17/2020 750
1 Lexus 10/12/2020 750
0 BMW 10/10/2020 1000
5 Lamborghini 10/22/2020 1000
4 Mustang 10/19/2020 1100
3 Jaguar 10/16/2020 1500

Using 'sort_values()' without in place

Using the sort_values() method without setting the parameter inplace=True, returns a new Dataframe which is sorted by specified columns. The original DataFrame will remain stable.

Example

In the below example code without setting the inplace=True, the specified columns are sorted using sort_values() and the result is added to a new DataFrame.

import pandas as pd

# Read the input CSV file
dataFrame = pd.read_csv("C:\Users\amit_\Desktop\SalesRecords.csv")

# Sort by multiple columns without modifying the original DataFrame
sortedDataFrame = dataFrame.sort_values(["Reg_Price", "Car"], axis=0, ascending=True, na_position='first')

print("\nSorted CSV file (according to multiple columns) = \n", sortedDataFrame)

Sorted CSV file

Car Date_of_Purchase Reg_Price
2 Audi 10/17/2020 750
1 Lexus 10/12/2020 750
0 BMW 10/10/2020 1000
5 Lamborghini 10/22/2020 1000
4 Mustang 10/19/2020 1100
3 Jaguar 10/16/2020 1500

Using 'sort_index()'

The sort_index() method is used to sort a DataFrame by using its index. After sorting by multiple columns, we can also sort in both ascending or descending order by using the sort_index() method.

Example

import pandas as pd

# Read the input CSV file
dataFrame = pd.read_csv("C:\Users\amit_\Desktop\SalesRecords.csv")

# Sort by multiple columns
sortedDataFrame = dataFrame.sort_values(["Reg_Price", "Car"], axis=0, ascending=True, na_position='first')

# Further sort by index (optional)
sortedDataFrame = sortedDataFrame.sort_index()

print("\nSorted CSV file (according to multiple columns and index) = \n", sortedDataFrame)

Sorted CSV file

Car Date_of_Purchase Reg_Price
0 BMW 10/10/2020 1000
1 Lexus 10/12/2020 750
2 Audi 10/17/2020 750
3 Jaguar 10/16/2020 1500
4 Mustang 10/19/2020 1100
5 Lamborghini 10/22/2020 1000
Updated on: 2024-10-17T12:52:51+05:30

8K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements