
Drop Columns in DataFrame by Label Names or Index Positions
A pandas data frame is a two-dimensional, labelled data structure made up of columns, each of which is a pandas Series. It is widely used for analysing numerical and tabular data. The data is arranged as a table, with each row holding one record.
What makes a pandas data frame especially powerful is the large set of built-in methods it provides. Each column in a data frame is a labelled series of values. In this article, we will work with these columns and discuss the various ways to drop columns from a pandas data frame.
A single column or several columns can be dropped either by specifying their names or by using their index positions. We will look at both methods, but first we need to prepare a dataset and create a data frame.
Creating the Data Frame
When creating a data frame, we can assign names to both its columns and its rows. This step is important because it defines the "label names" we use to refer to columns, and the column order it sets determines each column's "index position" (counted from 0).
Here, we imported the pandas library as "pd" and passed the dataset as a dictionary of lists: each key is a column name, and the list associated with it holds that column's values. We created the data frame with the "pd.DataFrame()" constructor and assigned the row labels through its "index" parameter. Now let's drop columns using their names.
Example
import pandas as pd

dataset = {"Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
           "Age": [25, 28, 27, 26, 25],
           "Salary": [200000, 250000, 180000, 300000, 280000],
           "Role": ["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"]}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])
print(dataframe)
Output
        Employee ID  Age  Salary              Role
Nimesh        CIR45   25  200000  Junior Developer
Arjun         CIR12   28  250000           Analyst
Mohan         CIR18   27  180000        Programmer
Ritesh        CIR50   26  300000  Senior Developer
Raghav        CIR28   25  280000                HR
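Since the drop methods below refer to columns either by label or by position, it can help to inspect both before removing anything. The following is a minimal sketch that reuses the sample data above; "get_loc()" is a standard method of the pandas Index that returns a label's position.

```python
import pandas as pd

# Same sample data as in the example above
dataset = {"Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
           "Age": [25, 28, 27, 26, 25],
           "Salary": [200000, 250000, 180000, 300000, 280000],
           "Role": ["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"]}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])

# Column labels in order; each label's place in this list is its index position
print(list(dataframe.columns))  # ['Employee ID', 'Age', 'Salary', 'Role']

# Row labels assigned through the index parameter
print(list(dataframe.index))    # ['Nimesh', 'Arjun', 'Mohan', 'Ritesh', 'Raghav']

# Convert a label to its position, and a position back to its label
print(dataframe.columns.get_loc("Salary"))  # 2
print(dataframe.columns[3])                 # Role
```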
Using Column Names and the drop() Method
After creating the data frame, we used the "dataframe.drop()" method to remove the "Salary" and "Role" columns, passing their names as a list.
We set the "axis" parameter to 1 because we are dropping along the column axis (the default, axis=0, refers to rows). Since "drop()" returns a new data frame and leaves the original unchanged, we stored the result in a variable "colDrop" and printed it.
Example
import pandas as pd

dataset = {"Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
           "Age": [25, 28, 27, 26, 25],
           "Salary": [200000, 250000, 180000, 300000, 280000],
           "Role": ["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"]}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])
print(dataframe)

# axis=1 tells drop() that the labels are column names, not row labels
colDrop = dataframe.drop(["Role", "Salary"], axis=1)
print("After dropping the Role and salary column:")
print(colDrop)
Output
        Employee ID  Age  Salary              Role
Nimesh        CIR45   25  200000  Junior Developer
Arjun         CIR12   28  250000           Analyst
Mohan         CIR18   27  180000        Programmer
Ritesh        CIR50   26  300000  Senior Developer
Raghav        CIR28   25  280000                HR
After dropping the Role and salary column:
        Employee ID  Age
Nimesh        CIR45   25
Arjun         CIR12   28
Mohan         CIR18   27
Ritesh        CIR50   26
Raghav        CIR28   25
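The same drop can be written in a few other ways. The following minimal sketch reuses the sample data above: a single label can be passed without a list, the "columns=" keyword replaces "axis=1", and "errors="ignore"" skips labels that do not exist instead of raising a KeyError. The "Bonus" column is a hypothetical name, chosen only because it is absent from the data.

```python
import pandas as pd

dataset = {"Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
           "Age": [25, 28, 27, 26, 25],
           "Salary": [200000, 250000, 180000, 300000, 280000],
           "Role": ["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"]}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])

# A single column can be dropped with a plain label instead of a list
noAge = dataframe.drop("Age", axis=1)
print(list(noAge.columns))      # ['Employee ID', 'Salary', 'Role']

# columns= is an equivalent spelling of axis=1
colDrop = dataframe.drop(columns=["Role", "Salary"])
print(list(colDrop.columns))    # ['Employee ID', 'Age']

# drop() returns a new data frame; the original still has all four columns
print(list(dataframe.columns))  # ['Employee ID', 'Age', 'Salary', 'Role']

# "Bonus" is a hypothetical, deliberately missing column; errors="ignore" skips it
safeDrop = dataframe.drop(columns=["Role", "Bonus"], errors="ignore")
print(list(safeDrop.columns))   # ['Employee ID', 'Age', 'Salary']
```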
Using Index Positions and the drop() Method
We can also use the index positions of the columns to select the ones we want to remove.
Example
Here, we used the "dataframe.columns" attribute, which holds the column labels, together with "dataframe.drop()". Indexing it with the list [2, 3], written as "dataframe.columns[[2, 3]]", converts positions 2 and 3 into the labels "Salary" and "Role", which "drop()" then removes. Positions start at 0, so "Employee ID" is position 0 and "Age" is position 1.
# Continues from the data frame created in the first example
colDrop = dataframe.drop(dataframe.columns[[2, 3]], axis=1)
print("After dropping salary and role: -")
print(colDrop)
Output
After dropping salary and role: -
        Employee ID  Age
Nimesh        CIR45   25
Arjun         CIR12   28
Mohan         CIR18   27
Ritesh        CIR50   26
Raghav        CIR28   25
Now that we have discussed both basic methods for dropping columns, let's look at some extended concepts.
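Positions can also be counted from the end of the column list with negative numbers, which is handy when the columns to remove are always the last ones. The following is a minimal sketch with the same sample data. Note that "dataframe.columns[...]" turns positions into labels before "drop()" runs, so in a data frame with duplicate column names every column sharing that label would be removed.

```python
import pandas as pd

dataset = {"Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
           "Age": [25, 28, 27, 26, 25],
           "Salary": [200000, 250000, 180000, 300000, 280000],
           "Role": ["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"]}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])

# -1 is the last column ("Role") and -2 the one before it ("Salary")
colDrop = dataframe.drop(dataframe.columns[[-2, -1]], axis=1)
print(list(colDrop.columns))  # ['Employee ID', 'Age']

# Same result as using the positive positions [2, 3]
print(colDrop.equals(dataframe.drop(dataframe.columns[[2, 3]], axis=1)))  # True
```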
Dropping a Range of Columns from the Data Frame
In the examples above, we dropped only specific columns (Salary and Role). Pandas also lets us select a whole range of columns and drop them in a single step. Let's implement this logic.
Using the iloc[] Indexer
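After creating the data frame, we used the "iloc" indexer to select a range of columns and then removed them from the data frame. "iloc" takes integer positions for both rows and columns, written as "dataframe.iloc[rows, columns]". Here the row range is "0:0", which selects no rows, and the column range is "1:4", which selects positions 1, 2 and 3 ("Age", "Salary" and "Role"), because the end position 4 is excluded. When this selection is passed to "dataframe.drop()", pandas uses its column labels, so those three columns are dropped.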
Example
import pandas as pd

dataset = {"Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
           "Age": [25, 28, 27, 26, 25],
           "Salary": [200000, 250000, 180000, 300000, 280000],
           "Role": ["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"]}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])
print(dataframe)

# Rows 0:0 select nothing; columns 1:4 select positions 1-3 (Age, Salary, Role)
colDrop = dataframe.drop(dataframe.iloc[0:0, 1:4], axis=1)
print("Dropping a range of columns from 'Age' to 'Role' using iloc() function")
print(colDrop)
Output
        Employee ID  Age  Salary              Role
Nimesh        CIR45   25  200000  Junior Developer
Arjun         CIR12   28  250000           Analyst
Mohan         CIR18   27  180000        Programmer
Ritesh        CIR50   26  300000  Senior Developer
Raghav        CIR28   25  280000                HR
Dropping a range of columns from 'Age' to 'Role' using iloc() function
        Employee ID
Nimesh        CIR45
Arjun         CIR12
Mohan         CIR18
Ritesh        CIR50
Raghav        CIR28
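Selecting zero rows with "iloc" only to reach its columns works, but the column labels can also be sliced directly. The following minimal sketch, with the same sample data, shows two alternatives that should give the same result as the "iloc" example above.

```python
import pandas as pd

dataset = {"Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
           "Age": [25, 28, 27, 26, 25],
           "Salary": [200000, 250000, 180000, 300000, 280000],
           "Role": ["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"]}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])

# Slice the column Index directly: positions 1, 2 and 3 (end position 4 excluded)
colDrop = dataframe.drop(columns=dataframe.columns[1:4])
print(list(colDrop.columns))  # ['Employee ID']

# iloc form that selects all rows and keeps only the column labels of the selection
viaIloc = dataframe.drop(columns=dataframe.iloc[:, 1:4].columns)
print(colDrop.equals(viaIloc))  # True
```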
Using the loc[] Indexer
If we want to build the range from labels instead of positions, we use the "loc" indexer.
Example
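Here, we created the range with "loc", which selects columns by their names: "dataframe.loc[:, "Age":"Role"]" takes all rows and the columns from "Age" through "Role". Unlike an "iloc" slice, a "loc" slice includes its end label, so "Role" is dropped as well. We passed the ".columns" of this selection to "dataframe.drop()" and finally printed the data frame with the remaining column.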
# Continues from the data frame created in the first example
# loc slices by label and includes the end label "Role"
colDrop = dataframe.drop(dataframe.loc[:, "Age":"Role"].columns, axis=1)
print("Dropping a range of columns from Age to Role using loc() function")
print(colDrop)
Output
Dropping a range of columns from Age to Role using loc() function
        Employee ID
Nimesh        CIR45
Arjun         CIR12
Mohan         CIR18
Ritesh        CIR50
Raghav        CIR28
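The difference between the two slicing styles is a common source of off-by-one mistakes when dropping ranges. The following minimal sketch, again with the sample data, compares a label slice made with "loc" and a position slice made with "iloc".

```python
import pandas as pd

dataset = {"Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
           "Age": [25, 28, 27, 26, 25],
           "Salary": [200000, 250000, 180000, 300000, 280000],
           "Role": ["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"]}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])

# Label slices with loc include the end label
byLabel = dataframe.loc[:, "Age":"Salary"].columns
print(list(byLabel))     # ['Age', 'Salary']

# Position slices with iloc stop before the end position
byPosition = dataframe.iloc[:, 1:2].columns
print(list(byPosition))  # ['Age']

# Hence the loc range "Age":"Role" covers the same columns as the iloc range 1:4
sameRange = dataframe.loc[:, "Age":"Role"].columns.equals(dataframe.iloc[:, 1:4].columns)
print(sameRange)         # True
```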
Conclusion
This article focused on the simple operation of dropping columns from a pandas data frame. We discussed two techniques, "dropping by label names" and "dropping by index positions", and we also used the "loc" and "iloc" indexers to drop a range of columns.