
Drop Empty Columns in Pandas
A pandas data frame is a powerful data manipulation tool. It is a tabular data structure consisting of rows and columns, and the size of this two-dimensional table varies with the dataset. We can create a data frame from different types of sources, ranging from databases to files.
Each column in a pandas data frame represents a series of values, which can be integers, floats, or strings. We can perform numerous operations on these columns, including deletion, indexing, and filtering. In this article, we will perform one such basic operation: dropping empty columns from a pandas data frame.
Firstly, let's understand what empty columns are in a data frame.
Creating the Data Frame with Empty Columns
We create a data frame to analyse data programmatically. Each column holds a piece of data that has some significance. In complex datasets, the generated data frame might contain some empty columns, which reduce its usefulness. To produce a cleaner data frame, we remove this kind of unnecessary data from it.
A column is considered "empty" if all of its values are "NaN" (Not a Number). A column consisting of empty strings or zeros is not "empty", because an empty string and a zero both signify something about the dataset.
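The distinction between "NaN", blank strings, and zeros can be checked directly. The following is a minimal sketch; the data frame "sample" and its column names are made up for illustration and are not part of the hostel dataset used in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one column per kind of "blank-looking" value.
sample = pd.DataFrame({
    "all_nan": [np.nan, np.nan, np.nan],
    "blank_strings": ["", " ", ""],
    "zeros": [0, 0, 0],
    "partly_nan": [1.0, np.nan, 3.0],
})

# isna() flags missing cells; all() is True only for columns
# in which every single cell is missing.
is_empty = sample.isna().all()
empty_columns = is_empty[is_empty].index.tolist()

print(empty_columns)  # ['all_nan']
```

Only the column made up entirely of "NaN" is reported. Blank strings and zeros are ordinary values as far as pandas is concerned.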
When we create a data frame and do not pass any data for a column, an empty column is created. We can drop any column, empty or not, by name with the "dataframe.drop()" method, but to drop empty columns specifically we use the "dataframe.dropna()" method. Let's create a data frame with "NaN" values and then begin with the dropping operation.
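As a rough illustration of the difference between the two methods, the sketch below builds a small data frame in which one column is named but receives no data, then removes it both ways. The variable names "hostels", "by_name" and "by_content" are chosen only for this example.

```python
import pandas as pd

# "Hostel location" is listed in `columns` but has no entry in the dict,
# so pandas creates it and fills it with NaN.
hostels = pd.DataFrame(
    {"Hostel ID": ["DSC224", "DSC124"], "Hostel Rating": [8, 6]},
    columns=["Hostel ID", "Hostel Rating", "Hostel location"],
)

# drop() removes a column by name, whatever it contains.
by_name = hostels.drop(columns=["Hostel location"])

# dropna() removes columns based on their content (here: all values missing).
by_content = hostels.dropna(axis=1, how="all")

print(list(by_name.columns))     # ['Hostel ID', 'Hostel Rating']
print(list(by_content.columns))  # ['Hostel ID', 'Hostel Rating']
```

With drop() we have to know the column name in advance, whereas dropna() finds the empty columns for us.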
Example
We imported the "pandas" and "numpy" libraries and then passed a dictionary dataset consisting of information related to different hostels.
We created the data frame with the help of "DataFrame()" function and passed a list of values for labelling the rows.
In the dataset we assigned NaN values to the "Hostel location" column with the help of numpy library and finally printed the data frame.
import pandas as pd
import numpy as np

dataset = {
    "Hostel ID": ["DSC224", "DSC124", "DSC568", "DSC345"],
    "Hostel Rating": [8, 6, 10, 5],
    "Hostel price": [35000, 32000, 50000, 24000],
    "Hostel location": [np.nan, np.nan, np.nan, np.nan],
}

dataframe = pd.DataFrame(dataset, index=["Hostel 1", "Hostel 2", "Hostel 3", "Hostel 4"])
print(dataframe)
Output
          Hostel ID  Hostel Rating  Hostel price  Hostel location
Hostel 1     DSC224              8         35000              NaN
Hostel 2     DSC124              6         32000              NaN
Hostel 3     DSC568             10         50000              NaN
Hostel 4     DSC345              5         24000              NaN
Using dropna() Method to Drop Empty Columns
Let's apply the dropna() method to the previous data frame.
Example
After creating the data frame, we used the "dropna()" method to drop the columns that contain only NaN values.
Since we are operating on the columns, we specified the axis value as "1", and we set the dropping logic by assigning the "how" value as "all". It means that a column will be dropped only if all of its values are "NaN".
Finally, we stored the result in a new data frame, which no longer contains the empty column, and printed it.
import pandas as pd
import numpy as np

dataset = {
    "Hostel ID": ["DSC224", "DSC124", "DSC568", "DSC345"],
    "Hostel Rating": [8, 6, 10, 5],
    "Hostel price": [35000, 32000, 50000, 24000],
    "Hostel location": [np.nan, np.nan, np.nan, np.nan],
}

dataframe = pd.DataFrame(dataset, index=["Hostel 1", "Hostel 2", "Hostel 3", "Hostel 4"])
print(dataframe)

Emp_drop = dataframe.dropna(how="all", axis=1)
print("After dropping the empty columns using dropna() we get: -")
print(Emp_drop)
Output
          Hostel ID  Hostel Rating  Hostel price  Hostel location
Hostel 1     DSC224              8         35000              NaN
Hostel 2     DSC124              6         32000              NaN
Hostel 3     DSC568             10         50000              NaN
Hostel 4     DSC345              5         24000              NaN
After dropping the empty columns using dropna() we get: -
          Hostel ID  Hostel Rating  Hostel price
Hostel 1     DSC224              8         35000
Hostel 2     DSC124              6         32000
Hostel 3     DSC568             10         50000
Hostel 4     DSC345              5         24000
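To see why the "how" value matters, the sketch below compares how="all" with how="any" on a small variation of the hostel data in which one rating is missing. The data frame "ratings" and its values are assumptions made for this illustration only.

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame({
    "Hostel ID": ["DSC224", "DSC124", "DSC568"],
    "Hostel Rating": [8, np.nan, 10],   # one missing value
    "Hostel location": [np.nan] * 3,    # entirely missing
})

# how="all": drop a column only if every value in it is NaN.
only_empty_dropped = ratings.dropna(axis=1, how="all")

# how="any": drop a column as soon as it contains a single NaN.
any_missing_dropped = ratings.dropna(axis=1, how="any")

print(list(only_empty_dropped.columns))   # ['Hostel ID', 'Hostel Rating']
print(list(any_missing_dropped.columns))  # ['Hostel ID']
```

With how="any" the partly filled "Hostel Rating" column is lost as well, which is usually not what we want when the goal is to remove only the empty columns.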
Note: If we want to make changes to the current data frame instead of creating a new one, we use the "inplace" parameter.
dataframe.dropna(how="all", axis=1, inplace=True)
print(dataframe)
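One detail worth keeping in mind is that a call with inplace=True modifies the data frame and returns None, so its result should not be assigned back. A minimal sketch with a shortened, made-up version of the hostel data:

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({
    "Hostel ID": ["DSC224", "DSC124"],
    "Hostel location": [np.nan, np.nan],
})

# With inplace=True the original object is changed and nothing is returned.
result = dataframe.dropna(how="all", axis=1, inplace=True)

print(result)                   # None
print(list(dataframe.columns))  # ['Hostel ID']
```

Writing dataframe = dataframe.dropna(..., inplace=True) would therefore replace the data frame with None.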
Using notnull() Method to Drop Empty Columns
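After creating the data frame, we used the notnull() method together with the loc indexer to keep only the columns that contain at least one non-"NaN" value. notnull() returns a Boolean data frame, any(axis=0) reduces it to a single True or False value per column, and loc[:, mask] selects the columns marked True. Finally, we printed the data frame without the empty column.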
Example
import pandas as pd
import numpy as np

dataset = {
    "Hostel ID": ["DSC224", "DSC124", "DSC568", "DSC345"],
    "Hostel Rating": [8, 6, 10, 5],
    "Hostel price": [35000, 32000, 50000, 24000],
    "Hostel location": [np.nan, np.nan, np.nan, np.nan],
}

dataframe = pd.DataFrame(dataset, index=["Hostel 1", "Hostel 2", "Hostel 3", "Hostel 4"])
print(dataframe)

dataframe = dataframe.loc[:, dataframe.notnull().any(axis=0)]
print("Using notnull() method to remove empty columns: -")
print(dataframe)
Output
          Hostel ID  Hostel Rating  Hostel price  Hostel location
Hostel 1     DSC224              8         35000              NaN
Hostel 2     DSC124              6         32000              NaN
Hostel 3     DSC568             10         50000              NaN
Hostel 4     DSC345              5         24000              NaN
Using notnull() method to remove empty columns: -
          Hostel ID  Hostel Rating  Hostel price
Hostel 1     DSC224              8         35000
Hostel 2     DSC124              6         32000
Hostel 3     DSC568             10         50000
Hostel 4     DSC345              5         24000
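The notnull() approach can also be broken into separate steps to make the Boolean mask visible. The sketch below uses a shortened, made-up version of the hostel data and checks that the mask-based selection gives the same result as dropna(); the names "keep", "via_mask" and "via_dropna" are chosen only for this example.

```python
import numpy as np
import pandas as pd

hostels = pd.DataFrame({
    "Hostel ID": ["DSC224", "DSC124"],
    "Hostel price": [35000, 32000],
    "Hostel location": [np.nan, np.nan],
})

# One Boolean per column: True if the column holds at least one non-missing value.
keep = hostels.notnull().any(axis=0)
print(keep)

# Select the columns marked True, then compare with the dropna() result.
via_mask = hostels.loc[:, keep]
via_dropna = hostels.dropna(axis=1, how="all")

print(list(via_mask.columns))       # ['Hostel ID', 'Hostel price']
print(via_mask.equals(via_dropna))  # True
```

Keeping the mask in its own variable is handy when we also want to report which columns were removed, for example with keep[~keep].index.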
Conclusion
In this article, we went through two ways of dropping empty columns, i.e., columns consisting entirely of "NaN" values. We discussed the "dropna()" method and the "notnull()" method and how they are used to remove empty columns from a data frame. We also saw why getting rid of this unnecessary data makes the data frame more relevant.