Fill NA in Multiple Columns In-Place in Python Pandas
Pandas is an open-source Python library for data analysis and manipulation. Its core data structure, the DataFrame, organizes data in labeled rows and columns, much like a table. Pandas can also read and write data in many formats, such as CSV files, Excel workbooks, and SQL databases.
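Missing values usually enter a DataFrame when data is read from a file. This minimal sketch reads a small CSV held in memory (io.StringIO is used only so that no file on disk is needed; the column names and values are made up) and shows that empty fields are parsed as NaN, which is exactly what fillna() targets.

```python
import io

import pandas as pd

# A small CSV held in memory; the empty fields stand in for missing data
csv_text = """name,score,age
Asha,90,
Ben,,14
Chen,75,13
"""

df = pd.read_csv(io.StringIO(csv_text))

# Empty numeric fields are parsed as NaN
print(df)

# Count the missing values in each column
print(df.isna().sum())
```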
The fillna() method fills missing (NaN/None) values in a Pandas DataFrame or Series. The missing values are filled either with a value you specify or by a fill method, such as forward fill, passed in the method call.
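The two filling styles can be compared on a small Series. This sketch fills with a constant and then with forward and backward fill; it calls ffill() and bfill() directly, which recent pandas versions recommend over passing method= to fillna().

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

# Fill every missing value with one constant
constant_filled = s.fillna(0)

# Forward fill: copy the last valid value into the following gaps
forward_filled = s.ffill()

# Backward fill: copy the next valid value into the preceding gaps;
# the final NaN has no later value, so it stays NaN
backward_filled = s.bfill()

print(constant_filled.tolist())  # [1.0, 0.0, 0.0, 4.0, 0.0]
print(forward_filled.tolist())   # [1.0, 1.0, 1.0, 4.0, 4.0]
print(backward_filled.tolist())  # [1.0, 4.0, 4.0, 4.0, nan]
```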
Syntax
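object_name.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
In recent pandas versions, the method parameter is deprecated in favor of the ffill() and bfill() methods.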
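Two of the less obvious parameters in the signature are limit and axis. This sketch, on a made-up DataFrame, shows that with a fill value limit caps how many NaN values are filled in each column, that with forward fill it caps how many consecutive NaN values are filled, and that axis=1 fills across each row instead of down each column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, np.nan, 3.0],
                   'b': [1.0, np.nan, np.nan]})

# With a fill value, only the first NaN in each column is filled
limited = df.fillna(0, limit=1)
print(limited)

# With forward fill, at most one consecutive NaN is filled per gap
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])
forward_limited = s.ffill(limit=1)
print(forward_limited.tolist())

# axis=1: fill each NaN from the column to its left in the same row
row_filled = df.ffill(axis=1)
print(row_filled)
```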
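By default, fillna() returns a new DataFrame or Series with the missing values filled and leaves the original object unchanged. With inplace=True, it modifies the original object directly and returns None.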
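The difference between the default behavior and inplace=True can be checked directly. In this sketch (the DataFrame is made up), the first call leaves df untouched and returns a filled copy, while the second call changes df itself and returns None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C1': [1.0, np.nan], 'C2': [np.nan, 4.0]})

# Default: a new DataFrame is returned and df keeps its NaN values
filled = df.fillna(0)
print(df.isna().sum().sum())      # 2
print(filled.isna().sum().sum())  # 0

# inplace=True: df itself is modified and the call returns None
result = df.fillna(0, inplace=True)
print(result)                     # None
print(df.isna().sum().sum())      # 0
```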
Example 1
In this example, we use fillna() on a DataFrame created in code and on a DataFrame read from a CSV file. Because read_csv() returns a DataFrame, fillna() accepts the same parameters for both.
Algorithm
- Step 1: Identify the missing (NaN/None) values in the DataFrame or Series.
- Step 2: Fill the identified missing values according to the arguments passed to fillna(). A scalar value replaces every missing value; a dict or Series supplies a separate value for each column; a fill method such as forward fill propagates neighboring valid values. The axis, limit, and downcast arguments further control where and how the values are filled.
- Step 3: Return a new DataFrame or Series with the missing values filled, or, if inplace=True, modify the original object and return None.
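The three steps above can be traced explicitly. This sketch, on a made-up DataFrame, counts the missing values per column (step 1), fills them with one value per column via a dict (step 2), and confirms that a new DataFrame is returned while the original still contains its NaN values (step 3).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1.0, np.nan, 3.0], 'y': [np.nan, np.nan, 6.0]})

# Step 1: identify the missing values and count them per column
missing_per_column = df.isna().sum()
print(missing_per_column.to_dict())

# Step 2: fill them according to the arguments (one value per column)
filled = df.fillna({'x': 0, 'y': -1})

# Step 3: a new DataFrame is returned; the original is unchanged
print(filled)
print(filled is df, df.isna().any().any())  # False True
```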
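import numpy as np
import pandas as pd

# Create a sample DataFrame with some missing values
df = pd.DataFrame({'C1': [5, 23, 33, np.nan],
                   'C2': [26, np.nan, 7, 18],
                   'C3': [11, 30, np.nan, 112]})
print("Before filling missing values:")
print(df)

# Or read a dataset from a CSV (or any other) file;
# this assumes sample_data.csv exists in the working directory
df1 = pd.read_csv("sample_data.csv")

# Fill NaN values in C1 and C2 with 0, and in C3 with 1, modifying df in place
df.fillna(value={'C1': 0, 'C2': 0, 'C3': 1}, inplace=True)

# Fill every NaN value in df1 with an arbitrary constant (111);
# without inplace=True, fillna() returns a new DataFrame, so assign it back
df1 = df1.fillna(111)

# Print the updated DataFrame to see the difference
print("After filling missing values:")
print(df)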
Output
Before filling missing values:
     C1    C2     C3
0   5.0  26.0   11.0
1  23.0   NaN   30.0
2  33.0   7.0    NaN
3   NaN  18.0  112.0
After filling missing values:
     C1    C2     C3
0   5.0  26.0   11.0
1  23.0   0.0   30.0
2  33.0   7.0    1.0
3   0.0  18.0  112.0
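Sometimes only some of the columns should be filled. This sketch reuses the sample values from Example 1 and fills C1 and C2 while deliberately leaving C3 untouched, by assigning the filled subset back into df. Assignment is used because a list selection such as df[['C1', 'C2']] returns a new DataFrame, so calling fillna(inplace=True) on that selection would not modify df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C1': [5, 23, 33, np.nan],
                   'C2': [26, np.nan, 7, 18],
                   'C3': [11, 30, np.nan, 112]})

cols = ['C1', 'C2']  # only these columns should be filled

# Fill the selected columns and write the result back into df
df[cols] = df[cols].fillna(0)

print(df)
```

Passing a dict that names only some columns, as in df.fillna({'C1': 0, 'C2': 0}, inplace=True), achieves the same effect in place, because columns that are not in the dict are left as they are.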
Example 2
In this example, we work with a small dataset of school students and use fillna() to replace the missing values in each column with that column's mean. Instead of reading the data from a CSV file as in Example 1, we create the DataFrame directly in code.
import numpy as np
import pandas as pd

# Create a sample DataFrame with missing values
data = {
    'RollNo': [1, 2, 3, 4, 5],
    'Age': [10, np.nan, 5, 8, 12],
    'Marks': [100, 200, np.nan, 150, np.nan]
}
data = pd.DataFrame(data)

# Original DataFrame with missing values
print(data)

# Fill missing values in each column with that column's mean
data1 = data.fillna(data.mean())
print(data1)
Output
   RollNo   Age  Marks
0       1  10.0  100.0
1       2   NaN  200.0
2       3   5.0    NaN
3       4   8.0  150.0
4       5  12.0    NaN
   RollNo    Age  Marks
0       1  10.00  100.0
1       2   8.75  200.0
2       3   5.00  150.0
3       4   8.00  150.0
4       5  12.00  150.0
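Example 2 works because every column is numeric. If the DataFrame also contains a text column, data.mean() typically raises a TypeError in pandas 2.x, so this sketch restricts the statistic to numeric columns with numeric_only=True. It uses median() instead of mean() to show that any per-column Series of fill values works the same way; the Name column is a hypothetical addition.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'Name': ['Ann', 'Raj', None, 'Li', 'Sam'],  # hypothetical text column
    'Age': [10, np.nan, 5, 8, 12],
    'Marks': [100, 200, np.nan, 150, np.nan]
})

# Per-column medians of the numeric columns only (Name is skipped)
medians = data.median(numeric_only=True)

# The Series index matches column names, so each column gets its own median;
# Name is not in the Series, so its missing value is left as-is
filled = data.fillna(medians)
print(filled)
```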
Conclusion
You can use the fillna() method in Pandas to fill missing values in one or more columns of a DataFrame, or in a Series. Its arguments let you specify the fill value as well as how and where the values are filled.
Pandas also has other methods, such as replace(), that can substitute missing values with the mean, median, mode, or any other value. The difference is that fillna() is designed specifically for missing values, whereas replace() is more general and can replace any specified value in the object. This makes fillna() the more direct choice for handling missing data.
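To make the comparison concrete, this sketch replaces NaN with 0 using both methods, then uses replace() for something fillna() cannot do on its own: converting an ordinary value into NaN. The -1 used here is a made-up sentinel for a missing reading.

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, -1.0, 7.0])

# Both calls turn NaN into 0
via_fillna = s.fillna(0)
via_replace = s.replace(np.nan, 0)
print(via_fillna.equals(via_replace))  # True

# replace() can also target ordinary values, e.g. the -1 sentinel,
# after which fillna() fills all gaps with the median of the valid values
cleaned = s.replace(-1.0, np.nan)
cleaned = cleaned.fillna(cleaned.median())
print(cleaned.tolist())  # [3.0, 5.0, 5.0, 7.0]
```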