How to count number of NaN values in Pandas?
Last Updated: 15 Nov, 2024
Let's discuss how to count the number of NaN values in a Pandas DataFrame. In Pandas, NaN (Not a Number) marks missing data in a DataFrame.
Counting NaN Values in Each Column of a Pandas DataFrame
To find the number of missing (NaN) values in each column, use the isnull() method followed by sum(). isnull() returns a boolean DataFrame in which True marks a missing value, and sum() adds up those True values (each counts as 1) column by column.
Python
import pandas as pd
import numpy as np
# Example dataset
data = {
'A': [1, 2, np.nan, 4],
'B': [np.nan, 2, np.nan, 3],
'C': [1, np.nan, np.nan, np.nan]
}
df = pd.DataFrame(data)
# Count NaNs in each column
column_nan_count = df.isnull().sum()
print("NaN count per column:")
print(column_nan_count)
Output:
NaN count per column:
A 1
B 2
C 3
dtype: int64
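isnull() treats Python's None as missing as well as np.nan, which matters for string (object) columns. The following is a minimal sketch using a hypothetical two-column frame named df_mixed:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'name' is an object column holding one None,
# 'score' is a float column holding two np.nan values
df_mixed = pd.DataFrame({
    "name": ["Ann", None, "Cid", "Dee"],
    "score": [90.0, np.nan, 75.0, np.nan],
})

# Both None and np.nan are flagged as missing, so both are counted
nan_per_column = df_mixed.isnull().sum()
print(nan_per_column)
```

The result reports 1 missing value for name and 2 for score.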
Counting NaN Values in Specific Rows
To count NaNs in a specific row, select the row with loc (by index label) or iloc (by integer position), then call isnull().sum().
Python
# Count NaNs in the first row
row_nan_count = df.iloc[0].isnull().sum()
print("NaN count in the first row:", row_nan_count)
Output:
NaN count in the first row: 1
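The same pattern extends to all rows at once. Passing axis=1 to sum() adds up the boolean mask across the columns of each row. The sketch below rebuilds the example DataFrame so that it runs on its own. It also shows loc selecting rows by label, which here are the default integer labels 0 to 3.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 2, np.nan, 3],
    "C": [1, np.nan, np.nan, np.nan],
})

# axis=1 sums across the columns, giving one NaN count per row
row_nan_counts = df.isnull().sum(axis=1)
print(row_nan_counts.tolist())  # [1, 1, 3, 1]

# loc selects by index label rather than by position
label_2_count = df.loc[2].isnull().sum()
print("NaN count in row labelled 2:", label_2_count)

# A list of labels returns the counts for just those rows
subset_counts = df.loc[[1, 3]].isnull().sum(axis=1)
print(subset_counts.tolist())  # [1, 1]
```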
Counting NaN Values in the Entire DataFrame
To get the total count of NaN values across the entire DataFrame, use isnull().sum().sum(). The first sum() counts the NaNs in each column, and the second adds those per-column totals to give an overall count.
Python
# Count total NaNs in the DataFrame
total_nan_count = df.isnull().sum().sum()
print("Total NaN count:", total_nan_count)
Output:
Total NaN count: 6
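Two other ways to reach the same total are shown in this sketch. The first sums the mask as a flat NumPy array. The second subtracts the non-missing cells from the total number of cells: count() counts the non-NA values per column, and size is rows times columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 2, np.nan, 3],
    "C": [1, np.nan, np.nan, np.nan],
})

# to_numpy() turns the boolean mask into a 2-D array; .sum() adds every cell
total_via_numpy = df.isnull().to_numpy().sum()

# size is rows * columns; count() gives non-missing values per column
total_via_count = df.size - df.count().sum()

print(total_via_numpy, total_via_count)  # 6 6
```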
Using isna() as an Alternative
The isna() method detects NaN values exactly as isnull() does. In fact, isnull() is an alias of isna(), so you can use either one and get identical results.
Python
# Using isna() to count NaNs in each column
column_nan_count_isna = df.isna().sum()
print("NaN count per column using isna():")
print(column_nan_count_isna)
Output:
NaN count per column using isna():
A 1
B 2
C 3
dtype: int64
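Because one method is an alias of the other, the two masks are identical. Their complement, notna() (alias notnull()), counts the values that are present. The following is a short check on the example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 2, np.nan, 3],
    "C": [1, np.nan, np.nan, np.nan],
})

# isnull() is an alias of isna(), so the boolean masks are equal
same_mask = df.isna().equals(df.isnull())
print(same_mask)  # True

# notna() flags present values; summing gives non-NaN counts per column
present_per_column = df.notna().sum()
print(present_per_column.tolist())  # [3, 2, 1]
```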
Using describe() to Find Non-NaN Values in Each Column
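The describe() method provides a quick overview of each column, including its count of non-NaN values. Subtracting that count from the total number of rows, len(df), gives the NaN count. Note that by default describe() covers only numeric columns, and it reports the counts as floats.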
Python
# Using describe() for additional insights
non_nan_count = df.describe().loc['count']
nan_count_using_describe = len(df) - non_nan_count
print("NaN count per column using describe():")
print(nan_count_using_describe)
Output:
NaN count per column using describe():
A 1.0
B 2.0
C 3.0
Name: count, dtype: float64
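The describe() route relies on the columns being numeric. In the sketch below, df_mixed is a hypothetical frame with a string column. describe() leaves that column out, whereas subtracting count() from len(df) covers every column and keeps the counts as integers.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and one string column
df_mixed = pd.DataFrame({
    "score": [90.0, np.nan, 75.0, np.nan],
    "name": ["Ann", None, "Cid", "Dee"],
})

# By default describe() summarizes numeric columns only, so 'name' is dropped
described_cols = df_mixed.describe().columns.tolist()
print(described_cols)  # ['score']

# count() works for every dtype and returns integer non-NaN counts
nan_counts = len(df_mixed) - df_mixed.count()
print(nan_counts.tolist())  # [2, 1]
```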
Knowing how many NaNs each column holds, relative to the number of rows, helps you decide whether to drop rows, drop columns, or fill missing values for each feature.
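The proportion itself is easy to compute, because the mean of a boolean mask is the fraction of True values. This is a sketch only. The 0.5 cut-off is an arbitrary, hypothetical threshold, not a recommended rule.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 2, np.nan, 3],
    "C": [1, np.nan, np.nan, np.nan],
})

# mean() of the boolean mask = fraction of NaNs in each column
nan_ratio = df.isna().mean()
print(nan_ratio.tolist())  # [0.25, 0.5, 0.75]

# Hypothetical threshold: flag columns where more than half the values are NaN
threshold = 0.5
cols_to_drop = nan_ratio[nan_ratio > threshold].index.tolist()
print("Columns above threshold:", cols_to_drop)  # Columns above threshold: ['C']

df_reduced = df.drop(columns=cols_to_drop)
print(df_reduced.columns.tolist())  # ['A', 'B']
```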
Identifying Rows or Columns with NaN Values
Sometimes you might need to identify which rows or columns contain any NaN values, rather than counting them.
1. Check for Columns with Any NaN Values
To check which columns contain at least one NaN value, call isna().any() on the DataFrame. By default, any() evaluates down each column (axis=0).
Python
columns_with_nan = df.isna().any()
print("Columns with NaN values:")
print(columns_with_nan)
Output:
Columns with NaN values:
A True
B True
C True
dtype: bool
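To get the column names rather than a boolean Series, use the mask to index df.columns. Replacing any() with all() finds the columns that are entirely NaN. A minimal sketch on the example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 2, np.nan, 3],
    "C": [1, np.nan, np.nan, np.nan],
})

# Boolean indexing keeps only the column labels whose mask entry is True
nan_columns = df.columns[df.isna().any()].tolist()
print(nan_columns)  # ['A', 'B', 'C']

# all() is True only when every value in the column is NaN
all_nan_columns = df.columns[df.isna().all()].tolist()
print(all_nan_columns)  # []
```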
2. Check for Rows with Any NaN Values
To check which rows contain NaNs, use isna().any(axis=1), which evaluates across the columns of each row.
Python
rows_with_nan = df.isna().any(axis=1)
print("Rows with NaN values")
print(rows_with_nan)
Output:
Rows with NaN values
0 True
1 True
2 True
3 True
dtype: bool
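You can use the row mask to pull out the affected rows or their index labels. Swapping any(axis=1) for all(axis=1) finds rows in which every value is missing. In this example every row has at least one NaN, so the first selection returns the whole frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 2, np.nan, 3],
    "C": [1, np.nan, np.nan, np.nan],
})

# Boolean row selection: keep rows that contain at least one NaN
rows_with_nan_df = df[df.isna().any(axis=1)]
print(len(rows_with_nan_df))  # 4

# Index labels of rows where every column is NaN
all_nan_rows = df.index[df.isna().all(axis=1)].tolist()
print(all_nan_rows)  # [2]
```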
Knowing how to count and locate NaNs in your data is essential for cleaning and preprocessing.