
Replace NaN Values with Average of Columns in Python
This article shows several ways to replace NaN (Not a Number) values in a NumPy array with the average of the column they appear in. Handling NaN values is a crucial step in data analysis, because ordinary aggregates such as numpy.mean() return NaN as soon as a single entry is missing. Every method below uses the same 3x3 example array and produces the same result.
Method 1: Using numpy.nanmean()
Example
import numpy as np

arr = np.array([[1, 2, np.nan], [4, np.nan, 6], [np.nan, 8, 9]])
col_means = np.nanmean(arr, axis=0)
arr_filled = np.where(np.isnan(arr), col_means, arr)
print("Column mean: ", col_means)
print("Final array: \n", arr_filled)
Output
Column mean:  [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
Explanation
The numpy.nanmean() function computes the mean of each column while ignoring NaN values; axis=0 tells it to aggregate down the rows, giving one mean per column. numpy.isnan() builds a boolean mask of the NaN positions, and numpy.where() takes the value from col_means wherever the mask is True and the value from arr everywhere else. Because col_means has one entry per column, it is broadcast across the rows, so each NaN receives the mean of its own column. The result, arr_filled, is a new array; arr itself is left unchanged.
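One edge case the example does not cover is a column that contains only NaN values. The sketch below wraps the same technique in a helper (the name fill_nan_with_column_mean is hypothetical, not part of NumPy) and assumes that leaving such a column as NaN is acceptable. numpy.nanmean() returns NaN for an all-NaN column and emits a RuntimeWarning, which is silenced here.

```python
import warnings
import numpy as np

def fill_nan_with_column_mean(arr):
    """Return a new 2-D float array with NaNs replaced by column means."""
    arr = np.asarray(arr, dtype=float)
    with warnings.catch_warnings():
        # nanmean warns ("Mean of empty slice") when a column is all NaN
        warnings.simplefilter("ignore", category=RuntimeWarning)
        col_means = np.nanmean(arr, axis=0)
    # col_means has shape (n_cols,) and broadcasts across the rows of arr
    return np.where(np.isnan(arr), col_means, arr)

data = np.array([[1.0, np.nan, np.nan],
                 [3.0, 4.0, np.nan]])
filled = fill_nan_with_column_mean(data)
print(filled)
```

The third column has no valid entries, so it stays NaN in the result. A caller could apply a fallback value afterwards if that is not desired.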
Method 2: Using Traversal and Column Mean
Example
import numpy as np

arr = np.array([[1, 2, np.nan], [4, np.nan, 6], [np.nan, 8, 9]])
for i in range(arr.shape[1]):
    column = arr[:, i]
    column_mean = np.nanmean(column)
    column[np.isnan(column)] = column_mean
    print("Column mean: ", column_mean)
print("Final array: \n", arr)
Output
Column mean:  2.5
Column mean:  5.0
Column mean:  7.5
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
Explanation
Here we loop over the columns of the NumPy array. For each column we compute its mean with numpy.nanmean(), which ignores NaN values, and then assign column_mean to column[np.isnan(column)], that is, to the NaN positions in that column. Because arr[:, i] is a view of arr rather than a copy, the assignment modifies arr in place.
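The loop relies on basic slicing returning a view. The following sketch, using a small array chosen only for illustration, contrasts a view with an explicit copy to show why writing to the sliced column also changes the original array.

```python
import numpy as np

arr = np.array([[1.0, np.nan],
                [3.0, 4.0]])

column_view = arr[:, 1]          # basic slicing: a view onto arr's memory
column_copy = arr[:, 1].copy()   # an independent copy

# Writing to the copy leaves arr untouched
column_copy[np.isnan(column_copy)] = 0.0
still_nan = bool(np.isnan(arr[0, 1]))

# Writing to the view writes through to arr
column_view[np.isnan(column_view)] = np.nanmean(column_view)

print(np.shares_memory(arr, column_view))
print(np.shares_memory(arr, column_copy))
print(arr)
```

If the original array must be preserved, copying it once before the loop (arr.copy()) is a simple way to get the same result without side effects.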
Method 3: Using numpy.nan_to_num() and numpy.nanmean()
Example
import numpy as np

arr = np.array([[1, 2, np.nan], [4, np.nan, 6], [np.nan, 8, 9]])
col_means = np.nanmean(arr, axis=0)
arr_filled = np.nan_to_num(arr, nan=col_means)
print("Column mean: ", col_means)
print("Final array: \n", arr_filled)
Output
Column mean:  [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
Explanation
The numpy.nan_to_num() function replaces NaN values with the value passed as its nan argument. Here we pass the array of column means, which is broadcast against arr so that each NaN is replaced by the mean of its own column. The function returns a new array, arr_filled, in which the column means appear in place of the NaN values; arr is left unchanged.
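For comparison, numpy.nan_to_num() is more commonly called with no nan argument or with a single scalar. The sketch below shows both forms on a small array chosen for illustration; the variable overall_mean is a hypothetical name, and filling with one overall mean is a different policy from the per-column fill used above.

```python
import numpy as np

arr = np.array([[1.0, 2.0, np.nan],
                [4.0, np.nan, 6.0]])

# Default behaviour: every NaN becomes 0.0
zero_filled = np.nan_to_num(arr)

# Scalar fill: every NaN becomes the mean of all non-NaN entries
overall_mean = np.nanmean(arr)
scalar_filled = np.nan_to_num(arr, nan=overall_mean)

print(zero_filled)
print(scalar_filled)
```

Both calls return new arrays, so arr still contains its NaN values afterwards.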
Method 4: Using numpy.apply_along_axis() and Column Mean
Example
import numpy as np

arr = np.array([[1, 2, np.nan], [4, np.nan, 6], [np.nan, 8, 9]])
col_means = np.nanmean(arr, axis=0)

def replace_nan(column):
    column[np.isnan(column)] = np.nanmean(column)
    return column

arr_filled = np.apply_along_axis(replace_nan, axis=0, arr=arr)
print("Column mean: ", col_means)
print("Final array: \n", arr_filled)
Output
Column mean:  [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
Explanation
The numpy.apply_along_axis() function calls replace_nan() once for every 1-D slice of the array along the given axis; with axis=0, each slice is a column. replace_nan() replaces the NaN values in the column it receives with that column's mean and returns the column, and the returned columns are assembled into arr_filled. The col_means array is computed separately and is only used for printing.
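A callback that writes into its argument can have side effects on the input array, since the slices handed to it may be views (an implementation detail that should not be relied on either way). A minimal sketch of a side-effect-free variant, with the hypothetical helper name fill_column, copies the slice before modifying it.

```python
import numpy as np

def fill_column(column):
    # Work on a copy so the caller's array is never modified
    filled = column.copy()
    filled[np.isnan(filled)] = np.nanmean(column)
    return filled

arr = np.array([[1.0, np.nan],
                [np.nan, 8.0],
                [5.0, 10.0]])

# axis=0: fill_column is called once per column
arr_filled = np.apply_along_axis(fill_column, 0, arr)

print(arr_filled)
print(arr)
```

Note that numpy.apply_along_axis() runs a Python-level loop over the slices, so for large arrays the vectorized approaches in the other methods are usually the faster choice.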
Method 5: Using numpy.nanmean() and Fancy Indexing
Example
import numpy as np

arr = np.array([[1, 2, np.nan], [4, np.nan, 6], [np.nan, 8, 9]])
col_means = np.nanmean(arr, axis=0)
mask = np.isnan(arr)
arr[mask] = col_means[np.newaxis, :].repeat(arr.shape[0], axis=0)[mask]
print("Column mean: ", col_means)
print("Final array: \n", arr)
Output
Column mean:  [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
Explanation
Here col_means[np.newaxis, :] turns the column means into a single row, and repeat() copies that row once for every row of arr, giving an array of means with the same shape as arr. Indexing both arrays with the same boolean mask (a form of fancy indexing) then replaces each NaN with the mean of its column. The assignment modifies arr in place, but the approach is not free of extra memory: repeat() allocates a temporary array as large as arr.
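The full-size temporary array can be avoided by indexing with integer positions instead. The following sketch is a variation on the technique, not the code from the example above: it looks up the row and column index of every NaN and uses the column index to pick the matching mean. The array and the names nan_rows and nan_cols are chosen only for illustration.

```python
import numpy as np

arr = np.array([[1.0, 2.0, np.nan],
                [4.0, np.nan, 6.0],
                [np.nan, 8.0, 9.0],
                [7.0, 5.0, np.nan]])
col_means = np.nanmean(arr, axis=0)

# Row and column indices of every NaN entry
nan_rows, nan_cols = np.nonzero(np.isnan(arr))

# For each NaN, select the mean of the column it sits in
arr[nan_rows, nan_cols] = col_means[nan_cols]

print(col_means)
print(arr)
```

Here the only temporaries are the boolean mask and index arrays whose length equals the number of NaN values, which is small when missing entries are sparse.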
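Method 6: Using numpy.nanmean() and Broadcasting
Example
import numpy as np

arr = np.array([[1, 2, np.nan], [4, np.nan, 6], [np.nan, 8, 9]])
col_means = np.nanmean(arr, axis=0)
mask = np.isnan(arr)
# Broadcast the means to arr's shape so each NaN gets the mean of its own column
arr[mask] = np.broadcast_to(col_means, arr.shape)[mask]
print("Column mean: ", col_means)
print("Final array: \n", arr)
Output
Column mean:  [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
Explanation
Here the mask variable marks the NaN positions, and numpy.broadcast_to() stretches col_means across the rows of arr without copying it, so indexing the broadcast array with mask yields, for each NaN, the mean of that NaN's column. Assigning col_means to arr[mask] directly would not do this: the values would be matched to the NaN positions in row-major order rather than by column, which gives wrong results in general and raises an error whenever the number of NaN values differs from the number of columns. The assignment modifies arr in place.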
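A pitfall worth checking with masked assignment is that a 1-D array assigned to arr[mask] is matched to the True positions in row-major order, not by column. The sketch below uses an array chosen so that the difference is visible, and compares the direct assignment with numpy.copyto(), which broadcasts the source to the destination's shape and writes only where the mask is True. The helper make_array is hypothetical and exists only to create fresh copies of the data.

```python
import numpy as np

def make_array():
    return np.array([[1.0, np.nan, 3.0],
                     [np.nan, 5.0, 6.0],
                     [7.0, 8.0, np.nan]])

# Direct assignment: means are consumed in row-major order of the NaNs
naive = make_array()
col_means = np.nanmean(naive, axis=0)
naive[np.isnan(naive)] = col_means

# copyto broadcasts col_means over the rows, then applies the mask
correct = make_array()
np.copyto(correct, col_means, where=np.isnan(correct))

# With two NaNs and three columns, direct assignment fails outright
two_nans = np.array([[1.0, np.nan, 3.0],
                     [np.nan, 5.0, 6.0]])
try:
    two_nans[np.isnan(two_nans)] = np.nanmean(two_nans, axis=0)
    raised = False
except ValueError:
    raised = True

print(naive)
print(correct)
print(raised)
```

In the naive result the NaN in the second column receives the first column's mean, while numpy.copyto() places each mean in its own column.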
We have now seen several methods for replacing NaN values with the column average in a NumPy array. All of them produce the same result; they differ mainly in whether they return a new array or modify the original in place, and in how much temporary memory they need. Choose the method that best fits your requirements and is easiest for you to read.