
NumPy - Handling Missing Data
Handling Missing Data in Arrays
Handling missing data is a common challenge in data analysis and processing. Missing data in arrays can arise due to various reasons, such as incomplete data collection, errors during data entry, or intentional omission.
In NumPy-based data analysis, dealing with missing values means identifying them and then removing, replacing, or interpolating them so that data integrity is preserved and results stay accurate.
Identifying Missing Data
To handle missing data, the very first step is to identify it. In NumPy, missing values are often represented as np.nan in floating-point arrays. You can use specific functions such as np.isnan() to detect these missing values.
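Because np.nan is a floating-point value, it can only be stored in arrays with a floating-point dtype. The short sketch below (the array names are illustrative) shows what happens when you try to put np.nan into an integer array, and how converting with astype(float) first avoids the error:

```python
import numpy as np

ints = np.array([1, 2, 3, 4])   # integer dtype, which has no way to represent NaN

raised = False
try:
    ints[2] = np.nan            # NumPy refuses to cast NaN to an integer
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Converting to a floating-point dtype first makes room for NaN markers
floats = ints.astype(float)
floats[2] = np.nan
print(floats.dtype, floats)
```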
Example
In the following example, we create an array with missing values represented by np.nan. We then use the np.isnan() function to create a mask that identifies these missing values −
import numpy as np

# Creating an array with missing values
arr = np.array([1, 2, np.nan, 4, np.nan, 6])

# Checking for missing values
is_nan = np.isnan(arr)

print("Array with Missing Values:\n", arr)
print("Missing Value Mask:\n", is_nan)
Following is the output obtained −
Array with Missing Values:
 [ 1.  2. nan  4. nan  6.]
Missing Value Mask:
 [False False  True False  True False]
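A common mistake is to look for missing values with ==. NaN never compares equal to anything, including itself, which is why np.isnan() is needed. The following sketch, using illustrative arrays, contrasts the two approaches and shows how the boolean mask can be summed or reduced along an axis to count and locate missing values:

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, np.nan, 6])

# NaN is never equal to anything (not even NaN), so this finds nothing
eq_test = arr == np.nan
print(eq_test)                      # all False

# np.isnan() is the reliable test; summing the boolean mask counts the NaNs
mask = np.isnan(arr)
print(mask.sum())                   # 2
print(np.flatnonzero(mask))         # positions of the NaNs: [2 4]

# For a 2-D array, reduce the mask along an axis
data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, np.nan],
                 [7.0, 8.0, 9.0]])
per_column = np.isnan(data).sum(axis=0)     # NaN count in each column
rows_with_nan = np.isnan(data).any(axis=1)  # True for rows containing a NaN
print(per_column)
print(rows_with_nan)
```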
Removing Missing Data
Removing missing data involves eliminating parts of your dataset where data is missing.
In NumPy, you can use boolean indexing to exclude NaN values from arrays. For example, you can create a mask that identifies the missing values and then use its inverse to keep only the valid entries.
Example
In this example, we start with an array that contains missing values represented by "np.nan". We then remove them with boolean indexing: np.isnan() marks the np.nan entries, and the ~ operator inverts that mask so only the valid entries are kept −
import numpy as np

# Creating an array with missing values
arr = np.array([1, 2, np.nan, 4, np.nan, 6])

# Removing missing values
cleaned_arr = arr[~np.isnan(arr)]

print("Original Array:\n", arr)
print("Array with Missing Values Removed:\n", cleaned_arr)
This will produce the following result −
Original Array:
 [ 1.  2. nan  4. nan  6.]
Array with Missing Values Removed:
 [1. 2. 4. 6.]
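Applying an element-wise mask to a 2-D array returns a flattened 1-D result, which is often not what you want for tabular data. A minimal sketch, assuming a small illustrative 2-D array, that instead drops whole rows or columns containing NaN by reducing the mask with any():

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0],
                 [7.0, 8.0, 9.0]])

# An element-wise mask flattens the result and loses the row/column structure
flat = data[~np.isnan(data)]
print(flat)                      # [1. 2. 3. 4. 6. 7. 8. 9.]

# Keep the 2-D shape by dropping every row that contains a NaN
row_has_nan = np.isnan(data).any(axis=1)
clean_rows = data[~row_has_nan]
print(clean_rows)

# Or drop every column that contains a NaN
col_has_nan = np.isnan(data).any(axis=0)
clean_cols = data[:, ~col_has_nan]
print(clean_cols)
```

Dropping rows is usually the natural choice when each row is one record; dropping columns discards an entire feature, so it tends to be reserved for columns that are mostly empty.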
Replacing Missing Data
Replacing missing data means filling in the gaps where data is missing with a substitute value. In NumPy, you can use the np.nan_to_num() function to replace NaN values with a specific number, such as zero or the mean of the other values. Following is the syntax −
numpy.nan_to_num(x, copy=True, nan=0.0, posinf=None, neginf=None)
Where,
- x: The input array containing NaN values, infinities, or other numerical values.
- copy: A boolean indicating whether to make a copy of the array (True by default). If False, the operation may be performed in place.
- nan: The value to replace NaN values with. The default is 0.0.
- posinf: The value to replace positive infinity (inf) with. If not specified, it defaults to the largest finite value representable by the array's floating-point dtype.
- neginf: The value to replace negative infinity (-inf) with. If not specified, it defaults to the most negative finite value representable by the array's floating-point dtype.
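The posinf, neginf and copy parameters are easiest to understand by trying them on an array that contains both NaN and infinities. The replacement values -1.0, 999.0 and -999.0 below are arbitrary placeholders chosen for illustration:

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf])

# Defaults: NaN -> 0.0, +inf -> largest finite float64, -inf -> most negative finite float64
defaults = np.nan_to_num(x)
print(defaults)

# Explicit replacement values for each kind of non-finite entry
custom = np.nan_to_num(x, nan=-1.0, posinf=999.0, neginf=-999.0)
print(custom)

# copy=False writes the replacements into y itself instead of returning a new array
# (the infinities in y are still replaced using the default large values)
y = x.copy()
np.nan_to_num(y, copy=False)
print(y)
```

With the defaults, the infinities become values of roughly ±1.8e308, which can quietly distort later sums or means, so passing explicit posinf and neginf values is often the safer choice.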
Example
In the example below, we create an array that contains missing values represented by "np.nan". We then replace these missing values with zero using the np.nan_to_num() function, which fills np.nan entries with the specified value −
import numpy as np

# Creating an array with missing values
arr = np.array([1, 2, np.nan, 4, np.nan, 6])

# Replacing missing values with zero
filled_arr = np.nan_to_num(arr, nan=0)

print("Original Array:\n", arr)
print("Array with Missing Values Replaced:\n", filled_arr)
Following is the output of the above code −
Original Array:
 [ 1.  2. nan  4. nan  6.]
Array with Missing Values Replaced:
 [1. 2. 0. 4. 0. 6.]
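Because the nan parameter accepts any scalar, the same function can fill the gaps with the mean of the observed values instead of zero. A sketch, assuming mean replacement is appropriate for your data, that computes the mean with np.nanmean() (which skips NaN entries) and shows np.where() as an equivalent alternative:

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, np.nan, 6])

# np.nanmean() ignores NaN entries: (1 + 2 + 4 + 6) / 4 = 3.25
mean_value = np.nanmean(arr)

# Use that mean as the replacement for every NaN
filled_with_mean = np.nan_to_num(arr, nan=mean_value)
print(filled_with_mean)

# np.where() gives the same result and makes the condition explicit
filled_where = np.where(np.isnan(arr), mean_value, arr)
print(filled_where)
```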
Interpolating Missing Data
Interpolating missing data involves estimating and filling in missing values within a dataset based on the surrounding data.
Instead of replacing missing values with a constant like the mean, interpolation predicts what the missing value should be by analyzing the trend or pattern in the data.
For example, if a value is missing between "4" and "8", interpolation might estimate it as "6".
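The 4-and-8 example can be checked with NumPy's own np.interp(), which performs one-dimensional linear interpolation. Here the two known values are assumed to sit at positions 0 and 2, with the gap at position 1:

```python
import numpy as np

# Known neighbours: 4 at position 0 and 8 at position 2 (positions assumed for illustration).
# Linear interpolation at position 1: 4 + (1 - 0) * (8 - 4) / (2 - 0) = 6
estimate = np.interp(1, [0, 2], [4.0, 8.0])
print(estimate)  # 6.0
```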
Example
In the following example, we fill the missing values (np.nan) by linear interpolation using interp1d from SciPy. interp1d builds an interpolating function from the indices and values of the non-missing entries; calling it on every index fills the gaps, and fill_value='extrapolate' also allows estimates outside the range of known indices, resulting in a complete array −
import numpy as np
from scipy.interpolate import interp1d

# Creating an array with missing values
arr = np.array([1, 2, np.nan, 4, np.nan, 6])

# Creating an index array
indices = np.arange(len(arr))

# Creating a mask for non-missing values
mask = ~np.isnan(arr)

# Performing linear interpolation
interp_func = interp1d(indices[mask], arr[mask], kind='linear', fill_value='extrapolate')
filled_arr = interp_func(indices)

print("Original Array:\n", arr)
print("Array with Interpolated Missing Values:\n", filled_arr)
The output obtained is as shown below −
Original Array:
 [ 1.  2. nan  4. nan  6.]
Array with Interpolated Missing Values:
 [1. 2. 3. 4. 5. 6.]
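If SciPy is not available, a similar result can be obtained with NumPy alone using np.interp(). Treat this as a sketch of an alternative rather than a drop-in replacement: np.interp() holds the first and last known values constant outside the known range instead of extrapolating, so it differs from fill_value='extrapolate' when the array starts or ends with NaN. The second array below is an illustrative case of that edge behaviour:

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, np.nan, 6])
indices = np.arange(len(arr))
mask = ~np.isnan(arr)

# Interpolate only at the NaN positions, using the known (index, value) pairs
filled_arr = arr.copy()
filled_arr[~mask] = np.interp(indices[~mask], indices[mask], arr[mask])
print(filled_arr)  # [1. 2. 3. 4. 5. 6.]

# Edge behaviour: leading/trailing NaNs take the nearest known value (no extrapolation)
edge = np.array([np.nan, 2.0, 3.0, np.nan])
edge_idx = np.arange(len(edge))
edge_mask = ~np.isnan(edge)
edge_filled = edge.copy()
edge_filled[~edge_mask] = np.interp(edge_idx[~edge_mask], edge_idx[edge_mask], edge[edge_mask])
print(edge_filled)  # [2. 2. 3. 3.]
```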