NumPy - Filtering Arrays



Filtering Arrays in NumPy

Filtering arrays in NumPy allows you to select and work with subsets of data based on specific conditions. This process is useful for extracting relevant data, performing conditional operations, and analyzing subsets of data.

We can perform filtering in NumPy by creating a Boolean array (mask) where each element indicates whether the corresponding element in the original array meets the specified condition. This mask is then used to index the original array, extracting the elements that satisfy the condition.

NumPy provides various ways to filter arrays through Boolean indexing and conditional operations.

Basic Filtering with Boolean Indexing

Boolean indexing allows you to filter array elements based on conditions. By applying a condition to an array, you obtain a Boolean array that you can use to index the original array.

Example

In the following example, we select the elements greater than 10 from the given array −

import numpy as np

# Creating an array
array = np.array([1, 5, 8, 12, 20, 3])

# Define the condition
condition = array > 10

# Apply the condition to filter the array
filtered_array = array[condition]

print("Original Array:", array)
print("Filtered Array (elements > 10):", filtered_array)

Following is the output obtained −

Original Array: [ 1  5  8 12 20  3]
Filtered Array (elements > 10): [12 20]
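The condition does not have to be stored in a separate variable first. The following is a minimal sketch, reusing the same values, that writes the mask inline and also illustrates that Boolean indexing returns a copy of the selected elements rather than a view. The names values and selected are illustrative only.

```python
import numpy as np

values = np.array([1, 5, 8, 12, 20, 3])

# Write the condition inline instead of storing the mask first
selected = values[values > 10]

# Boolean indexing returns a copy, so modifying the result
# leaves the original array unchanged
selected[0] = -1

print("Selected:", selected)
print("Original:", values)
```

Because the result is a copy, use an assignment such as values[values > 10] = 0 when the goal is to modify the original array in place.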

Filtering with Multiple Conditions

Filtering with multiple conditions allows you to select elements from a NumPy array based on a combination of criteria. This is achieved by combining Boolean conditions using the element-wise logical operators as follows −

  • AND (&) − Select elements that satisfy both conditions.
  • OR (|) − Select elements that satisfy at least one of the conditions.
  • NOT (~) − Select elements that do not satisfy the condition.

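The resulting Boolean array, representing the combined conditions, is then used to index the original array, extracting the elements for which the combined condition is True. Each individual condition must be wrapped in parentheses, because the &, | and ~ operators take precedence over comparison operators such as > and <.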
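The OR and NOT operators work the same way as AND. The sketch below is an illustration on the same sample values; the variable names either and not_greater are assumptions made for this example.

```python
import numpy as np

array = np.array([1, 5, 8, 12, 20, 3])

# OR: elements smaller than 5 or larger than 15
either = array[(array < 5) | (array > 15)]

# NOT: elements for which "greater than 5" is False
not_greater = array[~(array > 5)]

print("OR  (< 5 or > 15):", either)
print("NOT (not > 5):", not_greater)
```

Python's own and, or and not keywords cannot be used here, since they try to reduce the whole array to a single truth value and raise a ValueError for arrays with more than one element.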

Example

In this example, we are filtering elements within a range using multiple conditions −

import numpy as np

# Creating an array
array = np.array([1, 5, 8, 12, 20, 3])

# Define multiple conditions
condition = (array > 5) & (array < 15)

# Apply the conditions to filter the array
filtered_array = array[condition]

print("Original Array:", array)
print("Filtered Array (5 < elements < 15):", filtered_array)  

This will produce the following result −

Original Array: [ 1  5  8 12 20  3]
Filtered Array (5 < elements < 15): [ 8 12]

Filtering with Functions

When filtering with functions, you generally define a function that takes array elements as input and returns a Boolean value (True or False) indicating whether each element should be included in the result.

This function is then applied to the array, and the resulting Boolean array is used to index and filter the original data.
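When the test can be written with NumPy operations, the function can accept the whole array and return the mask in one step. The following is a minimal sketch of that idea; is_even is a hypothetical helper written for this illustration, not a NumPy function.

```python
import numpy as np

def is_even(values):
    """Return a Boolean mask that is True where the values are even."""
    return values % 2 == 0

array = np.array([1, 5, 8, 12, 20, 3])

# The function works on the whole array at once and returns a mask
mask = is_even(array)

# Use the mask to filter the array
evens = array[mask]

print("Mask:", mask)
print("Even elements:", evens)
```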

Example: Filtering Using where() Function

In the example below, we are using the where() function to filter elements in NumPy −

import numpy as np

# Creating an array
array = np.array([1, 5, 8, 12, 20, 3])

# Define the condition
condition = array > 10

# Filter elements
filtered_indices = np.where(condition)
filtered_array = array[filtered_indices]

print("Original Array:", array)
print("Filtered Array (elements > 10) using np.where:", filtered_array)

When called with only a condition, the where() function returns a tuple of arrays holding the indices where the condition is "True". These indices are used to extract the filtered elements as shown in the output below −

Original Array: [ 1  5  8 12 20  3]
Filtered Array (elements > 10) using np.where: [12 20]
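To make the return value of where() more concrete, the sketch below prints the indices themselves and also shows the three-argument form, which keeps the shape of the array and substitutes a fallback value where the condition is False. The fallback value 0 and the name replaced are choices made for this illustration.

```python
import numpy as np

array = np.array([1, 5, 8, 12, 20, 3])
condition = array > 10

# With one argument, np.where returns a tuple containing
# one index array per dimension of the input
indices = np.where(condition)
print("Indices:", indices[0])

# With three arguments, elements come from the second argument
# where the condition is True and from the third where it is False
replaced = np.where(condition, array, 0)
print("Replaced:", replaced)
```

The three-argument form is not a filter in the strict sense, since the result has the same length as the input, but it is handy when the positions of the elements must be preserved.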

Example: Filtering with a Custom Function

Let us go through an example where we use a custom function to filter an array based on a specific criterion −

import numpy as np

# Create a NumPy array
array = np.array([10, 15, 20, 25, 30, 35])

# Define a custom function for filtering
def is_prime(num):
   """Return True if num is a prime number, False otherwise."""
   if num <= 1:
      return False
   for i in range(2, int(np.sqrt(num)) + 1):
      if num % i == 0:
         return False
   return True

# Apply the function to each element of the array
mask = np.array([is_prime(x) for x in array])

# Use the mask to filter the array
filtered_array = array[mask]

print("Original Array:", array)
print("Mask (prime numbers):", mask)
print("Filtered Array (prime numbers):", filtered_array)                                

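None of the values in this array is prime, so the mask is entirely False and the filtered array is empty. The output obtained is as shown below −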

Original Array: [10 15 20 25 30 35]
Mask (prime numbers): [False False False False False False]
Filtered Array (prime numbers): []
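Since the array above contains no prime numbers, the filter returns nothing. The sketch below repeats the same approach on an array that does contain primes, so the effect of the mask is visible. The sample values are chosen for illustration, and np.vectorize is shown only as a convenient alternative to the list comprehension; it still calls the function once per element.

```python
import numpy as np

def is_prime(num):
    """Return True if num is a prime number, False otherwise."""
    if num <= 1:
        return False
    for i in range(2, int(np.sqrt(num)) + 1):
        if num % i == 0:
            return False
    return True

# Sample values chosen so that some of them are prime
array = np.array([2, 7, 10, 13, 15, 29])

# Build the mask element by element
mask = np.array([is_prime(x) for x in array], dtype=bool)

# Alternative: wrap the function so it can be called on the array
mask_vectorized = np.vectorize(is_prime)(array)

primes = array[mask]

print("Mask (prime numbers):", mask)
print("Filtered Array (prime numbers):", primes)
```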

Filtering in Multi-dimensional Arrays

In multi-dimensional arrays, filtering can be done using Boolean indexing, similar to one-dimensional arrays. However, you need to ensure that the mask matches the dimension being filtered: a mask with the same shape as the array selects individual elements, while a one-dimensional mask selects whole rows or columns along one axis.

Following are the steps involved for filtering in multi-dimensional arrays −

  • Define Filtering Conditions − Create Boolean conditions that apply to elements in the array. These conditions can be based on values or other criteria.
  • Apply Conditions Across Dimensions − Use these conditions to index and select elements. For multi-dimensional arrays, you may need to handle conditions for specific dimensions or apply the condition across all dimensions.
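The two ways of applying a condition can be sketched as follows. This is an illustration under assumed thresholds (25 and 15), using np.all to reduce an element-wise condition to one Boolean value per row.

```python
import numpy as np

array = np.array([[10, 20, 30],
                  [15, 25, 35],
                  [20, 30, 40]])

# Condition across all dimensions: the mask has the same shape
# as the array, and the selected elements come back as a 1D array
large_values = array[array > 25]

# Condition for a specific dimension: keep only the rows
# in which every element is at least 15
row_mask = np.all(array >= 15, axis=1)
rows_all = array[row_mask]

print("Elements > 25:", large_values)
print("Rows with all elements >= 15:\n", rows_all)
```

Replacing np.all with np.any would keep a row as soon as at least one of its elements meets the condition.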

Example

Consider a 2D array where we want to select rows based on a condition applied to elements in a specific column −

import numpy as np

# Create a 2D NumPy array
array = np.array([[10, 20, 30],
                  [15, 25, 35],
                  [20, 30, 40]])

# Define a condition for filtering
# Select rows where the value in the second column is greater than 25
condition = array[:, 1] > 25  

# Use the condition to filter the array
filtered_array = array[condition]

print("Original Array:\n", array)
print("Condition (values in second column > 25):", condition)
print("Filtered Array:\n", filtered_array)                               

After executing the above code, we get the following output −

Original Array:
 [[10 20 30]
 [15 25 35]
 [20 30 40]]
Condition (values in second column > 25): [False False  True]
Filtered Array:
 [[20 30 40]]
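Columns can be filtered in the same manner by placing the mask in the second index position. The following is a minimal sketch, assuming we want the columns whose value in the first row is greater than 10; the names column_mask and filtered_columns are illustrative.

```python
import numpy as np

array = np.array([[10, 20, 30],
                  [15, 25, 35],
                  [20, 30, 40]])

# Build a mask from the first row: one Boolean value per column
column_mask = array[0] > 10

# Keep all rows (:) and only the columns where the mask is True
filtered_columns = array[:, column_mask]

print("Column mask:", column_mask)
print("Filtered columns:\n", filtered_columns)
```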