NumPy - Slicing with Boolean Arrays



Slicing with Boolean Arrays in NumPy

Slicing with Boolean arrays in NumPy, also known as Boolean indexing or masking, lets you select elements from an array based on a condition. Instead of listing specific indices or index ranges, we provide a Boolean array (typically of the same shape as the original) in which True marks the elements to be selected and False marks those to be ignored.

This method filters elements by a condition without explicit loops, which makes the code cleaner and more concise because the condition is applied directly to the array. It is also useful for removing invalid values and for modifying the elements that satisfy a condition.
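As an illustration of removing invalid values, here is a minimal sketch. The readings array and its NaN placeholders are hypothetical data made up for this example; np.isnan and the ~ (invert) operator are standard NumPy.

```python
import numpy as np

# Hypothetical measurements; NaN marks a missing (invalid) value.
readings = np.array([2.5, np.nan, 3.1, np.nan, 4.8])

# np.isnan returns a Boolean array; ~ inverts it so True marks valid entries.
valid_mask = ~np.isnan(readings)

# Indexing with the mask drops the NaN entries.
cleaned = readings[valid_mask]
print(cleaned)
```

The mask is built once and can be reused, for example to filter a second array of the same length that holds the matching timestamps.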

Selecting Positive Numbers

In the example below, the condition array > 0 produces the Boolean array [False False True True False True False]. This Boolean array is then used to select the elements for which the condition is True, which gives [1 3 4].

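import numpy as np

array = np.array([-5, -2, 1, 3, -7, 4, -8])
# Comparing the array with 0 yields a Boolean mask of the same shape
positive_mask = array > 0
print("The boolean array is : ", positive_mask)
# Indexing with the mask keeps only the elements where it is True
print("positive numbers :", array[positive_mask])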

Following is the output of the above code −

The boolean array is :  [False False  True  True False  True False]
positive numbers : [1 3 4]
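One detail worth knowing is that selecting with a Boolean array returns a copy rather than a view, so changing the result does not change the original array. The sketch below reuses the same data; the names mask, n_selected and selected are chosen here for illustration only.

```python
import numpy as np

array = np.array([-5, -2, 1, 3, -7, 4, -8])
mask = array > 0

# The number of True values equals the number of selected elements.
n_selected = np.count_nonzero(mask)

# Boolean indexing returns a new array (a copy of the selected elements).
selected = array[mask]
selected[0] = 99  # changes only the copy

print(n_selected)  # 3
print(selected)    # [99  3  4]
print(array)       # [-5 -2  1  3 -7  4 -8], unchanged
```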

Masking Data Based on a Condition

Let us create a 1D array and apply the assignment arr_1D[arr_1D > 30] = 0. The assignment affects only the elements for which the condition arr_1D > 30 is True: every element greater than 30 is replaced with 0, and the array is modified in place.

import numpy as np

arr_1D = np.array([100, 28, 10, 34, 20, 15, 25])
# Assign 0 to every element greater than 30 (modifies arr_1D in place)
arr_1D[arr_1D > 30] = 0
print("modified data :", arr_1D)

Following is the output of the above code −

modified data : [ 0 28 10  0 20 15 25]
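If the original array needs to be kept, one possible alternative to in-place assignment is np.where, which builds a new array from a condition. This is a sketch of that variant, not part of the example above; the name capped is chosen here for illustration.

```python
import numpy as np

arr_1D = np.array([100, 28, 10, 34, 20, 15, 25])

# np.where(condition, x, y) takes x where the condition is True and y elsewhere.
capped = np.where(arr_1D > 30, 0, arr_1D)

print("new array :", capped)
print("original  :", arr_1D)  # left untouched
```

The trade-off is memory: np.where allocates a second array, while the masked assignment reuses the existing one.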

Filtering Data Using Logical Operators

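In the example below, an array holds a company's sales figures for a month, and we need to find the sales that are either between $2,500 and $3,000 (inclusive) or more than $5,000. The conditions are combined with the element-wise operators & (and) and | (or); each comparison is wrapped in parentheses because & and | bind more tightly than the comparison operators. Following is the code −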

import numpy as np

data = np.array([1200, 3400, 3500, 5500, 3400, 2300, 2600, 2900, 4500])
# Keep sales in [2500, 3000] or above 5000; extra parentheses make the grouping explicit
highest_sales = data[((data >= 2500) & (data <= 3000)) | (data > 5000)]
print("Highest sales in the month is :", highest_sales)

Following is the output of the above code −

Highest sales in the month is : [5500 2600 2900]
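The sketch below looks more closely at how the conditions combine. It reuses the sales data and shows the named-function equivalents np.logical_and and np.logical_or, the ~ operator for negation, and what happens when the parentheses around the comparisons are omitted. The variable names in_range, over_5000, combined, same, rest and failed are chosen here for illustration.

```python
import numpy as np

data = np.array([1200, 3400, 3500, 5500, 3400, 2300, 2600, 2900, 4500])

# Build each condition as its own mask, then combine element-wise.
in_range = (data >= 2500) & (data <= 3000)
over_5000 = data > 5000
combined = in_range | over_5000

# The same mask written with NumPy's logical functions.
same = np.logical_or(np.logical_and(data >= 2500, data <= 3000), data > 5000)
print(np.array_equal(combined, same))

# ~ inverts a mask: everything that did NOT match.
rest = data[~combined]
print(rest)

# Without parentheses, & is evaluated before the comparisons, and the
# resulting chained comparison raises ValueError for a multi-element array.
try:
    data[data >= 2500 & data <= 3000]
    failed = False
except ValueError:
    failed = True
print(failed)
```

Naming the intermediate masks tends to read better than one long expression once more than two conditions are involved.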