NumPy - Structured Arrays



Structured Arrays in NumPy

A structured array in NumPy is an array whose elements all share a compound data type. This compound data type consists of multiple named fields, each with its own data type, so every element behaves like a row in a table or a record.

For example, you can have an array where each element holds both a name (as a string) and an age (as an integer). This helps you to work with complex data more flexibly, as you can access and manipulate each field separately.

Creating Structured Arrays

The first step in creating a structured array is defining the data type (dtype) that specifies the structure of each element. The dtype is defined as a list of tuples or a dictionary, where each tuple or dictionary entry defines a field name and its data type.

Following are some commonly used type codes for the fields of a structured array −

  • 'U10': Unicode string of up to 10 characters
  • 'i4': 4-byte integer
  • 'f8': 8-byte floating point number
  • '?': Boolean value (note that 'b' is a signed 1-byte integer, not a Boolean)
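As a quick check of what these type codes mean, the sketch below asks NumPy to describe each one. It uses only the standard np.dtype attributes name, kind, and itemsize, and it includes '?' for Boolean and 'b' for a signed byte side by side, since the two are easy to confuse −

```python
import numpy as np

# Inspect a few type codes: 'kind' is a one-letter category,
# 'itemsize' is the number of bytes used per value
for code in ['U10', 'i4', 'f8', '?', 'b']:
    dt = np.dtype(code)
    print(code, '->', dt.name, '| kind:', dt.kind, '| itemsize:', dt.itemsize)
```

A 'U10' field takes 40 bytes because NumPy stores each Unicode character in 4 bytes.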

Using a List of Tuples

You can define the dtype and create the structured array using a list of tuples, where each tuple represents a field. Each tuple contains two elements: the first element is the name of the field, and the second element is the data type of that field.

Example

In the following example, we are defining a structured array with fields for "name", "age", and "height" using a specified dtype. We then create this array with corresponding data −

import numpy as np

# Define the dtype
dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]

# Define the data
data = [('Alice', 30, 5.6), ('Bob', 25, 5.8), ('Charlie', 35, 5.9)]

# Create the structured array
structured_array = np.array(data, dtype=dtype)

print("Structured Array:\n", structured_array)

Following is the output obtained −

Structured Array:
[('Alice', 30, 5.6) ('Bob', 25, 5.8) ('Charlie', 35, 5.9)]

Using a Dictionary

Alternatively, you can define the dtype using a dictionary. In its most common form, the dictionary has two keys: 'names', a list of field names, and 'formats', a list of the corresponding data types in the same order.

Example

In this example, we are defining the dtype for a structured array using a dictionary that specifies the fields "name", "age", and "height". We then create and display the structured array with the corresponding data −

import numpy as np

# Define the dtype using a dictionary
dtype = np.dtype({'names': ['name', 'age', 'height'], 'formats': ['U10', 'i4', 'f4']})

# Define the data
data = [('Alice', 30, 5.6), ('Bob', 25, 5.8), ('Charlie', 35, 5.9)]

# Create the structured array
structured_array = np.array(data, dtype=dtype)

print("Structured Array from Dictionary:\n", structured_array)

This will produce the following result −

Structured Array from Dictionary:
[('Alice', 30, 5.6) ('Bob', 25, 5.8) ('Charlie', 35, 5.9)]
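The list-of-tuples form and the dictionary form describe the same layout. As a small sketch, the code below builds the dtype both ways and compares the results; the variable names dtype_from_list and dtype_from_dict are chosen only for this illustration −

```python
import numpy as np

# The same record layout, written in two ways
dtype_from_list = np.dtype([('name', 'U10'), ('age', 'i4'), ('height', 'f4')])
dtype_from_dict = np.dtype({'names': ['name', 'age', 'height'],
                            'formats': ['U10', 'i4', 'f4']})

# Both describe identical fields, so they compare equal
print("Same dtype:", dtype_from_list == dtype_from_dict)

# Field names and total bytes per record
print("Fields:", dtype_from_dict.names)
print("Bytes per record:", dtype_from_dict.itemsize)
```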

Accessing Fields in Structured Arrays

You can access individual fields in a structured array using field names. This is done by indexing the array with the field name as a string.

Example: Accessing Individual Fields

In the example below, we are defining a structured array with fields for 'name', 'age', and 'height', and then accessing each of these fields separately −

import numpy as np

# Define a dtype and data for a structured array
dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = [('Alice', 30, 5.6), ('Bob', 25, 5.8), ('Charlie', 35, 5.9)]
structured_array = np.array(data, dtype=dtype)

# Access the 'name' field
names = structured_array['name']
print("Names:", names)

# Access the 'age' field
ages = structured_array['age']
print("Ages:", ages)

# Access the 'height' field
heights = structured_array['height']
print("Heights:", heights)

Following is the output of the above code −

Names: ['Alice' 'Bob' 'Charlie']
Ages: [30 25 35]
Heights: [5.6 5.8 5.9]
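Two further points about field access may be worth illustrating. Indexing with a list of field names selects several fields at once, and indexing with a single field name returns a view, so writing through it changes the original array. The array name people is used only for this sketch −

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = [('Alice', 30, 5.6), ('Bob', 25, 5.8), ('Charlie', 35, 5.9)]
people = np.array(data, dtype=dtype)

# Select two fields at once by passing a list of field names
subset = people[['name', 'age']]
print("Fields in subset:", subset.dtype.names)

# A single-field selection is a view onto the original data
ages = people['age']
ages[0] = 31
print("Age stored in the original array:", people['age'][0])
```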

Example: Accessing Rows

You can access specific rows of the structured array using indexing. This allows you to retrieve complete records. Here, we retrieve the first and second rows of the structured array −

import numpy as np

# Define a dtype and data for a structured array
dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = [('Alice', 30, 5.6), ('Bob', 25, 5.8), ('Charlie', 35, 5.9)]
structured_array = np.array(data, dtype=dtype)

# Access the first row
first_row = structured_array[0]
print("First Row:", first_row)

# Access the second row
second_row = structured_array[1]
print("Second Row:", second_row)

Following is the output of the above code −

First Row: ('Alice', 30, 5.6)
Second Row: ('Bob', 25, 5.8)
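Row indexing follows the usual NumPy rules, so slices and negative indices work as well, and a field can be read from a single record. The following is a minimal sketch of these cases, with people as an illustrative array name −

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = [('Alice', 30, 5.6), ('Bob', 25, 5.8), ('Charlie', 35, 5.9)]
people = np.array(data, dtype=dtype)

# Slice the first two records; the result is still a structured array
first_two = people[:2]
print("First two names:", first_two['name'])

# Read one field from the last record
last_name = people[-1]['name']
print("Last name:", last_name)
```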

Modifying Fields of Structured Arrays

You can modify the values of individual fields in the structured array by indexing and assigning new values to them.

NumPy does not support adding fields to an existing structured array in place.

To add a field, define a new dtype that includes the additional field, create a new array with that dtype, and fill it with the existing data plus the values for the new field.

Example: Updating Fields

In the example below, we are updating the 'age' field of the first record in a structured array by directly assigning a new value −

import numpy as np

# Define a dtype and data for a structured array
dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = [('Alice', 30, 5.6), ('Bob', 25, 5.8), ('Charlie', 35, 5.9)]
structured_array = np.array(data, dtype=dtype)

# Update the 'age' of the first record
structured_array[0]['age'] = 31
print("Updated Structured Array:\n", structured_array)

The output obtained is as shown below −

Updated Structured Array:
[('Alice', 31, 5.6) ('Bob', 25, 5.8) ('Charlie', 35, 5.9)]
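Updates are not limited to one record at a time. As a sketch, the code below changes a whole field with one arithmetic operation and then updates a field only for the records that match a condition; the array name people and the new values are illustrative −

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = [('Alice', 30, 5.6), ('Bob', 25, 5.8), ('Charlie', 35, 5.9)]
people = np.array(data, dtype=dtype)

# Increase every age by one in a single operation
people['age'] += 1

# Set the height only for the record whose name is 'Bob'
people['height'][people['name'] == 'Bob'] = 6.0

print("Ages:", people['age'])
print("Heights:", people['height'])
```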

Example: Adding New Fields

Here, we are extending a structured array with a new field, 'weight', by defining a new dtype that includes this field and creating a new array from data that contains it −

import numpy as np

# Define a dtype and data for the original structured array
dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = [('Alice', 30, 5.6), ('Bob', 25, 5.8), ('Charlie', 35, 5.9)]
structured_array = np.array(data, dtype=dtype)

# Define a new dtype with an additional field 'weight'
new_dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4'), ('weight', 'f4')]

# Define new data including the additional field
new_data = [('Alice', 30, 5.6, 55.0), ('Bob', 25, 5.8, 70.0), ('Charlie', 35, 5.9, 80.0)]

# Create a new structured array with the additional field
new_structured_array = np.array(new_data, dtype=new_dtype)
print("New Structured Array with Additional Field:\n", new_structured_array)

After executing the above code, we get the following output −

New Structured Array with Additional Field:
 [('Alice', 30, 5.6, 55.) ('Bob', 25, 5.8, 70.) ('Charlie', 35, 5.9, 80.)]
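When the original array already exists, retyping all of its data is not necessary. One possible approach, sketched below, is to allocate an array with the wider dtype and copy the existing fields across one by one. The names old and new, and the weight values, are assumptions made for this illustration −

```python
import numpy as np

old = np.array([('Alice', 30, 5.6), ('Bob', 25, 5.8), ('Charlie', 35, 5.9)],
               dtype=[('name', 'U10'), ('age', 'i4'), ('height', 'f4')])

# Build the wider dtype from the existing field descriptions plus the new field
new_dtype = np.dtype(old.dtype.descr + [('weight', 'f4')])

# Allocate the new array and copy each existing field by name
new = np.zeros(old.shape, dtype=new_dtype)
for field in old.dtype.names:
    new[field] = old[field]

# Fill in the new field
new['weight'] = [55.0, 70.0, 80.0]

print(new.dtype.names)
print(new)
```

The original array is left unchanged, because the new array holds its own copy of the data.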

Sorting Structured Arrays

Sorting structured arrays in NumPy means arranging the elements of an array based on the values of one or more fields.

Since structured arrays have multiple fields, sorting can be based on the values in these fields. For example, you might sort an array of people by their age or height.

Example

In the following example, we are sorting a structured array based on the 'age' field by first obtaining the indices that would arrange the ages in ascending order. We then use these indices to reorder the entire array −

import numpy as np

# Define a structured array
dtype = [('name', 'U10'), ('age', 'i4')]
data = [('Alice', 30), ('Bob', 25), ('Charlie', 35)]
structured_array = np.array(data, dtype=dtype)

# Sort the array by 'age'
sorted_indices = np.argsort(structured_array['age'])
sorted_array = structured_array[sorted_indices]
print("Sorted by Age:\n", sorted_array)

The result produced is as follows −

Sorted by Age:
[('Bob', 25) ('Alice', 30) ('Charlie', 35)]
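np.sort() also accepts an order argument that names the fields to sort by, which is convenient when ties in one field should be broken by another. The sketch below uses sample data with two equal ages to show this, and also reverses an argsort result to sort in descending order −

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4')]
data = [('Charlie', 30), ('Bob', 25), ('Alice', 30)]
people = np.array(data, dtype=dtype)

# Sort by 'age' first, then by 'name' where ages are equal
by_age_then_name = np.sort(people, order=['age', 'name'])
print("Sorted by age, then name:\n", by_age_then_name)

# Sort by 'age' in descending order by reversing the sorted indices
by_age_desc = people[np.argsort(people['age'])[::-1]]
print("Sorted by age, descending:\n", by_age_desc)
```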

Filtering Structured Arrays

Filtering structured arrays involves applying conditions to one or more fields and retrieving elements that satisfy these conditions.

This is useful when you want to retrieve records that meet certain criteria, such as extracting all entries where a specific field exceeds a threshold or matches a certain value.

Example

In this example, we are filtering a structured array to include only the records where the 'age' field is greater than 30 −

import numpy as np

# Define a structured array
dtype = [('name', 'U10'), ('age', 'i4')]
data = [('Alice', 30), ('Bob', 25), ('Charlie', 35)]
structured_array = np.array(data, dtype=dtype)

# Filter array for ages greater than 30
filtered_array = structured_array[structured_array['age'] > 30]
print("Filtered Array (Age > 30):\n", filtered_array)

We get the output as shown below −

Filtered Array (Age > 30):
[('Charlie', 35)]
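Conditions on several fields can be combined with the element-wise operators & (and) and | (or). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. The following sketch uses illustrative thresholds −

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f4')]
data = [('Alice', 30, 5.6), ('Bob', 25, 5.8), ('Charlie', 35, 5.9)]
people = np.array(data, dtype=dtype)

# Records where age is at most 30 AND height is above 5.7
young_and_tall = people[(people['age'] <= 30) & (people['height'] > 5.7)]
print("Age <= 30 and height > 5.7:", young_and_tall['name'])

# Records where the name is 'Alice' OR age is above 30
alice_or_older = people[(people['name'] == 'Alice') | (people['age'] > 30)]
print("Alice or age > 30:", alice_or_older['name'])
```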

Combining Structured Arrays

Combining structured arrays means joining arrays that share the same dtype, with the same named fields, into a single array. In NumPy, this can be done using the np.concatenate() function.

Example

In the example below, we are combining two structured arrays with the same dtype into a single array using np.concatenate() function −

import numpy as np

# Define two structured arrays
dtype = [('name', 'U10'), ('age', 'i4')]
data1 = [('Alice', 30), ('Bob', 25)]
data2 = [('Charlie', 35), ('Dave', 40)]
structured_array1 = np.array(data1, dtype=dtype)
structured_array2 = np.array(data2, dtype=dtype)

# Combine the arrays
combined_array = np.concatenate((structured_array1, structured_array2))
print("Combined Structured Array:\n", combined_array)

This results in a new structured array that includes all the records from both original arrays as shown below −

Combined Structured Array:
[('Alice', 30) ('Bob', 25) ('Charlie', 35) ('Dave', 40)]
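The combined array keeps the dtype of its inputs and holds its own copy of the data. As a brief sketch, the code below checks both properties by modifying the combined array and confirming that the first input is unchanged; the short names a, b, and combined are used only here −

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4')]
a = np.array([('Alice', 30), ('Bob', 25)], dtype=dtype)
b = np.array([('Charlie', 35), ('Dave', 40)], dtype=dtype)

combined = np.concatenate((a, b))

# The result has the same dtype and one record per input record
print("Same dtype:", combined.dtype == a.dtype)
print("Number of records:", len(combined))

# Changing the combined array does not affect the inputs
combined['age'][0] = 99
print("Age in original array:", a['age'][0])
```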