NumPy - Vectorized Operations with Datetimes




Vectorized operations in NumPy allow you to perform operations on entire arrays of data without the need for explicit loops.

With datetime data, this means a time-based calculation is applied to every value in a datetime array at once, with no manual iteration over individual elements.

Using the datetime64 type, you can perform various arithmetic and comparison operations across datetime arrays, such as adding or subtracting time intervals, comparing dates, or performing conditional operations.
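
As a rough illustration of the difference, the sketch below shifts the same dates by one week twice, once with an explicit loop and once with a single vectorized expression. The sample dates and the names one_week, looped, and vectorized are illustrative choices, not part of the original tutorial:

```python
import numpy as np

# Illustrative sample data (assumed for this sketch)
dates = np.array(['2024-01-01', '2024-02-15', '2024-03-31'], dtype='datetime64[D]')
one_week = np.timedelta64(7, 'D')

# Explicit loop: shift each element one at a time
looped = np.array([d + one_week for d in dates])

# Vectorized: shift the whole array in a single expression
vectorized = dates + one_week

print(vectorized)                          # ['2024-01-08' '2024-02-22' '2024-04-07']
print(np.array_equal(looped, vectorized))  # True
```

Both approaches give the same result; the vectorized form is shorter and avoids the per-element work in Python.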

Adding or Subtracting Time Intervals

One of the most common operations with datetime data is adding or subtracting time intervals. NumPy allows you to perform these operations in a vectorized manner, meaning you can add or subtract time deltas from an entire array of datetime values at once.

To add or subtract a time interval, you use the timedelta64 object, which represents a time difference. This object can be added to or subtracted from a datetime64 object to shift the date or time by the specified interval.

Example

In this example, we are adding 5 days to each date in a datetime array −

import numpy as np

# Define a datetime array
dates = np.array(['2024-01-01', '2024-01-02', '2024-01-03'], dtype='datetime64[D]')

# Define a time delta of 5 days
time_delta = np.timedelta64(5, 'D')

# Add the time delta to the datetime array
new_dates = dates + time_delta

print(new_dates)

Following is the output obtained −

['2024-01-06' '2024-01-07' '2024-01-08']
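
Subtraction works the same way, and the interval does not have to be a single value. The following sketch subtracts one interval from every date and then adds a different offset to each date; the name offsets and its values are assumed for illustration:

```python
import numpy as np

dates = np.array(['2024-01-01', '2024-01-02', '2024-01-03'], dtype='datetime64[D]')

# Subtract the same interval (7 days) from every date
earlier = dates - np.timedelta64(7, 'D')

# Add a different number of days to each date (element-wise);
# the offsets array is a hypothetical example
offsets = np.array([1, 10, 30], dtype='timedelta64[D]')
shifted = dates + offsets

print(earlier)  # ['2023-12-25' '2023-12-26' '2023-12-27']
print(shifted)  # ['2024-01-02' '2024-01-12' '2024-02-02']
```

When both operands are arrays, they need to have the same shape or be broadcastable to a common shape.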

Subtracting Dates and Calculating Differences

Another common operation is calculating the difference between two dates, which results in a timedelta64 object. This is useful when you need to find the time difference between two points in time, such as the number of days between two dates.

In NumPy, you can subtract one datetime array from another to get an array of timedeltas, representing the difference between corresponding dates in the arrays.

Example

In this example, we calculate the element-wise difference between two datetime arrays −

import numpy as np

# Define two datetime arrays
dates1 = np.array(['2024-01-01', '2024-01-02', '2024-01-03'], dtype='datetime64[D]')
dates2 = np.array(['2024-01-04', '2024-01-05', '2024-01-06'], dtype='datetime64[D]')

# Subtract the arrays to get the difference
time_diff = dates2 - dates1

print(time_diff)

The output will show the differences in days −

[3 3 3]
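
The resulting timedelta64 values often need to be turned into plain numbers for further computation. One possible approach, sketched below with made-up start and end dates, is to cast to an integer type or to divide by a unit-sized timedelta:

```python
import numpy as np

# Illustrative date ranges (assumed for this sketch)
start = np.array(['2024-01-01', '2024-02-01'], dtype='datetime64[D]')
end = np.array(['2024-01-31', '2024-03-01'], dtype='datetime64[D]')

elapsed = end - start  # dtype is timedelta64[D]

# Cast to integers to get plain day counts
days = elapsed.astype('int64')

# Divide by a one-week timedelta to express the durations in weeks (floats)
weeks = elapsed / np.timedelta64(1, 'W')

print(days)   # [30 29]
print(weeks)
```

The second difference is 29 days because 2024 is a leap year.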

Comparing Dates in a Vectorized Manner

NumPy allows you to perform element-wise comparisons between datetime arrays, enabling you to filter or analyze data based on time conditions. Vectorized comparison operations can be used to compare datetime values to a fixed point in time or to each other.

You can compare datetime arrays using standard comparison operators, such as >, <, >=, <=, ==, and !=, which return a boolean array indicating whether the condition is met for each element.

Example

In this example, we keep only the dates that are later than a specific date, using a vectorized comparison as a boolean mask −

import numpy as np

# Define a datetime array
dates = np.array(['2024-01-01', '2024-01-02', '2024-01-03'], dtype='datetime64[D]')

# Keep only the dates later than '2024-01-02' (boolean-mask indexing)
filtered_dates = dates[dates > np.datetime64('2024-01-02')]

print(filtered_dates)

This will produce the following result −

['2024-01-03']
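
The comparison on its own returns the boolean array, and several conditions can be combined to select a date range. The sketch below assumes a week of consecutive dates built with np.arange; the names mask and in_range are illustrative:

```python
import numpy as np

# Seven consecutive days: 2024-01-01 through 2024-01-07
dates = np.arange('2024-01-01', '2024-01-08', dtype='datetime64[D]')

# A comparison yields a boolean mask, one entry per date
mask = dates >= np.datetime64('2024-01-05')

# Combine two masks with & to select an inclusive date range
in_range = (dates >= np.datetime64('2024-01-03')) & (dates <= np.datetime64('2024-01-05'))

print(mask)             # [False False False False  True  True  True]
print(dates[in_range])  # ['2024-01-03' '2024-01-04' '2024-01-05']
```

The parentheses around each comparison are required, because & binds more tightly than the comparison operators.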

Vectorized Operations with Timedelta Arrays

In addition to working with datetime arrays, you can also perform vectorized operations with timedelta64 arrays, which represent differences between datetime values. These operations are useful when working with durations or intervals of time.

You can perform arithmetic operations, such as addition or subtraction, on timedelta arrays to calculate the combined duration of several time intervals, or you can compare them with other time intervals.

Example

In this example, we add two timedelta arrays to get the total duration −

import numpy as np

# Define two timedelta arrays
delta1 = np.array([5, 10], dtype='timedelta64[D]')
delta2 = np.array([2, 3], dtype='timedelta64[D]')

# Add the timedelta arrays
total_delta = delta1 + delta2

print(total_delta)

Following is the output of the above code −

[ 7 13]
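
Timedelta arrays also support comparison, summation, and scaling by an integer. A minimal sketch, using an assumed durations array and a threshold chosen only for illustration:

```python
import numpy as np

# Hypothetical durations, in days
durations = np.array([5, 10, 2, 3], dtype='timedelta64[D]')

# Sum all intervals into a single timedelta64 value
total = durations.sum()

# Element-wise comparison against a threshold interval
longer_than_4_days = durations > np.timedelta64(4, 'D')

# Multiplying by an integer scales each interval
doubled = durations * 2

print(total)               # 20 days
print(longer_than_4_days)  # [ True  True False False]
print(doubled)             # [10 20  4  6]
```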

Working with Different Time Units

NumPy supports a variety of time units, including years (Y), months (M), days (D), hours (h), minutes (m), and seconds (s). When an operation mixes compatible units, such as days and hours, NumPy converts both operands to the finer unit, so the result keeps the higher precision.

This is particularly useful when dealing with data that spans multiple time scales or when you need to convert between different units.

Example

In this example, we add a timedelta array measured in hours to a datetime array measured in days; the result is promoted to hour precision −

import numpy as np

# Define a datetime array
dates = np.array(['2024-01-01', '2024-01-02', '2024-01-03'], dtype='datetime64[D]')

# Define a timedelta array in hours
# (named hour_offsets to avoid confusion with datetime.timedelta)
hour_offsets = np.array([10, 5, 20], dtype='timedelta64[h]')

# Add the timedelta array to the datetime array; the result has dtype datetime64[h]
new_dates = dates + hour_offsets

print(new_dates)

After executing the above code, we get the following output −

['2024-01-01T10' '2024-01-02T05' '2024-01-03T20']
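
Converting between units can be done with astype. The sketch below, using assumed minute-precision timestamps, shows that casting to a coarser unit truncates the values, while mixing units in arithmetic yields the finer unit:

```python
import numpy as np

# Illustrative timestamps with minute precision (assumed for this sketch)
stamps = np.array(['2024-01-15T10:30', '2024-02-20T23:59'], dtype='datetime64[m]')

# Casting to a coarser unit truncates the finer components
as_days = stamps.astype('datetime64[D]')
as_months = stamps.astype('datetime64[M]')

# Days plus minutes: the result is promoted to minute precision
mixed = as_days + np.timedelta64(90, 'm')

print(as_days)      # ['2024-01-15' '2024-02-20']
print(as_months)    # ['2024-01' '2024-02']
print(mixed.dtype)  # datetime64[m]
```

Truncation discards information, so casting back to a finer unit does not recover the original time of day.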