
Replace np.nanstd with Faster rolling_nanstd #273


Closed
seanlaw opened this issue Oct 26, 2020 · 16 comments

seanlaw commented Oct 26, 2020

Computing a rolling standard deviation with np.nanstd is both computationally and memory intensive for large arrays. There is a faster implementation that uses convolutions:

import numpy as np


def nanstd_1d(a, w):
    k = np.ones(w, dtype=int)
    m = ~np.isnan(a)
    a0 = np.where(m, a, 0)

    n = np.convolve(m, k, 'valid')    # rolling count of non-NaN values
    c1 = np.convolve(a0, k, 'valid')  # rolling sum
    f2 = c1 ** 2
    p2 = f2 / n ** 2
    f1 = np.convolve((a0 ** 2) * m, k, 'valid') + n * p2
    out = np.sqrt((f1 - (2 / n) * f2) / n)
    return out


def rolling_nanstd(a, m, axis=0):
    axis = axis - 1  # Account for rolling
    return np.apply_along_axis(lambda a_row, m: nanstd_1d(a_row, m), axis=axis, arr=a, m=m)

T = np.random.rand(100)
m = 50
out_std = rolling_nanstd(T, m)


T = np.random.rand(5, 1_000_000)
m = 50
out_std = rolling_nanstd(T, m, axis=T.ndim)

This implementation is around 5x faster than np.nanstd and uses about 5-10x less memory.
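For reference, a rough way to sanity check the output and timing against np.nanstd applied to rolling windows might look like the following sketch (it reuses rolling_nanstd from above and assumes NumPy >= 1.20 for sliding_window_view; the array sizes are illustrative, not the exact benchmark behind the numbers above):

import time

import numpy as np
import numpy.testing as npt
from numpy.lib.stride_tricks import sliding_window_view

T = np.random.rand(1_000_000)
T[100] = np.nan
m = 50

start = time.time()
conv_std = rolling_nanstd(T, m)
print("convolution:", time.time() - start)

start = time.time()
ref_std = np.nanstd(sliding_window_view(T, m), axis=-1)
print("np.nanstd:  ", time.time() - start)

npt.assert_almost_equal(conv_std, ref_std)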


seanlaw commented Oct 26, 2020

@mexxexx FYI

seanlaw self-assigned this Oct 26, 2020
@mihailescum

Looks interesting! It looks like the computation is done using the regular variance formula s_n^2 = (sum x_i^2 - N*mu^2) / N, which suffers from catastrophic cancellation (I think that's what it's called) if the variance is small compared to the mean. Can't say how much of a problem this is.

Maybe one could have a look at the method described here, which might be pretty fast if implemented using numba, since it would be one giant for loop.


seanlaw commented Oct 26, 2020

Yeah, the catastrophic cancellation concerns me.

On the flip side, the article that you pointed to uses Welford's method, which I now have experience with and can probably parallelize with Numba.

The only thing that I haven't thought through is how to handle the NaN values. At least, I can verify the results. Do you know how this works in np.nanstd?

Do you have a simple test case that might cause catastrophic cancellation?


seanlaw commented Oct 26, 2020

Btw, how were you able to identify that it was using variance? I couldn't actually decipher the formulation since it was using convolutions.


seanlaw commented Oct 27, 2020

I have this roughly:

import numpy as np


def _welford_variance(a, w, a_isfinite):
    all_window_variances = np.empty(a.shape[0] - w + 1)
    
    last_window_mean = np.nanmean(a[:w])
    last_window_variance = np.nanstd(a[:w])
    last_window_variance = last_window_variance ** 2
    all_window_variances[0] = last_window_variance
    
    if np.sum(a_isfinite[:w]) == 0:
        last_window_mean = 0.0
        last_window_variance = 0.0
    
    for last_idx, curr_idx in enumerate(range(w, a.shape[0])):
        curr_denom = np.sum(a_isfinite[last_idx + 1: last_idx + 1 + w])
        last_denom = np.sum(a_isfinite[last_idx: last_idx + w])
        last_mean = last_window_mean
        
        if not a_isfinite[curr_idx] and not a_isfinite[last_idx]:
            curr_mean = last_mean * last_denom / curr_denom
            #last_window_variance += ??
        elif not a_isfinite[curr_idx]:
            curr_mean = last_mean * last_denom / curr_denom - a[last_idx] / curr_denom
            #last_window_variance += ??
        elif not a_isfinite[last_idx]:
            curr_mean = last_mean * last_denom / curr_denom + a[curr_idx] / curr_denom
            #last_window_variance += ???
        else:
            curr_mean = last_mean + (a[curr_idx] - a[last_idx]) / curr_denom
            last_window_variance += (a[curr_idx] - a[last_idx]) * (a[curr_idx] - curr_mean + a[last_idx] - last_mean) / curr_denom
        
        last_window_mean = curr_mean
        all_window_variances[last_idx + 1] = last_window_variance
        
    return all_window_variances

def welford_nanstd(a, w):
    a_isfinite = np.isfinite(a)
    
    return np.sqrt(_welford_variance(a, w, a_isfinite))

if __name__ == "__main__":
    T = np.random.rand(10)
    m = 5
    out = welford_nanstd(T, m)

This works when there are no NaN values in the input data but, otherwise, I don't know how to handle the case where NaN values are present. Additionally, all of the curr_mean values should be correct. @mexxexx Any chance you can figure out what to do here?


seanlaw commented Oct 27, 2020

Pandas' rolling variance implementation may be useful here as a reference.


mihailescum commented Oct 27, 2020

Btw, how were you able to identify that it was using variance?

Some experience with convolutions 😄 Basically, convolving with a sequence of m ones, as done there, is equal to a rolling sum with window size m (write it down and you will see it immediately). So f2 is a rolling sum squared, which is the first term in the naive way to calculate the variance, and starting from there you can figure out what the other terms do. If you need further explanation I can help!
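For example (a quick illustration, not taken from the code above), convolving with m ones reproduces the rolling sums of window size m:

import numpy as np

x = np.array([1., 2., 3., 4., 5.])
m = 3
print(np.convolve(x, np.ones(m), 'valid'))  # [ 6.  9. 12.]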

Do you have a simple test case that might cause catastrophic cancellation?

According to Wikipedia, one could take the following test case: (10^9 + 4, 10^9 + 7, 10^9 + 13, 10^9 + 16). The sample variance should be 30, but the naive approach will completely fail (giving −170.66666666666666 in their case, but that could depend on the implementation, so I would just calculate the error and see how large it is).
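A tiny sketch of that failure mode (the exact garbage value will vary with the order of operations, so treat the printed numbers as illustrative):

import numpy as np

x = np.array([4., 7., 13., 16.]) + 1e9
n = x.size
naive = (np.sum(x ** 2) - n * np.mean(x) ** 2) / (n - 1)  # catastrophic cancellation
print(naive)              # far from 30, possibly even negative
print(np.var(x, ddof=1))  # NumPy's two-pass result: 30.0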

The only thing that I haven't thought through is how to handle the NaN values. At least, I can verify the results. Do you know how this works in np.nanstd?

I don't know Welford's algorithm too well, but my idea would go like this: treat a NaN value as 0 and don't count it as an observation (effectively ignoring the new datum). On Wikipedia there is an implementation in Python, which I adapted for NaNs. I didn't try it out, so it might be incomplete, but maybe you can see where I'm going.

import numpy as np

# For a new value newValue, compute the new count, new mean, the new M2.
# mean accumulates the mean of the entire dataset
# M2 aggregates the squared distance from the mean
# count aggregates the number of samples seen so far
def update(existingAggregate, newValue):
    (count, mean, M2) = existingAggregate
    if not np.isnan(newValue):
        count += 1
    else:
        newValue = 0
    delta = newValue - mean

    if count != 0:
        mean += delta / count
    else:
        mean = np.nan
    delta2 = newValue - mean
    M2 += delta * delta2
    return (count, mean, M2)

# Retrieve the mean, variance and sample variance from an aggregate
def finalize(existingAggregate):
    (count, mean, M2) = existingAggregate
    if count < 2:
        return np.nan
    else:
        (mean, variance, sampleVariance) = (mean, M2 / count, M2 / (count - 1))
        return (mean, variance, sampleVariance)
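For a finite (NaN-free) stream this reduces to the standard Welford update; a minimal usage sketch, assuming the update and finalize definitions above:

agg = (0, 0.0, 0.0)
for v in [2.0, 4.0, 6.0]:
    agg = update(agg, v)
print(finalize(agg))  # (4.0, 2.666..., 4.0) -> mean, variance, sample variance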


seanlaw commented Oct 27, 2020

Btw, thanks for the info above!

Okay, I think I understand the math and should be able to make this work. I'm deriving it from scratch right now...


seanlaw commented Nov 6, 2020

@mexxexx I've been working on this and I can't seem to get the math to work nicely. It appears that the rolling window approach only works for me if the number of data points used to compute the mean and variance in the last window is the same as in the current window. In other words, if your previous window contained one or more np.nan then it is not possible to re-use the variance from the last window to compute the variance of the current window.


seanlaw commented Nov 7, 2020

Okay, I thought about it more and realized that I don't actually have to make the math work out nicely. Instead, I just need to detect when and where np.nan is present. So, this hybrid Welford method is actually faster than the FFT version:

import time

import numpy as np
import numpy.testing as npt
import stumpy
from numba import njit


@njit(fastmath={'nsz', 'arcp', 'contract', 'afn', 'reassoc'})
def _welford_variance(a, w, subseq_a_isfinite):
    all_window_variances = np.empty(a.shape[0] - w + 1)
    
    last_window_mean = np.nanmean(a[:w])
    last_window_variance = np.nanvar(a[:w])
    all_window_variances[0] = last_window_variance
        
    for last_idx, curr_idx in enumerate(range(w, a.shape[0])):
        last_mean = last_window_mean
        
        if not subseq_a_isfinite[last_idx] or not subseq_a_isfinite[last_idx+1]:
            curr_mean = np.nanmean(a[last_idx+1:last_idx+1+w])
            last_window_variance = np.nanvar(a[last_idx+1:last_idx+1+w])
        else:
            curr_mean = last_mean + (a[curr_idx] - a[last_idx]) / w
            last_window_variance += (a[curr_idx] - a[last_idx]) * (a[curr_idx] - curr_mean + a[last_idx] - last_mean) / w
        
        last_window_mean = curr_mean
        all_window_variances[last_idx + 1] = last_window_variance
        
    return all_window_variances

def welford_nanstd(a, w):
    subseq_a_isfinite = np.all(np.isfinite(stumpy.core.rolling_window(a, w)), axis=1)
    
    return np.sqrt(_welford_variance(a, w, subseq_a_isfinite))

T, m = np.random.rand(1000000), 50
# T, m = np.random.rand(20), 10
T[1] = np.nan
T[10] = np.nan
T[13:18] = np.nan
start = time.time()
welford_std = welford_nanstd(T, m)
print(time.time() - start)
start = time.time()
np_std = np.nanstd(stumpy.core.rolling_window(T, m), axis=T.ndim)
print(time.time() - start)
npt.assert_almost_equal(welford_std, np_std)

Additionally, there's an opportunity to parallelize this by splitting up the array into chunks.


seanlaw commented Nov 9, 2020

I can also confirm that while the original FFT nanstd suffers from catastrophic cancellation, the Welford method is safe!
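A quick way to spot-check this (a sketch only, reusing nanstd_1d and welford_nanstd from earlier in this thread, with np.nanstd over rolling windows as the numerically stable reference):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

T = np.random.rand(1_000) + 1e9  # large offset, small variance
m = 4

ref = np.nanstd(sliding_window_view(T, m), axis=-1)
print(np.max(np.abs(nanstd_1d(T, m) - ref)))       # large error (the naive formula can even go negative under the sqrt)
print(np.max(np.abs(welford_nanstd(T, m) - ref)))  # small error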

seanlaw added a commit that referenced this issue Nov 10, 2020
@mihailescum

Great to hear! Do you have any timings to compare it to the original rolling approach?
I don't think this is a time-critical part, so parallelization would probably be overkill, but just out of interest.


seanlaw commented Nov 10, 2020

Yes! For a time series with T = np.random.rand(10_000_000)

Welford:  0.46 seconds
np.nanstd:  4.96 seconds
FFT:  1.10 seconds

So, Welford is faster than FFT and around 10x faster than np.nanstd. And for a longer time series with T = np.random.rand(100_000_000)

Welford:  5.31 seconds
np.nanstd:  N/A
FFT:  13.77 seconds

For the longer time series, Welford required ~4.75GB of memory while FFT required ~8.42GB (np.nanstd required too much memory).

In case you were wondering, the timing was about the same when I inserted a bunch of np.nan into the time series, so this is actually a really performant algorithm that also does not have catastrophic cancellation problems!


seanlaw commented Nov 10, 2020

I don't think this is a time-critical part, so parallelization would probably be overkill, but just out of interest.

You're absolutely right and your point is well taken. We can certainly parallelize this by splitting up the time series into chunks/sections equal to the number of threads and then we'd be even faster on longer time series.
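A rough sketch of that chunking idea (a hypothetical helper, not what's implemented here; each chunk's input overlaps the next by w - 1 points so the per-chunk windows line up with the global ones, and the loop is shown serially for clarity even though the per-chunk calls are independent and could be dispatched across threads):

import numpy as np

def chunked_welford_nanstd(a, w, n_chunks=4):
    out_len = a.shape[0] - w + 1
    boundaries = np.linspace(0, out_len, n_chunks + 1).astype(np.int64)
    results = []
    for i in range(n_chunks):
        start, stop = boundaries[i], boundaries[i + 1]
        if stop > start:
            # The output slice [start, stop) only needs the input slice
            # [start, stop + w - 1), i.e. a w - 1 overlap with the next chunk
            results.append(welford_nanstd(a[start : stop + w - 1], w))
    return np.concatenate(results)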

@mihailescum

I'm impressed by the timings! Great that we have something fast and stable implemented, because I remember this variance computation giving me headaches on large time series 😄


seanlaw commented Nov 11, 2020

Yeah, I was surprised too. Mind you, the timing does not include the overhead to compile the function, which is about 0.1 seconds. So, for small (fewer than 1 million data points) data sets, it may be worth using np.nanstd instead. I did try to implement parallelization and it compiled and produced the correct result, but it wasn't faster. I suspect that there was something that was blocking the pure parallelization. Numba seems to be particular about these things.
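A hypothetical size-based dispatch along those lines (the 1-million cutoff is just the rough threshold mentioned above, not a tuned constant, and sliding_window_view assumes NumPy >= 1.20):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_nanstd_dispatch(a, w, cutoff=1_000_000):
    # Hypothetical helper: skip the numba compilation overhead on small inputs
    if a.shape[0] < cutoff:
        return np.nanstd(sliding_window_view(a, w), axis=-1)
    return welford_nanstd(a, w)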
