QST/Feature/Bug: On the performance of a rolling window operation when the window is a column of arbitrary integers #46716

Closed
@epigramx

Description


Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/a/71803558/277716

Question about pandas

The StackOverflow answer linked above tackles the problem of computing the equivalent of pandas.core.window.rolling.Rolling.max when the window size is not a constant but an arbitrary column of integers in the same dataframe. Although that solution tries to be vectorized, it is extremely slow compared to the basic case of a constant window size, to the point of being unusable for large dataframes. I suspect it may be impossible to make it fast, because SIMD hardware may favor a constant window size.

However, I wonder whether the pandas developers have ideas on how to do this, since they are the ones who wrote the extremely fast (vectorized) pandas.core.window.rolling.Rolling.max.

This would normally be a feature request for pandas.DataFrame.rolling to accept a column of arbitrary integers from the dataframe as the window, but I don't know whether that can even be done performantly.
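For illustration, here is a minimal sketch of one way a per-row window maximum could be computed with NumPy alone, using a sparse table (a range-maximum-query technique). This is not how pandas implements Rolling.max and it is not the method from the linked answer. The function names `build_sparse_table` and `rolling_max_variable` are hypothetical, every window is assumed to be at least 1, and NaN values propagate instead of being skipped.

```python
import numpy as np

def build_sparse_table(data):
    """Return levels where levels[k][i] == max(data[i : i + 2**k])."""
    levels = [np.asarray(data, dtype=float)]
    span = 1
    while 2 * span <= len(levels[0]):
        prev = levels[-1]
        # A block of width 2*span is the max of two adjacent blocks of width span.
        levels.append(np.maximum(prev[:-span], prev[span:]))
        span *= 2
    return levels

def rolling_max_variable(data, window):
    """Trailing max where row i looks back window[i] rows (assumes window >= 1)."""
    data = np.asarray(data, dtype=float)
    window = np.asarray(window, dtype=np.int64)
    n = len(data)
    end = np.arange(n, dtype=np.int64) + 1
    start = np.clip(end - window, 0, n)
    length = end - start
    # floor(log2(length)) computed exactly from the binary exponent.
    k = np.frexp(length)[1].astype(np.int64) - 1
    out = np.empty(n, dtype=float)
    for level, table in enumerate(build_sparse_table(data)):
        rows = np.nonzero(k == level)[0]
        if rows.size == 0:
            continue
        span = 1 << level
        # Two overlapping blocks of width span cover [start, end) completely.
        out[rows] = np.maximum(table[start[rows]], table[end[rows] - span])
    return out

data = [3.0, 1.0, 2.0, 0.0, 5.0]
window = [1, 2, 3, 2, 1]
print(rolling_max_variable(data, window).tolist())  # [3.0, 3.0, 3.0, 2.0, 5.0]
```

The table costs O(n log n) time and memory to build, and the Python loop runs once per table level rather than once per row, so the per-row work stays inside NumPy. Whether this is fast enough for a given dataframe is untested here.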

Bug related to later comments below

import pandas as pd
from pandas import api
import numpy as np

class MultiWindowIndexer(api.indexers.BaseIndexer):
    def __init__(self, window):
        self.window = np.array(window)
        super().__init__()

    def get_window_bounds(self, num_values, min_periods, center, closed):
        end = np.arange(num_values, dtype='int64') + 1
        start = np.clip(end - self.window, 0, num_values)
        return start, end

np.random.seed([3,14])
a = np.random.randn(20).cumsum()
w = np.minimum(
    np.random.randint(1, 4, size=a.shape),
    np.arange(len(a))+1
)

df = pd.DataFrame({'Data': a, 'Window': w})

df['max1'] = df.Data.rolling(MultiWindowIndexer(df.Window)).max(engine='cython')

print(df)

Expected outcome: index 18 max1 should be -1.487828 instead of -1.932612
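An expected value like the one above can be cross-checked without going through rolling at all. The sketch below is a brute-force reference that uses the same bounds as MultiWindowIndexer (end = i + 1, start = end - window clipped at 0). The name `rolling_max_reference` is hypothetical and every window is assumed to be at least 1.

```python
import numpy as np

def rolling_max_reference(data, window):
    """Brute-force trailing max: row i covers data[max(0, i + 1 - window[i]) : i + 1]."""
    data = np.asarray(data, dtype=float)
    window = np.asarray(window, dtype=np.int64)
    out = np.empty(len(data), dtype=float)
    for i in range(len(data)):
        start = max(0, i + 1 - int(window[i]))
        # Slow but unambiguous: take the max of the explicit slice.
        out[i] = data[start:i + 1].max()
    return out

data = [3.0, 1.0, 2.0, 0.0, 5.0]
window = [1, 2, 3, 2, 1]
print(rolling_max_reference(data, window).tolist())  # [3.0, 3.0, 3.0, 2.0, 5.0]
```

Applied to the dataframe above, `rolling_max_reference(df.Data, df.Window)` could be compared row by row against `df['max1']` to locate the rows where the two disagree.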

The source of the code and further discussion of the bug are on StackOverflow.


Labels: Needs Triage, Usage Question
