Description
Research
- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage-related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/a/71803558/277716
Question about pandas
The linked StackOverflow answer tackles the problem of computing the equivalent of pandas.core.window.rolling.Rolling.max when the window size is not constant but is taken, row by row, from an arbitrary integer column of the same DataFrame. Although that solution strives to be vectorized, it is extremely slow compared to the basic constant-window case, to the point of being unusable for large DataFrames. I suspect it may be impossible to make it fast, because SIMD hardware may favor a constant window size.
However, I wonder whether the pandas developers have ideas on how to do this, since they wrote the extremely fast (vectorized) pandas.core.window.rolling.Rolling.max.
Normally this would be a feature request for pandas.DataFrame.rolling to accept the integers in a DataFrame column as per-row window sizes, but I don't know whether that can even be done performantly.
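As a hedged illustration that an arbitrary per-row window does not have to be slow, here is a minimal sketch that swaps in a different technique: a sparse table for range-maximum queries. This is not how pandas implements Rolling.max, and `variable_window_max` is a hypothetical helper name. Each row's window [start, end) is covered by two (possibly overlapping) power-of-two blocks, so after O(n log n) preprocessing every row costs O(1), and every step is a whole-array NumPy operation rather than a per-row Python loop.

```python
import numpy as np


def variable_window_max(values, window):
    """Rolling max where row i covers the last window[i] rows (including row i).

    Hypothetical helper: sparse-table range-maximum query, O(n log n)
    preprocessing, then O(1) per row, vectorized with NumPy.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(window, dtype=np.int64)
    n = len(x)
    if w.shape != x.shape:
        raise ValueError("values and window must have the same length")
    if np.any(w < 1):
        raise ValueError("every window size must be >= 1")

    # Same bounds as the MultiWindowIndexer in this issue: [start, end)
    end = np.arange(1, n + 1, dtype=np.int64)
    start = np.clip(end - w, 0, n)

    # table[k][i] == max(x[i : i + 2**k])
    table = [x]
    k = 1
    while (1 << k) <= n:
        prev = table[-1]
        half = 1 << (k - 1)
        table.append(np.maximum(prev[:-half], prev[half:]))
        k += 1

    # Largest k with 2**k <= window length (frexp is exact for integers)
    length = end - start
    level = np.frexp(length)[1] - 1

    out = np.empty(n)
    for k in np.unique(level):
        rows = np.nonzero(level == k)[0]
        t = table[k]
        # Two blocks of size 2**k, one anchored at start, one ending at end
        out[rows] = np.maximum(t[start[rows]], t[end[rows] - (1 << k)])
    return out
```

For the DataFrame in this issue it would be called as `variable_window_max(df['Data'], df['Window'])`. The trade-offs are O(n log n) extra memory, no `min_periods` handling, and NaN propagation through `np.maximum` (pandas skips NaN), so it is a sketch under those assumptions rather than a drop-in replacement.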
Bug related to the comments below
import numpy as np
import pandas as pd
from pandas import api


class MultiWindowIndexer(api.indexers.BaseIndexer):
    """Row i covers the last window[i] rows, including row i."""

    def __init__(self, window):
        self.window = np.array(window)
        super().__init__()

    # pandas >= 1.5 checks that this signature includes `step`;
    # on pandas < 1.5, remove the `step` parameter.
    def get_window_bounds(self, num_values, min_periods, center, closed, step=None):
        end = np.arange(num_values, dtype='int64') + 1
        start = np.clip(end - self.window, 0, num_values)
        return start, end


np.random.seed([3, 14])
a = np.random.randn(20).cumsum()
# Window sizes in 1..3, never longer than the rows available so far
w = np.minimum(
    np.random.randint(1, 4, size=a.shape),
    np.arange(len(a)) + 1,
)
df = pd.DataFrame({'Data': a, 'Window': w})
df['max1'] = df.Data.rolling(MultiWindowIndexer(df.Window)).max(engine='cython')
print(df)
Expected outcome: at index 18, max1 should be -1.487828, but the code returns -1.932612.
The source of this code and further discussion of the bug are on StackOverflow.
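One plausible explanation, which is an assumption not confirmed in this report, is that the fast rolling max relies on a monotonic-deque algorithm that expects window start bounds never to move backward. With window sizes drawn from 1..3, a row's window can start earlier than the previous row's (for example windows 1, 1, 3 give starts 0, 1, 0). The sketch below uses only NumPy and the same data as the reproducer; `window_bounds`, `reference_max`, and `start_moves_backward` are hypothetical helper names. It computes a brute-force reference and reports whether any start bound decreases, so the two can be compared against the `max1` column.

```python
import numpy as np


def window_bounds(window):
    """Bounds used by MultiWindowIndexer: row i covers [end[i] - window[i], end[i])."""
    w = np.asarray(window, dtype=np.int64)
    n = len(w)
    end = np.arange(1, n + 1, dtype=np.int64)
    start = np.clip(end - w, 0, n)
    return start, end


def reference_max(values, window):
    """Brute-force per-row max: slow, but easy to trust."""
    x = np.asarray(values, dtype=float)
    start, end = window_bounds(window)
    return np.array([x[s:e].max() for s, e in zip(start, end)])


def start_moves_backward(window):
    """True if any row's window starts before the previous row's window."""
    start, _ = window_bounds(window)
    return bool(np.any(np.diff(start) < 0))


# Same data as the reproducer in this issue
np.random.seed([3, 14])
a = np.random.randn(20).cumsum()
w = np.minimum(np.random.randint(1, 4, size=a.shape), np.arange(len(a)) + 1)

ref = reference_max(a, w)
print("start bounds move backward:", start_moves_backward(w))
print("reference max at index 18:", ref[18])
```

If the start bounds do move backward and `ref` disagrees with `max1` exactly on those rows, that would support the hypothesis; `df['ref'] = ref` puts the two side by side.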