Description
Hello, it would be useful if HistGradientBoostingRegressor and HistGradientBoostingClassifier could avoid shuffling the data when the early_stopping and validation_fraction parameters are used. Preserving the order of the data is a basic requirement when working with time series. Currently the validation set is split off with train_test_split, which shuffles by default:
if sample_weight is None:
    X_train, X_val, y_train, y_val = train_test_split(
        X,
        y,
        test_size=self.validation_fraction,
        stratify=stratify,
        random_state=self._random_seed,
    )
    sample_weight_train = sample_weight_val = None
else:
    # TODO: incorporate sample_weight in sampling here, as well as
    # stratify
    (
        X_train,
        X_val,
        y_train,
        y_val,
        sample_weight_train,
        sample_weight_val,
    ) = train_test_split(
        X,
        y,
        sample_weight,
        test_size=self.validation_fraction,
        stratify=stratify,
        random_state=self._random_seed,
    )
Describe your proposed solution
It would be sufficient to add a parameter that controls whether the data is shuffled before the validation split.
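A rough sketch of what I mean, not a proposed patch: the helper name `split_for_early_stopping` and its `shuffle` argument are hypothetical and not part of scikit-learn. Only `train_test_split` and its existing `shuffle` parameter are real. With `shuffle=False`, `train_test_split` requires `stratify=None`, so the sketch drops stratification in that case.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_for_early_stopping(
    X, y, validation_fraction=0.1, shuffle=True, stratify=None, random_state=None
):
    """Hypothetical helper: split off a validation set, optionally in order."""
    if not shuffle:
        # train_test_split raises an error if stratify is given while
        # shuffle=False, so stratification is skipped here. The last
        # rows of X and y become the validation set.
        return train_test_split(
            X, y, test_size=validation_fraction, shuffle=False
        )
    # Same call as the estimators make today (shuffled split).
    return train_test_split(
        X,
        y,
        test_size=validation_fraction,
        stratify=stratify,
        random_state=random_state,
    )


# Toy ordered data standing in for a time series.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_val, y_train, y_val = split_for_early_stopping(
    X, y, validation_fraction=0.2, shuffle=False
)
print(y_train)  # first 8 samples, original order
print(y_val)    # last 2 samples
```

For time series this would keep the most recent observations as the validation set, which is usually what you want for early stopping.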
Describe alternatives you've considered, if relevant
No response
Additional context
No response