Description
While it is possible to create feature interactions between columns that share the same preprocessing in a ColumnTransformer via PolynomialFeatures, I have found no convincing solution for interactions between features with different preprocessing, e.g. a one-hot-encoded categorical column and a continuous numerical column. Such interactions could improve models from sklearn.linear_model.
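Concretely, an interaction between a one-hot-encoded categorical column and a numerical column amounts to multiplying each indicator column by the numerical value. The following is a minimal illustration with plain pandas on a subset of the toy data used below. It is not a scikit-learn transformer, so it cannot be fitted inside a pipeline as-is; it only shows which columns the desired transformer should produce.

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'red', 'blue', 'blue'],
                   'x': [1, 1, 1, 2]})

# One float indicator column per category of 'a': a_blue, a_red
dummies = pd.get_dummies(df['a'], prefix='a', dtype=float)

# Multiply every indicator row-wise by 'x', giving a_blue*x and a_red*x
interactions = dummies.mul(df['x'], axis=0)
print(interactions)
```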
Code Example
```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures

df = pd.DataFrame({'a': ['red', 'red', 'blue', 'blue'],
                   'b': ['high', 'low', 'high', 'low'],
                   'x': [1, 1, 1, 2],
                   'y': [2, 3, 4, 2]})

# Interactions between features that need no individual preprocessing,
# i.e. numerical ones, work fine:
column_trans = ColumnTransformer(
    [('xy_num',
      PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
      ['x', 'y'])],
    remainder='drop')
column_trans.fit_transform(df)

# Interactions between one-hot-encoded features also work, via a pipeline:
cat_cat = make_pipeline(
    OneHotEncoder(),
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
)
column_trans = ColumnTransformer(
    [('ab_cat', cat_cat, ['a', 'b'])],
    remainder='drop')
column_trans.fit_transform(df)
```
Expected Results
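```python
# Desired, but I see no way to do it: interactions between the
# one-hot-encoded 'a' and the numerical 'x'.
# magic_pipeline is hypothetical: it stands for a transformer that applies
# OneHotEncoder to 'a', passes 'x' through, and returns their interactions.
column_trans = ColumnTransformer(
    [('a_x',
      magic_pipeline(OneHotEncoder(), 'passthrough'),
      ['a', 'x'])],
    remainder='drop')
```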
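Until something like magic_pipeline exists, one possible workaround (a sketch, not an official scikit-learn feature) is to do the individual preprocessing first in a ColumnTransformer and to form the interactions in a second pipeline step. Option A reuses PolynomialFeatures on the stacked output; it keeps the main effects but also yields a meaningless a_blue*a_red column, which is always 0 because a row has only one category. Option B uses FunctionTransformer with a hypothetical helper, indicator_times_last, which assumes the numerical column is stacked last and returns only the indicator × x products.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import (FunctionTransformer, OneHotEncoder,
                                   PolynomialFeatures)

df = pd.DataFrame({'a': ['red', 'red', 'blue', 'blue'],
                   'x': [1, 1, 1, 2]})


def preprocess():
    # Individual preprocessing: one-hot encode 'a', pass 'x' through.
    # Outputs are stacked in listed order, so 'x' is the last column.
    # sparse_threshold=0 forces a dense array for the next step.
    return ColumnTransformer(
        [('a_ohe', OneHotEncoder(), ['a']),
         ('x_num', 'passthrough', ['x'])],
        sparse_threshold=0)


# Option A: all pairwise products of the stacked columns.
# Columns: a_blue, a_red, x, a_blue*a_red (always 0), a_blue*x, a_red*x
all_pairs = make_pipeline(
    preprocess(),
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False))
Xa = all_pairs.fit_transform(df)


def indicator_times_last(X):
    """Hypothetical helper: assumes the last column is numerical and all
    preceding columns are one-hot indicators; returns indicator * numeric."""
    X = np.asarray(X, dtype=float)
    return X[:, :-1] * X[:, -1:]  # broadcast the numeric column


# Option B: only the categorical x numerical products (a_blue*x, a_red*x)
only_cat_num = make_pipeline(
    preprocess(),
    FunctionTransformer(indicator_times_last, validate=False))
Xb = only_cat_num.fit_transform(df)
```

Option B drops the main effects; if they are wanted, its output could be combined with the preprocessed columns, e.g. via FeatureUnion. Both options rely on ColumnTransformer stacking its outputs in the order the transformers are listed.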
Versions
sklearn version 0.21