Commit ae009d8

Merge branch 'feature-engine:main' into profiling_functionality
2 parents: e98fe3c + feddb06

24 files changed: 550 additions, 18 deletions

.circleci/config.yml (1 addition, 1 deletion)

@@ -151,4 +151,4 @@ workflows:
 filters:
   branches:
     only:
-      - 1.5.X
+      - 1.6.X

docs/whats_new/index.rst (1 addition, 0 deletions)

@@ -8,6 +8,7 @@ Find out what's new in each new version release.
 .. toctree::
    :maxdepth: 2
 
+   v_160
    v_150
    v_140
    v_130

docs/whats_new/v_160.rst (94 additions, 0 deletions)

@@ -0,0 +1,94 @@
+Version 1.6.X
+=============
+
+Version 1.6.0
+-------------
+
+Deployed: 16th March 2023
+
+Contributors
+~~~~~~~~~~~~
+
+- `Gleb Levitski <https://github.com/GLevv>`_
+- `Morgan Sell <https://github.com/Morgan-Sell>`_
+- `Alfonso Tobar <https://github.com/datacubeR>`_
+- `Nodar Okroshiashvili <https://github.com/Okroshiashvili>`_
+- `Luís Seabra <https://github.com/luismavs>`_
+- `Kyle Gilde <https://github.com/kylegilde>`_
+- `Soledad Galli <https://github.com/solegalli>`_
+
+In this release, we make Feature-engine transformers compatible with the `set_output`
+API from Scikit-learn, which was released in version 1.2.0. We also make Feature-engine
+compatible with the newest direction of pandas by removing the `inplace` functionality
+that our transformers used under the hood.
+
+We introduce a major change: most of the **categorical encoders can now encode variables
+even if they have missing data**.
+
+We are also releasing **3 brand new transformers**: one for discretization, one for feature
+selection and one for operations between datetime variables.
+
+We also made a major improvement in the performance of `DropDuplicateFeatures` and some
+smaller bug fixes here and there.
+
+We'd like to thank all contributors for fixing bugs and expanding the functionality
+and documentation of Feature-engine.
+
+Thank you so much to all contributors and to those of you who created issues flagging bugs or
+requesting new functionality.
+
+New transformers
+~~~~~~~~~~~~~~~~
+
+- **ProbeFeatureSelection**: introduces random features and selects variables whose importance is greater than the random ones (`Morgan Sell <https://github.com/Morgan-Sell>`_ and `Soledad Galli <https://github.com/solegalli>`_)
+- **DatetimeSubtraction**: creates new features by subtracting datetime variables (`Kyle Gilde <https://github.com/kylegilde>`_ and `Soledad Galli <https://github.com/solegalli>`_)
+- **GeometricWidthDiscretiser**: sorts continuous variables into intervals determined by geometric progression (`Gleb Levitski <https://github.com/GLevv>`_)
+
+New functionality
+~~~~~~~~~~~~~~~~~
+
+- Allow categorical encoders to encode variables with NaN (`Soledad Galli <https://github.com/solegalli>`_)
+- Make transformers compatible with new `set_output` functionality from sklearn (`Soledad Galli <https://github.com/solegalli>`_)
+- The `ArbitraryDiscretiser()` now includes the lowest limits in the intervals (`Soledad Galli <https://github.com/solegalli>`_)
+
+New modules
+~~~~~~~~~~~
+
+- New **Datasets** module with functions to load specific datasets (`Alfonso Tobar <https://github.com/datacubeR>`_)
+- New **variable_handling** module with functions to automatically select numerical, categorical, or datetime variables (`Soledad Galli <https://github.com/solegalli>`_)
+
+Bug fixes
+~~~~~~~~~
+
+- Fixed bug in `DropFeatures()` (`Luís Seabra <https://github.com/luismavs>`_)
+- Fixed bug in `RecursiveFeatureElimination()` caused when only 1 feature remained in data (`Soledad Galli <https://github.com/solegalli>`_)
+
+Documentation
+~~~~~~~~~~~~~
+
+- Add example code snippets to the selection module API docs (`Alfonso Tobar <https://github.com/datacubeR>`_)
+- Add example code snippets to the outlier module API docs (`Alfonso Tobar <https://github.com/datacubeR>`_)
+- Add example code snippets to the transformation module API docs (`Alfonso Tobar <https://github.com/datacubeR>`_)
+- Add example code snippets to the time series module API docs (`Alfonso Tobar <https://github.com/datacubeR>`_)
+- Add example code snippets to the preprocessing module API docs (`Alfonso Tobar <https://github.com/datacubeR>`_)
+- Add example code snippets to the wrapper module API docs (`Alfonso Tobar <https://github.com/datacubeR>`_)
+- Updated documentation to use the new Datasets module (`Alfonso Tobar <https://github.com/datacubeR>`_ and `Soledad Galli <https://github.com/solegalli>`_)
+- Reorganized Readme badges (`Gleb Levitski <https://github.com/GLevv>`_)
+- New Jupyter notebooks for `GeometricWidthDiscretiser` (`Gleb Levitski <https://github.com/GLevv>`_)
+- Fixed typos (`Gleb Levitski <https://github.com/GLevv>`_)
+- Remove examples using the Boston house dataset (`Soledad Galli <https://github.com/solegalli>`_)
+- Update sponsor page and contribute page (`Soledad Galli <https://github.com/solegalli>`_)
+
+
+Deprecations
+~~~~~~~~~~~~
+
+- The class `PRatioEncoder` is no longer supported and was removed from the API (`Soledad Galli <https://github.com/solegalli>`_)
+
+Code improvements
+~~~~~~~~~~~~~~~~~
+
+- Massive improvement in the performance (speed) of `DropDuplicateFeatures()` (`Nodar Okroshiashvili <https://github.com/Okroshiashvili>`_)
+- Remove `inplace` and fix other issues related to the new direction of pandas (`Luís Seabra <https://github.com/luismavs>`_)
+- Move most docstrings to a dedicated docstrings module (`Soledad Galli <https://github.com/solegalli>`_)
+- Unnest tests for encoders (`Soledad Galli <https://github.com/solegalli>`_)
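
The release notes describe `ProbeFeatureSelection` in a single line: add random features and keep the variables whose importance beats them. The sketch below illustrates that idea with pandas and NumPy only. It is not Feature-engine's implementation: the importance measure (absolute Pearson correlation with the target), the Gaussian probes, the number of probes and the cut-off (mean probe importance) are assumptions chosen for brevity, and `probe_select` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def probe_select(X: pd.DataFrame, y: pd.Series, n_probes: int = 3, seed: int = 0) -> list:
    """Hypothetical helper: keep features whose importance beats the random probes.

    Importance is the absolute Pearson correlation with the target, a
    stand-in metric chosen for this sketch (an assumption).
    """
    rng = np.random.default_rng(seed)
    X_probe = X.copy()
    probe_cols = [f"probe_{i}" for i in range(n_probes)]
    # Append purely random "probe" columns that carry no information about y.
    for col in probe_cols:
        X_probe[col] = rng.normal(size=len(X))
    # Score real and random columns with the same importance measure.
    importance = X_probe.corrwith(y).abs()
    threshold = importance[probe_cols].mean()
    # Select only the original features that outperform the average probe.
    return [col for col in X.columns if importance[col] > threshold]


rng = np.random.default_rng(42)
X = pd.DataFrame({"signal": rng.normal(size=500), "noise": rng.normal(size=500)})
y = pd.Series(2 * X["signal"] + rng.normal(scale=0.1, size=500))
print(probe_select(X, y))
```

Because the probes are random, a weak feature such as `noise` may fall on either side of the cut-off from one seed to another; averaging over several probes, or comparing against the largest probe importance instead of the mean, are common ways to make the decision stricter or more stable.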

feature_engine/VERSION (1 addition, 1 deletion)

@@ -1 +1 @@
-1.5.2
+1.6.0

feature_engine/datetime/datetime_subtraction.py (1 addition, 1 deletion)

@@ -318,7 +318,7 @@ def _sub(self, dt_df: pd.DataFrame):
 new_df[new_varnames] = (
     dt_df[self.variables_]
     .sub(dt_df[reference], axis=0)
-    .apply(lambda s: s / np.timedelta64(1, self.output_unit))
+    .div(np.timedelta64(1, self.output_unit).astype("timedelta64[ns]"))
 )
 
 if self.new_variables_names is not None:
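
The change swaps a column-wise `.apply` for a single `DataFrame.div`. On a toy frame, both expressions should turn the timedeltas into the same floats. The column names, dates and the `"D"` (days) unit below are illustrative assumptions standing in for `self.variables_`, `reference` and `self.output_unit`.

```python
import numpy as np
import pandas as pd

dt_df = pd.DataFrame({
    "start": pd.to_datetime(["2023-01-01", "2023-01-10"]),
    "end": pd.to_datetime(["2023-01-03", "2023-01-31"]),
})
variables, reference, output_unit = ["end"], "start", "D"

# Timedelta between each variable and the reference column, row by row.
delta = dt_df[variables].sub(dt_df[reference], axis=0)

# Old expression: divide column by column through .apply.
old = delta.apply(lambda s: s / np.timedelta64(1, output_unit))

# New expression: one vectorised .div by the unit expressed in nanoseconds.
new = delta.div(np.timedelta64(1, output_unit).astype("timedelta64[ns]"))

print(new["end"].tolist())  # [2.0, 21.0]
```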

feature_engine/imputation/drop_missing_data.py (1 addition, 1 deletion)

@@ -205,7 +205,7 @@ def return_na_data(self, X: pd.DataFrame) -> pd.DataFrame:
     idx = pd.isnull(X[self.variables_]).mean(axis=1) >= self.threshold
     idx = idx[idx]
 else:
-    idx = pd.isnull(X[self.variables_]).any(1)
+    idx = pd.isnull(X[self.variables_]).any(axis=1)
     idx = idx[idx]
 
 return X.loc[idx.index, :]
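
Outside the class, the row selection in this hunk can be exercised on a toy frame. `return_na_rows` is a hypothetical free-function version of `return_na_data`, with `variables` and `threshold` standing in for `self.variables_` and `self.threshold`; the `threshold is not None` test is an assumption for this sketch. Passing `axis=1` by keyword makes the row-wise reduction explicit.

```python
import numpy as np
import pandas as pd


def return_na_rows(X: pd.DataFrame, variables: list, threshold=None) -> pd.DataFrame:
    """Hypothetical helper mirroring the selection logic shown in the hunk."""
    if threshold is not None:
        # Rows whose fraction of missing values reaches the threshold.
        idx = pd.isnull(X[variables]).mean(axis=1) >= threshold
    else:
        # Rows with at least one missing value in the selected variables.
        idx = pd.isnull(X[variables]).any(axis=1)
    idx = idx[idx]
    return X.loc[idx.index, :]


X = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, 2.0, 3.0, np.nan],
    "c": [1, 2, 3, 4],
})
print(return_na_rows(X, ["a", "b"]).index.tolist())                 # [0, 1, 3]
print(return_na_rows(X, ["a", "b"], threshold=1.0).index.tolist())  # [3]
```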

feature_engine/outliers/artbitrary.py (21 additions, 0 deletions)

@@ -91,6 +91,27 @@ class ArbitraryOutlierCapper(BaseOutlier):
 transform:
     Cap the variables.
 
+Examples
+--------
+
+>>> import pandas as pd
+>>> from feature_engine.outliers import ArbitraryOutlierCapper
+>>> X = pd.DataFrame(dict(x1 = [1,2,3,4,5,6,7,8,9,10]))
+>>> aoc = ArbitraryOutlierCapper(max_capping_dict=dict(x1 = 8),
+...                              min_capping_dict=dict(x1 = 2))
+>>> aoc.fit(X)
+>>> aoc.transform(X)
+   x1
+0   2
+1   2
+2   3
+3   4
+4   5
+5   6
+6   7
+7   8
+8   8
+9   8
 """
 
 def __init__(
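
For intuition, the capping in this docstring example amounts to clipping each column between its minimum and maximum caps. The plain-pandas sketch below uses `Series.clip` and reproduces the documented output; it is an illustration, not the transformer's code.

```python
import pandas as pd

X = pd.DataFrame(dict(x1=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
max_capping_dict = dict(x1=8)
min_capping_dict = dict(x1=2)

capped = X.copy()
for var in X.columns:
    # Values below the minimum cap are raised to it; values above the
    # maximum cap are lowered to it. A missing cap (None) means no bound.
    capped[var] = X[var].clip(
        lower=min_capping_dict.get(var), upper=max_capping_dict.get(var)
    )

print(capped["x1"].tolist())  # [2, 2, 3, 4, 5, 6, 7, 8, 8, 8]
```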

feature_engine/outliers/trimmer.py (55 additions, 0 deletions)

@@ -89,6 +89,61 @@ class OutlierTrimmer(WinsorizerBase):
 transform:
     Remove outliers.
 
+Examples
+--------
+
+>>> import pandas as pd
+>>> from feature_engine.outliers import OutlierTrimmer
+>>> X = pd.DataFrame(dict(x = [0.49671,
+...                            -0.1382,
+...                            0.64768,
+...                            1.52302,
+...                            -0.2341,
+...                            -17.2341,
+...                            1.57921,
+...                            0.76743,
+...                            -0.4694,
+...                            0.54256]))
+>>> ot = OutlierTrimmer(capping_method='gaussian', tail='left', fold=3)
+>>> ot.fit(X)
+>>> ot.transform(X)
+          x
+0   0.49671
+1  -0.13820
+2   0.64768
+3   1.52302
+4  -0.23410
+5 -17.23410
+6   1.57921
+7   0.76743
+8  -0.46940
+9   0.54256
+
+>>> import pandas as pd
+>>> from feature_engine.outliers import OutlierTrimmer
+>>> X = pd.DataFrame(dict(x = [0.49671,
+...                            -0.1382,
+...                            0.64768,
+...                            1.52302,
+...                            -0.2341,
+...                            -17.2341,
+...                            1.57921,
+...                            0.76743,
+...                            -0.4694,
+...                            0.54256]))
+>>> ot = OutlierTrimmer(capping_method='mad', tail='left', fold=3)
+>>> ot.fit(X)
+>>> ot.transform(X)
+         x
+0  0.49671
+1 -0.13820
+2  0.64768
+3  1.52302
+4 -0.23410
+6  1.57921
+7  0.76743
+8 -0.46940
+9  0.54256
 """
 
 def transform(self, X: pd.DataFrame) -> pd.DataFrame:
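
Why does the Gaussian example keep -17.2341 while the MAD example drops it? The outlier inflates the mean and standard deviation, pushing the Gaussian lower limit below it, whereas the median and MAD barely move. The sketch computes both left-tail limits with pandas, assuming the usual definitions (mean minus `fold` sample standard deviations; median minus `fold` times the median absolute deviation scaled by 1/0.6745). Feature-engine's internal formulas may differ in detail.

```python
import pandas as pd

# The ten values from the docstring examples; index 5 holds the outlier.
x = pd.Series([0.49671, -0.1382, 0.64768, 1.52302, -0.2341,
               -17.2341, 1.57921, 0.76743, -0.4694, 0.54256])
fold = 3

# Assumed Gaussian left limit: mean minus fold sample std devs (ddof=1).
gauss_low = x.mean() - fold * x.std()

# Assumed MAD left limit: median minus fold scaled median absolute deviations.
median = x.median()
mad = (x - median).abs().median() / 0.6745
mad_low = median - fold * mad

print(round(gauss_low, 2), round(mad_low, 2))  # -18.22 -2.62
# Rows that survive a left-tail trim at the MAD limit.
print(x[x >= mad_low].index.tolist())  # [0, 1, 2, 3, 4, 6, 7, 8, 9]
```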

feature_engine/outliers/winsorizer.py (42 additions, 0 deletions)

@@ -97,6 +97,48 @@ class Winsorizer(WinsorizerBase):
 transform:
     Cap the variables.
 
+Examples
+--------
+
+>>> import numpy as np
+>>> import pandas as pd
+>>> from feature_engine.outliers import Winsorizer
+>>> np.random.seed(42)
+>>> X = pd.DataFrame(dict(x = np.random.normal(size = 10)))
+>>> wz = Winsorizer(capping_method='mad', tail='both', fold=3)
+>>> wz.fit(X)
+>>> wz.transform(X)
+          x
+0  0.496714
+1 -0.138264
+2  0.647689
+3  1.523030
+4 -0.234153
+5 -0.234137
+6  1.579213
+7  0.767435
+8 -0.469474
+9  0.542560
+
+>>> import numpy as np
+>>> import pandas as pd
+>>> from feature_engine.outliers import Winsorizer
+>>> np.random.seed(42)
+>>> X = pd.DataFrame(dict(x = np.random.normal(size = 10)))
+>>> wz = Winsorizer(capping_method='mad', tail='both', fold=3)
+>>> wz.fit(X)
+>>> wz.transform(X)
+          x
+0  0.496714
+1 -0.138264
+2  0.647689
+3  1.523030
+4 -0.234153
+5 -0.234137
+6  1.579213
+7  0.767435
+8 -0.469474
+9  0.542560
 """
 
 def __init__(
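
Winsorizing caps values at the limits instead of dropping rows. Assuming MAD limits of median ± `fold` × MAD / 0.6745, none of the ten draws produced by `np.random.seed(42)` falls outside them, which would explain why the docstring examples print the input unchanged. The pandas-only sketch below is an illustration, not the transformer's code.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
X = pd.DataFrame(dict(x=np.random.normal(size=10)))
fold = 3

# Assumed MAD limits on both tails: median +/- fold * MAD / 0.6745.
median = X["x"].median()
mad = (X["x"] - median).abs().median() / 0.6745
lower, upper = median - fold * mad, median + fold * mad

# Winsorizing caps values at the limits rather than removing rows.
capped = X["x"].clip(lower=lower, upper=upper)
print(capped.equals(X["x"]))  # True: every draw lies within the limits

# A value far outside the limits is replaced by the limit, not dropped.
extreme = pd.Series([0.5, 10.0]).clip(lower=lower, upper=upper)
print(extreme.iloc[1] == upper)  # True
```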

feature_engine/preprocessing/match_categories.py (21 additions, 0 deletions)

@@ -88,6 +88,27 @@ class MatchCategories(
 
 transform:
     Enforce the type of categorical variables as dtype `categorical`.
+
+Examples
+--------
+
+>>> import pandas as pd
+>>> from feature_engine.preprocessing import MatchCategories
+>>> X_train = pd.DataFrame(dict(x1 = ["a","b","c"], x2 = [4,5,6]))
+>>> X_test = pd.DataFrame(dict(x1 = ["c","b","a","d"], x2 = [5,6,4,7]))
+>>> mc = MatchCategories(missing_values="ignore")
+>>> mc.fit(X_train)
+>>> mc.transform(X_train)
+  x1  x2
+0  a   4
+1  b   5
+2  c   6
+>>> mc.transform(X_test)
+    x1  x2
+0    c   5
+1    b   6
+2    a   4
+3  NaN   7
 """
 
 def __init__(
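
The docstring shows the unseen category `"d"` in `X_test` turning into `NaN`. A plain-pandas sketch of the same effect fixes the categories learned from the training data with `pd.Categorical(..., categories=...)`. `match_categories` is a hypothetical helper, and this is not necessarily how the transformer is implemented.

```python
import pandas as pd

X_train = pd.DataFrame(dict(x1=["a", "b", "c"], x2=[4, 5, 6]))
X_test = pd.DataFrame(dict(x1=["c", "b", "a", "d"], x2=[5, 6, 4, 7]))

# "Fit": remember the categories seen in each categorical training column.
learned = {"x1": pd.Categorical(X_train["x1"]).categories}


def match_categories(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast columns to the categories learned in training."""
    X = X.copy()
    for var, cats in learned.items():
        # Values outside the learned categories (here "d") become NaN.
        X[var] = pd.Categorical(X[var], categories=cats)
    return X


out = match_categories(X_test)
print(out["x1"].tolist())  # ['c', 'b', 'a', nan]
```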
