Transformative get_feature_names for various transformers #6425

jnothman · Feb 23, 2016

yenchenlin · Feb 23, 2016

@jnothman May I try this?

jnothman · Feb 23, 2016

On which (family of) estimator?

iamved · Feb 23, 2016

Hi @jnothman, I am interested in taking this issue. Could you please suggest how I can get started on this issue?

nelson-liu · Feb 23, 2016

I'll handle implementing this for FunctionTransformer for now, and we'll see if there's more classes to implement this in after I'm done :)

yenchenlin · Feb 23, 2016

@jnothman I'll modify the FeatureUnion.

including feature selectors, feature agglomeration, FunctionTransformer, and perhaps even PCA

Is feature agglomeration here referring to cluster.FeatureAgglomeration?

nelson-liu · Feb 23, 2016

@yenchenlin1994 I assume so?

yenchenlin · Feb 23, 2016

@nelson-liu Thx!

If so, I would also love to implement it for cluster.FeatureAgglomeration.

jnothman · Feb 23, 2016

I have added an extended list of transformers where this may apply and noted the default feature naming convention (though maybe its generation belongs in utils)

yenchenlin · Feb 23, 2016

Hello @jnothman ,

What should preprocessing.Normalizer do when input_features passed into get_feature_names is None?

PolynomialFeatures doesn't suffer from this since it sets both self.n_input_features_ and self.n_output_features_ during fit().

Maybe preprocessing.Normalizer should set self.n_input_features_ too during fit()?

jnothman · Feb 23, 2016

Fair question, which I don't currently have an answer for. One option is for it to just return feature_names even if that means returning None.

yenchenlin · Feb 23, 2016

Oh, and even if input_features passed into get_feature_names of preprocessing.Normalizer is not None,
I guess all it can do is return feature_names, which is the same as input_features in this case?

jnothman · Feb 23, 2016

yes, trivial, as noted in the issue description
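
The one-to-one case discussed above (Normalizer, scalers, Binarizer) might be sketched as follows. This is only an illustration under stated assumptions: the class name `PassthroughNames` is hypothetical, recording `n_input_features_` in `fit()` is the suggestion made in this thread rather than confirmed behaviour, and the `x0, x1, ...` fallback is an assumed default naming. The alternative raised above is to return None when no input names are given.

```python
class PassthroughNames:
    """Hypothetical sketch of feature naming for a one-to-one transformer."""

    def fit(self, X):
        # Assumption: remember the input width during fit, as suggested above.
        self.n_input_features_ = len(X[0])
        return self

    def get_feature_names(self, input_features=None):
        if input_features is None:
            # Assumed fallback naming: x0, x1, ...
            return ["x%d" % i for i in range(self.n_input_features_)]
        if len(input_features) != self.n_input_features_:
            raise ValueError(
                "expected %d input feature names, got %d"
                % (self.n_input_features_, len(input_features))
            )
        # Each output column corresponds to exactly one input column,
        # so the output names are the input names unchanged.
        return list(input_features)


transformer = PassthroughNames().fit([[1.0, 2.0, 3.0]])
default_names = transformer.get_feature_names()
given_names = transformer.get_feature_names(["a", "b", "c"])
```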

yenchenlin · Feb 23, 2016

Oh okay!
I will also do scalers, normalizers, imputers, and Binarizer.
Will send a PR right away.

Thanks for your clarification.

yenchenlin · Feb 24, 2016

Hello @jnothman ,
about

feature selection and randomized L1

Do you mean all classes listed here:
http://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection

It seems that all these classes may be put into Pipeline and therefore need get_feature_names too.
Please correct me if I'm wrong. Thanks!

jnothman · Feb 24, 2016

Yes, I mean those.

maniteja123 · Feb 24, 2016

Hi everyone, if it is fine, I too would like to work on this issue. It would be helpful if the estimators currently being worked on could be mentioned, so that I can try something which does not overlap. Thanks!

yenchenlin · Feb 24, 2016

I think I can also work on

feature selection and randomized L1

@maniteja123 from PCA to the end of the issue description is not yet done

maniteja123 · Feb 24, 2016

@yenchenlin1994, thanks for letting me know.

maniteja123 · Feb 24, 2016

@jnothman It would be of great help if you could confirm whether the output for PCA needs to have shape n_components, where each element is the input feature having the maximum contribution. Should the case of multiple features having high contribution along one component be handled? Thank you!

jnothman · Feb 24, 2016

I'm really not sure about PCA. Try to make something useful. If you think it will be helpful to users to have names for projection-style features, submit a PR. There is definitely a component of art to this.

maniteja123 · Feb 24, 2016

Thanks for the reply. My doubt is mainly about choosing dominant features, and also that not all components are equally significant. Since multiple features can have almost the same contribution along a component, there might be a need for some threshold to figure out the number of input features to be considered. Anyway, I will create an initial PR with just the most dominant feature along each component and continue the discussion there. Hope that is fine.
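
The two options described above, naming each component after its single most dominant input feature or after every feature above some threshold, could look roughly like this. It is a plain-Python sketch, not the code from any PR: the function names, the `ratio` parameter, and the example loadings are all made up for illustration, and `components` stands in for a fitted PCA's component matrix (one row per component, one column per input feature).

```python
def dominant_feature_names(components, input_features):
    """Name each component after the input feature with the largest absolute loading."""
    names = []
    for row in components:
        # Ties are resolved in favour of the first feature reaching the maximum.
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        names.append(input_features[best])
    return names


def features_above_threshold(row, input_features, ratio=0.9):
    """Return every feature whose absolute loading is at least ratio * the largest one."""
    cutoff = ratio * max(abs(w) for w in row)
    return [name for name, w in zip(input_features, row) if abs(w) >= cutoff]


# Illustrative loadings only; these are not results from real data.
feature_names = ["height", "weight", "age"]
components = [
    [0.9, 0.1, -0.4],
    [0.2, -0.95, 0.2],
]
single = dominant_feature_names(components, feature_names)
several = features_above_threshold([0.7, 0.69, 0.1], feature_names)
```

The threshold variant shows the difficulty raised above: the number of names per component now depends on an arbitrary `ratio`.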

maniteja123 · Feb 24, 2016

@yenchenlin1994 one more question. I am not sure how to handle this for SparseRandomProjection and GaussianRandomProjection. Have you already worked on all of these?

feature selection and randomized L1
feature agglomeration
FeatureUnion

If you have already started working, I will be waiting for your PRs :) Thanks!

amueller · Dec 12, 2017

I would like to build a transformer which selects (or excludes) features by name.

Can you get rid of that with a ColumnTransformer? I guess the question is a bit whether it's always possible to have the ColumnTransformer be right at the beginning of the pipeline, where we still know the names / positions of the columns.
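
To make the concern concrete: selecting or excluding by name only works at a point where the current column names are known. The following is a plain-Python stand-in, not ColumnTransformer itself; `select_columns_by_name` and its parameters are hypothetical.

```python
def select_columns_by_name(rows, feature_names, include=None, exclude=None):
    """Keep (or drop) columns by name; returns the new rows and their names."""
    if (include is None) == (exclude is None):
        raise ValueError("pass exactly one of include or exclude")
    if include is not None:
        missing = [name for name in include if name not in feature_names]
        if missing:
            raise KeyError("unknown feature names: %r" % missing)
        keep = [feature_names.index(name) for name in include]
    else:
        dropped = set(exclude)
        keep = [i for i, name in enumerate(feature_names) if name not in dropped]
    selected_rows = [[row[i] for i in keep] for row in rows]
    selected_names = [feature_names[i] for i in keep]
    return selected_rows, selected_names


rows = [[1, 2, 3], [4, 5, 6]]
names = ["a", "b", "c"]
kept_rows, kept_names = select_columns_by_name(rows, names, include=["c", "a"])
rest_rows, rest_names = select_columns_by_name(rows, names, exclude=["b"])
```

Once an earlier step has changed the columns without reporting new names, `feature_names` is no longer available, which is why the position of such a selector in the pipeline matters.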

amueller · Dec 12, 2017

@kmike Are the structured annotations mostly needed because of the ranges?

Also, @GaelVaroquaux any more opinions on this? I think doing the easy cases like feature selection and imputation (which might drop columns), in addition to having some support in FeatureUnion and Pipeline (and ColumnTransformer), will be very useful.

kmike · Dec 12, 2017

@amueller once feature names get more complex (e.g. PCA on top of tf*idf), showing them as text gets more and more opinionated, and maybe problem-specific. How concise should a feature name be, e.g. should we show only the top PCA components (how many?), or all of them? Note the amount of bikeshedding @jnothman got from me at TeamHG-Memex/eli5#208.

It seems the root of the problem is that formatting a feature name is not the same as figuring out where a feature comes from.

This is where structured representation helps. Full information - all PCA components, or (start, end) ranges in case of text vectorizers - can be excessive for a default feature name, but it allows richer display: highlighting features in text, showing the rest of the components on mouse hover / click.
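
One way to picture the split between "where a feature comes from" and "how its name is formatted" is the sketch below. It is an illustration of the idea only, not the structure used by eli5 or scikit-learn; `FeatureOrigin`, `format_name`, and the example weights are invented for this purpose.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FeatureOrigin:
    """Structured record of how an output feature was derived (hypothetical)."""
    transformer: str
    sources: List[Tuple[str, float]]  # (input feature name, weight)


def format_name(origin, top=2):
    """Render a short display name; the full record stays available in `origin`."""
    ranked = sorted(origin.sources, key=lambda s: abs(s[1]), reverse=True)
    parts = ["%s*%.2f" % (name, weight) for name, weight in ranked[:top]]
    suffix = "+..." if len(ranked) > top else ""
    return "%s(%s%s)" % (origin.transformer, "+".join(parts), suffix)


# Illustrative values only.
origin = FeatureOrigin(
    transformer="pca0",
    sources=[("tfidf:cat", 0.8), ("tfidf:fish", 0.1), ("tfidf:dog", -0.5)],
)
short_name = format_name(origin, top=2)
full_name = format_name(origin, top=3)
```

The formatting choice (`top`, the separator, the number format) can then differ per display without losing the underlying information.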

amueller · Dec 14, 2017

@kmike thanks for the explanation :) Maybe doing strings first would still work. For PCA I would just basically do pca1, pca2 etc for now (aka punt)

amueller · Jun 2, 2018

@jnothman @GaelVaroquaux should we include this in the "Townhall" meeting?

jnothman · Jun 3, 2018

feature names / DataFrames come together to some extent, so yes.

GaelVaroquaux · Jun 6, 2018

@jnothman @GaelVaroquaux should we include this in the "Townhall" meeting?

Yes

kiros32 · Mar 3, 2019

Any news ?

adrinjalali · Mar 3, 2019

@kiros32 #13307

tgy · May 1, 2020

i think OrdinalEncoder can be added to this list

jnothman added Easy Enhancement Need Contributor labels Feb 23, 2016

This was referenced Feb 23, 2016

[MRG] ENH Add get_feature_names for various transformers #6431

Open

[MRG] ENH Add get_feature_names for Binarizer #6432

Closed

nelson-liu mentioned this issue Feb 23, 2016

feature: add get_feature_names() and tests to FunctionTransformer #6436

Closed

yenchenlin mentioned this issue Feb 24, 2016

[MRG] ENH Add get_feature_names for OneHotEncoder #6441

Closed

maniteja123 mentioned this issue Feb 24, 2016

[MRG] Add get_feature_names to PCA #6445

Open

jnothman added Question Stalled and removed Easy Need Contributor labels Jul 18, 2017

amueller mentioned this issue Nov 21, 2017

add get_feature_names to CategoricalEncoder #10181

Closed

amueller mentioned this issue Dec 12, 2017

Imputer to maintain missing collumns #8613

Open

amueller mentioned this issue Jun 1, 2018

[MRG] Add experimental.ColumnTransformer #9012

Merged

eyadsibai mentioned this issue Jun 7, 2018

Add get_feature_names() method scikit-learn-contrib/category_encoders#79

Closed

jorisvandenbossche mentioned this issue Jun 27, 2018

[MRG] Add get_feature_names to OneHotEncoder #10198

Merged

amueller mentioned this issue Nov 7, 2018

Cannot get feature names after ColumnTransformer #12525

Closed

adrinjalali added this to To do in Sample/Feature/Target props Oct 21, 2019

ageron mentioned this issue Nov 4, 2019

Ch2: returning a dataframe after the ColumnTransformer ageron/handson-ml#507

Closed

amueller moved this from Design phase to PR phase in Andy's pets May 13, 2020

cmarmo added API and removed Question Waiting for Reviewer labels Aug 10, 2020

cmarmo removed the Stalled label Aug 22, 2020

amueller linked a pull request that will close this issue Sep 16, 2020

RFC Implement Pipeline get feature names #12627

Open


NicolasHug mentioned this issue Sep 23, 2020

Add a get_transformed_matrix_feature_names to ColumnTransformer #18439

Closed

thomasjpfan linked a pull request that will close this issue Sep 23, 2020

ENH Implements get_feature_names_out for transformers #18444

Open
