Transformative get_feature_names for various transformers #6425
Comments
|
@jnothman May I try this? |
|
On which (family of) estimator? |
|
Hi @jnothman, I am interested in taking this issue. Could you please suggest how I can get started on this issue? |
|
I'll handle implementing this for |
|
@jnothman I'll modify the
Is feature agglomeration here referring to |
|
@yenchenlin1994 I assume so? |
|
@nelson-liu Thx! If so, I would also love to implement it for |
|
I have added an extended list of transformers where this may apply and noted the default feature naming convention (though maybe its generation belongs in |
|
Hello @jnothman , What should
Maybe |
|
Fair question, which I don't currently have an answer for. One option is for it to just return |
|
Oh and even if |
|
yes, trivial, as noted in the issue description
|
|
Oh okay! Thanks for your clarification. |
|
Hello @jnothman ,
Do you mean all classes listed here: It seems that all these classes may be put into |
|
Yes, I mean those.
|
|
Hi everyone, if it is fine I too would like to work on this issue. It would be helpful if the estimators currently being worked on could be mentioned, so that I can try something which does not overlap. Thanks! |
|
I think I can also work on
@maniteja123 from |
|
@yenchenlin1994, thanks for letting me know. |
|
@jnothman It would be of great help if you could confirm if the output for PCA needs to have shape |
|
I'm really not sure about PCA. Try to make something useful. If you think it
|
|
Thanks for the reply. My doubt is mainly about choosing dominant features, and also that not all components are equally significant. Since multiple features can have almost the same contribution along a component, there might be a need for some threshold to decide how many input features to consider. Anyway, I will create an initial PR with just the most dominant feature along each component and continue the discussion there. Hope that is fine. |
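To make the "most dominant feature along the component" idea concrete, here is a minimal sketch. It assumes a matrix shaped like `PCA.components_` (one row per component, one column per input feature) passed as plain nested lists. The function name `pca_feature_names`, the `top_k` parameter, and the `pc0(...)` naming format are hypothetical, made up for illustration, and are not taken from any PR in this thread.

```python
def pca_feature_names(components, input_features=None, top_k=1):
    """Name each component after its top_k largest-magnitude input features.

    `components` is assumed to look like PCA.components_:
    shape (n_components, n_features). The output format is hypothetical.
    """
    n_features = len(components[0])
    if input_features is None:
        # Default naming convention from the issue description: x0, x1, ...
        input_features = [f"x{i}" for i in range(n_features)]
    if len(input_features) != n_features:
        raise ValueError("input_features must have one name per input column")

    names = []
    for idx, row in enumerate(components):
        # Rank input features by absolute loading, largest first.
        order = sorted(range(n_features), key=lambda j: abs(row[j]), reverse=True)
        top = [input_features[j] for j in order[:top_k]]
        names.append(f"pc{idx}(" + ", ".join(top) + ")")
    return names


components = [[0.9, 0.1, -0.4], [0.2, -0.95, 0.1]]
print(pca_feature_names(components))           # ['pc0(x0)', 'pc1(x1)']
print(pca_feature_names(components, top_k=2))  # ['pc0(x0, x2)', 'pc1(x1, x0)']
```

A fixed `top_k` sidesteps the threshold question raised above; a loading-magnitude threshold would be an alternative design, at the cost of variable-length names.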
|
@yenchenlin1994 one more question. I am not sure how to handle this for
If you have already started working, will be waiting for your PRs :) Thanks ! |
Can you get rid of that with a |
|
@kmike Are the structured annotations mostly needed because of the ranges? Also, @GaelVaroquaux any more opinions on this? I think doing the easy cases like feature selection and imputation (which might drop columns), in addition to having some support in FeatureUnion and Pipeline (and ColumnTransformer), will be very useful. |
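For the "easy cases" that only drop columns, the output names are just a filtered copy of the input names. A minimal sketch, assuming a boolean mask like the one returned by a feature selector's `get_support()`; the helper name `select_feature_names` is hypothetical:

```python
def select_feature_names(support_mask, input_features=None):
    """Return the names of the columns kept by a boolean support mask."""
    if input_features is None:
        # Fall back to the default names x0, x1, ...
        input_features = [f"x{i}" for i in range(len(support_mask))]
    if len(input_features) != len(support_mask):
        raise ValueError("input_features must have one name per input column")
    return [name for name, keep in zip(input_features, support_mask) if keep]


mask = [True, False, True]  # e.g. the result of selector.get_support()
print(select_feature_names(mask))                           # ['x0', 'x2']
print(select_feature_names(mask, ["age", "zip", "income"]))  # ['age', 'income']
```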
|
@amueller once feature names get more complex (e.g. PCA on top of tf*idf), showing them as text gets more and more opinionated, and maybe problem-specific. How concise should a feature name be? E.g. should we show only the top PCA components (how many?), or all of them? Note the amount of bikeshedding @jnothman got from me at TeamHG-Memex/eli5#208. It seems the root of the problem is that formatting a feature name is not the same as figuring out where a feature comes from. This is where a structured representation helps. Full information (all PCA components, or (start, end) ranges in the case of text vectorizers) can be excessive for a default feature name, but it allows richer display: highlighting features in text, or showing the rest of the components on mouse hover / click. |
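One way to picture the split between provenance and formatting is a small record that keeps the full information and renders strings on demand. This is only an illustrative sketch: the `ComponentFeature` class and its methods are hypothetical and do not correspond to any API in scikit-learn or eli5.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ComponentFeature:
    """Hypothetical structured feature: where it comes from, not how it is shown."""
    index: int
    contributions: List[Tuple[str, float]]  # (input feature name, weight)

    def _ranked(self):
        # Largest absolute weight first.
        return sorted(self.contributions, key=lambda c: abs(c[1]), reverse=True)

    def short_name(self, top_k=1):
        # Concise default: only the top contributors.
        top = [name for name, _ in self._ranked()[:top_k]]
        return f"pc{self.index}(" + ", ".join(top) + ")"

    def full_name(self):
        # Full detail, e.g. for a tooltip or an expanded view.
        return " + ".join(f"{w:.2f}*{name}" for name, w in self._ranked())


feat = ComponentFeature(0, [("x0", 0.9), ("x1", 0.1), ("x2", -0.4)])
print(feat.short_name())  # pc0(x0)
print(feat.full_name())   # 0.90*x0 + -0.40*x2 + 0.10*x1
```

The same record supports both a terse default string and a richer display, so the choice of how many contributors to show becomes a presentation decision rather than something baked into the name.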
|
@kmike thanks for the explanation :) Maybe doing strings first would still work. For PCA I would just basically do |
|
@jnothman @GaelVaroquaux should we include this in the "Townhall" meeting? |
|
feature names / DataFrames come together to some extent, so yes.
|
|
@jnothman @GaelVaroquaux should we include this in the "Townhall" meeting?
Yes
|
|
Any news? |
|
i think |


#6372 adds `get_feature_names` to `PolynomialFeatures`. It accepts a list of names of `input_features` (or substitutes with defaults) and constructs feature name strings that are human-readable and informative. Similar support should be available for other transformers, including feature selectors, feature agglomeration, `FunctionTransformer`, and perhaps even PCA (giving the top contributors to each component). `FeatureUnion` should be modified to handle the case where an argument is supplied. A proposal for support in `Pipeline` is given in #6424.

Modelled on #6372, each enhancement can be contributed as a separate PR. Note that default names for features are `[x0, x1, ...]`.

- `PolynomialFeatures` @amueller #6372
- `FunctionTransformer` @nelson-liu #6431
- `FunctionTransformer` #6431
- `Binarizer` #6431
- `OneHotEncoder` #6441
- `FeatureUnion` @yenchenlin1994
- `SparseRandomProjection` ?
- `GaussianRandomProjection` ??
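As an illustration of the behaviour described above (accept `input_features`, fall back to `x0, x1, ...`, and build readable strings), here is a pure-Python sketch for polynomial terms. It is not the code from #6372; the function name `polynomial_feature_names` is hypothetical, and the string format (`x0^2`, `x0 x1`) is an assumption chosen to be human-readable.

```python
from collections import Counter
from itertools import combinations_with_replacement


def polynomial_feature_names(n_features, degree=2, input_features=None):
    """Build names for all monomials up to `degree` over n_features inputs."""
    if input_features is None:
        # Default names for features: x0, x1, ...
        input_features = [f"x{i}" for i in range(n_features)]
    if len(input_features) != n_features:
        raise ValueError("input_features must have one name per input column")

    names = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(range(n_features), d):
            if not combo:
                names.append("1")  # the bias (degree-0) term
                continue
            powers = Counter(combo)  # feature index -> exponent
            terms = [
                input_features[i] if p == 1 else f"{input_features[i]}^{p}"
                for i, p in sorted(powers.items())
            ]
            names.append(" ".join(terms))
    return names


print(polynomial_feature_names(2))
# ['1', 'x0', 'x1', 'x0^2', 'x0 x1', 'x1^2']
print(polynomial_feature_names(2, input_features=["a", "b"]))
# ['1', 'a', 'b', 'a^2', 'a b', 'b^2']
```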