Add missing values and categorical features when generating datasets #28952

Open
@lcrmorin

Description


Describe the workflow you want to enable

I often use random datasets (typically generated with make_classification). However, I often find myself having to add more realistic features to the dataset:

  • missing data: sometimes just to test the pipeline (missing at random would be fine), and sometimes to study more complex phenomena (missingness not at random, possibly depending on the target)
  • categorical features: categorical variables often need to be handled specifically. I usually introduce them by binning a continuous feature, then converting the bins to strings.

It would be nice to have both of these in dataset generation.
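As an illustration, the manual workflow described above could look roughly like the following sketch. The column names, missingness rates, and bin labels are arbitrary assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
df = pd.DataFrame(X, columns=[f"x{i}" for i in range(X.shape[1])])

# Categorical feature: bin a continuous column into quartiles,
# then convert the bins to plain strings.
df["x0_cat"] = pd.qcut(df["x0"], q=4, labels=["a", "b", "c", "d"]).astype(str)

# Missing at random: each value of x1 is dropped with probability 0.1,
# independently of everything else.
random_mask = rng.random(len(df)) < 0.1
df.loc[random_mask, "x1"] = np.nan

# Target-dependent missingness: x2 is dropped more often when y == 1.
p_missing = np.where(y == 1, 0.3, 0.05)
target_mask = rng.random(len(df)) < p_missing
df.loc[target_mask, "x2"] = np.nan
```

Each of these steps is short, but they have to be rewritten for every new experiment, which is the repetition this request aims to remove.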

Describe your proposed solution

Introduce parameters to allow generating missing data (proportion of missingness; type of missingness: at random or not at random).
Introduce parameters to allow generating categorical features (number of features; distribution across categories: even, uneven, or Pareto).
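To make the proposal more concrete, here is a rough sketch of what such parameters might look like when wrapped around make_classification. The function name `make_classification_with_extras` and all of its extra parameters (`n_categorical`, `n_categories`, `category_distribution`, `missing_rate`) are hypothetical and do not exist in scikit-learn; only the simplest case (values missing at random) is covered, and "pareto" is approximated by category shares decaying as 1/rank.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification


def make_classification_with_extras(
    n_samples=100,
    n_features=5,
    n_categorical=1,
    n_categories=4,
    category_distribution="even",  # hypothetical option: "even" or "pareto"
    missing_rate=0.0,              # hypothetical option: share of missing cells
    random_state=None,
):
    """Hypothetical helper, NOT a scikit-learn API."""
    rng = np.random.default_rng(random_state)
    X, y = make_classification(
        n_samples=n_samples, n_features=n_features, random_state=random_state
    )
    df = pd.DataFrame(X, columns=[f"x{i}" for i in range(n_features)])

    # Target share of each category: equal, or decaying as 1/rank.
    if category_distribution == "even":
        weights = np.ones(n_categories)
    elif category_distribution == "pareto":
        weights = 1.0 / np.arange(1, n_categories + 1)
    else:
        raise ValueError(f"Unknown distribution: {category_distribution!r}")
    # Interior quantile levels that split a column into those shares.
    levels = np.cumsum(weights / weights.sum())[:-1]

    # Turn the first n_categorical columns into string categories by binning.
    for col in df.columns[:n_categorical]:
        edges = np.quantile(df[col], levels)
        codes = np.searchsorted(edges, df[col].to_numpy())
        df[col] = [f"cat_{c}" for c in codes]

    # Blank out cells at random, independently of values and target.
    if missing_rate > 0:
        df = df.mask(rng.random(df.shape) < missing_rate)
    return df, y


df, y = make_classification_with_extras(
    n_samples=400, n_categorical=2, category_distribution="pareto",
    missing_rate=0.1, random_state=0,
)
```

Missingness that depends on feature values or on the target would need an additional parameter describing the mechanism, which this sketch leaves out.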

Describe alternatives you've considered, if relevant

I usually handle this by hand.

Additional context

This could be used to illustrate imputation and encoding techniques.
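For instance, a generated dataset with one categorical column and one column containing missing values could feed a small imputation and encoding pipeline. The sketch below builds such a dataset by hand and uses existing scikit-learn estimators; the column names, the median imputation strategy, and the choice of classifier are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
df = pd.DataFrame(X, columns=["x0", "x1", "x2", "x3"])

# One categorical column (binned x0) and one column with missing values (x1).
df["x0"] = pd.qcut(df["x0"], q=3, labels=["low", "mid", "high"]).astype(str)
df.loc[rng.random(len(df)) < 0.1, "x1"] = np.nan

# Encode the categorical column and impute the numeric ones.
preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["x0"]),
        ("num", SimpleImputer(strategy="median"), ["x1", "x2", "x3"]),
    ]
)
model = make_pipeline(preprocess, LogisticRegression())
model.fit(df, y)
accuracy = model.score(df, y)  # training accuracy, for illustration only
```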

Metadata

Assignees

No one assigned

    Labels

    Moderate (anything that requires some knowledge of conventions and best practices), New Feature

    Type

    No type

    Projects

    Status

    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
