Variable Selection using Python — Vote based approach
Variable selection is one of the key steps in the predictive modeling process. It is an art. To put it in simple terms, variable selection is like picking a soccer team to win the World Cup: you need the best player in each position, and you don't want two or more players who play the same position.
In Python, we have several techniques to select variables, including recursive feature elimination, tree-based selection, and L1-based feature selection. The scikit-learn documentation provided below walks through these techniques.
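As a taste of one such technique, here is a minimal sketch of recursive feature elimination with scikit-learn. The breast cancer dataset and the logistic regression estimator are illustrative choices, not part of the original article:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Illustrative dataset: 30 numeric features, binary target.
data = load_breast_cancer()
X, y = data.data, data.target

# Keep the 5 strongest features; RFE refits the model repeatedly,
# dropping the weakest feature on each round.
selector = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5)
selector.fit(X, y)

selected = [name for name, keep in zip(data.feature_names, selector.support_)
            if keep]
print(selected)
```

The same `fit`/`get_support` pattern applies to most scikit-learn selectors, which is what makes it easy to combine several of them.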
Vote based approach for variable selection
The idea here is to apply a variety of techniques to select variables. Whenever an algorithm picks a variable, that variable gets a vote. At the end, we tally the total votes for each variable and pick the best ones based on their vote counts. This way, we end up with the best variables with minimal effort in the variable selection process.
Github Code
The following steps happen during variable selection:
- Information Value using Weight of evidence
- Variable Importance using Random Forest
- Recursive Feature Elimination
- Variable Importance using Extra trees classifier
- Chi Square best variables
- L1 based feature selection
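The voting idea behind most of the steps above can be sketched as follows. This is a simplified illustration, not the Github code itself: it covers the random forest, extra trees, RFE, chi-square, and L1-based selectors on an illustrative dataset, and the top-K cutoff of 5 is an arbitrary choice (the weight-of-evidence step is omitted here since it has no scikit-learn built-in):

```python
from collections import Counter
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names
TOP_K = 5                     # how many features each selector votes for
votes = Counter()

# Random Forest / Extra Trees: vote for the K highest-importance features.
for model in (RandomForestClassifier(random_state=0),
              ExtraTreesClassifier(random_state=0)):
    model.fit(X, y)
    votes.update(names[np.argsort(model.feature_importances_)[-TOP_K:]])

# Recursive Feature Elimination on a logistic regression.
rfe = RFE(LogisticRegression(max_iter=5000),
          n_features_to_select=TOP_K).fit(X, y)
votes.update(names[rfe.support_])

# Chi-square scores (the features here are non-negative, as chi2 requires).
chi = SelectKBest(chi2, k=TOP_K).fit(X, y)
votes.update(names[chi.get_support()])

# L1-based selection: vote for features with non-zero L1 coefficients.
l1 = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1)).fit(X, y)
votes.update(names[l1.get_support()])

# Rank features by the number of votes received.
for name, count in votes.most_common(TOP_K):
    print(f"{name}: {count} votes")
```

Because every selector exposes either `get_support()`, `support_`, or `feature_importances_`, adding another technique is just one more `votes.update(...)` call.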
Once these steps are completed, we take the best variables selected by each algorithm; each selection counts as a vote. We tally the total number of votes per variable and then perform a multicollinearity check on the variables selected. The process can be extended with other variable selection techniques before counting the votes.
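One common way to run the multicollinearity check is via variance inflation factors (VIF). The sketch below computes VIF directly with NumPy on synthetic data, where `x3` is deliberately made collinear with `x1` to show how a redundant variable gets flagged; the variable names and the VIF > 5 rule of thumb are illustrative assumptions:

```python
import numpy as np

# Synthetic stand-in for a voted-in shortlist: x3 is nearly a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = 0.95 * x1 + rng.normal(scale=0.1, size=200)
X = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]

def vif(X, i):
    """VIF_i = 1 / (1 - R^2) from regressing column i on the other columns."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # add an intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

for i, name in enumerate(names):
    print(f"{name}: VIF = {vif(X, i):.1f}")
# A common rule of thumb drops variables whose VIF exceeds 5-10;
# here x1 and x3 get large VIFs, so one of the pair would be dropped.
```

This is exactly the "two players in the same position" problem from the soccer analogy: two highly collinear variables carry the same information, so keeping both adds noise rather than signal.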
Have fun!
Also, please look at my other article, which uses this code in an end-to-end Python modeling framework.
I released a Python package that does variable selection using the vote-based approach. If you are interested in using the package version, read the article below.