Description
Hey guys,
This is a proposal to add confidence intervals to linear models in scikit-learn.
This would be useful because statsmodels only works on small datasets and is not as user-friendly.
It is useful in situations where one has to put more trust in the estimated probabilities, particularly where a very low false-positive (FP) or false-negative (FN) rate is desired.
Here is a rough example of how to do it with logistic regression, tested on some text data from the 20 newsgroups dataset:
https://gist.github.com/lqdc/1ea1682ad1214956d95904ebde3134a5
There are some limitations:
Standard errors have to be estimated on a validation set or some other non-training set. n-fold cross-validation on the training set would also work.
It doesn't work well on data that is far from normally distributed, so it wouldn't work on sparse or unscaled data.
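To make the idea concrete, here is a minimal sketch of one textbook way to get intervals around predicted probabilities. It is not necessarily what the gist does: it uses a Wald interval on the logit scale, with the coefficient covariance taken from the inverse Fisher information, rather than standard errors estimated on held-out data. The function name `wald_probability_interval` and its parameters are hypothetical, it assumes a binary and nearly unpenalized model (large `C`), and it uses dense, roughly scaled features, in line with the limitations above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def wald_probability_interval(clf, X_fit, X_new, z=1.96):
    """Approximate interval for P(y=1 | x) from a fitted binary LogisticRegression.

    Hypothetical helper, not part of scikit-learn. Assumes the model is
    (almost) unpenalized so the usual maximum-likelihood theory roughly applies.
    """
    # Design matrices with an explicit intercept column.
    A_fit = np.hstack([np.ones((X_fit.shape[0], 1)), X_fit])
    A_new = np.hstack([np.ones((X_new.shape[0], 1)), X_new])
    beta = np.concatenate([clf.intercept_, clf.coef_.ravel()])

    # Fisher information A^T W A with W = diag(p * (1 - p)).
    p_fit = 1.0 / (1.0 + np.exp(-(A_fit @ beta)))
    w = p_fit * (1.0 - p_fit)
    fisher = A_fit.T @ (A_fit * w[:, None])
    cov = np.linalg.inv(fisher)  # approximate covariance of beta

    # Standard error of the linear predictor for each new row: sqrt(a^T cov a).
    eta = A_new @ beta
    se = np.sqrt(np.einsum("ij,jk,ik->i", A_new, cov, A_new))

    # Build the interval on the logit scale, then map back to probabilities.
    def sigmoid(t):
        return 1.0 / (1.0 + np.exp(-t))

    return sigmoid(eta - z * se), sigmoid(eta), sigmoid(eta + z * se)


# Synthetic data drawn from a known logistic model (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(int)

# Large C makes the default L2 penalty negligible.
clf = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
lower, prob, upper = wald_probability_interval(clf, X, X[:5])
for lo, p, hi in zip(lower, prob, upper):
    print(f"p = {p:.3f}, approx. 95% interval = [{lo:.3f}, {hi:.3f}]")
```

Working on the logit scale keeps the bounds inside (0, 1). With noticeable regularization the inverse Fisher information no longer describes the estimator's variance, which is one reason a held-out or cross-validated estimate like the one proposed here may be preferable.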