Confidence intervals for linear models

Hey guys,

This is a proposal to add confidence intervals to linear models in scikit-learn.

This would be useful because statsmodels only works on small datasets and is not as user-friendly.
It would also help in situations where one has to put more trust in the estimated probabilities, particularly where a very low false positive (FP) or false negative (FN) rate is desired.

Here is a rough example of how to do it with logistic regression, with a test on some text data from the twenty newsgroups dataset:
https://gist.github.com/lqdc/1ea1682ad1214956d95904ebde3134a5

There are some limitations:
Standard errors have to be estimated on a validation set or some other non-training set. n-fold cross-validation on the training set would also work.

It doesn't work well on data that is far from normally distributed, so it wouldn't work on sparse or unscaled data.
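
To make the idea concrete without opening the gist, here is a minimal sketch of one way it could look. It is not the code from the gist and not an existing scikit-learn API. It assumes Wald-type intervals: the coefficient covariance is approximated by the inverse Fisher information computed on a held-out split, and the interval on the linear predictor is mapped through the sigmoid. The helper name `wald_intervals`, the synthetic data, and the large `C` (to approximate an unpenalized fit) are all assumptions made for illustration.

```python
from statistics import NormalDist

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def _sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def wald_intervals(model, X_holdout, X_new, level=0.95):
    """Hypothetical helper: Wald intervals for predicted probabilities.

    The coefficient covariance is approximated by (X^T W X)^-1 evaluated on
    X_holdout, where W = diag(p * (1 - p)) and p are the model's probabilities.
    """
    # Prepend a column of ones so the intercept gets a standard error too.
    design = np.hstack([np.ones((X_holdout.shape[0], 1)), X_holdout])
    p = model.predict_proba(X_holdout)[:, 1]
    weights = p * (1.0 - p)
    fisher = design.T @ (design * weights[:, None])
    cov = np.linalg.pinv(fisher)

    # Standard error of the linear predictor x^T beta for each new row.
    new_design = np.hstack([np.ones((X_new.shape[0], 1)), X_new])
    var = np.einsum("ij,jk,ik->i", new_design, cov, new_design)
    se = np.sqrt(np.maximum(var, 0.0))

    eta = model.decision_function(X_new)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return _sigmoid(eta - z * se), _sigmoid(eta), _sigmoid(eta + z * se)


# Synthetic, dense data; scaled because of the limitation noted above.
X, y = make_classification(
    n_samples=2000, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.5, random_state=0
)
scaler = StandardScaler().fit(X_train)

# A large C makes the default L2 penalty negligible (assumption: Wald
# intervals are only meaningful for a fit close to the unpenalized MLE).
model = LogisticRegression(C=1e6, max_iter=1000)
model.fit(scaler.transform(X_train), y_train)

X_hold_scaled = scaler.transform(X_hold)
X_new = X_hold_scaled[:5]
lower, prob, upper = wald_intervals(model, X_hold_scaled, X_new)

for lo, mid, hi in zip(lower, prob, upper):
    print(f"p = {mid:.3f}  95% CI = [{lo:.3f}, {hi:.3f}]")
```

Building the interval on the linear predictor and then applying the sigmoid keeps the bounds inside [0, 1], which a symmetric interval computed directly on the probability would not guarantee.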

Confidence intervals for linear models #6773
