Modifications to LinearRegression documentation. #22561
Conversation
Thanks for the PR @thomasoliveira !
@@ -13,7 +13,7 @@ value.

 .. math:: \hat{y}(w, x) = w_0 + w_1 x_1 + ... + w_p x_p

-Across the module, we designate the vector :math:`w = (w_1,
+Across the module, we designate the vector :math:`(w_1,
We have two options:

- Still assign `w = (w_1, ...)` because this is used in the penalty terms. Then we need to add `w_0` in many places.
- Adjust the penalty terms, because we can't use `w` anymore as the intercept is not penalized.
The current documentation is formulated around `X` being a design matrix (including the constant 1 column), which implies that `w_0` is in the `w` vector. In general, I think using `X` as the design matrix is consistent with the literature on linear regression. With that in mind, I prefer option 2, where we rewrite the penalty terms with summations that do not include the `w_0` term.
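To make the design-matrix convention concrete, here is a minimal NumPy sketch (my own illustration with made-up data, not scikit-learn's implementation): prepending a constant-1 column to `X` puts the intercept `w_0` inside the coefficient vector `w`.

```python
import numpy as np

# Toy data generated from y = 2 + 3x, so the true intercept w_0 is 2.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 + 3.0 * X[:, 0]

# Design-matrix view: prepend a constant-1 column so w_0 is part of w.
X_design = np.hstack([np.ones((X.shape[0], 1)), X])
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(w)  # approximately [2., 3.]: w[0] is the intercept, w[1] the slope
```

Under this convention the fitted model is simply `X_design @ w`, which is what makes writing the prediction as `Xw` natural, but it is also why a penalty on the full `w` would wrongly penalize the intercept.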
In this case we should state that by :math:`Xw` we mean `w_0 + w_1 x_1 + ...`. A priori, I'm fine with all ways as long as we are consistent throughout the user guide (within the linear models chapter).
With option 2, how do we write an L2 penalty? Should we define a new intercept-less vector? With which name? (I would use `\tilde{w}`, but this only works in math mode, not in Python code.)
I don't have a strong opinion which option to choose as long as it is correct and consistent.
I think using `\tilde{w}` would be a nice solution from a LaTeX notation perspective. If we want to keep it simple, we could also name the vector `u`; however, that would be a departure from other literature (not a bad one, imo). I prefer the former.
@lorentzenchr @thomasjpfan @Micky774 Regarding this, I feel `\tilde{w}` is the better option for the penalty term. I can take this forward from here if everyone agrees, and I can open a new PR with all the changes.
Reference Issues/PRs
Fixes #22551.
What does this implement/fix? Explain your changes.
From the documentation at https://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares, one could understand that LinearRegression fits a model where the intercept $w_0$ is absent, that is, where $w_0 = 0$.
To prevent this misinterpretation, the present PR rephrases:
Any other comments?