Support orthogonal polynomial features (via QR decomposition) in PolynomialFeatures #31223

Open

@cottnich

Description
Describe the workflow you want to enable

I want to introduce support for orthogonal polynomial features via QR decomposition in PolynomialFeatures, closely mirroring the behavior of R's poly() function.

In regression modeling, using orthogonal polynomials often improves numerical stability and reduces multicollinearity among polynomial terms.

As an example of what the difference looks like in R:

# fits a raw degree-3 polynomial without an orthogonal basis
model_raw <- lm(y ~ I(x) + I(x^2) + I(x^3), data = data)
# equivalently: model_raw <- lm(y ~ poly(x, 3, raw = TRUE), data = data)

# fits the same degree-3 polynomial using an orthogonal basis
model_poly <- lm(y ~ poly(x, 3), data = data)

This behavior cannot currently be replicated with scikit-learn's PolynomialFeatures, which only produces raw monomial terms. As a result, transitioning from R to Python often leads to discrepancies in model behavior and performance.
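Until such support exists, R's poly() can be approximated by hand with NumPy. The sketch below (the helper name `orthogonal_poly` is illustrative, not an existing scikit-learn API) builds the Vandermonde matrix of the centered input, takes its thin QR decomposition, and drops the constant column:

```python
import numpy as np

def orthogonal_poly(x, degree=3):
    """Orthogonal polynomial basis via QR, analogous to R's poly(x, degree)."""
    x = np.asarray(x, dtype=float)
    # Vandermonde of the centered input: columns (x - mean)^0 .. (x - mean)^degree
    X = np.vander(x - x.mean(), N=degree + 1, increasing=True)
    Q, R = np.linalg.qr(X)
    # Fix column signs so the basis agrees with the raw monomial orientation
    Q = Q * np.sign(np.diag(R))
    # Drop the constant column; the remaining columns are mutually orthonormal
    return Q[:, 1:]

x = np.linspace(0, 1, 50)
Z = orthogonal_poly(x, degree=3)
```

The resulting columns are orthonormal and orthogonal to the intercept, which is what gives the orthogonal basis its numerical-stability advantage at higher degrees.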

Describe your proposed solution

I propose extending PolynomialFeatures with a new parameter:

PolynomialFeatures(..., method="raw")

Accepted values:

  • "raw" (default): retains the existing behavior, returning the standard raw monomial terms.
  • "qr": applies a QR decomposition to each feature to generate orthogonal polynomial features.

Because R's poly() operates only on 1D input vectors, my thought was to apply the QR decomposition feature by feature when the input is multi-dimensional: each column is processed independently, mirroring R's approach.
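A rough sketch of that feature-wise processing, assuming each column is centered and orthogonalized independently (`orthogonal_features` is a hypothetical helper, not the proposed scikit-learn API itself):

```python
import numpy as np

def orthogonal_features(X, degree=2):
    """Per-column orthogonal polynomial expansion via QR decomposition."""
    X = np.asarray(X, dtype=float)
    blocks = []
    for j in range(X.shape[1]):
        col = X[:, j] - X[:, j].mean()
        # Raw powers of one feature: col^0 .. col^degree
        V = np.vander(col, N=degree + 1, increasing=True)
        Q, _ = np.linalg.qr(V)
        blocks.append(Q[:, 1:])  # drop the constant column for each feature
    # Stack the per-feature blocks side by side: degree * n_features columns
    return np.hstack(blocks)

X = np.random.default_rng(0).normal(size=(100, 2))
Z = orthogonal_features(X, degree=2)
```

Note that only the columns within each feature's block are mutually orthogonal; columns from different features generally are not, which mirrors applying R's poly() to each predictor separately.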

This feature would interact with other parameters as follows:

  • include_bias: When method="qr", the orthogonal polynomial basis inherently includes a transformed first column, but this column is not a plain column of ones. The concept of include_bias=True (which appends a column of ones) therefore becomes redundant or misleading in this context. One option is to force include_bias=False when method="qr" and always return only the orthogonal columns; another is to raise a warning.

  • interaction_only: This would be incompatible with method="qr", since the QR-based transformation does not naturally support selective inclusion of interaction terms.

Describe alternatives you've considered, if relevant

Currently, users must implement the QR decomposition manually when orthogonal polynomials are needed. This is a common pattern in statistical workflows but lacks "off the shelf" support in any major Python library. This feature would eliminate the need for the manual decomposition and would improve workflows for researchers accustomed to R's statistical tools.

Additional context

This idea stemmed from a broader effort to convert statistical modeling pipelines from R to Python, where discrepancies in regression results were traced to the lack of orthogonal polynomial support in PolynomialFeatures.

I have drafted and tested a 1D implementation of this feature but wanted feedback on whether the idea aligns with scikit-learn's scope before moving on. In particular, I'd appreciate input on:

  • Acceptability of feature-wise orthogonalization for multi-feature input.
  • Preferred parameter naming (e.g., method="qr" vs. orthogonal=True).
  • Compatibility decisions around parameters like include_bias and interaction_only.
